ABSTRACT
Computer vision has advanced rapidly with deep learning, surpassing human performance on some classification benchmarks. At the core of state-of-the-art approaches for image classification, object detection, and semantic/instance segmentation is sliding-window classification, engineered for computational efficiency. Such piecemeal analysis often struggles to get details right and fails badly under occlusion. Human vision, on the other hand, thrives on occlusion and excels at seeing large and small, whole and parts, at the same time. I will describe several works that build on concepts of perceptual organization, learn pixel and image relationships in a data-driven fashion (both supervised and unsupervised), and integrate multiscale and figure-ground cues, in order to deliver more accurate and better-generalizing performance in image classification and segmentation.