Abstract
In this talk we present a panoramic view of the geometry underpinning a range of vision problems, from early vision to unsupervised mining in large image collections and beyond. Moving between continuous and discrete representations, geometry appears in different forms of duality, embeddings, and manifolds. We begin with planar shape decomposition as studied in psychophysics, used to model either occlusion or recognition by parts. Focusing on distance maps and the medial axis representation, we then generalize to natural images, towards perceptual edge grouping and equivariant local feature detection.
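As an illustration only (not part of the talk), the sketch below computes the two representations named above, a distance map and a medial axis, for a toy binary shape; scipy and scikit-image are assumed dependencies, and the shape itself is made up for the example.

```python
# Minimal sketch, not from the talk: distance map and medial axis of a
# binary planar shape (toy data, assumed scipy/scikit-image dependencies).
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import medial_axis

# Toy binary shape: a filled square with a notch cut out.
shape = np.zeros((64, 64), dtype=bool)
shape[8:56, 8:56] = True
shape[28:36, 40:56] = False

# Distance map: each interior pixel stores its distance to the boundary.
dist = distance_transform_edt(shape)

# Medial axis: the ridge of the distance map; together with the distance
# values it gives a compact, reconstructible representation of the shape.
skeleton, skel_dist = medial_axis(shape, return_distance=True)
radii = skel_dist[skeleton]  # radii of maximal inscribed discs
print(dist.max(), skeleton.sum(), radii.max())
```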
Adopting sets of local features and descriptors as the image representation, we then shift to visual instance search and recognition. We discuss a form of flexible spatial matching cast as mode seeking in the transformation space, several embeddings and match kernels in the descriptor space, and feature selection or aggregation in both. Since the problem often boils down to nearest neighbor search in high-dimensional spaces, we consider a number of binary coding and product quantization extensions, highlighting their relation to nonlinear dimensionality reduction. Finally, we touch upon the deep connection between nearest neighbor search and clustering. In doing so, we revisit distance maps and medial representations, now in arbitrary dimensions.
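For concreteness, and again only as an illustration of one technique named above rather than the talk's own method, here is a small product quantization sketch for approximate nearest neighbor search; the dimensions, data, and codebook sizes are toy values chosen for the example, with scipy assumed for k-means.

```python
# Minimal sketch, not the talk's method: product quantization (PQ) with
# asymmetric distance computation on random toy data.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
d, m, k = 64, 8, 32                  # dimension, subspaces, centroids per subspace
ds = d // m                          # sub-vector dimension
xb = rng.normal(size=(5000, d))      # database vectors
xq = rng.normal(size=(d,))           # one query vector

# Train one codebook per subspace and encode the database as m small codes.
codebooks = []
codes = np.empty((len(xb), m), dtype=np.int64)
for j in range(m):
    sub = xb[:, j * ds:(j + 1) * ds]
    cb, labels = kmeans2(sub, k, minit='points')
    codebooks.append(cb)
    codes[:, j] = labels

# Asymmetric distances: precompute query-to-centroid tables per subspace,
# then approximate each squared distance as a sum of m table lookups.
tables = np.stack([((xq[j * ds:(j + 1) * ds] - codebooks[j]) ** 2).sum(axis=1)
                   for j in range(m)])            # shape (m, k)
approx_dist = tables[np.arange(m), codes].sum(axis=1)
print("approximate nearest neighbor id:", approx_dist.argmin())
```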