This was a hybrid event with in-person attendance in Wu and Chen and virtual attendance…
Spatial perception —the robot’s ability to sense and understand the surrounding environment— is a key enabler for robot navigation, manipulation, and human-robot interaction. Recent advances in perception algorithms and systems have enabled robots to create large-scale geometric maps of unknown environments and detect objects of interest. Despite these advances, a large gap still separates robot and human perception: Humans are able to quickly form a holistic representation of the scene that encompasses both geometric and semantic aspects, are robust to a broad range of perceptual conditions, and are able to learn without low-level supervision. This talk discusses recent efforts to bridge these gaps. First, we show that scalable metric-semantic scene understanding requires hierarchical representations; these hierarchical representations, or 3D scene graphs, are key to efficient storage and inference, and enable real-time perception algorithms. Second, we discuss progress in the design of certifiable algorithms for robust estimation; in particular we discuss the notion of “estimation contracts”, which provide first-of-a-kind performance guarantees for estimation problems arising in robot perception. Finally, we observe that certification and self-supervision are twin challenges, and the design of certifiable perception algorithms enables a natural self-supervised learning scheme; we apply this insight to 3D object pose estimation and present self-supervised algorithms that perform on par with state-of-the-art, fully supervised methods, while not requiring manual 3D annotations.