This was a hybrid event with in-person attendance in Wu and Chen and virtual attendance…
ABSTRACT
If I show you a photo of a place you have never been to, you can easily imagine what you could do in that scene. Your understanding extends from the surfaces you see to the ones you cannot, such as parts hidden behind furniture. Your understanding even lets you reason about how the scene would change if someone interacted with it, for instance by opening a cabinet. My research aims to give computers this same level of physical understanding. I believe that this physical understanding will be critical for autonomous agents, as well as for enabling new insights in research fields that computer vision does not often interact with: progress on many problems across the sciences and humanities can be accelerated by the ability to robustly measure quantities at scale.
My talk will present my research group’s work towards the goal of understanding the physical world from images. I will first show how we can reconstruct 3D scenes, including invisible surfaces, from a single RGB image. We have developed an approach that learns to predict a scene-scale implicit function using realistic 3D supervision that can be gathered by consumers or robots, rather than from artist-created watertight 3D assets. After showing reconstructions from our system in everyday scenarios, I will discuss how measuring the world can unlock new insights in science, from millimeter-sized bird bones to solar physics data in which a pixel is a few hundred miles wide. I will conclude with work towards understanding how humans can interact with objects, including work on understanding hands and the objects they hold.
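The abstract mentions predicting a scene-scale implicit function from a single image, supervised by 3D data gathered by consumers or robots rather than artist-made watertight meshes. As a rough illustration only, the sketch below shows what an image-conditioned implicit function of that flavor could look like; the architecture, layer sizes, and training signal are assumptions made for illustration and are not the speaker's actual method.

    # Minimal sketch (illustrative, not the speaker's model): an image-conditioned
    # implicit function f(image, 3D point) -> occupancy, the kind of scene-scale
    # representation the abstract alludes to. Layer sizes and names are assumptions.
    import torch
    import torch.nn as nn

    class SceneImplicitFunction(nn.Module):
        def __init__(self, feat_dim=256, hidden=256):
            super().__init__()
            # Tiny image encoder standing in for a real backbone (assumption).
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            # MLP mapping (global image feature, xyz query) -> occupancy logit.
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, image, points):
            # image: (B, 3, H, W); points: (B, N, 3) query locations in scene space.
            feat = self.encoder(image).flatten(1)                 # (B, feat_dim)
            feat = feat.unsqueeze(1).expand(-1, points.shape[1], -1)
            logits = self.mlp(torch.cat([feat, points], dim=-1))  # (B, N, 1)
            return logits.squeeze(-1)                             # occupancy logit per point

    # Supervision sketch: occupancy labels would come from consumer/robot 3D captures
    # (e.g., depth scans); random placeholders are used here just to show the loss.
    model = SceneImplicitFunction()
    img = torch.rand(2, 3, 128, 128)
    pts = torch.rand(2, 1024, 3)
    labels = (torch.rand(2, 1024) > 0.5).float()
    loss = nn.functional.binary_cross_entropy_with_logits(model(img, pts), labels)
    loss.backward()

At inference time, one could evaluate such a function on a dense grid of query points and extract the predicted surface, which is how reconstructions of both visible and hidden geometry could be obtained from a single image under this kind of setup.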