*This seminar was held in-person in Levine 307 as well as virtually…
Understanding the 3D structure of real-world environments is a fundamental challenge in machine perception, critical for applications spanning robotic navigation, content creation, and mixed reality. Machine learning has advanced rapidly in recent years; in the 3D domain, however, such data-driven learning remains challenging due to limited 3D/4D data availability. In this talk, we first explore learning 3D priors from captured and annotated data for supervision, leveraging synthetic data as a strong 3D prior for reconstruction and semantic understanding of 3D scenes observed from commodity RGB and RGB-D sensors. As synthetic priors can be limited in diversity, we then discuss real-world 3D data alternatives, followed by relaxing 3D supervision constraints to weakly supervised formulations for object-based reconstruction and 3D semantic scene understanding. Finally, as real-world scenes are often dynamic, we characterize 3D interactions and propose to distill knowledge from other data modalities to enable zero-shot 3D interaction synthesis. These 3D learning strategies promise to usher in a new paradigm of generalized 3D perception, beyond the limits of existing 3D datasets, enabling in-the-wild 3D analysis of environments.