This was a hybrid event with in-person attendance in Levine 307 and virtual attendance…
In computer vision and robotics, we often need to deal with 3D objects. For instance, we may want to generate instances of 3D chairs, edit the generated chairs using natural language instructions, or arrange them in a canonical orientation. In this talk, I will present some of our work on addressing these problems. First, I will talk about ShapeCrafter, a model for recursively generating and modifying 3D shapes using natural language descriptions. ShapeCrafter generates a 3D shape distribution that gradually evolves as more phrases are added, resulting in shapes that more closely match the text instructions. In addition, I will introduce the notions of invariance, equivariance, and ‘canonicalization’, and discuss their importance in 3D understanding. I will describe ConDor, a self-supervised method for canonicalizing the orientation of full and partial 3D shapes. Finally, I will identify future directions, including opportunities for expanding 3D understanding to neural fields, articulated objects, and object collections.
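For readers unfamiliar with the terms mentioned above, the following is a minimal sketch (not taken from the talk itself) of how invariance, equivariance, and canonicalization are commonly formalized for a group of transformations $G$ acting on 3D shapes, e.g. rotations $R \in SO(3)$:

$$
\begin{aligned}
\text{invariance:} \quad & f(R \cdot X) = f(X) && \text{(the output ignores the pose of } X\text{)}\\
\text{equivariance:} \quad & f(R \cdot X) = R \cdot f(X) && \text{(the output transforms along with } X\text{)}\\
\text{canonicalization:} \quad & c(R \cdot X) = X_{\text{canon}} && \text{(any posed copy maps to one canonical pose)}
\end{aligned}
$$

Here $X$ denotes a 3D shape (for example a point cloud), $f$ a learned function such as a feature extractor, and $c$ a canonicalizing map; a self-supervised canonicalizer in the spirit of ConDor can be viewed as learning such a $c$ without ground-truth pose labels.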