*This was a hybrid event with in-person attendance in Wu & Chen Auditorium and virtual attendance via Zoom Webinar
The prevalent approach to object manipulation is based on the availability of explicit 3D object models. By estimating the pose of such object models in a scene, a robot can readily reason about how to pick up an object, place it in a stable position, or avoid collisions. Unfortunately, assuming the availability of object models constrains the settings in which a robot can operate, and noise in estimating a model’s pose can result in brittle manipulation performance. In this talk, I will discuss our work on learning to manipulate unknown objects directly from visual (depth) data. Without any explicit 3D object models, these approaches are able to segment unknown object instances, pick up objects in cluttered scenes, and rearrange them into desired configurations. I will also present recent work on combining pre-trained language and vision models to efficiently teach a robot to perform a variety of manipulation tasks. I’ll conclude with our initial work toward learning implicit representations for objects.