*This was a hybrid event with in-person attendance in Levine 307 and virtual attendance via Zoom.
We would like robots that can perform useful manipulation tasks in real-world environments. This requires robots that can perceive the world with both precision and semantic understanding, methods for communicating desired tasks to these systems, and closed-loop visual feedback controllers for robustly executing manipulation tasks. This has been hard to achieve: prior work has neither enabled robots to understand the visual world densely and precisely enough for robotic manipulation nor endowed them with the semantic understanding needed to perform tasks with novel objects. This limitation arises partly from the object representations that have been used, the challenge of extracting these representations from the available sensor data in real-world settings, and the manner in which tasks have been specified. The talk will have two sections. In the first section I will focus on object-centric representations and present a family of approaches that leverage self-supervision, both in the visual domain and for learning physical dynamics, to enable robots to perform manipulation tasks. Specifically, we (i) demonstrate the novel application of dense visual object descriptors to robotic manipulation and provide a fully self-supervised robot system for acquiring them; (ii) introduce the concept of category-level manipulation tasks and develop a novel object representation based on semantic 3D keypoints, along with a task specification that uses these keypoints to define the task for all objects of a category, including novel instances; (iii) utilize our dense visual object descriptors to quickly learn new manipulation skills through imitation; and (iv) use our visual object representations to learn data-driven models for closed-loop feedback control in manipulation tasks.
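To make the keypoint-based task specification in (ii) concrete, here is a minimal sketch of one way such a specification can be grounded: detected semantic keypoints are mapped onto goal keypoint locations by a least-squares rigid transform (the Kabsch algorithm). The keypoint names, coordinates, and goal pose below are hypothetical illustrations, not the speaker's actual system.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with R @ src_i + t ~ dst_i (Kabsch algorithm)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections: force det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Hypothetical semantic keypoints detected on a novel mug
# (bottom center, top center, handle, rim point), camera frame.
detected = np.array([[0.40, 0.10, 0.02],
                     [0.40, 0.10, 0.12],
                     [0.46, 0.10, 0.07],
                     [0.40, 0.16, 0.07]])

# Goal positions for the same keypoints ("mug upright at the target location"),
# generated here from a known pose so the recovery can be checked.
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.10, -0.30, 0.05])
goal = detected @ R_true.T + t_true

# (R, t) is the object motion a downstream planner would then execute.
R, t = fit_rigid_transform(detected, goal)
```

Because the keypoints are defined at the category level, the same goal specification applies to any mug instance whose keypoints can be detected, which is the point of defining tasks over keypoints rather than over full object models.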
The second part of the talk will discuss an alternative, action-centric approach that enables the incorporation of language instructions into our manipulation pipelines.