*This seminar was held in-person in Raisler Lounge (Towne 225) with virtual attendance….*
The true gains of machine learning in AI subfields such as computer vision and natural language processing have come from learning on large-scale, diverse datasets. In this talk, I will discuss how we can leverage similarly large-scale, diverse data, in the form of egocentric videos (first-person videos of humans performing different tasks), to scale up policy learning for robots. A central challenge is the gap in embodiment and intentions between humans and robots. I will describe how we can leverage video data in spite of this gap by learning at different levels of abstraction. I will demonstrate applications of this principle for a) acquiring low-level visuomotor subroutines and high-level value functions for navigation, and b) building an interactive understanding of objects, through observation of human hands, for manipulation.