This will be a hybrid event with in-person attendance in Levine 307 and virtual attendance on Zoom.
The past few years have witnessed great success in video intelligence, supercharged by multimodal models. In this talk, I will begin by briefly sharing our efforts in building video-language models for understanding and diffusion models for video generation. Yet video understanding and generation have long been two separate research pillars, despite their strong synergy. This motivates us to develop Show-o, a single unified transformer that can perform both multimodal understanding and generation. Show-o is the first to unify autoregressive and discrete diffusion modeling, flexibly supporting a wide range of vision-language tasks with arbitrary input/output formats, including visual question answering, text-to-image/video generation, and generation of video keyframes with captions, all within a single 1.3B-parameter transformer. Show-o sheds light on building the next-generation multimodal video foundation model and has already sparked many follow-up works.