Abstract: In this talk, we address the problem of segmenting, tracking, and extracting time-varying 3D shape and camera poses for non-rigid objects in monocular videos. Our method segments and tracks objects and their parts using past segmentation and tracking experience from a training set, and uses the segmented point trajectories of each object to extract 3D shape under a low-rank shape prior. We segment using motion boundaries and learnt saliency detection, and outperform the previous state of the art by a clear margin on challenging video scenes. We “learn to track” using a novel tracking loss in a distance metric learning framework, and outperform color and texture histograms as well as deep feature matching learnt from ImageNet classification or detection tasks. We extract dense 3D object models from realistic monocular videos, a problem typically studied with lab-acquired datasets, pre-segmented objects, and oracle trajectories.
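As background for the low-rank shape prior mentioned above: in classical non-rigid structure from motion, the 2D point trajectories of a deforming object are stacked into a measurement matrix whose rank is bounded by 3K when the shape is a combination of K basis shapes, and the prior is enforced by a truncated SVD. The sketch below illustrates this on synthetic data; all names, the rank K, and the synthetic setup are illustrative assumptions, not the talk's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 20, 50, 2          # frames, tracked points, number of basis shapes

# Synthetic 2F x P measurement matrix: each per-frame 3D shape is a linear
# combination of K basis shapes, so W has rank at most 3K.
basis = rng.standard_normal((3 * K, P))     # K basis shapes, stacked as 3K x P
coeffs = rng.standard_normal((F, K))        # per-frame mixing coefficients
W = np.zeros((2 * F, P))
for f in range(F):
    # Per-frame shape S_f (3 x P) as a combination of the K basis shapes.
    S_f = (coeffs[f][:, None, None] * basis.reshape(K, 3, P)).sum(axis=0)
    R_f = rng.standard_normal((2, 3))       # camera projection rows (unconstrained here)
    W[2 * f : 2 * f + 2] = R_f @ S_f

# Low-rank shape prior: project W onto its best rank-3K approximation via SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, : 3 * K] * s[: 3 * K]) @ Vt[: 3 * K]

residual = np.linalg.norm(W - W_lowrank) / np.linalg.norm(W)
```

Here the residual is near zero because the synthetic trajectories satisfy the rank-3K model exactly; with real, noisy trajectories the truncation acts as a denoising prior before shape and camera recovery.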