Abstract: This work is concerned with the tasks of object and action recognition. Specifically, we focus on designing new methods that explicitly extract and represent the shape information present in images and video sequences and incorporate it into the recognition process.
I will begin by describing our approach to representing 2D shapes. Our method assigns to every internal point of a silhouette a value reflecting the mean time required for a random walk starting at that point to hit the boundary. This function can be computed by solving Poisson's equation, with the silhouette contours providing the boundary conditions. We show how this function can be used to reliably extract many useful properties of a silhouette.
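To make the construction concrete, here is a minimal sketch of how such a Poisson-based map can be computed on a binary silhouette: the discretized equation Delta U = -1 is solved inside the mask, with U = 0 on and outside the contour, by plain Jacobi iteration. This is only an illustration, not the solver used in the actual work (which relies on a more efficient scheme); the function name poisson_shape_map and its parameters are mine.

```python
import numpy as np

def poisson_shape_map(mask, iters=2000):
    """Approximate the mean time for a random walk to hit the silhouette
    boundary by solving Delta U = -1 inside the mask, with U = 0 outside,
    using plain Jacobi iteration (illustrative, not the original solver)."""
    inside = mask.astype(bool)
    U = np.zeros(inside.shape, dtype=float)
    for _ in range(iters):
        # Sum of the four grid neighbours; zeros outside the mask act as
        # the Dirichlet boundary condition U = 0 on the contour.
        nb = (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
              np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(inside, 0.25 * (nb + 1.0), 0.0)  # grid spacing h = 1
    return U

# Toy example: a filled disc. U is largest deep inside the shape and
# falls to zero towards the contour.
yy, xx = np.mgrid[:64, :64]
disc = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
U = poisson_shape_map(disc)
```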
In the second part of the talk I will show how the Poisson-based shape representation can be used for object and action recognition. I will first introduce a shape-based detection and top-down figure-ground delineation algorithm that works with image segments. Unlike common methods that rely on appearance for detection, our method relies primarily on the shape of objects as reflected in their bottom-up segmentation. In practice, bottom-up segmentation algorithms often fail to extract complete object silhouettes. Our method therefore applies to partial silhouettes (shapes) formed by segments at intermediate scales of the bottom-up segmentation, possibly with incomplete boundaries. We employ probabilistic shape modeling and use statistical tests to evaluate ensembles of partial shape hypotheses, identifying the presence of objects of interest in the image and sharply delineating them from their background.
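As a rough illustration of the hypothesis-testing step, the sketch below scores an ensemble of candidate segments by summing per-segment log-likelihood ratios under Gaussian object and background shape models. Everything here is an assumption made for illustration: the feature extractor, the Gaussian models, and the names (segment_features, ensemble_score, mu_obj, cov_obj, ...) do not come from the talk, whose actual probabilistic model and statistical test may differ substantially.

```python
import numpy as np
from scipy.stats import multivariate_normal

def segment_features(segment_mask):
    """Hypothetical per-segment shape descriptor (placeholder): area and
    spatial spread of the segment; in practice this is where Poisson-based
    shape measurements would enter."""
    ys, xs = np.nonzero(segment_mask)
    return np.array([float(len(xs)), xs.std(), ys.std()])

def ensemble_score(segments, mu_obj, cov_obj, mu_bg, cov_bg):
    """Sum of per-segment log-likelihood ratios under Gaussian object and
    background shape models; a detection is declared when the total score
    exceeds a threshold (an assumed, simplified form of the test)."""
    obj = multivariate_normal(mu_obj, cov_obj)
    bg = multivariate_normal(mu_bg, cov_bg)
    return sum(obj.logpdf(segment_features(s)) - bg.logpdf(segment_features(s))
               for s in segments)
```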
Finally, I will briefly show how the Poisson-based shape representation can be used for action recognition. Our approach is based on the observation that a human action in a video sequence generates a shape in the space-time volume. These space-time shapes are induced by concatenating 2D silhouettes over time and contain both spatial information about the pose of the human figure at any given moment and dynamic information about its motion. Our method uses properties of the Poisson solution to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering.
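The Poisson construction extends directly from 2D silhouettes to space-time shapes. The sketch below is again only an assumed, simplified version (the original method uses a faster solver and may weight the temporal axis differently): it solves the 3D Poisson problem on a stack of binary silhouettes, after which saliency, dynamics and orientation features would be derived from the solution U and its derivatives.

```python
import numpy as np

def spacetime_poisson(volume, iters=500):
    """Solve Delta U = -1 inside a binary space-time shape (a T x H x W
    stack of silhouettes), with U = 0 on its boundary, by Jacobi iteration
    (illustrative sketch only)."""
    inside = volume.astype(bool)
    U = np.zeros(inside.shape, dtype=float)
    for _ in range(iters):
        # Sum over the six space-time neighbours; zeros outside the shape
        # supply the boundary condition U = 0.
        nb = sum(np.roll(U, s, axis=a) for a in range(3) for s in (1, -1))
        U = np.where(inside, (nb + 1.0) / 6.0, 0.0)
    return U

# Saliency, dynamics and orientation features would then be computed from
# U and its spatial/temporal derivatives (e.g. its Hessian); their exact
# definitions follow the method described in the talk and are omitted here.
```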