GRASP Seminar Series: Spring 2006

March 3, 11:00 a.m., Wu & Chen Auditorium

Andrew Zisserman

"Object Recognition Using Bags of Visual Words"

Abstract: There has been much recent research activity, and much recent success, in recognizing particular objects and object categories (such as cars, faces, and motorbikes) in images and videos. The success has come from representing objects by sets of local iconic image patches, where each patch may be thought of as a "visual word" describing part of the object. Surprisingly, object categories can be recognized without including the spatial organization or location of the patches. These models are referred to as "bags of words", by analogy with similar models in the statistical text literature.

In the first part of the talk, I'll describe an approach to searching for and localizing all the occurrences of an object in a video. The object is represented by a set of visual words that allow recognition to succeed despite changes in viewpoint, illumination, and partial occlusion. By pushing this analogy with textual representation further, efficient methods from text retrieval can be employed to retrieve shots containing the object, in the manner of a Google search of the web. The methods will be demonstrated on several feature-length films.

In the second part, I'll describe how object categories can be learnt from sets of visual words by fitting a probabilistic Latent Semantic Analysis (pLSA) model, a model again borrowed from the statistical text literature.
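To make the text-retrieval analogy concrete, here is a minimal sketch in Python (standard library only). It is an illustrative assumption, not the method presented in the talk. It assumes local descriptors have already been extracted from each frame and that a visual vocabulary (a list of cluster centres, for example from k-means) is given. It then uses plain tf-idf weighting with cosine scoring over an inverted index, the standard text-retrieval recipe. The function names `quantize`, `tfidf_vectors`, `build_inverted_index`, and `query` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest cluster centre (its visual word)."""
    words = []
    for d in descriptors:
        best = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])),
        )
        words.append(best)
    return words


def _tfidf(words, idf):
    """L2-normalized tf-idf vector (dict word -> weight) for one bag of words."""
    counts = Counter(words)
    total = len(words)
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def tfidf_vectors(docs):
    """docs: list of word-id lists (one per frame). Returns (vectors, idf table)."""
    df = Counter()
    for words in docs:
        df.update(set(words))  # document frequency: count each word once per frame
    idf = {w: math.log(len(docs) / df[w]) for w in df}
    return [_tfidf(words, idf) for words in docs], idf


def build_inverted_index(vectors):
    """Inverted file: word -> list of (frame id, weight), as in text search engines."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for w, weight in vec.items():
            index[w].append((doc_id, weight))
    return index


def query(query_words, idf, index):
    """Rank frames by cosine similarity, touching only frames that share a word."""
    q = _tfidf(query_words, idf)
    scores = defaultdict(float)
    for w, qw in q.items():
        for doc_id, dw in index.get(w, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    frames = [[0, 1, 1, 2], [2, 3, 3, 4], [0, 1, 5, 5]]  # toy frames as word ids
    vectors, idf = tfidf_vectors(frames)
    index = build_inverted_index(vectors)
    print(query([1, 1, 0], idf, index))
```

Because the score is a sum over shared words only, the inverted index lets a query skip every frame that contains none of its visual words. This is the property that makes Google-style search over a whole film practical. Spatial layout is ignored entirely, matching the "bag of words" idea in the abstract.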