Abstract: Automatic analysis of videos is one of the most challenging problems in
computer vision. In
this talk I will introduce the problem of action, event, and activity
representation and recognition from video sequences. I will begin by giving a
brief overview of a few interesting methods to solve this problem, including
trajectory-based, volume-based, and local interest point-based representations.

The main part of the talk will focus on a newly developed framework for the
discovery and statistical representation of motion patterns in videos, which
can act as primitive, atomic actions. These action primitives are employed as a
generalizable representation of articulated human actions, gestures, and facial
expressions. The motion primitives are learned by hierarchical clustering of
observed optical flow in a four-dimensional space of spatial position and motion
flow, and a sequence of these primitives can be represented as a simple string, a
histogram, or a Hidden Markov Model.
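A minimal sketch of the primitive-learning step, assuming each optical-flow observation is a 4-D tuple (x, y, u, v) of pixel position and flow displacement. The talk does not specify the linkage criterion or how the number of primitives is chosen, so centroid linkage and a fixed target count `k` are assumptions here, and the function names (`learn_primitives`, `to_string`, `to_histogram`) are hypothetical. Once the primitives exist, a sequence is labeled symbol by symbol and summarized either as a string or as a histogram:

```python
import math
from collections import Counter

# Hypothetical setup: each observation is a 4-D optical-flow sample (x, y, u, v),
# i.e. pixel position (x, y) plus flow displacement (u, v).

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def learn_primitives(samples, k):
    """Agglomerative (bottom-up) clustering with centroid linkage (an assumption).

    Starts with one cluster per sample and repeatedly merges the two clusters
    whose centroids are closest until k clusters remain. The k centroids stand
    in for the learned motion primitives.
    """
    clusters = [[s] for s in samples]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = euclidean(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]

def to_string(observations, primitives):
    """Label each observation with the letter of its nearest primitive."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return "".join(
        letters[min(range(len(primitives)), key=lambda p: euclidean(obs, primitives[p]))]
        for obs in observations
    )

def to_histogram(symbols):
    """Order-free representation: how often each primitive occurs."""
    return Counter(symbols)

if __name__ == "__main__":
    training = [
        (10, 10, -1.0, 0.0), (11, 10, -1.1, 0.1), (10, 11, -0.9, 0.0),  # leftward, upper-left
        (50, 50, 1.0, 0.0), (51, 50, 1.1, -0.1), (50, 51, 0.9, 0.0),    # rightward, lower-right
    ]
    primitives = learn_primitives(training, k=2)
    sequence = [(10, 10, -1.0, 0.0), (50, 50, 1.0, 0.0), (50, 50, 1.0, 0.0), (10, 10, -1.0, 0.0)]
    s = to_string(sequence, primitives)
    print(s, to_histogram(s))
```

The string keeps the temporal order of primitives while the histogram discards it; the Hidden Markov Model option would instead be trained on the symbol sequence and is not shown. In this toy data the pixel coordinates dominate the distance, so a real system would likely need to rescale the spatial and flow dimensions relative to each other.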

I will then describe methods to extend the motion pattern estimation framework to
the problem of multi-agent activity recognition. First, I will talk about
similarity-invariant matching of motion patterns in order to
recognize simple events in surveillance scenarios. I will end the talk by
presenting a framework in which a motion pattern represents the behavior of a
single agent, while multi-agent activity takes the form of a graph that can be
compared with other activity graphs by attributed inexact graph matching.
This method is applied to the recognition of American football plays.
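To illustrate what attributed inexact graph matching computes, here is a brute-force sketch, not the talk's algorithm. It assumes each node is an agent whose motion pattern is summarized by a hypothetical attribute tuple (for example, mean flow), that edges are undirected relations between agents, and that unmatched nodes and mismatched edges carry fixed costs (`null_cost`, `edge_cost`, both assumptions). The smaller graph is padded with dummy nodes so graphs of different sizes can still be compared, which is what makes the matching inexact:

```python
import itertools
import math

def graph_distance(nodes1, edges1, nodes2, edges2, null_cost=1.0, edge_cost=1.0):
    """Exhaustive attributed inexact graph matching (hypothetical, small graphs only).

    nodes: list of attribute tuples, one per agent.
    edges: iterable of (i, j) index pairs, treated as undirected.
    Every node correspondence is tried; the cheapest total cost is returned.
    """
    n = max(len(nodes1), len(nodes2))
    # Pad the smaller graph with dummy (None) nodes to allow insertions/deletions.
    a = list(nodes1) + [None] * (n - len(nodes1))
    b = list(nodes2) + [None] * (n - len(nodes2))
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = 0.0
        # Node cost: attribute distance, or null_cost when paired with a dummy.
        for i in range(n):
            x, y = a[i], b[perm[i]]
            if x is None and y is None:
                continue
            cost += null_cost if x is None or y is None else math.dist(x, y)
        # Edge cost: penalize each pair whose adjacency differs under the mapping.
        for i, j in itertools.combinations(range(n), 2):
            if (frozenset((i, j)) in e1) != (frozenset((perm[i], perm[j])) in e2):
                cost += edge_cost
        best = min(best, cost)
    return best

if __name__ == "__main__":
    play = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    play_edges = [(0, 1), (1, 2)]
    # Same activity with the agents listed in a different order.
    relabeled = [(0.0, 1.0), (-1.0, 0.0), (1.0, 0.0)]
    relabeled_edges = [(2, 0), (0, 1)]
    print(graph_distance(play, play_edges, relabeled, relabeled_edges))
```

Because every permutation is enumerated, the cost grows factorially with the number of agents; the sketch only shows the objective being minimized, and a practical matcher for full football plays would need an approximate search instead.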