WHAT DO THOSE IMAGES HAVE IN COMMON?
Sinisa Todorovic
Beckman Institute, University of Illinois at Urbana-Champaign
This talk is about discovering and modeling previously unspecified, recurring themes in a given set of arbitrary images. Given a set of images containing frequent occurrences of objects from multiple categories, the goal is to learn a compact model of the categories, as well as their relationships, for the purpose of later recognizing and segmenting any occurrences in new images. Categories are not defined by the user; nor is it known whether, or where, instances of any category appear in a specific image. This problem is challenging, since it involves the following unanswered questions. What is an object category? What image properties should be used, and how should they be combined, to discover category occurrences? What is an efficient multicategory representation?
We will examine a methodology, developed during my postdoctoral work at UIUC, which addresses these questions when objects are characterized in 2D. A category is defined as a set of 2D objects (i.e., subimages) sharing the photometric, geometric, and topological properties of their constituent regions (e.g., color, area, shape, spatial layout, and recursive embedding of regions). Each image is represented by a segmentation tree whose nodes correspond to image regions at all natural scales present, and whose edges capture the embedding of small regions within larger ones. The presence of any categories in the image set is then reflected in the frequent occurrence of similar subtrees within the image segmentation trees. Our methodology is designed to: (1) match image trees to find similar subtrees; (2) discover categories by clustering similar subtrees, and use the properties of each cluster to learn the model of the associated category; and (3) learn the grammar of the discovered categories that compactly captures their recursive definitions in terms of other, simpler (sub)categories and their relationships (e.g., containment, co-occurrence, and sharing of simple categories by more complex ones). When a new image is encountered, its segmentation tree is matched against the learned grammar to simultaneously recognize and segment all occurrences of the learned categories. This matching also provides a semantic explanation of recognition in terms of the identified subcategories (i.e., object parts) along with their spatial relationships.
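To make the segmentation-tree representation concrete, here is a minimal Python sketch. The node properties (`color`, `area`) and the greedy child-pairing heuristic are illustrative assumptions only; the actual methodology uses richer photometric/geometric descriptors and a principled tree-matching algorithm, neither of which is specified here.

```python
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    """One node of a segmentation tree: an image region.

    `color` and `area` are hypothetical stand-in descriptors for the
    photometric and geometric properties mentioned in the abstract.
    `children` holds the subregions embedded within this region.
    """
    color: float                               # e.g., mean intensity in [0, 1]
    area: float                                # region area in pixels
    children: list = field(default_factory=list)

def node_distance(a: RegionNode, b: RegionNode) -> float:
    """Dissimilarity of two regions' intrinsic properties."""
    return abs(a.color - b.color) + abs(a.area - b.area) / max(a.area, b.area)

def subtree_distance(a: RegionNode, b: RegionNode) -> float:
    """Greedy recursive subtree matching (illustrative only).

    Each child of `a` is paired with its nearest still-unmatched child
    of `b`; any child left unmatched on either side incurs a fixed
    penalty. Frequent low-distance subtree pairs across images would be
    the candidates for category discovery by clustering.
    """
    d = node_distance(a, b)
    unmatched = list(b.children)
    for ca in a.children:
        if not unmatched:
            d += 1.0                           # penalty: no partner left in b
            continue
        best = min(unmatched, key=lambda cb: subtree_distance(ca, cb))
        d += subtree_distance(ca, best)
        unmatched.remove(best)
    d += 1.0 * len(unmatched)                  # penalty for leftover children of b
    return d
```

Usage: building two trees with similar region properties and embedding structure yields a small `subtree_distance`, while structurally or photometrically different trees score high; clustering such pairwise distances would group recurring subtrees into candidate categories.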
The aforementioned methodology can also be used to identify recurring image themes of a more general kind. One example is extracting the stochastically repeating, elementary parts of image texture (e.g., water lilies on a water surface, or people in a crowd).