Iasonas Kokkinos
Towards Bridging Bottom-Up and Top-Down Vision with Hierarchical Compositional Models
A challenging question for computer vision is how to combine the efficiency of bottom-up computation with the tractability of top-down inference. In this talk I will present ongoing work on the intimately related problem of "object parsing", namely composing the structures that constitute an object, starting from a token-based image representation.
My talk will focus on how this can be accomplished using a hierarchical compositional object representation. The main contribution lies in controlling the combinatorial complexity of the problem by guiding the search for bottom-up combinations according to how well they can become parts of a parse of the whole object. This is done within the A* framework, using top-down information to prioritize the search. Specifically, I use a formulation of composition rules that makes it possible to rapidly compute coarse solutions to the parsing problem. A* then uses these coarse solutions as heuristics that lower-bound the parsing cost and rule out unpromising search directions.
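The abstract does not specify the actual composition rules or cost functions, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes a hypothetical toy object made of a chain of parts, each matched to one image token, with a non-negative unary matching cost per part and a non-negative pairwise composition cost between consecutive parts. A coarse, relaxed problem (here: dropping the pairwise costs) is solved cheaply, and its cost-to-go serves as an A* heuristic that never overestimates the remaining parsing cost. The names `coarse_bounds`, `astar_parse`, `compat_pair` and all numbers are illustrative, not taken from the talk.

```python
import heapq
from typing import Callable, List, Tuple

# Hypothetical toy model (not the talk's actual formulation): an object is a
# chain of K parts; part k matched to token t costs unary[k][t], and consecutive
# parts pay an extra non-negative pairwise composition cost pair(prev_t, t).


def coarse_bounds(unary: List[List[float]]) -> List[float]:
    """Lower bounds on cost-to-go from a relaxed problem without pairwise terms.

    bound[k] = sum over parts k..K-1 of the cheapest unary cost. Because pairwise
    costs are assumed >= 0, this never overestimates the true remaining cost.
    """
    K = len(unary)
    bound = [0.0] * (K + 1)
    for k in range(K - 1, -1, -1):
        bound[k] = bound[k + 1] + min(unary[k])
    return bound


def astar_parse(
    unary: List[List[float]],
    pair: Callable[[int, int], float],
    use_heuristic: bool = True,
) -> Tuple[float, List[int], int]:
    """Return (best cost, token per part, number of expanded states)."""
    K = len(unary)
    # With use_heuristic=False the search degenerates to uniform-cost search.
    bound = coarse_bounds(unary) if use_heuristic else [0.0] * (K + 1)
    # Heap entries: (g + h, g, partial assignment of tokens to the first parts).
    heap = [(bound[0], 0.0, ())]
    expanded = 0
    while heap:
        _, g, assign = heapq.heappop(heap)
        expanded += 1
        k = len(assign)
        if k == K:  # first complete parse popped is optimal (admissible bound)
            return g, list(assign), expanded
        for t in range(len(unary[k])):
            step = unary[k][t] + (pair(assign[-1], t) if assign else 0.0)
            g2 = g + step
            heapq.heappush(heap, (g2 + bound[k + 1], g2, assign + (t,)))
    raise ValueError("no parse found")


# Toy example with made-up costs: 3 parts, 2 candidate tokens each.
UNARY = [[1.0, 5.0], [4.0, 1.0], [2.0, 3.0]]


def compat_pair(s: int, t: int) -> float:
    """Hypothetical pairwise cost: 0 for compatible (equal) tokens, else 3."""
    return 0.0 if s == t else 3.0


cost, parse, n_astar = astar_parse(UNARY, compat_pair)
_, _, n_plain = astar_parse(UNARY, compat_pair, use_heuristic=False)
print(cost, parse, n_astar, n_plain)  # 7.0 [0, 0, 0] 5 7
```

On this toy instance both searches return the same optimal parse, but the heuristic-guided search expands 5 states instead of 7, illustrating how a cheap lower bound from a coarse solution prunes unpromising directions without sacrificing optimality.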
This approach is shown to simultaneously localize and parse objects in challenging images containing heavy clutter, and is experimentally validated on several object detection benchmarks.