Lorenzo Torresani
Learning models from visual data for 3D tracking, recognition, and animation.
In this talk I will describe methods for the acquisition of computational models from visual data. In computer graphics, sophisticated models are necessary for the simulation of real-world phenomena. In computer vision, visual models serve as a form of prior knowledge that disambiguates the otherwise ill-posed problem of image understanding. Traditionally, these models have been constructed by hand. My work instead applies machine learning algorithms that automatically extract highly detailed models from visual observations.
I will begin with an algorithm for recovering non-rigid 3D models from image streams, without the use of training data or any prior knowledge about the object's modes of deformation. I will describe how this method can be used to reconstruct subtle human body deformations in 3D space from single-view video under severe occlusion and variable illumination.
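The abstract does not spell out the algorithm, but the low-rank structure that non-rigid reconstruction methods commonly exploit can be illustrated on synthetic data: if each frame's 3D shape is a linear combination of K basis shapes viewed under orthographic projection, the centered matrix of 2D point tracks has rank at most 3K. A minimal sketch under that shape-basis assumption (all quantities here are synthetic illustrations, not the talk's method):

```python
import numpy as np

# Illustrative assumption: shapes are linear combinations of K basis shapes,
# viewed by a rotating orthographic camera. Factorization-style methods
# exploit the resulting low rank of the 2D track matrix.
rng = np.random.default_rng(0)
K, P, F = 2, 40, 30                          # basis shapes, points, frames
basis = rng.standard_normal((K, 3, P))       # K basis shapes (3 x P each)
coeffs = rng.standard_normal((F, K))         # per-frame mixing weights

rows = []
for f in range(F):
    shape = np.tensordot(coeffs[f], basis, axes=1)    # 3 x P shape in frame f
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random camera rotation
    rows.append((R @ shape)[:2])                      # orthographic projection
W = np.vstack(rows)                          # 2F x P matrix of 2D tracks

W -= W.mean(axis=1, keepdims=True)           # remove per-row translation
rank = np.linalg.matrix_rank(W)
print(rank)                                  # at most 3 * K
```

The rank deficiency is what makes recovery possible: the 2F x P measurements are explained by far fewer degrees of freedom (camera rotations, mixing coefficients, and basis shapes).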
I will then describe a technique for learning low-dimensional representations of high-dimensional data, such as images consisting of thousands of pixels, for the purpose of classification. When applied to a set of face images with identity labels, this algorithm automatically extracts the features most salient for identification and discards visual effects that are unimportant for recognition, such as non-uniform illumination and facial expressions.
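The abstract does not name the learning algorithm, so as a stand-in the classical Fisher linear discriminant illustrates the idea: a supervised projection to a low-dimensional space that preserves identity-relevant variation and suppresses within-class nuisance variation. A hedged sketch on synthetic "images" (the data, dimensions, and LDA itself are illustrative assumptions):

```python
import numpy as np

# Synthetic stand-in: two "identities" in a D-dimensional pixel space,
# differing only along one direction u; everything else is nuisance variation.
rng = np.random.default_rng(1)
D, n = 50, 200                                # feature dims, images per class
u = rng.standard_normal(D)
u /= np.linalg.norm(u)                        # identity-bearing direction
X0 = rng.standard_normal((n, D))              # images of person A
X1 = rng.standard_normal((n, D)) + 4.0 * u    # person B: mean shifted along u

# Fisher discriminant: direction w maximizing between-class separation
# relative to within-class scatter, w = Sw^{-1} (mu1 - mu0).
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
w = np.linalg.solve(Sw + 1e-6 * np.eye(D), mu1 - mu0)  # small ridge for safety

# Classify with a single learned feature: the 1-D projection onto w.
z0, z1 = X0 @ w, X1 @ w
thresh = (z0.mean() + z1.mean()) / 2
acc = ((z0 < thresh).mean() + (z1 > thresh).mean()) / 2
print(round(acc, 3))                          # accuracy of the 1-D projection
```

One direction out of fifty suffices here because the labels tell the method which variation matters, which is the same reason a supervised representation can discard lighting and expression changes in face images.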
I will conclude by presenting a system for motion style editing based on a style model learned from human perceptual observations and motion capture data. This system enables users to create novel versions of pre-recorded motion sequences in desired styles.
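The talk's style model is learned from perceptual judgments and motion capture, which is not reproduced here; but the editing operation it enables can be sketched in the simplest possible form, where a motion is a trajectory in a low-dimensional pose space and a style is an offset direction applied with adjustable intensity (the pose space, style direction, and `stylize` helper are all hypothetical illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 60, 8                                  # frames, pose-space dimensions
# A smooth "neutral" pre-recorded clip (random walk, purely synthetic).
neutral = np.cumsum(0.1 * rng.standard_normal((T, d)), axis=0)
style_dir = rng.standard_normal(d)            # assumed learned style direction
style_dir /= np.linalg.norm(style_dir)

def stylize(motion, direction, alpha):
    """Shift every frame of a clip along a style direction; alpha sets intensity."""
    return motion + alpha * direction

sad = stylize(neutral, style_dir, -1.5)       # same content, negative style
happy = stylize(neutral, style_dir, +1.5)     # same content, positive style
print(np.allclose(happy - sad, 3.0 * style_dir))  # clips differ only in style
```

The point of the sketch is the separation of content from style: the underlying trajectory is untouched, and a single scalar lets a user dial a pre-recorded motion toward a desired style.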