Piotr Dollar
Bag-of-features and related representations have been used extensively in computer vision and audition for tasks such as object recognition and speaker identification. In these domains, representing a data sample as an unordered set of feature vectors (a bag) rather than as a single feature vector can significantly improve classification performance. Existing methods for bag classification, including bag-of-features representations and kernel methods, rely either implicitly or explicitly on the fact that computing distances between sets can be reduced to computing distances between instances. In this paper we describe a general framework for supervised bag classification, called Multiple Component Learning (MCL), that does not rely on any a priori measure of distance between instances. We show that MCL can be reduced to the well-studied Multiple Instance Learning (MIL) problem, and we develop an effective boosting algorithm for learning MCL classifiers. The resulting approach is robust and can potentially be applied to a wide variety of domains. We present comparative studies on a number of problems in both the audio and image domains, and the results are promising.
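To make concrete what it means for a distance between sets to be reduced to distances between instances, the following is a minimal sketch of a conventional bag-of-features comparison. It illustrates the kind of existing method the abstract contrasts against and is not the approach proposed in this paper. The function names, the fixed codebook, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math

def nearest_codeword(x, codebook):
    """Index of the codeword closest to instance x.

    Assumption: Euclidean distance is the a priori instance-level metric.
    """
    return min(range(len(codebook)), key=lambda k: math.dist(x, codebook[k]))

def bag_histogram(bag, codebook):
    """Normalized codeword histogram for a non-empty, unordered bag."""
    counts = [0] * len(codebook)
    for x in bag:
        counts[nearest_codeword(x, codebook)] += 1
    return [c / len(bag) for c in counts]

def bag_distance(bag_a, bag_b, codebook):
    """Distance between two bags, computed on their histograms.

    Every step depends on the instance-level metric used in
    nearest_codeword, which is the dependence the abstract describes.
    """
    return math.dist(bag_histogram(bag_a, codebook),
                     bag_histogram(bag_b, codebook))

# Hypothetical toy data: 2-D instances and a two-word codebook.
codebook = [(0.0, 0.0), (10.0, 10.0)]
bag_a = [(0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (10.0, 11.0)]
bag_b = [(11.0, 10.0), (0.0, 0.5), (1.0, 1.0), (9.0, 10.0)]
bag_c = [(0.0, 0.0), (1.0, 1.0)]

print(bag_distance(bag_a, bag_b, codebook))
print(bag_distance(bag_a, bag_c, codebook))
```

In this sketch the order of instances within a bag does not affect the result, but the result depends entirely on the instance metric chosen up front. A framework that does not assume such a metric cannot take this route.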