Gomes Meetings Log


PP, Gomes - Meeting on Feb 25, 2008

Things to be done and open problems:

  • Introduce a model of time correlation: the category of the input sample at time t is correlated with the category of the input sample at time t-1.
  • Automatically reduce the dimensionality of the input data: at the beginning, when few input samples have been observed, one should use a low-dimensional description (e.g. by random projection; see the sketch after this list). Later on, when we have more categories and more training examples, it may be better to increase the dimensionality gradually. How do we transfer knowledge when we increase the dimensionality?
  • Learning geometric relationships across segments.
  • Adopt a more sophisticated model for the blobs: each blob should be a mixture of Gaussians, rather than one Gaussian. The Gaussians in the mixture should `hang together' preferentially. See e.g. Chris Williams' hierarchical Gaussian mixture model (hGmm).
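
A minimal sketch of the random-projection idea in the second item above, in Python with numpy; the function names, dimensions, and the choice of a Gaussian projection are illustrative assumptions, not something fixed in these notes:

    import numpy as np

    def random_projection_matrix(d_in, d_out, rng=None):
        """Gaussian random projection from d_in down to d_out dimensions."""
        rng = np.random.default_rng(rng)
        # Scaling by 1/sqrt(d_out) approximately preserves pairwise distances
        # (Johnson-Lindenstrauss style).
        return rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_out, d_in))

    def project(samples, R):
        """Project samples (n x d_in) into the lower-dimensional space (n x d_out)."""
        return samples @ R.T

    # Example: describe 100-d inputs coarsely in 8 dimensions at first,
    # then re-project the same inputs at a higher dimensionality later.
    X = np.random.default_rng(0).normal(size=(50, 100))
    X_low = project(X, random_projection_matrix(100, 8, rng=1))
    X_high = project(X, random_projection_matrix(100, 32, rng=2))

The sketch says nothing about the open question of how to transfer knowledge when the dimensionality is increased.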


PP, Gomes - Observation on Dec 1 2006

  • There may be an alternative approach based on particle filtering: short-term memory holds not data points but particles, i.e. weighted samples from the posterior distribution after each update (see the sketch below).
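
A minimal sketch of one such update step, in Python with numpy, assuming a user-supplied log-likelihood and treating each particle as one candidate setting of the model parameters; all names are illustrative:

    import numpy as np

    def particle_update(particles, weights, x_t, loglik, rng=None):
        """Reweight a fixed-size particle set by the likelihood of the new
        observation x_t, then resample so the memory size stays constant.

        particles : array of candidate model parameters (the short-term memory)
        weights   : current particle weights (sum to 1)
        loglik    : loglik(theta, x) -> log p(x | theta), assumed supplied
        """
        rng = np.random.default_rng(rng)
        logw = np.log(weights) + np.array([loglik(th, x_t) for th in particles])
        logw -= logw.max()                       # numerical stability
        w = np.exp(logw)
        w /= w.sum()
        # Resample so memory stays the same size no matter how much data was seen.
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))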

PP, Gomes - Oct 20 2006

Synopsis of approaches

  • Point-based approaches
    • The last M points are kept in short-term memory (STM).
    • Entropy-based:
      • Discard the points with the lowest assignment entropy (we know that this is suboptimal; see the sketch after this list).
      • Minimize the entropy of the model parameters (the parameters of the groups), or derive an update rule by minimizing a KL divergence. Ryan is working on this.
  • Sufficient-statistics approaches
    • Fine-to-coarse clustering approach: cluster using a fine-grained \alpha so that all of the memory is used up; then, when an answer is needed, use the sufficient statistics of the fine-grained clusters to compute the coarse-grained clusters.
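
A minimal sketch of the entropy-based discard rule from the point-based approaches above, in Python with numpy, assuming the current mixture model provides a responsibility (soft-assignment) matrix for the buffered points; function and argument names are illustrative:

    import numpy as np

    def assignment_entropy(responsibilities):
        """Entropy of each point's cluster-assignment distribution.
        responsibilities : (n_points, n_clusters) array, rows sum to 1."""
        r = np.clip(responsibilities, 1e-12, 1.0)
        return -(r * np.log(r)).sum(axis=1)

    def discard_lowest_entropy(buffer_points, responsibilities, n_discard=1):
        """Drop the points whose assignment is most certain (lowest entropy),
        i.e. the points the current model already explains confidently.
        buffer_points : (n_points, d) array held in short-term memory."""
        H = assignment_entropy(responsibilities)
        keep = np.argsort(H)[n_discard:]         # the lowest-entropy points are dropped
        return buffer_points[keep], keep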

PP, Gomes - Oct 13 2006

  • If one takes the `old' point of view of keeping data points in memory (not models), then there are various ways to decide which points to keep:
    • Max entropy (currently implemented). This has the drawback that it favours points in between clusters, but these points may already be well modeled and do not need to be kept in memory.
    • Points that are `captured' by a background clutter cluster. Assume a uniform pdf for the background and look for points that are better explained by that model than by the combination of clusters estimated so far (see the sketch after this list).
    • Choose the points to be discarded from memory in a way that minimizes the KL divergence between the `true' model and the estimate of the model to be obtained in the next step. Is this equivalent to the mutual information criterion?
  • Possible experiments to present to CVPR07:
  1. Based on Russell et al.'s paper on discovering new categories. Can we make the process causal and incremental?
  2. Based on Zisserman's face-cataloguing experiments in films and TV. Make it incremental.
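
A minimal sketch of the background-clutter test from the list above, in Python with numpy; the mixture log-density is assumed to be supplied by the current model, and the uniform background is parametrized only by the volume of the region it covers (all names are illustrative):

    import numpy as np

    def clutter_point_indices(X, mixture_logpdf, volume):
        """Indices of buffered points better explained by a uniform
        `background clutter' density than by the current mixture of clusters.

        X              : (n, d) buffered data points
        mixture_logpdf : function returning log p(x | current clusters) per point
        volume         : volume of the region covered by the uniform background
        """
        log_bg = -np.log(volume)                 # uniform background density
        log_fg = mixture_logpdf(X)               # assumed supplied by the model
        return np.where(log_bg > log_fg)[0]      # background wins for these points

These are the points the current clusters do not explain well, and hence the natural candidates to keep in short-term memory.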

Plan for conference papers:

CVPR 07

  • Application to one or two vision problems: discovering multiple categories, analysis of surveillance video ...
  • Assume constant memory budget.
  • Gibbs sampler (for safety)
  • Dumb fine-to-coarse clustering transition

NIPS 07

  • efficient fine-to-coarse grained clustering transition
  • cost of memory vs benefit of better representation tradeoff
  • variational implementation

PP, Gomes, Koch - Oct 6 2006

  • One way to look at the problem is to think of a hierarchy of `clusters': the fine-grained ones keep track of sufficient statistics which may be used to modify the coarser-grained ones.
  • Given a certain memory budget we should use as many groups as memory allows. Therefore, we may titrate \alpha_m (m for memory) so that the number of groups matches the available memory. At any point, given the `correct' prior \alpha, we may synthesize our best interpretation of the data, starting from the groups formed using \alpha_m.
  • A more general formulation of the memory budget problem is, rather than imposing a firm bound on memory, to assume a cost for each amount of memory. E.g. for a computer with 1 GB of RAM, the cost may be modest up to 0.5 GB and then increase rapidly to infinity as 1 GB is approached (the computer still needs RAM to function, so we cannot give it all away). This cost is balanced by the benefit of using more memory: with more groups we may better keep track of the data as it arrives and form better models at the end. If we were able to measure this benefit, then we would have a rational trade-off policy.
  • Ryan brings up the `Information Bottleneck' paper by Bialek and Tishby. Perhaps this is the way to think about the benefit of using more memory?
  • Technical question: given the sufficient statistics of a fine-grained clustering of the data, how do we obtain the coarse-grained clustering that we are looking for? The most obvious technique is to use the fine-grained clusters as generative models: produce representative data and use that data to derive the final coarse-grained clustering. But is there a more `direct' way to obtain the coarse-grained clusters? (The sketch below shows how the sufficient statistics themselves combine.)
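
One concrete piece of the question above: if each fine-grained Gaussian cluster stores the statistics (count, sum of points, sum of outer products), those statistics combine additively, so the mean and covariance of any proposed coarse-grained cluster can be recovered exactly without regenerating data. A minimal sketch in Python with numpy; which fine clusters to merge into which coarse cluster is left open, as in the discussion:

    import numpy as np

    def merge_gaussian_stats(stats):
        """Combine the sufficient statistics (n, sum_x, sum_xxT) of several
        fine-grained Gaussian clusters into one coarse-grained cluster and
        recover its count, mean and (maximum-likelihood) covariance.

        stats : list of (n_k, sum_x_k, sum_xxT_k) tuples
        """
        n = sum(s[0] for s in stats)
        sum_x = sum(s[1] for s in stats)
        sum_xxT = sum(s[2] for s in stats)
        mean = sum_x / n
        cov = sum_xxT / n - np.outer(mean, mean)
        return n, mean, cov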


PP, Koch, Gomes - Aug 21, 2006

Updated Koch on Aug 17 discussion. Agreed that we would meet weekly on Fridays 11-12.

Koch mentions the FAA data. Ryan will obtain it from Jonathan.

PP & Gomes - Aug 17 2006

Short-term memory incremental EM

  • In traditional EM the data are processed in batch: given the observations x_1, ..., x_N one applies EM to calculate a model with parameters \theta.
  • In incremental EM (as well as in the Kalman filter and other Bayesian methods) one updates the model incrementally with each new piece of data: at time t one has the model \theta_{t-1} and a data point x_t, and from these one calculates the new model \theta_t.
  • One could explore a short-term memory version of EM which is similar to incremental EM, with the addition of a limited-size buffer containing a meaningful selection of past data {x_i : i in m_t} (m_t is a set of indices). At each iteration the model, as well as the short-term memory, are updated from three pieces of information: \theta_{t-1}, m_{t-1}, and x_t. Many questions arise from this formulation (see the sketch after this list):
    • Suppose that one had a pre-assigned sequence of memory sets m_1, m_2, ...; how would one optimally update the model in an EM-like fashion?
    • Suppose now that one was allowed to update m_t at each iteration; what would be the best strategy? (Of course, one can only kick out one of the elements of m_{t-1}, and/or, if the short-term memory is not completely full, one may add x_t to it.)
    • How does the size of the memory m_t affect the computational cost and the algorithm's performance?
    • Suppose now that one was allowed to update the structure of the model, e.g. using a Dirichlet prior on the number of components; what would one do?
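
A minimal sketch of the loop this formulation implies, in Python; em_update and score are user-supplied placeholders (the choice of score is precisely the open question in the bullets above), and all names are illustrative:

    def stm_incremental_step(theta, memory, x_t, em_update, score, max_size):
        """One step of a short-term-memory incremental EM.

        theta     : current model parameters (theta_{t-1} in the notes above)
        memory    : list of retained past data points (the buffer m_{t-1})
        x_t       : new observation at time t
        em_update : em_update(theta, points) -> theta', assumed supplied
        score     : score(theta, x) -> value; the lowest-scoring point is dropped
        max_size  : the memory budget
        """
        candidates = memory + [x_t]
        theta = em_update(theta, candidates)     # E and M steps on buffer + new point
        if len(candidates) > max_size:
            # Kick out one element: the point the chosen criterion values least.
            drop = min(range(len(candidates)), key=lambda i: score(theta, candidates[i]))
            candidates.pop(drop)
        return theta, candidates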

Temporal and spatial coherence in Dirichlet process

X = {x_1, ..., x_N} are the (multivariate) data points, and Z = {z_1, ..., z_N} are a set of assignment variables. z_n = j specifies that the n-th datapoint is generated by the j-th mixture component.

P(X, Z) = P(X | Z) P(Z) specifies the joint probability distribution of the data and assignments in terms of a likelihood function P(X | Z) (for example a Gaussian) and a prior function P(Z) on the assignments. The Dirichlet process is a non-parametric prior on the component assignments.

  • Consider the Chinese restaurant process studied by Blei and Jordan (a minimal sampling sketch is given after this list).
  • Suppose that one had a notion of `distance' for the points that join the restaurant: e.g. a red point is more likely to join a table with many red customers (this problem was probably already solved by Jordan's group - see Fei-Fei - check out her paper).
    • Spatial coherence is already expressed by the likelihood function P(X | Z), so it may not be necessary or desirable to alter the Dirichlet prior to take it into account. Ryan 12:55, 21 Aug 2006 (PDT)
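
For reference, a minimal sketch of the plain Chinese restaurant process prior in Python with numpy; the distance-biased variant discussed above would modify the table probabilities and is not attempted here:

    import numpy as np

    def sample_crp(n_customers, alpha, rng=None):
        """Sample table assignments from a Chinese restaurant process with
        concentration parameter alpha (no distance or colour bias)."""
        rng = np.random.default_rng(rng)
        assignments, counts = [], []
        for _ in range(n_customers):
            # Existing tables are chosen with probability proportional to their
            # size; a new table is opened with probability proportional to alpha.
            probs = np.array(counts + [alpha], dtype=float)
            probs /= probs.sum()
            table = rng.choice(len(probs), p=probs)
            if table == len(counts):
                counts.append(1)
            else:
                counts[table] += 1
            assignments.append(table)
        return assignments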

How do we put all this together? Once done, add the short-term-memory idea above.