Evgeniy Bart's Wiki

From Vision Wiki

Compression / description length

I computed the description length of the data (and the model) as a function of the number of levels. The description length consists of the following parts: describe the tree; describe the topics; for each detection, describe to which (a) level, (b) topic, and (c) visual word it belongs. Below, I summarize the total description length and several sub-parts as a function of the number of levels in the taxonomy.
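The bookkeeping can be sketched as follows. This is a minimal, hypothetical sketch: the page does not give the exact encoding, so I assume Shannon code lengths (-log2 p bits) for each per-detection choice, and the function names are illustrative.

```python
import math

def bits(p):
    """Shannon code length: bits to encode an event with model probability p."""
    return -math.log2(p)

def description_length(detections, tree_bits, topic_bits):
    """Total description length of the data plus the model.

    `detections` is a list of (p_level, p_topic, p_word) model probabilities
    for each detection; `tree_bits` and `topic_bits` are the costs of
    encoding the tree structure and the topics themselves.
    (Hypothetical interface; the exact encoding is not specified here.)
    """
    total = tree_bits + topic_bits
    for p_level, p_topic, p_word in detections:
        total += bits(p_level) + bits(p_topic) + bits(p_word)
    return total
```

With one detection whose level, topic, and word probabilities are 1/2, 1/4, and 1/8, the per-detection cost is 1 + 2 + 3 = 6 bits on top of the model cost.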

Synthetic data

        1 level   2 levels   3 levels   4 levels
total   126K      109K       115K       122K
level   0         21K        33K        42K
topic   85K       45K        41K        41K
word    40K       42K        39K        37K
tree    0         0.3K       0.6K       1K

Corel 300

        1 level   2 levels   3 levels   4 levels
total   823K      578K       604K       633K
level   0         104K       165K       208K
topic   523K      284K       249K       230K
word    299K      164K       162K       166K
tree    0         2K         3K         5K


The same Corel 300 numbers, with the per-detection level cost excluded from the totals:

        1 level   2 levels   3 levels   4 levels
total   823K      474K       439K       425K
topic   523K      284K       249K       230K
word    299K      164K       162K       166K
tree    0         2K         3K         5K
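The relation between the two Corel 300 tables can be checked arithmetically; a quick sketch (values in K, read off the tables above):

```python
# Totals from the first Corel 300 table, the per-detection 'level' cost,
# and the totals from the second table (all in K).
total_with_level = [823, 578, 604, 633]
level_cost       = [0, 104, 165, 208]
total_reduced    = [823, 474, 439, 425]

# The second table is the first with the level cost subtracted from the totals.
assert [t - l for t, l in zip(total_with_level, level_cost)] == total_reduced
```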

Corel 500

        1 level   2 levels
total   1412K     1010K
level   0         173K
topic   882K      401K
word    529K      408K
tree    0         2K

(Results for 3 and 4 levels are not available yet.)

Perplexity

Conclusions

  • What we consider a 'good taxonomy' or 'ground truth' is not necessarily the best fit under the TAX model: other taxonomies may achieve better perplexity.
  • The greedy initialization algorithm does start at a better perplexity than random initialization (-3.49 for greedy vs. -3.81 for random), but it is difficult to get out of this local minimum.

Computation

For each training image, I put half the samples into the training set and hold out the other half. I then compute the perplexity of the training samples (the 'train train perplexity') and of the held-out samples (the 'train hold-out perplexity'). I also have a set of test images; I fold these into the existing taxonomy using half the samples in each test image, holding out the other half. The perplexity of the first half is the 'test fold-in perplexity', and the perplexity of the second half is the 'test hold-out perplexity'.
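The split-and-hold-out procedure can be sketched as follows. The helper names are hypothetical; and since the values quoted on this page are negative, I assume 'perplexity' here denotes the average per-sample log-likelihood, where higher (less negative) is better.

```python
import math
import random

def split_half(samples, seed=0):
    """Randomly assign half of an image's samples to training; hold out the rest."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def perplexity(samples, log_prob):
    """Average per-sample log-probability under the model (higher is better)."""
    return sum(log_prob(s) for s in samples) / len(samples)

# Example with a toy uniform model over 10 visual words:
train, held_out = split_half(range(10))
train_holdout_ppl = perplexity(held_out, lambda s: math.log2(1.0 / 10))
```

A uniform model over 10 words gives about -3.32 per sample, the same order of magnitude as the values reported below.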

Experiments

Ground truth

Here's a ground truth synthetic taxonomy. Its train hold-out perplexity is -3.31. If the TAX model is initialized at this ground truth, it does not diverge away from it, i.e. it is a local minimum.

The test fold-in perplexity here is -3.23, and the test hold-out perplexity is -3.24. Note that both are better (less negative) than the ground truth perplexity of -3.31. Since the test images were folded in while ignoring the ground truth assignments, this indicates that the ground truth is not the global minimum.

Random init

Here is a taxonomy learned when the TAX model was initialized at random. As can be seen, the three classes are well separated, but the model failed to merge two of them at the second level. The train hold-out perplexity of this taxonomy is -3.34. The image below shows perplexity as a function of iteration.

RandomInitPerplexity.png

Ad hoc init

Here is an example of a taxonomy initialized using our greedy algorithm. As can be seen, the classes are not well separated. The train hold-out perplexity is -3.46. The image below shows perplexity as a function of iteration.

AdHocInitPerplexity.png

To do

  • Use Sivic trick


Log

Corel 1 after 150 iterations already looks very similar to the final version after 300 iterations. So I'll inspect interrupted runs and discard the ones that don't look promising.