Evgeniy Bart's Wiki
Compression / description length
I computed the description length of the data (and the model) as a function of the number of levels. The description length consists of the following parts: describe the tree; describe the topics; for each detection, describe to which (a) level, (b) topic, and (c) visual word it belongs. Below, I summarize the total description length and several sub-parts as a function of the number of levels in the taxonomy.
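The per-detection parts of this cost can be sketched as follows. This is a minimal illustration, not the actual TAX computation: it assumes each part (level, topic, word assignments) is coded with an optimal code under its empirical distribution, and it omits the model-dependent costs of describing the tree and the topics. All data below are toy values.

```python
import math
from collections import Counter

def description_length_bits(assignments):
    """Optimal-code length, in bits, for a sequence of discrete
    assignments coded under their empirical distribution:
    sum over symbols of -log2 p(symbol)."""
    counts = Counter(assignments)
    n = len(assignments)
    return sum(-c * math.log2(c / n) for c in counts.values())

# Toy data (hypothetical): per-detection level, topic, and word assignments.
levels = [1, 1, 2, 2, 2, 1]
topics = [0, 0, 1, 1, 2, 0]
words = [5, 5, 7, 8, 7, 5]

tree_cost = 0.0  # cost of encoding the tree itself; model-dependent, omitted here
total_bits = tree_cost + sum(
    description_length_bits(part) for part in (levels, topics, words)
)
```

The tables below break the actual totals into these same parts (level, topic, word, tree).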
Synthetic data
| | 1 level | 2 levels | 3 levels | 4 levels |
|---|---|---|---|---|
| total | 126K | 109K | 115K | 122K |
| level | 0 | 21K | 33K | 42K |
| topic | 85K | 45K | 41K | 41K |
| word | 40K | 42K | 39K | 37K |
| tree | 0 | 0.3K | 0.6K | 1K |
Corel 300
| | 1 level | 2 levels | 3 levels | 4 levels |
|---|---|---|---|---|
| total | 823K | 578K | 604K | 633K |
| level | 0 | 104K | 165K | 208K |
| topic | 523K | 284K | 249K | 230K |
| word | 299K | 164K | 162K | 166K |
| tree | 0 | 2K | 3K | 5K |
| | 1 level | 2 levels | 3 levels | 4 levels |
|---|---|---|---|---|
| total | 823K | 474K | 439K | 425K |
| topic | 523K | 284K | 249K | 230K |
| word | 299K | 164K | 162K | 166K |
| tree | 0 | 2K | 3K | 5K |
Corel 500
| | 1 level | 2 levels | 3 levels | 4 levels |
|---|---|---|---|---|
| total | 1412K | 1010K | | |
| level | 0 | 173K | | |
| topic | 882K | 401K | | |
| word | 529K | 408K | | |
| tree | 0 | 2K | | |
Perplexity
Conclusions
- What we consider a 'good taxonomy' or the 'ground truth' is not necessarily the best fit under the TAX model; other taxonomies may achieve better perplexity.
- The greedy initialization algorithm does start at a better perplexity than random initialization (-3.49 for greedy vs. -3.81 for random). However, it is difficult to escape this local minimum.
Computation
For each training image, I put half of the samples into the training set and hold out the other half. I then compute the perplexity of the training samples ('train train perplexity') and of the held-out samples ('train hold-out perplexity'). I also have a separate set of test images. I fold these images into the existing taxonomy using half of the samples in each test image and holding out the other half. The perplexity of the first half is the 'test fold-in perplexity', and the perplexity of the second half is the 'test hold-out perplexity'.
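The split-half computation can be sketched as follows. This is a simplified illustration: the perplexity values reported in this section appear to be average log-likelihoods per sample (higher is better), and the predictive distribution below is a hypothetical stand-in for what TAX would actually produce for an image.

```python
import math

def mean_log2_likelihood(samples, probs):
    """Average log2-likelihood per sample under the predictive
    distribution `probs` (dict: symbol -> probability)."""
    return sum(math.log2(probs[s]) for s in samples) / len(samples)

def split_half(samples):
    """Every other sample goes into the fit half; the rest are held out."""
    return samples[0::2], samples[1::2]

# Toy example with a hypothetical per-image predictive distribution.
samples = ['a', 'b', 'a', 'a', 'c', 'b']
fit_half, held_out = split_half(samples)
model = {'a': 0.5, 'b': 0.3, 'c': 0.2}  # assumed; in TAX this comes from the taxonomy

train_train = mean_log2_likelihood(fit_half, model)    # 'train train perplexity'
train_holdout = mean_log2_likelihood(held_out, model)  # 'train hold-out perplexity'
```

The same two functions apply to test images: fold in the first half, then score the second half to get the 'test fold-in' and 'test hold-out' perplexities.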
Experiments
Ground truth
Here's a ground-truth synthetic taxonomy. Its train hold-out perplexity is -3.31. If the TAX model is initialized at this ground truth, it does not diverge away from it; i.e., the ground truth is a local minimum.
The test fold-in perplexity here is -3.23, and the test hold-out perplexity is -3.24. Note that both are better than the ground-truth perplexity of -3.31. Since the test images were folded in while ignoring the ground-truth assignments, this indicates that the ground truth is not the global minimum.
Random init
Here is a taxonomy learned when the TAX model was initialized at random. As can be seen, the three classes are separated well, but the model failed to merge two of them at the second level. The train hold-out perplexity of this taxonomy is -3.34. The image below shows perplexity as a function of the iteration number.
Ad hoc init
Here is an example of a taxonomy initialized using our greedy algorithm. As can be seen, the classes are not well separated. The train hold-out perplexity is -3.46. The image below shows perplexity as a function of the iteration number.
To do
- Use Sivic trick
- Affinity propagation http://www.psi.toronto.edu/affinitypropagation/
Log
Corel 1 after 150 iterations (here) already looks very similar to the final version after 300 iterations (here). So I'll inspect interrupted runs and discard the ones that don't look promising.