K-Means cluster properties
Overview
- A brief investigation into cluster sizes (the number of test points assigned to each cluster) and cluster diameters (using several different measures).
- Note that this study considers only clusters of SIFT features extracted by Vedaldi's code from the COREL dataset.
Method
- Training points N = 50,000
- Centers K = 5,000
- Test points T = 100,000
- K-means algorithm
  - Approximate K-means using k-d trees (Lowe's FLANN code)
    - 4 trees
    - 16 checks
  - 15 iterations (of K-means algorithm)
- See /common/welinder/benchmarking/kmeans_kd-tree/test_kmeans_cluster_sizes.m for more info.
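For orientation, the vocabulary-building step listed above can be sketched in plain Python. This is a minimal sketch, not the code in the referenced .m file: it swaps the approximate k-d tree search (FLANN) for an exact brute-force search, uses toy 2-D data instead of SIFT descriptors, and the function names (`sq_dist`, `nearest_center`, `kmeans`) are hypothetical. Only the iteration count of 15 is taken from the settings above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the closest center (exact search; the study uses approximate k-d trees here)."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iterations=15, seed=0):
    """Lloyd's K-means with a fixed number of iterations (15, as in the Method list)."""
    rng = random.Random(seed)
    # Assumption: centers are initialized from k distinct random training points.
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iterations):
        # Assignment step: label every training point with its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its assigned points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for d in range(dim):
                sums[lab][d] += p[d]
        for j in range(k):
            if counts[j] > 0:  # an empty cluster keeps its previous center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers


# Toy usage: two well-separated 2-D blobs stand in for the 50,000 SIFT training points.
rng = random.Random(1)
train = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
train += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)]
vocabulary = kmeans(train, k=2)
```

With N = 50,000 and K = 5,000 the brute-force assignment step is what dominates the cost, which is presumably why the study uses an approximate search at this stage.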
Results
Cluster sizes
- NOTE: to find the nearest neighbor (NN) in the vocabulary I use an exact NN algorithm, with no approximations. (Approximations are used only in the K-means algorithm when building the vocabulary.)
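The cluster-size count described in the note can be sketched as follows: each test point is assigned to its exact nearest center, and the size of a cluster is the number of test points that land on it. This is an illustrative sketch with made-up 2-D data; the helper names (`assign_exact`, `cluster_sizes`) are hypothetical and not taken from the original script.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_exact(points, centers):
    """Exact nearest-neighbor assignment: brute-force search over all centers."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


def cluster_sizes(labels, k):
    """Number of points assigned to each of the k centers (0 for empty clusters)."""
    counts = Counter(labels)
    return [counts.get(j, 0) for j in range(k)]


# Toy usage: three centers, four test points.
centers = [[0, 0], [10, 0], [0, 10]]
test_points = [[1, 0], [0, 1], [9, 1], [-1, -1]]
labels = assign_exact(test_points, centers)
print(cluster_sizes(labels, k=len(centers)))  # prints [3, 1, 0]
```

Clusters that receive no test points are reported with size 0 rather than dropped, so the size list stays aligned with the vocabulary.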
Cluster diameters vs cluster sizes
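This page does not spell out which diameter measures were used. As an assumption, the sketch below shows three common candidates: the largest pairwise distance between members, the largest distance from a member to the center, and the mean distance to the center. All function names are hypothetical.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_pairwise_distance(members):
    """Diameter as the largest distance between any two members (quadratic in cluster size)."""
    return max((dist(a, b) for a, b in combinations(members, 2)), default=0.0)


def max_distance_to_center(members, center):
    """Radius-style measure: distance from the center to its farthest member."""
    return max((dist(m, center) for m in members), default=0.0)


def mean_distance_to_center(members, center):
    """Average distance from the members to the center; 0.0 for an empty cluster."""
    if not members:
        return 0.0
    return sum(dist(m, center) for m in members) / len(members)


# Toy usage: a two-point cluster with its centroid.
members = [[0, 0], [6, 8]]
center = [3, 4]
print(max_pairwise_distance(members))            # prints 10.0
print(max_distance_to_center(members, center))   # prints 5.0
print(mean_distance_to_center(members, center))  # prints 5.0
```

Plotting any of these measures against the cluster size gives one diameter-versus-size point per cluster.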
Dataset
- Note that the x-axis is not completely linear towards the end.
Testset
- Note that the x-axis is not completely linear towards the end.