K-Means cluster properties

From Vision Wiki

Overview

  • A brief investigation into cluster sizes (the number of test points assigned to each cluster) and cluster diameters (using several different measures).
  • Note that this study considers only clusters of SIFT features extracted by Vedaldi's code from the COREL dataset.

Method

  • Training points N = 50,000
  • Centers K = 5,000
  • Test points T = 100,000
  • K-means algorithm
    • Approximate K-means using k-d trees (Muja and Lowe's FLANN code)
    • 4 trees
    • 16 checks
    • 15 iterations (of K-means algorithm)
  • See /common/welinder/benchmarking/kmeans_kd-tree/test_kmeans_cluster_sizes.m for more info.
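
The K-means loop described above can be sketched roughly as follows. This is only an illustration, not the code in the referenced .m file: the assignment step here is an exact linear scan standing in for FLANN's approximate k-d tree search, the sizes are toy values rather than N = 50,000 and K = 5,000, and the function names, the random initialization, and the handling of empty clusters are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the closest center, found by an exact linear scan.

    In the setup described above this step would be replaced by an
    approximate k-d tree query (4 trees, 16 checks).
    """
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iterations=15, seed=0):
    """Lloyd-style K-means run for a fixed number of iterations."""
    rng = random.Random(seed)
    # Assumption: centers are initialized from k distinct training points.
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assignment step: accumulate each point into its nearest center.
        for p in points:
            j = nearest_center(p, centers)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: move each non-empty center to the mean of its points.
        for j in range(k):
            if counts[j] > 0:  # assumption: empty clusters keep their center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers


# Toy data: 500 random 8-dimensional points (SIFT descriptors are 128-dimensional).
data_rng = random.Random(1)
train = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
centers = kmeans(train, k=50, iterations=15)
```

Swapping `nearest_center` for an approximate search changes only the assignment step; the update step stays the same.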

Results

Cluster sizes

Kmeans clusters1 size dist.png

  • NOTE: to find the nearest neighbor (NN) in the vocabulary I use an exact NN algorithm, with no approximations. (The approximations are used only in the K-means algorithm when building the vocabulary.)
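
As a rough illustration of how the cluster sizes could be computed, the sketch below assigns each test point to its exact nearest center and counts the assignments per cluster. The function names and the toy data are assumptions made for this example; the actual study used T = 100,000 test points and K = 5,000 centers.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_nn(point, centers):
    """Index of the nearest center, by exhaustive search (no approximation)."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def cluster_sizes(test_points, centers):
    """Number of test points assigned to each center, including empty clusters."""
    counts = Counter(exact_nn(p, centers) for p in test_points)
    return [counts.get(k, 0) for k in range(len(centers))]


def size_distribution(sizes):
    """Map each observed cluster size to the number of clusters of that size."""
    return dict(sorted(Counter(sizes).items()))


# Toy data: 20 centers and 1,000 test points in 4 dimensions.
rng = random.Random(0)
centers = [[rng.random() for _ in range(4)] for _ in range(20)]
test_points = [[rng.random() for _ in range(4)] for _ in range(1000)]

sizes = cluster_sizes(test_points, centers)
distribution = size_distribution(sizes)
```

The values in `distribution` are what a size histogram like the one above would plot.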

Cluster diameters vs cluster sizes

Dataset

Kmeans clusters2 diameter vs size.png

  • Note that the x-axis is not completely linear towards the end.

Test set

Kmeans clusters3 diameter vs size.png

  • Note that the x-axis is not completely linear towards the end.
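
This page does not say which diameter measures were used. As a sketch under that caveat, the example below computes three common candidates for a single cluster: the mean distance to the center, the maximum distance to the center, and the maximum pairwise distance between members. The measure names and the function `diameter_measures` are hypothetical and chosen only for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diameter_measures(members, center):
    """Return several size measures for one cluster, or None if it is empty.

    All three measures are assumptions; the original study may have used others.
    """
    if not members:
        return None
    to_center = [dist(p, center) for p in members]
    # Largest distance between any two members (quadratic in cluster size).
    max_pairwise = max(
        (dist(p, q) for i, p in enumerate(members) for q in members[i + 1:]),
        default=0.0,
    )
    return {
        "mean_radius": sum(to_center) / len(to_center),
        "max_radius": max(to_center),
        "max_pairwise": max_pairwise,
    }


# Toy cluster: the four corners of a square, with the center in the middle.
members = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = [1.0, 1.0]
measures = diameter_measures(members, center)
```

Plotting any one of these values against the per-cluster point count gives a diameter-versus-size scatter of the kind shown in the figures.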