Home Bookmarks Papers Blog

How fast is the k-means Method?

Sariel Har-Peled and Bardia Sadri


We present polynomial upper and lower bounds on the number of iterations performed by Lloyd's method for k-means clustering. Our upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the \kmeans{} heuristic needs to perform Omega(n) iterations, for n points on the real line and two centers. Surprisingly, our construction spread is polynomial. This is the first construction showing that the \kmeans{} heuristic requires more than a polylogarithmic number of iterations. We also present alternative algorithms, with guaranteed performance, which are simple variants of Lloyd's method. We also performed experimental study of the performance of those algorithms.

Postscript, PDF.
html (You would need a browser supporting mathml.)


Implementation and input data are available here.
Appeared in SODA 05.
Talk slides
@article{hs-hfkmm-05,
  keyent       = {hs-hfkmm-04},
  author       = {S. {Har-Peled} and B. Sadri},
  title        = {How fast is the $k$-means Method?},
  journal      = ALGORITHMICA,
  Volume       = 41,
  Number       = 3,
  Pages        = {185--202},
  year         = 2005,
  month        = {January},
  url          = {http://www.uiuc.edu/p/03/lloyd_kmeans/}
}

Last modified: Fri Feb 18 12:24:57 CST 2005