Home Bookmarks Papers Blog

How fast is the $k$-means Method?

Sariel Har-Peled and Bardia Sadri

We present polynomial upper and lower bounds on the number of iterations performed by Lloyd's method for $k$-means clustering. Our upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the $k$-means heuristic needs to perform $\Omega(n)$ iterations, for $n$ points on the real line and two centers. Surprisingly, our construction spread is polynomial. This is the first construction showing that the $k$-means heuristic requires more than a polylogarithmic number of iterations. We also present alternative algorithms, with guaranteed performance, which are simple variants of Lloyd's method. We also performed experimental study of the performance of those algorithms.

Postscript, PDF.
html (You would need a browser supporting mathml.)

Implementation and input data are available here.
Appeared in SODA 05.
Talk slides
  keyent       = {hs-hfkmm-04},
  author       = {S. {Har-Peled} and B. Sadri},
  title        = {How fast is the $k$-means Method?},
  journal      = ALGORITHMICA,
  Volume       = 41,
  Number       = 3,
  Pages        = {185--202},
  year         = 2005,
  month        = {January},
  url          = {http://www.uiuc.edu/p/03/lloyd_kmeans/}

Last modified: Sun Jan 11 13:18:41 CST 2015