Home Bookmarks Papers Blog

Coresets for k-Means and k-Median Clustering and their Applications

Sariel Har-Peled and Soham Mazumdar

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in Rd, one can compute a weighted set \Coreset \subseteq P, of size O(k µ-d log n), such that one can compute the k-median/means clustering on \Coreset instead of on P, and get an (1+µ)-approximation.

As a result, we improve the fastest known algorithms for (1+µ)-approximate k-means and k-median. Our algorithms have \emph linear running time for a fixed k and µ. In addition, we can maintain the (1+µ)-approximate k-median or k-means clustering of a stream when points are being \emph only inserted , using polylogarithmic space and constant update time.

Postscript, PDF.

  author       = {S. {Har-Peled} and S.~{Mazumdar}},
  booktitle    = STOC_2004,
  title =        {Coresets for $k$-Means and $k$-Median Clustering
                  and their Applications},
  year =         2004,
  pages =        {291--300}

Last modified: Wed Jun 16 11:41:25 CDT 2004