Home Bookmarks Papers Blog

Coresets for k-Means and k-Median Clustering and their Applications $\renewcommand{\Re}{{\rm I\!\hspace{-0.025em} R}} \newcommand{\eps}{{\varepsilon}}% \newcommand{\Coreset}{{\mathcal{S}}} $

Sariel Har-Peled and Soham Mazumdar

In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means clustering for points in low dimension. In other words, we show that given a point set $P$ in $\Re^d$, one can compute a weighted set $\Coreset \subseteq P$, of size $O(k \eps^{-d} \log{n})$, such that one can compute the $k$-median/means clustering on $\Coreset$ instead of on $P$, and get an $(1+\eps)$-approximation. As a result, we improve the fastest known algorithms for $(1+\eps)$-approximate $k$-means and $k$-median. Our algorithms have \emph{linear} running time for a fixed $k$ and $\eps$. In addition, we can maintain the $(1+\eps)$-approximate $k$-median or $k$-means clustering of a stream when points are being \emph{only inserted}, using polylogarithmic space and update time.


  author       = {S. {Har-Peled} and S.~{Mazumdar}},
  booktitle    = STOC_2004,
  title =        {Coresets for $k$-Means and $k$-Median Clustering
                  and their Applications},
  year =         2004,
  pages =        {291--300}

Last modified: Wed Jun 16 11:41:25 CDT 2004