Home Bookmarks Papers Blog


#### Sariel Har-Peled and Soham Mazumdar

In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means clustering for points in low dimension. In other words, we show that given a point set $P$ in $\Re^d$, one can compute a weighted set $\Coreset \subseteq P$, of size $O(k \eps^{-d} \log{n})$, such that one can compute the $k$-median/means clustering on $\Coreset$ instead of on $P$, and get an $(1+\eps)$-approximation. As a result, we improve the fastest known algorithms for $(1+\eps)$-approximate $k$-means and $k$-median. Our algorithms have \emph{linear} running time for a fixed $k$ and $\eps$. In addition, we can maintain the $(1+\eps)$-approximate $k$-median or $k$-means clustering of a stream when points are being \emph{only inserted}, using polylogarithmic space and update time.

PDF.

@inproceedings{hm-ckmkm-04,
author       = {S. {Har-Peled} and S.~{Mazumdar}},
booktitle    = STOC_2004,
title =        {Coresets for $k$-Means and $k$-Median Clustering
and their Applications},
year =         2004,
pages =        {291--300}
}