Coresets for k-Means and k-Median
Clustering and their Applications

Sariel Har-Peled and
Soham Mazumdar

In this paper, we show the existence of small coresets for the
problems of computing k-median and k-means clustering
for points in low dimension. In other words, we show that given
a point set P in R^{d}, one can compute
a weighted set \Coreset \subseteq P, of size O(k
µ^{-d}
log n), such that one can compute the k-median/means
clustering on \Coreset instead of on P, and get
an (1+µ)-approximation.

As a result, we improve
the fastest known algorithms for (1+µ)-approximate
k-means and k-median. Our algorithms have \emph
linear running time for a fixed k and µ. In
addition, we can maintain the (1+µ)-approximate
k-median
or k-means clustering of a stream when points are being
\emph only inserted , using polylogarithmic space and constant
update time.

@inproceedings{hm-ckmkm-04,
author = {S. {Har-Peled} and S.~{Mazumdar}},
booktitle = STOC_2004,
title = {Coresets for $k$-Means and $k$-Median Clustering
and their Applications},
year = 2004,
pages = {291--300}
}