Home | Bookmarks | Papers | Blog |

Sepideh Mahabadi, and Sariel Har-Peled.

We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of $n$ points $\PntSet=\brc{ \pnt_1,\ldots, \pnt_n}$ in high-dimensions, and a parameter $k$, the goal is to preprocess the dataset, such that given a query point $\query$, one can compute quickly a point $\pnt \in \PntSet$, such that the distance of the query to the point $\pnt$ is minimized, when ignoring the ``optimal'' $k$ coordinates. Note, that the coordinates being ignored are a function of both the query point and the point returned.

We present a general reduction from this problem to answering \ANN queries, which is similar in spirit to \LSH (locality sensitive hashing) \cite{im-anntr-98}. Specifically, we give a sampling technique which achieves a bi-criterion approximation for this problem. If the distance to the nearest neighbor after ignoring $k$ coordinates is $\rr$, the data-structure returns a point that is within a distance of $O(\rr)$ after ignoring $O(k)$ coordinates. We also present other applications and further extensions and refinements of the above result.

The new data-structures are simple and (arguably) elegant, and should be practical -- specifically, all bounds are polynomial in all relevant parameters (including the dimension of the space, and the robustness parameter $k$).

PDF.

Slides of talk given in seminar in TAU (11/9/2016).

Last modified: Tue Nov 3 14:06:01 CST 2015