Proximity in the age of distraction: Robust approximate nearest neighbor search $\newcommand{\query}{{{q}}}% \newcommand{\kalg}{{k_{\mathrm{alg}}}}% \newcommand{\kopt}{{k_{\mathrm{opt}}}}% \newcommand{\algset}{{T}} \renewcommand{\Re}{\mathbb{R}}% \newcommand{\eps}{\varepsilon} \newcommand{\PntSet}{{P}} \newcommand{\brc}[1]{\left\{ {#1} \right\}} \newcommand{\pnt}{{{x}}}% \newcommand{\rr}{r} \newcommand{\pth}[2][\!]{#1\left({#2}\right)} \newcommand{\npoints}{n} \newcommand{\ballD}{\mathsf{b}} \newcommand{\dataset}{{P}}$

We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of $n$ points $\PntSet=\brc{ \pnt_1,\ldots, \pnt_n}$ in high dimensions, and a parameter $k$, the goal is to preprocess the dataset, such that given a query point $\query$, one can quickly compute a point $\pnt \in \PntSet$ minimizing the distance of the query to $\pnt$ when ignoring the ``optimal'' $k$ coordinates. Note that the coordinates being ignored are a function of both the query point and the point returned.
We present a general reduction from this problem to answering \ANN queries, which is similar in spirit to \LSH (locality sensitive hashing) \cite{im-anntr-98}. Specifically, we give a sampling technique which achieves a bi-criterion approximation for this problem: if the distance to the nearest neighbor after ignoring $k$ coordinates is $\rr$, the data-structure returns a point that is within distance $O(\rr)$ of the query after ignoring $O(k)$ coordinates. We also present other applications, as well as further extensions and refinements of the above result.
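To make the reduction concrete, the following is a toy Python sketch of one plausible instantiation (our own illustration, not the paper's exact construction): each repetition samples a random subset of coordinates, hoping to drop the corrupted ones; a nearest neighbor query (here a brute-force stand-in for an \ANN data-structure) is run on the projected points, and the candidates are re-ranked by the robust distance that ignores the $k$ worst coordinates. The sampling probability $1 - 1/(2k)$, the number of repetitions, and all function names are illustrative assumptions.

```python
import math
import random

def robust_dist(x, q, k):
    """Distance between x and q after ignoring the k coordinates
    with the largest disagreement (the "optimal" coordinates to drop)."""
    d2 = sorted((a - b) ** 2 for a, b in zip(x, q))
    return math.sqrt(sum(d2[: max(len(d2) - k, 0)]))

def build(points, k, reps=20, seed=0):
    """Preprocess: per repetition, keep each coordinate with probability
    1 - 1/(2k) (an illustrative choice) and store the projected dataset.
    A real implementation would build an ANN data-structure on each
    projection instead of keeping the raw projected points."""
    rng = random.Random(seed)
    dim = len(points[0])
    structures = []
    for _ in range(reps):
        mask = [i for i in range(dim) if rng.random() < 1 - 1 / (2 * k)]
        proj = [[p[i] for i in mask] for p in points]
        structures.append((mask, proj))
    return structures

def query(structures, points, q, k):
    """Answer a query: run a (here: exact, brute-force) nearest neighbor
    search in each projection, then re-rank candidates by robust distance."""
    best, best_d = None, float("inf")
    for mask, proj in structures:
        qp = [q[i] for i in mask]
        j = min(range(len(proj)),
                key=lambda t: sum((a - b) ** 2 for a, b in zip(proj[t], qp)))
        d = robust_dist(points[j], q, k)
        if d < best_d:
            best, best_d = j, d
    return best, best_d
```

Intuitively, a repetition whose sample avoids all corrupted coordinates reduces the robust problem to a standard nearest neighbor query, and repeating boosts the probability that some repetition succeeds; this is the source of the bi-criterion $O(\rr)$ distance and $O(k)$ ignored-coordinates guarantee.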
The new data-structures are simple and (arguably) elegant, and should be practical -- specifically, all bounds are polynomial in all relevant parameters, including the dimension of the space and the robustness parameter $k$.