Similarity Search in High-Dimensional Vector Spaces

Weber, R.
Pub. date
January 2001
74 of Dissertations in Database and Information Systems
ISBN print
Computer & Communication Sciences, Computer Science

With the example of an image database and the notion of relevance feedback, the dissertation addresses the important and challenging problem of identifying the most similar objects in a database given a set of reference objects and and a set of features. Similarity search is typically implemented as "Nearest Neighbor Search" (NN- Search) in high-dimensional vector spaces. Although a large number of existing work have provided solutions for the NN-Search problem, most of them have not taken the specific problems of high dimensionality explicitly into account. This work carefully investigates the so-called "Curse of Dimensionality", and presents a novel, flat organization for NN-Search optimized for high-dimensional spaces: the so-called "Vector Approximation File" (VA-File). The superiority of the VA-File over conventional indexing structures is shown theoretically and with extensive experiments. Further refinements of the VA-File discussed in the current work include approximate search and parallel search in a cluster of workstations. From a practical perspective, the dissertation provides an indexing technique that allows for interactive-time
similarity search even in huge databases with gigabytes of vector data. As such, it is the most favorable indexing technique for future multimedia retrieval systems.