Similarity Search in High-Dimensional Vector Spaces
- Editor
- Weber, R.
- Pub. date
- January 2001
- Pages
- 210
- Binding
- softcover
- Volume
- 74 of Dissertations in Database and Information Systems
- ISBN
- 978-1-58603-177-0
- Subject
- Computer & Communication Sciences, Computer Science
With the example of an image database and the notion of relevance feedback, the dissertation addresses the important and challenging problem of identifying the most similar objects in a database given a set of reference objects and and a set of features. Similarity search is typically implemented as "Nearest Neighbor Search" (NN- Search) in high-dimensional vector spaces. Although a large number of existing work have provided solutions for the NN-Search problem, most of them have not taken the specific problems of high dimensionality explicitly into account. This work carefully investigates the so-called "Curse of Dimensionality", and presents a novel, flat organization for NN-Search optimized for high-dimensional spaces: the so-called "Vector Approximation File" (VA-File). The superiority of the VA-File over conventional indexing structures is shown theoretically and with extensive experiments. Further refinements of the VA-File discussed in the current work include approximate search and parallel search in a cluster of workstations. From a practical perspective, the dissertation provides an indexing technique that allows for interactive-time
similarity search even in huge databases with gigabytes of vector data. As such, it is the most favorable indexing technique for future multimedia retrieval systems.