Storage and Retrieval of XML Documents with a Cluster of Database Systems
XML, short for the W3C Extensible Markup Language, is highly successful as a format for data interchange. So far, work on XML has focused on data-centric settings, i.e., XML documents with strict and regular structure. This focus disregards many important settings that involve textual or semi-structured data with little or flexible structure. XML is flexible enough to cover these so-called document-centric settings in addition to data-centric ones. This book presents an XML engine for storage and retrieval of XML documents that covers the full range from data-centric to document-centric applications on a single integrated platform. It proposes extending data-centric XML query languages such as W3C XPath with the document-centric functionality needed for relevance-oriented ranked retrieval on XML documents. Moreover, it investigates transaction management for concurrent XML processing and contributes a novel locking protocol that allows for higher concurrency and more parallelism than off-the-shelf database transaction management. To make XML storage and retrieval efficient and highly scalable, both data-centric and document-centric XML content is stored on a cluster of relational database systems. The overall result is a scalable infrastructure for storage and retrieval of XML documents that keeps retrieval results up to date and supports state-of-the-art ranked retrieval models.