Efficient source selection is one of the most important optimization steps in federated SPARQL query processing. Overestimating the set of relevant sources increases network traffic, produces irrelevant intermediate results, and can significantly increase the overall query processing time. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising join-aware source selection approaches has not received much attention. Similarly, little attention has been paid to the effect of duplicated data on federated querying. This book presents solutions for join-aware source selection as well as duplicate-aware federated querying over the Web of Data.
Benchmarking is indispensable when assessing technologies with respect to their suitability for given tasks. While several benchmarks have been developed to evaluate federated SPARQL engines and triple stores, they mostly provide a one-size-fits-all solution to the benchmarking problem. This approach is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address this drawback by presenting an automatic approach for generating benchmarks from real query logs.
The book will be of interest to all those working in these two key areas of federated SPARQL query processing. The tools presented in this book are open source.