Linked Data publishing has brought about a novel “Web of Data”: a wealth of diverse, interlinked, structured data published on the Web. Produced by governments, businesses, communities and academia alike, these Linked Datasets are described using Semantic Web standards and are openly available to all. However, the heterogeneity of such data, in terms of how resources are described and identified, poses major challenges to potential consumers.
Herein, we examine use cases for pragmatic, lightweight reasoning techniques that leverage Web vocabularies (described in RDFS and OWL) to better integrate large-scale, diverse Linked Data corpora. We take a test corpus of 1.1 billion RDF statements collected from 4 million RDF Web documents and analyse how RDFS and OWL are used within it. The next part of the book details and evaluates scalable, distributed techniques for applying rule-based materialisation to translate data between different vocabularies, and to resolve coreferent resources that refer to the same thing. We show how such techniques can be made robust in the face of noisy and often impudent Web data. We also examine a use case for incorporating a PageRank-style algorithm to rank the trustworthiness of facts produced by reasoning, and for subsequently using those ranks to repair formal contradictions in the data. All methods are validated against our real-world, large-scale, open-domain Linked Data evaluation corpus.
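As a rough illustration of what rule-based materialisation and coreference resolution involve, the following Python sketch applies two RDFS rules (for `rdfs:subPropertyOf` and `rdfs:subClassOf`) until no new triples appear, and then merges `owl:sameAs`-linked terms with a union-find so that each equivalence class is rewritten to one canonical identifier. It is a hypothetical, in-memory toy: the triple representation, the function names `materialise` and `canonicalise`, the prefixed example IRIs, and the choice of the lexically smallest term as the canonical identifier are all assumptions made for illustration, not the scalable, distributed implementation described in the book.

```python
# Hypothetical sketch: toy in-memory materialisation and owl:sameAs
# canonicalisation over (subject, predicate, object) string tuples.
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
SAMEAS = "owl:sameAs"


def materialise(triples: Set[Triple]) -> Set[Triple]:
    """Apply subproperty and subclass rules until a fixpoint is reached."""
    closed = set(triples)
    while True:
        # Index schema triples: sub-term -> set of super-terms.
        supers_p: Dict[str, Set[str]] = defaultdict(set)
        supers_c: Dict[str, Set[str]] = defaultdict(set)
        for s, p, o in closed:
            if p == SUBPROP:
                supers_p[s].add(o)
            elif p == SUBCLASS:
                supers_c[s].add(o)
        new: Set[Triple] = set()
        for s, p, o in closed:
            # (p rdfs:subPropertyOf q), (s p o) => (s q o)
            for q in supers_p.get(p, ()):
                new.add((s, q, o))
            # (c rdfs:subClassOf d), (s rdf:type c) => (s rdf:type d)
            if p == TYPE:
                for d in supers_c.get(o, ()):
                    new.add((s, TYPE, d))
        new -= closed
        if not new:  # nothing new was inferred: fixpoint reached
            return closed
        closed |= new


def canonicalise(triples: Set[Triple]) -> Set[Triple]:
    """Rewrite owl:sameAs-linked subjects/objects to one canonical term."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAMEAS:
            a, b = find(s), find(o)
            if a != b:
                # Assumption: the lexically smallest term becomes canonical.
                low, high = sorted((a, b))
                parent[high] = low
    # Predicates are left untouched; the sameAs triples themselves are dropped.
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAMEAS}


if __name__ == "__main__":
    data = {
        ("ex:Galway", "ex:locatedIn", "ex:Ireland"),
        ("ex:locatedIn", SUBPROP, "ex:partOf"),
        ("ex:Galway", TYPE, "ex:City"),
        ("ex:City", SUBCLASS, "ex:Place"),
        ("dbp:Ireland", SAMEAS, "ex:Ireland"),
    }
    for triple in sorted(canonicalise(materialise(data))):
        print(triple)
```

Rewriting every member of an equivalence class to a single identifier, rather than materialising the full `owl:sameAs` closure, avoids copying each fact once per coreferent name, a cost that grows quadratically with the size of large equivalence classes. The toy above ignores the robustness concerns the text raises (for example, distrusting third-party schema triples), which a real Web-scale system would need to address.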