Most people need textual or visual interfaces to help them make sense of Semantic Web data. In this book, the author investigates the problems associated with generating natural language summaries for structured data encoded as triples using deep neural networks.
An end-to-end trainable architecture is proposed, which encodes the information from a set of knowledge graph triples into a vector of fixed dimensionality, and generates a textual summary by conditioning the output on this encoded vector. Different methodologies for building the required data-to-text corpora are explored to train and evaluate the performance of the approach. Attention is first focused on generating biographies, and the author demonstrates that the technique is capable of scaling to domains with larger and more challenging vocabularies.
The applicability of the technique for the generation of open-domain Wikipedia summaries in Arabic and Esperanto – two under-resourced languages – is then discussed, and a set of community studies, devised to measure the usability of the automatically generated content by Wikipedia readers and editors, is described.
Finally, the book explains an extension of the original model with a pointer mechanism that enables it to learn to verbalise in a different number of ways the content from the triples while retaining the capacity to generate words from a fixed target vocabulary. The evaluation of performance using a dataset encompassing all of English Wikipedia is described, with results from both automatic and human evaluation both of which highlight the superiority of the latter approach as compared to the original architecture.