Natural Language Generation (NLG) systems in interactive, situated settings face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between the situational circumstances, the addressee and the interaction history. Traditionally, the generation process has been divided into distinct stages of decision making, e.g. content selection, utterance planning and surface realization. This sequential model, however, does not account for the interdependencies among these stages, which in practice can manifest themselves in inefficient, ineffective communication and an increased cognitive load for the user.
This book presents a joint optimization framework for NLG in dialogue that is based on Hierarchical Reinforcement Learning and learns the best utterance for a context through trial-and-error optimization. The joint model treats decisions at different NLG stages as interdependent and produces more context-sensitive utterances than a model that treats each decision in isolation. To enhance the human-likeness of the framework, we integrate graphical models, trained from human data, as generation space models for natural surface realization. An evaluation study with human participants confirms that the hierarchical learner is able to acquire an adaptive policy that leads to smooth and successful interactions. The results also suggest that joint optimization yields substantially higher user satisfaction and task success, and is better perceived by human users, than its isolated counterpart.
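The core idea of joint optimization can be illustrated with a deliberately simplified sketch (not the book's actual system): a tabular Q-learning agent whose action space is the *cross-product* of two NLG decisions (content selection and surface style), so that reward can depend on their combination. All names, contexts and reward values below are hypothetical, chosen only to show why learning over joint actions captures interdependencies that isolated learners miss.

```python
import random
from collections import defaultdict

# Hypothetical toy setup: two user contexts, and a joint action that
# fixes a content decision and a realization style together.
CONTEXTS = ["novice", "expert"]
CONTENTS = ["brief", "detailed"]
STYLES = ["plain", "technical"]
ACTIONS = [(c, s) for c in CONTENTS for s in STYLES]  # joint action space

def reward(context, action):
    """Illustrative reward: each decision earns credit on its own, plus a
    bonus when the *combination* suits the user -- the interdependency."""
    content, style = action
    r = 0.0
    if context == "novice":
        r += 1.0 if content == "detailed" else 0.0
        r += 1.0 if style == "plain" else 0.0
    else:
        r += 1.0 if content == "brief" else 0.0
        r += 1.0 if style == "technical" else 0.0
    if (context, content, style) in {("novice", "detailed", "plain"),
                                     ("expert", "brief", "technical")}:
        r += 1.0  # joint bonus only the combined decision can earn
    return r

def train(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """One-step episodes with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # (context, joint action) -> estimated value
    for _ in range(episodes):
        context = rng.choice(CONTEXTS)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(context, a)])  # exploit
        Q[(context, action)] += alpha * (reward(context, action)
                                         - Q[(context, action)])
    return Q

Q = train()
# Greedy policy after training: one joint decision per context.
policy = {c: max(ACTIONS, key=lambda a: Q[(c, a)]) for c in CONTEXTS}
print(policy)
```

In this sketch the learned policy is context-sensitive because the reward for the style choice interacts with the content choice; a learner that optimized each decision separately could not collect the joint bonus. The book's framework extends this idea hierarchically, decomposing the NLG task into subtasks with their own policies rather than a single flat Q-table.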