The Theory and Practice of Discourse Parsing and Summarization
Until now, most discourse researchers have assumed that full semantic
understanding is necessary to derive the discourse structure of
texts. This book documents the first serious attempt to automatically
construct and use nonsemantic computational structures for text
summarization. Daniel Marcu develops a semantics-free theoretical
framework that is both general enough to be applicable to naturally
occurring texts and concise enough to facilitate an algorithmic
approach to discourse analysis. He presents and evaluates two
discourse parsing methods: one uses manually written rules that
reflect common patterns of usage of cue phrases such as "however" and
"in addition to"; the other uses rules that are learned automatically
from a corpus of discourse structures. In a psycholinguistic
experiment, Marcu shows that a discourse-based summarizer
identifies the most important parts of texts at levels of performance
close to those of humans.
Marcu also discusses how the automatic derivation of discourse
structures may be used to improve the performance of current natural
language generation, machine translation, summarization, question
answering, and information retrieval systems.