Knight, K. and Langkilde, I. 2000. Preserving Ambiguities in Generation via Automata Intersection. American Association for Artificial Intelligence conference (AAAI'00).
We discuss the problem of generating text that preserves certain
ambiguities, a capability that is useful in applications such as
machine translation. We show that it is relatively simple to extend a
hybrid symbolic/statistical generator to do ambiguity preservation.
The paper gives algorithms and examples, and it discusses practical
linguistic difficulties that arise in ambiguity preservation.
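The intersection step named in the title can be illustrated with a minimal sketch. It assumes that the candidate realizations for each reading of an ambiguous input are available as a deterministic, acyclic word automaton. Intersection is done here with the textbook product construction over unweighted automata. The tuple representation, the helper names `from_strings`, `intersect` and `strings`, and the example sentences are all hypothetical and are not taken from the paper, whose actual representations may differ.

```python
from collections import deque

# Hypothetical representation: a deterministic acyclic word automaton is a tuple
# (start_state, final_states, transitions), where transitions[state] = {word: next_state}.

def from_strings(sentences):
    """Build a trie-shaped automaton that accepts exactly the given sentences."""
    transitions, finals = {0: {}}, set()
    for sentence in sentences:
        state = 0
        for word in sentence.split():
            if word not in transitions[state]:
                new_state = len(transitions)
                transitions[state][word] = new_state
                transitions[new_state] = {}
            state = transitions[state][word]
        finals.add(state)
    return 0, finals, transitions

def intersect(a, b):
    """Product construction: the result accepts exactly the strings accepted by both a and b."""
    a_start, a_finals, a_trans = a
    b_start, b_finals, b_trans = b
    start = (a_start, b_start)
    transitions, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        p, q = state
        transitions[state] = {}
        if p in a_finals and q in b_finals:
            finals.add(state)
        for word, p_next in a_trans.get(p, {}).items():
            q_next = b_trans.get(q, {}).get(word)
            if q_next is None:
                continue  # b cannot read this word from state q, so the pair is dead
            nxt = (p_next, q_next)
            transitions[state][word] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, finals, transitions

def strings(automaton):
    """Enumerate the finite language of an acyclic automaton, sorted."""
    start, finals, transitions = automaton
    found = []
    def walk(state, words):
        if state in finals:
            found.append(" ".join(words))
        for word, nxt in transitions.get(state, {}).items():
            walk(nxt, words + [word])
    walk(start, [])
    return sorted(found)

# Hypothetical candidate realizations for two readings of one ambiguous source sentence.
reading_1 = from_strings(["he saw her with the telescope",
                          "using the telescope he saw her"])
reading_2 = from_strings(["he saw her with the telescope",
                          "he saw the woman who had the telescope"])
common = strings(intersect(reading_1, reading_2))
print(common)  # ['he saw her with the telescope']
```

Only strings accepted by both automata survive, so whatever remains is a realization compatible with both readings. In a hybrid generator, a statistical ranker could then choose among the survivors.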
Langkilde, I. 2000. Forest-Based Statistical Sentence Generation. Association for Computational Linguistics conference, North American chapter (NAACL'00).
This paper presents a new approach to statistical sentence generation
in which alternative phrases are represented as packed sets of trees,
or forests, and then ranked statistically to choose the best one.
This representation offers advantages in compactness and in the
ability to represent syntactic information. It also facilitates more
efficient statistical ranking than a previous approach to statistical
generation. An efficient ranking algorithm is described, together
with experimental results showing significant improvements over simple
enumeration or a lattice-based approach.
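The abstract does not spell out the ranking algorithm, so what follows is only a generic sketch of the idea. It assumes an AND/OR forest of words and a bigram language model. A bigram looks only one word across each boundary, so every node can be summarized by its best internal score for each (first word, last word) pair. Shared sub-forests are then scored once rather than once per enumerated sentence. The `Node` class, the `word`/`seq`/`alt`/`rank` helpers and the probabilities are hypothetical, and this is not claimed to be the paper's algorithm.

```python
import math
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity hashing, so shared sub-forests are memoized once
class Node:
    kind: str                                     # "word", "seq" (AND node) or "or" (OR node)
    word: str = ""
    children: list = field(default_factory=list)

def word(w): return Node("word", w)
def seq(*children): return Node("seq", children=list(children))
def alt(*children): return Node("or", children=list(children))

def best_table(node, logp, memo):
    """Map (first_word, last_word) -> (score, words), keeping the best internal bigram score."""
    if node in memo:
        return memo[node]
    if node.kind == "word":
        table = {(node.word, node.word): (0.0, [node.word])}
    elif node.kind == "or":                       # union of alternatives, max per boundary pair
        table = {}
        for child in node.children:
            for key, cand in best_table(child, logp, memo).items():
                if key not in table or cand[0] > table[key][0]:
                    table[key] = cand
    else:                                         # "seq": concatenate, scoring each seam bigram
        table = best_table(node.children[0], logp, memo)
        for child in node.children[1:]:
            right = best_table(child, logp, memo)
            joined = {}
            for (f1, l1), (s1, w1) in table.items():
                for (f2, l2), (s2, w2) in right.items():
                    total = s1 + s2 + logp(l1, f2)
                    if (f1, l2) not in joined or total > joined[(f1, l2)][0]:
                        joined[(f1, l2)] = (total, w1 + w2)
            table = joined
    memo[node] = table
    return table

def rank(forest, logp):
    """Return (log-prob, sentence) of the best string, adding <s> and </s> boundary bigrams."""
    best = None
    for (first, last), (s, words) in best_table(forest, logp, {}).items():
        total = s + logp("<s>", first) + logp(last, "</s>")
        if best is None or total > best[0]:
            best = (total, " ".join(words))
    return best

# Hypothetical bigram probabilities; unseen bigrams get a small fixed probability.
PROBS = {("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
         ("the", "dog"): 0.3, ("the", "dogs"): 0.2, ("a", "dog"): 0.2, ("a", "dogs"): 0.01,
         ("dog", "barks"): 0.4, ("dog", "bark"): 0.01,
         ("dogs", "bark"): 0.4, ("dogs", "barks"): 0.01,
         ("barks", "</s>"): 0.5, ("bark", "</s>"): 0.5}

def logp(prev, w):
    return math.log(PROBS.get((prev, w), 1e-4))

forest = seq(alt(word("the"), word("a")),
             alt(word("dog"), word("dogs")),
             alt(word("barks"), word("bark")))
score, sentence = rank(forest, logp)
print(sentence)  # the dog barks
```

Simple enumeration would score all eight sentences in this toy forest separately. The per-node tables keep at most one entry per boundary pair, and that is where a packed representation can save work when sub-forests are large or shared.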
Langkilde, I. and Knight, K. 1998. The Practical Value of N-Grams in Generation. Proceedings of the International Natural Language Generation Workshop. Niagara-on-the-Lake, Ontario.
We examine the practical synergy between symbolic and statistical
language processing in a generator called Nitrogen. The analysis
provides insight into the kinds of linguistic decisions that bigram
frequency statistics can make, and into how they improve scalability.
We also discuss the limits of bigram statistical knowledge.
We focus on specific examples of Nitrogen's output.
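As a rough illustration of the kind of decision bigram frequencies can make, the following minimal sketch estimates add-one smoothed bigram probabilities from a tiny corpus and uses them to choose between two candidate outputs that differ only in the article. The corpus, the candidates and the helper names `train_bigrams` and `log_prob` are hypothetical. Add-one smoothing is an assumption chosen for brevity and is not necessarily what Nitrogen used.

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """Count bigram histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token except </s> starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])                 # words (and </s>) that can follow a history
    return histories, bigrams, len(vocab)

def log_prob(sentence, histories, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
               for a, b in zip(tokens, tokens[1:]))

# Hypothetical training corpus and candidate outputs differing only in the article.
corpus = ["she ate an apple", "he ate an orange",
          "she bought a banana", "a visitor ate an apple"]
model = train_bigrams(corpus)
candidates = ["she ate a apple", "she ate an apple"]
best = max(candidates, key=lambda s: log_prob(s, *model))
print(best)  # she ate an apple
```

The same sketch also shows some limits of bigram knowledge. The model sees only adjacent word pairs, so it cannot enforce agreement across longer distances. Because every extra word multiplies in another probability below one, it also tends to favor shorter outputs when the candidates differ in length.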
Langkilde, I. and Knight, K. 1998. Generation that Exploits Corpus-based Statistical Knowledge. Proceedings of the ACL/COLING-98. Montreal, Quebec.
We describe novel aspects of a new natural language generator called
Nitrogen. This generator has a highly flexible input representation
that allows a spectrum of input from syntactic to semantic depth, and
shifts the burden of many linguistic decisions to the statistical
post-processor. The generation algorithm is compositional, making it
efficient, yet it also handles non-compositional aspects of language.
Nitrogen's design makes it robust and scalable, operating with lexicons
and knowledge bases of one hundred thousand entities.
Knight, K. and Hatzivassiloglou, V. 1995. Two-Level, Many-Paths Generation. Proceedings of the ACL-95. Cambridge.