Nitrogen
Hybrid Statistical-Symbolic Natural Language Generation

Publications

Knight, K. and Langkilde, I. 2000. Preserving Ambiguities in Generation via Automata Intersection. American Association for Artificial Intelligence conference (AAAI'00).

We discuss the problem of generating text that preserves certain ambiguities, a capability that is useful in applications such as machine translation. We show that it is relatively simple to extend a hybrid symbolic/statistical generator to do ambiguity preservation. The paper gives algorithms and examples, and it discusses practical linguistic difficulties that arise in ambiguity preservation.

Langkilde, I. 2000. Forest-Based Statistical Sentence Generation. Association for Computational Linguistics conference, North American chapter (NAACL'00).

This paper presents a new approach to statistical sentence generation in which alternative phrases are represented as packed sets of trees, or forests, and then ranked statistically to choose the best one. This representation offers advantages in compactness and in the ability to represent syntactic information. It also facilitates more efficient statistical ranking than a previous approach to statistical generation. An efficient ranking algorithm is described, together with experimental results showing significant improvements over simple enumeration or a lattice-based approach.

Langkilde, I. and Knight, K. 1998. The Practical Value of N-Grams in Generation. Proceedings of the International Natural Language Generation Workshop. Niagara-on-the-Lake, Ontario.

We examine the practical synergy between symbolic and statistical language processing in a generator called Nitrogen. The analysis provides insight into the kinds of linguistic decisions that bigram frequency statistics can make, and into how these statistics improve scalability. We also discuss the limits of bigram statistical knowledge. We focus on specific examples of Nitrogen's output.
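As a rough illustration of the kind of decision bigram statistics can make, the following minimal sketch ranks candidate sentences by smoothed bigram log-probability. It is not Nitrogen's implementation: the function names (`train_bigrams`, `log_prob`, `rank`), the add-one smoothing, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count context words and bigrams over a list of sentences (hypothetical helper)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence (assumed smoothing)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def rank(candidates, corpus):
    """Return the candidates sorted from most to least probable under the bigram model."""
    contexts, bigrams = train_bigrams(corpus)
    vocab = {w for s in corpus for w in s.lower().split()} | {"</s>"}
    return sorted(
        candidates,
        key=lambda c: log_prob(c, contexts, bigrams, len(vocab)),
        reverse=True,
    )


# Toy corpus, invented for this example only.
corpus = [
    "the dog eats a bone",
    "a dog eats the bone",
    "the dogs eat bones",
]
candidates = ["the dog eat a bone", "the dog eats a bone"]
best = rank(candidates, corpus)[0]
print(best)
```

Here the symbolic side could overgenerate both agreement variants and leave the choice to the statistics. With this toy corpus, the bigram "dog eats" is attested while "dog eat" is not, so the grammatical candidate ranks first. The sketch also hints at a limit of bigram knowledge: a dependency spanning more than two adjacent words would not be captured.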

Langkilde, I. and Knight, K. 1998. Generation that Exploits Corpus-based Statistical Knowledge. Proceedings of the ACL/COLING-98. Montreal, Quebec.

We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic depth, and shifts the burden of many linguistic decisions to the statistical post-processor. The generation algorithm is compositional, making it efficient, yet it also handles non-compositional aspects of language. Nitrogen's design makes it robust and scalable, operating with lexicons and knowledge bases of one hundred thousand entities.

Knight, K. and Hatzivassiloglou, V. 1995. Two-Level, Many-Paths Generation. Proceedings of the ACL-95. Cambridge, MA.