Single-Sentence Natural Language Generation: Penman

Objective

The objective of the Penman system is to function as a useful and theoretically motivated sentence generator for research groups interested in the nature of language, as well as to provide a text generation system that can be used routinely by computer system developers.

Approach

Penman is a natural language sentence generation program developed at USC/ISI. It provides computational technology for generating English sentences, starting with non-linguistic input specifications. The culmination of a continuous research effort since 1978, Penman is one of the most comprehensive computational generators of English sentences in the world. The system has been distributed to more than 100 research institutions over the past seven years.

Penman consists of a number of components. Nigel, the English grammar, is based on the theory of Systemic Functional Linguistics (a theory of language as a means of communicating semantics, interpersonal meaning, and contextual effect [Halliday 85], which has also been used in other AI applications such as SHRDLU). Nigel is a network of over 700 nodes called systems, each node representing a single minimal grammatical alternation. To generate a sentence, Penman traverses the network guided by its inputs and default settings, selecting a feature at each system node until it has assembled enough features to fully specify a sentence. After constructing a syntax tree and choosing words to satisfy the selected features, Penman generates the sentence.
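The traversal described above can be illustrated with a minimal Python sketch. It is not Penman's actual code (Penman is written in Common Lisp), and it is not taken from the Nigel grammar: the four-system toy network, the `System` class, the `traverse` function, and the simplification that each system has a single entry feature are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class System:
    """One node of a toy system network: a single grammatical alternation."""
    name: str
    entry: Optional[str]        # feature that must be selected first (None = always enterable)
    features: Tuple[str, ...]   # the mutually exclusive alternatives
    default: str                # feature chosen when the input says nothing


# Hypothetical four-system network; Nigel itself has over 700 systems.
TOY_NETWORK = [
    System("MOOD", None, ("indicative", "imperative"), "indicative"),
    System("INDICATIVE-TYPE", "indicative", ("declarative", "interrogative"), "declarative"),
    System("POLARITY", None, ("positive", "negative"), "positive"),
    System("INTERROGATIVE-TYPE", "interrogative", ("yes-no", "wh"), "yes-no"),
]


def traverse(network: List[System], preferences: Dict[str, str]) -> List[str]:
    """Select one feature per enterable system, guided by input preferences and defaults."""
    selected: List[str] = []
    pending = list(network)
    progress = True
    while progress:
        progress = False
        for system in list(pending):
            # A system can be entered once its entry feature has been selected.
            if system.entry is None or system.entry in selected:
                choice = preferences.get(system.name, system.default)
                if choice not in system.features:
                    raise ValueError(f"{choice!r} is not a feature of {system.name}")
                selected.append(choice)
                pending.remove(system)
                progress = True
    return selected


defaults_only = traverse(TOY_NETWORK, {})
question = traverse(TOY_NETWORK, {"INDICATIVE-TYPE": "interrogative", "POLARITY": "negative"})
print(defaults_only)  # ['indicative', 'declarative', 'positive']
print(question)       # ['indicative', 'interrogative', 'negative', 'yes-no']
```

With no input preferences, the sketch falls back on the defaults and never enters INTERROGATIVE-TYPE, because its entry feature is not selected. A real generator would then build a syntax tree and choose words from the accumulated features, which this sketch omits.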

Penman also contains a number of information resources, including a lexicon of over 90,000 English words (containing word definitions, inflectional forms, etc.) and the Penman Upper Model, a taxonomy of 250 very general abstractions of the objects, processes, and relations in the world, organized to support linguistic processing [Bateman et al. 89]. This taxonomy serves to link the terms in a user's application domain to the terms used within Penman. The Upper Model is being extended to the Middle Model, a taxonomy of approximately 70,000 concepts modeling the world.
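One way to picture how a taxonomy can link application terms to a generator's own terms is the small Python sketch below. The concept names, the parent-pointer representation, and the function `upper_model_chain` are hypothetical and are not drawn from the actual Penman Upper Model. The sketch only shows the general idea of attaching a domain concept beneath a general abstraction, so that it inherits that abstraction's linguistic treatment.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of a general taxonomy: concept -> parent concept.
UPPER_MODEL: Dict[str, Optional[str]] = {
    "THING": None,
    "OBJECT": "THING",
    "PROCESS": "THING",
    "MATERIAL-PROCESS": "PROCESS",
    "RELATION": "THING",
}

# Hypothetical application-domain concepts, each attached to a general concept.
DOMAIN_MODEL: Dict[str, Optional[str]] = {
    "SHIP": "OBJECT",
    "SAIL": "MATERIAL-PROCESS",
}


def upper_model_chain(domain_term: str) -> List[str]:
    """Return the general concepts a domain term inherits from, nearest first."""
    parents = {**UPPER_MODEL, **DOMAIN_MODEL}
    if domain_term not in parents:
        raise KeyError(f"unknown concept: {domain_term}")
    chain: List[str] = []
    current = parents[domain_term]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


sail_chain = upper_model_chain("SAIL")
ship_chain = upper_model_chain("SHIP")
print(sail_chain)  # ['MATERIAL-PROCESS', 'PROCESS', 'THING']
print(ship_chain)  # ['OBJECT', 'THING']
```

In this sketch, a generator that knows how to express any MATERIAL-PROCESS as a clause and any OBJECT as a nominal group would need no further linguistic information about SAIL or SHIP beyond their position in the taxonomy.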

By accepting various input notations, ranging from linguistically very sophisticated to application domain-oriented, Penman is designed to be used effectively by people with various degrees of linguistic and computational sophistication. At USC/ISI, Penman is currently being used primarily as the output generator of the Spangloss Machine Translation system. Penman is written in Common Lisp and currently operates on Sun SPARCStations, Sun 4s, and Macintosh-II computers.

Sister Projects, Collaborators, and Previous Project Members

In order to promote increased development of various computational aspects of Systemic Linguistics, Penman researchers have formed a multinational collaboration, in which work is shared among the partners through periodic updates and sharing of researchers. The partners are:

Penman is the result of work by many people. Important contributions were made by Drs. Christian Matthiessen and William Mann (the two principal architects of the system), Drs. John Bateman, Robert Kasper, Cécile Paris, Peter Fries, Michael Halliday, Norman Sondheimer, Susanna Cumming, Cecilia Ford, William Swartout, Ms. Lynn Poulton, Messrs. Robert Albano and Mick O'Donnell, and by the current Penman staff. In addition, for many years the Penman project has benefitted from the work of visitors too numerous to list.

Project Members

Eduard Hovy, project leader and Principal Investigator
Richard Whitney, programmer
Kevin Knight, research scientist

Recent Publications

The Penman Primer, User Guide, and Reference Manual. 1988.
Unpublished USC/ISI documentation.
The Penman Primer introduces a new, computationally and linguistically naive user to the Penman system. The User Guide provides more detail, describing Penman's linguistic resources. The Reference Manual is a compendium of system and input parameters and their possible values.

Considerably more linguistic/grammatical detail is provided in the grammar manual Nigel's Lexicogrammatical Cartography.

Matthiessen, C.M.I.M. and J.A. Bateman. 1991.
Text Generation and Systemic-Functional Linguistics. London: Pinter.
This book describes automated sentence generation from the perspective of Systemic Linguistics, taking Penman as the paradigm case.
Matthiessen, C.M.I.M. 1984.
Systemic Grammar in Computation: The Nigel Case. Proceedings of the 1st Conference of the European Association for Computational Linguistics EACL-84, Pisa. Also available as USC/ISI Research Report RR-84-121, 1984.
Mann, W.C. 1982.
The Anatomy of a Systemic Choice. USC/ISI Technical Report RR-82-104.
Mann, W.C. and C.M.I.M. Matthiessen. 1985.
Nigel: A Systemic Grammar for Text Generation. In R. Benson and J. Greaves (eds), Systemic Perspectives on Discourse: Selected Papers from the 9th International Systemics Workshop. London: Ablex. Also available as USC/ISI Research Report RR-83-105.
Bateman, J.A., R.T. Kasper, J.D. Moore, and R.A. Whitney. 1989.
A General Organization of Knowledge for Natural Language Processing: The Penman Upper Model. Unpublished research report, USC/Information Sciences Institute, Marina del Rey.
Bateman, J.A. and C.L. Paris. 1989.
Phrasing a Text in Terms the User can Understand. Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-89. Detroit, MI.
Kasper, R.T. 1988.
Systemic Grammar and Functional Unification Grammar. In J. Benson and W. Greaves (eds), Systemic Functional Approaches to Discourse. Norwood: Ablex.