Pre-Conference Tutorials


Ontological Semantics

Presenter: Sergei Nirenburg, Computing Research Laboratory, New Mexico State University

In computational linguistics, the term ontology has come to denote a world model used for specifying the meaning of lexical units in a language. Elements of the ontology can thus be viewed as the lexis of a metalanguage for describing the lexical semantics of a particular language. Once the ontological approach is chosen for describing lexical meaning, the lexicon and the ontology become coupled. Depending on the type of computational linguistic application that a lexicon is supposed to support, the ontology that underlies its semantic component will contain different (though possibly compatible) information. Possible applications include knowledge-based machine translation (MT); lexical disambiguation as a module in transfer-based MT or in an information extraction (IE) system; text summarization; human-computer interaction; planning and plan recognition for a society of software and human agents; and object and scene recognition. The application-oriented differences in ontology content can be illustrated as follows: work on agents requires detailed statements about the "workflow scripts" that these agents follow, as well as domain-related plans, both realizable as complex events, while work on knowledge-based MT typically does not. Lexical disambiguation is often considered feasible without ontological underpinnings in the lexicon, relying instead on a set of semantic features assigned to each lexical item (or entirely on corpus-based co-occurrence calculations).

We will concentrate on knowledge-based MT (KBMT). In the KBMT framework, the ontology supplies major chunks of the metalanguage not only for the semantic component of the lexicon but also for the language in which the meaning of texts is represented. The latter language, called TMR (Text Meaning Representation), is the interlingua in the KBMT system.

The tutorial will include the following topics:

  1. Design of an MT system based on ontological semantics
  2. The Static Knowledge Sources for KBMT: the TMR, the Ontology and the Lexicon
  3. The TMR
    a) The TMR content
    b) The TMR format
  4. The Ontology
    a) The syntax of the ontology entry
    b) The content of the ontology
    c) A brief comparison with other ontologies used for language processing, notably CYC, WordNet and Sensus
  5. Ontology Acquisition
    a) The acquisition methodology
    b) Examples of concept acquisition
  6. The Lexicon
    a) The analysis lexicon
    b) The generation lexicon
    c) The onomasticon
  7. Lexicon Acquisition
    a) The acquisition methodology
    b) Examples of lexicon entry acquisition
  8. Interaction among the TMR, the Ontology and the Lexicon in Mikrokosmos
  9. Ontological support for applications other than MT (IE, summarization, agents).

The tutorial is intended for computational lexicographers and for designers and implementers of NLP systems, including MT, IE, IR, and text summarization systems.


A Gentle Introduction to MT: Theory and Current Practice

Presenter: Eduard Hovy, Information Sciences Institute of the University of Southern California

This tutorial provides a nontechnical introduction to machine translation. It reviews the whole scope of MT, briefly outlining its history and today's major application areas, and describes the various kinds of MT techniques that have been invented, from direct replacement through transfer to the holy grail of interlinguas. It also outlines the difficult questions of MT evaluation and introduces the newest statistics-based techniques (which are the topic of another tutorial).

Eduard Hovy is the director of the Natural Language Group at the Information Sciences Institute of the University of Southern California, and is a member of the Computer Science Departments of USC and of the University of Waterloo. His research focuses on machine translation, automated text summarization, automated question answering, multilingual information retrieval, and the semi-automated construction of large lexicons and terminology banks. He is the author or editor of four books and over 100 technical articles. Dr. Hovy currently serves as President of the Association for Machine Translation in the Americas (AMTA), Vice President of the ACL, and President-Elect of the International Association for Machine Translation (IAMT). He regularly co-teaches a course in the new Master's Degree Program in Computational Linguistics at the University of Southern California, as well as occasional short courses on MT and other topics at universities and conferences.


Controlled Languages

Presenters: Teruko Mitamura & Eric Nyberg, Carnegie Mellon University

The notion of Controlled Language (CL) is becoming increasingly important for both authors and translators working in a large-scale document production environment. Good design, process and implementation of a Controlled Language can provide higher-quality documentation and more productive translation. Even so, introducing Controlled Language into a document production environment raises issues that must be considered carefully. The goal of this tutorial is to introduce the concept of Controlled Language, to discuss design and deployment issues, and to summarize the state of the art in CL development.

Intended audience: MT users, authors, translators, and anyone interested in learning about CL.


Statistical Machine Translation

Presenter: Kevin Knight, Information Sciences Institute of the University of Southern California

The statistical approach to machine translation (MT) seeks to extract translation knowledge automatically from online bilingual texts (e.g., publications of the Canadian or Hong Kong governments). This idea can be traced back to suggestions made by Warren Weaver in the 1940s. It was pioneered at IBM in the 1990s and continues to be inspired by relative successes in statistical speech recognition. We will present an accessible but technical tutorial that will cover the statistical MT literature to date. We will use graphical influence diagrams to explain statistical translation models used in different research projects around the world. We will also cover language models and "decoding" algorithms that perform online translations.


The Diversity and Distribution of Languages

Presenter: Laurie Gerber, Information Sciences Institute of the University of Southern California

Funding agencies and the market are placing greater emphasis on less common languages. Rapid response and short development times are crucial as economic or political events bring diverse regions and their languages to the front of the international stage. However, most MT development groups have worked on a relatively small set of languages, namely Indo-European ones. Even where other languages are addressed, the frameworks and architectures within which such development takes place were designed to cover only this relatively homogeneous group. Can extensions to existing paradigms cover the full diversity of the world's estimated 6,000 languages? Is it possible to build a single architecture that can handle the full range of diversity? How weird does it get? And are there any regularities that can be exploited in tackling the great diversity we face?


MTranslatability

Presenters: Arendse Bernth & Claudia Gdaniec

Current MT systems are often unable to produce high-quality output on arbitrary, unseen input. The output frequently does not meet user needs and requirements. We will address some of the reasons for the unsatisfactory quality of MT output, ways to improve translatability, and ways to measure the translatability of a document.

Intended audience: MT users and consultants, people in charge of information development.

Presenters: Arendse Bernth & Claudia Gdaniec, IBM T.J. Watson Research Center. The presenters have worked in the MT field for many years. Both have also worked on MT-related tools -- for pre-editing, and for automatically estimating the quality of MT output.

Outline