Knowledge-Based Interlingual Machine Translation of Text from Spanish to English.
Develop new techniques in high-quality knowledge-based interlingual machine translation.
The Pangloss project is a collaboration among three sites (the Center for Machine Translation at Carnegie Mellon University, the Computing Research Laboratory at New Mexico State University, and USC/ISI) and is devoted to enhancing the state of the art in knowledge-based machine translation using a language-neutral interlingua.
Pangloss is a human-assisted MT system with the following features:
The project started in late 1991 under funding from the U.S. Advanced Research Projects Agency (ARPA) and the Department of Defense.
MT systems, whatever translation method they employ, do not reach an optimum output on free text; each method handles some problems better than others. The Pangloss Mark III system is an MT environment that uses the best results from a variety of independent MT systems or engines working simultaneously within a single framework on the same text. This paper describes the method used to combine the outputs of the engines into a single text.
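The abstract above does not spell out the combination method, so the following is only a minimal sketch of one way such a combination could work, not the Pangloss implementation. It assumes each engine proposes translations for spans of the source sentence together with a per-word quality score, and it uses dynamic programming to pick the highest-scoring sequence of non-overlapping spans that covers the whole sentence. The `Candidate` class, the `combine` function, the engine labels, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int    # index of the first source word covered
    end: int      # index one past the last source word covered
    text: str     # proposed target-language text for the span
    score: float  # hypothetical per-word quality score
    engine: str   # illustrative engine label


def combine(candidates, n_words):
    """Return the best-scoring candidates that exactly cover words [0, n_words)."""
    # best[i] holds (total score, chosen candidates) for the best cover of [0, i).
    best = [None] * (n_words + 1)
    best[0] = (0.0, [])
    for i in range(1, n_words + 1):
        for c in candidates:
            if c.end == i and c.start < c.end and best[c.start] is not None:
                # Weight the per-word score by span length (an assumption).
                total = best[c.start][0] + c.score * (c.end - c.start)
                if best[i] is None or total > best[i][0]:
                    best[i] = (total, best[c.start][1] + [c])
    if best[n_words] is None:
        raise ValueError("candidates do not cover the whole sentence")
    return best[n_words][1]


# Illustrative input for the four-word source "el gato negro duerme".
candidates = [
    Candidate(0, 3, "the black cat", 0.9, "kbmt"),
    Candidate(0, 2, "the cat", 0.7, "ebmt"),
    Candidate(2, 4, "black sleeps", 0.4, "ebmt"),
    Candidate(0, 1, "the", 0.5, "lexical"),
    Candidate(1, 2, "cat", 0.5, "lexical"),
    Candidate(2, 3, "black", 0.5, "lexical"),
    Candidate(3, 4, "sleeps", 0.5, "lexical"),
]

chosen = combine(candidates, 4)
print(" ".join(c.text for c in chosen))
```

In this sketch the three-word span from one engine is joined with a single-word fallback from another, which is the kind of mixing the abstract describes. A real system would also need scores that are comparable across engines, which this sketch simply assumes.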
This document describes the multi-engine Spanish-to-English MT system Pangloss. Originally, Pangloss was supposed to be a pure knowledge-based machine translation (KBMT) system implemented in a version of the interlingua architecture. The project, however, evolved toward a more eclectic approach, mostly because of the need to perform well during periodic external evaluations whose timing and frequency were established after the project started. The report contains a description of each of the three translation engines of Pangloss as well as a detailed worked example of the processing of a single sentence by each of them.
The ARPA Machine Translation program in the U.S.A. included three MT systems, one purely statistical, one purely symbolic/linguistic, and one a mixture. Three years later, all three systems are hybrids, although their initial approaches still predominate. This paper asks why such hybridization took place, and argues from inspection of the systems' development paths that some portions of the MT process are best addressed by statistical means and others by symbolic means. In consequence, it is likely that all future non-toy MT systems will be hybrids, but that hybridization will differ depending on the task the MT system is to perform.
We address the problem of constructing in a principled way an ontology of terms to be used in an interlingua for machine translation. Given our belief that a true language-neutral ontology of terms can only be approached asymptotically, the construction method outlined involves a stepwise folding in of one language at a time. This is effected in three steps: first building for each language a taxonomy of the linguistic generalizations required to analyze and generate that language, then organizing the domain entities in terms of that taxonomy, and finally merging the result with the existing interlingua ontology in a well-defined way. This methodology is based not on intuitive grounds about what is and is not 'true' about the world, which is a question of language-independence, but instead on practical concerns, namely what information the analysis and generation programs require in order to perform their tasks, a question of language-neutrality. After each merging is complete, the resulting taxonomy contains, declaratively and explicitly represented, those distinctions required to control the analysis and generation of the linguistic phenomena. The paper is based on current work of the Pangloss MT project.
Ideally, we might hope to improve the performance of our MT systems by improving the system, but it might even be more important to improve performance by looking for a more appropriate application. A survey of the literature on evaluation of MT systems seems to suggest that the success of the evaluation often depends very strongly on the selection of an appropriate application. If the application is well-chosen, then it often becomes fairly clear how the system should be evaluated. Moreover, the evaluation is likely to make the system look good. Conversely, if the application is not clearly identified (or worse, if the application is poorly chosen), then it is often very difficult to find a satisfying evaluation paradigm. We begin our discussion with a brief review of some evaluation metrics that have been tried in the past and conclude that it is difficult to identify a satisfying evaluation paradigm that will make sense over all possible applications. It is probably wise to identify the application first, and then we will be in a much better position to address evaluation questions. The discussion then turns to the main point, an essay on how to pick a good niche application for state-of-the-art (crummy) machine translation.
An introductory overview of the basic principles and paradigms of Machine Translation.