MT EVALUATION WORKSHOP: Hands-On Evaluation
Motivation: Arguments about the evaluation of machine translation (MT) are even more plentiful than systems to accomplish MT. In an effort to drive the evaluation process to the next level and to unify past work, this workshop is going to focus on a challenge and a framework in which the challenge could fit. The challenge is Hands-On Evaluation. The framework is being developed by the ISLE MT Evaluation effort. Both will be discussed throughout the course of the workshop.
The first part of the workshop will focus on real-world evaluation, and we encourage participation from both developers and users. To establish common ground for discussion, participants who wish may be given a) sample online data to be translated; b) a minimal task to accomplish with this data; c) existing tools for processing this data. With these three items, participants are expected to address issues in the evaluation of machine translation. A domain of particular interest is evaluation for Arabic data, Arabic tools, and filtering and text mining applications, although participants are required only to evaluate using real-world data and actual systems. This common framework will give insights into the evaluation process and useful metrics for driving the development process. Additionally, we hope that the common challenge will motivate interesting discussion. As part of this, we expect to set up a web page to host previous work in evaluation. The URL will be released when the page is prepared.
The second part of the workshop will concentrate on the ISLE MT Evaluation effort, funded by NSF and the EU, to create a general framework of characteristics in terms of which MT evaluations, past and future, can be described and classified. The framework, whose antecedents are the JEIDA and EAGLES reports, consists of two taxonomies of increasingly specific features, with associated measures and pointers to systems. The discussion will review the current state of the classification effort, critique the framework in light of the hands-on evaluation performed earlier in the workshop, and suggest additional systems and measures to be considered.
Questions and Issues: The questions and issues to be answered are diverse. The constants are the situation: the available systems, the available data, and the sample tasks. We hope this will eliminate some of the variables of evaluation. All papers on evaluation and evaluation issues will be considered, but preference will be given to papers following the framework. The following questions suggest possible evaluation threads within the framework.
While we encourage papers on Arabic MT evaluation, we will also consider papers on related issues, such as evaluation with real-world data or work related to the ISLE effort.
Work from the workshop will be considered for a special issue of the MT journal.
Intention to participate: 28th July 2000
Release of data / software: 9th August 2000
Draft submission: 1st September 2000
Notification of acceptance: 15th September 2000
Final Papers Due: 29th September 2000
Workshop: 10th October 2000
Contact Points & Organizers:
Florence Reeder (MITRE Corporation) firstname.lastname@example.org
Eduard Hovy (ISI/USC) email@example.com
Cost of the Workshop: $60.00
Main conference site: http://www.isi.edu/natural-language/conferences/amta2000
Michelle Vanni (Georgetown University / Department of Defense)
Keith Miller (MITRE Corporation)
Jack Benoit (MITRE Corporation)
Maghi King (ISSCO, University of Geneva)
Carol Van Ess-Dykema (Department of Defense)
John White (Litton/PRC)