FEMTI - A Framework for the Evaluation of Machine Translation in ISLE

A Bibliography of Machine Translation Evaluation

This bibliography has been compiled by Florence Reeder (MITRE Corporation) and updated by Sandrine Zufferey and Andrei Popescu-Belis (University of Geneva). The references quoted explicitly in the FEMTI taxonomies are highlighted. The list may contain incomplete or duplicate references; we are therefore grateful for any additions or comments.

Ackerman, A., Fowler, P. & Ebenau, R. 1984. Software inspection and the industrial production of software, Software Validation. Proceedings of the Symposium on Software Validation pp. 13-40.

Advanced Research Projects Agency (ARPA). 1993. Human Language Technology: Proceedings of a Workshop. Morgan-Kaufmann

Ahlswede, T., & Lorand, D. 1993. Word Sense Disambiguation by Human Subjects: Computational and Psycholinguistic Applications. In Boguraev, B., & Pustejovsky, J. (eds.), Proceedings of a Workshop Sponsored by the SIGLEX of the ACL. Ohio State University.

Ahlswede, T. & Evans, M. 1988. Parsing versus Text Processing in the Analysis of Dictionary Definitions. In Proceedings of the 26th ACL.

Ahmad, K., Holmes-Higgin, P., Rogers, M., Höge, M., Le-Hong, K., Huwig, C., Kese, R. & Mayer, R. 1993. User-driven software development: Translator's workbench - an exemplar case study, in M. Smith and G. Salvendy (eds.), Proceedings of the fifth International Conference on Human-Computer Interaction, (HCI International '93), Orlando, Florida, August 8 - 13, Vol. 1, pp. 319-324.

Ahrenberg, L., & Merkel, M. 2000. Correspondence Measures for MT Evaluation In Maegaard, B., ed., Proceedings of the Workshop on Machine Translation Evaluation at LREC-2000. Athens, Greece.

Albisser, D. 1993. Evaluation of MT Systems at Union Bank of Switzerland. Machine Translation 8 (1/2), 25-28.

* Alshawi, H., Srinivas, B. and Douglas, S. 2000. Learning Dependency Translation Models as Collections of Finite State Head Transducers, Computational Linguistics, vol. 26.

* ALPAC. 1966. Language and machines: Computers in Translation and Linguistics. A Report by the Automatic Language Processing Advisory Committee. Division of Behavioral Sciences, National Academy of Sciences, National Research Council, Washington, D.C.

* AMTA. 1992. MT Evaluation: Basis for Future Directions. Proceedings of workshop held in San Diego. Available from the Association for Machine Translation in the Americas (AMTA), Washington, DC.

Anderson, J. 1977. On Case Grammar. Atlantic Highlands, NJ: Humanities Press.

Arnold, A., Sadler, L., & Humphreys, R. 1993. Evaluation: an Assessment. Machine Translation 8 (1/2), 1-24.

* Arnold, D., Balkan, L., Meijer, S., Humphreys, R., & Sadler, L. 1994. Machine Translation: An Introductory Guide. Manchester, UK: NCC Blackwell. http://clwww.essex.ac.uk/~doug/book/book.html.

Arnold, D., Moffat, D., Sadler, L. & Way, A. 1993. Automatic Test Suite Generation. Machine Translation, 8(1,2):29-38.

* Arnold D., Humphreys R.L. & Sadler L. (eds). 1993. Special Issue on Evaluation of MT Systems. Machine Translation vol. 8, Nos. 1-2, 1993.

Arnold, D.J. 1990. Text typology and Machine Translation: An overview. In Pamela Mayorcas, ed., Translating and the Computer 10, pp. 73-89. Aslib, London.

Athappily, K. & Galbreath, R. 1986. Practical methodology simplifies DSS software evaluation process, Data Management 24(2): 10-28.

Avellis, G. & Laviosa, S. 2000. A COMIC Multimedia Resource for Translating into and out of Business Italian. CULT2K.

* Babych, B. and Hartley, A. 2003. Improving Machine Translation Quality with Automatic Named Entity Recognition. Proceedings of EAMT/EACL Workshop on MT, Budapest.

* Balkan, L. 1991. Quality Criteria for MT. Proceedings of the Evaluators' Forum, April 21-24, 1991, Les Rasses, Vaud, Switzerland.

Balkan, L. 1994. Test Suites: Some issues on their use and design. Machine Translation Ten Years On, Conference at the University of Cranfield. 26-1

Balkan, L., Jaeschke, M., Humphreys, L., Meijer, S., & Way, A. 1991. Declarative evaluation of an MT system: Practical experiences. Applied Computer Translation 1(3). 49-59.

Balkan, L., Netter, K., Arnold, D. & Meijer, S. 1994. TSNLP - test suites for natural language processing, Proceedings of the Language Engineering Convention, ELSNET, Paris, pp. 17-22.

Ballesteros, L. and W. B. Croft. 1998. Statistical Methods for Cross-Language Information Retrieval. In G. Grefenstette, (ed.), Cross-Language Information Retrieval (23-40). Boston: Kluwer.

Bangalore, S., & Riccardi, G. 2000. Stochastic Finite-State models for Spoken Language Machine Translation. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

Barett, J., et al. Capturing Language-Specific Semantic Distinctions in Interlingua-Based MT

Bar-Hillel, Y. 1951. The state of translation in 1951. American Documentation, 2:229-237. Reprinted in Bar-Hillel, Y. 1964. Language and Information. Addison-Wesley, Reading, Mass

Barlow, M. 2000. Designing a Parallel Concordancer. In CULT2K.

Barton, G., Berwick, R., Ristad, E. 1987. Computational Complexity and Natural Language. MIT Press, Cambridge, MA.

Bates M. & Ralph W. 1987. Evaluating Natural Language Interfaces. Presented as a Tutorial at the 25th Annual Meeting of the Association for Computational Linguistics, July 6, 1987, Stanford University. BBN Laboratories Inc.

Bates M. 1988. Draft Corpus for testing Natural Language DB query interfaces. Distributed at the workshop on evaluation of natural language processing systems. Wayne, Philadelphia December 8-9, 1988.

Bates, M. 1988. Reports on Evaluations of Natural Language Systems, Talk presented at the workshop on Evaluation of Natural Language Processing Systems. Wayne, Philadelphia December 8-9, 1988.

Beaven, J. & Whitelock, P. 1988. Machine Translation Using Isomorphic UCGs. Vargha. 32-35.

Bech, A. 1997. MT From an Everyday User's Point of View. In MT Summit, pp. 98-105.

Beerepoot-Sangen, Y. & Leentvaar-Leistra, G. 1991. Consument en produktkwaliteit. Kluwer, Deventer.

Belonogov, G. G., Kuznetsov, B. A. & Krichevskij, V.K. 1986. Evaluer l'efficacité d'un système de recherche documentaire à l'indexation automatique, (Evaluating the efficiency of an information retrieval system with automated indexing) Naucno-tehniceskaja informacija- Vsesojuznyj institut naucnoj i tehniceskoj informacii. Serija 2. Informacionnye processy i sistemy, ISSN 0548-0027, SUN, No. 8, pp. 6-13, CNRS-10522B.

Bennett, W.S. 1990. How Much Semantics is Necessary for MT Systems? Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 261-270.

Bevan N. & Curson I. 1997. Methods of Measuring Usability. Proceedings of the sixth IFIP conference on human-computer interaction, Sydney, Australia, July 1997.

Bevan N. 1997. Quality in Use; Incorporating Human Factors into the software engineering lifecycle. Proceedings of the Third International Symposium and Forum on Software Engineering Standards, ISESS'97 conference, August 1997.

Bevan, N. 1980. Human Factors in the Use of EURODICAUTOM and SYSTRAN. Second Report to the Commission of the European Communities CETIL/199/80. Luxembourg, May 1980.

Billmeier, R. 1982. Zu den linguistischen Grundlagen von SYSTRAN. Multilingua 5(4):83-96, Mouton Publishers.

Boggio, G. & Spachis-Papazois, E. (eds.) 1984. Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht.

Boguraev, B. & Briscoe, T., eds. 1989. Computational Lexicography for Natural Language Processing. Longman.

Boisen, S. & Bates, M. 1992. A practical methodology for the evaluation of spoken language systems, Proceedings of the Third Conference on Applied Natural Language Processing, Trento, pp. 162-169.

Bourbeau, L. 1990. Élaboration et mise au point d'une méthodologie d'évaluation linguistique de systèmes de traduction assistée par ordinateur (Rapport final). Secrétariat d'État du Canada, Secteur Langues Officielles et Traduction, Direction de la Planification, Gestion et Technologie, Québec.

Bowker, L. 1998. Using Specialized Monolingual Native-Language Corpora as a Translation Resource: A Pilot Study. In Laviosa, S., ed., L'Approche Basée sur le Corpus / The Corpus-based Approach. Special Issue of META 43(4): 631-651.

Bowker, L., & Bennison, P. 2000. Translation Archive and Student Translation Tracking System: Design, Development and Application. In CULT2K.

Box, J. 1979. Konsument en informatie - de rol van vergelijkend warenonderzoek, Thesis, Delftse Universitaire Pers., Delft.

Bradford, J. 1982. A metric space defined on English and its relation to error correction. Proceedings of COLING-82, pp. 43-48.

Breck, E., Burger, J., Ferro, L., Hirschman, L., House, D., Light, M., & Mani, I. 1999. How to Evaluate Your Question Answering System Every Day - and Still Get Real Work Done. FIND REST OF REFERENCE

Brislin, R. 1976. Translation: Applications and Research. New York: Gardner Press, Inc.

Brown, P., Cocke, J., Della-Pietra, S., Della-Pietra, V., Jelinek, F., Mercer, R., & Roossin, P. 1988. A Statistical Approach to French/English Translation. Proceedings of the Second International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages. Carnegie-Mellon University, Pittsburgh, PA.

Bruderer, H. E. 1978. Handbuch der maschinellen und maschinenunterstützten Sprachübersetzung. Verlag Dokumentation Saur KG, München.

Budiansky, S. 1998. Lost in Translation. The Atlantic Monthly, 282(6). Also available at http://www.theatlantic.com/issues/98dec/computer.htm

Buchmann, B. 1987. Early history of Machine Translation. In M. King, ed., Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial 1984. Edinburgh University Press

Buchmann, B. & Warwick, S. 1985. Machine Translation. Pre-ALPAC History. Post-ALPAC Overview, ISSCO Working Papers Number 50, Fondazione Dalle Molle, Geneva.

Bukowski, J. 1987. Evaluating software test results: A new approach, Proceedings Annual Reliability and Maintainability Symposium, Philadelphia, USA, 27-29 Jan., pp. 369-375.

Buschbeck-Wolf, B. & Dorna, M. 1998. Quality and Robustness in MT - A Balancing Act. AMTA-98

Calzolari, N., & Picchi, E. 1988. Acquisition of semantic information from an on-line dictionary. In COLING 88.

Campbell, D., & Stanley, J. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.

Carbonell, J. 1988. Moving Beyond Vodka and Rotten Meat. CMU Magazine. Spring, 1988, 14-19.

Carbonell, J. & Tomita, M. 1987. Knowledge-based Machine Translation, the CMU Approach. Machine Translation: Theoretical and Methodological Issues, Sergei Nirenburg (ed.), Cambridge University Press, 68-89.

Carbonell, J., Cullingford, R., & Gershman, A. 1981. Steps Towards Knowledge-Based Machine Translation. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3(4).

Carbonell, J. 1979. Towards a self-extending parser. In Proceedings of the 17th Annual Meeting of the ACL.

Zernik, U. & Jacobs, P. 1988. Acquiring lexical knowledge from text: a case study. In Proceedings of the AAAI, 1988.

Card, S., Moran, T. & Newell, A. 1983. The psychology of human-computer interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.

* Carlson L., Marcu D., and Okurowski M.E. 2001. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. Proceedings of the 2nd SIGDIAL Workshop on Discourse and Dialogue, Eurospeech 2001, Denmark.

Carroll, B. & Hall, P. 1985. Make Your Own Language Tests: A Practical Guide to Writing Language Performance Tests. Oxford, Pergamon.

Carroll, J. 1966. An experiment in evaluating the quality of translations. In Pierce, J. Language and Machines: Computers in Translation and Linguistics. Report by the Automatic Language Processing Advisory Committee (ALPAC). Publication 1416. National Academy of Sciences National Research Council.

Cary, R. & Sproles, G. 1978. Evaluating product testing methods: A theoretical framework, Home Economics Research Journal, 7: 66-75.

Carroll, G. & Rooth, M. 1998. Valence Induction with a Head-Lexicalized PCFG. Paper presented at Third Conference on Empirical Methods in Natural Language Processing. Granada, Spain.

Caspari, G. 1987. Untersuchungen zu Bewertungskriterien für maschinell erstellte Übersetzungen. Unveröffentlichte Diplomarbeit. Universität des Saarlandes.

Cavar, D., Küssner, U., & Tidhar, D. 2000. From Human Evaluation to Automatic Selection of Good Translations. In Maegaard, B., ed., Proceedings of the Workshop on Machine Translation Evaluation at LREC-2000. Athens, Greece.

CETIL, 1979. Comments by Mr. Leamy on the B.M.v.D. evaluation report on Systran French-English, CETIL/159/79, Luxembourg.

CETIL, 1979. Systran Evaluation and Comparison. Summary Report of Revisers' Comments on Machine Produced Translations. Working Document for the CETIL meeting 26 and 27 March 1979 CETIL/139/79, Luxembourg.

CETIL, 1979. The Development Potential of Systran in the European Commission CEC Contract TH-17, Cambridge Research Unit, CETIL 153/79, Luxembourg.

Chandler, R. 1989. Grammar problems?, Electric Word, Sept-Oct 1989.

Chang, J., Chen, M., Ker, S. 1998. Taxonomy and Semantic Processing from the Perspective of Machine Readable Dictionaries. AMTA-98

Chen, H-H., Huang, S-J., Ding, Y-W., Tsai, S-C. 1998. Proper Name Translation in Cross-Language Information Retrieval. ACL/COLING-98.

Chinchor, N. 1991. MUC-3 evaluations metrics, Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA, pp. 17-24.

Choi, S-K., Jung, H-M., Sim, C-M., Kim, T., Park, D-I., Park, J-S., Choi, K-S. 1998. Hybrid Approaches to Improvement of Translation Quality in Web-based English-Korean Machine Translation. ACL/COLING-98

Church, K.W. and E.H. Hovy. 1993. Good Applications for Crummy MT. Machine Translation 8 (pp 239-258).

Coates-Stephens, S. 1991. Automatic Lexical Acquisition Using Within-Text Descriptions of Proper Nouns. In Proceedings of the 7th Annual Conference of the UW Centre for the NEW OED and Text Research Using Corpora. St. Catherine's College.

Coates-Stephens, S. 1990. Expectation Based Word Learning, Technical Report TCU/CS/1990/7. City University Department of Computer Science.

Collier, N., Hirakawa, H., Kumano, A. 1998. Machine Translation vs. Dictionary Term Translation - a comparison for English-Japanese News Article Alignment. ACL/COLING-98.

Collins, M.J. 1996. A New Statistical Parser Based on Bigram Lexical Dependencies. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL). Santa Cruz, CA (184-191).

Collins, M.J. 1997. Three Generative, Lexicalised Models for Statistical Parsing. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL). Madrid, Spain (16-23).

Commission of the European Communities, 1986. Communication to the Council Concerning a Community Plan of Action Relating to the Evaluation of Community Research and Development Activities for the Years 1987 to 1991. Com(86) final, Brussels, 20 November 1986.

* Connell J. & Shaffer R. 1995. Object-Oriented Rapid Prototyping, Prentice-Hall, Englewood Cliffs, NJ.

Connor-Linton, J. 1995. Cross-cultural comparison of writing standards: American ESL and Japanese EFL. World Englishes, 14.1:99-115. Oxford: Basil

Copeland, C., Durand, J., Krauwer, S. & Maegaard, B. 1991. Studies in Machine Translation and Natural Language Processing, vols. 1 & 2. Office for Official Publications of the European Community, Luxembourg.

Crellin, J., Horn, T. and Preece, J. 1990. Evaluating evaluation: A case study of the use of novel and conventional evaluation techniques in a small company, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds.), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 329-335.

Crook, M. & Bishop, H. 1965. Evaluation of Machine Translation. Final report, Institute for Psychological Research, Tufts University, April 1965.

Cude, B. 1980. An objective method of determining the relevancy of product characteristics. ACCI-Proceedings 1980, pp. 111-116.

Cuthbert, J. 1979. Testing for consumers, Proceedings of the First North American Conference of Consumer Product Testing, Ottawa Consumers' Association of Canada, Ottawa, pp. 9-21.

Dagan, I., & Church, K. 1997. Termight: Coordinating Humans and Machines in Bilingual Terminology Acquisition. Machine Translation. 12:89-107.

Dahl, D. A., Hirschman, L. & Ball, C. N., 1988. Black Box Evaluation of PUNDIT. Talk presented at the workshop on Evaluation of Natural Language Processing Systems. Wayne, Philadelphia December 8-9, 1988.

Damerau, F. 1980. The transformational question answering system: Description, operating experience and implications, Report RC8287, IBM Thomas J. Watson Research Center, Yorktown Heights, NY.

Danielsson, P. & Mühlenbock, K. 1998. When Stelhandske Becomes Steelglove. A Corpus Based Study of Names in Parallel Texts. In Proceedings of AMTA-98.

Deutsch, M. 1982. Software Verification and Validation, Englewood Cliffs, NJ 07632.

Dorr, B. 1987. UNITRAN: A Principle-Based Approach to Machine Translation. AI-Technical Report 1000, Massachusetts Institute of Technology, Cambridge, MA

* Dorr, B. 1990. A Cross-Linguistic Approach to Machine Translation. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 13-32.

* Dorr, B. 1990. Lexical Conceptual Structure and Machine Translation. Ph.D. Thesis. Department of Electrical and Computer Science, MIT.

Dostert, B. 1973. User's Evaluation of Machine Translation, Georgetown MT System, 1963-1973. Rome Air Development Center Report AD-768-451. Texas A&M University.

Dostert, B., McDonald, R., & Zarechnak, M. 1979. Machine Translation. Trends in Linguistics: Studies and Monographs, 11. The Hague: Mouton.

Doyon, J., Taylor, K. & White, J. 1999. Task-based Evaluation for Machine Translation. MT-Summit 7.

Doyon, J., Taylor, K., & White, J. 1998. The DARPA Machine Translation Evaluation Methodology: Past and Present. AMTA-98. Philadelphia, PA.

Duff, A. 1981. The Third Language: Recurrent Problems of Translation into English. First edition. New York: Pergamon Press.

EAGLES MT Evaluation Working Group. 1998. EAGLES Evaluation of Natural Language Processing Systems, Final Report. EAGLES Document ***, ISBN ***. Center for Sprogteknologi, Copenhagen.

* The EAGLES MT Evaluation Working Group. 1996. EAGLES Evaluation of Natural Language Processing Systems. Final Report. EAGLES Document EAG-EWG-PR.2, ISBN 87-90708-00-8. Center for Sprogteknologi, Copenhagen.

Emele, M. & Dorna, M. 1998. Ambiguity Preserving Machine Translation Using Packed Representations. ACL/COLING-98.

Ericsson, K. A. & Simon, H. A. 1984. Protocol Analysis: verbal reports as data, MIT Press, Boston.

Færch, C., Haastrup, K. & Phillipson, R. 1984. Learner Language and Language Learning, Nordisk Forlag, Copenhagen.

Fagan, M. 1976. Design and code inspection to reduce errors in program development, IBM System Journal 15(3).

Falkedal, K. A Practical Guide to the Evaluation of MT Systems. ISSCO: Interim Report to Suissetra.

Falkedal, K. (ed.) 1994. Proceedings of the evaluators' forum, Les Rasses, ISSCO, University of Geneva, Geneva.

Falkedal, K. 1994. Evaluation methods for machine translation systems: An historical overview and critical account, ISSCO draft report, University of Geneva, Geneva.

Fasella, P. 1984. The Evaluation of the European Community's Research and Development Programs, in G. Boggio et al. (eds.), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 3-13.

* Flanagan, M. 1994. Error Classification for MT Evaluation. In Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, MD.

Flanagan, M. 1994. Machine Translation Evaluation: A Strategy for Compuserve. Compuserve Technical Report

Flanagan, M., Two Years Online: Experiences, Challenges and Trends. In Expanding MT Horizons: Proceedings of the Second Conference of the Association for Machine Translation in the Americas, (pp. 192-197). Washington, DC: AMTA.

Flank, S., Temin, A., Blejer, H., Kehler, A. & Greenstein, S. 1993. Module-Level Testing for Natural Language Understanding. Machine Translation, 8(1,2):39-48.

Flickinger, D., Nerbonne, J., Sag, I. & Wasow, T. 1987. Toward Evaluation of NLP Systems. Unpublished. Paper presented at Forum for the Association of Computational Linguistics, 6 July 1987, Stanford University.

Frankenberg-Garcia, A. & Santos, D. 2000. Introducing COMPARA, the Portuguese-English parallel translation corpus. In CULT2K. www.portugues.mct.pt/COMPARA/

* Frederking, R., S. Nirenburg, D. Farwell, S. Helmreich, E.H. Hovy, K. Knight, S. Beale, C. Domashnev, D. Attardo, D. Grannes, and R. Brown. 1994. The Pangloss Mark III Machine Translation System. Proceedings of the 1st AMTA Conference. Columbia, MD.

Fuji, M. 1999. Evaluation Experiment for Reading Comprehension of Machine Translation Outputs. Machine Translation Summit VII '99, Singapore, pp. 285-289.

Fulford, H. & Höge, M. 1989. Preliminary study of user requirements - methods of investigation, Internal report of the ESPRIT II project 2315 translator's workbench (TWB), University of Surrey, Stuttgart and Guildford.

Fulford, H., Höge, M. & Ahmad, K. 1990. User requirements study, Final report of the ESPRIT II project 2315 translator's workbench (TWB), EC, Stuttgart and Guildford.

Fundingsland, O. T. 1984. Perspectives on Evaluating Federally Sponsored Research and Development in the United States, in G. Boggio et al. (eds.), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 105-114.

Fung, P. & McKeown, K. 1997. A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. Machine Translation. 12:53-7

Fung, P., Kan, M. & Horita, Y. 1996. Extracting Japanese Domain and technical Terms is Relatively Easy. NEMLAP-2. Proceedings of the Second International Conference on New Methods in Language Processing. Bilkent University, Ankara, Turkey.

Gambäck, B., Alshawi, H., Carter, D., & Rayner, M. 1991. Measuring compositionality in transfer-based machine translation systems. In J. G. Neal & S. M. Walter (eds.) Proceedings of the 1991 Natural Language Processing Systems Evaluation Workshop. Rome Laboratory Final Technical Report RL-TR-91-362.

Gaspari, F. 2000. Relevance of parallel corpora to the latest developments of machine translation and computer-assisted translation. In CULT2K.

Geistfield, L. V., Sproles, G. B. & Badenhop, S. B. 1977. The concept and measurement of a hierarchy of product characteristics, Advances in Consumer Research IV: 302-307.

Gerber, L. & Hovy, E. 1998. Improving Translation Quality by Manipulating Sentence Length. AMTA-98.

Gershman, A. 1988. Evaluation of Natural Language Processing Systems. Talk presented at the workshop on Evaluation of Natural Language Processing Systems. Wayne, Philadelphia December 8-9, 1988.

Gervais, A. 1980. Evaluation du système-pilote de traduction automatique TAUM-AVIATION. Rapport final, Bureau des traductions, Secrétariat d'État, Ottawa, Canada.

Gilb, T., & Finzi, S. 1988. Principles of Software Engineering Management. Addison-Wesley Pub. Co., Reading, Mass.

Gorin, A., Riccardi, G., & Wright, J. 1997. How May I Help You? Speech Communication, 23:113-127.

Granger, R. H. 1980. When expectation fails: towards a self-correcting inference system. AAAI-80, pp. 301-305.

Granger, R. H. 1983. The NOMAD system: expectation-based detection and correction of errors during understanding of syntactically and semantically ill-formed text. American Journal of Computational Linguistics, 9(3-4):188-196.

Granger, R. H. 1977. FOUL-UP: A program that figures out meanings of words from context. In Proceedings of IJCAI, 1977.

Grimaila, A., & Chandioux, J. 1992. Made to measure solutions. In J. Newton, ed., Computers in Translation: A Practical Appraisal. 33-45. Routledge, London.

Groenenveld, J. 1984. Simple tests manual, Consumentenbond/IOCU, 's-Gravenhage.

Grosjean, F. & Dommergues, J-Y. 1988. Evaluation du système de reconnaissance de parole RDP8-A de Systèmes G. Laboratoire de traitement du langage et de la parole, Université de Neuchâtel.

Grosjean, F. 1988. Evaluating Natural Language Processing Products. Laboratoire de traitement du langage et de la parole, Université de Neuchâtel.

Gruber, T. 1989. The acquisition of strategic knowledge, Academic Press, San Diego.

Guida, G. & Mauri, G. 1986. Evaluation of Natural Language Processing systems: Issues and approaches. Proceedings of the IEEE, 74(7): 1026-1035.

Guida, G. & Mauri, G. 1984. A Formal Basis for Performance Evaluation of Natural Language Understanding Systems. Computational Linguistics, 10(1):15-30.

Habermann, F. W. A. 1986. Provision and Use of Raw Machine Translation. Terminologie et traduction. Numéro spécial "World Systran Conference", 1:29-43.

Habermann, F. W. A. 1987. Erfahrungen mit maschinelle Uebersetzungen im Kernforschungszentrum Karlsruhe. Talk presented at the Jahrestagung der Internationalen Vereinigung Sprache und Wirtschaft, 1987.

Hajic, J., Hric, J., Kubon, V. 2000. Machine Translation Between Very Close Languages. NAACL-00

Halliday, T. & Briss, E. 1977. The Evaluation and Systems Analysis of the Systran Machine Translation System. Report RADC-TR-76-399, January, 1977. Rome Air Development Center, Griffiss Air Force Base, New York.

Harman, D. (in press). The first text retrieval conference (trec1), National Institute of Standards and Technology special publication 500-207, NIST, Gaithersburg, MD.

* Hartley A. & Rajman M. 2001. Automatically Predicting MT Systems Rankings Compatible with Fluency, Adequacy or Informativeness Scores. Proc. of Workshop on MT Evaluation "Who did what to whom?" at MT Summit VIII, Santiago de Compostela, Spain, pp. 29-34. http://www.issco.unige.ch/projects/isle/MT-Summit-wsp.html.

Hatim, B., & Mason, I. 1990. Discourse and the Translator. Longman, London.

Hausen, H. & Müllerburg, M. 1982. Kombination von verfahren fur die software-prufung, Internationaler Kongress fur Datenverarbeitung und Informationstechnologie (IKD) pp. 111-125.

Hausen, H. 1984. Comments on practical constraints of software validation techniques, Proceedings of symposium on software validation, pp. 323-333.

Hausen, H., Müllerburg, M. & Schmidt, M. 1987. Uber das prufen, messen und bewerten von software. methoden und techniken der analytischen software-qualitatssicherung, Informatik Spektrum 10(3): 123-144.

Hays, D. G. & Mathias, J. (eds) 1976. FBIS Seminar on Machine Translation. Summary proceedings of a Seminar held at Rosslyn, Virginia, on 8-9 March 1976, organized by MRM Inc. for the U.S. Government Foreign Broadcast Information Service. American Journal of Computational Linguistics, Microfiches 46, 51

Hayward, S., Breuker, J. A. & Wielinga, B. J. 1987. The KADS methodology: Analysis and design for knowledge based systems, ESPRIT P1098 Deliverable Y1, STC Technology Ltd., Alborg.

Heaton, J. 1975. Writing English Language Tests. London: Longman.

Heid, U. 1988. Evaluation der französisch-deutschen SYSTRAN-Übersetzung. Vorhabenskizze, IMS, Stuttgart.

Heid, U. 1990. Evaluation und Verbesserung der Sprachrichtung Französisch-Deutsch des Maschinellen Übersetzungssystems SYSTRAN. Bericht des IMS für den Zeitraum 1.7.89 - 30.4.1990. Vorversion.

Helmreich, S. & Farwell, D. 1998. Translation differences and pragmatics-based MT. Machine Translation 13:1, 17-39.

Hendry, D.G. & Green, T. R. G. 1993. Spelling mistakes: how well do correctors perform? In: Adjunct Proceedings of InterCHI'93.

Hermjakob, U. 1999. Machine Learning Based Parsing: A Deterministic Approach Demonstrated for Japanese. Submitted.

Hermjakob, U. & R.J. Mooney. 1997. Learning Parse and Translation Decisions from Examples with Rich Context. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL). Madrid, Spain (482-489).

Heywood, J. 1989. Assessment in Higher Education (Second Edition). Chichester: John Wiley.

Hildenbrand, E. & Heid, U. 1990. Ansätze zur Ermittlung der linguistischen Leistungsfähigkeit von maschinellen Übersetzungssystemen. Zur Entwicklung von Französisch-Deutschem Testmaterial für SYSTRAN. Talk presented at Linguistisches Kolloquium, Paderborn, September 1990.

Hindle, D. 1993. Parsing a Probabilistic Dependency Grammar. In Probabilistic Approaches to Natural Language: Papers from the 1992 Fall Symposium: October 23-25, Cambridge Massachusetts. Menlo Park, California: AAAI Press.

Hirschman, L. 1998. Language understanding evaluations: lessons learned from MUC and ATIS. In LREC-98, 117-122.

Hirschman, L. 1986. Discovering sublanguage structures. In R. Grishman & R. Kittredge, eds., Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Associates, Hillsdale, NJ.

Hobbs, J. R. 1988. A Canonical Corpus. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Hofmann, U. & Heino, H. 1992. Maschinelles Übersetzen -- Vorteile und Grenzen, TEKOM Nachrichten der Gesellschaft für technische Kommunikation.

Hogan, C. & Frederking, R. 1998. An Evaluation of the Multi-Engine MT Architecture. AMTA-98.

Höge, M. & Kroupa, E. 1991. Towards the design of a translator's workstation - organisational background and user implications, in H.-J. Bullinger (ed.), Human Aspects in Computing: Design and Use of Interactive Systems and Information Management, 18B. Proceedings of the Fourth International Conference of Human-Computer Interaction, Stuttgart, Germany, Elsevier, Amsterdam, pp. 1036-1040.

Höge, M., Hohmann, A. & Le-Hong, K. 1993. User-centered software development and evaluation, Poster Sessions. Abridged Proceedings of the fifth International Conference on Human-Computer Interaction, (HCI International '93), August 8 - 13, 1993, Orlando, Florida, p. 166.

Höge, M., Hohmann, A. & Mayer, R. 1992. Evaluation of TWB - operationalization and test results, Final report of the ESPRIT II project 2315 Translator's Workbench (TWB), Fraunhofer Society IAO and Mercedes-Benz AG, Stuttgart.

Höge, M., Hohmann, A., van der Horst, K., Evans, S. & Caeyers, H. 1993. User participation in the TWB II project - the first test cycle, Report of the ESPRIT II project 6005 Translator's Workbench II (TWB II), Mercedes-Benz AG, SITE and CEC Language Services, Stuttgart, Paris, Luxembourg.

Höge, M., Wiedenmann, O. & Kroupa, E. 1991. Evaluation of the TWB -- theoretical framework and practical application, Report of the ESPRIT II project 2315 translator's workbench (TWB), EC, Stuttgart.

Hohmann, A., Le-Hong, K. & van der Horst, K. 1994. User participation in the TWB II project - the second test cycle, Report of the ESPRIT II project 6005 Translator's Workbench II (TWB II), Mercedes-Benz AG and CEC Language Services, Stuttgart and Luxembourg.

Hovy & Church, 1991. Good applications for crummy machine translation. In Proceedings of the Natural Language Processing Systems Evaluation Workshop, edited by J. Neal and S. Walter. Calspan-UB Research Center.

Hovy, E. 1989. New Possibilities in Machine Translation. Proceedings of the DARPA Workshop of Speech and Natural Language, Cape Cod, MA, 99-111

* Hovy, E.H. 1988. Generating Natural Language under Pragmatic Constraints. Lawrence Erlbaum Associates, Hillsdale, NJ.

Hovy, E. 1994. Discussion on "Apples, Oranges or Kiwis? Criteria for the comparison of MT systems." Panel Discussion. In M. Vasconcellos, ed., MT Evaluation: Basis for Future Directions. National Science Foundation.

* Hovy, E.H. 1999. Toward Finely Differentiated Evaluation Metrics for Machine Translation. Proceedings of the EAGLES Workshop on Standards and Evaluation. Pisa, Italy.

Hovy, E., Grishman, R., Hobbs, J., Sanfilippo, A., Wilks, Y. 1999. Cross-lingual Information Extraction. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Howden, W. 1980. Functional program testing, IEEE Transactions on Software Engineering 6: 162-169.

Huang, X. 1988. Semantic Analysis in XTRA, an English-Chinese MT System. Computers and Translation 3, 101-120.

Huang, X. 1990. Machine Translation in a Monolingual Environment. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 33-41.

Humphreys, R. L. 1988. User-Oriented Evaluation of MT Systems. Working Papers in Language Processing: 16, Department of Language and Linguistics, University of Essex, December 1988.

Hundt, M.G. 1982. Working with the Weidner Machine-aided Translation System. Proceedings of the Conference for Practical Experience of MT, London, 45-51.

* Hutchins, W. J., & Somers, H. 1992. An Introduction to Machine Translation. Academic Press.

Hutchins, W. 1986. Machine Translation: Past, Present, Future. Ellis-Horwood Limited, Chichester, England.

* Hutchins, W. 1988. Recent Developments in Machine Translation. In D. Maxwell, K. Schubert, T. Witkam (eds.), New Directions in Machine Translation. Foris, Dordrecht.

* Hutchins, W. J. 1997. From first conception to first demonstration: the nascent years of machine translation, 1947-1954. A chronology. Machine Translation 12(3), 195-252.

* Hutchins, W. J. 2000. Early years in machine translation: memoirs and biographies of pioneers. John Benjamins, Amsterdam, xii+400 pp.

Ide, N., Chanod, J., Hobbs, J., Hovy, E., Jelinek, F. & Rajman, M. 1999. Methods and Techniques of Processing. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Infoshop. 1999. Language Translations: World Market Overview, Current Developments and Competitive Assessment. Report by Infoshop Japan, Global Information Inc., Kawasaki, Japan. http://www.infoshop-japan.com/study/ab3365_languagetranslation_toc.html

Ingria, R. J. P. 1989. Grammar Construction and Grammar Evaluation in the BBN Spoken Language System. Presented as a Tutorial at the Pre-Glow Working Days in Computational Linguistics, OST, Utrecht State University, April 3rd, 1989. BBN Systems and Technologies Corporation, Cambridge, MA.

IOCU 1977. Comparative Testing Guide, IOCU Testing Committee IOCU, The Hague.

IOCU 1985. Guide to the Principles of Comparative Testing, IOCU Testing Committee IOCU, Penang.

Isabelle, P. & Bourbeau, L. 1988. TAUM-AVIATION: Its Technical Features and Some Experimental Results. In J. Slocum (ed.), Machine translation systems. Cambridge University Press.

Isahara, H. et al. 1996. Technical Evaluation of MT Systems from the Developer's Point of View: Exploiting Test-sets for Quality Evaluation. (in Japanese). Journal of Natural Language Processing 3(3): 83-102.

Isahara, H., et al. 1995. JEIDA's Test-Sets for Quality Evaluation of MT Systems. MT Summit V.

* ISO/IEC. 1991. International Standard ISO/IEC 9126. Information technology -- Software product evaluation - Quality characteristics and guidelines for their use. International Organization for Standardization / International Electrotechnical Commission, Geneva.

* ISO/IEC. 2001. International Standard ISO/IEC 9126-1. Software engineering -- Product quality -- Part 1: Quality model. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 1999. International Standard ISO/IEC 14598-1. Information technology --- Software product evaluation --- Part 1: General overview. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 2000. International Standard ISO/IEC 14598-2. Software engineering --- Product evaluation --- Part 2: Planning and management. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 2000. International Standard ISO/IEC 14598-3. Software engineering --- Product evaluation --- Part 3: Process for developers. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 1999. International Standard ISO/IEC 14598-4. Software engineering --- Product evaluation --- Part 4: Process for acquirers. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 1998. International Standard ISO/IEC 14598-5. Software engineering --- Product evaluation --- Part 5: Process for evaluators. International Organization for Standardization / International Electrotechnical Commission. Geneva.

* ISO/IEC. 2001. International Standard ISO/IEC 14598-6. Software engineering --- Product evaluation --- Part 6: Documentation of evaluation modules. International Organization for Standardization / International Electrotechnical Commission. Geneva.

Jackson, M. 1995. Problems and requirements, Proceedings of the Second IEEE International Symposium on Requirements Engineering, York, England, IEEE Computer Society Press, Los Alamitos, California, pp. 2-9.

Japanese Electronic Industry Development Association. 1992. JEIDA Methodology and Criteria on Machine Translation Evaluation. Tokyo: JEIDA.

Japanese to English Machine Translation (including talk by Bernard Scott of LOGOS). 1989. Report of a Symposium. Office of Japan Affairs. Computer Science and Technology Board, National Research Council; National Academy Press.

Jarke, M., Krause, J. & Vassiliou, Y. 1984. Studies in the Evaluation of a Domain-Independent Natural Language Query System. Cooperative Interactive Information Systems, Springer-Verlag.

Jarke, M., Turner, J.A., Stohr, E.A., Vassiliou, Y., White, N. H. & Michielsen K. 1985. A Field Evaluation of Natural Language for Data Retrieval. IEEE Transactions on Software Engineering, SE-11(1): 97-113.

* JEIDA 1992. JEIDA Methodology and Criteria on Machine Translation Evaluation (JEIDA Report). H. Nomura (editor). Japan Electronic Industry Development Association.

JEIDA, 1989. A Japanese View of Machine Translation in the Light of the Considerations and Recommendations Reported by ALPAC, USA. Japan Electronic Industry Development Association, Tokyo.

Jelinek, F. 1988. Evaluation of Grammar Quality. Distributed at the workshop on evaluation of natural language processing systems. Wayne, Philadelphia December 8-9, 1988.

Jones, D. & Rusk, G. 2000. Toward a Scoring Function for Quality-Driven Machine Translation. In Proceedings of COLING-2000.

* Jordan, P., Dorr, B., & Benoit, J. 1993. A First-pass Approach for Evaluating Machine Translation Systems. Machine Translation 8 (1/2), 49-58.

Kamei, S. & Wakao, T. 1992. Metonymy: How to Treat It Properly in a Multilingual Machine Translation System. SPICIS'92.

Kaplan, R., & Bresnan, J. 1982. Lexical Functional Grammar: A Formal System for Grammatical Representation. The Mental Representation of Grammatical Relations, J. Bresnan (ed.), MIT Press, Cambridge, MA., 173-281

Kaplan, R., Netter, K., Wedekind, J., & Zaenen, A. 1989. Translation by structural correspondences. In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, 272-281.

Karat, C. 1990. Cost-benefit analysis of iterative usability testing, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, IFIP, pp. 351-356.

Karlgren, H. 1987. Good Use of Poor Translations. Introduction. Forum Inf. and Docum., 12(4): 23-29.

Kawada, T., Amano, S. & Sakai, K. 1980. Linguistic error correction of Japanese sentences. COLING-80, pp. 257-261.

Kay, M. 1980. The Proper Place of Men and Machines in Language Translation. XEROX PARC Research Report CSL-80-11.

Kay, M. 1984. Functional Unification Grammar: A Formalism for Machine Translation. Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, Stanford, CA, 75-78.

Kelly, I. D. K. (ed.) 1989. Progress in Machine Translation. Natural Language and Personal Computers. Papers from the International Conference in Machine Translation held by the Natural Language Translation Specialist Group of the British Computer Society at Cranfield Institute of Technology in February 1984. Sigma Press, Wilmslow, UK.

King, M. 1981. Design Characteristics of a Machine Translation System. Proceedings of the Seventh International Joint Conference of Artificial Intelligence. University of British Columbia, Vancouver, B.C., Canada, 43-46.

King, M. 1989. New Directions in MT Systems: A Change in Paradigm. Proceedings of MT Summit II, Munich, Germany. 95-96.

* King, M. 1990. A Workshop on Evaluation: Background paper. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 255-259

King, M. 1995. Les belles infidèles: Fidelity as a criterion of good translation. In B.H. Partee and P. Sgall (eds.) Discourse and Meaning, Papers in Honor of Eva Hajicova. John Benjamins, Amsterdam and Philadelphia.

King, M. 1996. Validity and Evaluation of MT Systems. In H. Somers (ed.), Terminology, LSP and Translation, Studies in Language Engineering in honour of Juan C. Sager. John Benjamins, Amsterdam and Philadelphia.

King, M. 1998. Evaluation Design: the EAGLES Framework. In Proceedings of the Konvens '98. Bonn. Verlag, St. Augustin.

King, M. 1999. The 7-step recipe. Working Document of the EAGLES Evaluation Group. EELS Conference. Hoevelaken.

King, M. (ed.) 1987. Machine Translation Today. Edinburgh Information Technology Series 2, Edinburgh University Press.

King, M. 1989. A Practical Guide to the Evaluation of Machine Translation Systems. ISSCO, Geneva.

King, M. 1990. BABEL-Research: Auditor's Report. ISSCO, Geneva.

King, M. 1998. Language Resources and Evaluation. In Proceedings of AI&NLP '98. Moncton, Canada.

* King, M. and K. Falkedal. 1990. Using Test Suites in Evaluation of Machine Translation Systems. Proceedings of the 18th COLING Conference vol. 2. Helsinki, Finland.

King, M., & Maegaard, B. 1998. Issues in Natural Language Systems Evaluation. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC). Granada, Spain.

King, M., chair. 1991. Evaluation of MT Systems Panel Discussion. Proceedings of MT Summit III.

King, M., Hovy, E., T'sou, B., White, J., & Zaharin, Y. 1999. MT Evaluation - Panel Discussion. In Proceedings of MT-Summit VII.

Kingscott, G. 1989. Applications of Machine Translation. Study for the Commission of the European Communities. Praetorius Limited, for the CEC, Nottingham, UK. September 1989.

Kittredge, R. 1987. The Significance of Sublanguage for Automatic Translation. Machine Translation: Theoretical and Methodological Issues, Sergei Nirenburg, (ed.), Cambridge University Press, Cambridge, England, 59-67.

Klavans and Tzoukermann. 1996. Dictionaries and Corpora: Combining Corpus and Machine-readable Dictionary Data for Building Bilingual Lexicons. Machine Translation 10(3-4).

Klavans, J., Hovy, E., Fluhr, C., Frederking, R., Oard, D., Okumura, A., Ishikawa, K., & Satoh, K. 1999. Multilingual (or Cross-Lingual) Information Retrieval. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Klein, F. 1988. Factors in the Evaluation of MT: A Pragmatic Approach. In Muriel Vasconcellos (ed.) Technology as Translation Strategy, American Translators Association Scholarly Monograph Series II, State University of New York at Binghamton (SUNY), pp. 198-202.

Knight, K. 2000. Machine Translation. Tutorial at ANLP/NAACL-2000.

Knowles, F. 1979. Error analysis of Systran output - a suggested criterion for the 'internal' evaluation of translation quality and a possible corrective for system design. in Snell, Barbara M. (ed.) Translating and the Computer, North-Holland Publishing Company, pp. 109-134.

Krause, J. 1980. Natural Language Access to Information Systems: an evaluation study of its acceptance by end users. Information Systems, Vol. 5, No. 4, pp. 297-318.

Krauwer, S. 1993. Evaluation of MT Systems: A Programmatic View. Machine Translation 8: 1-2, 59-66.

Kukich, K. 1992. Techniques for Automatically Correcting Words in Text. ACM Computing Surveys, 24(4): 377-438.

Lancaster, F.W., Rapport, R.L. & Penry, J.K. 1972. Evaluating the effectiveness of an on-line, natural language retrieval system. Grad. Sch. Libr. Sci., Vol. 8, No. 5. University of Illinois, Urbana, Illinois, pp. 223-245.

Lang, E. & Gerber, L. 1994. Internal evaluation: SYSTRAN. In M. Vasconcellos, ed., MT Evaluation: Basis for Future Directions. Proceedings of a workshop sponsored by the National Science Foundation. Washington, DC., Association for Machine Translation.

Langlais, P., Foster, G., & Lapalme, G. 2000. TransType: a Computer-Aided Translation Typing System. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

Language and Machines. Computers in Translation and Linguistics. (ALPAC report, 1966). National Academy of Sciences.

Laurian, A-M. 1984. Machine Translation: what type of post-editing on what type of documents for what type of users. COLING-84: Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, pp. 236-238.

Lawson, V. (ed.) 1982. Practical Experience of Machine Translation. North-Holland Publishing Company.

Lawson, V. 1979. Tigers and Polar Bears. The Incorporated Linguist, 18(3).

Lazzari, G., Frederking, R. & Minker, W. 1999. Speaker-Language Identification and Speech Translation. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Leavitt, A., Gates, J., & Shannon, S. 1971. Machine Translation Quality and Production Process Evaluation. Report RADC-TR-71-206, October 1971. Rome Air Development Center, Griffiss Air Force Base, New York.

Le-Hong, K., Höge, M. & Hohmann, A. 1992. User's point of view of the translator's workbench, Translating and the Computer. Quality Standards and the Implementation of Technology in Translation. ASLIB, 10-11 November 1992 14: 25-31.

Lehrberger, J. & Bourbeau, L. 1987. Machine Translation: Linguistic characteristics of MT Systems and General Methodology of Evaluation. Amsterdam: John Benjamins.

Leick, J. M. & Schroen, D. 1978. Quelques résultats statistiques d'une évaluation sommaire du système de traduction automatique. Systran Information document, CETIL, CCE.

Lesn, M. 1994. Results of the Warmup Exercises. In Vasconcellos, ed., MT Evaluation: Basis for Future Directions. Proceedings of a workshop sponsored by the National Science Foundation. San Diego, California.

Lesmo, L. & Torasso, P. 1984. Interpreting syntactically ill-formed sentences. COLING-84: Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, pp. 534-539.

Levy, M. 1988. Implementation of a Computer-aided Translation Project at the Federal Government Translation Bureau in Canada, Presentation given at the 29th Annual ATA Conference, Seattle, October 1988.

Levy, M. 1989. Consolidating a Machine Translation Project at the Post-implementation Stage. Presentation given at the 30th Annual ATA Conference, Washington D. C., October 11-15, 1989. Secretary of State Department of Canada.

Lewis, D. 1997. MT Evaluation: Science or Art? Machine Translation Review, No. 6, 25-36. Also, Translating and the Computer, 19. Reprinted at http://www.bcs.org.uk/siggroup/sg37.htm

Lewis, D. D. 1988. Evaluation in Information Retrieval. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Lewis, J., Henry, S. & Mack, R. 1990. Integrated office software benchmarks: A case study, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 337-343.

Loehr, D. 1998. Can Simultaneous Interpretation Help Machine Translation. AMTA-98

* Loffler-Laurian, A-M. 1983. Pour une typologie des erreurs dans la traduction automatique. Multilingua 2 (2):65-78, Mouton Publishers.

Macklovitch, E. 1989. Recent Canadian Experience in Machine Translation. in I. D. K. Kelly (ed.) Progress In Machine Translation. Natural Language and Personal Computers. Sigma Press Wilmslow, UK, pp. 59-67.

Macleod, M. 1996. Performance measurement and ecological validity. In P. Jordan (ed.) Usability Evaluation in Industry. Taylor and Francis, London.

Maegaard, B. 1997. Evaluation of Language Tools. Translating and the Computer, 19, ASLIB, London, 5p.

Maegaard, B., Bel, N., Dorr, B., Hovy, E., Knight, K., Iida, H., Boitet, C., Wilks, Y. 1999. Machine Translation. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Mani, I. & Hirschman, L. 21 Evaluation

* Mann, W. & Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8(3), p. 243-281.

Manzi, S., King, M. & Douglas, S. 1996. Working towards user-oriented evaluation. Proceedings of the International Conference on Natural Language Processing and Industrial Applications (NLP+IA 96), Moncton, New-Brunswick, Canada, pp 155-160.

Mariani, J., Kordi, K., Rosenfeld, R., Choukri, K., Cole, R. & Micca, G. 1999. Multilingual Speech Processing (Recognition and Synthesis). In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Martin, J. 1990. Category Neutral Parsing. Technical Report, UMIACS, University of Maryland, College Park, MD.

Maruyama, H. 1990. An Interactive Japanese Parser for MT. Proceedings of COLING-90, Helsinki, Finland, 257-262.

Marx, J., Smith, N., & Staudinger, B. 1998. Some problems in the evaluation of the Russian-German machine translation system MIROSLAV. Proceedings of the First International Conference on Language Resources and Evaluation. Granada: European Language Resource Association.

Mason, J., & Rinsche, A. 1995. Ovum Evaluates Translation Technology Products. London: Ovum, Ltd.

Maxwell, D., Schubert, K., & Witkam, T. 1988. Recent Developments in Machine Translation. Foris, Dordrecht.

Maybury, M. T. 1990. Evaluation Spaces: A Framework for Evaluating Natural Language Generation Systems. AAAI-90 Workshop in Evaluating Natural Language Generation Systems.

Maybury, M., Stock, O., Carayannis, G., & Hovy, E. 1999. Multimedia Communication, including Text. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

McCord, M. 1989. Design of LMT: A Prolog-Based Machine Translation System. Computational Linguistics, 15(1): 33-52.

McDonald, D. 1993. Acquisition of Lexical Knowledge from Text. In Boguraev, B., & Pustejovsky, J. (eds.), Proceedings of a Workshop Sponsored by the SIGLEX of the ACL. Ohio State University.

Mel'cuk, I. & Zholkovsky. 1988. The explanatory combinatory dictionary. In M. Evens, ed., Relational Models of the Lexicon: Representing Knowledge in Semantic Networks. Cambridge University Press.

Melamed, I. D. 1997. A Word-to-word Model of Translation Equivalence. ACL/EACL-97

Melamed, D. 1997. Measuring Semantic Entropy. SIGLEX Workshop on Tagging Text with Lexical Semantics. Washington, DC. Association for Computational Linguistics.

Meng, H., Khudanpur, S., Levow, G., Oard, D., & Wang, H-M. 2000. Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

Menzel, W. 1987. Automated reasoning about natural language correctness. Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, (EACL-87), University of Copenhagen, Denmark.

Merlo, P. 1990. A Modest Proposal to Adopt a Grammar of Principles and Parameters in the Design of Parsing Systems for Machine Translation. Manuscript, University of Maryland, College Park, MD.

Miller, E. & Howden, W. (eds.) 1981. Tutorial: Software Testing and Validation Techniques, IEEE, London.

Miller, E. 1984. Quality management technology: Practical applications, Software Validation pp. 255-266.

Miller, G. A. & Beebe-Center, J. G. 1958. Some Psychological Methods for Evaluating the Quality of Translations. Mechanical Translation, 3:73-80.

Miller, K.J. 2000. The Lexical Choice of Prepositions in Machine Translation. Unpublished Ph.D. thesis, Georgetown University.

Miller, K.J. and Vanni, M. 2001. Scaling the ISLE Taxonomy: Development of Metrics for the Multi-Dimensional Characterisation of MT Quality. Proceedings of MT Summit VIII, Santiago de Compostela, Spain, p. 229-234.

Minnis, S. 1993. Constructive machine translation evaluation. Machine Translation 8 (1/2), 67-76.

* Mitkov R., Boguraev B. & Lappin S. (eds). 2001. Special Issue of 'Computational Linguistics' on Computational Anaphora Resolution, Computational Linguistics, 27(4).

Miyazawa, S., Yokoyama, S., Matsudaira, M., Kumano, A., Kodama, S., Kashioka, H., Shirokizawa, Y., Nakajima, Y. 1999. Study on Evaluation of WWW MT Systems. In Proceedings of MT Summit VII

Moll, T. & Ulich, E. 1988. Einige methodische Fragen in der Analyse von Mensch-Computer-Interaktion, Zeitschrift für Arbeitswissenschaft 42(2): 70-76.

* Morris, J. and Hirst, G. 1991. Lexical cohesion, the thesaurus, and the structure of text. Computational Linguistics, 17(1), March 1991, 21-48.
http://ftp.cs.toronto.edu/pub/gh/Morris+Hirst-91.pdf.

MUC-3 1991. Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA.

Murine, G. & Carpenter, C. 1983. Applying software quality metrics, Proceedings from the ASQC Quality Congress, Transactions, Boston.

Musa, J., Iannino, A. & Okumoto, K. 1987. Software Reliability, Measurement, Prediction, Application, McGraw-Hill Book Co., New York.

Nomura, H. and J. Isahara. 1992. JEIDA Report on Machine Translation. Proceedings of the AMTA Workshop on MT Evaluation, San Diego. See also (JEIDA, 1992)

Nagao, M. 1985. Evaluation of the Quality of Machine Translation Sentences and the Control of Language (in Japanese). Information Processing 26(10). Information Processing Society of Japan.

Nagao, M. 1989. A Japanese View on Machine Translation in Light of the Considerations and Recommendations reported by ALPAC, USA. Japanese Electronic Industry Development Association.

Nagao, M. 1989. Machine Translation: How Far Can It Go? Oxford University Press, Oxford.

Nagao, M., Tsujii, J. & Nakamura, J. 1988. The Japanese government project, in J. Slocum (ed.), Machine translation systems, CUP, Cambridge.

Nagao, M., Tsujii, J., Nakamura, J. 1985. The Japanese Government project for machine translation. Computational Linguistics 11(2/3), 91-109.

Neal, A. & Simons, R. 1985. Playback: A method for evaluating the usability of software and its documentation, Proceedings of the Anniversary Meeting 1985, User Friendly Computing September 23-27, 1985, Vol. 2, pp. 1051-1075.

Neal, J. G., Feit, E. L. & Montgomery, C. A. 1993. Benchmark Investigation/Identification Project. Machine Translation, 8(1,2):77-84.

Nerbonne, J., Flickinger, D. & Wasow, T. 1988. The HP Labs Natural Language Evaluation Tool. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Nerbonne, J., Netter, K., Diagne, A. K., Klein, J. & Dickmann, L. 1993. A Diagnostic Tool for German Syntax. Machine Translation, 8(1,2):85-108.

Newton, J. 1992. Computers in Translation: A Practical Appraisal. Routledge, London.

* Niessen, S., Och, F.J., Leusch, G., & Ney, H. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC), Athens, Greece, vol. 1, p. 39-45.

Nirenburg, S. & Goodman, K. 1990. Treatment of Meaning in MT Systems. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 171-187.

Nirenburg, S. (ed.) 1987. Machine Translation. Theoretical and Methodological Issues. Studies in Natural Language Processing, Cambridge University Press.

Nirenburg, S., & Levin, L. 1989. Knowledge Representation Support. Machine Translation 4(1): 25-52.

Nishida, F. & Takamatsu, S. 1990. Automated Procedures for the Improvement of a Machine Translation System by Feedback from Postediting. Machine Translation 5(3).

Nomura, H. & Isahara, J. 1992. JEIDA Report on Machine Translation. Proceedings of the AMTA Workshop on MT Evaluation. San Diego, CA.

Norman, D. 1985. Four stages of user's activities, Proceedings of Human-Computer Interaction - Interact'84.

Nuebel, R. 1997. End-to-end Evaluation in Verbmobil 1. Proceedings of Machine Translation Summit VI, San Diego, California, 29th October-1st November 1997.

* Nunberg, G. 1990. The linguistics of punctuation. CSLI Publications, Stanford, CA, USA.

* Oard, D. & Gonzalo, J. 2001. The CLEF-2001 Interactive Track. Proceedings of the 2001 Cross-Language Evaluation Forum Workshop, Darmstadt, Germany.

* Och, F.-J., C. Tillmann, and H. Ney. 1999. Improved Alignment Models for Statistical Machine Translation. Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP).
http://www.aclweb.org/anthology/W99-0604.

O'Connell, T., O'Mara, F. & White, J. 1994. The ARPA MT evaluation methodologies: Evolution, lessons and further approaches, Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, U.S.A.

Oller, J. 1975. Research with Cloze Procedure in Measuring the Proficiency of Non-native Speakers of English: An Annotated Bibliography. Arlington, VA: ERIC Clearinghouse on Language and Linguistics, Center for Applied Linguistics.

Orr, D. & Small, V. 1967. Comprehensibility of Machine-Aided Translations of Russian Scientific Documents. Mechanical Translation and Computational Linguistics, 10, 1-10.

Osterweil, L. 1984. Integrating the testing, analysis and debugging of programs, in H. Hausen (ed.), Software Validation, Amsterdam, North-Holland, pp. 73-93.

OVUM 1995. Mason, J. and A. Rinsche. 1995. Translation Technology Products. OVUM Ltd., London.

Paggio, P. & Underwood, N.L. 1998. Validating the TEMAA LE evaluation methodology: a case study on Danish spelling checkers. Natural Language Engineering. Cambridge University Press, in press.

Pallett, D. S. 1988. Types of evaluation methodology. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Palmer, M., Calzolari, N., Choukri, K., Fellbaum, C., Hovy, E., & Ide, N. 1999. Multilingual Resources. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

Palmer, M., & Finin, T. 1990. Workshop on the evaluation of natural language processing systems. Computational Linguistics, 16(3).

Pankowicz, Z. 1967. Commentary on ALPAC Report ("Language and Machines; Computers in Translation and Linguistics"). Griffiss Air Force Base, Rome Air Development Center, New York.

Pankowicz, Z. L. 1978. Facts of Life in Assessment of Machine Translation, CEC, Luxembourg.

Pantel, P., Lin, D. 2000. Word for word Glossing with Contextually Similar Words. NAACL-00.

* Papineni K., Roukos S., Ward T. & Zhu W.-J. 2001. BLEU: a Method for Automatic Evaluation of Machine Translation, Research Report, Computer Science IBM Research Division, T.J.Watson Research Center, RC22176 (W0109-022), 17 September 2001.

Pfafflin, S. 1965. Evaluation of Machine Translations by Reading Comprehension Tests and Subjective Judgments. Mechanical Translation 8, 2-8.

* Pierce, J., chair. 1966. Language and machines: Computers in Translation and Linguistics. Report by the Automatic Language Processing Advisory Committee (ALPAC). Publication 1416. National Academy of Sciences, National Research Council.

Pigott, I. M. 1989. Operational Machine Translation System, iesnews, 21, Luxembourg.

Proceedings of the Evaluators' Forum, April 21-24, 1991, Les Rasses, Vaud, Switzerland.

Pugh, J. 1992. The story so far: An evaluation of Machine Translation in the world today. In J. Newton, ed., Computers in Translation: A Practical Appraisal. Routledge, London.

Raghavan, V. V., Bollmann, P. & Jung, G. S. 1989. Retrieval System Evaluation Using Recall and Precision: Problems and Answers. Proceedings of the 12th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR89), pp. 59-68.

Rahmstorf, G. & Rabinovitz, R. 1993. Better writing through electricity, PC Magazine May 1993: 147-200.

* Rajman M. & Hartley T. 2002. Automatic Ranking of MT Systems, Proc. of Third International Conference on Language Resources and Evaluation (LREC), Las Palmas de Gran Canaria, Spain, vol. 4, pp. 1247-1253.

Rapp, R. 1999. Automatic Identification of Word Translations from Unrelated English and German Corpora. ACL-99

* Reeder, F. 2003. Title to be announced. Doctoral dissertation.

Read, W., et al. 1988. Evaluating Natural Language Systems: A Sourcebook Approach. COLING 1988: Proceedings of the 12th International Conference on Computational Linguistics, Budapest, pp. 530-534.

* Reiter E., Mellish C., Levine J. 1995. Automatic generation of technical documentation. Applied Artificial Intelligence 9(3): 259-287.

Resnik, P. 1992. Selection and Information: A Class-Based Approach to Lexical Relationships. Ph.D. thesis, University of Pennsylvania.

Resnik, P. 1996. Selectional Constraints: An Information-Theoretic Model and Its Computational Realization. Cognition 61:127-159.

Resnik, P. 1997. Evaluating multilingual gisting of web pages. UMIACS Technical Report. University of Maryland Institute for Advanced Computer Studies.

* Review and System Analysis of the SYSTRAN Machine Translation System, RADC-TR-76-399. Final technical report, Battelle Columbus Laboratories, Rome Air Development Center, Air Force Systems Command, Griffiss Air Force Base, NY, 1977.

Rinsche, A. 1991-9. Towards a System of Benchmarking MT Systems. Report for EC Commission. October 10, 1991.

Rinsche, A. 1993. Towards a MT Evaluation Methodology. In Proceedings of the Theoretical and Methodological Implications of Machine Translation (TMI-93)

Ristad, E. 1995. A Natural Law of Succession, Technical Report CS-TR-495-95. Princeton University.

Rohrer, C. 1986. Linguistic Bases for Machine Translation. Proceedings of COLING-86. Bonn, Germany, 353-355.

Roudaud, B., Puerta, M., Gamrat, O. 1993. A procedure for the evaluation and improvement of an MT system by the end-user. Machine Translation 8 (1/2), 109-116.

Roukos, S. 1988. Performance evaluation in speech processing. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Rowe, N. 1982. On some arguable claims in B Shneiderman's evaluation of natural language interaction with database systems. SIGMOD Record 13 (1):92-97.

Rushinek, A. & Rushinek, S. 1985. Accounting and auditing software evaluation with knowledge based expert systems: An empirical multivariate model, Fourth Annual International Conference on Computers and Communications '85, Conference Proceedings, March 20-22, 1985, pp. 250-254.

Russo, J. E. 1988. Information processing from the consumer's perspective, Proceedings of the International Conference on Research in the Consumer Interest, pp. 185-217.

Sadler, L., Crookston, I., Arnold, D., & Way, A. 1990. LFG and Translation. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 121-130.

* Sager, J. 1978. Criteria for Machine Translation Evaluation. Proceedings of Workshop on Evaluation Problems in Machine Translation. Luxembourg. February, 1978.

Sager, J. C. 1979. Text quality and cost-efficiency of translation (some tentative suggestions for diversification of the translation effort). Information Paper for CETIL CCE.

Saito, H., & Tomita, M. 1986. On automatic Composition of Stereotypic Documents in Foreign Languages. CMU Technical Report - CMU-CS-86-107. Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.

Salton, G. & Buckley, C. 1990. An Evaluation of Text Matching Systems for Text Excerpts of Varying Scope, Technical Report no. TR 90-1134, June 1990, Department of Computer Science, Cornell University, Ithaca, N.Y.

Sato, S. & Nagao, M. 1990. Toward Memory-based Translation. Proceedings of COLING-90, Helsinki, Finland. 247-252.

Schank, R. 1974. Conceptual Information Processing. Elsevier Science Publishers, Amsterdam, The Netherlands.

Schmied, W.-S. & Winkler, H. 1989. Software-Qualität. Ausgewählte Methoden und Werkzeuge der Softwareprüfung, Siemens-Schriftenreihe data praxis, Siemens, München.

Schuster, E. & Finin, T. W. 1985. VP2: the role of user modelling in correcting errors in second language learning. AISB-85 pp. 187-195.

Schuster, E. 1986. The role of native grammars in correcting errors in second language learning. Computational Intelligence 2(2):93-98.

Schütz, J., Nübel, R. 1998. Evaluating Language Technologies: The MULTIDOC Approach to Taming the Knowledge Soup. AMTA-98

Scott, B. 2000. Linguistic and Computational Motivations for the LOGOS Machine Translation System: An Overview. In Streiter, O., Carl, M., & Haller, J. (eds.), Hybrid Approaches to Machine Translation, IAI Working Paper No. 35, Institute of Applied Information Sciences, Saarbrücken, Germany. http://rockey.iis.sinica.edu.tw/oliver/iaiwp/p11/index.html

Sharp, R. 1985. A Model of Grammar Based on Principles of Government and Binding. Master of Science thesis, Department of Computer Science, University of British Columbia.

Shieber, S., van Noord, G., Moore, R., & Pereira, F. 1989. A Semantic-Head-Driven Generation Algorithm for Unification-Based Formalisms. Proceedings of the 27th Annual Conference of the Association for Computational Linguistics, University of British Columbia, Vancouver, B.C., Canada. 7-17.

Shinghal, R. 1982. An error correcting contextual algorithm for text recognition. Proceedings of the Fourth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pp. 66-70.

Shinnou, H. 1998. Revision of Morphological Analysis Errors through Personal Name Construction Model. AMTA-98.

Shiwen, Y. 1993. Automatic evaluation of output quality for machine translation systems. Machine Translation 8 (1/2), 1-24.

Silberer, G. 1985. The impact of comparative product testing upon consumers: Selected findings of a research project, Journal of Consumer Policy 8: 1-27.

Sinaiko, H. W. & Klare, G. R. 1971. Further Experiments in Language Translation: Readability of Computer Translations. Institute for Defense Analyses. Arlington, Va. August and December 1971.

Sinaiko, H. & Klare, G. 1973. Further Experiments in Language Translation: A Second Evaluation of the Readability of Computer Translations. ITL 19, 29-52.

Sinaiko, H. W. 1979. Measurement of Usefulness by performance test. In Van Slype, G. 1979. Critical Methods for Evaluating the Quality of Machine Translation. Prepared for the European Commission Directorate General Scientific and Technical Information and Information Management. Report BR-19142. Bureau Marcel van Dijk.

Slagle, J. & Wick, M. 1988. A method for evaluating expert system applications, AI Magazine 9.

Slocum, J. & Justus, C. 1985. Transportability to Other Languages: The Natural Language Processing Project in the AI Program at MCC. ACM Transactions on Office Information Systems. 3(2): 204-230.

Slocum, J. (ed.) 1988. Machine translation systems. Cambridge University Press.

Slocum, J. 1988. Evaluating Machine Translation Systems: a business viewpoint. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Slocum, J., et al. 1985. An Evaluation of METAL: the LRC Machine Translation System. Proceedings of the Second Conference of the European Chapter of the Association for Computational Linguistics, Geneva, pp. 62 - 69.

Sneed, H. 1987. Software-testen - state of the art, Software Entwicklungs-Systeme und Werkzeuge, 2 Kolloquium, 8-10 September 1987.

Snell, B. M. (ed.) 1979. Translating and the Computer. North-Holland Publishing Company.

Snow, J. A. 1984. Research and Development: Programs and Priorities in a United States Mission Agency. in G Boggio et al (eds.), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 95-114.

Somers, H. 1990. Current Research in Machine Translation. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX. 1-12.

Somers, H. & Prieto-Alvarez, N. 2000. Multiple Choice Reading Comprehension Tests for Comparative Evaluation of MT Systems. In Proceedings of the Workshop on MT Evaluation at AMTA-2000

Somers, H. L. 1997. The Current State of Machine Translation. In MT-Summit. Pp. 115-123.

Somers, H. 1997. Machine Translation and Minority Languages. In Proceedings of Translating and the Computer, ASLIB-97. Also available at: http://www.ling.lancs.ac.uk/monkey/ihe/mille/paper2.htm

* Somers, H. and Wild, E. 2000. Evaluating Machine Translation: the Cloze procedure revisited. Translating and the Computer 22, London, November 2000.

Sondheimer, N. K. 1981. Evaluation of Natural Language Interfaces to Database Systems: A Panel Discussion. Proceedings ACL 1981, p 29.

Spalink, K. 1994. Proposal for a differentiated text-related machine translation evaluation methodology. MT News International, 9.

* Sparck Jones, K. & Galliers, J.R. 1996. Evaluating Natural Language Processing Systems. Springer Verlag.

Steenkamp, J. B. E. M. 1989. Product quality: an investigation into the concept and how it is perceived by consumers. van Gorcum, Assen/Maastricht.

Streiter, O., Carl, M., Haller, J. 2000. Introduction. In Streiter, O., Carl, M., & Haller, J. (eds.), Hybrid Approaches to Machine Translation, IAI Working Paper No. 35, Institute of Applied Information Sciences, Saarbrücken, Germany. http://rockey.iis.sinica.edu.tw/oliver/iaiwp

Sugaya, F., Takezawa, T., Yokoo, A., & Yamamoto, S. 1999. End-to-end Evaluation in ATR-MATRIX: Speech Translation System between English and Japanese. In Proceedings of EUROSPEECH-99.

Sundheim, B. 1991. Overview of the third message understanding evaluation and conference, Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA, pp. 3-24.

Sydeserff, H. A., Caley, R. J., Isard, S. D., Jack, M. A., Monaghan, A. I. C. & Verhoeven, J. 1991. Evaluation of speech synthesis techniques in a comprehension task. Eurospeech 91: Proceedings of the Second European Conference on Speech Communication and Technology, Genoa.

Takezawa, T., Sugaya, F., Yokoo, A., Yamamoto, S. 1999. A New Evaluation Method for Speech Translation Systems and a Case Study on ATR-MATRIX from Japanese to English. In Proceedings of MT Summit VII.

Tansley, D. S. W. & Hayball, C. C. 1993. Knowledge Based Systems Analysis and Design: A KADS Developer's Handbook, Prentice Hall, Englewood Cliffs, NJ.

Taylor, K. & J. White. 1998. Predicting What MT is Good for: User Judgments and Task Performance. AMTA-98, p. 364-373.

Taylor, W. 1953. Cloze Procedure: A New Tool for Measuring Readability. Journalism Quarterly 30:415-433.

Taylor, W. 1956. Recent developments in the Use of Cloze Procedure. Journalism Quarterly 33:42-48, 99.

TEMAA 1996. TEMAA Final Report, LRE-62-070. March 1996. Center for Sprogteknologi, Copenhagen, Denmark. Electronic version also available from: http://www.cst.ku.dk/projects/temaa/D16/d16exp.html

Tennant, H. 1979. Experience with the Evaluation of Natural Language Question Answerers. Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, pp. 874-876.

Tennant, H. 1981. What Makes Evaluation Hard? Proceedings of ACL 1981, pp. 37-38.

Teubert, W. 2000. Extracting translation equivalents from a parallel corpus. Arbeit/travail/work in comparison. In CULT2K.

Thaller, G. 1993. Qualitätsoptimierung der Software-Entwicklung. Das Capability Maturity Model (CMM), Verlag Vieweg, Braunschweig/Wiesbaden.

Thaller, G. 1994. Verifikation und Validation. Software Tests für Studenten und Praktiker, Vieweg, Braunschweig.

Thompson, B. H. 1981. Evaluation of Natural Language Interface to Data Base Systems. Proceedings of ACL 1981, pp. 39-42.

Thompson, H. 1994. TEMAA : A testbed study of evaluation methodologies : Authoring aids, Proceedings of the Language Engineering Convention, ELSNET, Paris, pp. 147-148.

Thompson, H. S. (ed.) 1992. The Strategic Role of Evaluation in Natural Language Processing and Speech Technology. Technical Report, May 1992, University of Edinburgh Record of a workshop sponsored by DANDI, ELSNET and HCRC.

Thompson, H. S. 1989. Evaluation of phoneme lattices: Four methods compared. Proceedings of the Workshop on Speech Input /Output Assessment and Speech Databases, European Speech Communication Association, Brussels.

Thompson, H. S. 1991. Automatic evaluation of translation quality: Outline of methodology and report on pilot experiment. in Kirsten Falkedal (ed.) Proceedings of the Evaluators' Forum, ISSCO, Geneva.

Thompson, H. & Brew, C. 1996. Automatic Evaluation of Computer-Generated Text: Final Report of the TextEval Project. Human Communication Research Centre, University of Edinburgh, Scotland.

Thorelli, H. B. 1979. The future for consumer information systems, in W. L. Wilkie (ed.), Advances in Consumer Research, Vol. 6, Association for Consumer Research, Ann Arbor, pp. 227-232.

Toma, P. and LATSEC, Inc. 1976. SYSTRAN '76: A Brief Description of the Status, Applications, Configuration, and Components of the SYSTRAN Machine Translation System. SYS/001/76/5, LATSEC, Inc. La Jolla, California.

Tomita, M. 1992. Application of the TOEFL Test to the Evaluation of Japanese-English MT. Proceedings of MT Evaluation Workshop. AAMT.

Tomita, M., Shirai, M., Tsutsumi, J., Matsumura, M. & Yoshikawa, Y. 1993. Evaluation of MT Systems by TOEFL. In Proceedings of the Theoretical and Methodological Implications of Machine Translation (TMI-93).

Turcato, D., Popowich, F., McFetridge, P., Nicholson, D., & Toole, J. 2000. Pre-processing Closed Captions for Machine Translation. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

Turk, C. 1984. A correction NL mechanism. ECAI-84 pp. 225-226.

Turner, J. A., Jarke, M., Stohr, E. A., Vassiliou, Y. & White, N. H. 1982. Using Restricted Natural Language for Data Retrieval: A Plan for Field Evaluation. Presented at NYU Symposium on User Interfaces, May 1982.

Vainio-Larsson, A. 1990. Evaluating the usability of user interfaces: Research in practice, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds.), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 323-328.

Van Leuven-Zwart, K. 1990. Translation and Original: Similarities and Dissimilarities, II. Target 2(1). 69-95.

Van Slype, G. 1978. Analyse des résultats de l'opération-pilote de pré-traduction automatique anglais-français, de janvier à mars 1978. Bureau Marcel van Dijk, CCE.

Van Slype, G. 1978. Note sur la méthodologie de notre deuxième évaluation de Systran anglais-français, Bureau Marcel Van Dijk, Bruxelles and CCE.

* Van Slype, G. 1978. Evaluation of the 1978 Version of the SYSTRAN English-French Automatic system of the Commission of the European Communities.

Van Slype, G. 1978. Second Evaluation of the SYSTRAN Automatic Translation System, Draft Report, Bureau Marcel Van Dijk, Bruxelles and CCE.

* Van Slype, G. 1979. Critical Methods for Evaluating the Quality of Machine Translation. Prepared for the European Commission Directorate General Scientific and Technical Information and Information Management. Report BR-19142. Bureau Marcel van Dijk (PDF available).

Van Slype, G. 1982. Conception d'une méthodologie générale d'évaluation de la traduction automatique. Multilingua, 1(4): 221-237.

Van Slype, G. 1979. Evaluation de la qualité de la traduction automatique, Rapport final sur Contract ML 9, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. 1979. First evaluation of the SYSTRAN French-English automatic translation system of the Commission of the European Communities. Draft Report, CCE, Luxembourg.

Van Slype, G. 1979. Première évaluation du système de traduction automatique SYSTRAN anglais-italien de la Commission des Communautés Européennes. Rapport final sur Contract ML 9, Bureau Marcel Van Dijk, Bruxelles and CCE.

Vanni, M. T. 1998. Evaluating MT Systems: Testing and Researching the Feasibility of a Task-Diagnostic Approach. ASLIB.

* Vanni, M. & Miller, K.J. 2001. Scoring Methods for Multi-Dimensional Measurement of Machine Translation Quality. Proceedings of the Workshop on MT Evaluation "Who did what to whom?" at MT Summit VIII, Santiago de Compostela, Spain, p. 21-28.

* Vanni, M. & Miller, K.J. 2002. Scaling the ISLE Framework: Use of Existing Corpus Resources for Validation of MT Metrics across Languages. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), Las Palmas de Gran Canaria, Spain, vol. 4, p. 1254-1262.

* Vanni, M. & Reeder, F., 2000. How Are You Doing? A Look at MT Evaluation. In White J.S. (Ed.): Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, AMTA 2000, Cuernavaca, Mexico, October 10-14, 2000. LNCS 1934, Springer, p.109-116.

Vasconcellos, M. 1994. MT Evaluation: Basis for Future Directions. Proceedings of a workshop sponsored by the National Science Foundation. Washington, D.C.: Association for Machine Translation.

Vasconcellos, M. (ed.) 1988. Technology as Translation Strategy. American Translators Association Scholarly Monograph Series, Vol. II, State University of New York at Binghamton (SUNY).

Vincent, D. 1985. Reading Tests in the Classroom, An Introduction. Windsor: NFER-Nelson.

Vogel, S., Niessen, S. & Ney, H. 2000. Automatic Extrapolation of Human Assessment of Translation Quality. In Maegaard, B., ed., Proceedings of the Workshop on Machine Translation Evaluation at LREC-2000. Athens, Greece.

Volk, M. 1997. Probing the lexicon in Evaluating Commercial MT Systems. ACL-EACL-97

Volk, M. Probing the lexicon in evaluating commercial MT systems. (AMTA-94?)

Voss, C. & Van Ess-Dykema, C. 2000. When is an Embedded MT System "Good Enough" for Filtering? In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

Wakao, T., & Helmreich, S. 1993. Translation of Metonymy in an Interlingual MT System. In Proceedings of the Pacific Association for Computational Linguistics (PACLING), Vancouver.

Warwick, S. 1987. An overview of post-ALPAC developments. In M. King, ed., Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial 1984. Edinburgh University Press.

Watters, P.A. & Patel, M. 1998. The iterative semantic processing paradigm: A dynamical systems metaphor for machine translation. Technical Report C/TR 98-05, Department of Computing, Macquarie University, Australia. Electronic version also available from: http://www.comp.mq.edu.au/~pwatters/ctr-9805.pdf

Way, A. 1991. A Practical Developer-Oriented Evaluation of Two MT Systems, Working Papers in Language Processing, 26.

Wehrli, E. 1998. Translating Idioms. Proceedings of ACL/COLING-98, Montreal, Quebec, Canada.

White, J. 1995. Approaches to Black Box Evaluation. In Proceedings of the MT Summit, Luxembourg.

White, J. 1997. Single measure machine translation evaluation. MS.

White, J. 1998. Evaluation of Machine Translation. A Tutorial. AMTA-98.

* White, J. 2000. Toward an Automated, Task-Based MT Evaluation Strategy. In Maegaard, B., ed., Proceedings of the Workshop on Machine Translation Evaluation at LREC-2000. Athens, Greece.

White, J. & O'Connell, T. 1996. Adaptation of the DARPA machine translation evaluation paradigm to end-to-end systems. Proceedings of AMTA-96.

White, J. & Taylor, K. 1998. A Task-oriented metric for machine translation. Proceedings of the First Language Resources and Evaluation Conference. Granada, Spain.

White, J. S. 1998. Methodology Research for MT Evaluation: The MT Proficiency Scale. FIDUL Report

* White, J., & O'Connell, T. 1994. The ARPA MT evaluation methodologies: evolution, lessons, and future approaches. Proceedings of the 1994 Conference, Association for Machine Translation in the Americas.

White, J., Doyon, J. & Talbott, S. 2000. Task Tolerance of MT Output in Integrated Text Processes. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, Washington.

White, J., Hirschman, L., Mariani, J., Martin, A., Paroubek, P., Rajman, M. & Sundheim, B. 1999. Evaluation and Assessment Techniques. In Hovy, E., Ide, N., Frederking, R., Mariani, J., Zampolli, A. (eds.), Multilingual Information Management: Current Levels and Future Abilities. Report for National Science Foundation. http://www.cs.cmu.edu/~ref/mlim/index.html

White, J. et al. 1992-94. ARPA Workshops on Machine Translation. Series of 4 workshops on comparative evaluation. PRC Inc., McLean, VA.

White, M., Cardie, C., Han, C., Kim, N., Lavoie, B., Palmer, M., Rambow, O., & Yoon, J. 2000. Towards Translingual Information Access using Portable Information Extraction. In Van Ess-Dykema, C., Voss, C., & Reeder, F., eds. Proceedings of the Workshop of Embedded Machine Translation Systems, ANLP/NAACL-2000. Association for Computational Linguistics, Seattle, WA, USA, pp. 31-37.

Whittaker, S. & Stenton, P. 1989. User studies and the design of Natural Language Systems. Proceedings of the Fourth Conference of the European Chapter of ACL, (EACL-89), Manchester, pp. 116 - 123.

Whittaker, S. & Walker, M. 1989. Comparing two user-oriented database query languages: A field study, Technical report HPL-ISC-89-060, Hewlett Packard Laboratories, Bristol.

Wilks, Y. 1992. Systran: It obviously works, but how much can it be improved? In J. Newton, ed., Computers in Translation: A Practical Appraisal. Routledge, London.

Wilks, Y., Fass, D., McDonald, J., Plate, T., & Slator, B. 1989. A Tractable Machine Dictionary as a Resource for Computational Semantics. In Boguraev & Briscoe (eds.), Computational Lexicography for Natural Language Processing. Lawrence-Erlbaum.

Wilks, Y. and LATSEC Inc. 1979. Comparative Translation Quality Analysis. Final Report. Contract F33657-77-C-0695, LATSEC Inc. La Jolla, California.

Wojcik, R. H., Harrison, P. & Bremer, J. 1993. Using bracketed parses to evaluate a grammar checking application. Proceedings of ACL93.

Woods, W. 1973. Progress in NLU - an application to lunar geology, AFIPS Conference Proceedings 42, pp. 441-450.

* XX. 1985. Trial of the Weidner Computer-Assisted Translation System, Supply and Services Canada, Bureau of Management Consulting. Project No 5-5462. Report, October 1985.

Yamauchi, S. 1999. A Method of Evaluation of the Quality of Translated Text. In Proceedings of MT-Summit VII.

Yang, J. & Lang, E. 1998. SYSTRAN on AltaVista: A User Study of Real-Time MT on the Internet. AMTA-98

Yokoyama, S. 1992. Toward a Systematic Evaluation of Machine Translation: from the Viewpoint of Natural Language Processing. Proceedings of the International Symposium on Natural Language Understanding and AI as a Part of International Symposia on Information Sciences (ISKIT '92).

Yokoyama, S. 1993. Evaluation Method of Machine Translation: From the Viewpoint of Natural Language Processing. In Proceedings of MT Summit IV.

Yokoyama, S. 1994. Collection and Classification of Sentences Difficult to Machine Translate (in Japanese). Information Processing Society of Japan (IPSJ). SIG NLP NL101-5.

Yokoyama, S. 1994. Machine Translation and Evaluation of Japanese Sentences Difficult to Translate. (in Japanese). Information Processing Society of Japan (IPSJ). NL101-6.

Yokoyama, S., Kumano, A., Matsudaira, M., Shirokizawa, Y., Kawagoe, M., Kodama, S., Kashioka, H., Ehara, T., Miyazawa, S., Nakajima, Y. 1999. Quantitative Evaluation of Machine translation using Two-way MT. In Proceedings of MT Summit-VII.

Zajac, R., Helmreich, S., & Megerdoomian, K. 2000. Black-Box / Glass-Box Evaluation in Shiraz. In Maegaard, B., ed., Proceedings of the Workshop on Machine Translation Evaluation at LREC-2000. Athens, Greece.

Zanettin, F. 1998. Bilingual Comparable Corpora and the Training of Translators. In Laviosa, S., ed., L'Approche Basée sur le Corpus / The Corpus-based Approach. Special Issue of META 43(4): 616-630.

Last modified: Mon Aug 11 12:25:45 MET DST 2003