Cross-language / Contrastive suitability

Definition

Qualities of the translation that must be evaluated on the basis of both the source language and the output of the system in the target language.

Suitability of source-to-target mapping to a particular task.

Coverage of cross-language phenomena concerns the ability of the system to deal satisfactorily with the commonly recognized differences between the source and target languages, whether or not these phenomena occur in any particular corpus.

Metrics

One approach is to use a set of test patterns. These should be simple source-language patterns that are theory-neutral, that is, described in pedagogical terms rather than in terms of a particular syntactic theory whose principles could obscure the issue. For a number of European languages, such test suites are available as products of the TSNLP project, which focused mainly on syntactic phenomena, and the DiET project. The Japanese MT research community has also produced such test suites as part of the JEIDA project.

Commercial MT companies should also all have similar test suites: Logos and Systran both have test suites of this type. IBM has relevant test suites that were presented to the research community at LREC2000 in Athens and ACL-2001 in Toulouse.

To arrive at a measurement, test suites of this type can be scored with a correct/incorrect verdict for each sentence, a percentage correct for each sentence (as long as the notion of "percentage correct" is well-defined), or a scale of correctness (3 to 10 points) for each sentence. The aggregate measure could be the percentage of sentences correct, the percentage of linguistic phenomena covered, or an aggregate measure of linguistic phenomena covered, weighted for phenomena important to the language pair and task of interest.
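The three aggregate measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each test sentence is tagged with a single phenomenon and carries a score normalized to the range 0 to 1, a sentence counts as correct when its score reaches a threshold, and a phenomenon counts as covered only when all of its test sentences are correct. The function name, the phenomenon labels, and the weights are hypothetical and do not come from any of the test suites mentioned here.

```python
from collections import defaultdict


def aggregate_scores(results, weights=None, threshold=1.0):
    """Aggregate per-sentence test-suite verdicts.

    results:   list of (phenomenon, score) pairs, score in [0, 1]
               (1.0 = correct, 0.0 = incorrect; scale scores must be
               normalized by the caller).
    weights:   optional dict mapping phenomenon -> importance weight
               (hypothetical; defaults to 1.0 for every phenomenon).
    threshold: minimum score for a sentence to count as correct
               (an assumption of this sketch).
    """
    if not results:
        raise ValueError("results must not be empty")
    weights = weights or {}

    # Group sentence-level verdicts by the phenomenon they test.
    by_phenomenon = defaultdict(list)
    for phenomenon, score in results:
        by_phenomenon[phenomenon].append(score >= threshold)

    # Measure 1: percentage of sentences correct.
    n_correct = sum(sum(verdicts) for verdicts in by_phenomenon.values())
    sentence_accuracy = n_correct / len(results)

    # Assumption: a phenomenon is covered only if every one of its
    # test sentences was judged correct.
    covered = {p for p, verdicts in by_phenomenon.items() if all(verdicts)}

    # Measure 2: percentage of phenomena covered (unweighted).
    phenomena_coverage = len(covered) / len(by_phenomenon)

    # Measure 3: coverage weighted by importance to the language pair/task.
    total_weight = sum(weights.get(p, 1.0) for p in by_phenomenon)
    covered_weight = sum(weights.get(p, 1.0) for p in covered)
    weighted_coverage = covered_weight / total_weight

    return sentence_accuracy, phenomena_coverage, weighted_coverage


# Hypothetical verdicts for three phenomena.
example_results = [
    ("passive", 1.0), ("passive", 1.0),
    ("idiom", 0.0), ("idiom", 1.0),
    ("clitic", 1.0),
]
example_weights = {"passive": 1.0, "idiom": 3.0, "clitic": 1.0}
accuracy, coverage, weighted = aggregate_scores(example_results, example_weights)
```

In this example four of five sentences are correct, two of three phenomena are covered, and the weighted coverage is lower than the unweighted one because the uncovered phenomenon carries the largest weight. Other definitions of "covered" (for example, a mean score above some cutoff) are equally plausible and would change the second and third figures.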

It is also possible to use word error rate as a measurement, that is, automatic scoring of insertions, deletions, and substitutions relative to a gold standard (Niessen, Och, Leusch, and Ney, 2000), or as described in (Vanni & Miller, 2002) and (Vanni & Miller, 2001).
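As a rough illustration of word error rate, the sketch below computes the word-level edit distance (insertions, deletions, and substitutions) between a system output and a single gold-standard translation, divided by the length of the reference. It assumes whitespace tokenization and one reference per sentence; it is a generic Levenshtein-based formulation and is not claimed to reproduce the exact procedure of the papers cited above. The function name is hypothetical.

```python
def word_error_rate(hypothesis, reference):
    """Word-level edit distance from reference to hypothesis,
    normalized by the number of reference words."""
    hyp = hypothesis.split()  # assumption: whitespace tokenization
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion: reference word missing
                curr[j - 1] + 1,                  # insertion: extra hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)


# One reference word ("the") is missing from the output.
wer = word_error_rate("the cat sat on mat", "the cat sat on the mat")
```

Note that the value can exceed 1.0 when the system output is much longer than the reference, and that a single reference penalizes legitimate alternative translations, which is one reason to pair it with the test-suite measures above.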

References

Dorr, 1990a; Dorr, 1990b.

Notes

Whereas TSNLP and some other test suites of this type focus mainly on syntactic phenomena, test suites for general cross-language coverage should ideally address other cross-language phenomena as well: idioms, lexical and conflational divergences, etc.

Each commercial MT company should have such a test suite, which it may use for regression testing or for testing improvements to the system. Ideally, to compare systems from developers A and B, a test set covering the union of the phenomena covered by the two developers' test suites should be used.

