Well-formedness

Definition
Degree to which the output respects the reference rules of the target language at the specified linguistic level.
Metrics

Percentage of phenomena correctly treated.

List of error types.

Average string edit distance per sentence or for all tokens in the text.
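
The first and third metrics can be computed mechanically once reference translations or annotated phenomena are available. The following is a minimal sketch, not a prescribed procedure. It assumes whitespace tokenization, unit-cost Levenshtein distance over tokens, and normalization of the whole-text variant by the number of reference tokens. The function names are illustrative and do not come from the sources cited here.

```python
from typing import List, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop token h from the output
                            curr[j - 1] + 1,      # insert reference token r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def total_edit_distance(hyps: List[str], refs: List[str]) -> int:
    """Sum of token-level edit distances over aligned sentence pairs."""
    if len(hyps) != len(refs) or not hyps:
        raise ValueError("need equal, non-zero numbers of sentences")
    return sum(edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))


def average_edit_distance_per_sentence(hyps: List[str], refs: List[str]) -> float:
    """Average number of token edits per sentence."""
    return total_edit_distance(hyps, refs) / len(hyps)


def edit_distance_per_token(hyps: List[str], refs: List[str]) -> float:
    """Token edits divided by the reference token count (assumed normalization)."""
    ref_tokens = sum(len(r.split()) for r in refs)
    if ref_tokens == 0:
        raise ValueError("reference text is empty")
    return total_edit_distance(hyps, refs) / ref_tokens


def percent_correct(n_correct: int, n_total: int) -> float:
    """Percentage of phenomena correctly treated."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total
```

For example, with the outputs "the cat sat on mat" and "he go home" scored against the references "the cat sat on the mat" and "he goes home", each sentence needs one edit, so the per-sentence average is 1.0 and the per-token figure is 2/9.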

References

Flanagan, 1994 (see also the LOGOS error list in the same AMTA proceedings).

Loffler-Laurian, 1983 (in French).

See also Arnold et al., eds., 1993 ('Machine Translation', vol. 8:1-2, special issue on evaluation).

Notes
We include here only the four most critical categories of error typically made by MT systems, though a more detailed classification is very often used. For example, SYSTRAN uses at least seven types of error to rank the quality of the output: segmentation / tokenization, morphological analysis, homograph analysis, syntactic analysis, target language word selection, target language morphology, target language word order, target language grammar. All these errors are rated for severity, which ranges from "cosmetic" to "serious"; an error is serious when, for example, the meaning of the original word or phrase is completely lost (L. Gerber, personal communication).
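
A list of error types rated for severity can be kept as simple annotation records and tallied. The sketch below is illustrative only and is not SYSTRAN's actual scheme or tooling. The category names are taken from the note above, reading the last two (word order and grammar) as separate categories, which is an assumption. Only the two severity endpoints named in the note are modelled, since any intermediate levels are not specified here. The class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Category names as listed in the note (assumed to be eight separate types).
ERROR_TYPES = (
    "segmentation/tokenization",
    "morphological analysis",
    "homograph analysis",
    "syntactic analysis",
    "target language word selection",
    "target language morphology",
    "target language word order",
    "target language grammar",
)

# Only the endpoints of the severity scale are named in the note.
SEVERITIES = ("cosmetic", "serious")


@dataclass(frozen=True)
class ErrorAnnotation:
    """One annotated error in one output sentence (hypothetical record)."""
    sentence_id: int
    error_type: str
    severity: str

    def __post_init__(self) -> None:
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")


def tally(annotations: Iterable[ErrorAnnotation]) -> Counter:
    """Count errors per (error type, severity) pair."""
    return Counter((a.error_type, a.severity) for a in annotations)


annotations = [
    ErrorAnnotation(1, "target language word order", "cosmetic"),
    ErrorAnnotation(1, "homograph analysis", "serious"),
    ErrorAnnotation(2, "homograph analysis", "serious"),
]
counts = tally(annotations)
```

Here `counts[("homograph analysis", "serious")]` is 2, and pairs that never occur count as 0.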
