Subjective evaluation of the degree to which the information contained in the original text has been reproduced without distortion in the translation (Van Slype).
Measurement of the correctness of the information transferred from the source language to the target language (Halliday in Van Slype's Critical Report).
Carroll (in Van Slype's Critical Report): Rating of sentences read out of context on a 9-point scale.
Crook and Bishop (in Van Slype's Critical Report): Rating on a 25-point scale.
Halliday (in Van Slype's Critical Report): Assessment of the correctness of the information transferred.
Leavitt (in Van Slype's Critical Report): Rating of text units read on a 9-point scale.
Miller and Beebe-Center (in Van Slype's Critical Report): Rating of a text on a 100-point scale.
Miller and Beebe-Center (in Van Slype's Critical Report): Shannon measurement of the quality of information transferred.
Sinaiko (in Van Slype's Critical Report): Re-translation.
Van Slype (in Van Slype's Critical Report): Rating of sentences read on a 4-point scale.
White and O'Connell (in DARPA 94): Rating of 'Adequacy' on a 5-point scale.
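The rating methods listed above use scales of very different lengths (4, 5, 9, 25 and 100 points), so their raw scores cannot be compared directly. The following is a minimal sketch of one common way to put them on a common footing: map each rating linearly onto [0, 1] and average. It assumes higher ratings mean better fidelity and that scales start at 1 unless told otherwise. The function name and the sample ratings are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean


def normalized_score(ratings: list[int], scale_max: int, scale_min: int = 1) -> float:
    """Map ratings on a [scale_min, scale_max] scale onto [0, 1] and average them.

    Assumption: higher ratings mean better fidelity.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    span = scale_max - scale_min
    if span <= 0:
        raise ValueError("scale_max must be greater than scale_min")
    for r in ratings:
        if not scale_min <= r <= scale_max:
            raise ValueError(f"rating {r} outside [{scale_min}, {scale_max}]")
    # Linear rescaling: scale_min -> 0.0, scale_max -> 1.0.
    return mean((r - scale_min) / span for r in ratings)


if __name__ == "__main__":
    # Hypothetical judge ratings, for illustration only.
    print(normalized_score([9, 5, 1], scale_max=9))  # 0.5 (9-point scale)
    print(normalized_score([5, 4, 3], scale_max=5))  # 0.75 (5-point scale)
```

Linear rescaling preserves the order of ratings within one scale. It does not make a "7 out of 9" from one rating procedure mean the same thing as "75 out of 100" from another, because the judging instructions also differ.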
BLEU evaluation toolkit (in Papineni et al. 2001): Automatic n-gram comparison of translated sentences with one or more human reference translations.
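The n-gram comparison behind BLEU can be sketched as follows. The sketch assumes whitespace-tokenized input and computes a simplified *sentence-level* score: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. The original metric aggregates counts over a whole corpus, and practical toolkits add tokenization rules and smoothing that are omitted here. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: list[str], references: list[list[str]], max_n: int = 4) -> float:
    """Simplified sentence-level BLEU (no smoothing)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        if not cand_counts:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # log(0) is undefined; this sketch does not smooth
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    print(sentence_bleu("the cat is on the mat".split(), [ref]))
    print(sentence_bleu("the cat is on".split(), [ref]))
```

Clipping stops a system from gaining credit by repeating a correct word. The brevity penalty stops it from gaining credit by emitting only the few words it is sure of. Both address precision only, which is why BLEU is usually read as a proxy for fidelity rather than a direct measure of it.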
Rank-order evaluation of MT systems (Hartley and Rajman 2001 and 2002): Correlation of automatically computed semantic and syntactic attributes of the MT output with human scores for adequacy, informativeness and fluency.
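Rank-order evaluation asks whether an automatic measure orders systems the same way human judges do. One standard statistic for that question is Spearman's rank correlation, sketched below as Pearson correlation computed on ranks, with tied values given their average rank. This is a generic illustration. The statistic Hartley and Rajman actually used, and the system scores passed in, are assumptions here.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation between two score lists of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical per-system scores: an automatic metric vs. mean human adequacy.
    metric = [0.21, 0.35, 0.28, 0.40]
    human = [2.9, 3.6, 3.1, 3.8]
    print(spearman(metric, human))
```

Working on ranks means a metric only has to agree with the human ordering of systems, not with the absolute size of the human scores.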
Automated word-error-rate evaluation (in Och, Tillmann and Ney, 1999).
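Word error rate is conventionally the word-level Levenshtein distance between the system output and a reference translation, divided by the number of words in the reference. The sketch below computes that standard single-reference form with whitespace tokenization. Whether Och, Tillmann and Ney used exactly this variant (for example, with multiple references) is not stated here and should be checked against their paper.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous hypothesis prefix and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion)
                curr[j - 1] + 1,         # missing reference word (deletion)
                prev[j - 1] + (h != r),  # substitution, or free match
            )
        prev = curr
    return prev[len(ref)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the cat sat"))
    print(word_error_rate("a cat sat on", "the cat sat"))
```

Only two rows of the dynamic-programming table are kept, so memory grows with the reference length rather than with the product of both lengths.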
Automated metric using head transducers (Alshawi et al., 2000).
Loss of information (silence): for example, a word that is not translated.
Interference (noise): for example, a word added by the system.
Distortion, a combination of loss and interference: for example, a word that is badly translated.
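The three error types can be roughly approximated automatically by aligning the system output with a human reference translation. In this approximation, reference words with no counterpart count as silence, output words with no counterpart count as noise, and substituted words count as distortion. The sketch below does this with Python's `difflib.SequenceMatcher`. It is a hypothetical illustration. The categories as described above are judged by a human against the *source* text, and a surface alignment cannot tell a badly translated word from a legitimate paraphrase.

```python
from difflib import SequenceMatcher


def fidelity_error_counts(output: str, reference: str) -> dict[str, int]:
    """Approximate silence/noise/distortion counts from a word alignment (hypothetical)."""
    out, ref = output.split(), reference.split()
    counts = {"silence": 0, "noise": 0, "distortion": 0}
    matcher = SequenceMatcher(a=ref, b=out, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":  # reference words absent from the output
            counts["silence"] += i2 - i1
        elif tag == "insert":  # output words with no reference counterpart
            counts["noise"] += j2 - j1
        elif tag == "replace":
            # Pair words one-to-one as distortions; any surplus on either side
            # is treated as silence (reference side) or noise (output side).
            paired = min(i2 - i1, j2 - j1)
            counts["distortion"] += paired
            counts["silence"] += (i2 - i1) - paired
            counts["noise"] += (j2 - j1) - paired
    return counts


if __name__ == "__main__":
    ref = "the cat sat"
    print(fidelity_error_counts("the cat", ref))          # one word lost
    print(fidelity_error_counts("the big cat sat", ref))  # one word added
    print(fidelity_error_counts("the dog sat", ref))      # one word mistranslated
```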
Detailed analysis of the fidelity of a translation is very difficult to carry out. A sentence conveys neither a single item of information nor a series of elementary items of information. It conveys a portion of a message, or a series of complex messages, and the relative importance of each within the sentence is hard to assess.
Some automated metrics use human fidelity judgments as their ground truth, or are otherwise relevant to fidelity evaluation.