Translation memory incorporated

Definition

A translation memory is an archive of multilingual texts that allows aligned multilingual text segments to be stored and retrieved against various search conditions.

Translation memories differ in the information they store along with the raw texts and in their retrieval methods. This definition does not restrict translation memory to what is currently available in systems on the market.

A translation memory is a collection of multilingual correspondences with optional control information stored with each correspondence. This characterization abstracts away from the actual manner of storing the correspondences (one-one, one-many, or many-many).

The control information can include details about the source text of the correspondence, such as its date, author, company, and subject domain. This information may be used in ranking matches.

When a translation memory is used to support a given direction of translation, we can identify one segment of each correspondence as the (stored) source segment and another as the (stored) target segment. A given query with a current source segment may return a number of correspondences with matching stored source segments (EAGLES).
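The characterization above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `Correspondence` and `TranslationMemory`, the use of Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and the 0.7 threshold. It is not taken from EAGLES or from any particular product.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Correspondence:
    """One stored correspondence (hypothetical layout)."""
    segments: dict                                 # language code -> segment text
    control: dict = field(default_factory=dict)    # optional: date, author, domain, ...


class TranslationMemory:
    """A flat list of correspondences with similarity-based retrieval."""

    def __init__(self):
        self.entries = []

    def add(self, correspondence):
        self.entries.append(correspondence)

    def query(self, text, source_lang, target_lang, threshold=0.7):
        """Return (score, stored source, stored target, control) tuples, best first."""
        matches = []
        for entry in self.entries:
            source = entry.segments.get(source_lang)
            target = entry.segments.get(target_lang)
            if source is None or target is None:
                continue  # entry does not cover this translation direction
            # Assumed similarity measure: character-level ratio in [0, 1].
            score = SequenceMatcher(None, text.lower(), source.lower()).ratio()
            if score >= threshold:
                matches.append((score, source, target, entry.control))
        matches.sort(key=lambda m: m[0], reverse=True)
        return matches


tm = TranslationMemory()
tm.add(Correspondence({"en": "Close the valve.", "fr": "Fermez la vanne."},
                      {"domain": "maintenance"}))
tm.add(Correspondence({"en": "Open the window.", "fr": "Ouvrez la fenêtre."}))

# The current source segment differs slightly from the stored source segment.
hits = tm.query("Close the valves.", "en", "fr")
for score, source, target, control in hits:
    print(round(score, 2), source, "->", target, control)
```

Because each entry maps language codes to segments, the same store can serve either translation direction. The control information is returned with each match, so a caller could use it to re-rank results, for example by preferring a given subject domain.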

Metrics

Metric: The developer should provide a description of the incorporation of translation memory and how it fits into the MT process. -- Method: Provision of supporting documentation -- Measurement: Yes or no: Does the documentation describe the role and function of translation memory?

Metric: Size of parallel corpus -- Method: Developer report of parallel corpus size -- Measurement: Size can be reported in bytes, sentence pairs, or words per language.
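As a rough illustration, the three size measures could be computed from a list of aligned segment pairs as follows. The function name `corpus_size`, the UTF-8 encoding, and whitespace tokenization are assumptions. Whitespace splitting in particular is only a crude word count and does not suit languages written without spaces.

```python
def corpus_size(pairs, encoding="utf-8"):
    """Report size of a parallel corpus given (source, target) segment pairs."""
    size = {
        "sentence_pairs": 0,
        "source": {"bytes": 0, "words": 0},
        "target": {"bytes": 0, "words": 0},
    }
    for source, target in pairs:
        size["sentence_pairs"] += 1
        for side, text in (("source", source), ("target", target)):
            size[side]["bytes"] += len(text.encode(encoding))
            size[side]["words"] += len(text.split())  # assumed: whitespace tokens
    return size


pairs = [
    ("Close the valve.", "Fermez la vanne."),
    ("Open it.", "Ouvrez-le."),
]
report = corpus_size(pairs)
print(report)
```

Reporting words per language separately matters because the two sides of a parallel corpus rarely have the same word count, as the second pair shows.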

Metric: Form and number of text segments -- Method: Specification by developer of form, granularity and number of text segments -- Measurement: 1) Confirmation by test or examination of form and granularity of text segments. 2) Count of number of text segments. 3) Test suites may be designed and executed in which case the measurement is percentage of test suite cases accepted.

Metric: Type of control information permitted -- Method: Specification by developer of type of control information permitted. -- Measurement: Inspection of specifications. Number of specified control settings which work.

Metric: Source language matching technique -- Method: Specification of source language matching technique and parameters. -- Measurement: 1) Yes or no: Is specification provided? 2) Use test suite to test coverage and flexibility of source language matching algorithm.
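The second measurement can be sketched as a small test harness. Each test case pairs a query segment with the stored source segment it is expected to retrieve, or `None` if no match is expected, and the result is the percentage of cases that behave as expected. The matcher shown here (`difflib.SequenceMatcher` with a 0.7 threshold) is only a placeholder for whatever technique the developer specifies. The function names and test cases are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(query, stored_sources, threshold=0.7):
    """Return the most similar stored source segment, or None below threshold."""
    best, best_score = None, 0.0
    for stored in stored_sources:
        score = SequenceMatcher(None, query.lower(), stored.lower()).ratio()
        if score >= threshold and score > best_score:
            best, best_score = stored, score
    return best


def match_coverage(test_suite, stored_sources, threshold=0.7):
    """Percentage of (query, expected) cases where the matcher returns expected."""
    passed = sum(
        1 for query, expected in test_suite
        if best_match(query, stored_sources, threshold) == expected
    )
    return 100.0 * passed / len(test_suite)


stored = ["Close the valve.", "Open the window."]
suite = [
    ("Close the valves.", "Close the valve."),          # inflectional variant
    ("open the window", "Open the window."),            # case and punctuation
    ("Replace the filter.", None),                      # should not match anything
    ("The valve must be closed.", "Close the valve."),  # paraphrase with reordering
]
coverage = match_coverage(suite, stored)
print(f"{coverage:.1f}% of test cases matched as expected")
```

Grouping test cases by the kind of variation they probe (inflection, punctuation, word order) makes it easier to see where a matching algorithm is flexible and where it is not. A purely character-based measure like this one tends to miss reordered paraphrases such as the last case.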

Metric: Ease of extending parallel corpus. -- Method: Test corpus / test suite application -- Measurement: Percentage of test items that can be added

References

EAGLES Evaluation Standard for Translation Memory

Notes

The incorporation of translation memory into traditional machine translation platforms is a relatively new and under-represented field of study, although a few examples exist (AMTA-2002).

