Artificial Intelligence

More than the sum of their parts: Translating idioms without destroying their meaning

Thursday, September 05, 2019, 11:00am - 12:00pm PDT
CR# 689
This event is open to the public.
NL Seminar
Denis Emelin and Prince Wang

Abstract: Translating idioms is hard. As low-frequency linguistic events with non-compositional meaning, idiomatic expressions are at odds with contemporary neural machine translation (NMT) methods. Accordingly, literal translation of idiomatic phrases, which fails to preserve their semantic content, is a frequently observed failure case in NMT models. To facilitate future work on idiom translation, the current project sets out to compile a large-coverage, multilingual corpus of parallel sentences containing idiomatic expressions, augmented with their respective monolingual definitions. With this resource in hand, we next aim to propose models that can effectively exploit idiom definitions to avoid literal translation errors. As part of the evaluation of the constructed corpus, we demonstrate that idioms continue to pose a substantial challenge for state-of-the-art NMT models.

Bio: Denis is a second-year PhD candidate at the University of Edinburgh, advised by Dr. Rico Sennrich. His background is in machine translation, natural language understanding, and linguistics.
