Abstract:
Word alignment, the process of inferring the implicit links between words across two languages, is an integral piece of the puzzle of learning linguistic translation knowledge. It enables us to acquire automatically from data the rules that govern the transformation of words, phrases, and syntactic structures from one language to another. Though in this work we focus on word alignment for machine translation, it is also used in many related tasks such as bilingual dictionary induction, cross-lingual information retrieval, and the extraction of parallel text from noisy data.
We advance the state of the art in the search, modeling, and learning of alignments, and show empirically that, taken together, these contributions significantly improve the output quality of large-scale statistical machine translation. The work we describe is widely applicable because it is independent of the languages involved, yet flexible and modular enough to encode arbitrary language-specific knowledge from varied sources.