An unsupervised boosting technique for refining word alignment

Abstract

Translation rules extracted from automatic word alignment form the basis of statistical machine translation (SMT) systems. An unsupervised expectation-maximization (EM) algorithm is typically used to obtain a word alignment from parallel corpora. Being statistically driven, the alignments produced by this technique are often erroneous. In this paper, we propose an unsupervised boosting strategy for refining automatic word alignment with the goal of improving SMT performance. The proposed approach results in fewer unaligned words, a significant reduction in the number of extracted translation phrase pairs, a corresponding improvement in SMT decoding speed, and a consistent improvement in translation accuracy, as measured by BLEU, across multiple language pairs and test sets. The reduction in storage and processing requirements coupled with improved accuracy makes the proposed technique ideally …

Date: December 12, 2010
Authors: Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan
Conference: 2010 IEEE Spoken Language Technology Workshop
Pages: 177-182
Publisher: IEEE
