Publications
Phrase alignment confidence for statistical machine translation.
Abstract
The performance of phrase-based statistical machine translation (SMT) systems is crucially dependent on the quality of the extracted phrase pairs, which is in turn a function of word alignment quality. Data sparsity, an inherent problem in SMT even with large training corpora, often has an adverse impact on the reliability of the extracted phrase translation pairs. In this paper, we present a novel feature based on bootstrap resampling of the training corpus, termed phrase alignment confidence, that measures the goodness of a phrase translation pair. We integrate this feature within a phrase-based SMT system and show an improvement of 1.7% BLEU and 4.4% METEOR over a baseline English-to-Pashto (E2P) SMT system that does not use any measure of phrase pair quality. We then show that the proposed measure compares well to an existing indicator of phrase pair reliability, the lexical smoothing probability. We also demonstrate that combining the two measures leads to a further improvement of 0.4% BLEU and 0.3% METEOR on the E2P system. Commensurate translation improvements are obtained on automatic speech recognition (ASR) transcripts of the source speech utterances.
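The abstract does not spell out how phrase alignment confidence is computed, so the following is only a minimal sketch of one plausible reading: resample the sentence pairs with replacement, re-run phrase extraction on each resample, and score each phrase pair by the fraction of resamples in which it is extracted. The names `bootstrap_phrase_confidence` and `toy_extractor` are hypothetical, and the toy extractor (word pairs that co-occur in at least `min_count` sentence pairs) is a stand-in for a real word-alignment and phrase-extraction pipeline, not the authors' method.

```python
import random
from collections import Counter


def toy_extractor(sample, min_count=2):
    """Hypothetical stand-in for phrase extraction: keep (source, target)
    word pairs that co-occur in at least `min_count` sentence pairs."""
    cooc = Counter()
    for src, tgt in sample:
        for s in set(src.split()):
            for t in set(tgt.split()):
                cooc[(s, t)] += 1
    return [pair for pair, count in cooc.items() if count >= min_count]


def bootstrap_phrase_confidence(corpus, extract_phrase_pairs,
                                num_resamples=100, seed=0):
    """Assumed definition: confidence of a pair is the fraction of
    bootstrap resamples in which the extractor produces that pair."""
    n = len(corpus)
    if n == 0:
        return {}
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(num_resamples):
        # Draw n sentence pairs with replacement from the training corpus.
        resample = [corpus[rng.randrange(n)] for _ in range(n)]
        # Count each pair at most once per resample.
        hits.update(set(extract_phrase_pairs(resample)))
    return {pair: count / num_resamples for pair, count in hits.items()}
```

Under this reading, a phrase pair supported by many sentence pairs survives in nearly every resample and scores close to 1, while a pair that depends on one or two sparse examples is extracted only intermittently and scores lower. Such a score could then be added as one more feature alongside the usual phrase translation features.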
- Date: September 26, 2010
- Authors: Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan
- Conference: INTERSPEECH
- Pages: 2878-2881