University of Southern California

Introducing a new syntax form and a new decoding algorithm to syntax-based translation

When:
Thursday, February 7, 2013, 1:30 pm - 2:30 pm
Where:
11th Floor Conf. Room (#1135)
Speaker:
Yang Feng
Description:

Abstract:

State-of-the-art syntax-based models are now widely used in statistical machine translation. They fall into two categories, formally syntax-based models (e.g. SCFG) and linguistically syntax-based models, each with its own strengths. We propose a compromise, a chunk-to-string translation model, which combines the merits of both categories. On the NIST 2008 English-Chinese translation task, the chunk-to-string system outperforms the hierarchical phrase-based system and the tree-to-string system by about 1 and 2 BLEU points, respectively. For linguistically syntax-based models, specifically the tree-to-string model, we present a left-to-right decoding algorithm that employs a bottom-up parsing strategy and dynamic future cost estimation. Experimental results show that, at a comparable speed, the new algorithm achieves better performance than the top-down left-to-right decoding algorithm and the post-order traversal decoding algorithm.

Bio: 

Dr. Yang Feng is currently a Research Associate at the University of Sheffield. She received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences in 2011, supervised by Prof. Qun Liu. She works mainly on statistical machine translation and machine learning, and has published several papers at ACL, EMNLP, and COLING on decoding algorithms, translation models, and system combination, among other topics. She is now focusing on non-parametric Bayesian models, which aim to provide a new framework for statistical machine translation.

Related papers:

chunk-to-string; left-to-right decoding

Webcast:

http://webcasterms1.isi.edu/mediasite/Viewer/?peid=e571a09057764fd08b597d890de2fbe41d
