
- Office: Information Sciences Institute 940
- Mailing address: USC Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
- Phone: (O) +1-310-448-9341; (F) +1-310-822-0751
- E-mail: my last name @isi.edu
Computer Scientist (ISI)
Research Assistant Professor (USC Computer Science)
I am interested in statistical machine translation, particularly
applications of synchronous grammars to statistical MT. During my PhD
work at the University of Pennsylvania under Aravind Joshi, I worked
on tree-adjoining grammars and related formalisms; stochastic TAG
parsing models; synchronous grammars; and formal grammars applied to
RNA and protein structure prediction.
Curriculum vitae
Current and recent activities:
Selected publications
Machine translation
- Online large-margin training of syntactic and structural translation features. With Yuval Marton and Philip Resnik, 2008. In Proc. EMNLP. [PDF]
- Decomposability of translation metrics for improved evaluation and efficient algorithms. With Steve DeNeefe, Yee Seng Chan, and Hwee Tou Ng, 2008. In Proc. EMNLP. [PDF]
- Hierarchical phrase-based translation. 2007. Computational Linguistics 33(2):201–228. [PDF] Expands on the ACL 2005 paper below with a description of the decoding algorithm ("cube pruning"), including experimental results.
- Forest rescoring: faster decoding with integrated language
models. Liang Huang and David Chiang, 2007. In Proc. ACL. [PDF]
- Word sense disambiguation improves statistical machine
translation. Yee Seng Chan, Hwee Tou Ng, and David Chiang, 2007. In
Proc. ACL. [PDF]
- A hierarchical phrase-based model for statistical machine translation. 2005. In Proc. ACL, pages 263–270. Best paper award. [PDF]
Statistical parsing
- The hidden TAG model: synchronous grammars for parsing
resource-poor languages. 2006. In Proc. TAG+8, pages 1–8. With Owen Rambow. [PDF]
- Parsing Arabic dialects. 2006. In Proc. EACL. With
Mona Diab, Nizar Habash, Owen Rambow, and Safiullah Shareef. [PDF]
- Better k-best parsing. 2005. In Proc. IWPT. Liang Huang and David Chiang. [PDF]
- Recovering latent information in treebanks. 2002. With Daniel M. Bikel. In Proc. COLING 2002, pages 183–189. [PS] [PDF]
- Statistical parsing with an automatically extracted tree adjoining grammar. 2003. In Data Oriented Parsing, CSLI Publications, pages 299–316. Combines and updates results from two papers, one from ACL 2000 and one from the co-located Chinese workshop. [PS] [PDF]
Biological sequence analysis
- Computational linguistics: a new tool for exploring biopolymer
structures and statistical mechanics. 2007. Ken A. Dill, Adam Lucas,
Julia Hockenmaier, Liang Huang, David Chiang, and Aravind K.
Joshi. Polymer 48:4289–4300.
- Grammatical representations of macromolecular
structure. 2006. J. Computational Biology 13(5):1077–1100. With Aravind
K. Joshi and David B. Searls.
- A grammatical theory for the conformational changes of simple
helix bundles. 2006. J. Computational Biology 13(1):21–42. With Aravind
K. Joshi and Ken A. Dill.
Formal grammars and their applications
- An introduction to synchronous grammars: part of a tutorial given at ACL 2006 with Kevin Knight. Notes: [PDF] Slides: [PDF]
- Some remarks on an extension of synchronous TAG. With William Schuler and Mark Dras. 2000. In Proc. TAG+5, pages 61–66. [PS] [PDF]
- The weak generative capacity of linear tree adjoining grammars. 2006. In Proc. TAG+8. [PDF]
- Evaluation of Grammar Formalisms for Applications to Natural
Language Processing and Biological Sequence Analysis, PhD
dissertation, University of Pennsylvania, 2004. Rubinoff Award. [PDF, 600k] Slides from defense: [PDF]
- Uses and abuses of intersected languages. 2004. In Proc. TAG+7, pages 9–15. [PS] [PDF]
- Putting some weakly context-free formalisms in order. 2002. In Proc. TAG+6, pages 11–18. [PS] [PDF]
Software and other stuff
- Hiero, a hierarchical phrase-based machine translation system.
- Stochastic TIG parser with configuration files for English and Chinese
- Treep, a tree processor to be used as a front-end for training statistical parsers.
- TeX/MetaPost stuff: some tree-drawing packages