We are looking for interested and qualified students (graduate and undergraduate) to spend the summer working on ongoing research projects at USC/ISI in natural language processing, machine learning, statistical modeling, machine translation, automata, and other areas. Descriptions of some of these projects appear below. These are paid internships. Prior experience in natural language processing is not necessarily required: we will get you interested! Good programming skills are required for the types of projects we do.
Internships will be available for a three-month period during the summer of 2009. We arrange a time for joint tutorials at the beginning of the summer, and we have presentations at the end.
How to apply:
We plan to make decisions over the week of March 2-8, 2009.
We have summer research projects in the following areas:
1. Statistical Machine Translation
Translating human languages (e.g., Chinese to English) is a longstanding challenge for computer science. We are developing statistical algorithms to tackle this problem, extracting large amounts of relevant translation knowledge automatically from bilingual text (e.g., United Nations documents). Our research group is investigating the combination of tree-transformation models and machine-learning techniques in order to exploit syntactic structure to improve automatic translation quality.
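To make the idea of extracting translation knowledge automatically from bilingual text concrete, here is a minimal sketch of IBM Model 1, a classic word-based model trained with expectation-maximization. It is far simpler than the syntax-based tree-transformation models described above and is shown only as an illustration, not as the group's method. The function name `train_model1`, the helper `best_translation`, and the three-sentence German-English toy corpus are all assumptions made for this example.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(e, f)] = t(e | f) with EM.

    `pairs` is a list of (foreign_words, english_words) sentence pairs.
    """
    english_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over English words.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (e, f) links
        total = defaultdict(float)   # expected count of links from f
        # E-step: spread each English word over the foreign words in its pair.
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

def best_translation(t, f):
    """Return the English word with the highest estimated t(e | f)."""
    candidates = {e: p for (e, ff), p in t.items() if ff == f}
    return max(candidates, key=candidates.get)

# Hypothetical toy bilingual corpus (German-English).
toy_pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_model1(toy_pairs, iterations=10)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied: the word correspondences emerge only from which words co-occur across sentence pairs. Real systems train on far larger corpora and use much richer models.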
2. Decipherment
Two of the first applications for computers were (1) cracking codes, and (2) translating human languages. Statistical properties of human language were essential for early breakthroughs in decipherment, and are now essential in machine translation. If we treat foreign languages as an encrypted form of English (a kind of word substitution/transposition cipher), then we can apply code-cracking technology to translation. It may therefore be possible to extract translation dictionaries from large quantities of non-parallel linguistic data.
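As a toy illustration of how statistical properties of language can drive decipherment, the sketch below attacks a simple letter-substitution cipher by matching symbol frequency ranks against a reference text. This is a classical first-pass technique and not necessarily the approach used in the project. The function names (`frequency_rank`, `guess_key`, `decipher`) and the tiny example strings are hypothetical.

```python
from collections import Counter

def frequency_rank(text):
    """Return the letters of `text`, most frequent first (ties broken alphabetically)."""
    counts = Counter(ch for ch in text if ch.isalpha())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ch for ch, _ in ordered]

def guess_key(ciphertext, reference_text):
    """Pair the k-th most frequent cipher symbol with the k-th most frequent reference letter."""
    return dict(zip(frequency_rank(ciphertext), frequency_rank(reference_text)))

def decipher(ciphertext, key):
    """Apply the guessed key, leaving unknown symbols (such as spaces) unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext)

# Hypothetical example: the ciphertext hides "tee tee tea" under t->x, e->q, a->m.
ciphertext = "xqq xqq xqm"
reference = "tee tee eat"   # stand-in for a large sample of the plaintext language
key = guess_key(ciphertext, reference)
print(decipher(ciphertext, key))  # prints "tee tee tea"
```

Rank matching only works when the frequency profiles line up closely. Practical decipherment typically replaces it with a probabilistic language model and a search or EM procedure over possible keys.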
USC/ISI's natural language research environment includes weekly seminars and reading groups, opportunities for teaching and advising, an active program for summer students, large quantities of linguistic resources, and a 108-processor supercomputing cluster dedicated to natural language research, plus desktop workstations.
For more information about NLP research activities at USC/ISI, please click here. ISI is an academic research institute that is part of USC's School of Engineering; many USC/ISI scientists hold research faculty positions in the computer science department. Click here to see a range of artificial intelligence research projects in ISI's Intelligent Systems Division. USC/ISI is located in Marina del Rey on the Southern California coast, an excellent location convenient to beaches, restaurants, boating, bike paths, and shopping.
Previous summer students:
2008: Amittai Axelrod (University of Washington), John DeNero (Berkeley), Kyle Gorman (UPenn), Catalin Tirnauca (Universitat Rovira i Virgili).
2007: Michael Bloodgood (Delaware), Jennifer Gillenwater (Rice University), Carmen Heger (Dresden), Wei Ho (Princeton).
2006: Joseph Turian (NYU), Chenhai Xi (Pitt), Victoria Fossum (Michigan), Liang Huang (UPenn), Jason Riesa (JHU), Oana-Diana Postolache (Saarland).
2005: Victoria Fossum (Michigan), Mark Hopkins (UCLA), Liang Huang (UPenn), Behrang Mohit (Pitt), Preslav Nakov (Berkeley), Jason Riesa (JHU), Hao Zhang (Rochester).
2004: Madhur Ambastha (Rochester), Michel Galley (Columbia), David Kauchak (UCSD).
2003: Michel Galley (Columbia), Mark Hopkins (UCLA), Beata Klebanov (Hebrew University), Ana-Maria Popescu (University of Washington), Lara Taylor (UCSD).
2002: Chris Ackerman (USC), Emil Ettelaie (USC), Yuling Hsueh (USC), John Lee (Waterloo/MIT), Bo Pang (Cornell).
2001: Abdessamad Echihabi (USC), Hal Daume III (CMU), Michael Laszlo (Waterloo), Dragos Stefan Munteanu (Iowa), Rebecca Rees (BYU), Radu Soricut (Iowa).
1994-2000: Estibaliz Amorrortu, Vasileios Hatzivassiloglou (Columbia), Michael Jahr (Stanford), Larry Kite (USC), Magdalena Romera (USC), Maki Watanabe (USC).
Papers co-authored by previous summer students:
"Scalable Inference and Training of Context-Rich Syntactic Models" (M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer), Proceedings of the ACL Conference, poster session (ACL-2006).
"Synchronous Binarization for Machine Translation" (H. Zhang, L. Huang, D. Gildea, and K. Knight), Proceedings of the NAACL-HLT Conference (2006).
"Statistical Syntax-Directed Translation with Extended Domain of Locality" (L. Huang, K. Knight, and A. Joshi), Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA-06).
"Text Simplification for Information Seeking Applications", (B. Beigman Klebanov, K. Knight, D. Marcu), In: On the Move to Meaningful Internet Systems, eds. R. Meersman and Z. Tari, Lecture Notes in Computer Science (3290), Springer-Verlag, 2004.
"What's in a Translation Rule?" (M. Galley, M. Hopkins, K. Knight, and D. Marcu), Proceedings of the North American ACL Conference (NAACL-2004).
"Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences" (B. Pang, K. Knight, and D. Marcu), Proceedings of the North American ACL Conference (NAACL-2003).
"Using a Large Monolingual Corpus to Improve Translation Accuracy" (R. Soricut, K. Knight, and D. Marcu), Proceedings of the 6th Association for Machine Translation in the Americas Conference (AMTA-2002).
"Processing Comparable Corpora With Bilingual Suffix Trees" (D. Munteanu and D. Marcu). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), Philadelphia, PA.
"A Noisy-Channel Model for Document Compression" (H. Daume III and D. Marcu). Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), Philadelphia, PA.
"An Unsupervised Approach to Recognizing Discourse Relations" (D. Marcu and A. Echihabi). Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), Philadelphia, PA.
"Fast Decoding and Optimal Decoding for Machine Translation" (U. Germann, M. Jahr, K. Knight, D. Marcu, and K. Yamada), Proc. of the Conference of the Association for Computational Linguistics (ACL-2001). ACL Best Paper award.
"An Empirical Study in Multilingual Natural Language Generation: What Should a Text Planner Do?" (D. Marcu, L. Carlson, and M. Watanabe). The 1st International Conference on Natural Language Generation INLG'2000, Mitzpe Ramon, Israel, 2000.
"Experiments in Constructing a Corpus of Discourse Trees" (D. Marcu, E. Amorrortu, and M. Romera). ACL'99 Workshop on Standards and Tools for Discourse Tagging, Univ. Maryland, 1999.
"Two-Level, Many-Paths Generation" (K. Knight and V. Hatzivassiloglou), Proc. of the Conference of the Association for Computational Linguistics (ACL-1995).