Jobs and Internships

Summer 2008 Internships, Natural Language Processing
USC/Information Sciences Institute

NOW CLOSED!!!! No longer accepting applications for 2008.

We are looking for interested and qualified students (graduate and undergraduate) to spend the summer working with ongoing research projects at USC/ISI on natural language processing, machine learning, statistical modeling, machine translation, automata, and other areas. Descriptions of some of these projects appear below. These are paid internships. Prior experience in natural language processing is not necessarily required: we will get you interested! Good programming skills are required for the types of projects we do.

Internships will be available for a three-month period during the summer of 2008. We arrange a time for joint tutorials at the beginning of the summer, and we have presentations at the end.

How to apply:

  • Send email to nlsummer@isi.edu before February 29, 2008.
  • Include a resume and statement, and let us know what project(s) you would be interested in.
  • A brief recommendation letter will be useful -- recommenders should send letters directly to nlsummer@isi.edu.
  • The USC/ISI Natural Language Processing Summer 2008 Internships Committee is reviewing all applications received on or before February 29, 2008.

    We plan to make decisions over the week of March 1-7, 2008.

    We have summer research projects in the following areas:

    1. Statistical Machine Translation

    Translating human languages (e.g., Chinese to English) is a longstanding challenge for computer science. We are developing and applying statistical algorithms to this problem, extracting large amounts of relevant translation knowledge automatically from bilingual text (e.g., United Nations documents). We face many interesting challenges in this quest to improve significantly on the quality of automatic translation. Our research group is investigating statistical transformation models that are sensitive to different kinds of linguistic structure.

    2. New Automata for Natural Language

    Many recent advances in natural language processing are due to widespread use of finite-state string automata. These automata probabilistically transform input strings into output strings, and they can be quickly assembled to tackle new jobs via generic mathematical operations. However, they are weak for applications like machine translation, which involves re-ordering and syntax-sensitive operations. Tree automata and synchronous grammars are attractive alternative building blocks for new natural language systems. Fortunately, there is an extensive mathematical theory associated with these models and devices. We are exploring many fascinating open problems in both theory and practice of tree automata and synchronous grammars for NLP applications.
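
To illustrate how string automata can be assembled via generic operations, here is a minimal sketch of composing two weighted finite-state transducers. The dictionary representation, the function names `compose` and `transduce`, and the toy transducers `noisy` and `rewrite` are assumptions made for illustration only; they are not taken from any USC/ISI toolkit, and epsilon transitions are deliberately left out.

```python
from collections import defaultdict

# Assumed representation: a transducer is a dict with a start state, a set of
# final states, and arcs of the form (src, in_sym, out_sym, prob, dst).

def compose(a, b):
    """Compose epsilon-free transducers: the output of `a` feeds the input of `b`."""
    arcs = [
        ((p, q), x, z, w1 * w2, (p2, q2))
        for (p, x, y, w1, p2) in a["arcs"]
        for (q, y2, z, w2, q2) in b["arcs"]
        if y == y2  # a's output symbol must match b's input symbol
    ]
    return {
        "start": (a["start"], b["start"]),
        "finals": {(f, g) for f in a["finals"] for g in b["finals"]},
        "arcs": arcs,
    }

def transduce(t, symbols):
    """Return {output tuple: total probability} for the given input sequence."""
    frontier = {(t["start"], ()): 1.0}  # (state, output so far) -> probability
    for sym in symbols:
        nxt = defaultdict(float)
        for (state, out), w in frontier.items():
            for (src, x, y, p, dst) in t["arcs"]:
                if src == state and x == sym:
                    nxt[(dst, out + (y,))] += w * p
        frontier = nxt
    result = defaultdict(float)
    for (state, out), w in frontier.items():
        if state in t["finals"]:
            result[out] += w
    return dict(result)

# Toy single-state transducers (hypothetical data).
noisy = {
    "start": 0, "finals": {0},
    "arcs": [(0, "a", "b", 0.5, 0), (0, "a", "c", 0.5, 0)],
}
rewrite = {
    "start": 0, "finals": {0},
    "arcs": [(0, "b", "x", 1.0, 0), (0, "c", "x", 0.25, 0), (0, "c", "y", 0.75, 0)],
}

composed = compose(noisy, rewrite)
result = transduce(composed, ["a"])
print(result)
```

The composed machine maps "a" to "x" by summing over both intermediate symbols (0.5 * 1.0 + 0.5 * 0.25) and to "y" via "c" alone (0.5 * 0.75). A production implementation would also handle epsilon arcs and prune unreachable states.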

    3. Decipherment

    Two of the first applications for computers were (1) cracking codes, and (2) translating human languages. Statistical properties of human language were essential for early breakthroughs in decipherment, and are now essential in machine translation. If we treat foreign languages as an encrypted form of English (a kind of word substitution/transposition cipher), then we can apply code-cracking technology to translation. It is therefore possible (though not yet established!) that we can learn translation dictionaries from large quantities of non-parallel linguistic data.
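
As a toy illustration of the word-substitution view, the sketch below recovers a cipher key using only monolingual English text: it tries every assignment of code symbols to English words and keeps the one whose decoded text scores highest under a bigram language model. The corpus, the numeric codes, the add-one smoothing, and the function names are all assumptions made for this example; brute-force search is feasible only for tiny vocabularies and is not presented as the method used in the project.

```python
import itertools
import math
from collections import Counter

START, END = "<s>", "</s>"

def bigrams(sentence):
    tokens = [START] + sentence.split() + [END]
    return list(zip(tokens, tokens[1:]))

def train_bigram_model(corpus):
    """Return the vocabulary and an add-one-smoothed sentence log-probability function."""
    pair_counts = Counter(bg for s in corpus for bg in bigrams(s))
    context_counts = Counter()
    for (prev, _), n in pair_counts.items():
        context_counts[prev] += n
    vocab = sorted({w for s in corpus for w in s.split()})
    n_next = len(vocab) + 1  # any vocabulary word or END may follow a context

    def logprob(sentence):
        return sum(
            math.log((pair_counts[bg] + 1) / (context_counts[bg[0]] + n_next))
            for bg in bigrams(sentence)
        )

    return vocab, logprob

def decipher(ciphertext, corpus):
    """Brute-force the code-to-word key that maximizes language-model score."""
    vocab, logprob = train_bigram_model(corpus)
    codes = sorted({c for s in ciphertext for c in s.split()})
    best_key, best_score = None, -math.inf
    for words in itertools.permutations(vocab, len(codes)):
        key = dict(zip(codes, words))
        score = sum(
            logprob(" ".join(key[c] for c in s.split())) for s in ciphertext
        )
        if score > best_score:
            best_key, best_score = key, score
    return best_key

# Hypothetical monolingual English corpus and an unaligned ciphertext.
english = ["the dog barks", "the dog sleeps", "the cat sleeps"]
ciphertext = ["17 05 99", "17 42 63", "17 42 99"]

key = decipher(ciphertext, english)
print(key)
```

No sentence-aligned translation pairs are used here: the key is inferred purely from how well the decoded text matches English word-order statistics. Realistic vocabularies make exhaustive search impossible, so a practical system would need an iterative or approximate search instead.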


    Stimulating Research Environment

    USC/ISI's natural language research environment includes weekly seminars and reading groups, opportunities for teaching and advising, an active program for summer students, large quantities of linguistic resources, and a 108-processor supercomputing cluster dedicated to natural language research, plus desktop workstations.

    For more information about NLP research activities at USC/ISI, please click here. ISI is an academic research institute that is part of USC's School of Engineering; many USC/ISI scientists hold research faculty positions in the computer science department. Click here to see a range of artificial intelligence research projects in ISI's Intelligent Systems Division. USC/ISI is located in Marina del Rey on the Southern California coast, an excellent location convenient to beaches, restaurants, boating, bike paths, and shopping.


    Our summer program is well established! Past students include:

    2007: Michael Bloodgood (Delaware), Jennifer Gillenwater (Rice University), Carmen Heger (Dresden), Wei Ho (Princeton).

    2006: Joseph Turian (NYU), Chenhai Xi (Pitt), Victoria Fossum (Michigan), Liang Huang (UPenn), Jason Riesa (JHU), Oana-Diana Postolache (Saarland).

    2005: Victoria Fossum (Michigan), Mark Hopkins (UCLA), Liang Huang (UPenn), Behrang Mohit (Pitt), Preslav Nakov (Berkeley), Jason Riesa (JHU), Hao Zhang (Rochester).

    2004: Madhur Ambastha (Rochester), Michel Galley (Columbia), David Kauchak (UCSD).

    2003: Michel Galley (Columbia), Mark Hopkins (UCLA), Beata Klebanov (Hebrew University), Ana-Maria Popescu (University of Washington), Lara Taylor (UCSD).

    2002: Chris Ackerman (USC), Emil Ettelaie (USC), Yuling Hsueh (USC), John Lee (Waterloo/MIT), Bo Pang (Cornell).

    2001: Abdessamad Echihabi (USC), Hal Daume III (CMU), Michael Laszlo (Waterloo), Dragos Stefan Munteanu (Iowa), Rebecca Rees (BYU), Radu Soricut (Iowa).

    1994-2000: Estibaliz Amorrortu, Vasileios Hatzivassiloglou (Columbia), Michael Jahr (Stanford), Larry Kite (USC), Magdalena Romera (USC), Maki Watanabe (USC).


    We always aim to solve interesting and novel scientific problems, and to publish the results in the best conferences. Sample papers that have come from past student internships:

    "Scalable Inference and Training of Context-Rich Syntactic Models", (M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer), Proceedings of the ACL Conference, poster session (ACL-2006).

    "Synchronous Binarization for Machine Translation", (H. Zhang, L. Huang, D. Gildea, K. Knight), Proceedings of the NAACL-HLT Conference (2006).

    "Statistical Syntax-Directed Translation with Extended Domain of Locality", (L. Huang, K. Knight, A. Joshi), Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA-06).

    "Text Simplification for Information Seeking Applications", (B. Beigman Klebanov, K. Knight, D. Marcu), In: On the Move to Meaningful Internet Systems, eds. R. Meersman and Z. Tari, Lecture Notes in Computer Science (3290), Springer-Verlag, 2004.

    "What's in a Translation Rule?", (M. Galley, M. Hopkins, K. Knight, D. Marcu), Proceedings of the North American ACL Conference (NAACL-2004).

    "Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences" (B. Pang, K. Knight, and D. Marcu), Proceedings of the North American ACL Conference (NAACL-2003).

    "Using a Large Monolingual Corpus to Improve Translation Accuracy" (R. Soricut, K. Knight, and D. Marcu), Proceedings of the 6th Association for Machine Translation in the Americas Conference (AMTA-2002).

    "Processing Comparable Corpora With Bilingual Suffix Trees" (D. Munteanu and D. Marcu). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), Philadelphia, PA.

    "A Noisy-Channel Model for Document Compression" (H. Daume III and D. Marcu). Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), Philadelphia, PA.

    "An Unsupervised Approach to Recognizing Discourse Relations" (D. Marcu and A. Echihabi). Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), Philadelphia, PA.

    "Fast Decoding and Optimal Decoding for Machine Translation" (U. Germann, M. Jahr, K. Knight, D. Marcu, and K. Yamada), Proc. of the Conference of the Association for Computational Linguistics (ACL-2001). ACL Best Paper award.

    "An Empirical Study in Multilingual Natural Language Generation: What Should a Text Planner Do?" (D. Marcu, L. Carlson, and M. Watanabe). The 1st International Conference on Natural Language Generation INLG'2000, Mitzpe Ramon, Israel, 2000.

    "Experiments in Constructing a Corpus of Discourse Trees" (D. Marcu, E. Amorrortu, and M. Romera). ACL'99 Workshop on Standards and Tools for Discourse Tagging, Univ. Maryland, 1999.

    "Two-Level, Many-Paths Generation," (K. Knight and V. Hatzivassiloglou), Proc. of the Conference of the Association for Computational Linguistics (ACL-1995).