Summer 2017 Internships in Natural Language Processing
We are looking for interested and qualified students (graduate and undergraduate) to spend the summer working with ongoing research projects at USC/ISI on natural language processing, machine learning, statistical modeling, machine translation, automata, and other areas.
These are paid internships. They will be available for a three-month period during the summer of 2017.
Good programming skills are required, but prior experience in natural language processing is not necessarily required. We will provide tutorials on relevant topics at the beginning of the summer.
- 2017 Jan 20 Applications due
- 2017 Feb 10-20 (approx.) First acceptance notifications
- 2017 Jun 1 Internships begin
How to Apply
Please follow the instructions below. Applications that do not conform will be rejected without review.
Submit your application no later than January 20, 2017, by email to email@example.com with the subject: "Application: <applicant name>". Your application should include:
- A CV or resume, as a PDF file.
- A statement of purpose, as a PDF file. It should indicate what project areas you are interested in.
- The name and email address of one or more people whom you have asked to write you a recommendation letter.
Recommenders should send letters no later than January 20, 2017, directly to firstname.lastname@example.org with subject: "Recommendation: <applicant name>". Letters should be in PDF or plain text format.
Project Areas of Interest
- Neural Machine Translation. We have built a powerful, efficient set of tools for machine translation with deep recurrent networks and are ready for experiments to extend their accuracy and capabilities.
- Tools for All Languages. Today's automatic parsers, translators, and pronunciation dictionaries cover a tiny fraction of the world's languages. Can we use general knowledge of how language works to extend the reach of natural language tools?
- Decipherment. Code-cracking and machine translation have an intimately tied history. The first job of proto-computers was to crack military codes, and the idea arose soon afterwards of treating foreign language as a code for English. We have several investigations planned, inspired by results we have recently reported: (1) cracking of the Copiale cipher (see also here and here), (2) discoveries concerning the Voynich manuscript, and (3) the training of a statistical machine translation system without the need for parallel data.
- Creative Language. In the not-too-distant future, stories, poems, songs, and advertisements will be written by machines and human-machine collaborations. We are starting down this path now (for example, see here and here), and there are many research avenues to pursue.
- Language Theory. Recurrent neural networks are showing great promise for language modeling and string transformation tasks. But the theoretical properties of most commonly-used network structures are still little understood. What classes of weighted languages can they capture? What classes of transformations lie outside their capabilities?
Summer internship projects are supervised by Kevin Knight and Jonathan May, and interns also interact and collaborate closely with the rest of ISI's Natural Language Group. Our group's research environment includes weekly seminars and reading groups, opportunities for teaching and advising, an active program for summer students, large quantities of linguistic resources, and a 2000-processor supercomputing cluster completely dedicated to natural language research at USC/ISI.
USC/ISI is an academic research institute that is part of USC's Viterbi School of Engineering; many USC/ISI scientists hold research faculty positions in the Computer Science Department. The Natural Language Group is part of USC/ISI's Intelligent Systems Division, which carries out a wide range of artificial intelligence research.
USC/ISI is located in Marina del Rey on the Southern California coast, an excellent location convenient to beaches, restaurants, boating, bike paths, and shopping. Note: we are not located on the main campus of USC, which is near downtown LA.
Our summer program is well established! Past students are listed below. Several students (marked *) interned twice, and several (marked ^) later joined ISI as PhD students, visiting PhD students, or research scientists.
- 2016: Nada Aldarrab (USC), Angeliki Lazaridou (U. Trento), Xiang Li (U. Chicago), Sebastian Mielke (Dresden Univ. Technology), Ke Tran (U. Amsterdam)
- 2015: Callum O'Shaughnessy (Queens University), Sudha Rao (Maryland), Wenduan Xu (Cambridge), Barret Zoph (USC)
- 2014: Julian Schamper (Aachen), Eunsol Choi (Washington), Allen Schmaltz (Harvard), Matic Horvat (Cambridge)
- 2013: Daniel Bauer* (Columbia), Fabienne Braune (Stuttgart), Jackie Lee (MIT), Elliot Meyerson (Wesleyan), Arvind Neelakantan (Columbia/UMass), Malte Nuhn (Aachen)
- 2012: Jacob Andreas (Columbia), Daniel Bauer (Columbia), Karl Moritz Hermann (Oxford), Bevan Jones (Edinburgh/Macquarie), Nathan Schneider (CMU), Ada Wan (CUNY).
- 2011: Licheng Fang (Rochester), Sravana Reddy* (Chicago), Xuchen Yao (JHU).
- 2010: Yoav Goldberg (Ben Gurion, Israel), Ann Irvine (Hopkins), Sravana Reddy (Chicago), Alexander "Sasha" Rush (MIT).
- 2009: Michael Auli (University of Edinburgh), Paramveer Dhillon (Penn), Erica Greene^ (Haverford), Adam Pauls (UC Berkeley)
- 2008: Amittai Axelrod (University of Washington), John DeNero (UC Berkeley), Kyle Gorman (Penn Linguistics), Catalin Tirnauca (Universitat Rovira i Virgili)
- 2007: Michael Bloodgood (Delaware), Jennifer Gillenwater (Rice University), Carmen Heger (Dresden), Wei Ho (Princeton).
- 2006: Joseph Turian (NYU), Chenhai Xi (Pitt), Victoria Fossum*^ (Michigan), Liang Huang*^ (Penn), Jason Riesa*^ (JHU), Oana-Diana Postolache^ (Saarland).
- 2005: Victoria Fossum (Michigan), Mark Hopkins* (UCLA), Liang Huang (Penn), Behrang Mohit (Pitt), Preslav Nakov (Berkeley), Jason Riesa (JHU), Hao Zhang (Rochester).
- 2004: Madhur Ambastha (Rochester), Michel Galley* (Columbia), David Kauchak (UCSD).
- 2003: Michel Galley (Columbia), Mark Hopkins (UCLA), Beata Klebanov (Hebrew University), Ana-Maria Popescu (University of Washington), Lara Taylor (UCSD).
- 2002: Chris Ackerman (USC), Emil Ettelaie (USC), Yuling Hsueh (USC), John Lee (Waterloo/MIT), Bo Pang (Cornell)
- 2001: Abdessamad Echihabi (USC), Hal Daume III^ (CMU), Michael Laszlo (Waterloo), Dragos Stefan Munteanu^ (Iowa), Rebecca Rees (BYU), Radu Soricut^ (Iowa)
- 1994-2000: Estibaliz Amorrortu, Vasileios Hatzivassiloglou (Columbia), Michael Jahr (Stanford), Larry Kite (USC), Magdalena Romera (USC), Maki Watanabe (USC).
We always aim to solve interesting and novel scientific problems, and to publish the results in the best conferences. Sample papers that have come from past student internships:
- "Unsupervised Neural Hidden Markov Models" (K. Tran, Y. Bisk, A. Vaswani, D. Marcu, and K. Knight), Proceedings of the EMNLP Workshop on Structured Prediction, 2016.
- "Multi-Source Neural Translation" (B. Zoph and K. Knight), Proceedings of NAACL 2016.
- "Extracting Structured Scholarly Information from the Machine Translation Literature" (E. Choi, M. Horvat, J. May, K. Knight, D. Marcu), Proceedings of LREC 2016.
- "Cipher Type Detection" (Malte Nuhn and Kevin Knight), Proceedings of EMNLP 2014.
- "Mapping between English Strings and Reentrant Semantic Graphs" (F. Braune, D. Bauer, and K. Knight), Proceedings of LREC 2014.
- "Parsing Graphs with Hyperedge Replacement Grammars" (D. Chiang, J. Andreas, D. Bauer, K.-M. Hermann, B. Jones and K. Knight), Proceedings of ACL 2013.
- "Learning Whom to Trust with MACE" (D. Hovy, T. Berg-Kirkpatrick, A. Vaswani, and E. Hovy), Proceedings of NAACL 2013.
- "Semantics-Based Machine Translation with Hyperedge Replacement Grammars" (B. Jones, J. Andreas, D. Bauer, K.-M. Hermann, K. Knight), Proceedings of COLING 2012.
- "Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation" (J. Riesa, A. Irvine, D. Marcu), Proceedings of EMNLP 2011.
- "Language-independent parsing with empty elements" (S. Cai, D. Chiang, Y. Goldberg), Proceedings of ACL 2011.
- "Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation" (E. Greene, T. Bodrumlu, K. Knight), Proceedings of EMNLP 2010.
- "Efficient optimization of an MDL-inspired objective function for unsupervised part-of-speech tagging" (A. Vaswani, A. Pauls, D. Chiang), Proceedings of ACL 2010.
- "Unsupervised Syntactic Alignment with Inversion Transduction Grammars" (A. Pauls, D. Klein, D. Chiang, K. Knight), Proceedings of NAACL 2010.
- "Bayesian Inference for Finite-State Transducers" (D. Chiang, J. Graehl, K. Knight, A. Pauls, S. Ravi), Proceedings of NAACL 2010.
- "Binarization of Synchronous Context-Free Grammars" (L. Huang, H. Zhang, D. Gildea, K. Knight), Computational Linguistics, 2009.
- "Fast Consensus Decoding over Translation Forests" (J. DeNero, D. Chiang, and K. Knight), Proceedings of ACL 2009.
- "Forest Rescoring: Faster Decoding with Integrated Language Models" (L. Huang and D. Chiang), Proceedings of ACL 2007.
- "Scalable Inference and Training of Context-Rich Syntactic Models" (M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer), Proceedings of ACL 2006, poster session.
- "Synchronous Binarization for Machine Translation" (H. Zhang, L. Huang, D. Gildea, K. Knight), Proceedings of NAACL 2006.
- "Statistical Syntax-Directed Translation with Extended Domain of Locality" (L. Huang, K. Knight, A. Joshi), Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA-06).
- "Building an English-Iraqi Arabic Machine Translation System for Spoken Utterances with Limited Resources" (J. Riesa, B. Mohit, K. Knight, D. Marcu), Proceedings of Interspeech 2006.
- "Text Simplification for Information Seeking Applications" (B. Beigman Klebanov, K. Knight, D. Marcu), In: On the Move to Meaningful Internet Systems, eds. R. Meersman and Z. Tari, Lecture Notes in Computer Science (3290), Springer-Verlag, 2004.
- "What's in a Translation Rule?" (M. Galley, M. Hopkins, K. Knight, D. Marcu), Proceedings of NAACL 2004.
- "Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences" (B. Pang, K. Knight, and D. Marcu), Proceedings of NAACL 2003.
- "Using a Large Monolingual Corpus to Improve Translation Accuracy" (R. Soricut, K. Knight, and D. Marcu), Proceedings of the 6th Association for Machine Translation in the Americas Conference (AMTA-2002).
- "Processing Comparable Corpora With Bilingual Suffix Trees" (D. Munteanu and D. Marcu), Proceedings of EMNLP 2002.
- "A Noisy-Channel Model for Document Compression" (H. Daume III and D. Marcu), Proceedings of ACL 2002.
- "An Unsupervised Approach to Recognizing Discourse Relations" (D. Marcu and A. Echihabi), Proceedings of ACL 2002.
- "Fast Decoding and Optimal Decoding for Machine Translation" (U. Germann, M. Jahr, K. Knight, D. Marcu, and K. Yamada), Proceedings of ACL 2001. ACL Best Paper award.
- "An Empirical Study in Multilingual Natural Language Generation: What Should a Text Planner Do?" (D. Marcu, L. Carlson, and M. Watanabe), The 1st International Conference on Natural Language Generation INLG'2000, Mitzpe Ramon, Israel, 2000.
- "Experiments in Constructing a Corpus of Discourse Trees" (D. Marcu, E. Amorrortu, and M. Romera), ACL'99 Workshop on Standards and Tools for Discourse Tagging, Univ. Maryland, 1999.
- "Two-Level, Many-Paths Generation" (K. Knight and V. Hatzivassiloglou), Proceedings of ACL 1995.
Frequently Asked Questions
Q: Is the salary enough for a decent life in westside LA? What will the exact salary be?
A: Yes, of course! Our internship compensation is competitive with industrial internships. Housing is generally expensive in this area (because it's safe, beautiful, and close to the ocean), but definitely affordable with the salary we offer. The exact amount is yet to be determined (and will be stated in the offer letter), but again it will be enough for a decent life for three months.
Q: During the internship, can I go to a conference for a week or so? Or a short vacation?
A: Conferences are definitely OK, especially when you have a paper there, but in any case there should be at least 12 weeks of work here (otherwise it's hard to get anything sizable done). We generally discourage vacations of more than a week during the internship.
Q: Can I keep working on the projects after going back to my own school?
A: In general yes, especially when you are writing up a paper on the topic. Most likely you will be logging in remotely to work on our machines.
Q: Can I survive without a car here?
A: For three months, definitely yes. Many of our past interns did not own a car while here, and they either biked or took a bus to ISI. Unlike other parts of LA, this area has reliable bus lines. The famous Santa Monica "big blue" buses serve UCLA, Santa Monica, Palms, Venice, ISI, and LAX, and Culver City buslines serve Culver City, Venice, ISI, and LAX. In addition, LA metrolink buses take you to downtown LA. Furthermore, LAX is very close to ISI (10 minutes by bus), so air travel is convenient.
Q: Are international students eligible to apply?
A: Yes, we do take on international students (see past interns list). For international students currently studying in the United States (F-1 holders), we will help you get OPT or CPT status on top of your F-1, which is generally straightforward. CPT is largely preferred because it takes much less time to get approved, but it requires you to register for (at least) one unit in the summer. OPT usually takes 2-3 months to get approved, but you don't need to register for any units. For details about CPT/OPT, please consult your school's international student office. For international students currently studying outside the United States, we will help you get a J-1 visa.
Q: I have other plans in the summer, so can I intern during Fall or Spring?
A: No, we only take summer interns (and they have to start by June 1), but they can extend their stays into the Fall semester if needed.