Prof. David Chiang (firstname.lastname@example.org)
Prof. Liang Huang (email@example.com)
Shu Cai (firstname.lastname@example.org)
Ashish Vaswani (email@example.com)
Computers process massive quantities of information every day in the form of human language, yet machine understanding of human language remains one of the great challenges of computer science. How can advances in computing technology enable more intelligent processing of all this language data? Will computers ever be able to use this data to learn language like humans do? This course provides a systematic introduction to statistical models of human language, with particular attention to the structures of human language that inform them and the structured learning and inference algorithms that drive them. This is a lecture course, not a seminar course, but aims to cover both fundamental and cutting-edge research issues.We will also teach Python in the first two lectures, which you will use in the programming assignments.
Audience This graduate course is intended for:
Prerequisites Students are expected to be proficient in programming, basic algorithms and data structures (e.g., dynamic programming, graph traversal and shortest paths, hashtables and priority queues), discrete math, and basic probability theory. Although the formal prerequisite for the course is currently CSCI 561, students who meet the preceding requirements should email the instructors to waive the prerequisite. Undergraduate students are welcome as well. If you are not sure if you have enough background for this course, you are welcome to sit in on the first few lectures. There will be a Quiz 0 in the third lecture to test if you qualify for this course.
Related Courses This course is part of USC's curriculum in natural language processing. There is a sister course, Natural Language Processing (CSCI 544), offered in the Spring semester. You can take these two in either order.
CourseworkStudents will experiment with existing NLP software toolkits and write their own programs. Grades will be based on:
All assignments and the project will be due at the beginning of class (11am) on the due date. Late assignments will be accepted with a 30% penalty up to one week late. No exceptions can be made except for a grave reason. Submissions will be made using Moodle at https://pedtek.isi.edu/moodle.
Students are expected to submit only their own work for homework assignments. They may discuss the assignments with one another but may not collaborate with or copy from one another.
Statement on Academic Integrity USC seeks to maintain an optimal learning environment. General principles of academic honesty include the concept of respect for the intellectual property of others, the expectation that individual work will be submitted unless otherwise allowed by an instructor, and the obligations both to protect one's own academic work from misuse by others as well as to avoid using another's work as one's own. All students are expected to understand and abide by these principles. Scampus, the Student Guidebook, contains the Student Conduct Code in Section 11.00, while the recommended sanctions are located in Appendix A: http://www.usc.edu/dept/publications/SCAMPUS/gov. Students will be referred to the Office of Student Judicial Affairs and Community Standards for further review, should there be any suspicion of academic dishonesty. The Review process can be found at: http://www.usc.edu/student-affairs/SJACS.
|Tue, Aug 23||What is natural language processing for? What is the empirical approach to NLP? Rudiments of linguistic theory. Introduction to the Python programming language with application to text processing.||Chiang/Huang|
|Thu, Aug 25||Chiang||HW0: nothing to turn in.
|Unit One: Sequence Models (10 lectures)|
|Tue, Aug 30||Basic automata theory. Finite-state acceptors and transducers.||Chiang||Quiz 0|
|Thu, Sep 1||HW1 out: FSTs|
|Tue, Sep 6||Probability theory and estimation. Weighted FSTs. The noisy-channel model.|
|Thu, Sep 8
|Tue, Sep 13||Language modeling and using language models to resolve ambiguities across a wide range of applications.
Training and testing data.
The sparse data problem; smoothing with held-out data.
|Huang||J&M chapter 4 [slides]||HW1 due|
|Thu, Sep 15||HW2 out: LMs|
|Tue, Sep 20|
|Thu, Sep 22||String transformations. A simple framework for stochastically modeling many types of string transformations, such as: tagging word sequences with parts of speech, cleaning up misspelled word sequences, automatically marking-up names, organizations, and locations in raw text, etc. Estimating parameter values from annotated data. Viterbi algorithm.||Huang||[slides]|
|Tue, Sep 27||HW2 due|
|Thu, Sep 29||Quiz 1
HW3 out: Tagging/Transliteration
|Unit Two: Tree Models (7 lectures)|
|Tue, Oct 4||Context-Free Grammars (CFGs)||Chiang|
|Thu, Oct 6||Probabilistic Context-Free Grammars (PCFGs)
|Tue, Oct 11||Binarization; Unary Rules||HW3 due|
|Thu, Oct 13||Parsing as Deduction and Intersection
Forests and Hypergraphs
Dynamic Programming on Hypergraphs
From Viterbi to CKY
From Dijkstra to Knuth 77
|HW4 out: Parsing|
|Tue, Oct 18|
|Thu, Oct 20|
|Tue, Oct 25||Synchronous Grammars and Machine Translation||Quiz 2|
|Unit Three: Learning Language (8 lectures)|
|Thu, Oct 27||Unsupervised Learning||Huang||[slides]||HW4 due
Project guidelines out
|Tue, Nov 1||EM Algorithm: slow version||HW5 out: EM|
|Thu, Nov 3||EM: fast version (Forward-Backward)|
|Tue, Nov 8||Theory behind EM: converging to local maximum||Initial project proposal due|
|Thu, Nov 10
|Discriminative Learning: The Perceptron||Chiang|
|Tue, Nov 15||Conditional Random Fields (CRFs)||HW5 due|
|Thu, Nov 17||Max-margin methods||Final project proposal due
HW6 out: Discriminative Learning
|Tue, Nov 22||More on Discriminative Training||Quiz 3|
|Thu, Nov 24||Thanksgiving – no class|
|Tue, Nov 29||Project interim presentations||You|
|Thu, Dec 1||Project interim presentations||You||HW6 due|
|Wed, Dec 14||Final projects due|
|[LY90]||The estimation of stochastic context-free grammars using the inside-outside algorithm.|
|K. Lari and S. J. Young.|
|Computer Speech and Language, 1990: 4:35-56.|
|[SSP95]||Principles and Implementation of Deductive Parsing.|
|Stuart Shieber, Yves Schabes, and Fernando Pereira.|
|Journal of Logic Programming, 1995: 24 (1-2): 3-36.|
|[H08]||Tutorial on Semirings and Advanced Dynamic Programming in NLP (slides).|
|Tutorial given at COLING 2008 and NAACL 2009.|
|[KS09]||Tutorial on Writing Systems, Transliteration, and Decipherment (slides).|
|Kevin Knight and Richard Sproat|
|Tutorial given at NAACL 2009.|
|[CG98]||An Empirical Study of Smoothing Techniques for Language Modeling|
|Stanley F. Chen and Joshua Goodman|
|Technical Report TR 10-98, Computer Science Group, Harvard University.||[DLR77]||Maximum Likelihood from Incomplete Data via the EM Algorithm|
|A.P. Dempster, N.M. Laird, and D.B. Rubin|
|Journal of the Royal Statistical Society, Vol. 39, No. 1, 1977|
|[C97]||The EM Algorithm|
|[D02]||The Expectation Maximization Algorithm|
|Technical Report, 2002|
|[B06]||The Expectation Maximization Algorithm -- A short Tutorial.|