CSCI 562: Statistical Natural Language Processing, Fall 2010
University of Southern California

(a.k.a. "Empirical Methods in Natural Language Processing".)
(There is a sister course, CSCI 544, "Natural Language Processing", offered in the spring semester. The two can be taken in either order.)

Time: Tu/Th, 11:00am-12:20pm
Location: KAP 145
Type: Lectures
Instructors: Prof. Liang Huang, Prof. Kevin Knight
Teaching Assistant: Jason Riesa

Course Description

This graduate course covers the basics of statistical methods for processing human language, intended for:

  1. students who want to understand current natural language processing (NLP) research,
  2. students interested in tools for building NLP applications,
  3. machine learning students looking for large-scale application domains, and
  4. students seeking experience with probabilistic methods that can be applied to a range of AI problems.
Prerequisites: CSCI 561 (AI). Students are expected to be proficient in programming and discrete math, and familiar with basic algorithms and data structures (e.g., dynamic programming, graph traversal and shortest paths, hash tables, and priority queues). Basic familiarity with probability theory is also assumed.

If you have not taken CSCI 561 and want to enroll in CSCI 562: email us if you are a PhD student or if you received an A in CSCI 570 (Algorithms).

Undergraduates are welcome! (Talk to us to get special permission.)

Academic Integrity

At USC, academic integrity is taken seriously. Be sure to review the university's academic integrity policies.


Textbooks

Class notes (these will be handed out in the second week).
Optional text: Jurafsky and Martin, Speech and Language Processing (2nd ed.), Prentice Hall, 2008.

Requirements and policies

Students will experiment with existing NLP software toolkits and write their own programs. There will be six programming assignments, four quizzes, and a research project; grades will be based on these components.

Students are expected to submit only their own work for homework assignments. They may discuss the assignments with one another but may not collaborate with or copy from one another.

All assignments and the project are due at the beginning of class (11:00am) on the due date. Late assignments will be accepted up to one week late with a 30% penalty. Exceptions will be made only for grave reasons.

Syllabus (Subject to change!)

Date | Topic | Instructor | Slides/Readings | Assignments
Tue, Aug 24 | Introduction: real-world applications of NLP. Python tutorial. | Huang | intro slides; python slides |
Thu, Aug 26 | Python for text processing. Basic linguistic theory. | Huang | | HW0: nothing to turn in. word.out

Unit One: Sequence Models (10 lectures)
Tue, Aug 31 | Basic automata theory. Finite-state acceptors and transducers. | Knight | | Quiz 0: 20 min (solutions)
Thu, Sep 2 | | | | HW1 out: FSTs
Tue, Sep 7 | Probability theory and estimation; WFSTs; the noisy channel. | Knight | |
Thu, Sep 9 | | | |
Tue, Sep 14 | Language modeling and using language models to resolve ambiguities across a wide range of applications. Training and testing data. The sparse-data problem; smoothing with held-out data. | Knight | J&M Ch. 4 | HW1 due
Thu, Sep 16 | | | | HW2 out: LMs
Tue, Sep 21 | | | |
Thu, Sep 23 | String transformations: a simple framework for stochastically modeling many types of string transformations, such as tagging word sequences with parts of speech, cleaning up misspelled word sequences, and automatically marking up names, organizations, and locations in raw text. Estimating parameter values from annotated data. Viterbi algorithm. | Knight | | HW2 due
Tue, Sep 28 | | | | HW3 out: Tagging or Transliteration
Thu, Sep 30 | | | | Quiz 1: 15 min

Unit Two: Tree Models (7 lectures)
Tue, Oct 5 | Context-Free Grammars (CFGs) | Huang | slides |
Thu, Oct 7 | Probabilistic Context-Free Grammars (PCFGs); CKY algorithm | | | HW3 due
Tue, Oct 12 | Binarization; unary rules (lecture by D. Chiang) | | |
Thu, Oct 14 | Parsing as deduction and intersection; forests and hypergraphs; dynamic programming on hypergraphs; from Viterbi to CKY; from Dijkstra to Knuth 77. | | | HW4 out: Parsing
Tue, Oct 19 | | | |
Thu, Oct 21 | | | |
Tue, Oct 26 | Synchronous grammars and machine translation | | | Quiz 2: 15 min

Unit Three: Machine Learning (8 lectures)
Thu, Oct 28 | Unsupervised learning: the EM algorithm | Knight | | HW4 due (Sun, Oct 31); project guidelines out
Tue, Nov 2 | Forward-backward algorithm | | | HW5 out: EM
Thu, Nov 4 | Word alignment | | |
Tue, Nov 9 | Inside-outside algorithm | | | Initial project proposal due
Thu, Nov 11 | Discriminative learning: the perceptron | Huang | slides |
Tue, Nov 16 | Conditional Random Fields (CRFs) | | | HW5 due; HW6 out: Discriminative Learning
Thu, Nov 18 | Max-margin methods | | | Final proposal due
Tue, Nov 23 | More on discriminative training | | | Quiz 3: 15 min
Thu, Nov 25 | Thanksgiving! No class. | | |
Tue, Nov 30 | Project midway presentations | Knight/Huang | |
Thu, Dec 2 | Project midway presentations | Knight/Huang | | HW6 due
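A few of the syllabus topics can be previewed with small sketches. First, language modeling and the sparse-data problem: the toy bigram model below uses add-one (Laplace) smoothing for simplicity, rather than the held-out smoothing covered in lecture, and its corpus and vocabulary are invented for illustration.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        vocab.update(padded)
        unigrams.update(padded[:-1])           # contexts (everything but </s>)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams, vocab

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """P(w | w_prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi, V = train_bigram_lm(corpus)
# Even an unseen bigram gets nonzero probability:
print(bigram_prob("cat", "dog", uni, bi, V) > 0)     # True
print(bigram_prob("<s>", "the", uni, bi, V))         # 0.375 (seen bigram)
```

Unsmoothed maximum-likelihood estimates would assign zero probability to any sentence containing an unseen bigram, which is why smoothing is unavoidable in practice.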
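Second, the Viterbi algorithm from the string-transformations lectures: a minimal sketch for an HMM part-of-speech tagger. The tag set, transition, and emission probabilities below are made up for illustration; in the homework they would be estimated from annotated data.

```python
def viterbi(words, tags, trans, emit, start):
    """Return the most probable tag sequence under a simple HMM."""
    # delta[t] = best probability of any tag path ending in tag t here.
    delta = {t: start.get(t, 0.0) * emit[t].get(words[0], 0.0) for t in tags}
    back = []                                  # back-pointers, one dict per step
    for w in words[1:]:
        prev, delta, ptr = delta, {}, {}
        for t in tags:
            best = max(tags, key=lambda s: prev[s] * trans[s].get(t, 0.0))
            delta[t] = prev[best] * trans[best].get(t, 0.0) * emit[t].get(w, 0.0)
            ptr[t] = best
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: delta[t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ["D", "N", "V"]                         # determiner, noun, verb
start = {"D": 0.8, "N": 0.2}
trans = {"D": {"N": 1.0}, "N": {"V": 0.8, "N": 0.2}, "V": {"D": 0.5, "N": 0.5}}
emit = {"D": {"the": 1.0},
        "N": {"dog": 0.5, "walk": 0.1, "park": 0.4},
        "V": {"walk": 0.7, "barks": 0.3}}
print(viterbi(["the", "dog", "barks"], tags, trans, emit, start))  # ['D', 'N', 'V']
```

The dynamic program runs in O(n·|tags|²) time, versus the exponential cost of enumerating all tag sequences.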
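Third, the CKY algorithm from the PCFG lectures: a minimal sketch that fills the chart bottom-up for a grammar in Chomsky normal form. The toy grammar and probabilities are invented, not the grammar used in class.

```python
from collections import defaultdict

def cky(words, lexical, binary):
    """Return the best probability of S spanning the whole input."""
    n = len(words)
    chart = defaultdict(float)   # chart[(i, j, A)] = best prob of A over words[i:j]
    for i, w in enumerate(words):
        for lhs, prob in lexical.get(w, []):
            chart[(i, i + 1, lhs)] = max(chart[(i, i + 1, lhs)], prob)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for (lhs, b, c), prob in binary.items():   # rule lhs -> b c
                    p = prob * chart[(i, k, b)] * chart[(k, j, c)]
                    chart[(i, j, lhs)] = max(chart[(i, j, lhs)], p)
    return chart[(0, n, "S")]

lexical = {"the": [("D", 1.0)], "dog": [("N", 0.6)], "barks": [("V", 1.0)]}
binary = {("S", "NP", "V"): 1.0, ("NP", "D", "N"): 1.0}
print(cky(["the", "dog", "barks"], lexical, binary))  # 0.6
```

This is the one-best (Viterbi) variant; replacing max with sum over splits gives the inside probability, the connection exploited in the hypergraph lectures.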
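Finally, discriminative learning: a minimal sketch of the classic binary perceptron update rule, as a warm-up for the structured variants covered in Unit Three. The toy data set is invented and linearly separable.

```python
def perceptron(data, epochs=10):
    """data: list of (feature_vector, label) pairs with label in {+1, -1}."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                 # mistake: move weights toward y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Toy data: label is +1 iff the first feature exceeds the second.
data = [([2.0, 1.0], 1), ([1.0, 3.0], -1), ([4.0, 0.0], 1), ([0.0, 2.0], -1)]
w, b = perceptron(data)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
         for x, _ in data]
print(preds == [y for _, y in data])  # True
```

The structured perceptron for tagging uses the same mistake-driven update, with the argmax over tag sequences computed by Viterbi search.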

Suggested Readings

[LY90] The estimation of stochastic context-free grammars using the inside-outside algorithm.
K. Lari and S. J. Young.
Computer Speech and Language, 1990: 4:35-56.
[SSP95] Principles and Implementation of Deductive Parsing.
Stuart Shieber, Yves Schabes, and Fernando Pereira.
Journal of Logic Programming, 1995: 24 (1-2): 3-36.
[H08] Tutorial on Semirings and Advanced Dynamic Programming in NLP (slides).
Liang Huang
Tutorial given at COLING 2008 and NAACL 2009.
[KS09] Tutorial on Writing Systems, Transliteration, and Decipherment (slides).
Kevin Knight and Richard Sproat
Tutorial given at NAACL 2009.
[CG98] An Empirical Study of Smoothing Techniques for Language Modeling
Stanley F. Chen and Joshua Goodman
Technical Report TR 10-98, Computer Science Group, Harvard University.
[DLR77] Maximum Likelihood from Incomplete Data via the EM Algorithm
A.P. Dempster, N.M. Laird, and D.B. Rubin
Journal of the Royal Statistical Society, Vol. 39, No. 1, 1977
[C97] The EM Algorithm
Michael Collins
[D02] The Expectation Maximization Algorithm
Frank Dellaert
Technical Report, 2002
[B06] The Expectation Maximization Algorithm -- A Short Tutorial.
Sean Borman

Last year's version is cached here.