CS 562: Statistical Natural Language Processing -- Fall 2009
University of Southern California

(a.k.a. "Empirical Methods in Natural Language Processing".)
(There is a sister course, CS 544, "Natural Language Processing", offered in the Spring semesters. You can take these two in either order.)

Time: Tu/Th, 11:00am-12:20pm
Location: GFS 223
Type: Lectures
Instructors: David Chiang (chiang@isi.edu)
Liang Huang (lhuang@isi.edu)
Teaching Assistant: Oana Nicolov (oana@isi.edu)

Course Description

This graduate course covers the basics of statistical methods for processing human language, intended for:

  1. students who want to understand current natural-language processing (NLP) research,
  2. students interested in tools for building NLP applications,
  3. machine-learning students looking for large-scale application domains, and
  4. students seeking experience with probabilistic methods that can be applied to a range of AI problems.
Prerequisite: CS 561 or permission of the instructor.


Class notes (these will be handed out in the first class).
Optional text: Jurafsky and Martin, Speech and Language Processing (2nd ed.), Prentice Hall, 2008.

Requirements and policies

Students will experiment with existing NLP software toolkits and write their own programs. There will be six programming assignments and a research project; there will be no exams. Grades will be based on the homework assignments and the research project.

Students are expected to submit only their own work for homework assignments. They may discuss the assignments with one another but may not collaborate with or copy from one another.

All assignments and the project will be due at the beginning of class (11am) on the due date. Late assignments will be accepted with a 20% penalty until the beginning of the following class, and with a 40% penalty up to one week after the due date. No exceptions will be made except for grave reasons.

Syllabus (Subject to change!)

Date Topic Instructor Slides/Readings Assignments
Tue, Aug 25 Introduction: real-world applications of NLP
Chiang & Huang
slides -
Thu, Aug 27 Basic linguistic theory. Words, parts-of-speech, ambiguity, morphology, phrase structure, word senses, speech. Text corpora and processing tools.
Notes: p1-6
J & M: Ch. 1
HW 0: nothing to turn in.
Unit One: Sequence models
Tue, Sep 1 Basic automata theory. Finite-state acceptors and intersection. NFA determinization and DFA minimization. Regular expressions.
Notes: p8-13
J & M: Ch. 2
slides (Tu-Th)
Thu, Sep 3 Finite-state transducers and composition. Applications in morphology, text-to-sound conversion, word alignment. Notes: p14-19
J & M: Ch. 3
HW 1 out
Tue, Sep 8 Basic probability theory. Conditional probability, Bayes' rule, estimating parameter values from data, priors, building generative stochastic models. Notes: p20-27
slides (Tu-Th)
Thu, Sep 10 The noisy-channel framework. Probabilistic finite-state acceptors and transducers. Notes: p28-34 HW 1 due
Tue, Sep 15 Language modeling; using language models to resolve ambiguities across a wide range of applications.
Notes: p36-53
J & M: Ch. 4

Adv. Smoothing: [CG98]

Thu, Sep 17 -
Tue, Sep 22 Training and testing data. The sparse data problem. Smoothing with held-out data. HW 2 out
Thu, Sep 24 -
Tue, Sep 29 String transformations. A simple framework for stochastically modeling many types of string transformations, such as tagging word sequences with parts of speech, cleaning up misspelled word sequences, and automatically marking up names, organizations, and locations in raw text. Estimating parameter values from annotated data. Various dynamic-programming algorithms for sequences.
Notes: p54-65
J & M: Ch. 5

Semirings and DP: [H08 (part 1)]
Writing Systems: [KS09]

HW 2 due
Thu, Oct 1 HW 3 out
Tue, Oct 6 -
Thu, Oct 8 Problems involving incomplete data. Hidden parameters.
Notes: p66-90
J & M: Ch. 6
HW 3 due
Tue, Oct 13 The Expectation-Maximization algorithm -
Thu, Oct 15 HW 4 out (EM / F-B)
Tue, Oct 20 The Forward-Backward algorithm -
Thu, Oct 22 -
Unit Two: Tree models
Tue, Oct 27 Limitations of finite-state machines. Natural language syntax. Context-free grammars.
J & M: Ch. 12.
[slides (5 lectures)]
Project Guidelines out.
Thu, Oct 29 A Toy Grammar for English;
Probabilistic CFGs;
Parsing: the CKY algorithm.
J & M: Ch. 13-14. -
Tue, Nov 3 CKY parsing examples; ambiguities in the chart.
Chomsky Normal Form and Binarization.
J & M: Ch. 14.
HW 4 due
Thu, Nov 5 Parsing as Deduction, Intersection, and Generation.
Packed Forests and Hypergraphs.
Parser Evaluation Metrics.
[SSP 95]
[H08 (part 2)]
J & M: Ch. 14.7
HW 5 out (parsing / forest)
Tue, Nov 10 The Inside-Outside Algorithm; PCFG re-estimation. [LY 90]
J & M: Ch. 14.3
Initial project proposal due
Thu, Nov 12 Applications in Machine Translation:
  • Synchronous Grammars; Translation as Parsing;
  • Language Model Integration;
  • Tree Transducers and Tree-to-String Translation
[slides (2 lectures)]
J & M: Ch. 25.10.
Tue, Nov 17 HW 5 due
Unit Three: (Structured) Discriminative training
Thu, Nov 19 Discriminative training: (Structured) Perceptron
[slides (3 lectures)] Final project proposal due. Data preparation and baseline finished.
Tue, Nov 24 Maximum-entropy models. Conditional random fields. Stochastic gradient ascent. - HW 6 out (discriminative tagging)
Thu, Nov 26 Thanksgiving - NO CLASS
Tue, Dec 1 Max-margin methods
- -
Thu, Dec 3 Research project presentations
- HW 6 due
Wed, Dec 16 - - - Research project due by e-mail
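Several of the topics above boil down to a few lines of code. For the language-modeling and smoothing lectures (Sep 15-24), here is a minimal sketch of a bigram model with add-one (Laplace) smoothing — a deliberate simplification for illustration; the course itself covers held-out estimation and the more advanced methods surveyed in [CG98]:

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        vocab.update(padded)
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams, len(vocab)

def addone_prob(w, prev, unigrams, bigrams, V):
    """P(w | prev) with add-one smoothing: (c(prev, w) + 1) / (c(prev) + V)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)
```

Unseen bigrams get a small but nonzero probability, which is exactly what the sparse-data lectures motivate; the denominator adds the vocabulary size so the distribution over next words still sums to one.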
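For the forward-backward lectures (Oct 15-20), the forward pass — computing the total probability of an observation sequence under an HMM by summing over all hidden state paths — can be sketched as follows. This is an illustrative toy, not the course's implementation; the parameter layout (dicts of dicts) is an assumption made for readability:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm for an HMM.

    obs:     list of observed symbols
    start_p: start_p[s]      = P(first state is s)
    trans_p: trans_p[r][s]   = P(next state s | current state r)
    emit_p:  emit_p[s][o]    = P(emit o | state s)
    Returns P(obs) = sum over all state sequences of the joint probability.
    """
    # alpha[s] = P(obs[0..t], state at t is s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())
```

The dynamic program runs in O(T·|S|²) time, versus the O(|S|^T) cost of enumerating every hidden path — the same trick the backward pass and the Viterbi algorithm exploit.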
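For the CKY lectures in Unit Two (Oct 29 - Nov 3), a minimal recognizer for a grammar already in Chomsky normal form might look like the sketch below — again an illustration under assumed data structures (rule tables as dicts of sets), not the course's code:

```python
def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognition for a CFG in Chomsky normal form.

    lexical: dict word -> set of nonterminals A with rule A -> word
    binary:  dict (B, C) -> set of nonterminals A with rule A -> B C
    chart[i][j] holds every nonterminal deriving words[i:j].
    """
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                    # width-1 spans
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):                     # wider spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary.get((B, C), set())
    return start in chart[0][n]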

Extra Readings (not required, but recommended)

[LY 90] The estimation of stochastic context-free grammars using the inside-outside algorithm.
K. Lari and S. J. Young.
Computer Speech and Language, 4:35-56, 1990.
[SSP 95] Principles and Implementation of Deductive Parsing.
Stuart Shieber, Yves Schabes, and Fernando Pereira.
Journal of Logic Programming, 24(1-2):3-36, 1995.
[H08] Tutorial on Semirings and Advanced Dynamic Programming in NLP (slides).
Liang Huang
Tutorial given at COLING 2008 and NAACL 2009.
[KS09] Tutorial on Writing Systems, Transliteration, and Decipherment (slides).
Kevin Knight and Richard Sproat
Tutorial given at NAACL 2009.
[CG98] An Empirical Study of Smoothing Techniques for Language Modeling
Stanley F. Chen and Joshua Goodman
Technical Report TR 10-98, Computer Science Group, Harvard University.
[DLR77] Maximum Likelihood from Incomplete Data via the EM Algorithm
A.P. Dempster, N.M. Laird, and D.B. Rubin
Journal of the Royal Statistical Society, Vol. 39, No. 1, 1977
[C97] The EM Algorithm
Michael Collins
[D02] The Expectation Maximization Algorithm
Frank Dellaert
Technical Report, 2002

Last year's version is cached here.