(a.k.a. "Empirical Methods in Natural Language Processing".)
(There is a sister course,
CS 544, "Natural Language Processing", offered in the Spring semesters.
You can take these two in either order.)
Time:  Tu/Th, 11:00am–12:20pm 
Location:  GFS 223 
Type:  Lectures 
Instructors:  David Chiang (chiang@isi.edu)
Liang Huang (lhuang@isi.edu) 
Teaching Assistant:  Oana Nicolov (oana@isi.edu) 
This graduate course covers the basics of statistical methods for processing human language, intended for:
Students will experiment with existing NLP software toolkits and write their own programs. There will be six programming assignments and a research project; there will be no exams. Grades will be based on:
Students are expected to submit only their own work for homework assignments. They may discuss the assignments with one another but may not collaborate with or copy from one another.
All assignments and the project are due at the beginning of class (11am) on the due date. Late assignments will be accepted with a 20% penalty until the beginning of the following class, and with a 40% penalty up to one week after the due date. Exceptions will be made only for grave reasons.
Date  Topic  Instructor  Slides/Readings  Assignments 

Tue, Aug 25  Introduction: real-world applications of NLP  slides    
Thu, Aug 27  Basic linguistic theory. Words, parts of speech, ambiguity, morphology, phrase structure, word senses, speech. Text corpora and processing tools.  Notes: pp. 1–6; J & M: Ch. 1  HW 0: nothing to turn in.  
Unit One: Sequence models  
Tue, Sep 1  Basic automata theory. Finite-state acceptors and intersection. NFA determinization and DFA minimization. Regular expressions.  Notes: pp. 8–13; J & M: Ch. 2; slides (TuTh)    
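The acceptance check at the heart of finite-state acceptors can be sketched in a few lines. This is a minimal illustration, not code from the course; the machine below (binary strings with an even number of 1s) is a standard textbook example chosen for brevity.

```python
# A minimal sketch of a deterministic finite-state acceptor (DFA).
# The machine accepts binary strings containing an even number of 1s.
# States, alphabet, and transitions here are invented for illustration.

START = "even"
ACCEPT = {"even"}
# transition function as a dict: (state, symbol) -> state
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

def accepts(string):
    state = START
    for symbol in string:
        state = DELTA[(state, symbol)]   # deterministic: exactly one move
    return state in ACCEPT

print(accepts("1010"))  # True: two 1s
print(accepts("111"))   # False: three 1s
```

Intersection of two acceptors runs both machines in lockstep over pairs of states, which is why the deterministic representation above composes so cleanly.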
Thu, Sep 3  Finite-state transducers and composition. Applications in morphology, text-to-sound conversion, word alignment.  Notes: pp. 14–19; J & M: Ch. 3  HW 1 out  
Tue, Sep 8  Basic probability theory. Conditional probability, Bayes' rule, estimating parameter values from data, priors, building generative stochastic models.  Notes: pp. 20–27; slides (TuTh)    
Thu, Sep 10  The noisy-channel framework. Probabilistic finite-state acceptors and transducers.  Notes: pp. 28–34  HW 1 due  
Tue, Sep 15  Language modeling and using language models to resolve ambiguities across a wide range of applications.  Notes: pp. 36–53; J & M: Ch. 4; Adv. smoothing: [CG98]    
Thu, Sep 17    
Tue, Sep 22  Training and testing data. The sparse data problem. Smoothing with held-out data.    HW 2 out (LM)
Thu, Sep 24    
Tue, Sep 29  String transformations. A simple framework for stochastically modeling many types of string transformations, such as tagging word sequences with parts of speech, cleaning up misspelled word sequences, and automatically marking up names, organizations, and locations in raw text. Estimating parameter values from annotated data. Dynamic programming algorithms for sequences.  Notes: pp. 54–65; J & M: Ch. 5; [slides]; Semirings and DP: [H08 (part 1)]  HW 2 due  
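The sequence dynamic programming covered here can be sketched as a Viterbi decoder for a toy HMM part-of-speech tagger. The tag set and probability tables below are invented for illustration, not taken from the course notes.

```python
import math

# Viterbi decoding for a toy HMM tagger: find the highest-probability
# tag sequence for a word sequence. All model parameters are invented.

tags = ["D", "N", "V"]
trans = {("<s>", "D"): 0.9, ("<s>", "N"): 0.1,
         ("D", "N"): 1.0, ("N", "V"): 0.8, ("N", "N"): 0.2,
         ("V", "D"): 0.5, ("V", "N"): 0.5}
emit = {("D", "the"): 0.9, ("N", "dog"): 0.5, ("N", "barks"): 0.1,
        ("V", "barks"): 0.9}

def viterbi(words):
    # chart[i][t] = (best log-prob of tagging words[:i+1] ending in t, backpointer)
    chart = [{} for _ in words]
    for t in tags:
        p = trans.get(("<s>", t), 0) * emit.get((t, words[0]), 0)
        if p > 0:
            chart[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        for t in tags:
            e = emit.get((t, words[i]), 0)
            if e == 0:
                continue
            best = None
            for prev, (score, _) in chart[i - 1].items():
                tr = trans.get((prev, t), 0)
                if tr > 0:
                    cand = score + math.log(tr) + math.log(e)
                    if best is None or cand > best[0]:
                        best = (cand, prev)
            if best:
                chart[i][t] = best
    # follow backpointers from the best final tag
    last = max(chart[-1], key=lambda t: chart[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(chart[i][path[-1]][1])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # ['D', 'N', 'V']
```

The same chart structure, with max replaced by sum (a different semiring, per [H08]), gives the forward algorithm used later in the course.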
Thu, Oct 1  HW 3 out (transliteration). [data] 

Tue, Oct 6    
Thu, Oct 8  Problems involving incomplete data. Hidden parameters.  Notes: pp. 66–90; J & M: Ch. 6; [DLR77] [C97] [D02] [B06]  HW 3 due  
Tue, Oct 13  The Expectation-Maximization algorithm    
Thu, Oct 15  HW 4 out (EM / FB) 

Tue, Oct 20  The Forward-Backward algorithm    
Thu, Oct 22    
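The forward pass of the Forward-Backward algorithm can be sketched in a few lines: it is Viterbi with max replaced by sum, computing the total probability of the observations under all hidden state sequences. The toy HMM parameters below are invented for illustration.

```python
# Forward algorithm for a toy two-state HMM: sum over all state
# sequences instead of maximizing as in Viterbi. Parameters invented.

states = ["H", "C"]
trans = {("<s>", "H"): 0.5, ("<s>", "C"): 0.5,
         ("H", "H"): 0.7, ("H", "C"): 0.3,
         ("C", "H"): 0.4, ("C", "C"): 0.6}
emit = {("H", "3"): 0.4, ("H", "1"): 0.2,
        ("C", "3"): 0.1, ("C", "1"): 0.5}

def forward(obs):
    # alpha[t] = P(obs so far, current state = t)
    alpha = {t: trans[("<s>", t)] * emit[(t, obs[0])] for t in states}
    for o in obs[1:]:
        alpha = {t: sum(alpha[p] * trans[(p, t)] for p in states) * emit[(t, o)]
                 for t in states}
    return sum(alpha.values())  # total probability of the observations

print(round(forward(["3", "1"]), 4))  # about 0.077
```

The backward pass runs the same recurrence right to left; combining the two gives the expected counts that the EM algorithm uses for re-estimation.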
Unit Two: Tree models  
Tue, Oct 27  Limitations of finite-state machines. Natural language syntax. Context-free grammars.  J & M: Ch. 12; [slides (5 lectures)]  Project Guidelines out.  
Thu, Oct 29  A toy grammar for English; probabilistic CFGs; parsing: the CKY algorithm.  J & M: Ch. 13–14.    
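The CKY algorithm introduced here can be sketched as a recognizer for a CFG in Chomsky normal form. The toy grammar below is invented for illustration (it is not the course's toy grammar for English).

```python
from itertools import product

# CKY recognition for a CFG in Chomsky normal form.
# The toy grammar is invented for illustration.

binary = {("NP", "VP"): {"S"}, ("D", "N"): {"NP"}}   # A -> B C rules
lexical = {"the": {"D"}, "dog": {"N"}, "barks": {"VP"}}  # A -> word rules

def cky(words):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, set()))
    for span in range(2, n + 1):              # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # try every split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return "S" in chart[0][n]

print(cky(["the", "dog", "barks"]))  # True
```

Replacing the sets of nonterminals with probabilities (taking max over splits) turns the recognizer into a probabilistic CKY parser; keeping all derivations gives the packed forests discussed next.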
Tue, Nov 3  CKY parsing examples; ambiguities in the chart. Chomsky normal form and binarization.  J & M: Ch. 14.  HW 4 due  
Thu, Nov 5  Parsing as deduction, intersection, and generation. Packed forests and hypergraphs. Parser evaluation metrics.  [SSP 95] [H08 (part 2)]; J & M: Ch. 14.7  HW 5 out (parsing / forest).
Tue, Nov 10  The Inside-Outside algorithm; PCFG re-estimation.  [LY 90]; J & M: Ch. 14.3  Initial project proposal due  
Thu, Nov 12  Applications in machine translation.  [slides (2 lectures)]; J & M: Ch. 25.10.    
Tue, Nov 17  HW 5 due  
Unit Three: (Structured) Discriminative training  
Thu, Nov 19  Discriminative training: (Structured) Perceptron  [slides (3 lectures)]  Final project proposal due. Data preparation and baseline finished.  
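The perceptron can be sketched in its binary form, which is the building block behind the structured perceptron covered in this unit; the structured version replaces the argmax over {+1, -1} with Viterbi decoding over tag sequences. The features and data below are invented for illustration.

```python
# A minimal binary perceptron with sparse feature vectors (dicts).
# Features and training data are invented for illustration.

def dot(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def train(data, epochs=10):
    w = {}
    for _ in range(epochs):
        for feats, y in data:            # y in {+1, -1}
            y_hat = 1 if dot(w, feats) > 0 else -1
            if y_hat != y:               # update only on a mistake
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

data = [({"good": 1}, 1), ({"bad": 1}, -1), ({"good": 1, "ok": 1}, 1)]
w = train(data)
print(dot(w, {"good": 1}) > 0)  # True
```

In the structured setting the update is the same mistake-driven rule, but applied to the difference between the gold tag sequence's features and the Viterbi-predicted sequence's features.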
Tue, Nov 24  Maximum-entropy models. Conditional random fields. Stochastic gradient ascent.    HW 6 out (discriminative tagging)
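The maximum-entropy model in its binary form is logistic regression, and stochastic gradient ascent on its log-likelihood fits in a few lines. This is a toy sketch with invented features and data; CRFs apply the same gradient form over sequences, with expectations computed by forward-backward.

```python
import math

# A tiny binary maximum-entropy (logistic regression) classifier
# trained by stochastic gradient ascent on the log-likelihood.
# Features, data, and the learning rate are invented for illustration.

def dot(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def train(data, rate=0.5, epochs=50):
    w = {}
    for _ in range(epochs):
        for feats, y in data:            # y in {0, 1}
            p = 1.0 / (1.0 + math.exp(-dot(w, feats)))  # model probability
            for f, v in feats.items():
                # gradient of the log-likelihood: (observed - expected) * v
                w[f] = w.get(f, 0.0) + rate * (y - p) * v
    return w

data = [({"good": 1}, 1), ({"bad": 1}, 0)]
w = train(data)
print(w["good"] > 0 and w["bad"] < 0)  # True
```

The "(observed - expected)" form of the gradient is the same one that appears in CRF training, which is why this tiny example is a useful mental model for the sequence case.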
Thu, Nov 26  Thanksgiving  NO CLASS  
Tue, Dec 1  Max-margin methods      
Thu, Dec 3  Research project presentations    HW 6 due  
Wed, Dec 16        Research project due by email 
[LY 90]  The estimation of stochastic context-free grammars using the inside-outside algorithm. 
K. Lari and S. J. Young.  
Computer Speech and Language, 1990; 4: 35–56.  
[SSP 95]  Principles and Implementation of Deductive Parsing. 
Stuart Shieber, Yves Schabes, and Fernando Pereira.  
Journal of Logic Programming, 1995; 24 (1–2): 3–36.  
[H08]  Tutorial on Semirings and Advanced Dynamic Programming in NLP (slides). 
Liang Huang.  
Tutorial given at COLING 2008 and NAACL 2009.  
[KS09]  Tutorial on Writing Systems, Transliteration, and Decipherment (slides). 
Kevin Knight and Richard Sproat.  
Tutorial given at NAACL 2009.  
[CG98]  An Empirical Study of Smoothing Techniques for Language Modeling. 
Stanley F. Chen and Joshua Goodman.  
Technical Report TR-10-98, Computer Science Group, Harvard University, 1998.  
[DLR77]  Maximum Likelihood from Incomplete Data via the EM Algorithm. 
A. P. Dempster, N. M. Laird, and D. B. Rubin.  
Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, 1977.  
[C97]  The EM Algorithm 
Michael Collins  
[D02]  The Expectation Maximization Algorithm 
Frank Dellaert  
Technical Report, 2002 