Instructors: Profs. Kevin Knight & Daniel Marcu
Class Meeting Time: Tues & Thurs 2:00pm – 3:20pm
Class Location: VHE217
This graduate course covers the basics of statistical methods for processing
human language. It is intended for:
(1) students who want to understand current natural-language processing (NLP) research,
(2) students interested in tools for building NLP applications,
(3) machine-learning students looking for large-scale application domains, and
(4) students seeking experience with probabilistic methods that can be applied
to a range of AI problems.
Students will experiment with existing NLP software toolkits and write their
own programs. Grades will be based on five programming assignments (70% total,
14% each) and a final project (30%); there will be no midterm or final exam.
Office hours: TBA.
Aug 24
Overview.
Aug 26
Basic linguistic theory. Words, parts-of-speech, ambiguity, morphology, phrase
structure, word senses, speech. Text corpora and processing tools.
Programming Assignment 0 (no credit) out Aug 26, nothing to turn in.
Aug 31, Sept 2
Basic automata theory. Finite-state acceptors and intersection. Finite-state
transducers and composition. Applications in morphology and text-to-sound
conversion. Context-free grammars and parsing.
Programming Assignment 1 out Sept 2, due beginning of class Sept 9.
Topic: Finite-state acceptors for natural language.
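
As an informal illustration of the "finite-state acceptors and intersection" topic, here is a minimal Python sketch of a deterministic acceptor and the standard product construction for intersection. It is not taken from the course materials or from any toolkit used in the assignments; the class name `DFA`, the function `intersect`, and the toy word-level vocabulary are all assumptions made for this example.

```python
from itertools import product


class DFA:
    """Deterministic finite-state acceptor (illustrative, hypothetical class)."""

    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        self.transitions = dict(transitions)  # (state, symbol) -> next state

    def accepts(self, symbols):
        state = self.start
        for sym in symbols:
            key = (state, sym)
            if key not in self.transitions:
                return False  # no matching arc: reject
            state = self.transitions[key]
        return state in self.finals


def intersect(a, b):
    """Product construction: accepts exactly the strings both a and b accept."""
    transitions = {}
    for (p, sym_a), p_next in a.transitions.items():
        for (q, sym_b), q_next in b.transitions.items():
            if sym_a == sym_b:
                transitions[((p, q), sym_a)] = (p_next, q_next)
    finals = set(product(a.finals, b.finals))
    return DFA((a.start, b.start), finals, transitions)


# Toy vocabulary (an assumption for this sketch).
VOCAB = ["the", "dog", "barks"]

# Acceptor 1: word sequences that begin with "the".
starts_with_the = DFA(
    start=0,
    finals={1},
    transitions={(0, "the"): 1, **{(1, w): 1 for w in VOCAB}},
)

# Acceptor 2: word sequences that end with "barks".
ends_with_barks = DFA(
    start=0,
    finals={1},
    transitions={
        (s, w): (1 if w == "barks" else 0) for s in (0, 1) for w in VOCAB
    },
)

both = intersect(starts_with_the, ends_with_barks)
```

With these definitions, `both.accepts("the dog barks".split())` is true, while sequences that fail either constraint are rejected. The product construction here pairs every arc of one machine with every arc of the other, which is simple but may create unreachable states; a practical toolkit would typically build only the reachable part.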
Sept 7, 9
Basic probability theory. Conditional probability, Bayes rule, estimating
parameter values from data, building generative stochastic models, the
noisy-channel framework. Probabilistic finite-state acceptors and transducers.
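
For orientation, the noisy-channel framework is usually written as an application of Bayes rule. The symbols below are assumptions chosen for illustration and do not come from the syllabus: e is the hidden source string, f is the observed string, P(e) is the source (language) model, and P(f | e) is the channel model.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)}
        = \arg\max_{e} P(e)\, P(f \mid e)
```

The denominator P(f) can be dropped because it does not depend on e.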
Sept 14, 16, 21, 23
Language modeling. Estimating the frequency of English strings. Using language
models to resolve ambiguities across a wide range of applications. Training and
testing data. The sparse data problem. Smoothing with held-out data.
Programming Assignment 2 out Sept 16, due beginning of class Sept 28.
Topic: Weighted finite-state acceptors for language modeling.
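
To make "smoothing with held-out data" concrete, the following is a rough Python sketch of one common approach: interpolate a maximum-likelihood bigram estimate with an add-one unigram estimate, and pick the interpolation weight that maximizes the log-likelihood of held-out sentences. This is one of several possible smoothing schemes and is not necessarily the one taught in the course; the function names, the toy sentences, and the grid of candidate weights are all assumptions. All unseen words are treated as a single unknown class.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Collect unigram and bigram counts, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def prob(prev, word, unigrams, bigrams, lam):
    """Interpolated estimate: lam * P_ML(word | prev) + (1 - lam) * P_add1(word)."""
    vocab_size = len(unigrams) + 1  # +1 for a single unknown-word class
    total = sum(unigrams.values())
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def log_likelihood(sentences, unigrams, bigrams, lam):
    """Log-probability of a set of sentences under the interpolated model."""
    ll = 0.0
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(words, words[1:]):
            ll += math.log(prob(prev, word, unigrams, bigrams, lam))
    return ll


GRID = (0.1, 0.3, 0.5, 0.7, 0.9)  # candidate weights (assumed; all below 1)


def tune_lambda(held_out, unigrams, bigrams, grid=GRID):
    """Choose the weight that maximizes held-out log-likelihood."""
    return max(grid, key=lambda lam: log_likelihood(held_out, unigrams, bigrams, lam))


# Toy data, invented for this sketch.
train = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
held_out = [["the", "dog", "sleeps"]]

unigrams, bigrams = train_counts(train)
best_lam = tune_lambda(held_out, unigrams, bigrams)
```

Because every candidate weight is below 1, the unigram term keeps every probability above zero, so the held-out log-likelihood stays finite even for bigrams never seen in training. That is the sparse-data problem the smoothing is meant to address.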
Sept 28, 30
Guest lectures, TBA.
Oct 5, 7, 12, 14
String transformations. A simple framework for stochastically modeling many
types of string transformations, such as tagging word sequences with parts of
speech, cleaning up misspelled word sequences, and automatically marking up
names, organizations, and locations in raw text. Estimating parameter values
from annotated data.
Programming Assignment 3 out Oct 7, due beginning of class Oct 19.
Topic: Weighted finite-state transducers for string transformation.
Oct 19, 21, 26, 28
Hidden parameters. Problems involving incomplete data, such as elementary
cryptanalysis, transliteration, machine translation, NL interfaces, and
deciphering ancient scripts. The EM algorithm.
Programming Assignment 4 out Oct 21, due beginning of class Nov 2.
Topic: Unsupervised learning of natural language structure.
Nov 2, 4, 9, 11
Syntactic structures, context-free grammars, parsing, lexicalized grammars,
syntax-based language models, the inside-outside algorithm.
Programming Assignment 5 out Nov 9, due beginning of class Nov 18.
Topic: Parsing.
Initial project proposal due beginning of class Nov 4.
Final project proposal due Nov 11.
Project presentations in class Nov 30, Dec 2.
Final project write-ups due Dec 12 by email.
Nov 16, 18, 23
Tree automata and applications.
Nov 30, Dec 2
Project presentations by students in class.
Final project write-ups due Dec 12 by email.