Instructors: Prof. Kevin Knight and Prof. Daniel Marcu
Teaching Assistant: Jonathan May
Email: knight@isi.edu, marcu@isi.edu, jonmay@isi.edu
Class Meeting Time: Tues & Thurs 11am-12:20pm
Class Location: WPH B30
This graduate course covers the basics of statistical methods for processing human language. It is intended for:
(1) students who want
to understand current natural-language processing (NLP) research,
(2) students interested in tools for building NLP applications,
(3) machine-learning students looking for large-scale application domains, and
(4) students seeking experience with probabilistic methods that can be applied
to a range of AI problems.
Students will
experiment with existing NLP software toolkits and write their own programs.
Grades will be based on six programming assignments (72% = 12% each) and a final
project (28%); there will be no midterm or final.
Office hours: TBA.
Course software:
· Tiburon tree automata toolkit (http://www.isi.edu/publications/licensed-sw/tiburon/)
Aug 22: Example state-of-the-art natural language application: Machine Translation.
Aug 24: Basic linguistic theory. Words, parts of speech, ambiguity, morphology, phrase structure, word senses, speech. Text corpora and processing tools.
Programming Assignment 0 (no credit) out Aug 24; nothing to turn in.
Aug 29, 31: Basic automata theory. Finite-state acceptors and intersection. Finite-state transducers and composition. Applications in morphology and text-to-sound conversion. Context-free grammars and parsing.
Programming Assignment 1 out Aug 31, due beginning of class Sept 7. Topic: Finite-state acceptors for natural language.
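For a flavor of what a finite-state acceptor is, here is a minimal sketch in Python (illustrative only; the states, alphabet, and interface are invented and will differ from the assignment's actual format):

```python
# Minimal deterministic finite-state acceptor: states, transitions, accept test.
# (Illustrative sketch only; not the course's assignment code.)

def make_fsa(start, finals, transitions):
    """transitions: dict mapping (state, symbol) -> next state."""
    def accepts(symbols):
        state = start
        for sym in symbols:
            if (state, sym) not in transitions:
                return False          # no outgoing edge: reject
            state = transitions[(state, sym)]
        return state in finals
    return accepts

# Toy acceptor for the language: "a" followed by one or more "b"s.
accepts = make_fsa(
    start=0,
    finals={2},
    transitions={(0, "a"): 1, (1, "b"): 2, (2, "b"): 2},
)

print(accepts(["a", "b"]))        # True
print(accepts(["a", "b", "b"]))   # True
print(accepts(["b", "a"]))        # False
```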
Sept 5, 7: Basic probability theory. Conditional probability, Bayes' rule, estimating parameter values from data, building generative stochastic models, the noisy-channel framework. Probabilistic finite-state acceptors and transducers.
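The noisy-channel framework boils down to Bayes' rule: choose the source string maximizing P(source) · P(observed | source). A toy spelling-correction instance, with made-up probabilities:

```python
# Noisy-channel decoding in miniature: choose the source word w
# maximizing P(w) * P(observed | w). All probabilities are invented
# for illustration.

prior = {"the": 0.6, "thy": 0.1, "they": 0.3}   # language model P(w)
channel = {("teh", "the"): 0.8,                 # channel model P(observed | w)
           ("teh", "thy"): 0.05,
           ("teh", "they"): 0.01}

observed = "teh"
best = max(prior, key=lambda w: prior[w] * channel.get((observed, w), 0.0))
print(best)   # "the": 0.6 * 0.8 = 0.48 beats the alternatives
```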
Sept 12, 14, 19, 21: Language modeling. Estimating the frequency of English strings. Using language models to resolve ambiguities across a wide range of applications. Training and testing data. The sparse data problem. Smoothing with held-out data.
Programming Assignment 2 out Sept 14, due beginning of class Sept 21. Topic: Weighted finite-state acceptors for language modeling.
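A bigram language model in miniature; note this sketch uses add-one smoothing, a simpler scheme than the held-out smoothing covered in lecture, and a toy corpus:

```python
# Bigram language model with add-one smoothing.
# (Sketch with an invented corpus; simpler than held-out smoothing.)
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
vocab = set(corpus)
V = len(vocab)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w1, w2):
    """P(w2 | w1) with add-one smoothing."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

# A seen bigram scores higher than an unseen one after the same word.
print(p_bigram("the", "cat") > p_bigram("the", "ran"))   # True
```

Add-one smoothing keeps every unseen bigram from getting probability zero, at the cost of shifting too much mass to rare events; the held-out methods discussed in class estimate that mass from data instead.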
Sept 26, 28; Oct 3, 5: String transformations. A simple framework for stochastically modeling many types of string transformations, such as tagging word sequences with parts of speech, cleaning up misspelled word sequences, and automatically marking up names, organizations, and locations in raw text. Estimating parameter values from annotated data.
Programming Assignment 3 out Sept 28, due beginning of class Oct 5. Topic: Weighted finite-state transducers for string transformation.
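Part-of-speech tagging is the canonical string transformation; one standard formulation is a hidden Markov model decoded with the Viterbi algorithm. A sketch with an invented tagset and made-up probabilities:

```python
# Viterbi decoding for a toy HMM part-of-speech tagger.
# Tags, transition, and emission probabilities are invented for illustration.

tags = ["DET", "NOUN", "VERB"]
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
         ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "DET"): 0.1,
         ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1}
emit = {("DET", "the"): 0.9,
        ("NOUN", "dog"): 0.4, ("NOUN", "barks"): 0.1,
        ("VERB", "barks"): 0.5}

def viterbi(words):
    # best[t] = (probability, tag sequence) of the best path ending in tag t
    best = {t: (start[t] * emit.get((t, words[0]), 0.0), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            p, path = max((best[s][0] * trans.get((s, t), 0.0), best[s][1])
                          for s in tags)
            new[t] = (p * emit.get((t, w), 0.0), path + [t])
        best = new
    return max(best.values())[1]

print(viterbi(["the", "dog", "barks"]))   # ['DET', 'NOUN', 'VERB']
```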
Oct 10, 12, 17, 19: Hidden parameters. Problems involving incomplete data, such as elementary cryptanalysis, transliteration, machine translation, natural-language interfaces, and deciphering ancient scripts. The EM algorithm.
Programming Assignment 4 out Oct 12, due beginning of class Oct 19. Topic: Unsupervised learning of natural language structure.
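The EM algorithm can be previewed on the classic two-coins problem, where the identity of the coin behind each observed session is the hidden parameter (a textbook example with invented numbers, not the course's decipherment setting):

```python
# EM for a toy incomplete-data problem: two biased coins; for each
# recorded session we observe the head count but not which coin was used.
# (Classic textbook example with invented data.)
from math import comb

sessions = [9, 8, 7, 2, 1]   # head counts, each out of n = 10 flips
n = 10

def likelihood(heads, p):
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

pA, pB = 0.6, 0.5            # initial guesses for the two coins' head probs
for _ in range(50):
    # E-step: posterior responsibility of coin A for each session
    expA_h = expA_t = expB_h = expB_t = 0.0
    for h in sessions:
        la, lb = likelihood(h, pA), likelihood(h, pB)
        wA = la / (la + lb)
        expA_h += wA * h;         expA_t += wA * (n - h)
        expB_h += (1 - wA) * h;   expB_t += (1 - wA) * (n - h)
    # M-step: re-estimate each coin's bias from the expected counts
    pA = expA_h / (expA_h + expA_t)
    pB = expB_h / (expB_h + expB_t)

print(round(pA, 2), round(pB, 2))
```

Starting from nearly identical guesses, the two estimates separate: one coin explains the heavy-heads sessions, the other the heavy-tails sessions.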
Oct 24, 26, 31: Syntactic structures, context-free grammars, parsing, lexicalized grammars, regular tree grammars, syntax-based language models, the inside-outside algorithm.
Programming Assignment 5 out Oct 26, due beginning of class Nov 2. Topic: Modeling syntactic structure of English.
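Context-free parsing can be sketched with the CKY algorithm over a tiny grammar in Chomsky normal form (grammar invented for illustration):

```python
# CKY recognition for a tiny context-free grammar in Chomsky normal form.
# Grammar invented for illustration.

grammar = {                      # RHS tuple -> set of LHS nonterminals
    ("the",): {"DET"},
    ("dog",): {"N"},
    ("barks",): {"V", "VP"},     # VP -> barks (lexical rule)
    ("DET", "N"): {"NP"},
    ("NP", "VP"): {"S"},
}

def cky(words):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(grammar.get((w,), set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # try every split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= grammar.get((b, c), set())
    return "S" in chart[0][n]

print(cky(["the", "dog", "barks"]))   # True
print(cky(["dog", "the", "barks"]))   # False
```

The same chart, filled with probabilities instead of sets, is the starting point for syntax-based language models and the inside-outside algorithm.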
Nov 2, 7, 9, 14: Tree transformations and applications.
Programming Assignment 6 out Nov 9, due beginning of class Nov 16. Topic: Modeling syntactic structure.
Initial project proposal due beginning of class Nov 9.
Final project scope settled Nov 16.
Final project write-ups due Dec 12 by email.
Nov 16, 21: TBD
Nov 23: Thanksgiving (no class).
Nov 28, 30: Current research in natural language processing.