Jonathan May
Website | https://www.isi.edu/~jonmay/cs544_fa17_web/ |
Lectures | SAL 101, Wednesdays and Fridays 8:00-9:50 am |
Instructor office hours | Jonathan May, RTH 512, Wednesdays and Fridays 10-11 am, or 11 am-12 pm by appointment |
Textbook | None required; Speech and Language Processing, 2nd Edition [1] is optional. All required readings will be listed in the syllabus. |
TA office hours | Dong Guo, 3pm-5pm Tue, EEB 220 |
| Siddharth Jain, 8am-10am Mon, SAL lab |
| Ramesh Manuvinakurike, 8am-10am Thu, SAL lab |
Homeworks (60% of grade) | 8 assignments at 7.5% of the grade each. A mix of programming and written assignments, submitted electronically by 11:59 PM Pacific time on the due dates in the table below. |
Late days | Four (4) in total, with no more than two (2) per assignment. After that, a 50% penalty for the first day late and no credit afterward. |
Midterm (15% of grade) | Friday, October 6, 8:00-9:45 am. Covers lectures, readings, and homeworks from 8/23-10/4 |
Final exam (25% of grade) | Wednesday, December 6, 8:00-10:00 am. Covers lectures, readings, and homeworks from 8/23-11/29 (i.e., the entire course) |
Contact us | On Piazza or in class/office hours. Please do not email (unless notified otherwise). |
week/theme | date | material | reading | HW out | HW due |
1: Corpus Processing | 8/23 | intro, applications | jm v2 c1 [2], hirschberg & manning [3] | | |
| 8/25 | corpora, text processing, words, regular expressions (lec 2 starting code [4]) | NLTK ch. 2 [5], jm v3 c2 [6], nathan schneider's notes [7], Unix for poets [8], sculpting text [9] | HW 1 (due 9/8) | |
2: Morphology, Regular languages | 8/30 | Morphology, Finite-State Automata, FSA relationship to regex, finite state transducers | jm v2 c3 [10] | | |
| 9/1 | probability | Goldwater probability tutorial [11] | HW 2 (due 9/15) | |
3: Classification and tagging | 9/6 | Classifiers, features, naive Bayes, perceptron, logistic regression | jm v3 c6 [12], Eisenstein notes pp. 21-48 [13] | | |
| 9/8 | pos tagging, hmm, search | jm v3 c9 [14], blunsom notes [15], collins notes (optional, more detailed) [16] | | HW 1 |
4: Syntax and Parsing 1 | 9/13 | (guest lecture) parsing and syntax 1: treebanks, evaluation, cky, grammar induction | penn treebank [17], jm v3 c12 [18], jm v3 c13 [19] | HW 3 (due 9/27) | |
| 9/15 | (guest lecture) parsing and syntax 2: pcfgs, restructuring, lexicalization, smoothing, beaming | chiang notes [20] | | HW 2 |
5: Syntax and Parsing 2 | 9/20 | parsing and syntax 3: dependencies, mst and shift reduce algorithms | jm v3 c14 [21] | HW 4 (due 10/4) | |
| 9/22 | NO CLASS | | | |
6: Language Models | 9/27 | ngram LMs | jm v3 c4 [22] | | HW 3 |
| 9/29 | smoothing, interpolation | (optional) chen and goodman [23] | | |
7 | 10/4 | feed forward lm, rnn lm | jm v3 c8 [24] | | HW 4 |
| 10/6 | MIDTERM | | | |
8: Semantics | 10/11 | classical lexical semantics | jm v3 c17 [25] | | |
| 10/13 | distributional lexical semantics | jm v3 c15 [26], jm v3 c16 [27] | HW 5 (due 10/27) | |
9: Information Extraction | 10/18 | (guest lecture) Information Extraction 1 | jm v3 c21 [28] | | |
| 10/20 | MT introduction, evaluation | alpac report (optional; skip appendices) [29], 20 years of bitext [30], Bleu [31] | HW 6 (due 11/10) | |
10 | 10/25 | (guest lecture) Information Extraction 2 | | | |
| 10/27 | (guest lecture) NLP and ML research at Amazon | | | HW 5 |
11: Machine Translation 1 | 11/1 | introduction, evaluation | arcturan and centauri [32] | | |
| 11/3 | alignment, em, model 1 | Brown et al. [33] (mathy bits of models 3+ can be skimmed), Knight workbook [34] | | |
12: Machine Translation 2 | 11/8 | pbmt, search, features, tuning | | | |
| 11/10 | features, tuning | mert [35], pro [36] | HW 7 (due 12/1) | HW 6 |
13 | 11/15 | syntax, neural | | | |
| 11/17 | (guest lecture) Information Retrieval/Question Answering | jm v3 c28 [37] | | |
14: Thanksgiving | 11/22 | NO CLASS | | | |
| 11/24 | NO CLASS | | | |
15: Data Mining | 11/29 | (guest lecture) Social Media Mining | | | |
| 12/1 | Review | | | HW 7 |
| 12/6 | FINAL | ROOM ASSIGNMENT [38] | | |