CSCI 544

Applied Natural Language Processing

University of Southern California

Spring 2013


Tuesdays and Thursdays, 2:00pm-3:20pm


VHE 217


Zornitsa Kozareva

Teaching Assistant:

Victor Barres

Guest Lecturers:

Kenji Sagae, ICT

Sujith Ravi, Google

Anton Leuski, ICT

CSCI544-2013 Final Project Award Winners

(from left to right: Victor Barres, Ruoyang Wang, Wenqi Zhang, Greg Harris, Zornitsa Kozareva, Changhai Zheng, Yunqing Cao, Soonil Nagarkar, Sam Shuster and Bobi Pu)

Best Award for Most Creative Idea: Changhai Zheng and Yunqing Cao
Best Award for Presentation: Soonil Nagarkar
Best Award for System (results and algorithm): Greg Harris
Best Award for Most Likely to be Converted into Successful Business: Bobi Pu and Sam Shuster
Best Award for all of the above categories: Wenqi Zhang and Ruoyang Wang

The awards are based on the ratings given to each project by the 40 (forty) graduate students taking the CSCI544 2013 class, the TA Victor Barres, and Dr. Zornitsa Kozareva.

Class Questions:

Use Piazza to post class-related questions or to start a discussion.


This course covers both fundamental and cutting-edge research topics in Natural Language Processing (NLP) and delves into modern NLP applications, including information extraction, information retrieval, question answering systems like IBM's Watson, and sentiment analysis.


This graduate course is intended for:
  • students who want to understand state-of-the-art and current NLP research
  • students interested in tools for building NLP applications
  • students interested in applications of NLP such as sentiment analysis, information extractors, and search engines, among others


Students are expected to have proficiency in programming, algorithms and data structures, and basic knowledge of linear algebra and statistics.

Related Courses

There is a sister course, Advanced Natural Language Processing, offered in the fall semester. You can take these two courses in either order.

Textbooks (optional reading)

Classes from Previous Years

Syllabi and materials from previous years. Since those pages are no longer maintained, there is no guarantee of completeness.


Students will experiment with existing NLP software toolkits and write their own programs. Students will work with real datasets and will build their own NLP Information Extraction, Text Classification and Sentiment Analysis systems. Grades will be based on:
  • Programming assignments (2 x 25%): the grade will depend on the performance of the student's system relative to the rest of the class and on the technical report.
  • Research project (50%): the grade will depend on the project's substantiality, correctness, relevance to the course, as well as the clarity and depth of the project report, which should follow standard ACL guidelines. Building a demo system will be optional, but will count as bonus points.

Homeworks and Project Proposal Guidelines

Homework I: Named Entity Recognition
Homework II: Web Page Clustering of Ambiguous Names
Project Proposal


Date Instructor Lecture
January 15 Kozareva Introduction to NLP
January 17 Kozareva Morphology and Basic Text Processing
January 22 Kozareva Named Entity Recognition, Decision Trees
January 24 Kozareva Named Entity Recognition, k-NN, Features
January 29 Kozareva Introduction to Weka (Homework 1 is out)
January 31 Kozareva Name Discrimination
February 5 Sagae POS Tagging
February 7 Sagae Parsing
February 12 Kozareva Question Classification
February 14 Kozareva Sentiment Analysis
February 19 Kozareva Regression
February 21 Kozareva Bullying Detection
February 26 Kozareva Latent Semantic Analysis (Homework 2 is out)
February 28 Kozareva Applications of Latent Semantic Analysis
March 5 Kozareva Singular Value Decomposition
March 7 Ravi Unsupervised Learning for Structured Prediction
March 12 Barres Principal Component Analysis
March 14 Kozareva Latent Dirichlet Allocation
March 19 Spring Break
March 21 Spring Break
March 26 Kozareva Semantic Class Induction
March 28 Kozareva Graph Algorithms
April 2 Kozareva Taxonomies
April 4 Kozareva Semantic Relations
April 9 Leuski Information Retrieval
April 11 Kozareva Events
April 16 Kozareva Textual Entailment
April 18 Satheeshkumar Karuppusam #8: Sarcasm Identification in Social Network data
Vaishnavi Dalvi and Nirmisha Bollampalli #3: Affective Text for News Headlines
Saravanan Ganesh and Harshvardhan #4: News Summarization
Soonil Nagarkar #19: Author Attribution Through Stylometric Analysis
April 23 Nabir Bora #1: Sentiment Analysis on Conversational Speech Transcriptions: Do vocal cues play a role?
Kiana Baradaran and Stefan Zeltner #15: Semantic Analysis for TV Episodes Ratings
Akanksha Gopinath, Laksh Gupta and Sanket Sabnis #5: Question-Answer System with Search Engine Integration
Greg Harris #6: Finding Informative Comments in Forums Beset by Amateurs, Jokesters and Trolls
Xing Shi and Ai He #20: Building a Causality Database from Yelp Reviews
April 25 Shashank Mandil and Linwei Zhu #9: Stock Sentiment Analysis
Pavan Gadam Manohar and Palvinder Singh #10: Prediction of movie ratings
Cristina Cano and Sayat Satybaldiyev #2: Twitter Topic Modeling and Sentiment
Ashish Jain #7: Multi-Document Summarization
Changhai Zheng and Yunqing Cao #16: Opinion Mining based on Comparison
April 30 Bobi Pu and Sam Shuster #11: Using Natural Language Processing of Social Media in conjunction with Online Charts to Predict Billboard Top10
Shivasankari Kannan #12: Emotion Tracking in Novels
Jia Li and Chen Zhang #18: Aspect-Sentiment Analysis of Amazon Product Review
Yubing Dong, Yunru Huang and Shitian Shen #17: A Case Study of Sentiment Analysis and Topic Detection in Chinese Tweets
Teng Song #22: Unsupervised part-of-speech tagging
May 2 Wenqi Zhang and Ruoyang Wang #13: Graph of Fame: Interpersonal Relationship Extraction and Social Network Construction of Celebrities
Vladimir Zaytsev #14: N7 - Hate Speech Classifier for Short Messages
Swaroop Manjunath and Zinan Xing #21: Predicting Stock Returns from Blog Sentiment
Kozareva Closing and Award Ceremony

Statement for Students with Disabilities:

Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to the TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m.-5:00 p.m., Monday through Friday. The phone number for DSP is (213) 740-0776.

Statement on Academic Integrity:

USC seeks to maintain an optimal learning environment. General principles of academic honesty include the concept of respect for the intellectual property of others, the expectation that individual work will be submitted unless otherwise allowed by an instructor, and the obligations both to protect one's own academic work from misuse by others and to avoid using another's work as one's own. All students are expected to understand and abide by these principles. Scampus, the Student Guidebook, contains the Student Conduct Code in Section 11.00, while the recommended sanctions are located in Appendix A. Students will be referred to the Office of Student Judicial Affairs and Community Standards for further review, should there be any suspicion of academic dishonesty. The review process can be found at:

Emergency Preparedness/Course Continuity in a Crisis:

In case of a declared emergency in which travel to campus is not feasible, USC executive leadership will announce an electronic way for instructors to teach students in their residence halls or homes using a combination of Blackboard, teleconferencing, and other technologies.