San Diego State University

Ling 696 Advanced Statistical Methods in Computational Linguistics


Spring 2004
Thursday 7:00pm-9:40pm (Note time change!)
Room BA 412

This course offers a survey of state-of-the-art statistical and machine learning methods for computational linguistics. Topics to be covered include:

  • generative and discriminative probabilistic models
  • naïve Bayes classifiers
  • maximum entropy models
  • conditional random fields
  • memory-based learning
  • support vector machines

Prerequisite: Ling 681 or equivalent

Instructor: Rob Malouf
Office: BA 310A
Office hours: Mondays 11:00-12:00, Thursdays 2:00-3:00
Email: rmalouf@mail.sdsu.edu (PGP public key)
Phone: (619) 594-7111

Requirements

Assessments

The final grade will be based on homework assignments (30%) and a final project (70%).

Throughout the term, there will be occasional homework assignments to practice the techniques learned in class. These assignments will be graded. Working in groups is encouraged, but please include the names of all collaborators on the assignment.

For the final project, students will design, implement, document, and evaluate an NLP application based on the machine learning methods covered in the course. The details will depend on the interests of the students.

Readings

The required textbook for this course is:

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

Additional readings will be made available in class or via the Resources section of the course web page.

Lab

For homework assignments and final projects, we will be using the computational linguistics lab, part of the Social Sciences Research Lab in the basement of the Professional Services and Fine Arts building. Information about how to use the lab will be made available before the first assignment.

Schedule

Week 1 Introduction
Background · Mathematical background · Machine learning applications · Types of models

Weeks 2-4 Non-parametric methods
Decision trees · Memory-based learning · Rule induction

Weeks 5-7 Bayesian methods
Naïve Bayes classifiers · Improved priors · Maximum entropy classifiers · Conditional random fields

Weeks 8-10 Ensemble methods
Weighted voting · Bagging · Boosting · Co-training

Weeks 11-13 Kernel methods
Linear classifiers · Perceptron · Kernel functions · Support Vector Machines

Week 14 Odds and ends
Training data · Running experiments · Computational realities

Week 15 Projects
Final project due May 13

Resources
