San Diego State University

Ling 696 Advanced Statistical Methods in Computational Linguistics


Spring 2004
Thursday 7:00pm-9:40pm (Note time change!)
Room BA 412

This course offers a survey of state-of-the-art statistical and machine learning methods for computational linguistics. Topics to be covered include:

  • generative and discriminative probabilistic models
  • naïve Bayes classifiers
  • maximum entropy models
  • conditional random fields
  • memory-based learning
  • support vector machines

Prerequisite: Ling 681 or equivalent

Instructor: Rob Malouf
Office: BA 310A
Office hours: Mondays 11:00-12:00, Thursdays 2:00-3:00
Email: rmalouf@mail.sdsu.edu (PGP public key)
Phone: (619) 594-7111

Requirements

Assessments

The final grade will be based on homework assignments (30%) and a final project (70%).

Throughout the term, there will be occasional homework assignments to practice the techniques learned in class. These assignments will be graded. Working in groups is encouraged, but please include the names of all collaborators on the assignment.

For the final project, students will design, implement, document, and evaluate an NLP application based on the machine learning methods covered in the course. The details will depend on the interests of the students.

Readings

The required textbook for this course is:

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.

Additional readings will be made available in class or via the Resources section of the course web page.

Lab

For homework assignments and final projects, we will be using the computational linguistics lab, part of the Social Sciences Research Lab in the basement of the Professional Services and Fine Arts building. Information about how to use the lab will be made available before the first assignment.

Schedule

Week 1 Introduction
Background · Mathematical background · Machine learning applications · Types of models

Weeks 2-4 Non-parametric methods
Decision trees · Memory-based learning · Rule induction

Weeks 5-7 Bayesian methods
Naïve Bayes classifiers · Improved priors · Maximum entropy classifiers · Conditional random fields

Weeks 8-10 Ensemble methods
Weighted voting · Bagging · Boosting · Co-training

Weeks 11-13 Kernel methods
Linear classifiers · Perceptron · Kernel functions · Support Vector Machines

Week 14 Odds and ends
Training data · Running experiments · Computational realities

Week 15 Projects
Final project due May 13

Resources
