
Ling 795 Seminar: Machine Learning in NLP

[ Course requirements | Readings | Resources ]

Fall 2002
Monday 4:00-6:40
Room AH 3150

The past decade has seen a dramatic increase in research on machine learning methods in natural language processing. These techniques use general symbolic and statistical learning methods to automatically extract linguistic constraints from natural language corpora, replacing or augmenting those constructed by the linguist. In this course we will survey some of the leading machine learning methods and the ways in which these sophisticated `discovery procedures' can be applied to problems in both computational and theoretical linguistics.

Instructor: Rob Malouf
Office: BA 310A
Office hours: Tuesdays 2:00-4:00 or by appointment (PGP public key)



We will assume basic familiarity with computational linguistics, probability, and machine learning as covered in Ling 581 and/or Ling 681.


The goals of this course are for us to gain experience in:

  • exploring the role that machine learning plays in natural language processing in general, and text classification in particular,
  • reading and evaluating the primary literature,
  • presenting and discussing research material with peers,
  • identifying open research questions,
  • and designing and carrying out our own experiments.
Throughout the term, participants (including auditors!) will present and discuss articles from the reading list, which covers a number of machine learning methods as applied to the problem of text classification.

In addition to leading and participating in discussions, students taking the class for a grade will also prepare a final project. Projects should somehow involve machine learning and NLP, but need not be restricted to the methods we cover in class or to text classification. The project should be something new and useful, which will help other researchers in the field. Possible projects include a survey of relevant literature on a particular problem area or machine learning method, a novel implementation of an algorithm or the application of a method to a new problem, or even an annotated data set that can provide new insight into the differences between techniques. Ideally, the final project should be something that could be submitted to one of the many NLP conferences.

Final projects can be done individually or, with prior approval, in small groups. So that we all know what everyone else is working on, each group will give a short presentation on its topic around the middle of the semester (October 14). Two thirds of the way through the term (November 4), each group should submit an annotated bibliography of the references they intend to use in their work. Final presentations will be on the last day of class (December 9).

The final grade will be based on class participation (20%) and a final project (80%).

Project deadlines:
October 14      Topic presentations (10-15 minutes)
November 4      Bibliography due
December 9      Final presentations (20-30 minutes)
December 16     Final project due


The computational linguistics lab will be accessible for participants to work on their final projects. The lab is part of the Social Sciences Research Lab in the basement of the Professional Services and Fine Arts building. Information about how to use the lab will be made available during the first class meeting.


This list is only a starting point. We will certainly not cover all of these topics, and additional materials may be added as things come up in class.

September 16

M. E. Maron. 1961. "Automatic indexing: An experimental inquiry." Journal of the ACM (JACM) 8(3):404--417.

David Lewis. 1998. "Naive (Bayes) at forty: The independence assumption in information retrieval." In Proceedings of ECML-98, 10th European Conference on Machine Learning. [.ps.gz]

Andrew McCallum and Kamal Nigam. 1998. "A comparison of event models for Naive Bayes text classification." In AAAI/ICML-98 Workshop on Learning for Text Categorization. Pages 41--48. [.pdf]
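As a preview of this week's theme, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing, in the spirit of the event models McCallum and Nigam compare. The toy corpus and labels are invented for illustration; real experiments would use a standard data set such as those in the readings.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Collect class priors and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter()            # documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(tokens, model):
    """Pick the class maximizing log P(c) + sum_w log P(w|c), add-one smoothed."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c in class_counts:
        score = math.log(class_counts[c] / n_docs)
        total = sum(word_counts[c].values())
        for w in tokens:
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy training corpus (invented for illustration)
train = [
    ("win cash prize now".split(), "spam"),
    ("cheap prize win now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("monday seminar reading list".split(), "ham"),
]
model = train_nb(train)
print(classify_nb("win a prize".split(), model))  # -> spam
```

Note that the "naive" independence assumption is exactly what Lewis (1998) and Domingos and Pazzani (next week) examine: the per-word log probabilities are simply summed as if words occurred independently given the class.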

September 23-30

Pedro Domingos and Michael Pazzani. 1997. "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss." Machine Learning 29:103--130. [.ps.gz]

I. Rish, J. Hellerstein, and T. S. Jayram. 2001. "An analysis of data characteristics that affect naive Bayes performance." IBM Technical Report RC21993. [.ps]

October 14

J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann. Chapters 2 and 4.

David Lewis and Marc Ringuette. 1994. "A comparison of two learning algorithms for text categorization." In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval. [.ps]

October 21

D. E. Johnson, F. J. Oles, T. Zhang and T. Goetz. 2002. "A Decision-Tree-Based Symbolic Rule Induction System for Text Categorization." IBM Systems Journal 41(3):428--37. [.ps]

S.M. Weiss, C. Apte, F. Damerau, D.E. Johnson, F.J. Oles, T. Goetz, and T. Hampp. 1999. "Maximizing text-mining performance." IEEE Intelligent Systems 14(4): 63--69. [.pdf]

October 28

Robert E. Schapire. 2002. "The boosting approach to machine learning: An overview." In MSRI Workshop on Nonlinear Estimation and Classification. Berkeley. [.ps.gz]

Robert E. Schapire and Yoram Singer. 2000. "BoosTexter: A boosting-based system for text categorization." Machine Learning 39(2/3):135--168. [.ps]

November 4

Thomas G. Dietterich and Ghulum Bakiri. 1995. "Solving multiclass learning problems via error-correcting output codes." Journal of Artificial Intelligence Research 2:263--286. [.ps]

Eun Bae Kong and Thomas G. Dietterich. 1995. "Error-correcting output coding corrects bias and variance." In Proceedings of the 12th International Conference on Machine Learning, pages 313--321. [.ps.gz]

Adam Berger. 1999. "Error-correcting output coding for text classification." IJCAI'99: Workshop on machine learning for information filtering. Stockholm, Sweden. [.pdf]
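The core idea in these readings, reducing a multiclass problem to several binary problems via redundant codewords, can be sketched very briefly. The class names and codewords below are invented for illustration (every pair of codewords differs in 4 of 7 bits, enough to correct any single-bit error); in a real system each bit would come from a separately trained binary classifier, whereas here the bit vector is taken as given.

```python
def hamming(a, b):
    """Number of bit positions where two codewords disagree."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical classes; minimum pairwise Hamming distance is 4.
CODEWORDS = {
    "politics": (0, 0, 0, 0, 0, 0, 0),
    "sports":   (0, 0, 0, 1, 1, 1, 1),
    "business": (0, 1, 1, 0, 0, 1, 1),
    "science":  (1, 0, 1, 0, 1, 0, 1),
}

def decode(predicted_bits):
    """Return the class whose codeword is nearest to the predicted bit vector."""
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], predicted_bits))

# One binary classifier misfires (first bit of "business" flipped),
# but the code's redundancy still recovers the right class:
noisy = (1, 1, 1, 0, 0, 1, 1)
print(decode(noisy))  # -> business
```

Dietterich and Bakiri's point is that with minimum distance d between codewords, up to floor((d-1)/2) individual classifier errors can be corrected at decoding time.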

November 11

D. Aha, D. Kibler, and M. Albert. 1991. "Instance based learning algorithms." Machine Learning 6:37--66.

Walter Daelemans, Antal van den Bosch, and Ton Weijters. 1996. "IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms." In D. Aha (ed.), Artificial Intelligence Review, 11:407--423. [.ps]

B. Masand, G. Linoff, and D. Waltz. 1992. "Classifying news stories using memory based reasoning." In 15th Ann. Int. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'92), pages 59--64.

Walter Daelemans, Antal van den Bosch, and Jakub Zavrel. 1999. "Forgetting exceptions is harmful in language learning." Machine Learning, 34:11--43. [.ps]
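Memory-based (instance-based) learning stores the training examples and classifies new items by their nearest stored neighbors. The sketch below uses plain set overlap as the distance and 1-nearest-neighbor voting; the toy examples are invented, and the readings use more refined metrics (e.g., information-gain feature weighting in IGTree).

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the tokens on which two documents disagree (symmetric difference)."""
    return len(set(a) ^ set(b))

def knn_classify(tokens, memory, k=1):
    """Label a document by majority vote among its k nearest stored examples."""
    nearest = sorted(memory, key=lambda ex: overlap_distance(tokens, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training memory (invented for illustration)
MEMORY = [
    ("goal scored final match".split(), "sports"),
    ("stocks market fell".split(), "finance"),
    ("team won the match".split(), "sports"),
    ("market rally stocks rise".split(), "finance"),
]

print(knn_classify("stocks fell sharply".split(), MEMORY))  # -> finance
```

Note that nothing is abstracted away at training time; as Daelemans et al. argue, keeping every exception in memory is precisely what makes this family of methods well suited to language data.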

November 18

Marti A. Hearst, Bernhard Schölkopf, Susan T. Dumais, Edgar Osuna, and John Platt. 1998. "Trends and Controversies -- Support Vector Machines." IEEE Intelligent Systems 13:18--28. [.pdf]

Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press. Chapter 1 and Appendix B. [site]

November 25

Thorsten Joachims. 2001. "A statistical learning model of text classification with support vector machines." Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR). [.pdf]

Susan T. Dumais, John Platt, David Heckerman and Mehran Sahami. 1998. "Inductive learning algorithms and representations for text categorization." In Proceedings of ACM-CIKM98, pages 148--155. [.pdf]

December 2

Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. 2002. "Text classification using string kernels." Journal of Machine Learning Research 2:419--444. [.pdf]

Jason D. M. Rennie and Ryan Rifkin. 2001. Improving Multiclass Text Classification with the Support Vector Machine. Massachusetts Institute of Technology. AI Memo AIM-2001-026. [.pdf]

Yiming Yang and Xin Liu. 1999. "A re-examination of text categorization methods." In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), pages 42--49. [.ps.gz]

December 9

Final presentations

Last modified: Tue Jan 14 09:57:51 PST 2003