Ling 795 Seminar: Machine Learning in NLP
Room AH 3150
The past decade has seen a dramatic increase in research on machine
learning methods in natural language processing. These techniques use
general symbolic and statistical learning methods to automatically
extract linguistic constraints from natural language corpora,
constraints that replace or augment those constructed by the
linguist. In this course we will survey some of the leading machine
learning methods and the ways in which these sophisticated `discovery
procedures' can be applied to problems in both computational and
theoretical linguistics.
We will assume basic familiarity with computational linguistics,
probability, and machine learning as covered in Ling 581 and/or
The goals of this course are for us to gain experience in:
- exploring the role that machine learning plays in natural
language processing in general, and text classification in particular,
- reading and evaluating the primary literature,
- presenting and discussing research material with peers,
- identifying open research questions,
- and designing and carrying out our own experiments.
Through the term, participants (including auditors!) will present and
discuss articles from the reading list, which cover a number of
machine learning methods as applied to the problem of text
classification.
In addition to leading and participating in discussions, students
taking the class for a grade will also prepare a final
project. Projects should somehow involve machine learning and NLP, but
need not be restricted to the methods we cover in class or to text
classification. The project should be something new
and useful, which will help other researchers in the
field. Possible projects include a survey of relevant literature on a
particular problem area or machine learning method, a novel
implementation of an algorithm or the application of a method to a new
problem, or even an annotated data set that can provide new insight
into the differences between techniques. Ideally, the final project
should be something that could be submitted to one of the many NLP
conferences or workshops.
Final projects can be done individually or, with prior approval, in small
groups. So that we all know what everyone is working on, each group
will give a short presentation on their topic around the middle of the
semester (October 14). Two thirds of the way through the term
(November 4), each group should submit an annotated bibliography of
the references they intend to use in their work. Final presentations
will be on the last day of class (December 9).
The final grade will be based on class participation (20%) and a final
project.
|October 14 ||Topic presentations (10-15 minutes)|
|November 4 ||Bibliography due|
|December 9 ||Final presentations (20-30 minutes)|
|December 16 ||Final project due|
The computational linguistics lab
will be accessible for participants to work on their final projects.
The lab is part of the Social Sciences
Research Lab in the basement of the Professional
Services and Fine Arts building. Information about how to use the
lab will be made available during the first class meeting.
This list is only a starting point. We will certainly not cover all of
these topics, and additional materials may be added as things come up.
- September 16
M. E. Maron. 1961. "Automatic indexing: An experimental inquiry."
Journal of the ACM (JACM) 8(3):404--417.
David Lewis. 1998. "Naive (Bayes) at forty: The independence
assumption in information retrieval." In Proceedings of ECML-98,
10th European Conference on Machine Learning.
Andrew McCallum and Kamal Nigam. 1998. "A comparison of event models
for Naive Bayes text classification." In AAAI/ICML-98 Workshop on
Learning for Text Categorization, pages 41--48.
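To make the multinomial event model from the McCallum and Nigam paper concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. The function names and toy documents are illustrative, not from any of the papers above:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Estimate class log-priors and per-class token counts from
    tokenized documents (docs: list of token lists)."""
    vocab = {tok for doc in docs for tok in doc}
    priors = {c: math.log(labels.count(c) / len(labels)) for c in set(labels)}
    counts = {c: Counter() for c in priors}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    return priors, counts, vocab

def classify(doc, priors, counts, vocab):
    """Return argmax_c of log P(c) + sum over tokens of log P(token | c),
    with add-one (Laplace) smoothing over the vocabulary."""
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        total = sum(counts[c].values())
        score = prior
        for tok in doc:
            if tok in vocab:  # ignore out-of-vocabulary tokens
                score += math.log((counts[c][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Because each token occurrence contributes an independent factor, the score is a sum of per-token log probabilities, which is exactly the multinomial event model's independence assumption.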
- September 23-30
Pedro Domingos and Michael Pazzani. 1997. "On the Optimality of the
Simple Bayesian Classifier under Zero-One Loss." Machine
Learning 29:103--130.
I. Rish, J. Hellerstein, and T. S. Jayram. 2001. "An analysis of
data characteristics that affect naive Bayes performance." IBM
Technical Report RC21993.
- October 14
J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning.
San Mateo: Morgan Kaufmann. Chapters 2 and 4.
David Lewis and Marc Ringuette. 1994. "A comparison of two learning
algorithms for text categorization." In Proceedings of SDAIR-94,
3rd Annual Symposium on Document Analysis and Information Retrieval.
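As a taste of the splitting criterion behind C4.5-style decision trees, this sketch computes entropy and the information gain of a binary word-presence test. (C4.5 itself uses the related gain ratio; the data and names here are toy illustrations.)

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum over classes of p(c) * log2 p(c)."""
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())

def information_gain(docs, labels, word):
    """Expected entropy reduction from splitting the documents on
    whether `word` appears in them (a binary presence test)."""
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = (len(with_w) / n) * entropy(with_w) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - remainder
```

A tree learner greedily picks the test with the highest gain at each node, then recurses on the two resulting subsets.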
- October 21
D. E. Johnson, F. J. Oles, T. Zhang and T. Goetz. 2002. "A
Decision-Tree-Based Symbolic Rule Induction System for Text
Categorization." IBM Systems Journal.
S.M. Weiss, C. Apte, F. Damerau, D.E. Johnson, F.J. Oles, T. Goetz,
and T. Hampp. 1999. "Maximizing text-mining performance." IEEE
Intelligent Systems 14(4):63--69.
- October 28
Robert E. Schapire. 2002. "The boosting approach to machine learning:
An overview." In MSRI Workshop on Nonlinear Estimation and
Classification. Berkeley.
Robert E. Schapire and Yoram Singer. 2000. "BoosTexter: A
boosting-based system for text categorization." Machine Learning.
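The core AdaBoost loop that BoosTexter builds on fits in a few lines. This toy sketch assumes binary labels in {-1, +1} and weak hypotheses (e.g. word-presence stumps) supplied by the caller; all names are illustrative:

```python
import math

def adaboost(X, y, stumps, rounds=3):
    """AdaBoost sketch: X is a list of instances, y a parallel list of
    labels in {-1, +1}, stumps a list of weak hypotheses (x -> +1/-1)."""
    n = len(X)
    w = [1.0 / n] * n                      # distribution over examples
    ensemble = []                          # (alpha, stump) pairs
    for _ in range(rounds):
        # choose the stump with the lowest weighted error under w
        h, err = min(((h, sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi))
                      for h in stumps), key=lambda t: t[1])
        err = max(err, 1e-10)              # guard against log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # up-weight mistakes, down-weight correct examples, renormalize
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the weak hypotheses."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1
```

The re-weighting step is the whole trick: each round forces the next weak learner to concentrate on the examples the ensemble currently gets wrong.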
- November 4
Thomas G. Dietterich and Ghulum Bakiri. 1995. "Solving multiclass
learning problems via error-correcting output codes." Journal of
Artificial Intelligence Research 2:263--286.
Eun Bae Kong and Thomas G. Dietterich. 1995. "Error-correcting
output coding corrects bias and variance." In Proceedings of the
12th International Conference on Machine Learning.
Adam Berger. 1999. "Error-correcting output coding for text
classification." IJCAI'99: Workshop on machine learning for
information filtering. Stockholm, Sweden.
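Decoding in the error-correcting output code framework reduces to a nearest-codeword search over the matrix of class codewords. A sketch, with a made-up code matrix for illustration:

```python
def ecoc_decode(bit_predictions, code_matrix):
    """Pick the class whose codeword is closest in Hamming distance to
    the vector of binary-classifier outputs; if the codewords are far
    apart, this corrects some individual classifier errors."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], bit_predictions))
```

Each column of the code matrix defines one binary learning problem; at test time the trained binary classifiers produce `bit_predictions`, and decoding picks the nearest row.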
- November 11
D. Aha, D. Kibler, and M. Albert. 1991. "Instance based learning
algorithms." Machine Learning 6:37--66.
Walter Daelemans, Antal van den Bosch, and Ton
Weijters. 1996. "IGTree: Using Trees for Compression and
Classification in Lazy Learning Algorithms." In D. Aha (ed.),
Artificial Intelligence Review, 11:407--423.
B. Masand, G. Linoff, and D. Waltz. 1992. "Classifying news stories
using memory based reasoning." In 15th Ann. Int. ACM SIGIR
Conference on Research and Development in Information Retrieval
(SIGIR'92), pages 59--64.
Walter Daelemans, Antal van den Bosch, and Jakub
Zavrel. 1999. "Forgetting exceptions is harmful in language
learning." Machine Learning, 34:11--43.
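Memory-based learning in the Daelemans et al. tradition is essentially k-nearest-neighbor classification with a feature-overlap metric; a minimal sketch on invented symbolic features:

```python
from collections import Counter

def knn_classify(query, memory, k=3):
    """Classify by majority vote among the k stored instances nearest to
    the query under the overlap metric (count of differing feature values).
    memory is a list of (feature_tuple, label) pairs."""
    def overlap_distance(a, b):
        return sum(fa != fb for fa, fb in zip(a, b))
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Note that nothing is abstracted at training time: every instance, exceptions included, stays in memory, which is precisely the point of the "Forgetting exceptions is harmful" paper.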
- November 18
Marti A. Hearst, Bernhard Schölkopf, Susan T. Dumais, Edgar Osuna,
and John Platt. 1998. "Trends and Controversies -- Support Vector
Machines." IEEE Intelligent Systems 13:18--28.
Bernhard Schölkopf and Alexander J. Smola. 2002. Learning
with Kernels: Support Vector Machines, Regularization, Optimization,
and Beyond. MIT Press. Chapter 1 and Appendix B.
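The "kernel trick" at the heart of SVMs is that a kernel computed in the input space equals an inner product in a higher-dimensional feature space. This small sketch verifies the identity (x . z)^2 = <phi(x), phi(z)> for the explicit degree-2 feature map (all names are illustrative):

```python
import itertools

def poly2_kernel(x, z):
    """Quadratic kernel k(x, z) = (x . z)**2, computed in input space."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def phi(x):
    """Explicit degree-2 feature map: all ordered pairwise products x_i * x_j."""
    return [xi * xj for xi, xj in itertools.product(x, repeat=2)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))
```

For d-dimensional input, phi lives in d^2 dimensions, yet the kernel costs only O(d) to evaluate; that gap is what makes kernel methods practical.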
- November 25
Thorsten Joachims. 2001. "A statistical learning model of text
classification with support vector machines." Proceedings of
the Conference on Research and Development in Information Retrieval
(SIGIR'01).
Susan T. Dumais, John Platt, David Heckerman and Mehran
Sahami. 1998. "Inductive learning algorithms and representations for
text categorization." In Proceedings of ACM-CIKM98.
- December 2
Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and
Chris Watkins. 2002. "Text classification using string kernels."
Journal of Machine Learning Research 2:419--444.
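A full subsequence kernel as in Lodhi et al. requires dynamic programming, but the simpler k-spectrum kernel (the inner product of k-gram count vectors) conveys the central idea of comparing strings without building explicit feature vectors. A sketch, not the authors' algorithm:

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """k-spectrum string kernel: the inner product of the two strings'
    k-gram count vectors, accumulated over shared k-grams only."""
    def kgrams(u):
        return Counter(u[i:i + k] for i in range(len(u) - k + 1))
    cs, ct = kgrams(s), kgrams(t)
    return sum(cs[g] * ct[g] for g in cs)
```

The gappy subsequence kernel of the paper generalizes this by also matching non-contiguous character subsequences, with a decay penalty for gaps.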
Jason D. M. Rennie and Ryan Rifkin. 2001. Improving Multiclass
Text Classification with the Support Vector Machine. Massachusetts
Institute of Technology. AI Memo AIM-2001-026.
Yiming Yang and Xin Liu. 1999. "A re-examination of text
categorization methods." In Proceedings of ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR'99).
- December 9
- Probability and information theory slides
- MathWorld - a good site for looking up obscure mathematical jargon
and formulas you may encounter.