Syllabus for Semisupervised Learning for Computational Linguistics: LIN386

Course Information

  • Course: Semisupervised Learning for Computational Linguistics, LIN386
  • Semester: Spring 2009

Instructor Contact Information

  • office hours: Mon/Fri 1:30-3pm, or by appointment
  • office: Calhoun 510
  • phone: 232-7682
  • fax: 471-4340
  • email: jbaldrid@mail.utexas.edu

Prerequisites

Graduate standing. Computational Linguistics II (LIN386), Natural Language Processing (CS388) or consent of instructor.

Syllabus and Text

This page serves as the syllabus for this course.

Official course textbook:

Recommended:

We will also make use of other readings from books, articles and lecture notes, which will be made available on the course website.

Exams and Assignments

Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the Schedule page. Readings and exercises may change up to a week in advance of their due dates.

Philosophy and Goal

The field of computational linguistics has undergone a major shift over the last two decades toward statistical methods. For some tasks, such as language modeling, there is a wealth of data available for training models, but for many tasks, the performance of models is severely limited by the amount of relevant labeled training material. Semisupervised learning seeks to use small amounts of annotated data in combination with (possibly) large amounts of raw text to improve performance over using the annotated data alone.

This class will look at the theory and practice of semisupervised learning in the context of computational linguistics. The main goal will be to provide a high-level view of machine learning methods as applied to computational linguistics tasks, with an aim toward understanding when they should be expected to work well for a task and how to apply them. The focus will be on practical and applied concerns in natural language processing. Even so, what is linguistics if not the search for concise characterizations of linguistic patterns? Unsupervised and semisupervised machine learning seek to learn good models using less human-guided input, and we will consider the possible ramifications for core linguistic concerns, such as acquiring syntactic structure.

Content Overview

The content of the course will be shaped in large part by the make-up of the class participants. Nonetheless, topics that we are likely to cover include:

Machine learning

  • self-training and co-training
  • active learning
  • boundary-oriented methods
  • mixture models
  • expectation maximization
  • Bayesian modeling
  • topic models (e.g. LDA)

Computational linguistics

  • text classification
  • sequence tagging
  • parsing
  • word sense disambiguation
  • coreference
  • use of parallel corpora to reduce supervision

Students will also be required to learn and use R as part of their course projects.

Course Requirements

  • Proposal (10%): Each student will write a proposal for their course project. The proposal should be 3 pages in length and contain at least 6 references, done using LaTeX with the ACL submission style. A scoring rubric will be provided describing the expectations for this write-up. Ten percentage points will be taken off for every page over the limit.
  • Progress report (20%): The progress report is a revision and extension of the proposal that incorporates feedback and progress made on the topic since the proposal. It should be 6 pages in length and contain at least 8 references.
  • Project paper (40%): The final report builds on the progress report and presents the project results and conclusions. It should be 8 pages in length and contain at least 10 references.
  • Presentation (15%): Each student will give a presentation on his or her project in the last week of class.
  • Participation (15%): Students are expected to be present in class, to have completed the readings, and to participate actively in the discussions. Each student will lead one discussion during the course of the semester.

Extension Policy

Homework must be turned in on the due date in order to receive credit. Extensions will be considered on a case-by-case basis and only if the student asks for the extension before the deadline. In most cases they will not be granted.

Points will be deducted for lateness. By default, 10 points (out of 100) will be deducted for lateness, plus an additional 5 points for every 24-hour period beyond 2 that the assignment is late. For example, an assignment due at 11am on Tuesday will have 10 points deducted if it is turned in late but before 11am on Thursday. It will have 15 points deducted if it is turned in by 11am Friday, etc.

Late submissions will not be accepted if they are more than one week past the deadline. No points will be received in this case.
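
The lateness rules above can be read as a small calculation. The following Python sketch is only an illustration of that reading, not an official grading tool: the function name `late_penalty` is hypothetical, and the handling of exact boundaries (a submission exactly 48 hours late counts as two periods, and one exactly a week late is still accepted) is an assumption the policy text does not spell out.

```python
from datetime import datetime
from math import ceil

SECONDS_PER_DAY = 24 * 3600


def late_penalty(due, submitted):
    """Return the points deducted (out of 100) under the lateness policy.

    Assumptions (not stated in the syllabus): partial 24-hour periods are
    rounded up, and a rejected submission is modeled as a 100-point deduction.
    """
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time: no deduction
    if seconds_late > 7 * SECONDS_PER_DAY:
        return 100  # more than one week late: not accepted, no points
    periods = ceil(seconds_late / SECONDS_PER_DAY)
    # 10 points by default, plus 5 for every 24-hour period beyond the second
    return 10 + 5 * max(0, periods - 2)


# The example from the policy: an assignment due at 11am on a Tuesday.
due = datetime(2009, 1, 20, 11, 0)
print(late_penalty(due, datetime(2009, 1, 22, 10, 0)))  # Thursday before 11am
print(late_penalty(due, datetime(2009, 1, 23, 10, 0)))  # Friday before 11am
```

Under these assumptions, the two calls reproduce the 10-point and 15-point deductions given in the example above.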

The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates, but all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, we will work with you to make appropriate arrangements.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

 
Last modified: 2009/01/19 by jbaldrid