Syllabus
Instructor Contact Information
Katrin Erk
office hours: Tue 2-3, Wed 2-4.
office: Calhoun 512
phone: 471-9020
fax: 471-4340
email: katrin.erk AT mail DOT utexas DOT edu
Lab Information
location: Calhoun 514
lab hours: Mon-Fri 9am-4:30pm
lab administrator: Praveen Balasubramanium
lab administrator office hours: Mon 10am-12, Wed 3-5pm.
phone: 471-9025
email: praveen_b AT mail DOT utexas DOT edu
Prerequisites
Graduate standing. Computational Linguistics I or consent of instructor. Experience with at least one programming language.
Syllabus and Text
This page serves as the syllabus for this course.
The official course textbook:
Selected readings from this text will be suggested, along with other readings made available for download or copying.
Jurafsky and Martin will be useful as an extra resource:
Assignments
Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates.
To see your grades go to eGradebook.
Philosophy and Goal
The foremost goal of this course is to expose the student to advanced techniques and applications of natural language processing (NLP), especially those involving statistical approaches. The course will address both theoretical and applied topics.
Some specific goals of the course are to enable students to:
- utilize corpora and annotations added to them
- build statistical NLP components, such as part-of-speech taggers, that learn from such corpora
- evaluate the merits of different machine learning methods for given NLP tasks
- write a project report in the format of submissions to computational linguistics conferences
This course presents an opportunity for students to gain experience with the models and algorithms that underlie practical applications in computational linguistics, while gaining an appreciation for the theoretical questions of the field. It will thus help prepare the student both for jobs in industry and for doing original research in computational linguistics.
Evaluation will be based on the project and homeworks. There will be no exams.
Content Overview
Natural Language Processing (NLP) is concerned with automatically processing human language. Applications include machine translation, search, automatic summarization, and dialog systems. NLP has proved to be a hard task, among other things because of the complexity of the structure of human language, and because of the massive amount of world knowledge that humans use in language understanding.
But there has been significant progress, and growth of the field, in the last ten years. Some of the major factors behind this are the use of statistical techniques, the availability of large (and sometimes annotated) text corpora (including the web itself), and access to relatively cheap, fast computing power.
This course will focus on many of the core technologies and techniques used in computational linguistics (such as language models and part-of-speech taggers) which have proved to be particularly useful in practical applications. It will also consider how statistical methods in computational linguistics can be brought to bear on linguistic theory.
This course provides a broad introduction to applied natural language processing with a particular emphasis on corpus-driven learning. Techniques we will study include
- using corpora
- n-gram language models
- hidden Markov models
- probabilistic classifiers
- experimental methodology in NLP
- clustering techniques
Applications discussed in the course will include
- part-of-speech tagging
- probabilistic parsing
- word sense disambiguation
- machine translation
With respect to content, the goal of this course is to give the student an appreciation for the broad research topics currently being pursued in the field of computational linguistics. By the end of the course, the student should be able to
- identify and discuss the characteristics of different NLP techniques
- implement a naive Bayes classifier
- create features for probabilistic classifiers to model novel NLP tasks
- implement a pipeline of NLP components
The course is designed to include key activities engaged in by computational linguistics researchers, including generation of ideas and programs, critical oral discussion of ideas, and written evaluation and presentation of ideas.
The course is designed to help students make the transition to doing real research in the field. For those students with interest, it could possibly lead to subsequent research opportunities; in fact, it is hoped that several students' course projects will lead to submissions to professional conferences.
Course Requirements
Assignments (15% each):
Project proposal draft (5%):
Project progress report (5%):
Project final report (20%):
Project presentation (10%):
Extension Policy
Homework must be turned in on the due date in order to receive credit. Extensions will be considered on a case-by-case basis and only if the student asks for the extension before the deadline. In most cases they will not be granted.
By default, 5 points (out of 100) will be deducted for lateness, plus an additional 1 point for every 24-hour period beyond 2 that the assignment is late. For example, an assignment due at 2pm on Tuesday will have 5 points deducted if it is turned in late but before 2pm on Thursday. It will have 6 points deducted if it is turned in by 2pm Friday, etc.
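As an illustration only, the default deduction described above can be written as a small function. This is a sketch of one reading of the policy, not an official grading tool: the name `late_penalty`, the constants, and the choice to measure lateness in hours are assumptions made for the example. It also assumes that a submission arriving exactly at a 24-hour boundary counts toward the earlier period, which matches the "by 2pm Friday" wording.

```python
import math

BASE_PENALTY = 5   # points deducted for any late submission
GRACE_PERIODS = 2  # number of 24-hour periods covered by the base penalty


def late_penalty(hours_late: float) -> int:
    """Points (out of 100) deducted for work turned in `hours_late` hours after the deadline."""
    if hours_late <= 0:
        return 0  # on time: nothing is deducted
    # Count every started 24-hour period after the deadline.
    periods = math.ceil(hours_late / 24)
    # One extra point for each period beyond the first two.
    return BASE_PENALTY + max(0, periods - GRACE_PERIODS)
```

For the example in the policy, an assignment due at 2pm on Tuesday and turned in at noon on Thursday is 46 hours late, so `late_penalty(46)` gives 5; turned in at 2pm on Friday it is 72 hours late, and `late_penalty(72)` gives 6.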
The greater the advance notice of a need for an extension, the greater the likelihood of leniency.
Academic Dishonesty Policy
You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.
Notice about students with disabilities
The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 471-6529; 471-4641 TTY. If they certify your needs, we will work with you to make appropriate arrangements.
Notice about missed work due to religious holy days
A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.
