Spring 2009 | Instructor: Jason Baldridge | MW 12-1:30 | PAR 10 | Blackboard

The field of computational linguistics has undergone a major shift over the last two decades toward statistical methods. For some tasks, such as language modeling, there is a wealth of data available for training models, but for many others, model performance is severely limited by the amount of relevant labeled training material. Semisupervised learning seeks to combine small amounts of annotated data with (possibly) large amounts of raw text to improve performance over using the annotated data alone.

This class will look at the theory and methods of semisupervised learning in the context of computational linguistics. The main goal will be to provide a high-level view of machine learning methods as applied to computational linguistics tasks, with an aim toward understanding when they should be expected to work well for a task and how to apply them. The focus will be on practical and applied concerns in natural language processing. Even so, what is linguistics if not the search for concise characterizations of linguistic patterns? Unsupervised and semisupervised machine learning seek to learn good models using less human-guided input, and we will consider the possible ramifications for core linguistic concerns, such as acquiring syntactic structure.
