See the UT Compling Lab project page for the projects I am currently involved in. Here's a brief list:
TeXIT: (Texas) X-lingual Interpretation of Texts. (New York Community Trust, 2008-Present)
EARL: Efficient Annotation of Resources by Learning. (NSF, 2007-Present)
Past projects:
TEXTIME: Temporal Expressions and Time Processing (in Texas). (New York Community Trust, 2006-2008)
DISCOR: Discourse Structure and Coreference Resolution. (NSF, 2006-2008)
OpenCCG Front End: DotCCG specification language and VisCCG GUI editor for OpenCCG grammars. (LAITS, 2006-2007)
-
Computational linguistics usually requires writing a fair amount of code, but a lot of existing software can be used directly or built on for natural language processing tasks. Open source software is particularly appealing because you can modify the source code if you need to. Check out the OpenNLP project for a fairly comprehensive list of open source software for natural language processing. Here are some of the open source packages that I am involved with:
TADM (Toolkit for Advanced Discriminative Modeling): a C++ package for training maximum entropy and perceptron models.
The OpenNLP toolkit: a suite of Java tools for various NLP tasks, including sentence splitting, part-of-speech tagging, and parsing.
OpenCCG: a Java parsing/realization system for Combinatory Categorial Grammar.
OpenNLP Maxent: a Java implementation of Generalized Iterative Scaling for training and using maximum entropy models.