EARL: Efficient Annotation of Resources by Learning

About

The goal of this project is to reduce the annotation effort involved in documenting under-resourced languages by means of machine learning and active learning. The results of this research will allow not only field linguists but also language scholars in general to concentrate on the more pressing issues of data collection and linguistic analysis, while minimizing the time spent on the highly labor-intensive task of linguistic annotation. Building on recent developments in natural language processing and machine learning, especially in active learning and machine translation, our research aims to maximize performance in terms of both coverage and precision while using as little human-annotated material as possible. To test our approach in a real-world setting, we will be directly involved in annotating an actual under-documented language, Q'anjob'al, working closely with native speakers to maximize the robustness and applicability of our results.

People

Principal investigators: Jason Baldridge, Katrin Erk

Research assistants: Alexis Palmer, Taesun Moon

Publications

Resources

Project Page

The project wiki page is accessible here.