Introduction to working with corpora and programming in Python
Instructor: Katrin Erk
Course description
We will study the design, annotation formats, and analysis of text
corpora. Topics to be discussed include: what types of corpora there
are, and what kinds of research questions can be answered using a
corpus; corpus annotation: principles and standards, formats,
examples, and tests for annotation guidelines; tools and methods for
searching and extracting information in corpora; and the basics of
statistical modeling of corpus phenomena, including the selection and
evaluation of models.
The introduction to programming in Python will start with a general
introduction to key concepts of the language. Later, merging the two
topics of the course, we will use Python to access and analyze corpus
data.
Relevant links
Students enrolled in the course can access its Blackboard page.
Martin Wynne (ed.): Developing Linguistic Corpora:
A Guide to Good Practice. This is a useful collection of hands-on advice on corpus collection and annotation.
The Python tutorial,
also available as a PDF document from this page. This
tutorial contains rather condensed information, which may be more
helpful for lookup than for learning Python in the first place. For a
more extensive list of Python documentation sources, take a look at
the Python Documentation Index.
Allen B. Downey, Jeffrey Elkner and Chris Meyers: How to Think
Like a Computer Scientist: Learning with Python. This one is more
of a tutorial in the usual sense, written for learning rather than
lookup. Its only drawback for our purposes is that it is not
specifically geared toward working with text. The first chapter,
which discusses what a computer program does and what it is made of,
stands out in particular.
Office: Calhoun, Room 512.
Office hours: Tuesday 2-3:30 pm, and Wednesday 9:30-11 am.
Phone: 471-9020
Email: katrin dot erk at gmail dot com
This course is a combined introduction to working with text corpora
and to the basics of programming in Python. It is aimed at graduate
students in linguistics who would like to use text corpora in their
research; previous programming experience is not required.
Syllabus
Schedule
Course documents