Grammar for first sentences of Oxford Dictionary of National Biography entries

Original goals

I originally planned to build a grammar for first sentences of biography entries from the Oxford Dictionary of National Biography.

Most of these sentences share the same basic structure: [Lastname], [Firstname] ([year of birth, year of death]), [information about profession or works written], was born [where, when], son/daughter of [information about parents]. There is nonetheless quite a bit of variation in the kinds of information included in each sentence. For example, the grammar would (ideally) handle sentences such as the following:

Gil [Gill], Alexander, the elder (1565–1635), headmaster, was born in Lincolnshire on 27 February 1565.

and

Laud, William (1573–1645), archbishop of Canterbury, was born in Reading on 7 October 1573, the only son of William Laud (d. 1594), a prosperous clothier, and his wife, Lucy, née Webb (d. 1600), the widow of John Robinson, another Reading clothier.

In sentences such as the latter, I wanted the appositive phrases to be attached to the correct nouns (i.e., John Robinson is “another Reading clothier,” not either of the William Lauds).
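To make the shared template concrete, here is a minimal Python sketch that splits a first sentence into its slots with a regular expression. It is only an illustration and not part of the CCG grammar: the pattern, the group names, and the function parse_first_sentence are my own assumptions, and a flat pattern like this leaves the appositive-attachment problem untouched (everything after the birth date lands in one unanalysed "rest" slot).

```python
import re

# Hypothetical pattern for the common template:
#   NAME (BIRTH–DEATH), DESCRIPTION, was born in PLACE [on/in DATE][, REST].
ENTRY = re.compile(
    r"^(?P<name>.+?)\s+\((?P<born>\d{4})[–-](?P<died>\d{4})\),\s+"
    r"(?P<description>.+?),\s+was born in\s+(?P<place>[A-Z][\w ]*?)"
    r"(?:\s+(?:on|in)\s+(?P<date>(?:\d{1,2} )?[A-Z][a-z]+ \d{4}))?"
    r"(?:,\s+(?P<rest>.+?))?\.?$"
)


def parse_first_sentence(sentence):
    """Return a dict of template slots, or None if the sentence does not fit."""
    match = ENTRY.match(sentence.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = ("Gil [Gill], Alexander, the elder (1565–1635), headmaster, "
               "was born in Lincolnshire on 27 February 1565.")
    print(parse_first_sentence(example))
```

Slots that are absent from a sentence (for instance the parental information in the Gil entry) come back as None, which is one way to see how much the sentences vary around the shared skeleton.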

Revised goals for this assignment

I found my original goals a bit too ambitious for this moment in the semester, so I've simplified the kinds of sentences I want to be able to parse. Instead of actual first sentences from DNB entries, I'm looking at a few simple kinds of biographical assertions:

  • John Milton was born in London in 1608
  • John Milton was born in December 1608 in London
  • John Milton was born in London in December
  • in 1693 Eliza Haywood was born
  • Katherine Woodcock married John Milton in 1656
  • John Milton was raised by his parents
  • John Milton was baptized in London
  • Milton was raised in London

I therefore need to be able to discriminate between locative, temporal, and agentive prepositional phrases, and between transitive verbs (“married”), stative passive constructions (“born”), and dynamic passive constructions (“raised” or “baptized”), and to build appropriate semantic representations for each. Temporal prepositional phrases should be able to precede the verb they modify (as in “in 1693 Eliza Haywood was born”), while locative and agentive prepositional phrases should not.

The following sentences should not parse:

  • John Milton was born in his parents
  • in London John Milton was raised
  • in London John Milton was born
  • Milton John was raised in London
  • in 1608 December John Milton was born
  • in December 1608 London bore John Milton
  • Haywood Milton was raised in Kent
  • John Milton was raised in Milton
  • John Milton was raised by December 1608

The grammar is available here: dnb.ccg. I'm starting with a small but expandable lexicon of the first and last names and place names necessary to cover my test sentences.

Problems

There are a number of things that I'm a bit confused about (which is what I get for putting off the assignment until the last minute). I had some trouble handling punctuation, even when separating it from the surrounding words with spaces in the input, and I'm not entirely sure I understand the distinction between families and parts of speech here. I also feel that I'm building far too many families (16), but I was unable to figure out a better way to handle these phenomena.

I had trouble handling the temporal prepositional phrases “in April 1608” and “on 17 April 1608” in a way that made sense to me. In its current form the grammar parses only the former.
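One way to think about the two date forms is that the preposition selects the granularity of the date: "in" takes a month, a year, or a month plus year, while "on" requires a day of the month. The Python sketch below illustrates that idea with regular expressions. It is an assumption about how the distinction could be modelled, not a description of the current grammar, and the names IN_PP, ON_PP, and temporal_pp are hypothetical.

```python
import re

MONTH = (r"(?P<month>January|February|March|April|May|June|July|"
         r"August|September|October|November|December)")

# "in" + month, month + year, or bare year
IN_PP = re.compile(rf"in (?:{MONTH}(?: (?P<year>\d{{4}}))?|(?P<only_year>\d{{4}}))")
# "on" + day + month, optionally followed by a year
ON_PP = re.compile(rf"on (?P<day>\d{{1,2}}) {MONTH}(?: (?P<year>\d{{4}}))?")


def temporal_pp(phrase):
    """Return the date parts of a temporal PP, or None if it is ill-formed."""
    for pattern in (IN_PP, ON_PP):
        match = pattern.fullmatch(phrase)
        if match:
            parts = {k: v for k, v in match.groupdict().items() if v is not None}
            if "only_year" in parts:
                # Normalise the bare-year case to the same key as month + year.
                parts["year"] = parts.pop("only_year")
            return parts
    return None
```

Under this sketch both "in April 1608" and "on 17 April 1608" are accepted, while mismatched combinations such as "in 17 April 1608" or "on April 1608" are rejected.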

Also, it took me a couple of minutes to realize that any digits in the testbed sentences have to be quoted.

 
openccg/grammars/dnb.txt · Last modified: 2007/04/26 01:51 (external edit)
 