The .ccg file for this grammar fragment of Olutec can be found here:
:openccg:grammars:olutec.ccg.txt.
To use it, just remove the .txt extension from the end of the filename, making it olutec.ccg.
Olutec, also known as Oluta Popoluca, is a Mixe-Zoquean language spoken in southern Veracruz in Mexico. All of the content of this grammar fragment is taken from Roberto Zavala's 2001 dissertation entitled Inversion and Other Topics in the Grammar of Olutec (Mixean), from the University of Oregon. At the time his dissertation was completed, there were roughly 20 fluent native speakers of Olutec remaining, all of whom were over 70 years old.
Zavala's dissertation contains a grammatical sketch of Olutec, including discussions of word order, auxiliaries, complement clauses, and lexical categories. It is an extremely useful resource for someone beginning to implement a grammar of the language because of its concise presentation of major syntactic facts. The full dissertation is 981 pages long, so there is clearly a great deal more to the language than the sketch represents, but the sketch makes the topic much more accessible to a beginner.
Olutec has very free word order, with all possible permutations of verbs
and their arguments being attested. This is a phenomenon that lends itself to the use of multisets
in the lexical entries for transitive verbs, but since multisets aren't currently supported in OpenCCG,
you will see two lexical entries for the transitive verb in the .ccg file. In general, however, Olutec
syntax can be handled straightforwardly in CCG, and there are a couple of phenomena that CCG handles
especially well.
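To make the multiset workaround concrete: a category whose arguments form an unordered multiset can be unfolded into one ordinary curried category per ordering of those arguments, which for a two-argument transitive verb gives exactly two entries. The Python sketch below only illustrates that counting argument. It is not OpenCCG code, the function name `expand_multiset` and the labels `np_subj` and `np_obj` are hypothetical, and the two entries in olutec.ccg may well be written differently.

```python
from itertools import permutations


def expand_multiset(result, args):
    """Unfold a multiset category into one curried category per argument order.

    `result` is the result category (e.g. "s"); `args` is a list of distinct
    argument labels. The direction-neutral slash "|" is used throughout.
    """
    entries = []
    for order in permutations(args):
        cat = result
        for i, arg in enumerate(order):
            # Parenthesize once the category is already a function.
            cat = f"{cat}|{arg}" if i == 0 else f"({cat})|{arg}"
        entries.append(cat)
    return entries


# Hypothetical labels for the two arguments of a transitive verb.
transitive_entries = expand_multiset("s", ["np_subj", "np_obj"])
for entry in transitive_entries:
    print(entry)
```

The same unfolding grows factorially with the number of arguments (six entries for a ditransitive), which is one reason native multiset support would be preferable to listing entries by hand.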
Olutec has two very interesting syntactic discontinuities: discontinuous np's and discontinuous coordinate structures. The discontinuous np's all take the form of a verb intervening between a quantifier (or quantifier-like object) and the rest of the noun phrase. If the quantifier is assigned a reasonable syntactic type such as np/np, these discontinuities can be handled by allowing the quantifier to first backward cross-compose with the verb, and then consume the rest of the np via function application. The discontinuous coordinate structures
are also straightforwardly handled by the removal of the star modality that one usually sees on the slashes in the
syntactic category of the coordinator.
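The discontinuous np derivation can be traced step by step. The Python sketch below is a toy model rather than the OpenCCG implementation: it assumes an intransitive verb of type s\np for simplicity, ignores slash modalities and features entirely, and uses hypothetical names (`Fn`, `forward_apply`, `backward_cross_compose`). Backward crossed composition is taken in its standard form, Y/Z  X\Y ⇒ X/Z.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fn:
    """A function category: result, slash ("/" or "\\"), argument."""
    result: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Fn]  # atomic categories are plain strings such as "np" or "s"


def forward_apply(left, right):
    """X/Y  Y  =>  X"""
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_cross_compose(left, right):
    """Y/Z  X\\Y  =>  X/Z"""
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "\\"
            and right.arg == left.result):
        return Fn(right.result, "/", left.arg)
    return None


# Word order: quantifier, verb, rest of the noun phrase.
quantifier = Fn("np", "/", "np")   # np/np, as suggested in the text
verb = Fn("s", "\\", "np")         # s\np, assumed for illustration
rest_of_np = "np"

step1 = backward_cross_compose(quantifier, verb)  # quantifier + verb => s/np
step2 = forward_apply(step1, rest_of_np)          # s/np + np => s
print(step2)
```

In a real multi-modal CCG grammar, crossed composition is only licensed when the slashes involved carry a sufficiently permissive modality, so the slash types chosen for the quantifier and the verb matter in a way this sketch does not capture.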
I made use of two types of unary typechanging rule in developing this grammar fragment. One rule is used to allow pro-drop to take place, and is of the form (s$1 | np) ⇒ s$1, with $-schematization being used to accommodate both subject and object dropping. The other rule is of the form np ⇒ s / (s | np) and allows for reduced relatives (relative clauses without an overt complementizer).
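The effect of the two unary rules on categories can be sketched in the same toy style. The Python below is an illustration under simplifying assumptions, not the rule syntax used in the .ccg file: categories carry no features or modalities, "|" stands for a direction-neutral slash, and the names `Fn`, `target`, `pro_drop`, and `reduced_relative` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fn:
    """A function category: result, slash ("/", "\\" or "|"), argument."""
    result: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Fn]  # atomic categories are plain strings


def target(cat):
    """Return the innermost result category (what the $ schema ranges over)."""
    while isinstance(cat, Fn):
        cat = cat.result
    return cat


def pro_drop(cat):
    """(s$1 | np) => s$1: discharge the outermost np argument of a function into s."""
    if isinstance(cat, Fn) and cat.arg == "np" and target(cat) == "s":
        return cat.result
    return None


def reduced_relative(cat):
    """np => s / (s | np), as the rule is stated in the text."""
    if cat == "np":
        return Fn("s", "/", Fn("s", "|", "np"))
    return None


transitive = Fn(Fn("s", "\\", "np"), "/", "np")  # (s\np)/np, assumed for illustration
after_one_drop = pro_drop(transitive)            # s\np: one argument left unexpressed
after_two_drops = pro_drop(after_one_drop)       # s: both arguments left unexpressed
print(after_two_drops)
```

Because `pro_drop` matches any function into s whose outermost argument is np, a single rule covers both subject and object dropping, which is the point of the $-schematization. It also shows why unary rules like these can multiply parses: they apply wherever their input pattern is found.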
Olutec has complex verbal morphology, though not much in the way of case-marking on nominal arguments, which is somewhat surprising given the very free word order in the language. Verbal agreement is accomplished by way of an ergative/absolutive proclitic, which in general (and always in this fragment) agrees with the subject. In reality, the situation is more complex. Olutec has a three-way ergative alignment pattern and is the first language to have such a pattern attested. One class of proclitics functions as ergative in independent clauses, but absolutive in dependent clauses; one class functions as absolutive in independent clauses, and one functions as ergative in dependent clauses. The sentences I have examined all have only one clitic, so presumably in Olutec the verb overtly agrees with a single argument. In cases that depict, say, a transitive action between two 3rd-person singular arguments, there could well be legitimate ambiguity that would need to be resolved at the level of discourse.
In addition to its interesting ergative alignment pattern, Olutec has complex agglutinative morphology that was
beyond the scope of this grammar fragment. As such, I was not able to make use of expansions during the course of my development, but this is not to say that expansions are not applicable here. If one were to develop a broad-coverage grammar of Olutec, or of any agglutinating language, a strategy would be necessary to handle the realization of the myriad possible inflections for any given stem. This could be done using a finite-state transducer, and expansions present a possible second option that would have the benefit of being implemented within
the .ccg file itself. For the purposes of the fragment here, I have supplied fully inflected verb forms in my lexicon and specified them with the relevant syntactic and semantic features that I could confidently retrieve from the glosses in Zavala (2001).
This grammar fragment currently covers (cursorily, mind you) the following phenomena in Olutec:
* Intransitive and Transitive Sentences
* Permutations in ordering of Verbs and their Arguments
* Pro-drop (subject and object)
* Adverbs
* Bare np's and np's with determiners and/or adjectives
* Discontinuous np's
* Discontinuous coordinate structures
* Subject and Object Relativization, and reduced relatives
* Object Control
* One of two kinds of Subject Control
Subject control constructions still need some attention in this fragment. There are two kinds of subject control constructions in Olutec. The control verb takes another verb as its complement in either case; in one case, the complement verb must be nonfinite, and in the other case it must have the same inflection as the matrix (control) verb. This fragment handles the nonfinite-complement construction, but not the same-inflection one.
I'm a little worried about the possibility of overgeneration with a few of the sentences in the testbed. If you look through, you'll see that some return as many as 12 parses, which could all be legitimate, but working with a limited set of data makes it potentially difficult to refine things with the precision that one might like. Additionally, the use of two unary typechanging rules is partly to blame for the larger number of parses. On the upside, all of the bad sentences in the testbed do fail to parse.
If one were to continue working on this grammar fragment, the first major improvement would be to provide, via one strategy or another, a more thorough treatment of verbal morphology than is seen here. This is not to say that the morphology needs to be handled using CCG, but it would be very useful to be able to start talking about some fully fleshed-out inflectional paradigms for a few verb stems, along with their more detailed semantics and different agreement patterns, where they apply.