This dissertation elaborates several refinements to the Combinatory Categorial Grammar (CCG) framework which are motivated by phenomena in parametrically diverse languages such as English, Dutch, Tagalog, Toba Batak, and Turkish. I present Multi-Modal Combinatory Categorial Grammar, a formulation of CCG which incorporates devices and category constructors from related categorial frameworks, and I demonstrate the effectiveness of these modifications both for providing parsimonious linguistic analyses and for improving the representation of the lexicon and computational processing.
I begin by introducing the various grammar frameworks which set the background for this dissertation and then discuss aspects of providing substantive universals for the theory of CCG. Most importantly, I lay out some of the components necessary for providing a theory of the lexicon, outlining previous approaches and suggesting directions forward. I then turn to a description of the syntactic extraction asymmetries found in English, Tagalog, and Toba Batak and the word order flexibility of Tagalog and Turkish, and discuss previous approaches to handling the data.
Having explicated the foundations and the linguistic motivations for the dissertation, I show how the multi-modal perspective on grammatical composition provided by the logical tradition of categorial grammar can be incorporated into CCG's rule-based approach. The enhanced resource-sensitivity of this perspective allows me to utilize an invariant rule component, controlling the applicability of the combinatory rules via lexical specification rather than with constraints on the rules themselves. This control is shown to be necessary for many aspects of English syntax, and I furthermore demonstrate that the multi-modal approach can improve upon existing CCG analyses for English and Dutch.
The second major development is a redefinition of categories and combinatory rules which relaxes the strict ordering inherent in categories that is normally assumed in categorial grammars. The manner in which this is done permits an intuitive account of local scrambling behavior without increasing the generative power of the system. Bounded long-distance scrambling is handled with the same mechanisms as in CCG: type-raising and crossed composition rules. I furthermore show how the resource-sensitivity of the system effectively limits the permutative possibilities for some constructions in the otherwise quite free grammar of Turkish.
Having thus motivated and developed the Multi-Modal CCG system, I present an account of syntactic extraction asymmetries in Tagalog and Toba Batak, showing how the categories of each language license only a subset of the resource-sensitive combinatory rules and thereby give rise to the observed asymmetries. This leads to a cross-linguistic perspective on the appearance of extraction asymmetries triangulated between English, Tagalog, and Toba Batak. We see that languages with rigid word order, like English and Toba Batak, are forced to restrict permutativity, and this leads naturally to certain arguments being inaccessible for extraction. Tagalog, with its more flexible word order, restricts associativity rather than permutativity, leading to robust asymmetries in a different manner.
Finally, I discuss the implementation of Multi-Modal CCG provided in Grok, highlighting the ways in which the properties of Multi-Modal CCG can be exploited to improve the use of CCG for parsing. In particular, the invariant rule component and modalities of Multi-Modal CCG make it possible to write hard-coded procedures that perform the work of the combinatory rules more efficiently than naive implementations of the rules. Multiset categories also provide a more compact encoding of several rigid categories, and it is demonstrated that a few simple assumptions about unification significantly reduce the potential for them to induce indeterminacy in parsing. I also discuss how the linguistic analyses presented in this dissertation have been encoded as grammars for Grok and improved in the process.
Altogether, this dissertation provides many formal, linguistic, and computational justifications for its central thesis: that an explanatory theory of natural language grammar can be based on a categorial grammar formalism which allows cross-linguistic variation only in the lexicon and has computationally attractive properties.
The dissertation won the 2003 Beth Dissertation Prize from FoLLI.