A new resource and methodological considerations on verb subcategorization biases

Susanne Gahl,1 Douglas Roland2 & Daniel Jurafsky2
1 University of Illinois at Urbana-Champaign; 2 University of Colorado, Boulder

gahl@icsi.berkeley.edu

 

Verb transitivity biases, and verb subcategorization probabilities more generally, play an important role in models of sentence processing.  Unfortunately, counts from different corpora and from psychological norming studies, though generally positively correlated, differ considerably (see, e.g., Merlo, 1994; Lapata et al., 2000), partly because of differences in text genre and verb sense (e.g., Biber, 1988; Roland & Jurafsky, 1998).

To make matters worse, different studies have applied different criteria for transitivity.  In this paper, we describe the results of a new norming study with a coding system that allows us to investigate how coding decisions affect transitivity biases, and we discuss two types of syntactic constructions (adjectival passives and verb+particle constructions) that affect transitivity counts considerably.

The norming study is based on British and American English corpora (Francis & Kucera, 1982; Zeno et al., 1995), hand-labeled by a group of linguistics graduate students.  Our coding scheme distinguishes 15 different patterns, including passives and sentential complements.  For each of 300 verbs, between 100 and 200 occurrences were coded.  One result of the project is a detailed labelers' manual discussing many of the complications surrounding subcategorization counts.

Both "absolute" and "relative" criteria have been used in describing verb transitivity biases.  By the "absolute" criterion, a verb is considered highly transitive if the proportion of transitive uses exceeds some pre-determined cut-off point, say 50%.  By the "relative" method, the proportion of transitive uses needs to exceed that of some alternative pattern, say sentential complements.  Studies comparing different corpora have generally applied some version of the "absolute" method, comparing the percentages of, for example, transitive uses of a particular verb in different corpora.  However, most experimental studies of the behavioral effects of verb biases have relied on the "relative" method.  One conclusion emerging from our cross-corpus comparisons is that agreement among different corpora is considerably higher if the "relative" method is used.
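The difference between the two criteria can be made concrete with a minimal Python sketch.  The pattern labels, the function names, and all counts below are invented for illustration and are not taken from the norming study; the 50% cut-off and the choice of sentential complements as the alternative pattern follow the examples given above.

```python
from typing import Dict


def transitive_by_absolute(counts: Dict[str, int], cutoff: float = 0.5) -> bool:
    """'Absolute' criterion: the share of transitive uses exceeds a fixed cut-off."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no coded occurrences for this verb")
    return counts.get("transitive", 0) / total > cutoff


def transitive_by_relative(
    counts: Dict[str, int], alternative: str = "sentential_complement"
) -> bool:
    """'Relative' criterion: transitive uses outnumber one alternative pattern."""
    return counts.get("transitive", 0) > counts.get(alternative, 0)


# Invented counts for one hypothetical verb in two hypothetical corpora.
corpus_a = {"transitive": 40, "sentential_complement": 25, "other": 35}
corpus_b = {"transitive": 55, "sentential_complement": 30, "other": 15}

for name, counts in (("A", corpus_a), ("B", corpus_b)):
    print(name, transitive_by_absolute(counts), transitive_by_relative(counts))
```

With these invented counts the two corpora disagree under the "absolute" criterion (40% versus 55% transitive) but agree under the "relative" one, since transitive uses outnumber sentential complements in both.  This only illustrates how such a pattern can arise; it is not evidence for the cross-corpus result itself.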

Adjectival passives (or "pseudo-passives" and "semi-passives", cf. Quirk et al., 1985) superficially resemble true passives, but their syntactic and aspectual properties are those of adjectives, not verbs (cf. the examples in (1) below).  Adjectival passives are frequently counted as transitive verb occurrences in available norming studies (e.g., Lalami, 2000; Lapata et al., to appear), partly because form-based automatic extraction methods are unable to distinguish adjectival and true passives.  Analysis of our data shows that adjectival passives account for an average of 8.1% of verb occurrences, and as much as 85% of the "transitive" occurrences of verbs like "locate" and "delight".  This affects the overall counts considerably: For example, the transitivity biases of 16 out of the 59 "Psych-verbs" (Levin, 1993) in our data set change from "high" to "low" or "mid" if adjectival passives are excluded.  Similarly, verb+particle combinations (e.g., "look it up") make up an average of 35% of "transitive" verb occurrences and have been variously counted as transitive or non-transitive in previous studies.  How these forms are actually processed in human parsing may be unclear, but their treatment in estimating verb biases significantly affects the databases underlying sentence processing research.
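How much a single coding decision can move a transitivity estimate is easy to see in a small Python sketch.  The category labels and the counts are invented for illustration (they are not figures from the norming study), and the sketch assumes that excluded adjectival passives stay in the denominator as verb occurrences; a study could equally drop them from the total.

```python
from typing import Dict


def transitive_share(counts: Dict[str, int], adjectival_as_transitive: bool) -> float:
    """Proportion of occurrences counted as transitive under one coding decision.

    Assumption: the denominator is always the full set of coded occurrences,
    whether or not adjectival passives are counted as transitive.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no coded occurrences for this verb")
    transitive = counts["active_transitive"] + counts["true_passive"]
    if adjectival_as_transitive:
        transitive += counts["adjectival_passive"]
    return transitive / total


# Invented counts for one hypothetical verb.
verb_counts = {
    "active_transitive": 10,
    "true_passive": 5,
    "adjectival_passive": 85,
    "intransitive": 50,
}

share_with = transitive_share(verb_counts, adjectival_as_transitive=True)
share_without = transitive_share(verb_counts, adjectival_as_transitive=False)
print(f"counting adjectival passives:  {share_with:.2f}")
print(f"excluding adjectival passives: {share_without:.2f}")
```

For this hypothetical verb, the share of "transitive" occurrences falls from about two thirds to one tenth once adjectival passives are no longer counted, which is the kind of shift that can move a verb out of a "high" transitivity class.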

This analysis sheds further light on the sources of differences among available corpus norms.  The norming study offers a valuable resource for research on sentence processing.

(1) a. Adjectival Passive
The cloakrooms were located in the basement and hard to find.
b. True Passive
The missing children were finally located.

 

References

Biber, D. (1988).  Variation Across Speech and Writing.  Cambridge: Cambridge University Press.

Francis, W. & Kucera, H. (1982).  Frequency Analysis of English Usage: Lexicon and Grammar.  Boston: Houghton Mifflin.

Lalami, L. (1997).  Frequency in Sentence Comprehension.  University of Southern California doctoral dissertation.

Lapata, M., Keller, F., & Schulte im Walde, S. (to appear).  Verb frame frequency as a predictor of verb bias.  Journal of Psycholinguistic Research.

Levin, B. (1993).  English Verb Classes and Alternations.  Chicago: University of Chicago Press.

Merlo, P. (1994).  A corpus-based analysis of verb continuation frequencies for syntactic processing.  Journal of Psycholinguistic Research, 23(6), 435-457.

Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985).  A Comprehensive Grammar of the English Language.  London: Longman.

Roland, D. & Jurafsky, D. (1998).  How verb subcategorization frequencies are affected by corpus choice.  Proceedings of COLING-ACL 1998, pp. 1117-1121.

Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995).  The Educator's Word Frequency Guide.  Touchstone Applied Science Associates, Inc.