Sabine Schulte im Walde (Universität Stuttgart)
Distributional models assume that the contexts of a linguistic unit (such as a word, a multi-word expression, a phrase or a sentence) provide information about the meaning of that unit (Harris, 1954; Firth, 1957). They have been widely applied in data-intensive lexical semantics, among other areas, and have proven successful for diverse research questions, such as the representation and disambiguation of word senses, the modelling of selectional preferences, and the compositionality of compounds and phrases; they have also served as a general framework across semantic tasks.
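To make the distributional assumption concrete, here is a minimal sketch. It is not one of the models used in the case studies. It assumes simple symmetric window-based co-occurrence counts over a tiny hypothetical corpus, and it compares words by the cosine of their count vectors. The function names `context_vectors` and `cosine`, as well as the window size of 2, are illustrative choices.

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=2):
    """For each word, count the words occurring within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target token itself
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, already tokenized
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
vecs = context_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # → 0.873
```

Because *cat* and *dog* share contexts such as *the* and *drinks*, they receive a high cosine. Realistic models would use far larger corpora and typically reweight raw counts, for example with pointwise mutual information.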
While distributional knowledge clearly does not cover all the cognitive knowledge that humans possess about word meaning (Marconi, 1997; Lenci, 2008), distributional models are very attractive, because their underlying parameters can be obtained even from corpus data with only low-level annotation. We are therefore interested in maximizing the benefit of distributional information for lexical semantics by exploring the meaning and the potential of comparatively simple distributional models.
In this respect, this talk will present four case studies on semantic relatedness tasks that demonstrate both the potential and the limits of distributional models: (i) the availability of various German association norms in standard web and newspaper corpora; (ii) the prediction of compositionality for German multi-word expressions; (iii) the distinction between the paradigmatic relations synonymy, antonymy and hypernymy for German nouns, verbs and adjectives; and (iv) the integration of distributional semantic information into a statistical machine translation (SMT) system, together with its evaluation.
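For case study (ii), one common way to operationalize compositionality with distributional vectors is to measure how similar a compound's vector is to the vectors of its constituents. The more similar they are, the more compositional the compound is predicted to be. This approach is assumed here for illustration and is not necessarily the method presented in the talk. The sketch uses invented toy counts for *Ahornblatt* ('maple leaf') and its constituents *Ahorn* and *Blatt*. The function `compositionality_scores` and its weight `alpha` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def compositionality_scores(compound, modifier, head, alpha=0.5):
    """Return (compound-modifier similarity, compound-head similarity,
    weighted combination). `alpha` is a hypothetical weight on the modifier."""
    s_mod = cosine(compound, modifier)
    s_head = cosine(compound, head)
    return s_mod, s_head, alpha * s_mod + (1 - alpha) * s_head

# Invented toy context counts (illustrative only, not corpus data)
ahornblatt = {"Baum": 4, "grün": 3, "Herbst": 2}
ahorn = {"Baum": 5, "Holz": 2, "Herbst": 1}
blatt = {"Baum": 3, "grün": 4, "Papier": 2}

s_mod, s_head, combined = compositionality_scores(ahornblatt, ahorn, blatt)
print(f"modifier={s_mod:.3f} head={s_head:.3f} combined={combined:.3f}")
```

In an evaluation, such predicted scores would be compared against human compositionality ratings, for instance by rank correlation.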