
The 13th Workshop on Multiword Expressions (MWE 2017)

Workshop at EACL 2017 (Valencia, Spain), April 4, 2017

Endorsed by the Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics and PARSEME, the European IC1207 COST Action

Last updated: January 31, 2017

Exploiting multilingual lexical resources to predict the compositionality of MWEs

by Paul Cook – University of New Brunswick

Paul Cook is an Assistant Professor at the University of New Brunswick in Canada. He received his PhD in Computer Science from the University of Toronto in 2010. Prior to joining the University of New Brunswick, he was a McKenzie Postdoctoral Fellow at the University of Melbourne from 2011 to 2014. Paul's research interests include lexical semantics, multiword expressions (MWEs), social media text processing, and web corpus construction and analysis. His research on MWEs has focused primarily on compositionality prediction and token-level MWE identification.


Semantic idiomaticity is the extent to which the meaning of a multiword expression (MWE) cannot be predicted from the meanings of its component words. Much work in natural language processing on semantic idiomaticity has focused on compositionality prediction, wherein a binary or continuous-valued compositionality score is predicted for an MWE as a whole, or for its individual component words. One source of information for making compositionality predictions is the translation of an MWE into other languages. In this talk we will consider methods for predicting compositionality that exploit translation information provided by multilingual lexical resources, and that are applicable to many kinds of MWEs in a wide range of languages. These methods make use of the distributional similarity of an MWE and its component words under translation into many languages, as well as string similarity measures applied to definitions of translations of an MWE and its component words. Experimental results over English verb-particle constructions, and English and German noun compounds, will highlight the importance of token-level identification of MWEs in type-level compositionality prediction. We will conclude by considering limitations of compositionality scores. In particular, such scores on their own do not indicate which of the possible meanings of a component word is contributed when an MWE is at least partially compositional. We will discuss this issue through a case study on English verb-particle constructions.
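
As a rough illustration of the string-similarity idea mentioned in the abstract, the sketch below scores each component word of an MWE by how similar its translation is to the translation of the whole MWE, averaged over languages. It is a minimal sketch under stated assumptions, not the method presented in the talk: the names `TOY_TRANSLATIONS`, `string_similarity`, and `component_scores` are hypothetical, the tiny translation table is hand-written for illustration rather than drawn from a real multilingual lexical resource, and the similarity is computed directly over translations (not over their definitions) using Python's `difflib`.

```python
from difflib import SequenceMatcher
from statistics import mean

# Assumption: a toy, hand-written translation table standing in for a
# multilingual lexical resource. Keys are language codes.
TOY_TRANSLATIONS = {
    "de": {"give up": "aufgeben", "give": "geben", "up": "auf"},
    "nl": {"give up": "opgeven", "give": "geven", "up": "op"},
}


def string_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def component_scores(mwe, components, translations):
    """For each component word, average the string similarity between the
    translation of the MWE and the translation of that component over all
    languages that list both. Returns None for a component with no data."""
    scores = {}
    for component in components:
        per_language = [
            string_similarity(entries[mwe], entries[component])
            for entries in translations.values()
            if mwe in entries and component in entries
        ]
        scores[component] = mean(per_language) if per_language else None
    return scores


if __name__ == "__main__":
    # Higher values suggest the component is more visibly preserved in the
    # translations of the whole expression (illustrative only).
    print(component_scores("give up", ["give", "up"], TOY_TRANSLATIONS))
```

A score of this kind is a type-level signal only. As the abstract notes, it says nothing about which sense of a component word is contributed to a partially compositional MWE.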