
The 10th Workshop on Multiword Expressions (MWE 2014)

Workshop at EACL 2014 (Gothenburg, Sweden), April 26-27, 2014

Endorsed by the Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX), SIGLEX's Multiword Expressions Section (SIGLEX-MWE), and PARSEME, European IC1207 COST Action.

Last updated: Mar 26, 2014

The Web as an Implicit Training Set: Application to Noun Compounds Syntax and Semantics

by Preslav Nakov – Qatar Computing Research Institute

Preslav Nakov is a Senior Scientist at the Qatar Computing Research Institute (QCRI). He received his Ph.D. in Computer Science from the University of California at Berkeley in 2007 (supported by a Fulbright grant and a UC Berkeley fellowship). Before joining QCRI, Preslav was a Research Fellow at the National University of Singapore. He has also spent a few months at the Bulgarian Academy of Sciences and at Sofia University, where he was an honorary lecturer. Preslav's research interests include lexical semantics (in particular, multi-word expressions, noun compounds syntax and semantics, and semantic relation extraction), machine translation, Web as a corpus, and biomedical text processing.

Preslav has been involved in many activities related to lexical semantics, with a focus on multi-word expressions. He is a member of the SIGLEX board, he is co-chairing SemEval 2014 and SemEval 2015, and he has co-organized several SemEval tasks, e.g., on the semantics of noun compounds and on semantic relation extraction. He has co-chaired previous editions of MWE (in 2009 and 2010) and of other semantics workshops such as RELMS, and he was an area chair of *SEM 2013. He was also a guest editor for the 2013 special issue of the journal Natural Language Engineering on the syntax and semantics of noun compounds. In 2013, he published a book on semantic relation extraction and gave a tutorial on the same topic at RANLP 2013.


The 60-year-old dream of computational linguistics is to make computers capable of communicating with humans in natural language. This has proven hard, and thus research has focused on sub-problems. Even so, the field was stuck with manual rules until the early 90s, when computers became powerful enough to enable the rise of statistical approaches. Eventually, this shifted the main research attention to machine learning from text corpora, thus triggering a revolution in the field.

Today, the Web is the biggest available corpus, providing access to quadrillions of words; and, in corpus-based natural language processing, size does matter. Unfortunately, while there has been substantial research on the Web as a corpus, it has typically been restricted to using page hit counts as an estimate for n-gram word frequencies; this has led some researchers to conclude that the Web should only be used as a baseline.

In this talk, we will reveal some of the hidden potential of the Web that lies beyond the n-gram, with a focus on the syntax and semantics of English noun compounds. First, we will present a highly accurate, lightly supervised approach based on surface markers and linguistically motivated paraphrases that yields state-of-the-art results for noun compound bracketing: e.g., “[[liver cell] antibody]” is left-bracketed, while “[liver [cell line]]” is right-bracketed. Second, we will present a simple unsupervised method for mining implicit predicates that can characterize the semantic relations holding between the nouns in noun compounds, e.g., “malaria mosquito” is a “mosquito that carries/spreads/causes/transmits/brings/infects with/… malaria”. Finally, we will show how these ideas can be used to improve statistical machine translation.
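
To make the bracketing decision concrete, here is a minimal Python sketch. It is a simplified stand-in, not the surface-marker and paraphrase approach described in the abstract: it only compares how often the first two nouns and the last two nouns occur as adjacent pairs. The names `bracket` and `toy_count` are hypothetical, and the counts are invented purely for illustration; they are not real Web frequencies.

```python
from typing import Callable, Tuple


def bracket(compound: Tuple[str, str, str],
            count: Callable[[str, str], int]) -> str:
    """Bracket a three-noun compound by comparing adjacent bigram counts.

    Assumption: a plain adjacency comparison, used here only as a
    simplified illustration of the left/right bracketing decision.
    """
    w1, w2, w3 = compound
    left_score = count(w1, w2)   # evidence for [[w1 w2] w3]
    right_score = count(w2, w3)  # evidence for [w1 [w2 w3]]
    if left_score >= right_score:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


# Hypothetical toy counts, invented for this example only.
TOY_COUNTS = {
    ("liver", "cell"): 120,
    ("cell", "antibody"): 15,
    ("cell", "line"): 900,
}


def toy_count(first: str, second: str) -> int:
    """Look up a toy bigram count; unseen pairs count as zero."""
    return TOY_COUNTS.get((first, second), 0)


if __name__ == "__main__":
    print(bracket(("liver", "cell", "antibody"), toy_count))
    print(bracket(("liver", "cell", "line"), toy_count))
```

With these invented counts, the first compound comes out left-bracketed and the second right-bracketed, matching the two examples in the abstract. In practice, `count` would be replaced by whatever frequency source is available.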

Statistical Modelling of Metaphor

by Ekaterina Shutova – ICSI & University of California, Berkeley

Ekaterina Shutova is a Research Scientist at the International Computer Science Institute (ICSI) and the Institute for Cognitive and Brain Sciences (ICBS) at the University of California, Berkeley, USA. Her research is in the area of Natural Language Processing, with a specific focus on metaphor and human creativity and on their computational and cognitive modeling. She is currently leading the new Metaphor Extraction research team at ICSI, whose goal is to create robust and accurate tools that identify metaphorical expressions in unrestricted text using statistical methods. Previously, she was a Research Associate at DTAL and the Computer Laboratory, University of Cambridge, UK, where she worked on issues in computational lexical semantics. Ekaterina received her PhD in Computer Science from the University of Cambridge in 2011; her doctoral dissertation concerned computational modeling of figurative language.


Besides making our thoughts more vivid and filling our communication with richer imagery, metaphor plays a fundamental structural role in our cognition, helping us organise and project knowledge. For example, when we say “a well-oiled political machine”, we view the concept of a political system in terms of a mechanism and transfer inferences from the domain of mechanisms onto our reasoning about political processes. Highly frequent in text, metaphorical language represents a significant challenge for natural language processing (NLP) systems; and large-scale, robust and accurate metaphor processing tools are needed to improve the overall quality of semantic interpretation in today's language technology. In this talk, I will introduce statistical models of metaphor identification and interpretation and discuss how statistical techniques can be applied to identify patterns of metaphor use in linguistic data and to generalize its higher-level mechanisms from text.