Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Workshop proposal for COLING 2018 (Santa Fe, USA) or ACL 2018 (Melbourne, Australia)

Endorsed by the Special Interest Group for Annotation (SIGANN), the Special Interest Group on the Lexicon (SIGLEX), and the Special Interest Group on Computational Semantics (SIGSEM) of the Association for Computational Linguistics (ACL)

Last updated: November 9, 2017


This workshop proposal addresses, within a joint event, three domains - linguistic annotation, multiword expressions and grammatical constructions - with partly overlapping communities and research interests, but relatively divergent practices and terminologies.

Linguistic annotation of natural language corpora is the backbone of supervised methods for statistical natural language processing. It also provides valuable data for the evaluation of both rule-based and supervised systems, and can help formalize and study linguistic phenomena. Challenges posed by the creation and evaluation of annotation schemes, automatic and manual annotation, the use and evaluation of annotation software and frameworks, and the representation of linguistic data and annotations have been addressed for the last decade within the Linguistic Annotation Workshop (LAW), organised yearly by SIGANN.

The domain of multiword expressions (MWEs) is orthogonal to linguistic annotation since it addresses one particular linguistic phenomenon across various NLP modelling and processing layers or practices (including annotation). MWEs are word combinations, such as "all of a sudden", "a hot dog", "to pay a visit" or "to pull one's leg", which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies. They encompass closely related linguistic objects such as idioms, compounds, light verb constructions, rhetorical figures, institutionalised phrases or collocations. Modelling and computational aspects of MWEs have been covered by the Multiword Expression Workshop, organised over the past years by the MWE section of SIGLEX. Due to their unpredictable behavior, and most prominently their non-compositional semantics, MWEs pose special problems in linguistic modelling (e.g. treebank annotation and grammar engineering), in NLP pipelines (e.g. when their orchestration with parsing is concerned), and in end-use applications (e.g. information extraction or machine translation).

These challenges are magnified when larger classes of idiosyncratic units are considered, namely grammatical constructions, i.e. conventional associations of lexical, syntactic, and pragmatic information, such as the-Adj-more-Adj ("the more the merrier", "the higher the better", etc.). In the framework of Construction Grammar (CxG), linguistic knowledge is captured in an inventory of form-meaning pairings of varying degrees of internal complexity and lexical fixedness. Thus, MWEs can be seen as special types of constructions: those in which constraints of a lexical nature are particularly strong. Bringing MWE and construction studies together promises new insights in both directions. On the one hand, computational approaches to MWEs usually take binary decisions about units of language (MWE vs. non-MWE), thereby neglecting the fact that MWEs occupy a "continuum of compositionality". Construction-oriented modelling, by contrast, paves the way towards a more nuanced representation of MWE idiosyncrasies. On the other hand, most grammatical constructions display considerable flexibility, so their discovery and description are highly complex and labor-intensive. This process could be greatly facilitated if recent computational achievements for MWEs were extended to constructions.

Annotation of grammatical constructions in training data could improve machine translation and information extraction, especially cross-lingually, as meanings that are similar across languages (like comparison) can be expressed in drastically different forms. However, annotation of constructions poses significant challenges: because constructions are form-meaning pairs that can be more or less fluid in form, determining the annotation units for a construction is not straightforward. As a result, strategies for choosing annotation units may vary greatly among annotators and projects depending on a range of factors, from practical concerns (intended use, processing constraints) to concerns imposed by an underlying theory. Annotation of grammatical constructions is therefore an area that offers rich opportunities for identifying principled annotation strategies, accommodating different perspectives on a given phenomenon, and finding ways to allow for harmonization of annotations not only from different sources, but also at different linguistic levels.

For the above reasons, grammatical constructions were chosen as a joint focus of interest by both the MWE and the LAW communities. We will call for papers focusing on research related (but not limited) to the following topics.

Special Track: PARSEME Shared Task on Automatic Verbal MWE Identification

We propose that LAW-MWE-CxG-2018 host edition 1.1 of the PARSEME shared task on automatic verbal MWE identification (see below). This initiative will be a follow-up to edition 1.0 in 2017, which attracted 7 systems working on 18 languages in total. In 2018, we will extend the scope to new languages (23 in total), which should attract a larger number of systems. A separate session will be allocated to the shared task track within the workshop, featuring presentations of the participating systems. Authors may submit papers either to the special track or to the regular workshop. They should follow common submission instructions, based on those of the main conference.

Submission modalities

We envision the following submission formats for regular research papers, presented at the sessions of the three regular tracks (LAW-MWE-CxG, LAW, or MWE):

The shared task track would feature:

There is no limit on the number of reference pages. Authors will be granted an extra page for the final version of their papers.

For the three regular tracks, submission will be double-blind. The reported research should be substantially original. Papers available as preprints can also be submitted provided that they fulfil the conditions defined by the new ACL Policies for Submission, Review and Citation. The papers will be presented orally or as posters.

The shared task system description papers will go through a separate reviewing process. As in SemEval, submissions will be double-blind and will be reviewed by the shared task organizers and participants according to the schedule below. The selected papers will be presented as posters or oral presentations, depending on the length allotted to the workshop (1 or 2 days). Participants of the shared task are not required to submit system description papers, and acceptance of a submitted paper depends on its quality rather than on the results obtained in the shared task.

