Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Workshop at COLING 2018 (Santa Fe, USA), August 25-26, 2018
Organized and sponsored by the Special Interest Group for Annotation (SIGANN) and the Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL). Endorsed also by the Special Interest Group on Computational Semantics (SIGSEM).
This joint event brings together:
- The 12th Linguistic Annotation Workshop (LAW XII), and
- The 14th Workshop on Multiword Expressions (MWE).
Last updated: Aug 13, 2018
- NEW! [Aug 13]: The LAW-MWE-CxG proceedings are available on the ACL Anthology.
- NEW! [May 24]: COLING 2018 has changed its registration policy. Registration for a workshop only is now possible. See the COLING home page for details.
- NEW! [May 23]: The submission deadline has been extended until Sunday, May 27. We have also extended the page limits for all papers by 1 page, to be consistent with the COLING policy.
- NEW! [May 22]: Remote presentations of papers will be possible in case of visa denial. In order to qualify for this service, the author will need to prove that he or she was denied a visa, i.e. provide the following: (i) the embassy location where the application was made, (ii) the DS160 case number (this is given when the applicant has filed and paid for a formal application). The US State Department website will be used to check whether the visa was actually refused.
- NEW! [May 18]: COLING 2018 has published its registration fees. Note that registration for the main conference is mandatory - it is not possible to register for the workshop only. This policy is now being discussed among the conference and workshop organizers. More details soon.
Description
This workshop addresses, within a joint event, three domains - linguistic annotation, multiword expressions and grammatical constructions - with partly overlapping communities and research interests, but relatively divergent practices and terminologies.
Linguistic annotation of natural language corpora is the backbone of supervised methods for statistical natural language processing. It also provides valuable data for evaluation of both rule-based and supervised systems and can help formalize and study linguistic phenomena. Challenges posed by creation/evaluation of annotation schemes, automatic and manual annotation, use and evaluation of annotation software and frameworks, or representation of linguistic data and annotations, have been addressed for the last decade within the Linguistic Annotation Workshop (LAW), organised yearly by SIGANN.
The domain of multiword expressions (MWEs) is orthogonal to linguistic annotation since it addresses one particular linguistic phenomenon across various NLP modelling and processing layers or practices (including annotation). MWEs are word combinations, such as all of a sudden, a hot dog, to pay a visit or to pull one's leg, which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies. They encompass closely related linguistic objects such as idioms, compounds, light verb constructions, rhetorical figures, institutionalised phrases or collocations. Modelling and computational aspects of MWEs have been covered by the Multiword Expression Workshop, organised over the past years by the MWE section of SIGLEX. Due to their unpredictable behavior, and most prominently their non-compositional semantics, MWEs pose special problems in linguistic modelling (e.g. treebank annotation and grammar engineering), in NLP pipelines (e.g. when their orchestration with parsing is concerned), and in end-use applications (e.g. information extraction or machine translation).
These challenges are magnified when larger classes of idiosyncratic units are considered, namely grammatical constructions, i.e. conventional associations of lexical, syntactic, and pragmatic information, such as the-Adj-more-Adj (the more the merrier, the higher the better, etc.). In the framework of Construction Grammar (CxG), linguistic knowledge is captured in an inventory of form-meaning pairings of varying degrees of internal complexity and lexical fixedness. Thus, MWEs can be seen as special types of constructions: those in which constraints of a lexical nature are particularly strong. The potential new insights to be gained from bringing MWE and construction studies together are mutual. On the one hand, computational approaches to MWEs usually take binary decisions about units of language (MWE vs. non-MWE), i.e. the fact that MWEs occupy a "continuum of compositionality" is neglected. Construction-oriented modelling, conversely, paves the way towards a more nuanced representation of MWE idiosyncrasies. On the other hand, most grammatical constructions display considerable flexibility; their discovery and description is therefore a highly complex and labor-intensive process. This process might be greatly facilitated if recent computational achievements for MWEs could be extended to constructions.
Annotation of grammatical constructions in training data could improve machine translation and information extraction, especially cross-lingually, as meanings that are similar across languages (like comparison) can be expressed in drastically different forms. However, annotation of constructions poses significant challenges: because constructions are form-meaning pairs that can be more or less fluid in form, determining the annotation units for a construction is not straightforward. As a result, strategies for choosing annotation units may vary greatly among annotators and projects depending on a range of factors, from practical concerns (intended use, processing constraints) to concerns imposed by an underlying theory. Annotation of grammatical constructions is therefore an area that offers rich opportunities for identifying principled annotation strategies, accommodating different perspectives on a given phenomenon, and finding ways to allow for harmonization of annotations not only from different sources, but also at different linguistic levels.
For the above reasons, grammatical constructions were chosen as a joint focus of interest by both the MWE and the LAW communities. We call for papers focusing on research related (but not limited) to the following topics.
Joint topics on constructions, annotation, and MWEs
- MWE and construction annotation in corpora and treebanks
- MWE and construction representation in manually and automatically constructed lexical resources
- Extending MWE discovery and identification methods to constructions
- MWEs and constructions (and their annotations) in language acquisition and in non-standard language (e.g. tweets, forums, spontaneous speech)
- Evaluation of MWE and construction annotation and processing techniques
- Computationally-applicable theoretical studies on MWEs and constructions in psycholinguistics, corpus linguistics and grammar formalisms, and/or how such studies can impact annotation of constructions
Annotation-specific topics
- Annotation procedures, whether manual or automatic, including machine learning and knowledge-based methods
- Maintenance and interactive exploration of annotation structures and annotated data
- Qualitative and quantitative annotation evaluation
- Linguistic considerations, representation formats and exploration tools for merged annotations of different phenomena
- Standards, best practices, documentation, interoperability, and comparison of annotation schemes
- Development, evaluation and innovative use of annotation software frameworks
MWE-specific topics
- Original MWE discovery and identification methods
- MWE processing in syntactic and semantic frameworks (e.g. HPSG, LFG, TAG, universal dependencies, WSD, semantic parsing), and in end-user applications (e.g. summarization, machine translation)
Special Track: PARSEME Shared Task on Automatic Verbal MWE Identification
The LAW-MWE-CxG-2018 workshop hosts edition 1.1 of the PARSEME shared task on automatic verbal MWE identification (see below). This initiative is a follow-up of edition 1.0 in 2017, which attracted 7 systems working on 18 languages in total. In 2018, we extend the scope to new languages. A separate session will be allocated for the shared task track within the workshop, featuring presentations of the participating systems.
Submission modalities
Note that we have extended the page limits for all papers by 1 page, to be consistent with the COLING policy.
Regular research track:
- Long papers (9 content pages + references): They should report on solid and finished research including new experimental results, resources and/or techniques.
- Short papers (5 content pages + references): They should report on small experiments, focused contributions, ongoing research, negative results and/or philosophical discussion.
In regular research papers, the reported research should be substantially original. Papers available as preprints can also be submitted provided that they fulfil the conditions defined by the ACL Policies for Submission, Review and Citation.
Shared task track:
- System description papers (5 content pages + references): These papers should briefly describe the approach implemented to solve the problem. They may include references and links to more detailed descriptions in other documents.
Shared task system description papers will go through a separate reviewing process. Submissions will be reviewed by the shared task organizers and participants. Participants of the shared task are not required to submit system description papers, and their acceptance depends on the quality of the paper rather than on the results obtained in the shared task.
Instructions for authors:
For all 3 types of papers, the submission is double-blind as per the COLING guidelines. There is no limit on the number of reference pages. Authors will be granted an extra page for the final version of their papers.
All papers will be presented orally or as posters, as determined by the Program Committee chairs. No distinction between papers presented orally or as posters is made in the workshop proceedings.
For all types of submission, the COLING 2018 LaTeX templates should be used. All papers should be submitted via the START space:
https://www.softconf.com/coling2018/ws-LAW-MWE-CxG-2018/
Please choose the appropriate track (research/shared task) and submission modality (long/short).
Important dates
All deadlines are at 23:59 UTC-12 (anywhere in the world).
Workshop Organizers
- Nancy Ide, Vassar College (USA)
- Adam Meyers, New York University (USA)
- Carlos Ramisch, Aix Marseille University (France)
- Agata Savary, Université François Rabelais Tours (France)
Program Committee Chairs
- Jena Hwang, Institute for Human and Machine Cognition (USA)
- Miriam R L Petruck, ICSI (USA)
- Sameer Pradhan, cemantix.org and Vassar College, New York (USA)
- Carlos Ramisch, Aix Marseille University (France)
- Agata Savary, Université François Rabelais Tours (France)
- Nathan Schneider, Georgetown University (USA)
Publication Chairs
- Melanie Andresen, Hamburg University (Germany)
- Agata Savary, Université François Rabelais Tours (France)
Publicity Chairs
- Adam Meyers, New York University (USA)
- Agata Savary, Université François Rabelais Tours (France)
Contact
For any inquiries regarding the workshop please send an email to lawmwecxg2018@gmail.com
Anti-harassment policy
The workshop supports the ACL anti-harassment policy.