Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Workshop at COLING 2018 (Santa Fe, USA), August 25-26, 2018
Organized and funded by the Special Interest Group for Annotation (SIGANN), the Special Interest Group on the Lexicon (SIGLEX), and the Special Interest Group on Computational Semantics (SIGSEM) of the Association for Computational Linguistics (ACL)
Last updated: July 16, 2018
Keynote speakers
Annotation Schemes for Surface Construction Labeling
Lori Levin – Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Lori Levin received a B.A. in linguistics from the University of Pennsylvania in 1979 and a Ph.D. in linguistics from MIT in 1986. She is a Research Professor at the Language Technologies Institute at Carnegie Mellon University, specializing in language technologies for low-resource languages. She is also co-Chair of the North American Computational Linguistics Olympiad.
Abstract
In this talk I will describe the interaction of linguistics and language technologies in Surface Construction Labeling (SCL) from the perspective of corpus annotation tasks such as definiteness, modality, and causality. Linguistically, following Construction Grammar, SCL recognizes that meaning may be carried by morphemes, words, or arbitrary constellations of morpho-lexical elements. SCL is like Shallow Semantic Parsing in that it does not attempt a full compositional analysis of meaning, but rather identifies only the main elements of a semantic frame, where the frames may be invoked by constructions as well as lexical items. Computationally, SCL is different from tasks such as information extraction in that it deals only with meanings that are expressed in a conventional, grammaticalized way and does not address inferred meanings. I will review the work of Dunietz (2018) on the labeling of causal frames, including causal connectives and cause and effect arguments. I will describe how to design an annotation scheme for SCL, including isolating basic units of form and meaning and building a "constructicon". I will conclude with remarks about the nature of universal categories and universal meaning representations in language technologies. This talk describes joint work with Jaime Carbonell, Jesse Dunietz, Nathan Schneider, and Miriam Petruck.
Reference: Dunietz, Jesse (2018) Annotating and Automatically Tagging Constructions of Causal Language. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
From Lexical Functional Grammar to Enhanced Universal Dependencies
Adam Przepiórkowski – Institute of Philosophy, University of Warsaw, and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland (joint work with Agnieszka Patejuk)
As a computational and corpus linguist, Adam Przepiórkowski has led NLP projects resulting in the development of various tools and resources for Polish, including the National Corpus of Polish and tools for its manual and automatic annotation, and has worked on topics ranging from deep and shallow syntactic parsing to corpus search engines and valency dictionaries. As a theoretical linguist, he has worked on case assignment (mainly within Head-driven Phrase Structure Grammar) and the morphosyntax of Polish, and, more recently, on topics including the syntax and semantics of the argument/adjunct dichotomy, distributivity and coordination (within Lexical Functional Grammar and Glue Semantics).
Abstract
The aim of this talk is twofold. First, I will present the process and the result of an automatic conversion of a corpus of Polish annotated with constituent and functional LFG structures to the Universal Dependencies standard, making heavy use of enhanced dependencies. Second, this conversion exercise will provide the basis for a more general discussion of the strengths and limitations of the current version of UD. (This is joint work with Agnieszka Patejuk.)
Reference: Agnieszka Patejuk, Adam Przepiórkowski (2018) From Lexical Functional Grammar to Enhanced Universal Dependencies. Linguistically informed treebanks of Polish. Institute of Computer Science, Polish Academy of Sciences, Warszawa.
Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses
Nathan Schneider – Georgetown University, USA
Nathan Schneider is an annotation schemer and computational modeler for natural language. As Assistant Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage semantic analysis: designing linguistic meaning representations, annotating them in corpora, and automating them with statistical natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.
Abstract
I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types. I argue that a lexicon-free lexical semantics—defined in terms of units and supersense tags—is an appetizing direction for NLP, as it is robust, cost-effective, easily understood, not too language-specific, and can serve as a foundation for richer semantic structure. Linguistic delicacies from the STREUSLE and DiMSUM corpora, which have been multiword- and supersense-annotated, attest to the veritable smörgåsbord of noncanonical constructions in English, including various flavors of prepositions, MWEs, and other curiosities.


