The 9th Workshop on Multiword Expressions (MWE 2013)
Workshop at NAACL 2013 (Atlanta, Georgia, USA), June 13/14, 2013
Endorsed by the Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX)
Last updated: Apr 11, 2013
The Far Reach of Multiword Expressions in Educational Technology
by Jill Burstein - Educational Testing Service
Jill Burstein is a managing principal research scientist in the Research & Development division at Educational Testing Service in Princeton, New Jersey. Her background and expertise are in computational linguistics, with a focus on educational applications for writing, reading, and teacher professional development. She holds 13 patents for educational technology inventions. Jill’s inventions include e-rater®, an automated essay scoring and evaluation system. In more recent work, she has leveraged natural language processing to develop Language Muse℠, a teacher professional development application that helps teachers develop language-based instruction to support English learners' content understanding and language skills development. She received her B.A. in Linguistics and Spanish from New York University, and her M.A. and Ph.D. in Linguistics from the Graduate Center, City University of New York.
Multiword expressions, in the form of nominal compounds, collocations, and idioms, are now leveraged in educational technology for both assessment and instruction. The talk will focus on how multiword expression identification is used in different kinds of educational applications, including automated essay evaluation and teacher professional development in curriculum development for English language learners. It will also describe recent approaches to resolving polarity for noun-noun compounds in a sentiment system being designed to evaluate argumentation (sentiment) in test-taker writing (Beigman Klebanov, Burstein, and Madnani, to appear).
Beigman Klebanov, B., Burstein, J., and Madnani, N. (to appear). Sentiment Profiles of Multi-Word Expressions in Test-Taker Essays: The Case of Noun-Noun Compounds. ACM Transactions on Speech and Language Processing, Special Issue on Multiword Expressions: From Theory to Practice (Eds. V. Kordoni, C. Ramisch, and A. Villavicencio).
Modelling the internal variability of MWEs
by Malvina Nissim - University of Bologna
Malvina Nissim is a tenured researcher in computational linguistics at the University of Bologna. Her research focuses on the computational handling of several lexical semantics and discourse phenomena, such as the choice of referring expressions, semantic relations within compounds and in argument structure, multiword expressions, and, more recently, on the annotation and automatic detection of modality. She is also a co-founder and promoter of the Senso Comune project, devoted to the creation of a common knowledge base for Italian via crowdsourcing. She graduated in Linguistics from the University of Pisa, and obtained her PhD in Linguistics from the University of Pavia. Before joining the University of Bologna she was a post-doc at the University of Edinburgh and at the Institute for Cognitive Science and Technology in Rome.
The flexibility of multiword expressions (MWEs) is crucial to identifying and extracting them in running text, as well as to understanding them better from a linguistic perspective. If we project a large MWE lexicon onto a corpus, matching only fixed forms suffers from low recall, while an unconstrained flexible search for lemmas loses precision. In this talk, I will describe a method aimed at maximising precision in the identification of MWEs in flexible mode, building on the idea that internal variability can be modelled via so-called variation patterns. I will discuss the advantages and limitations of using variation patterns, compare their performance to that of association measures, and explore their usability in MWE extraction as well.
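The recall/precision trade-off described in the abstract can be illustrated with a minimal sketch. It is not the method presented in the talk and does not implement variation patterns; it only contrasts fixed-form matching with an unconstrained lemma search. The toy lemma table, the function names, and the `max_gap` parameter are all assumptions made for illustration.

```python
# Hypothetical toy lemma table standing in for a real lemmatizer.
LEMMAS = {"kicked": "kick", "kicks": "kick", "buckets": "bucket"}


def lemmatize(tokens):
    """Lowercase each token and map it to its lemma if the toy table knows it."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def fixed_match(mwe, tokens):
    """True if the MWE's citation form occurs as a contiguous surface string."""
    words = mwe.split()
    lowered = [t.lower() for t in tokens]
    n = len(words)
    return any(lowered[i:i + n] == words for i in range(len(lowered) - n + 1))


def flexible_match(mwe, tokens, max_gap=3):
    """True if the MWE's lemmas occur in order, with at most max_gap
    intervening tokens between consecutive components."""
    target = lemmatize(mwe.split())
    lemmas = lemmatize(tokens)

    def search(start, k):
        if k == len(target):
            return True
        # The first component may occur anywhere; later components must
        # occur within max_gap tokens of the previous one.
        end = len(lemmas) if k == 0 else min(len(lemmas), start + max_gap + 1)
        return any(
            lemmas[i] == target[k] and search(i + 1, k + 1)
            for i in range(start, end)
        )

    return search(0, 0)


mwe = "kick the bucket"
inflected = "He kicked the bucket last year".split()
literal = "She kicked the ball and the bucket fell".split()

# Fixed-form projection misses the inflected idiom (a recall error).
print(fixed_match(mwe, inflected), flexible_match(mwe, inflected))
# Flexible lemma search also fires on a literal sentence (a precision error).
print(fixed_match(mwe, literal), flexible_match(mwe, literal))
```

In this sketch the only constraint on the flexible search is the gap size. Constraining which internal variations are admissible for a given expression is the role the abstract assigns to variation patterns.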
Complex Predicates are Multi-word Expressions
by Martha Palmer - University of Colorado at Boulder
Martha Palmer is a Professor of Linguistics and Computer Science, and a Fellow of the Institute of Cognitive Science at the University of Colorado. Her current research is aimed at building domain-independent and language-independent techniques for semantic interpretation based on linguistically annotated data, such as Proposition Banks. She has been the PI on NSF, NIH and DARPA projects for linguistic annotation (syntax, semantics and pragmatics) of English, Chinese, Korean, Arabic and Hindi. She has been a member of the Advisory Committee for the DARPA TIDES program, Chair of SIGLEX, Chair of SIGHAN, and President of the Association for Computational Linguistics. She is a Co-Editor of JNLE and of LiLT, and is on the CL Editorial Board. She received her Ph.D. in Artificial Intelligence from the University of Edinburgh in 1985.
Practitioners of English Natural Language Processing often feel fortunate because their tokens are clearly marked by spaces on either side. However, the spaces can be quite deceptive, since they ignore the boundaries of multi-word expressions, such as noun-noun compounds, verb-particle constructions, light verb constructions and constructions from Construction Grammar, e.g., caused-motion constructions and resultatives. Correctly identifying and handling these types of expressions can be quite challenging, even from the viewpoint of manual annotation. This talk will review the pervasive nature of these constructions, touching on Arabic and Hindi as well as English. Several illustrative examples from newswire and medical informatics will be used to describe current best practices for annotation and automatic identification, with an emphasis on contributions from predicate-argument structures.