Shared task on automatic identification of verbal multiword expressions - edition 1.1
Organized as part of the LAW-MWE-CxG 2018 workshop co-located with COLING 2018 (Santa Fe, USA), August 25-26, 2018
Last updated: August 16, 2018
- NEW! The PARSEME corpus edition 1.1 (used in this shared task) is now available via the CLARIN/LINDAT infrastructure.
- NEW! System results are now available. Congratulations to all participants! (May 11)
- NEW! The gold test data for all 20 languages is now available (May 11)
- Good news: DEADLINE EXTENDED UNTIL MAY 08 for the submission of system results! (May 3)
- The blind test data for all 20 languages is now available (April 30)
- We have released a new version of the evaluation script and a new script to calculate macro-averages across languages. (April 23)
- An extra corpus for Arabic is now available through LDC, see the instructions (April 22)
- The full training/development data for all 19 languages (including Basque and Hebrew) is now available (April 12)
- The training/development data for 17 languages is now available (April 5).
- Trial data, the definition of the .cupt format, the description of evaluation measures and the evaluation script are now available.
Description
The second edition of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) aims at identifying verbal MWEs in running texts. Verbal MWEs include, among others, idioms (to let the cat out of the bag), light verb constructions (to make a decision), verb-particle constructions (to give up), multi-verb constructions (to make do) and inherently reflexive verbs (se suicider 'to commit suicide' in French). Their identification is a well-known challenge for NLP applications, due to their complex characteristics, including discontinuity, non-compositionality, heterogeneity and syntactic variability.
The shared task is highly multilingual: PARSEME members have elaborated annotation guidelines based on annotation experiments in about 20 languages from several language families. These guidelines take both universal and language-specific phenomena into account. We hope that this will boost the development of language-independent and cross-lingual VMWE identification systems.
Participation policies
The evaluation phase of the shared task is now over, but you can find useful information about the shared task on this page.
Participation is open and free worldwide. We ask potential participant teams to register using the expression of interest form. Task updates and questions will be posted to our public mailing list. More details on the annotated corpora can be found on a dedicated PARSEME page. See also the annotation guidelines used in the manual annotation of the training/development and test sets, as well as the description of the evaluation measures and the evaluation script.
Note that a large international community has gathered (via the PARSEME network) around the effort of putting forward universal guidelines and performing corpus annotations. Our policy was to allow the national teams which provided annotated corpora to also submit VMWE identification systems to the shared task. While this policy is non-standard and introduces a bias into system evaluation, we follow it for several reasons:
- For many languages there are only very few NLP teams, so adopting an exclusive approach (either you annotate or you present a system but not both) would actually exclude the whole language from participation.
- We are more interested in cross-language discussion than in actual competition.
- We trust the teams to respect best practices, including the following:
- The test data are never used for training/development, even if system authors have access to them in advance.
- If any resources were used to annotate the corpus, the same resources should not be used by the system (in the open track).
- If system authors notice other sources of bias between their annotating activity and system evaluation, they should describe them in the submitted papers (if any).
Submission of results and system description paper
Shared task participants are invited to submit input of two kinds to the SHARED TASK TRACK of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG 2018):
- System results (by May 8, extended from May 4) obtained on the blind test data (released on April 30). The results for all languages should be submitted in a single .zip archive containing one folder per language, named with the ISO 639-1 code (e.g. FR/ for French, SL/ for Slovene). Each output file must be named test.system.cupt and conform to the .cupt format. Before submission, the format of each file should be checked with the validation script as follows:
- ./validate_cupt.py --input test.system.cupt
- A system description paper (by May 25). These papers must follow the LAW-MWE-CxG workshop submission instructions and will go through double-blind peer reviewing by other participants and selected LAW-MWE-CxG 2018 Program Committee members. Their acceptance depends on the quality of the paper rather than on the results obtained in the shared task. Authors of the accepted papers will present their work as posters/demos in a dedicated session of the workshop. The submission of a system description paper is not mandatory.
The submission of system results and of a system description paper should be made via the dedicated START space:
https://www.softconf.com/coling2018/ws-LAW-MWE-CxG-2018
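Submitted test.system.cupt files must follow the .cupt format, which extends CoNLL-U with an eleventh PARSEME:MWE column. The official validate_cupt.py performs the full check; purely as an illustration of the file's structure, a minimal sanity check (a hypothetical check_cupt helper, not the official script) might look like this:

```python
# Hypothetical minimal structural check for a .cupt file (illustration only;
# use the official validate_cupt.py for real validation).
# A .cupt file extends CoNLL-U: token lines carry 11 tab-separated columns
# (the 10 CoNLL-U columns plus PARSEME:MWE); comment lines start with "#"
# and sentences are separated by blank lines.

def check_cupt(path):
    """Return a list of (line_number, message) pairs for malformed token lines."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        for no, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # blank sentence separators and comment/metadata lines
            cols = line.split("\t")
            if len(cols) != 11:
                problems.append((no, f"expected 11 columns, got {len(cols)}"))
    return problems
```

This only counts columns; the official script additionally checks the contents of each column and the consistency of MWE annotations.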
Provided data
The PARSEME corpus edition 1.1 (used in this shared task) is available via the CLARIN/LINDAT infrastructure.
The shared task covers 20 languages: Arabic (AR), Bulgarian (BG), German (DE), Greek (EL), English (EN), Spanish (ES), Basque (EU), Farsi (FA), French (FR), Hindi (HI), Hebrew (HE), Croatian (HR), Hungarian (HU), Italian (IT), Lithuanian (LT), Polish (PL), Brazilian Portuguese (PT), Romanian (RO), Slovenian (SL), Turkish (TR).
For each language, we provide corpora (in the .cupt format) in which VMWEs are annotated according to universal guidelines:
- Manually annotated training corpora made available to the participants in advance, in order to allow them to train their systems.
- Manually annotated development corpora also made available in advance so as to tune/optimize the systems' parameters.
- Raw (unannotated) test corpora to be used as input to the systems during the evaluation phase. The VMWE annotations in these corpora were kept secret during the evaluation phase.
For most languages, morphosyntactic data (parts of speech, lemmas, morphological features and/or syntactic dependencies) are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe).
Our public GitLab repository contains:
- trial data in .cupt format in English
- training/development data in .cupt format in all participating languages
- the evaluation script to calculate per-language evaluation scores, and a script to calculate macro-average scores across languages
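The evaluation measures are documented separately; as a rough sketch only (not the official evaluation script, which also reports token-based and per-category scores), MWE-based precision, recall and F1, together with the macro-average across languages, can be computed like this:

```python
# Illustrative sketch of MWE-based exact-match scoring and the cross-language
# macro-average (a simplification for exposition, not the official script).
# Each VMWE is represented as a frozenset of its token positions.

def prf(gold, pred):
    """Exact-match precision, recall and F1 over sets of VMWEs."""
    tp = len(gold & pred)  # predicted VMWEs matching a gold VMWE exactly
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_average(per_language_scores):
    """Macro-average: unweighted mean of per-language F1 scores."""
    return sum(per_language_scores) / len(per_language_scores)
```

Because the macro-average is unweighted, each language contributes equally regardless of corpus size.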
The table below summarizes the sizes of the training/development/test corpora per language:
| Lang-split | Sentences | Tokens | Avg. length | VMWE | VID | IRV | LVC.full | LVC.cause | VPC.full | VPC.semi | IAV | MVC | LS.ICV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AR-train | 2370 | 231030 | 97.4 | 3219 | 1272 | 17 | 940 | 0 | 957 | 0 | 0 | 33 | 0 |
| AR-dev | 387 | 16252 | 41.9 | 500 | 17 | 0 | 419 | 0 | 64 | 0 | 0 | 0 | 0 |
| AR-test | 380 | 17962 | 47.2 | 500 | 31 | 0 | 410 | 0 | 59 | 0 | 0 | 0 | 0 |
| AR-Total | 3137 | 265244 | 84.5 | 4219 | 1320 | 17 | 1769 | 0 | 1080 | 0 | 0 | 33 | 0 |
| BG-train | 17813 | 399173 | 22.4 | 5364 | 1005 | 2729 | 1421 | 135 | 0 | 0 | 74 | 0 | 0 |
| BG-dev | 1954 | 42020 | 21.5 | 670 | 173 | 240 | 214 | 35 | 0 | 0 | 8 | 0 | 0 |
| BG-test | 1832 | 39220 | 21.4 | 670 | 82 | 254 | 274 | 52 | 0 | 0 | 8 | 0 | 0 |
| BG-Total | 21599 | 480413 | 22.2 | 6704 | 1260 | 3223 | 1909 | 222 | 0 | 0 | 90 | 0 | 0 |
| DE-train | 6734 | 130588 | 19.3 | 2820 | 977 | 220 | 218 | 28 | 1264 | 113 | 0 | 0 | 0 |
| DE-dev | 1184 | 22146 | 18.7 | 503 | 181 | 48 | 34 | 2 | 221 | 17 | 0 | 0 | 0 |
| DE-test | 1078 | 20559 | 19 | 500 | 183 | 40 | 42 | 2 | 210 | 23 | 0 | 0 | 0 |
| DE-Total | 8996 | 173293 | 19.2 | 3823 | 1341 | 308 | 294 | 32 | 1695 | 153 | 0 | 0 | 0 |
| EL-train | 4427 | 122458 | 27.6 | 1404 | 395 | 0 | 938 | 44 | 19 | 0 | 0 | 8 | 0 |
| EL-dev | 2562 | 66431 | 25.9 | 500 | 81 | 0 | 376 | 34 | 8 | 0 | 0 | 1 | 0 |
| EL-test | 1261 | 35873 | 28.4 | 501 | 169 | 0 | 308 | 11 | 11 | 0 | 0 | 2 | 0 |
| EL-Total | 8250 | 224762 | 27.2 | 2405 | 645 | 0 | 1622 | 89 | 38 | 0 | 0 | 11 | 0 |
| EN-train | 3471 | 53201 | 15.3 | 331 | 60 | 0 | 78 | 7 | 151 | 19 | 16 | 0 | 0 |
| EN-test | 3965 | 71002 | 17.9 | 501 | 79 | 0 | 166 | 36 | 146 | 26 | 44 | 4 | 0 |
| EN-Total | 7436 | 124203 | 16.7 | 832 | 139 | 0 | 244 | 43 | 297 | 45 | 60 | 4 | 0 |
| ES-train | 2771 | 96521 | 34.8 | 1739 | 167 | 479 | 223 | 36 | 0 | 0 | 360 | 474 | 0 |
| ES-dev | 698 | 26220 | 37.5 | 500 | 65 | 114 | 84 | 17 | 0 | 0 | 87 | 133 | 0 |
| ES-test | 2046 | 59623 | 29.1 | 500 | 95 | 121 | 85 | 28 | 1 | 0 | 64 | 106 | 0 |
| ES-Total | 5515 | 182364 | 33 | 2739 | 327 | 714 | 392 | 81 | 1 | 0 | 511 | 713 | 0 |
| EU-train | 8254 | 117165 | 14.1 | 2823 | 597 | 0 | 2074 | 152 | 0 | 0 | 0 | 0 | 0 |
| EU-dev | 1500 | 21604 | 14.4 | 500 | 104 | 0 | 382 | 14 | 0 | 0 | 0 | 0 | 0 |
| EU-test | 1404 | 19038 | 13.5 | 500 | 73 | 0 | 410 | 17 | 0 | 0 | 0 | 0 | 0 |
| EU-Total | 11158 | 157807 | 14.1 | 3823 | 774 | 0 | 2866 | 183 | 0 | 0 | 0 | 0 | 0 |
| FA-train | 2784 | 45153 | 16.2 | 2451 | 17 | 1 | 2433 | 0 | 0 | 0 | 0 | 0 | 0 |
| FA-dev | 474 | 8923 | 18.8 | 501 | 0 | 0 | 501 | 0 | 0 | 0 | 0 | 0 | 0 |
| FA-test | 359 | 7492 | 20.8 | 501 | 0 | 0 | 501 | 0 | 0 | 0 | 0 | 0 | 0 |
| FA-Total | 3617 | 61568 | 17 | 3453 | 17 | 1 | 3435 | 0 | 0 | 0 | 0 | 0 | 0 |
| FR-train | 17225 | 432389 | 25.1 | 4550 | 1746 | 1247 | 1470 | 68 | 0 | 0 | 0 | 19 | 0 |
| FR-dev | 2236 | 56254 | 25.1 | 629 | 207 | 154 | 252 | 15 | 0 | 0 | 0 | 1 | 0 |
| FR-test | 1606 | 39489 | 24.5 | 498 | 212 | 108 | 160 | 14 | 0 | 0 | 0 | 4 | 0 |
| FR-Total | 21067 | 528132 | 25 | 5677 | 2165 | 1509 | 1882 | 97 | 0 | 0 | 0 | 24 | 0 |
| HE-train | 12106 | 237472 | 19.6 | 1236 | 519 | 0 | 545 | 113 | 59 | 0 | 0 | 0 | 0 |
| HE-dev | 3385 | 65843 | 19.4 | 501 | 258 | 0 | 148 | 61 | 34 | 0 | 0 | 0 | 0 |
| HE-test | 3209 | 65698 | 20.4 | 502 | 182 | 0 | 211 | 49 | 60 | 0 | 0 | 0 | 0 |
| HE-Total | 18700 | 369013 | 19.7 | 2239 | 959 | 0 | 904 | 223 | 153 | 0 | 0 | 0 | 0 |
| HI-train | 856 | 17850 | 20.8 | 534 | 23 | 0 | 321 | 14 | 0 | 0 | 0 | 176 | 0 |
| HI-test | 828 | 17580 | 21.2 | 500 | 38 | 0 | 320 | 12 | 0 | 0 | 0 | 130 | 0 |
| HI-Total | 1684 | 35430 | 21 | 1034 | 61 | 0 | 641 | 26 | 0 | 0 | 0 | 306 | 0 |
| HR-train | 2295 | 53486 | 23.3 | 1450 | 113 | 468 | 303 | 45 | 0 | 0 | 521 | 0 | 0 |
| HR-dev | 834 | 19621 | 23.5 | 500 | 34 | 139 | 143 | 26 | 1 | 0 | 157 | 0 | 0 |
| HR-test | 708 | 16429 | 23.2 | 501 | 33 | 118 | 131 | 31 | 0 | 0 | 188 | 0 | 0 |
| HR-Total | 3837 | 89536 | 23.3 | 2451 | 180 | 725 | 577 | 102 | 1 | 0 | 866 | 0 | 0 |
| HU-train | 4803 | 120013 | 24.9 | 6205 | 84 | 0 | 892 | 363 | 4131 | 735 | 0 | 0 | 0 |
| HU-dev | 601 | 15564 | 25.8 | 779 | 10 | 0 | 85 | 10 | 539 | 135 | 0 | 0 | 0 |
| HU-test | 755 | 20759 | 27.4 | 776 | 10 | 0 | 166 | 28 | 486 | 86 | 0 | 0 | 0 |
| HU-Total | 6159 | 156336 | 25.3 | 7760 | 104 | 0 | 1143 | 401 | 5156 | 956 | 0 | 0 | 0 |
| IT-train | 13555 | 360883 | 26.6 | 3254 | 1098 | 942 | 544 | 147 | 66 | 0 | 414 | 23 | 20 |
| IT-dev | 917 | 32613 | 35.5 | 500 | 197 | 106 | 100 | 19 | 17 | 2 | 44 | 6 | 9 |
| IT-test | 1256 | 37293 | 29.6 | 503 | 201 | 96 | 104 | 25 | 23 | 0 | 41 | 5 | 8 |
| IT-Total | 15728 | 430789 | 27.3 | 4257 | 1496 | 1144 | 748 | 191 | 106 | 2 | 499 | 34 | 37 |
| LT-train | 4895 | 90110 | 18.4 | 312 | 106 | 0 | 195 | 11 | 0 | 0 | 0 | 0 | 0 |
| LT-test | 6209 | 118402 | 19 | 500 | 202 | 0 | 284 | 14 | 0 | 0 | 0 | 0 | 0 |
| LT-Total | 11104 | 208512 | 18.7 | 812 | 308 | 0 | 479 | 25 | 0 | 0 | 0 | 0 | 0 |
| PL-train | 13058 | 220465 | 16.8 | 4122 | 373 | 1785 | 1531 | 180 | 0 | 0 | 253 | 0 | 0 |
| PL-dev | 1763 | 26030 | 14.7 | 515 | 57 | 245 | 153 | 33 | 0 | 0 | 27 | 0 | 0 |
| PL-test | 1300 | 27823 | 21.4 | 515 | 73 | 249 | 149 | 15 | 0 | 0 | 29 | 0 | 0 |
| PL-Total | 16121 | 274318 | 17 | 5152 | 503 | 2279 | 1833 | 228 | 0 | 0 | 309 | 0 | 0 |
| PT-train | 22017 | 506773 | 23 | 4430 | 882 | 689 | 2775 | 84 | 0 | 0 | 0 | 0 | 0 |
| PT-dev | 3117 | 68581 | 22 | 553 | 130 | 83 | 337 | 3 | 0 | 0 | 0 | 0 | 0 |
| PT-test | 2770 | 62648 | 22.6 | 553 | 118 | 91 | 337 | 7 | 0 | 0 | 0 | 0 | 0 |
| PT-Total | 27904 | 638002 | 22.8 | 5536 | 1130 | 863 | 3449 | 94 | 0 | 0 | 0 | 0 | 0 |
| RO-train | 42704 | 781968 | 18.3 | 4713 | 1269 | 3048 | 250 | 146 | 0 | 0 | 0 | 0 | 0 |
| RO-dev | 7065 | 118658 | 16.7 | 589 | 169 | 373 | 29 | 18 | 0 | 0 | 0 | 0 | 0 |
| RO-test | 6934 | 114997 | 16.5 | 589 | 173 | 363 | 34 | 19 | 0 | 0 | 0 | 0 | 0 |
| RO-Total | 56703 | 1015623 | 17.9 | 5891 | 1611 | 3784 | 313 | 183 | 0 | 0 | 0 | 0 | 0 |
| SL-train | 9567 | 201853 | 21 | 2378 | 500 | 1162 | 176 | 40 | 0 | 0 | 500 | 0 | 0 |
| SL-dev | 1950 | 38146 | 19.5 | 500 | 121 | 224 | 30 | 12 | 0 | 0 | 113 | 0 | 0 |
| SL-test | 1994 | 40523 | 20.3 | 500 | 106 | 245 | 35 | 13 | 0 | 0 | 101 | 0 | 0 |
| SL-Total | 13511 | 280522 | 20.7 | 3378 | 727 | 1631 | 241 | 65 | 0 | 0 | 714 | 0 | 0 |
| TR-train | 16715 | 334880 | 20 | 6125 | 3172 | 0 | 2952 | 0 | 0 | 0 | 0 | 1 | 0 |
| TR-dev | 1320 | 27196 | 20.6 | 510 | 285 | 0 | 225 | 0 | 0 | 0 | 0 | 0 | 0 |
| TR-test | 577 | 14388 | 24.9 | 506 | 233 | 0 | 272 | 0 | 0 | 0 | 0 | 1 | 0 |
| TR-Total | 18612 | 376464 | 20.2 | 7141 | 3690 | 0 | 3449 | 0 | 0 | 0 | 0 | 2 | 0 |
| Total | 280838 | 6072331 | 21.6 | 79326 | 18757 | 16198 | 28190 | 2285 | 8527 | 1156 | 3049 | 1127 | 37 |
The training, development and test data are available in our public GitLab repository. Follow the Repository link to access folders for individual languages.
All VMWE annotations (except Arabic) are available under Creative Commons licenses (see README.md files for details).
The Arabic corpus does not have an open license. Participants are required to fill in an agreement and obtain the corpus through LDC. Since it is a late addition, Arabic is considered optional this year: we will publish generic and per-category rankings for teams that address Arabic, but it will not be included in the macro-average rankings across languages.
A note for shared task participants: We cannot ensure that the test data of the current edition of the shared task do not overlap with the data published in edition 1.0. Therefore, we kindly ask participants not to use the .parsemetsv files from edition 1.0 for any language during the training or testing phase.
Tracks
System results can be submitted in two tracks:
- Closed track: Systems using only the provided training/development data in the .cupt files (VMWE annotations plus morphosyntactic data, if any) to learn VMWE identification models and/or rules.
- Open track: Systems using the provided training/development data or not, plus any additional resources deemed useful (MWE lexicons, symbolic grammars, wordnets, raw corpora, word embeddings, language models trained on external data, etc.). This track notably includes purely symbolic and rule-based systems.
Teams submitting systems in the open track will be requested to describe and provide references to all resources used at submission time. Teams are encouraged to favor freely available resources for better reproducibility of their results.
Important dates
All deadlines are at 23:59 UTC-12 (anywhere in the world).
- March 21, 2018: shared task trial data and evaluation script released
- April 4, 2018: shared task training and development data released
- April 30, 2018: shared task blind test data released
- May 8, 2018 (extended from May 4): submission of system results
- May 11, 2018: announcement of results
- May 25, 2018: submission of system description papers
- June 20, 2018: notification of acceptance
- June 30, 2018: camera-ready papers due
- August 25-26, 2018: shared task workshop co-located with LAW-MWE-CxG-2018