Legacy page
Please go to our new website, https://multiword.org/, for up-to-date information.
Data sets
The following resources are included in the MWE_resources package. The complete package can be downloaded from the Multiword project site.
| File name | Description | Contributor | Link | Updated |
| Chinese_IR_Xu | Chinese queries and relevant document annotation for IR | Ying Xu | Wiki | Sep 2010 |
| Czech_Bigram_Pecina | Czech Dependency Bigrams from the Prague Dependency Treebank | Pavel Pecina | Wiki | Feb 2008 |
| English_CE_Trawinski | CWen: 77 cranberry words of English; CCen: corresponding cranberry collocations | Beata Trawinski | Wiki | Feb 2008 |
| English_CN_Baldwin | A 1,000-sentence corpus including 345 English compound nouns | Timothy Baldwin | Wiki | Feb 2008 |
| English_LVC_Tu | Dataset submitted with paper 'Learning English Light Verb Constructions: Contextual or Statistical' | Yuancheng Tu | Wiki | Jun 2011 |
| English_MWE_Vincze | Dataset submitted with paper 'Detecting noun compounds and light verb constructions: a contrastive study' | Veronika Vincze | Website | Jun 2011 |
| English_NC_Kim | Semantic relations for Noun Compounds | Su Nam Kim | Wiki | Feb 2008 |
| English_NC_Nakov | English Noun Compound interpretation | Preslav Nakov | Wiki | Feb 2008 |
| English_VNC_Cook | 2984 annotated idiomatic English verb-noun tokens | Paul Cook | Wiki | Feb 2008 |
| English_VPC_Baldwin | English Verb-Preposition combinations | Timothy Baldwin | Wiki | Feb 2008 |
| Estonian_MWV_corpus_Kaalep | Manually tagged corpus with Estonian Multiword Verbs (300,000 tokens) | Heiki-Jaan Kaalep | Wiki | Feb 2008 |
| Estonian_MWV_db_Kaalep | 13,000 Estonian Multiword Verbs | Heiki-Jaan Kaalep | Wiki | Feb 2008 |
| French_MWA_Voyatzi | Dictionary of 6,763 French Multiword adverbs | Stavroula Voyatzi | Wiki | Feb 2008 |
| French_MWN_Laporte | French corpus annotated for Multiword Nouns (5,057 occurrences) | Eric Laporte | Wiki | Feb 2008 |
| Ge_En_MWE_Anastasiou | Bilingual German-English lexicon of 871 MWEs | Dimitra Anastasiou | Wiki | Feb 2008 |
| German_AN_La11t | Langenscheidt German Adj-N Collocations | Stefan Evert | Wiki | Feb 2008 |
| German_CE_Trawinski | CWde: 444 cranberry words of German; CCde: corresponding cranberry collocations | Beata Trawinski | Wiki | Feb 2008 |
| German_MWE_Anastasiou | Corpus with 536 German annotated sentences | Dimitra Anastasiou | Wiki | Feb 2008 |
| German_PNV_Fritzinger | Contextual annotation of idiomatic/literal German PNV | Fabienne Fritzinger | Wiki | Sep 2010 |
| German_PNV_Krenn | Brigitte Krenn's German PP-Verb Collocations | Brigitte Krenn | Wiki | Feb 2008 |
| Greek_MWE_Linardaki | 815 Greek nominal MWE candidates annotated by three judges | Evita Linardaki | Wiki | Sep 2010 |
| Portuguese_CP_Duran | Dataset submitted with paper 'Identifying and Analyzing Brazilian Portuguese Complex Predicates' | Magali Sanchez Duran | Wiki | Jun 2011 |
| External links | | | | |
| English_WSD_Finlayson | Resources submitted with paper 'Detecting Multi-Word Expressions Improves Word Sense Disambiguation' | Mark Finlayson | | Jun 2011 |
| Japanese-English_MT_Haugereid | Complementary link for the paper 'Extracting Transfer Rules for Multiword Expressions from Parallel Corpora' | Peter Haugereid | | Jun 2011 |
