Data sets

The following resources are included in the MWE_resources package. The complete package can be downloaded from the Multiword project site.

File name Description Contributor Link Updated
Chinese_IR_Xu Chinese queries and relevant document annotation for IR Ying Xu Wiki Sep 2010
Czech_Bigram_Pecina Czech Dependency Bigrams from the Prague Dependency Treebank Pavel Pecina Wiki Feb 2008
English_CE_Trawinski CWen: 77 cranberry words of English; CCen: corresponding cranberry collocations Beata Trawinski Wiki Feb 2008
English_CN_Baldwin A 1,000-sentence corpus including 345 English compound nouns Timothy Baldwin Wiki Feb 2008
English_LVC_Tu Dataset submitted with paper 'Learning English Light Verb Constructions: Contextual or Statistical' Yuancheng Tu Wiki Jun 2011
English_MWE_Vincze Dataset submitted with paper 'Detecting noun compounds and light verb constructions: a contrastive study' Veronika Vincze Website Jun 2011
English_NC_Kim Semantic relations for Noun Compounds Su Nam Kim Wiki Feb 2008
English_NC_Nakov English Noun Compound interpretation Preslav Nakov Wiki Feb 2008
English_VNC_Cook 2,984 annotated idiomatic English verb-noun tokens Paul Cook Wiki Feb 2008
English_VPC_Baldwin English Verb-Preposition combinations Timothy Baldwin Wiki Feb 2008
Estonian_MWV_corpus_Kaalep Manually tagged corpus of Estonian Multiword Verbs (300,000 tokens) Heiki-Jaan Kaalep Wiki Feb 2008
Estonian_MWV_db_Kaalep 13,000 Estonian Multiword Verbs Heiki-Jaan Kaalep Wiki Feb 2008
French_MWA_Voyatzi Dictionary of 6,763 French Multiword adverbs Stavroula Voyatzi Wiki Feb 2008
French_MWN_Laporte French corpus annotated for Multiword Nouns (5,057 occurrences) Eric Laporte Wiki Feb 2008
Ge_En_MWE_Anastasiou Bilingual German-English lexicon of 871 MWEs Dimitra Anastasiou Wiki Feb 2008
German_AN_La11t Langenscheidt German Adj-N Collocations Stefan Evert Wiki Feb 2008
German_CE_Trawinski CWde: 444 cranberry words of German; CCde: corresponding cranberry collocations Beata Trawinski Wiki Feb 2008
German_MWE_Anastasiou Corpus with 536 German annotated sentences Dimitra Anastasiou Wiki Feb 2008
German_PNV_Fritzinger Contextual annotation of idiomatic/literal German PNV Fabienne Fritzinger Wiki Sep 2010
German_PNV_Krenn Brigitte Krenn's German PP-Verb Collocations Brigitte Krenn Wiki Feb 2008
Greek_MWE_Linardaki 815 Greek nominal MWE candidates annotated by three judges Evita Linardaki Wiki Sep 2010
Portuguese_CP_Duran Dataset submitted with paper 'Identifying and Analyzing Brazilian Portuguese Complex Predicates' Magali Sanchez Duran Wiki Jun 2011

External links

English_WSD_Finlayson Resources submitted with paper 'Detecting Multi-Word Expressions Improves Word Sense Disambiguation' Mark Finlayson Jun 2011
Japanese-English_MT_Haugereid Complementary link for the paper 'Extracting Transfer Rules for Multiword Expressions from Parallel Corpora' Petter Haugereid Jun 2011

Creative Commons Licence