Corpus frequency data

To ensure fully comparable conditions for evaluation studies, we also provide standard corpus frequency data for some of the manually annotated MWE resources. Because of their size, these data sets are hosted on external servers and can be downloaded by clicking on the data set names in the table below.

| File name | Description | Size | MWE data | Contributor |
|---|---|---|---|---|
| German_PNV_FR | German PP-verb combinations from the Frankfurter Rundschau corpus | 56 MiB | German_PNV_Krenn, Shared task 2008 | Stefan Evert |
| German_AN_FR | German adjective-noun combinations from the Frankfurter Rundschau corpus | 32 MiB | German_AN_La11t, Shared task 2008 | Stefan Evert |
| Czech_Bigram_PDT | Czech dependency bigrams from the Prague Dependency Treebank | 870 KiB | Czech_Bigram_Pecina, Shared task 2008 | Pavel Pecina |

Shared task evaluation packages

Evaluation packages from previous shared tasks can be downloaded here: