Category:Word frequency lists

Resources covering many languages

  • Word frequency lists from 10K up to 1M+ for 270+ languages, available for download as part of the Leipzig Corpora Collection (CC BY-4.0)
  • 50K and larger word lists based on www.opensubtitles.org for 60+ languages (CC BY-SA-4.0)
  • Frequency lists for English, Russian, Arabic, Chinese, French, German, Greek, Italian, Japanese, Portuguese and Spanish derived from corpora assembled by Leeds University's Centre for Translation Studies (CC BY-2.5)
  • The wordfreq Python library contains large frequency lists for 40+ languages. (Data under various licence conditions, some of which may be incompatible with Wiktionary.)
  • Frequency lists for learners of Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian and Swedish, available as part of the Kelly project. (Swedish: CC-BY-SA 3.0, LGPL 3.0; all others: CC BY-ND-NC-SA 2.0, meaning they are incompatible with Wiktionary.)
  • The SEAlang Library aims to collect lexical resources for the languages of South-East Asia. Resources are available for Balinese, Burmese, Indonesian, Javanese, Karen, Khmer, Lao, Malay, Maguindanao, Maranao, Mon, Shan, Thai and Vietnamese, among others. (Some resources are available under a generic CC license, but others are covered by copyright, so check each resource individually.)
  • Wordlists in the CLARIN infrastructure: just over half are monolingual lists in 10 languages (Dutch, Estonian, Finnish, German, Greek, Maltese, Ngbugu, Slovenian, Spanish, Swedish), while the other two dozen are bilingual or multilingual combinations. (Some resources are available under a permissive or copyleft license, but others may be covered by copyright, so check each resource individually.)
  • Gimenes, Manuel, and Boris New. "Worldlex: Twitter and blog word frequencies for 66 languages." Behavior Research Methods, 2015, pp. 1-10. PDF, data.
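For readers unfamiliar with how lists like those above are put together, the following is a minimal Python sketch of deriving a ranked frequency list from raw text. It is only an illustration under simplifying assumptions: the function name `frequency_list`, the sample sentence and the naive regular-expression tokenisation are all hypothetical, and none of the projects listed here is known to use this exact procedure.

```python
import re
from collections import Counter


def frequency_list(text, top_n=None):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    # Lowercase and extract runs of word characters. Real corpora need
    # language-specific tokenisation, which this sketch ignores.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Sort by count (highest first); ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical sample text, used only to show the output shape.
sample = "the cat and the dog and the bird"
for word, count in frequency_list(sample, top_n=3):
    print(f"{word}\t{count}")
```

The `top_n` cut-off mirrors the way published lists are usually offered in fixed sizes (for example 10K or 50K entries); the tab-separated output is just one plausible layout, not the format of any particular resource.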

Subcategories

This category has the following 44 subcategories, out of 44 total.

1

2

3

5