5000 Frequency Word List

Brian Scarano

Aug 4, 2024, 8:57:28 PM
to chirmuzzguzzfi
The words have been chosen based on their frequency in the Oxford English Corpus and their relevance to learners of English. Every word is aligned to the CEFR, guiding learners on the words they should know at A1-B2 level.

As well as the Oxford 3000 core word list, it includes an additional 2,000 words that are aligned to the CEFR, guiding advanced learners at B2-C1 level on the most useful high-level words to learn to expand their vocabulary.


The Oxford 3000 was developed in consultation with James Milton, Professor of Applied Linguistics, Swansea University, UK, and reviewed by Paul Nation, Emeritus Professor in Applied Linguistics, Victoria University of Wellington, New Zealand.


On this website, the words are listed alphabetically. You can browse the words, search for a word, filter the list, and download them. The CEFR level is shown beside each word, and you can hear the word pronounced in either British or American English.


This is a word list of the 5000 most used Danish words, based on the contents of www.opensubtitles.org. The list has only been cleaned to an extent, and because it is based on movie subtitles you may find English entries in it. The words are all in lower case (where applicable) to avoid duplicate entries. If you need a bigger list for any other purpose, please contact me.


Based on a 23-million-word corpus of French which includes written and spoken material both from France and overseas, this dictionary provides the user with detailed information for each of the 5000 entries, including English equivalents, a sample sentence, its English translation, usage statistics, and an indication of register variation.


Users can access the top 5000 words either through the main frequency listing or through an alphabetical index. Throughout the frequency listing there are thematically-organized lists of the top words from a variety of key topics such as sports, weather, clothing, and family terms.


Deryle Lonsdale is Associate Professor in the Linguistics and English Language Department at Brigham Young University (Provo, Utah). Yvon Le Bras is Associate Professor of French and Department Chair of the French and Italian Department at Brigham Young University (Provo, Utah).


I wrote this as a comment before: I don't think a translated list makes much sense, since most of the German words (if not all) can be translated into many more than just one English word (and vice versa).


So, how does this list help you?

At position 10 you find 14 different valid translations of "sich", where 6 of them consist of two words each. And among the top 10 you find the word "the" at 4 positions. Within the top 5000 you will find "turn" at 35 different positions.


These are 5000 of the most common words in American English, in order of usage. This can be a particularly useful list when starting to learn a new language: it helps you prioritise which words to build sentences with in the other language, so that you develop your core vocabulary quickly. The process is faster if each sentence uses several words from the list, for instance "They think it is time to go" - "Ellos piensan que es hora de irse" in Spanish. It is important to learn words in a given context, which also makes them easier to remember.


Frequency lists for the whole BNC (version 1), for the spoken versus written components, for the conversational (i.e. demographic) versus task-oriented (i.e. context-governed) parts of the spoken component, and for the imaginative versus informative parts of the written component. Also: ranked frequency word lists according to parts of speech (e.g. all nouns, all conjunctions) based on the whole BNC corpus (version 1), as well as frequencies for individual part-of-speech tags (e.g. NN1, VDG) based on the BNC Sampler.

Although the frequency lists for this book were based on all 4,124 files of the original BNC version 1 corpus, the text classifications and POS tags used were the updated and more accurate ones implemented in the BNC World Edition.


** For those who want a user-friendly word list (i.e. without frequency figures) based on the entire BNC, I am making one available here (all word forms occurring at least 10 times per million words, alphabetically arranged)
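The "at least 10 times per million words" cut-off can be sketched roughly as below. This is only an illustration, not the procedure used to build the BNC list: the function names and the sample counts are made up, and it assumes you already have raw counts and the corpus size in tokens.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def words_above_threshold(counts: dict, corpus_size: int,
                          min_per_million: float = 10.0) -> list:
    """Return the words meeting the per-million threshold, alphabetically arranged."""
    return sorted(
        word for word, count in counts.items()
        if per_million(count, corpus_size) >= min_per_million
    )


# Hypothetical counts, for illustration only.
sample_counts = {"zebra": 30, "apple": 5, "mango": 10}
print(words_above_threshold(sample_counts, corpus_size=1_000_000))
```

Normalising per million makes counts from corpora of different sizes comparable, which is why the threshold is stated that way rather than as a raw count.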


Select any of 70+ registers/genres, and then get a frequency listing for that genre. Just enter "*" (without quotation marks) for a general frequency listing for the selected genre, "[nn1]" for singular nouns in that genre, etc. You can also easily compare word frequency in one genre (or set of genres) against another, e.g. sermons vs. spoken, tabloids vs. broadsheet, medical vs. academic, etc.


Word+frequency lists based on the Brown corpus (not disambiguated by parts of speech) may be found at the Brandeis University Computational Memory Lab or at the Psycholinguistic database at Rutherford Appleton Laboratory.


570 word families assumed to reflect the shared vocabulary of written academic English as used in a wide variety of disciplines (28 in total, 125K words from each) in an Academic Corpus of 3.5m words.


Selection was based on the principles of range, frequency and dispersion, using a specially compiled academic corpus of journal articles, book chapters, course workbooks, laboratory manuals, and course notes.
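To make "range" and "frequency" a bit more concrete, here is a minimal sketch of such a filter. It is not the actual selection procedure: the thresholds, the function name and the toy subcorpora are all assumptions, and dispersion is not modelled at all.

```python
from collections import Counter


def select_by_range_and_frequency(subcorpora: dict, min_range: int,
                                  min_frequency: int) -> list:
    """Keep words that occur in enough subcorpora (range) and often enough overall.

    `subcorpora` maps a discipline name to its list of tokens.
    """
    total = Counter()       # overall frequency of each word
    word_range = Counter()  # number of subcorpora each word appears in
    for tokens in subcorpora.values():
        total.update(tokens)
        word_range.update(set(tokens))  # count each word once per subcorpus
    return sorted(
        word for word in total
        if word_range[word] >= min_range and total[word] >= min_frequency
    )


# Toy data, purely illustrative.
toy = {
    "biology": ["analyse", "cell", "data", "data"],
    "law": ["analyse", "court", "data"],
    "history": ["analyse", "war"],
}
print(select_by_range_and_frequency(toy, min_range=2, min_frequency=3))
```

The range check is what keeps out words that are frequent only because a single discipline uses them heavily.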


Sadly, though, the corpus composition was heavily skewed, a fact that affects its representativeness immensely. However, even these days, many people still appear to not have cottoned on to this, as the list still keeps getting cited as a model ;-)


Everything to do with Charles Kay Ogden's 1930s classic Basic English vocabulary list, including the electronic version of Basic English: International Second Language. New York: Harcourt, Brace & World Inc./Orthological Institute.


Based on a corpus of modern Russian fiction and political texts (more than 35 million words). The list includes about 33000 words whose frequency is greater than 1 ipm (instances per million words). A shorter selection of the 5000 most frequent words is also available. The list provides word rank, frequency (per million), and part of speech. Some analytical information about the lexical stock is provided, such as coverage of the total language use by word bands, e.g. the first 3000 lemmas cover 76.6824% of the total number of word forms. The corpus, tools for working with it, as well as an aligned parallel English-Russian corpus are discussed in: Sharoff, Serge (2002). Meaning as use: exploitation of aligned corpora for the contrastive study of lexical semantics. Proc. of Language Resources and Evaluation Conference (LREC02). May 2002, Las Palmas, Spain.
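A coverage figure of this kind (the share of all running words accounted for by the top N entries) can be computed along these lines. This is a rough sketch with a made-up function name and toy numbers; it says nothing about how the Russian list itself was processed.

```python
def coverage_of_top(frequencies: dict, top_n: int) -> float:
    """Fraction of all tokens covered by the top_n most frequent entries."""
    ranked = sorted(frequencies.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:top_n]) / total


# Toy frequencies, for illustration only.
toy_frequencies = {"a": 5, "b": 3, "c": 2}
print(f"{coverage_of_top(toy_frequencies, 2):.1%}")
```

Run with top_n set to 1000, 2000, 3000 and so on, the same function gives the coverage for each word band.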


For a quick-and-easy frequency listing/index of words in your own texts, try the following programs. For pedagogical software and vocabulary analysis programs, see the Teaching and Miscellaneous Links page.
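If you just want a rough frequency listing without installing anything, a few lines of Python will do. This is a minimal sketch, not a replacement for the programs mentioned: the function name is made up, tokenisation is a naive "runs of letters" rule, and everything is lower-cased so that "The" and "the" count as one entry.

```python
import re
from collections import Counter


def frequency_list(text: str, top_n: int) -> list:
    """Return the top_n (word, count) pairs, most frequent first.

    Ties are broken alphabetically so the output is reproducible.
    """
    # Runs of Unicode letters; digits, underscores and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


for word, count in frequency_list("The cat and the hat. THE END", 3):
    print(word, count)
```

Note that this counts word forms, not lemmas, so "go" and "went" end up as separate entries.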


With this 2nd book in the series, you will increase your vocabulary from roughly CEFR A2+ up to CEFR B2+ level. This book is for intermediate students of Swedish, and meant for those who want to improve their Swedish further.




By knowing the 5,000 most used words, you will have a vocabulary comparable to that of an adult native speaker without higher education. You will be able to express yourself comfortably and to understand 89% of all Swedish text and 85% of all spoken Swedish.




While it is impossible to pin down these numbers and statistics with 100% accuracy, these are a global average of multiple sources. According to research, this is the amount of vocabulary needed for varying levels of fluency:




Keeping the above facts in mind, the value of a frequency dictionary is immense. At least, that is, if you want to become fluent in a language fast. Study the most frequent words, build your vocabulary and progress naturally.






A frequency list of the most frequently used Swedish words, based on analysis of 20 gigabytes of Swedish subtitles, the equivalent of 80,000 books of 200 pages each; more than two large libraries' worth of text.

A large base text collection is absolutely vital in order to establish a reliable general frequency list.


1000 most common word forms in Spanish language TV Shows and Movies: Verb forms
By Spanish Input
Memorizing full conjugation tables for each verb you learn is utterly pointless. If we look at the most common 1000 word forms in Spanish, it turns out...


The step after that is to change the way the flashcards work, so that they are not even really flashcards anymore, but show you many example sentences with the words you are trying to learn. I would like to turn away from the idea of learning lists of words and their translations, and rather use the words in the list as a way to organise and index example sentences, which are the real source of learning. I think this will kind of serve your idea about verbs and their different forms, especially if we add some special logic for verbs.


EDIT: Maybe it would be better to give LLN the ability to load mp4s both from your hard drive and from any URL within Chrome. This way it would work on both Windows and macOS. Some mp4s have embedded text-based subtitles encoded in tx3g format, and both Chrome and Safari can read those subtitles and turn them on and off.


The second version of the frequency list
From this page you can access the frequency list for modern Russian. Up to now Chastotnyj slovarj russkogo jazyka (Zasorina, 1977) provided the most widely used frequency list for Russian. However, the corpus used in Zasorina is relatively small according to modern standards (about 1 million words). It is outdated: mostly it covers uses from the 1920s to the 1960s and includes a high proportion of ideological sources, like texts by Lenin and Khrushchev and Soviet newspapers; thus, word frequencies in it are severely biased, e.g. Soviet and comrade are in the first hundred of Russian words, on a par with function words. Finally, the list of (Zasorina, 1977) is not available electronically.
