-----------------------------------------------------------------------
Wendy D. Cornell Graduate Group in Biophysics
Box 0446 (415) 476-2597 (phone)
Department of Pharmaceutical Chemistry (415) 476-0688 (fax)
University of California, S.F. cor...@cgl.ucsf.edu
San Francisco, CA 94143-0446 USA
I have a list of the first 8000 most common German word forms in
decreasing order of frequency of usage. It resides on a computer I don't
have immediate access to at the moment. It's extracted from the work
by Meyer-Kaeding, who counted a corpus of about a million words in
the thirties -- manually. The purpose of the Meyer-Kaeding effort was
to provide a basis for the improvement of shorthand notation.
What do you want the list for? If it's for learning German, you'd
probably be disappointed. The first few most common words are not
the words you need for a head start in a foreign language. It is
amazing how little can be done with the first thousand. The first
few hundred don't even contain many nouns at all (the first noun to
appear is "Zeit" - time, somewhere in the three hundreds). All the
words denoting anything concrete, such as household items, tools,
clothing, etc. are way outside the first thousand.
Anno
>I have a list of the first 8000 most common German word forms in
>decreasing order of frequency of usage. It resides on a computer I don't
>have immediate access to at the moment. It's extracted from the work
>by Meyer-Kaeding, who counted a corpus of about a million words in
>the thirties -- manually. The purpose of the Meyer-Kaeding effort was
>to provide a basis for the improvement of shorthand notation.
>Anno
I'd be interested in seeing the list of 8000. Your point is well-taken
regarding the appearance of nouns in the list, but the appearance of
verbs, adverbs and prepositions may be just as important, if not more
so, precisely because they are not necessarily concrete objects.
Would you be able to upload the list to an ftp-server somewhere? I'd
appreciate it.
Thanks.
--
Anthony M. Becker 810/370-2117 | "Rufen Sie meinen Vater an, meine Mutter
Email: bec...@vela.acs.oakland.edu | ist zu beschaeftigt."
| - Chelsea Clinton
>What do you want the list for? If it's for learning German, you'd
>probably be disappointed.
Such a list is also interesting for use with automatic indexing
programs. The most frequently used words are normally the ones that
carry the least information and that generate unnecessarily large
indexes (noise words).
I would be very interested in such a list, too.
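
A minimal sketch of the noise-word idea, assuming the frequency list is
available as a ranked sequence of word forms, most frequent first. The
names noise_words and build_index and the five sample words are
hypothetical placeholders for illustration; they are not taken from the
actual list.

```python
import re

def noise_words(ranked_words, n):
    """Treat the n most frequent word forms as noise words (assumed: list is ranked)."""
    return {w.lower() for w in ranked_words[:n]}

def build_index(docs, noise):
    """Map each non-noise word to the sorted ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"\w+", text.lower())):
            if word not in noise:
                index.setdefault(word, set()).add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Illustrative placeholder ranking, NOT the real frequency list.
ranked = ["der", "die", "und", "in", "zu"]
docs = {
    1: "der Hund und die Katze",
    2: "die Katze und der Vogel",
    3: "der Vogel und das Haus",
}
noise = noise_words(ranked, 3)
index = build_index(docs, noise)
```

With the three most frequent forms excluded, the index keeps only the
words that distinguish one document from another. How large n should be
is a tuning decision that depends on the collection.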
Gruesze Gregor
[...]
|> What do you want the list for? If it's for learning German, you'd
|> probably be disappointed. The first few most common words are not
|> the words you need for a head start in a foreign language. It is
|> amazing how little can be done with the first thousand. The first
|> few hundred don't even contain many nouns at all (the first noun to
|> appear is "Zeit" - time, somewhere in the three hundreds). All the
|> words denoting anything concrete, such as household items, tools,
|> clothing, etc. are way outside the first thousand.
Is this a feature of the German language, or is it the same in other languages?
When I read this I was _really_ astonished; I always believed a wordhoard of about
500 words would be sufficient to communicate (in some way :-)
CU
--
/| /| __ __ / *
/ | / | /__) / /_/ / meu...@informatik.tu-muenchen.de
/ |/ |_(____(___/ \_/_
>[...]
>Is this a feature of the German language, or is it the same in other languages?
>When I read this I was _really_ astonished; I always believed a wordhoard of about
>500 words would be sufficient to communicate (in some way :-)
No, I don't think this is peculiar to the German language. These word
counts are usually compiled from a corpus of literary, and business, and
legal, and what have you types of documents. These include a huge number
of words not needed for your basic communication. If someone were to
compile a list of words people use in shopping, asking for directions,
making jokes, etc., the top few hundred would be much more like it.
That's why phrase-books and the like don't just offer the most common
words of a language; they present a vocabulary specially selected
for its use in basic day-to-day communication.
To put it differently, if you were to take a German text and delete
all the words that are not among the 10 000 (say) most common words, you'd
probably keep something like 90% of the text, but the meaning would
virtually disappear. You'd be left with sentence skeletons with all the
specifics deleted. Specific words are *rare* words; that's what makes
them specific.
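
The effect can be sketched on a toy example. The functions coverage and
skeleton below are hypothetical helpers, and the sample sentence is made
up; real figures such as the 90% above would need a real corpus and the
real frequency list.

```python
from collections import Counter

def coverage(tokens, n):
    """Fraction of running words accounted for by the n most frequent word forms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

def skeleton(tokens, n, gap="___"):
    """Replace every word outside the n most frequent forms with a gap marker."""
    keep = {word for word, _ in Counter(tokens).most_common(n)}
    return [t if t in keep else gap for t in tokens]

# Made-up sample text for illustration only.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
share = coverage(tokens, 3)
print(f"{share:.0%} of the text kept")
print(" ".join(skeleton(tokens, 3)))
```

Here the three most frequent forms cover 8 of the 13 running words, yet
every noun falls into a gap, which is the sentence-skeleton effect
described above.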
Anno
>I have a list of the first 8000 most common German word forms in
>decreasing order of frequency of usage. It resides on a computer I don't
>have immediate access to at the moment. It's extracted from the work
>by Meyer-Kaeding, who counted a corpus of about a million words in
^^^^^^^^^^^^^
The name of the first co-author should be Meier not Meyer.
[...]
>... The first
>few hundred don't even contain many nouns at all (the first noun to
>appear is "Zeit" - time, somewhere in the three hundreds). All the
The first noun is indeed "Zeit", but it is in the nineties, not the three hundreds.
Sorry for the misinformation, this was all from memory.
Anno
Well, at least for English, there exists (or existed?) a little book
(I think from Klett Verlag) called 'Englischer Grundwortschatz' [basic
English vocabulary]. It stated that about 50 words (so-called structural
words) make up about 50% of an average English text. Of course, these are
'I, a, the, you, of...', so knowledge of only these gives you an
understanding of about 0% :-). With the next 1000 words you reach
about 85% of an average text (please don't ask me what 'average'
means in this context), and the next 2500 add another 10%. So
with 3500 words you have 95% of an English text. From
my experience, with the first 1000 words you get along quite well.
I also have the Italian version of this book at home, and I know of
at least two more (French and Spanish). So I'd be very surprised
if there didn't exist a German one...
BTW: Is there a linguist around who knows how many different words a
typical edition of 'Bild' uses? I guess this might be
a good indication of how many different German words you need
for basic communication skills :-)
Servus, Thomas
--
* FG Neuronale Netzwerke / Uni Kassel *
* Jochen Ruhland *
* Heinrich-Plett-Str. 40 *
* D-34132 Kassel *
* joc...@neuro.informatik.uni-kassel.de *
* Tel: +49-561-804-4376 FAX: -4244 *
/// Watson's extension to murphy's law:
There's always one more PTF
/// The AIX Lemma to Watson's extension:
You'll always need one more pre-req
>I have a list of the first 8000 most common German word forms in
>decreasing order of frequency of usage.
This file is temporarily available for anonymous ftp on ftp.zrz.tu-berlin.de
as incoming/wordfreq/wordfreq.ger.Z. A README file is also available.
Anno
>This file is temporarily available for anonymous ftp on ftp.zrz.tu-berlin.de
>as incoming/wordfreq/wordfreq.ger.Z. A README file is also available.
This message goes to all the people who have expressed interest in
my file of common German word forms. It will also be posted to
alt.usage.german.
The file has been available for a while for ftp on ftp.zrz.tu-berlin.de
in /incoming/wordfreq as wordfreq.ger.Z. A README file has also been
provided. Unfortunately, the encoding of the German umlauts had been
mangled (the most significant bit had been lost). I have now provided
a corrected version at the same place, in which the umlauts are encoded
using the LaTeX convention ("A..."s).
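
A rough sketch of both points, under two assumptions that the post does
not confirm: that the original file was in an 8-bit encoding such as
ISO 8859-1, and that the corrected file writes each umlaut as a double
quote followed by the base letter (with "s for the sharp s), as in the
LaTeX german style. The function names are hypothetical.

```python
import re

# Assumed mapping for the double-quote convention.
UMLAUTS = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    '"s': "ß",
}

def decode_umlauts(word):
    """Turn the assumed quote-letter sequences back into umlaut characters."""
    return re.sub(r'"[aouAOUs]', lambda m: UMLAUTS[m.group(0)], word)

def strip_high_bit(word, encoding="latin-1"):
    """Simulate losing the most significant bit of every byte (encoding is assumed)."""
    return bytes(b & 0x7F for b in word.encode(encoding)).decode("ascii")

print(decode_umlauts('f"ur'))
print(strip_high_bit("für"))
```

Once the high bit is gone, an umlaut collapses into an ordinary ASCII
character and can no longer be told apart from one that was there all
along, so the damaged file cannot be repaired mechanically. The quoted
form stays within 7-bit ASCII and survives such transfers.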
Since there seems to be some interest in the word list, I will talk to
our ftp admin in about a week (he is on vacation right now). Presumably
the file will find a more permanent place below the /pub directory from
where it can also be mirrored to other ftp sites.
Regards, Anno