VOCABULARY SIZE, TEXT COVERAGE AND WORD LISTS

Plums

Jun 11, 2008, 4:40:52 AM
to To-Be-English-Greater
http://www.fltr.ucl.ac.be/fltr/germ/etan/bibs/vocab/cup.html

VOCABULARY SIZE, TEXT COVERAGE AND WORD LISTS

Paul Nation and Robert Waring

How much vocabulary does a second language learner need?

There are three ways of answering this question. One way is to ask
"How many words are there in the target language?" Another way is to
ask "How many words do native speakers know?" A third way is to ask
"How many words are needed to do the things that a language user needs
to do?" We will look at answers to each of these questions.

This discussion looks only at vocabulary and it should not be assumed
that if a learner has sufficient vocabulary then all else is easy.
Vocabulary knowledge is only one component of language skills such as
reading and speaking. It should also not be assumed that vocabulary
knowledge is always a prerequisite to the performance of language
skills. Vocabulary knowledge enables language use, language use
enables the increase of vocabulary knowledge, knowledge of the world
enables the increase of vocabulary knowledge and language use and so
on (Nation, 1993b). With these cautions in mind let us now look at
estimates of vocabulary size and their significance for second
language learners.

How many words are there in English?

The most straightforward way to answer this question is to look at the
number of words in the largest dictionary. This usually upsets
dictionary makers. They see the vocabulary of the language as a
continually changing entity with new words and new uses of old words
being added and old words falling into disuse. They also see the
problems in deciding if "walk" as a noun is the same word as "walk" as
a verb, if compound items like "goose grass" are counted as separate
words, and if names like Vegemite, Agnes, and Nottingham are to be
counted as words. These are all real problems, but they can be
dealt with systematically in a reliable way.

Two separate studies (Dupuy, 1974; Goulden, Nation and Read, 1990)
have looked at the vocabulary of Webster's Third International
Dictionary (1963), the largest non-historical dictionary of English
when it was published. When compound words, archaic words,
abbreviations, proper names, alternative spellings and dialect forms
are excluded, and when words are classified into word families
consisting of a base word, inflected forms, and transparent
derivations, Webster's 3rd has a vocabulary of around 54,000 word
families. This is a learning goal far beyond the reaches of second
language learners and, as we shall see, most native speakers.

How many words do native speakers know?

For over 100 years there have been published reports of systematic
attempts to measure the vocabulary size of native speakers of English.
There have been various motivations for such studies but behind most
of them lies the idea that vocabulary size is a reflection of how
educated, intelligent, or well read a person is. A large vocabulary
size is seen as being something valuable. Unfortunately the
measurement of vocabulary size has been bedeviled by serious
methodological problems largely centring around the questions of "What
should be counted as a word?", "How can we draw a sample of words from
a dictionary to make a vocabulary test?", and "How do we test to see
if a word is known or not?". Failure to deal adequately with these
questions has resulted in several studies of vocabulary size which
give very misleading results. For a discussion of these issues see
Nation (1993a), Lorge and Chall (1963), and Thorndike (1924).

Teachers of English as a second language may be interested in measures
of native speakers' vocabulary size because these can provide some
indication of the size of the learning task facing second language
learners, particularly those who need to study and work alongside
native speakers in English medium schools and universities or
workplaces.

At present the best conservative rule of thumb that we have is that up
to a vocabulary size of around 20,000 word families, we should expect
that native speakers will add roughly 1000 word families a year to
their vocabulary size. That means that a five year old beginning
school will have a vocabulary of around 4000 to 5000 word families. A
university graduate will have a vocabulary of around 20,000 word
families (Goulden, Nation and Read, 1990). These figures are very
rough and there is likely to be very large variation between
individuals. These figures exclude proper names, compound words,
abbreviations, and foreign words. A word family is taken to include a
base word, its inflected forms, and a small number of reasonably
regular derived forms (Bauer and Nation, 1993). Some researchers
suggest vocabulary sizes larger than these (see Nagy, this volume),
but in the well conducted studies (for example, D'Anna, Zechmeister
and Hall, 1991) the differences are mainly the result of differences
in what items are included in the count and how a word family is
defined.
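
The rule of thumb above can be written out as simple arithmetic. The sketch below is only an illustration: it assumes growth is linear from birth at 1000 word families a year and stops at the 20,000 ceiling, which is a simplification of the rough figures given here. The function name and parameters are hypothetical.

```python
def estimated_families(age_years, yearly_gain=1000, ceiling=20000):
    """Rough native-speaker vocabulary size in word families at a given age.

    Assumption for illustration: linear growth from birth, capped at `ceiling`.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return min(age_years * yearly_gain, ceiling)


for age in (5, 10, 15, 20, 40):
    print(age, estimated_families(age))
```

For a five year old this gives the upper end of the 4000 to 5000 range, and from age 20 onward it stays at the 20,000 figure quoted for a university graduate. Individual variation is likely to be much larger than the model suggests.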

A small study of the vocabulary growth of non-native speakers in an
English medium primary school (Jamieson, 1976) suggests that in such a
situation non-native speakers' vocabulary grows at the same rate as
native speakers' but that the initial gap that existed between them is
not closed. For adult learners of English as a foreign language, the
gap between their vocabulary size and that of native speakers is
usually very large, with many adult foreign learners of English having
a vocabulary size of much less than 5000 word families in spite of
having studied English for several years. Large numbers of second
language learners do achieve vocabulary sizes that are like those of
educated native speakers, but they are not the norm.

There is some encouraging news however. A study by Milton and Meara
(1995) using the Eurocentres Vocabulary Size Test (Meara and Jones,
1988, 1990) shows that significant vocabulary growth can occur if this
learning is done in the second language environment. In their study of
53 European students of advanced proficiency on a study abroad
programme, the average growth in vocabulary per person approached a
rate of 2500 words per year over the six months of the programme. This
rate of growth is similar to the larger estimates of first language
growth in adolescence. Although the goal of native speaker vocabulary
size is a possible goal, it is a very ambitious one for most learners
of English as a foreign language.

How many words are needed to do the things a language user needs to
do?

Although the language makes use of a large number of words, not all of
these words are equally useful. One measure of usefulness is word
frequency, that is, how often the word occurs in normal use of the
language. From the point of view of frequency, the word "the" is a very
useful word in English. It occurs so frequently that about 7% of the
words on a page of written English and the same proportion of the
words in a conversation are repetitions of the word "the". Look back
over this paragraph and you will find an occurrence of "the" in almost
every line.

The good news for second language learners and second language
teachers is that a small number of the words of English occur very
frequently and if a learner knows these words, that learner will know
a very large proportion of the running words in a written or spoken
text. Most of these words are content words and knowing enough of them
allows a good degree of comprehension of a text. Here are some figures
showing what proportion of a text is covered by certain numbers of
high frequency words.


Table 1: Vocabulary size and text coverage in the Brown corpus

Vocabulary size    Text coverage
1000               72.0%
2000               79.7%
3000               84.0%
4000               86.8%
5000               88.7%
6000               89.9%
15,851             97.8%

The figures in Table 1 refer to written texts and are from Francis and
Kucera (1982), whose counts are based on the Brown corpus, a very
diverse corpus of over 1,000,000 running words made up of 500 texts,
each around 2000 running words long. As we shall see, the more diverse
the texts in a corpus, the greater the number of different words and
the lower the coverage provided by the high frequency words, so these
figures are a conservative estimate. The
figures in the last line of the table are from Kucera (1982). The
COBUILD Dictionary claims that 15,000 words cover 95% of the running
words of their corpus. The figures in Table 1 are for lemmas and not
word families. Word families would give fractionally higher coverage.
Table 1 assumes that high frequency words are known before lower
frequency words and shows that knowing about 2,000 word families gives
near to 80% coverage of written text. The same number of words gives
greater coverage of informal spoken text - around 96% (Schonell,
Meddleton and Shaw, 1956).
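
Coverage figures of the kind shown in Table 1 can be computed from any tokenized text. The following is a minimal sketch, not the procedure used by Francis and Kucera: it counts raw word forms rather than lemmas or word families, and the function name and the toy sentence are assumptions made for illustration.

```python
from collections import Counter


def coverage_by_vocab_size(tokens, sizes):
    """Return {size: share of running words covered by the `size` most frequent forms}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text is empty")
    # Frequencies in descending order, i.e. high frequency words are "known" first.
    ranked = [freq for _, freq in counts.most_common()]
    return {size: sum(ranked[:size]) / total for size in sizes}


sample = "the cat sat on the mat and the dog sat on the log".split()
profile = coverage_by_vocab_size(sample, sizes=[1, 2, 4])
for size, share in profile.items():
    print(f"{size} most frequent forms cover {share:.1%} of the running words")
```

With a real corpus the same loop shows the pattern in Table 1: each additional thousand words adds less coverage than the thousand before it.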

With a vocabulary size of 2,000 words, a learner knows 80% of the
words in a text which means that 1 word in every 5 (approximately 2
words in every line) is unknown. Research by Liu Na and Nation (1985)
has shown that this ratio of unknown to known words is not sufficient
to allow reasonably successful guessing of the meaning of the unknown
words. At least 95% coverage is needed for that. Research by Laufer
(1989) suggests that 95% coverage is sufficient to allow reasonable
comprehension of a text. A larger vocabulary size is clearly better.
Table 2 is based on research by Hirsh and Nation (1992) on novels
written for teenage or younger readers.

The Hirsh and Nation (1992) study looked at such novels because they
might provide the most favourable conditions for second language
learners to read unsimplified texts. These conditions could come about
because they are aimed at a non-adult audience and thus there may be a
tendency for the writer to use simpler vocabulary, and because a
continuous novel on one topic by one writer provides opportunity for
the repetition of vocabulary. Table 2 shows that under favourable
conditions, a vocabulary size of 2000 to 3000 words provides a very
good basis for language use.



Table 2: Vocabulary size and coverage in novels for teenagers

Vocabulary size        % coverage    Density of unknown words
2000 words             90%           1 in every 10
2000 + proper nouns    93.7%         1 in every 16
2600 words             96%           1 in every 25
5000 words             98.5%         1 in every 67
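
The density of unknown words in Table 2 follows directly from the coverage percentage: if a proportion c of the running words is known, then on average 1 word in every 1/(1 - c) is unknown. A short sketch of that arithmetic (the function name is our own, chosen for illustration):

```python
def unknown_word_density(coverage_percent):
    """Return n such that about 1 running word in every n is unknown."""
    if not 0 <= coverage_percent < 100:
        raise ValueError("coverage must be at least 0 and below 100")
    return 100 / (100 - coverage_percent)


for pct in (80, 90, 93.7, 96, 98.5):
    n = round(unknown_word_density(pct))
    print(f"{pct}% coverage: about 1 unknown word in every {n}")
```

Rounded to whole words, this reproduces the 1 in 5 figure for 80% coverage and the densities of 10, 16, 25 and 67 given in Table 2.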

The significance of this information is that although there are well
over 54,000 word families in English, and although educated adult
native speakers know around 20,000 of these word families, a much
smaller number of words, say between 3,000 and 5,000 word families, is
needed to provide a basis for comprehension. It is possible to make
use of a smaller number, around 2,000 to 3,000, for productive use in
speaking and writing. Hazenberg and Hulstijn (1996) however suggest a
figure nearer to 10,000 for Dutch as a second language.

Sutarsyah, Nation and Kennedy (1994) found that a single long
Economics text was made up of 5,438 word families and a corpus of
similar length made up of diverse short academic texts contained
12,744 word families. Within narrowly focused areas of interest, such
as in an Economics text, a much smaller vocabulary is needed than if
the reader wishes to read a wide range of texts on a variety of
different topics.

How much vocabulary and how should it be learned?

We are now ready to answer the question "How much vocabulary does a
second language learner need?" Clearly the learner needs to know the
3,000 or so high frequency words of the language. These are an
immediate high priority and there is little sense in focusing on other
vocabulary until these are well learned. Nation (1990) argues that
after these high frequency words are learned, the next focus for the
teacher is on helping the learners develop strategies to comprehend
and learn the low frequency words of the language. Because of the very
poor coverage that low frequency words give, it is not worth spending
class time on actually teaching these words. It is more efficient to
spend class time on the strategies of (1) guessing from context, (2)
using word parts and mnemonic techniques to remember words, and (3)
using vocabulary cards to remember foreign language - first language
word pairs. Detailed description of these strategies can be found in
Nation (1990). Notice that although the teacher's focus is on helping
learners gain control of important strategies, a major function of
these strategies is to help the learners to continue to learn new
words and increase their vocabulary size.

A way to manage the learning of huge amounts of vocabulary is through
indirect or incidental learning. An example of this is learning new
words (or deepening the knowledge of already known words) in context
through extensive listening and reading. Learning from context is so
important that some studies suggest that first language learners learn
most of their vocabulary in this way (Sternberg, 1987). Extensive
reading is a good way to enhance word knowledge and get a lot of
exposure to the most frequent and useful words. At the earlier and
intermediate levels of language learning, simplified reading books can
be of great benefit. Other sources of incidental learning include
problem solving group work activities (Joe, Nation and Newton, 1996)
and formal classroom activities where vocabulary is not the main
focus.

The problem for beginning learners and readers is getting to the
threshold where they can start to learn from context. Simply put, if
one does not know enough of the words on a page to comprehend what is
being read, one cannot easily learn from context. Liu Na
and Nation (1985) have shown that we need a vocabulary of about 3000
words which provides coverage of at least 95% of a text before we can
efficiently learn from context with unsimplified text. This is a large
startup vocabulary for a learner to acquire, and it is needed just to
comprehend general texts. So how can we get learners to learn large
amounts of vocabulary in a short space of time?

The suggestion that learners should directly learn vocabulary from
cards, to a large degree out of context, may be seen by some teachers
as a step back to outdated methods of learning and not in agreement
with a communicative approach to language learning. This may be so,
but the research evidence supporting the use of such an approach as
one part of a vocabulary learning program is strong.


1. There is a very large number of studies showing the effectiveness of
such learning in terms of amount and speed of learning. See Nation
(1982), Paivio and Desrochers (1981) and Pressley et al. (1982) for a
review of these studies.
2. Research on learning from context shows that such learning does occur
but that it requires learners to engage in large amounts of reading
and listening because the learning is small and cumulative (Nagy,
Herman and Anderson, 1985). This should not be seen as an argument
that learning from context is not worthwhile. It is by far the most
important vocabulary learning strategy and an essential part of any
vocabulary learning program. For fast vocabulary expansion, however,
it is not sufficient by itself. There is no research that shows that
learning from context provides better results than learning from word
cards (Nation, 1982).
3. Research on the learning of grammar shows that form focused
instruction is a valuable component of a language learning course
(Ellis, 1990; Long, 1988). Courses with a form focused component
achieve better results than courses without such a component. The
important issue is to achieve a balance between meaning focused
activities, form focused activities, and fluency development
activities (Nation, forthcoming). Direct learning of vocabulary from
cards is a kind of form focused instruction which can have the same
benefits, perhaps even more markedly so, as form focused grammar
instruction.

To these research based arguments might be added the argument that
most serious learners make use of such an approach. They can be helped
to do it more effectively. There are other advantages for using word
cards. They can give a sense of progress, and a sense of achievement,
particularly if numerical targets are set and met. They are readily
portable and can be used in idle moments in or out of class either for
learning new words or revising old ones. They are specifically made to
suit particular learners and their needs and are thus self motivating.

It should not be assumed that learning from word lists or word cards
means that the words are learned forever, nor does it mean that all
knowledge of a word has been learned. Learning from lists or word
cards is only an initial stage of learning a particular word (see
Schmitt and Schmitt, 1995 for further information). It is however a
learning tool for use at any level of vocabulary proficiency. There
will always be a need to have extra exposure to the words through
reading, listening and speaking as well as extra formal study of the
words, their collocates, associations, different meanings, grammar and
so on. This shows a complementary relationship between contextualized
learning of new words and the decontextualized learning from word
cards.


What vocabulary does a language learner need?

The previous sections of this paper have suggested that second
language learners need first to concentrate on the high frequency
words of the language. In this section we look at some useful
vocabulary lists based on frequency and review the research on the
adequacy of the General Service List (West, 1953). Most counts also
consider range, that is the occurrence of a word across several
subsections of a corpus (McIntosh, Halliday and Strevens, 1961).

The practice of counting words has a long history dating as far back
as Hellenic times (DeRocher, 1973). Several early word counts are
mentioned in Fries and Traver (1960). There are many lists of the most
frequently occurring words in English and a few of the most well known
are described here.

The General Service List (West, 1953). The GSL contains 2000 headwords
and was developed in the 1940s. The frequency figures for most items
are based on a 5,000,000 word written corpus. Percentage figures are
given for different meanings and parts of speech of the headword. In
spite of its age, some errors, and its solely written base, it still
remains the best of the available lists because of its information
about the frequency of meanings, and West's careful application of
criteria other than frequency and range.

The Teachers Word Book of 30,000 words (Thorndike and Lorge, 1944).
This list of 30,000 lemmas (or about 13,000 word families (Goulden,
Nation and Read, 1990)) is based on a count of an 18,000,000 word
written corpus. Its value lies in its size. It is based on a large
corpus and contains a large number of words. However, it is old, based
on counts done over sixty years ago.

The American Heritage Word Frequency Book (Carroll, Davies and
Richman, 1971). This comprehensive list is based on a corpus of
5,000,000 running words drawn from written texts used in United States
schools over a range of grades and over a range of subject areas. The
main values of the list are its focus on school texts and its listing
of range figures, namely the frequency of each word in each of the
school grade levels and in each of the subject areas.

The Brown (Francis and Kucera, 1982), LOB and related corpora. There
are now several 1,000,000 word written corpora each representing a
different dialect of English. Some of these have published lemmatized
word lists ranked according to frequency.

The classic list of high frequency words is Michael West's General
Service List (1953). The 2000 word GSL is of practical use to teachers
and curriculum planners as it contains words within the word family
each with its own frequency. For example, excited, excites, exciting
and excitement come under the headword excite. The GSL was written so
that it could be used as a resource for compiling simplified reading
texts into stages or steps. West and his colleagues produced vast
numbers of simplified readers using this vocabulary. This is actually
a very old list being based on frequency studies done in the early
decades of the twentieth century. Doubts have been cast on its adequacy because
of its age (Richards, 1974) and the relatively poor coverage provided
by the words not in the first 1000 words of the list (Engels, 1968).

Engels makes two major points. Even if a limited vocabulary covers 95%
of a text, a much larger vocabulary is still needed to cover the
remaining 5% (p. 215). However Engels overestimates the size of this
vocabulary. He suggests 497,000 words. His second point is that the
limited vocabulary chosen by West (1953) is not the best selection.
Engels examined 10 texts of 1000 words each. He found that West's GSL
plus numbers covered 81.8% of the running words (This did not include
proper nouns which covered 4.13%). Engels' definition of what should
be included in a word family did not agree with West's and so Engels
considered that West's GSL contained 3,372 words. This is because
Engels considered flat and flatten, and police and policeman to be
different word families. West gives separate figures for such items
but indicates through the format of the GSL that they are in the same
family. This difference however does not influence results. Engels
considered the first 1000 of the GSL to be a good choice because the
words were of high frequency and wide range (p. 221).

Engels correctly points out that the GSL does not provide 95% coverage
of texts. He also says that the words outside the first 1000 of the
GSL are "fallacious ... [because] they cannot be called general
service words". Engels considers that the range and frequency of these
words are too low to be included in the list. He suggests that for the
lower frequency words in the GSL "the work should be done all over
again" (p. 226), giving more attention to topic and genre divisions.
Hwang and Nation (1995) report on such a study. The results only
partly support Engels' ideas. It is possible to replace 452 of the
words in the GSL with 250 words of higher frequency across a range of
genres, but the change in total text coverage is small - from 82.3% to
83.4%. Even adjusting for the difference in size of the GSL, 2,147
words, and the new list, 1,945 words, still leaves the percentage
difference in coverage at 1.68%. Thus although the GSL is in need of
replacement because of its age, errors it contains, and its written
focus, it is still the best available list, given the range of
information it contains about the relative frequency of the meanings
of the words. In a variety of studies (Hwang, 1989; Hirsh and Nation,
1992; Sutarsyah, Nation and Kennedy, 1994) the GSL has provided
coverage of 78% to 92% of various kinds of written text, averaging
around 82% coverage.

Engels (1968) criticized the low coverage of the words not in the
first 1000 words of the list. He found that whereas the first 1000
words covered 73.1% of the running words in the ten one thousand word
texts he looked at, the words in the GSL outside the first 1000
covered only 7.7% of the running words. Other researchers have found a
similar contrast.


Table 3: Coverage of first and second 1000 words of the GSL

Researchers and texts                      1st 1000    2nd 1000    Total
Sutarsyah (1993), academic texts           74.1%       4.3%        78.4%
Sutarsyah (1993), a long economics text    77.7%       4.8%        82.5%
Hwang (1989), a range of texts             77.2%       4.9%        82.1%
Hirsh (1992), short novels                 84.8%       5.8%        90.6%



What is also interesting is the number of different words (word types)
from the second 1000 that actually occurred in a mixture of different
kinds of texts compared with more homogeneous texts. In any one text,
such as a novel or a textbook, around 400 to 550 of the second 1000
words from the GSL actually occurred. When a mixture of texts was
looked at however around 700 to 800 of the second 1000 words occurred
(Hirsh and Nation, 1992; Sutarsyah, Nation and Kennedy, 1994).
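
Both kinds of figure discussed here, the share of running words covered by a list and the number of different words from the list that actually occur, can be obtained in one pass over a text. The sketch below is an illustration only: the function name is hypothetical, and the two tiny word sets merely stand in for the first and second 1000 words of the GSL.

```python
def list_profile(tokens, word_lists):
    """For each named list return (share of running words covered, list words that occur)."""
    total = len(tokens)
    if total == 0:
        raise ValueError("text is empty")
    profile = {}
    for name, words in word_lists.items():
        members = set(words)
        hits = [t for t in tokens if t in members]
        # len(hits) counts running words (tokens); len(set(hits)) counts word types.
        profile[name] = (len(hits) / total, len(set(hits)))
    return profile


text = "the price of the goods rose because the demand for the goods rose".split()
lists = {
    "first": {"the", "of", "for", "because"},                   # toy stand-in for the first 1000
    "second": {"price", "goods", "demand", "rose", "supply"},   # toy stand-in for the second 1000
}
for name, (share, types) in list_profile(text, lists).items():
    print(f"{name}: {share:.1%} of running words, {types} list words occurred")
```

Run over a single novel and then over a mixture of texts, a profile like this would show the contrast described above: coverage by the second 1000 stays low, but the number of its words that occur rises as the texts become more varied.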

The second 1000 words behave in this way because they are lower
frequency words than the first 1000 words and have a narrower range of
occurrence. That is their occurrence is more closely related to the
topic or subject area of a text than the wide ranging more general
purpose words in the first 1000. But given a range of topics and
genres, and enough texts, the second 1000 words are more generally
useful than other lists of words.

After the 2000 high frequency words of the GSL, what vocabulary does a
second language learner need? The answer to this question depends on
what the language learner intends to use English for. If the learner
has no special academic purpose then the learner should work on the
strategies for dealing with low frequency words. If however the
learner intends to go on to academic study in upper high school or at
university, then there is a clear need for general academic
vocabulary. This can be found in the 836 word list called the
University Word List (UWL) (Xue and Nation, 1984; Nation, 1990).

The UWL consists of words that are not in the first 2000 words of the
GSL but which are frequent and of wide range in academic texts. Wide
range means that the words occur not just in one or two disciplines
like economics or mathematics, but occur across a wide range of
disciplines. The word frustrate for example which is in the UWL can be
found in many different disciplines. The UWL is really a compilation
from four separate studies, Lynn (1973), Ghadessy (1979), Campion and
Elley (1971), and Praninskas (1972). Here are some items from it.

accompany formulate index major objective
biology genuine indicate maintain occur
comply hemisphere individual maximum passive
deficient homogeneous job modify persist
edit identify labour negative quote
feasible ignore locate notion random
(Nation, 1990)

The value of the UWL can be seen when we look at the coverage of
academic text that it provides.


Table 4: Coverage by first 2000 of the GSL and the University Word
List

Researchers and texts                  1st 2000    UWL     Total
Hwang (1989), academic texts           78.1%       8.5%    86.6%
Sutarsyah (1993), an economics text    82.5%       8.7%    91.2%




Table 4 shows that for academic text, knowing the UWL makes the
difference between approximately 80% coverage of a text (1 unknown
word in every 5 words) and 90% coverage (1 unknown word in every 10
words).

Table 5 derived from Hwang (1989) shows the somewhat specialized
nature of the UWL.


Table 5: Coverage by UWL of a range of texts

Source                    1st 2000 (GSL)    UWL     Total
Academic                  78.1%             8.5%    86.6%
Newspapers                80.3%             3.9%    84.2%
Popular magazines etc.    82.9%             4.0%    86.9%
Fiction                   87.4%             1.7%    89.1%



Note the low coverage the UWL has of fiction. Newspapers and magazines,
which are more formal, make use of more of the UWL. Very formal
academic text makes the greatest use of the UWL. The UWL is thus a
word list for learners with specific purposes, namely academic reading.
The purpose behind the setting up of the UWL is to create a list of
high frequency words for learners with academic purposes, so that
these words can be taught and directly studied in the same way as the
words from the GSL can.

Word frequency lists

The major theme of this paper has been that we need to have clear
sensible goals for vocabulary learning. Frequency information provides
a rational basis for making sure that learners get the best return for
their vocabulary learning effort. Vocabulary frequency lists which
take account of range have an important role to play in curriculum
design and in setting learning goals.

This does not necessarily mean that learners must be provided with
large vocabulary lists as the major source of their vocabulary
learning. It does mean however that course designers should have lists
to refer to when they consider the vocabulary component of a language
course, and teachers need to have reference lists to judge whether a
particular word deserves attention or not, and whether a text is
suitable for a class.

The availability of powerful computers and very large corpora now makes
the development of such lists a much easier job than it was when
Thorndike and Lorge (1944) and their colleagues manually counted
18,000,000 running words. The making of a frequency list however is
not simply a mechanical task, and judgements based on well established
criteria need to be made. The following list suggests several of the
factors that would need to be considered in the development of a
resource list of high frequency words.

1. Representativeness. The corpora that the list is based on should
adequately represent the wide range of uses of language. In the past,
most word lists have been based on written corpora. There needs to be
a substantial spoken corpus involved in the development of a general
service list. The spoken and written corpora used should also cover a
range of representative text types. Biber's (1990) corpus studies have
shown how particular language features cluster in particular text
types. The corpora used should contain a wide range of useful types so
that the biases of a particular text type do not unduly influence the
resulting list.

2. Frequency and range. Most frequency studies have given recognition to
the importance of range of occurrence. A word should not become part
of a general service list because it occurs frequently. It should
occur frequently across a wide range of texts. This does not mean that
its frequency has to be roughly the same across the different texts,
but means that it should occur in some form or other in most of the
different texts or groupings of texts.

3. Word families. The development of a general service list needs to make
use of a sensible set of criteria regarding what forms and uses are
counted as being members of the same family. Should governor be
counted as part of the word family represented by govern? When making
this decision, the purposes of the list and the learners for which it
is intended need to be considered. As well as basing the decision on
features such as regularity, productivity, and frequency (Bauer and
Nation, 1993), the likelihood of learners seeing these relationships
needs to be considered (Nagy and Anderson, 1984).

4. Idioms and set expressions. Some items larger than a word behave like
high frequency words. That is, they occur frequently as a unit (Good
morning, Never mind), and their meaning is not clear from the meaning
of the parts (at once, set out). If the frequency of such items is
high enough to get them into a general service list in direct
competition with single words, then perhaps they should be there.
Certainly the arguments for idioms are strong, whereas set expressions
could be included under one of their constituent words (but see Nagy,
this volume).

5. Range of information. To be of full use in course design, a list of
high frequency words would need to include the following information
for each word - the forms and parts of speech included in a word
family, frequency, the underlying meaning of the word, variations of
meaning and collocations and the relative frequency of these meanings
and uses, and restrictions on the use of the word with regard to
politeness, geographical distribution etc. Some dictionaries, notably
the revised edition of the COBUILD dictionary, include much of this
information, but still do not go far enough. This variety of
information needs to be set out in a way that is readily accessible to
teachers and learners.

6. Other criteria. West (1953: ix) found that frequency and range alone
were not sufficient criteria for deciding what goes into a word list
designed for teaching purposes. West made use of ease or difficulty of
learning (it is easier to learn another related meaning for a known
word than to learn another word), necessity (words that express ideas
that cannot be expressed through other words), cover (it is not
efficient to be able to express the same idea in different ways; it is
more efficient to learn a word that covers a quite different idea),
stylistic level and emotional words (West saw second language learners
as initially needing neutral vocabulary). One of the many interesting
findings of the COBUILD project was that different forms of a word
often behave in different ways, taking their own set of collocates and
expressing different shades of meaning (Sinclair, 1991). Careful
consideration would need to be given to these and other criteria in
the final stages of making a general service list.


With a continuing emphasis on communication in language teaching there
is a tendency to give less attention to the selection and checking of
language forms in course design. Now that the benefits of form-focused
instruction are being positively reassessed, we may see a change in
attitude towards vocabulary lists and frequency studies. Attention to
principles of selection and gradation in teaching, however, remains
important no matter what approach to teaching is being used. A goal of
this review of the findings of research on
vocabulary size and frequency is to show that this information can
result in considerable benefits for both teachers and learners.

References

Bauer, L. and I.S.P. Nation. 1993. Word families. International
Journal of Lexicography 6, 4: 253-279.

Biber, D. 1990. A typology of English texts. Linguistics 27: 3-43.

Campion, M.E. and W.B. Elley. 1971. An Academic Vocabulary List.
Wellington: NZCER.

Carroll, J.B., P. Davies and B. Richman. 1971. The American Heritage
Word Frequency Book. New York: American Heritage Publishing Co.

Carter, R. and M. McCarthy (eds.) 1988. Vocabulary and Language
Teaching. London: Longman.

D'Anna, C.A., E.B. Zechmeister and J.W. Hall. 1991. Toward a
meaningful definition of vocabulary size. Journal of Reading Behavior
23: 109-122.

DeRocher, J.E. 1973. The Counting of Words: A Review of the History,
Techniques and Theory of Word Counts with Annotated Bibliography. New
York: Syracuse University Research Corp.

Dupuy, H.J. 1974. The Rationale, Development and Standardization of a
Basic Word Vocabulary Test. Washington, D.C.: U.S. Government Printing
Office.

Ellis, R. 1990. Instructed Second Language Acquisition. London:
Blackwell.

Engels, L.K. 1968. The fallacy of word counts. IRAL 6: 213-231.

Fox, J. and J. Mahood. 1982. Lexicons and the ELT materials writer.
English Language Teaching Journal 36, 2: 125-129.

Francis, W.N. and H. Kucera. 1982. Frequency Analysis of English
Usage. Boston: Houghton Mifflin Company.

Fries, C.C. and A.A. Traver. 1960. English Word Lists. Ann Arbor:
George Wahr.

Ghadessy, M. 1979. Frequency counts, word lists, and materials
preparation: a new approach. English Teaching Forum 17, 1: 24-27.

Goulden, R., P. Nation and J. Read. 1990. How large can a receptive
vocabulary be? Applied Linguistics 11: 341-363.

Hazenberg, S. and J. Hulstijn. 1996. Defining a minimal receptive
second-language vocabulary for non-native university students: An
empirical investigation. Applied Linguistics 17, 1: in press.

Hirsh, D. 1992. The vocabulary demands and vocabulary learning
opportunities in short novels. Unpublished MA thesis, Victoria
University of Wellington, New Zealand.

Hirsh, D. and P. Nation. 1992. What vocabulary size is needed to read
unsimplified texts for pleasure? Reading in a Foreign Language 8, 2:
689-696.

Hwang, K. 1989. Reading newspapers for the improvement of vocabulary
and reading skills. Unpublished MA thesis, Victoria University of
Wellington, New Zealand.

Hwang, K. and P. Nation. 1989. Reducing the vocabulary load and
encouraging vocabulary learning through reading newspapers. Reading in
a Foreign Language 6, 1: 323-35.

Hwang, K. and I.S.P. Nation. 1995. Where would general service
vocabulary stop and special purposes vocabulary begin? System 23, 1:
35-41.

Jamieson, P. 1976. The acquisition of English as a second language by
young Tokelau children living in New Zealand. Unpublished Ph.D.
thesis, Victoria University of Wellington.

Joe, A., P. Nation, and J. Newton. 1996. Speaking activities and
vocabulary learning. English Teaching Forum 34, 1: in press.

Judd, E.L. 1978. Vocabulary teaching and TESOL: a need for
re-evaluation of existing assumptions. TESOL Quarterly 12, 1: 71-76.

Kucera, H. 1982. The mathematics of language. In The American Heritage
Dictionary. Boston: Houghton Mifflin. 2nd ed.

Laufer, B. 1989. What percentage of text-lexis is essential for
comprehension? In C. Lauren and M. Nordman (eds.), Special Language:
From Humans Thinking to Thinking Machines. Clevedon: Multilingual
Matters.

Liu Na and I.S.P. Nation. 1985. Factors affecting guessing vocabulary
in context. RELC Journal 16, 1: 33-42.

Long, M. 1988. Instructed interlanguage development. In L. Beebe (ed.)
Issues in Second Language Acquisition. New York: Newbury House.

Lorge, I. and J. Chall. 1963. Estimating the size of vocabularies of
children and adults: an analysis of methodological issues. Journal of
Experimental Education 32, 2: 147-157.

Lynn, R.E. 1973. Preparing word lists: a suggested method. RELC
Journal 4, 1: 25-32.

McKeown, M.G. and M.E. Curtis (eds.) 1987. The Nature of Vocabulary
Acquisition. Hillsdale, N.J.: Erlbaum.

Meara, P. and G. Jones. 1990. The Eurocentres Vocabulary Size Tests.
10KA. Zurich: Eurocentres.

McIntosh, X., M. Halliday and P. Strevens. 1961.

Milton, J. and P. M. Meara. 1995. How periods abroad affect vocabulary
growth in a foreign language. ITL 107-108: 17-34.

Nagy, W.E. and R.C. Anderson. 1984. How many words are there in printed
school English? Reading Research Quarterly 19: 304-330.

Nagy, W.E., P. Herman, and R.C. Anderson. 1985. Learning words from
context. Reading Research Quarterly 20: 233-253.

Nation, I.S.P. 1982. Beginning to learn foreign vocabulary: a review
of the research. RELC Journal 13: 14-36.

Nation, I.S.P. 1990. Teaching and Learning Vocabulary. New York:
Newbury House.

Nation, I.S.P. 1993a. Using dictionaries to estimate vocabulary size:
essential, but rarely followed, procedures. Language Testing 10, 1:
27-40.

Nation, I.S.P. 1993b. Vocabulary size, growth and use. In The
Bilingual Lexicon. ed. R. Schreuder and B. Weltens, Amsterdam/
Philadelphia: John Benjamins. pp. 115-134.

Nation, I. S. P. forthcoming. Teaching Listening and Speaking.

Paivio, A. and A. Desrochers. 1981. Mnemonic techniques in second
language learning. Journal of Educational Psychology 73, 6: 780-795.

Praninskas, J. 1972. American University Word List. London: Longman.

Pressley, M., J.R. Levin and H. Delaney. 1982. The mnemonic keyword
method. Review of Educational Research 52: 61-91.

Richards, J.C. 1974. Word lists: problems and prospects. RELC Journal
5: 69-84.

Rosenzweig, M.R. and D. McNeill. 1962. Inaccuracies in the semantic
count of Lorge and Thorndike. American Journal of Psychology 75:
316-319.

Schmitt, N. and D. Schmitt. 1995. Vocabulary notebooks: theoretical
underpinnings and practical suggestions. English Language Teaching
Journal 49, 2: 133-143.

Schonell, F.J., I.G. Meddleton and B.A. Shaw. 1956. A Study of the
Oral Vocabulary of Adults. Brisbane: University of Queensland Press.

Seashore, R.H. and L.D. Eckerson. 1940. The measurement of individual
differences in general English vocabularies. Journal of Educational
Psychology 31: 14-38.

Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford
University Press.

Sternberg, R.J. 1987. Most vocabulary is learned from context. In
McKeown and Curtis, 89-105.

Sutarsyah, C. 1993. The Vocabulary of Economics and Academic English.
Unpublished MA thesis, Victoria University of Wellington, New Zealand.

Sutarsyah, C., I.S.P. Nation and G. Kennedy. 1994. How useful is EAP
vocabulary for ESP? A corpus based case study. RELC Journal 25, 2:
34-50.

Thorndike, E.L. and I. Lorge. 1944. The Teacher's Word Book of 30,000
Words. Teachers College, Columbia University.

Thorndike, E.L. 1924. The vocabularies of school pupils. In J.
Carleton Bell (ed.) Contributions to Education. New York: World Book
Co.

Webster's Third New International Dictionary. 1963. Massachusetts: G.
& C. Merriam Co.

West, Michael. 1953. A General Service List of English Words. London:
Longman, Green & Co.

Xue Guoyi and I.S.P. Nation. 1984. A university word list. Language
Learning and Communication 3: 215-229.





Contact Info:
Rob Waring
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341
Email:Rob Waring
