Minimum corpus size for Keyness analysis


dnam

May 17, 2012, 5:50:22 AM
to WordSmith Tools
Hello,

I have three questions about keyness analysis.

First, is there a minimum study corpus size for keyness analysis?
I am analyzing 10 student writing samples (3,972 running words)
against LOCNESS (324,214 running words), and KeyWords generated 42
keywords. I am wondering whether these keywords might be distorted
because my study corpus is so small. If so, where would the problem
lie?

Second, what factors might affect the number of keywords in a given
set of texts? In the same study, after an intervention of corpus-based
language instruction, the students wrote on the same topic again (a
total of 3,316 running words), and KeyWords generated 23 keywords. I am
trying to identify the factors that reduced the number of keywords in
writing on the same topic by the same students. The only thing I can
think of now is the instruction.

My last question is: what does the order of keywords (by keyness)
necessarily mean? As we see in the keywords output window, there are
p-values for each keyword. I don't think it's appropriate to say that
a keyword with high keyness is more important(?) than keywords with
lower keyness when they all fall below a certain p-value threshold,
let's say 1E-6.

I would appreciate any comments and/or feedback.

Thanks in advance,

Daehyeon

Mike Scott

May 17, 2012, 6:17:29 AM
to WordSmith Tools
Daehyeon, hi

Good questions.

> First, is there a minimum study corpus size for the keyness analysis?
The *number* of KWs is largely determined by your p-value setting. A
small sample size (and I agree yours is very small) means the
*quality* of the KWs is not as trustworthy as it would be if you had
100 student samples instead of 10. The small size of LOCNESS doesn't
help either: here it is supposed to represent academic writing of a
certain kind, but it contains only about a third of a million words,
so word frequencies cannot be estimated very reliably. And each of
your students wrote only about 400 words, so there isn't much to go
on (remember that this is only software, not human reading!).
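To make the sample-size point concrete, here is a minimal sketch of log-likelihood (G2) keyness for a single word, using the common two-corpus formulation with expected frequencies. Whether WordSmith computes its keyness exactly this way is an assumption here, and the function name and word frequencies are hypothetical. The same relative frequencies (5 vs. 1 per 1,000 words) give a much larger statistic when the study corpus is ten times bigger, i.e. the same pattern is backed by more evidence.

```python
import math

LOCNESS_TOKENS = 324_214  # reference corpus size quoted in the thread


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness of one word (hypothetical helper).

    a: frequency in the study corpus,     c: study corpus size in tokens
    b: frequency in the reference corpus, d: reference corpus size in tokens
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # treat 0 * log(0) as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical word: 1 per 1,000 tokens in LOCNESS ...
    b = round(LOCNESS_TOKENS / 1000)
    # ... and 5 per 1,000 tokens in the student texts.
    for c in (3_972, 39_720):  # 10 samples (from the thread) vs. a hypothetical 100
        a = round(5 * c / 1000)
        print(c, a, round(log_likelihood(a, b, c, LOCNESS_TOKENS), 1))
```

The 3,972-word size comes from the thread; 39,720 is simply a hypothetical ten-fold scale-up, echoing the "100 student samples instead of 10" above. The resulting statistic would then be compared with a chi-square critical value for whatever p-value is set.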

> Second, ... the students
> wrote about the same writing topic (a total of 3,316 running words),
> and KeyWords generated 23 keywords.

YES! The KW procedure tells us a lot about what texts are *about*. If
all your students wrote about the same topic, there would be fewer KWs
than if they had written about varied topics.

> My last question is: what does the order of keywords (by keyness)
> necessarily mean?
http://lexically.net/wordsmith/version6/faqs/answers.htm#different_keynesses
gives an answer to that, I think.
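As a rough illustration of the p-value side of the question, the sketch below assumes (a common convention, not confirmed in this thread for WordSmith) that the keyness statistic is referred to a chi-square distribution with 1 degree of freedom. Under that assumption the p-value is a strictly decreasing function of keyness, so ranking by keyness and ranking by p-value give the same order, and a 1E-6 threshold corresponds to a keyness of roughly 23.93. The function names are hypothetical.

```python
import math


def p_value_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so the standard library suffices.
    """
    return math.erfc(math.sqrt(stat / 2.0))


def critical_keyness(p: float) -> float:
    """Smallest statistic whose 1-df p-value is at most p (found by bisection)."""
    lo, hi = 0.0, 200.0  # p_value_1df(200) is far below any usual threshold
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if p_value_1df(mid) > p:
            lo = mid  # not significant yet: the cut-off lies higher
        else:
            hi = mid  # significant: the cut-off is at or below mid
    return hi


if __name__ == "__main__":
    print(f"keyness needed for p < 1E-6: {critical_keyness(1e-6):.2f}")
    # Higher keyness always gives a smaller p-value, so both orderings agree.
    for stat in (30.0, 60.0, 120.0):
        print(stat, p_value_1df(stat))
```

Under this assumption, the order among keywords that all pass the cut-off reflects the strength of the statistical evidence for a frequency difference, which is not the same thing as their importance for interpreting the texts.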

Best -- Mike

dnam

May 17, 2012, 8:35:50 AM
to WordSmith Tools
Thanks Mike! Your detailed explanation helps me improve my research
design and interpretation of keyness and keywords.

Best,
Daehyeon