comparing two corpora

Angelos Sirigos

Jun 30, 2008, 7:46:05 AM
to WordSmith Tools
Dear Mike,

I am trying to compare two corpora.

My main question is:

Would you suggest that all the words that appear (p value < 0.00001)
and have a positive keyness are key? The alternative would be to
choose only a fraction of these words (let's say those with
keyness > 150). In other words, is there a threshold for the degree of
keyness?

Also, do I take in the results an overall similarity test for the two
corpora? (For example, overall keyness and p-value)

Thanks

Angelos

mi...@lexically.net

Jul 17, 2008, 12:51:46 PM
to WordSmith Tools
Angelos, hi

Sorry for the delay.

On Jun 30, 12:46 pm, Angelos Sirigos <siri...@gmail.com> wrote:
> Dear Mike,
>
> I am trying to compare two corpora.
>
> My main question is:
>
> Would you suggest that all the words that appear (p value < 0.00001)
> and have a positive keyness are key? The alternative would be to
> choose only a fraction of these words (let's say those with
> keyness > 150). In other words, is there a threshold for the degree of
> keyness?
>

Yes, the threshold is the one you have set by choosing the p value -- so,
no, you would not normally select only items with a keyness > 150. The
reason is that the keyness score is affected by the frequency of the
item in the reference corpus. In human terms, you may well not consider
the top items any more key than some others in the list, because you
will be applying your own criteria as opposed to WS's mechanical
ones.
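
To illustrate why a fixed keyness cut-off is a poor filter, here is a
minimal sketch in Python. It is not WordSmith's own code: it assumes the
two-term log-likelihood form commonly used in corpus linguistics, with
the p value read from a chi-square distribution with 1 degree of
freedom. The function names and all the counts are made up for the
example.

```python
import math


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Signed two-term log-likelihood for one word (illustrative, not WordSmith's code)."""
    total = freq_study + freq_ref
    combined = size_study + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    # A negative sign marks words that are relatively rarer in the study corpus.
    if freq_study / size_study < freq_ref / size_ref:
        return -g2
    return g2


def p_value(score):
    """Chi-square survival function with 1 degree of freedom."""
    return math.erfc(math.sqrt(abs(score) / 2))


# Hypothetical counts: same word, same study corpus, two reference frequencies.
rare_in_ref = keyness(100, 100_000, 100, 1_000_000)
common_in_ref = keyness(100, 100_000, 500, 1_000_000)

for label, score in (("rare in reference", rare_in_ref),
                     ("commoner in reference", common_in_ref)):
    print(f"{label}: keyness={score:.1f}, key at p < 0.00001: {p_value(score) < 0.00001}")
```

With these invented counts the word is key at p < 0.00001 in both
cases, but only the first score exceeds 150. The only thing that changed
is the word's frequency in the reference corpus, so a keyness > 150
filter would drop an item that the chosen p value already accepts.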

> Also, do I take in the results an overall similarity test for the two
> corpora? (For example, overall keyness and p-value)
Sorry, I don't understand that question... Would you like to rephrase it?

Cheers -- Mike