Angelos, hi
Sorry for the delay.
On Jun 30, 12:46 pm, Angelos Sirigos <siri...@gmail.com> wrote:
> Dear Mike,
>
> I am trying to compare two corpora.
>
> My main question is:
>
> Would you suggest that all the words that appear (p value < 0.00001)
> and have a positive keyness are key? The alternative would be to
> choose only a fraction of these words (let's say those with
> keyness > 150). In other words, is there a threshold for the degree of
> keyness?
>
Yes, there is a threshold: it is the one you have set by choosing the
p value. So, no, you would not normally select only items with a
keyness > 150. The reason is that the keyness score is affected by the
frequency of the item in the reference corpus, and in human terms you
may well not consider the top items any more key than some others in
the list, because you will apply your own human criteria as opposed to
WS's mechanical criteria.
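
To illustrate how the reference-corpus frequency moves the score, here is a minimal sketch. It assumes keyness is computed with the common two-term log-likelihood formula and tested against chi-square with 1 degree of freedom; WS's exact computation may differ. The function name `keyness`, the corpus sizes and all the frequencies are made up for illustration.

```python
import math


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Return (signed log-likelihood keyness, p value with 1 d.f.).

    Assumption: two-term log-likelihood, not necessarily what WS uses.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    exp_study = size_study * total / (size_study + size_ref)
    exp_ref = size_ref * total / (size_study + size_ref)

    def term(obs, exp):
        # By convention 0 * log(0) counts as 0.
        return obs * math.log(obs / exp) if obs > 0 else 0.0

    ll = 2.0 * (term(freq_study, exp_study) + term(freq_ref, exp_ref))
    ll = max(ll, 0.0)  # guard against a tiny negative value from rounding
    # Upper tail of chi-square with 1 d.f.: P(X > ll) = erfc(sqrt(ll / 2)).
    p = math.erfc(math.sqrt(ll / 2.0))
    # Positive keyness: relatively more frequent in the study corpus.
    sign = 1.0 if freq_study / size_study >= freq_ref / size_ref else -1.0
    return sign * ll, p


# Hypothetical corpora: 100,000 words (study) and 1,000,000 words (reference).
# Both words occur 50 times in the study corpus; only the reference
# frequency differs (10 versus 200).
rare_in_ref, p_rare = keyness(50, 100_000, 10, 1_000_000)
common_in_ref, p_common = keyness(50, 100_000, 200, 1_000_000)

for label, score, p in [("rare in reference", rare_in_ref, p_rare),
                        ("common in reference", common_in_ref, p_common)]:
    print(f"{label}: keyness={score:.1f}, p={p:.2e}, key={p < 0.00001}")
```

Under these assumptions both words pass the p < 0.00001 threshold, yet only the first scores above 150, so a cut-off at 150 would drop a word that is just as frequent in the study corpus.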
> Also, do I take in the results an overall similarity test for the two
> corpora? (For example, Overall keyness and p- value)
Sorry, I don't understand that question... Would you like to rephrase it?
Cheers -- Mike