Keyword computation: can the target corpus be larger than the reference corpus?

jeremy

Feb 20, 2012, 4:55:39 AM

to WordSmith Tools

Dear Mike,

This is Jeremy. I am working with a self-compiled business corpus and
the Brown corpus.

My MA thesis is almost done.

But I just found a problem in my keyword computation.

I have compared the 286 MB self-made corpus (174 files) to the 80 MB
Brown corpus to compute keywords with log likelihood.

I know that the reference corpus should generally be larger than the
corpus of interest.

Is my comparison still methodologically acceptable?

(panicking a lot... @@")

Best,

Jeremy

Mike

Feb 20, 2012, 7:14:24 AM

to WordSmith Tools

Dear Jeremy

If you mean you simply compared one single word-list based on 286MB of
text with a much smaller reference corpus word-list based on 80MB of
text then that is indeed strange -- I am not sure how you could defend
it. If on the other hand you mean you compared 174 text files, one
after the other, and each with the Brown corpus which I assume is much
bigger than any one of the 174, what you have done is (in this regard)
OK.

Best -- Mike
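
For readers following the thread, here is a minimal sketch of how a log-likelihood keyness score depends on the two corpus sizes. It uses the common two-term formulation (observed versus expected frequency in each corpus); WordSmith's own implementation may differ in detail, and the function name, parameter names, and counts below are hypothetical, chosen only for illustration.

```python
import math

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-term log-likelihood keyness for one word (illustrative sketch).

    freq_study / freq_ref: occurrences of the word in each corpus.
    size_study / size_ref: total running words (tokens) in each corpus.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero observed frequency contributes nothing (limit of x * ln x is 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll

# Hypothetical counts: a word seen 30 times in a 10,000-token text
# and 40 times in a 1,000,000-token reference corpus.
score = log_likelihood(30, 40, 10_000, 1_000_000)
```

The corpus sizes enter only through the expected frequencies, so the statistic can be computed whichever corpus is larger; whether the comparison is defensible is the methodological question discussed above, not something the formula decides.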

jeremy

Feb 20, 2012, 7:47:40 AM

to WordSmith Tools

Dear Mike,

Yes. I think I did it right.
When I first created the word lists, I mistakenly made a single list
based on all the files.

Later it occurred to me (thank God!) that I should "make a batch" of
word lists from all the files and save them as "one file per folder,
individual results". I followed this procedure and used the resulting
word lists for the keyword computation.

Thank you very much. I will address this point carefully when writing
up my research design.

Best,

Jeremy
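
As an illustration of the distinction discussed in this thread, the sketch below contrasts one word list per text file (the "batch" approach) with a single word list pooled over all files. It is not WordSmith's procedure, only a rough Python analogue; the function names, the simple tokenizer, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

def word_list(text):
    """Frequency list for one text, using a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def batch_word_lists(texts_by_file):
    """One word list per file: each can be compared with the reference separately."""
    return {name: word_list(text) for name, text in texts_by_file.items()}

def combined_word_list(texts_by_file):
    """A single pooled word list over all files (the approach questioned above)."""
    combined = Counter()
    for text in texts_by_file.values():
        combined.update(word_list(text))
    return combined

# Hypothetical miniature "corpus" of two files.
texts = {
    "a.txt": "Profit rose. Profit fell.",
    "b.txt": "The market rose.",
}
per_file = batch_word_lists(texts)   # two separate word lists
pooled = combined_word_list(texts)   # one word list for everything
```

With per-file lists, each text is small relative to the reference corpus, which is the situation described as acceptable in the reply above.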
