Collexeme Analysis and p-values



Apr 16, 2017, 8:21:13 PM
to CorpLing with R

I have been using the R script for a collexeme analysis of nouns and various particles (e.g., that, of, to). A reviewer of a paper I'm working on has asked me to report exact p-values whenever I make a statement about a word's association strength with a pattern. Previously I had simply stated which of the three "significance bands" the association strength falls into (i.e., coll.strength>3 => p<0.001; coll.strength>2 => p<0.01; coll.strength>1.30103 => p<0.05). While my main point in the project is to show how many items fall within each of the three significance levels, I do want to satisfy this reviewer.

My question is: how might I find a precise p-value, if that is possible? As an example: suppose "possibility" and "of" turn out to be attracted in a small corpus, with a log-transformed FYE score of 115; because this is greater than 3, p<.001. Maybe I could "reverse" the Fisher-Yates log-transformation somehow, or have the script also report actual p-values?

Hope this explanation makes sense, and thanks for the very useful collexeme script! 


Stefan Th. Gries

Apr 17, 2017, 2:49:28 AM
to CorpLing with R
If you have a positive collexeme strength value of, say, 144.936653,
for a word that is attracted to something else, just raise 10 to the
power of the negative collexeme value:

> 10^-144.936653
[1] 1.157036e-145 # p-value

If you have a negative collexeme strength value of, say, -8.553066,
for a word that is repelled by something else, just raise 10 to the
power of the collexeme value itself:

> 10^-8.553066
[1] 2.798556e-09 # p-value
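
Both cases can be wrapped in one small function. What follows is only a
sketch: coll_to_p is a hypothetical helper name, not part of the collexeme
script, and it assumes the strength is a log10-transformed p-value whose
sign only marks attraction (+) or repulsion (-).

```r
# Hypothetical helper, not part of the collexeme script.
# Assumption: coll.strength = -log10(p) for attracted items and
# +log10(p) for repelled items, so the absolute value undoes the sign.
coll_to_p <- function(coll.strength) {
  10^(-abs(coll.strength))
}

# The two values from above, plus the three significance thresholds
strengths <- c(144.936653, -8.553066, 1.30103, 2, 3)
data.frame(coll.strength = strengths, p.value = coll_to_p(strengths))
```

Since the function is vectorized, it can be applied to the whole
coll.strength column of the output at once. Very large strengths (beyond
roughly 308) would underflow to 0 in double precision, so for those it is
probably safer to report the log-transformed value itself.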

Stefan Th. Gries
Univ. of California, Santa Barbara