Collocation 'Stat' Functionality

Matthew H

Mar 5, 2014, 6:24:53 PM
to ant...@googlegroups.com
Hello. I had a question about the Collocation tool. Specifically, I am wondering about the "Stat" column, in which collocate words' relation to the search word is displayed. My question is simply: how is the strength of this relationship determined? I am confused by the fact that, for example, two collocate words may both have a raw frequency of 1, but one of the words may have a larger/smaller stat than the other. Why is this?

Thanks,
Matt

Warren Tang

Mar 5, 2014, 8:03:45 PM
to ant...@googlegroups.com
Hello Matthew,
The two corpora you are using differ in size. The stat takes this into account when comparing them.

Hope that helps. 


Warren 
--
You received this message because you are subscribed to the Google Groups "AntConc-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To post to this group, send email to ant...@googlegroups.com.
Visit this group at http://groups.google.com/group/antconc.
For more options, visit https://groups.google.com/groups/opt_out.

Matthew H

Mar 12, 2014, 11:48:46 PM
to ant...@googlegroups.com
Hey Warren, thanks for the response. However, I'm only using one text file within AntConc. So, I upload file X, and then search for a word's collocates within X using the collocates tool. Then, the Stat column will return numbers, but I'm not quite sure how to interpret these numbers.

Matt

Warren Tang

Mar 13, 2014, 12:45:30 AM
to ant...@googlegroups.com
Hi Matthew,
Sorry, I read the email too quickly and thought you were talking about comparing files.

AntConc uses Mutual Information (MI) and T-score (T-test?) as described in Stubbs (1995). The basic gist of the stats (at least for MI) is that they take into account not only co-occurrence but also non-co-occurrence. So even if two collocates have the same collocational frequency, their stats will differ depending on their frequencies of non-co-occurrence.

Check the word list to see each word's actual frequency and cross-reference that with its collocational frequency. That should give you a good general indication of whether the statistic's outcome is intuitively correct.
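
To make that concrete, here is a minimal Python sketch of the two measures in the form usually attributed to Stubbs (1995). The function names and the toy frequencies are made up for illustration, and AntConc's own calculation may differ in detail (for example, in how it handles the span size), so treat this as a rough illustration rather than AntConc's actual code.

```python
import math


def mutual_information(f_node, f_collocate, f_joint, n_tokens):
    """MI = log2(observed / expected) co-occurrence frequency."""
    # Expected co-occurrences if the two words were independent.
    expected = f_node * f_collocate / n_tokens
    return math.log2(f_joint / expected)


def t_score(f_node, f_collocate, f_joint, n_tokens):
    """T = (observed - expected) / sqrt(observed)."""
    expected = f_node * f_collocate / n_tokens
    return (f_joint - expected) / math.sqrt(f_joint)


# Hypothetical corpus: 10,000 tokens, search word occurs 50 times.
N_TOKENS = 10_000
F_NODE = 50

# Two collocates that each co-occur with the search word exactly once,
# but have very different overall frequencies in the word list.
rare_mi = mutual_information(F_NODE, 2, 1, N_TOKENS)      # collocate occurs 2 times
common_mi = mutual_information(F_NODE, 500, 1, N_TOKENS)  # collocate occurs 500 times
rare_t = t_score(F_NODE, 2, 1, N_TOKENS)
common_t = t_score(F_NODE, 500, 1, N_TOKENS)

print(round(rare_mi, 2), round(common_mi, 2))  # 6.64 -1.32
print(round(rare_t, 2), round(common_t, 2))    # 0.99 -1.5
```

Both collocates have a collocational frequency of 1, but the rare word gets a much higher score because its single co-occurrence is far more than chance would predict, while the common word co-occurs less often than expected.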

Hope that helps.


Warren

Matthew H

Mar 13, 2014, 6:37:09 PM
to ant...@googlegroups.com
Great, that makes much more sense. Thanks a bunch!

Matt 