Co-occurrence frequencies

Timothy Flynn

Nov 6, 2010, 2:07:53 AM
to MetaOptimize Challenge [discuss]
It seems to me that simply using co-occurrence frequencies should work
for this problem. I've posted a very simple Python implementation on
GitHub: https://github.com/tgflynn/NLP-Challenge and a brief
discussion on my blog: http://cogniception.com/wp/.

Running this on the posted dataset took about 20 minutes on my
machine. The results look reasonable. For example, here is the output
line for the word "hospital":

hospital trust nhs royal general university london hospital community eye medical
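
For anyone who wants to see the general idea without opening the repository, here is a minimal sketch of window-based co-occurrence counting. It is not the code from the GitHub link above: the function names, the window size of 5, and the assumption that the input is already split into lists of tokens are all mine, chosen only for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=5):
    """Count how often each word appears near each other word.

    `sentences` is an iterable of token lists (assumed pre-tokenized).
    `window` is the number of neighbours considered on each side
    (an illustrative default, not a tuned value).
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the position of the word itself
                    counts[word][tokens[j]] += 1
    return counts


def top_related(counts, word, k=10):
    """Return the k words that co-occur most often with `word`."""
    return [w for w, _ in counts.get(word, Counter()).most_common(k)]


sentences = [
    ["hospital", "trust", "nhs"],
    ["hospital", "trust"],
]
counts = cooccurrence_counts(sentences)
related = top_related(counts, "hospital", k=1)
```

Raw counts like these tend to favour very frequent words, so a real run might normalize them in some way; the sketch leaves that out on purpose.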

One comment I have on the data is that it contains many non-ASCII
characters, and even many words that consist solely of non-ASCII
characters or digits. I haven't filtered these out, but I suspect
doing so would substantially reduce the size of the set and make the
output easier to read.
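
A filter along those lines could be sketched as follows. This is an assumption about what such a filter might look like, not something in the posted implementation: the helper names are hypothetical, and the rule used here (keep a token only if it contains at least one ASCII letter) also drops punctuation-only tokens, which goes slightly beyond what is described above.

```python
import string

# Set of ASCII letters a-z and A-Z, used for a fast membership test.
ASCII_LETTERS = set(string.ascii_letters)


def has_ascii_letter(token):
    """True if the token contains at least one ASCII letter."""
    return any(ch in ASCII_LETTERS for ch in token)


def filter_tokens(tokens):
    """Drop tokens made up only of non-ASCII characters, digits, or symbols."""
    return [t for t in tokens if has_ascii_letter(t)]


sample = ["hospital", "2010", "日本", "café", "nhs", "123abc"]
kept = filter_tokens(sample)
```

Tokens that mix ASCII letters with other characters, such as "café" or "123abc", are kept under this rule; a stricter filter would need a different test.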