Scoring script


Timothy Flynn

Nov 14, 2010, 11:22:02 PM
to MetaOptimize Challenge [discuss]

I just checked in some additions to my github code for this challenge
(https://github.com/tgflynn/NLP-Challenge).

Of most interest, I think, is the scoring script. It uses WordNet's
synset similarity score (http://nltk.googlecode.com/svn/trunk/doc/
howto/wordnet.html) to score an output file in the defined format.

To run this script you need to have the Python NLTK toolkit installed.

Also, before you can run it you need to download the WordNet data
files, like this:

python
>>> import nltk
>>> nltk.download()
Downloader> d
Downloader> all
Downloader> q
>>> quit()

(takes a few minutes).

My hope is that the scoring script will provide a quick, objective
measure for comparing algorithms. My understanding is that the
WordNet similarity scores are based on proximity within a
human-maintained ontology.
Higher scores (closer to 1) are better. It seems like approaching
this metric would be a reasonable goal for a machine learning based
semantic similarity measure.

There's also some new functionality in process.py. It can now
filter the non-alpha words out of the dataset and create a new
vocabulary file (see the README for instructions).
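I haven't dug into process.py's internals, but the non-alpha filtering step amounts to something like the following (the function name here is my own, not from the repo):

```python
def filter_alpha(tokens):
    """Keep only tokens consisting entirely of alphabetic characters."""
    return [t for t in tokens if t.isalpha()]

# Tokens containing digits or punctuation are dropped.
print(filter_alpha(["the", "cat", "sat", "42", "on-the", "mat"]))
# ['the', 'cat', 'sat', 'mat']
```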

Tim