We've uploaded the test script to the task's website:
http://www.cs.york.ac.uk/semeval-2012/task1/data/uploads/test-script.zip
There's a readme file with a brief explanation of our main
evaluation metric. Let us know if you have any questions about it.
Best,
Lucia, Sujay and Rada
Well spotted. After experimenting with a few evaluation strategies, we
found that pairwise ranking is the best option, so there will be
no 'best substitution'. I'll change the text on the website.
Best,
Lucia