Task results and gold-standard annotation for test data


Lucia Specia

Apr 4, 2012, 1:21:14 PM
to semeval2012_lexi...@googlegroups.com
Dear all,

Please check the final results for all submissions on the task website: http://www.cs.york.ac.uk/semeval-2012/task1/

There you will find an Excel spreadsheet with the results (http://www.cs.york.ac.uk/semeval-2012/task1/data/uploads/results.xlsx), as well as the test set distribution with the gold-standard annotations (http://www.cs.york.ac.uk/semeval-2012/task1/data/uploads/test-data.zip). Please check whether the results for your system(s) are correct, and let us know if they are not.

Best,

Lucia, Sujay and Rada


Sujay Jauhar

Apr 4, 2012, 2:39:12 PM
to semeval2012_lexi...@googlegroups.com
Hi all,

Further to the previous email, I would like to inform you that two participants submitted system output that contained errors. If they run the scorer on that output against the gold standard, it will throw exceptions. The errors were:

User 'ajohannsen' submitted 2 systems, both of which have the same two errors:
- Incorrect input on line 411
   System provided following substitutes: ['17 ounces', 'pound', 'lb', 'sixteen ounces']
   Gold expected:  ['sixteen ounces', 'pound', 'lb', '16 ounces']
- Incorrect input on line 415
   System provided following substitutes: ['12.28 kilograms', 'pound', 'pn', 'kilo']
   Gold expected:  ['11.27 kilograms', 'pound', 'pn', 'kilo']

User 'm_amoia' submitted 1 system which had the following errors:
- Incorrect input on line 369
   System provided following substitutes: ['the service', 'mass', 'the Eucharist', 'the celebration of the Eucharist']
   Gold expected:  ['bulk', 'mass', 'weight', 'density']
- Incorrect input on line 1014
   System provided following substitutes: ['wide', 'extended', 'additional,expanded', 'broad']
   Gold expected:  ['wide', 'extended', 'expanded', 'additional', 'broad']
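
If you want to find this kind of mismatch in your own output before running the scorer, something like the following may help. It is only a minimal sketch: it assumes the substitutes for one line have already been parsed into Python lists, and the function name diff_substitutes is made up for illustration (it is not part of the official scorer).

```python
def diff_substitutes(system, gold):
    """Compare two substitute lists as sets, ignoring order.

    Returns (missing, unexpected): gold substitutes absent from the
    system output, and system substitutes absent from the gold.
    """
    system_set, gold_set = set(system), set(gold)
    missing = sorted(gold_set - system_set)
    unexpected = sorted(system_set - gold_set)
    return missing, unexpected


# Line 411 as reported above
system_411 = ['17 ounces', 'pound', 'lb', 'sixteen ounces']
gold_411 = ['sixteen ounces', 'pound', 'lb', '16 ounces']
print(diff_substitutes(system_411, gold_411))
# (['16 ounces'], ['17 ounces'])

# Line 1014 as reported above: two substitutes fused by a comma
system_1014 = ['wide', 'extended', 'additional,expanded', 'broad']
gold_1014 = ['wide', 'extended', 'expanded', 'additional', 'broad']
print(diff_substitutes(system_1014, gold_1014))
# (['additional', 'expanded'], ['additional,expanded'])
```

An empty pair of lists means the line carries exactly the substitutes the gold standard expects.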

In our evaluation, both errors for user 'ajohannsen' and the second error for 'm_amoia' were considered minor enough that we corrected them manually before running the scorer (in any case, this makes no practical difference to the final score). The error on line 369 for user 'm_amoia' was unfortunately more serious, so that item was simply skipped in the evaluation. When checking your system score, you may want to fix these faults before running the scorer.

Alternatively, you can run the attached version of the scorer, which runs quietly and simply skips over errors. Either way, as mentioned above, the system scores remain practically the same with or without the errors.
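
For those curious about the difference between the two behaviours, here is a rough illustration of strict versus quiet handling. It is a sketch under assumptions, not the code in the attached file: the function select_scorable and the (line number, system substitutes, gold substitutes) tuple layout are hypothetical, and the actual scoring step is left out.

```python
def select_scorable(items, quiet=False):
    """Split items into those safe to score and those with bad input.

    items: iterable of (line_no, system_substitutes, gold_substitutes).
    In strict mode (quiet=False) the first mismatch raises ValueError;
    in quiet mode mismatched lines are skipped and their numbers returned.
    """
    valid, skipped = [], []
    for line_no, system, gold in items:
        if set(system) == set(gold):
            valid.append((line_no, system, gold))
        elif quiet:
            skipped.append(line_no)
        else:
            raise ValueError(f"Incorrect input on line {line_no}")
    return valid, skipped


items = [
    # Line 369 as submitted: wrong sense of 'mass', cannot be repaired
    (369,
     ['the service', 'mass', 'the Eucharist', 'the celebration of the Eucharist'],
     ['bulk', 'mass', 'weight', 'density']),
    # Line 1014 after the manual correction described above
    (1014,
     ['wide', 'extended', 'expanded', 'additional', 'broad'],
     ['wide', 'extended', 'expanded', 'additional', 'broad']),
]

valid, skipped = select_scorable(items, quiet=True)
print(skipped)                      # [369]
print([line for line, _, _ in valid])  # [1014]
```

Calling select_scorable(items) without quiet=True stops at line 369 with a ValueError, which corresponds to the exception behaviour described earlier.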

Regards,
Sujay.
scorer-quiet.py