Further to the email below, I would like to inform you that two participants submitted systems whose output contained errors. Hence, if they run the scorer on their system output against the gold standard, it will throw exceptions. The errors were as follows:
User 'ajohannsen' submitted two systems, both of which have the following errors:
- Incorrect input on line 411
  System provided the following substitutes: ['17 ounces', 'pound', 'lb', 'sixteen ounces']
  Gold expected: ['sixteen ounces', 'pound', 'lb', '16 ounces']
- Incorrect input on line 415
  System provided the following substitutes: ['12.28 kilograms', 'pound', 'pn', 'kilo']
  Gold expected: ['11.27 kilograms', 'pound', 'pn', 'kilo']
User 'm_amoia' submitted one system, which had the following errors:
- Incorrect input on line 369
  System provided the following substitutes: ['the service', 'mass', 'the Eucharist', 'the celebration of the Eucharist']
  Gold expected: ['bulk', 'mass', 'weight', 'density']
- Incorrect input on line 1014
  System provided the following substitutes: ['wide', 'extended', 'additional,expanded', 'broad']
  Gold expected: ['wide', 'extended', 'expanded', 'additional', 'broad']
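All four errors share one pattern: the system's list does not contain exactly the same items as the gold list. A minimal Python sketch of a check that would flag them, assuming (as the examples suggest) that each system line must be a reordering of the gold candidate substitutes. The function name `check_line` is hypothetical and is not taken from the attached scorer.

```python
from collections import Counter
from typing import List, Optional


def check_line(line_no: int, system: List[str], gold: List[str]) -> Optional[str]:
    """Return an error message if `system` is not a reordering of `gold`, else None.

    Hypothetical check: assumes a valid line ranks exactly the gold candidate
    substitutes, so both lists must hold the same items (order may differ).
    """
    if Counter(system) == Counter(gold):
        return None
    unexpected = [s for s in system if s not in gold]
    missing = [s for s in gold if s not in system]
    return f"Incorrect input on line {line_no}: unexpected {unexpected}, missing {missing}"


# Line 1014 from the report: two candidates were merged into one string.
print(check_line(1014,
                 ['wide', 'extended', 'additional,expanded', 'broad'],
                 ['wide', 'extended', 'expanded', 'additional', 'broad']))
# → Incorrect input on line 1014: unexpected ['additional,expanded'], missing ['expanded', 'additional']
```

A reordered but otherwise identical list passes, which is what a ranking submission would normally look like.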
In our evaluation, both errors for user 'ajohannsen' and the second error for user 'm_amoia' were considered minor enough that we corrected them manually before running the scorer (in any case, this makes no practical difference to the final score). The error on line 369 for user 'm_amoia', unfortunately, was more serious, so that line was simply skipped in the evaluation. When checking your system's score, you may want to fix these errors before running the scorer.
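Of the errors listed, the comma-merged entry on line 1014 is the one that can be repaired mechanically; the altered quantities on lines 411 and 415 still need a manual edit. A minimal Python sketch, assuming that no genuine substitute ever contains a comma. The helper name `split_merged` is hypothetical and is not part of the scorer.

```python
from typing import List


def split_merged(substitutes: List[str]) -> List[str]:
    """Split entries that accidentally contain several comma-separated substitutes.

    Hypothetical helper: assumes a genuine substitute never contains a comma,
    so 'additional,expanded' is treated as two separate items.
    """
    fixed = []
    for item in substitutes:
        # Strip whitespace around each piece and drop empty pieces.
        fixed.extend(part.strip() for part in item.split(",") if part.strip())
    return fixed


print(split_merged(['wide', 'extended', 'additional,expanded', 'broad']))
# → ['wide', 'extended', 'additional', 'expanded', 'broad']
```

The repaired list contains the same five items as the gold list for line 1014, only in the system's own order.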
Alternatively, you can run the attached version of the scorer, which runs quietly and simply skips over any errors. Either way, as mentioned earlier, the system scores will remain practically the same with or without these errors.
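The internals of the attached quiet scorer are not shown here. As a rough illustration only, its "skip over errors" behaviour might resemble the following Python sketch, which assumes the same rule as above (each line must reorder the gold candidates) and leaves the actual scoring out. The name `filter_valid` is hypothetical, and the ranking used for line 1014 is an invented, correctly formed example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_valid(system: Dict[int, List[str]],
                 gold: Dict[int, List[str]]) -> Tuple[Dict[int, List[str]], List[int]]:
    """Keep only system lines whose substitutes are a reordering of the gold list.

    Hypothetical sketch: invalid or unknown lines are dropped silently instead of
    raising an exception, and their line numbers are returned for reference.
    """
    valid, skipped = {}, []
    for line_no, subs in sorted(system.items()):
        if line_no in gold and Counter(subs) == Counter(gold[line_no]):
            valid[line_no] = subs
        else:
            skipped.append(line_no)
    return valid, skipped


gold = {369: ['bulk', 'mass', 'weight', 'density'],
        1014: ['wide', 'extended', 'expanded', 'additional', 'broad']}
system = {369: ['the service', 'mass', 'the Eucharist', 'the celebration of the Eucharist'],
          # Hypothetical corrected ranking for line 1014.
          1014: ['broad', 'wide', 'expanded', 'extended', 'additional']}
valid, skipped = filter_valid(system, gold)
print(sorted(valid), skipped)  # → [1014] [369]
```

Only the remaining valid lines would then be passed on to scoring, which matches how line 369 was handled in the evaluation.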
Regards,
Sujay.