While running the scoring script for the Trial Baselines,
I discovered some issues with the scoring results:
1. If you end the final system guess with ";"
(as specified in the documentation),
the scorer counts an extra "empty" guess.
I discussed this with Diana (from the lexical substitution task),
and we agreed that the final ";" should be removed from both the system
output and the gold standard files.
I will also run an extra check on all system files to make sure
that your test files are scored correctly.
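To illustrate the first issue: a naive split on ";" yields an empty
trailing item when the guess list ends with ";". Below is a minimal Python
sketch of the effect and of one way to avoid it. The function name and the
example guesses are hypothetical, and this is not the scorer's actual code.

```python
def split_guesses(field):
    """Split a ';'-separated guess list, dropping empty items.

    Hypothetical helper for illustration only; not taken from the scorer.
    """
    return [g.strip() for g in field.split(";") if g.strip()]


# A trailing ";" makes a naive split produce an extra empty guess.
naive = "guess1;guess2;".split(";")

# Filtering out empty items gives the same result with or without the final ";".
clean = split_guesses("guess1;guess2;")
```

Here `naive` holds three items (the last one empty), while `clean` holds
only the two real guesses.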
2. I made some other changes to the script (all documented in comments
in the header of the script);
the most important are:
* fix for Unicode as suggested by Simone
=> I checked the other occurrences of \w; they are only used
in regexps matching the English target word, so this should be fine.
=> I opted to keep Simone's fix instead of a conversion table,
so that two different words (differing only in their accents)
do not match.
* the evaluation is now case-insensitive
=> I noticed that for "bank" there are a lot of proper names in the
corpus,
but not all lemmatizers keep the initial uppercase character,
and I don't want to penalize lemmatizer errors.
=> This is the only case in which duplication in the system output
is allowed
(e.g. Bank and bank)
* I've made a fix for Windows input, where scoring went wrong as well.
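As a rough illustration of the case-insensitive matching and the Windows
input fix, here is a minimal Python sketch. It assumes that Windows line
endings ("\r\n") were what caused the scoring problem, and that guesses
differing only in case are collapsed into one; neither detail is confirmed
here, and the function name is hypothetical. Accents are deliberately left
untouched, so words that differ only in accents stay distinct.

```python
def normalize_guesses(line):
    """Strip Windows/Unix line endings, then drop case-only duplicates.

    Hypothetical helper for illustration only; not taken from the scorer.
    """
    line = line.rstrip("\r\n")      # handles both "\n" and "\r\n"
    seen = set()
    unique = []
    for guess in line.split(";"):
        guess = guess.strip()
        key = guess.casefold()      # case-insensitive key; accents are kept
        if guess and key not in seen:
            seen.add(key)
            unique.append(guess)
    return unique


# "Bank" and "bank" count as one guess; the trailing "\r\n" is ignored.
result = normalize_guesses("Bank;bank;rive\r\n")
```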
You can find the Trial baselines (Trial_Baselines.pdf)
and an update of the scorer on:
http://lt3.hogent.be/semeval/Trial/
Please let me know if you have comments, remarks or questions on the
changes.
Best,
Els