Dear shared task participants,
We discovered an error in the way we calculated P/R/F1 scores for unseen VMWEs.
As unseen VMWEs are the focus of this year's task, we decided to correct the official results.
What happened?
Unseen VMWEs were defined as those whose multi-set of lemmas is absent from the training corpus.
However, system developers could also use the development corpus to train their models.
Therefore, some VMWEs occurring in dev (but not in train) were wrongly counted as unseen.
Instead, we should define unseen VMWEs as those whose multi-set of lemmas is absent from both the training and development corpora.
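For illustration only, here is a minimal Python sketch of the corrected criterion, assuming the VMWEs of each corpus are already available as lists of lemmas (the function and variable names are hypothetical, not those of the official evaluation script):

    from collections import Counter

    def lemma_multiset(vmwe_lemmas):
        # Hashable, order-independent representation of the multi-set of lemmas.
        return frozenset(Counter(vmwe_lemmas).items())

    def unseen_vmwes(test_vmwes, train_vmwes, dev_vmwes):
        # Corrected criterion: a test VMWE counts as unseen only if its
        # multi-set of lemmas occurs neither in train nor in dev.
        known = {lemma_multiset(v) for v in train_vmwes} | \
                {lemma_multiset(v) for v in dev_vmwes}
        return [v for v in test_vmwes if lemma_multiset(v) not in known]

Only the VMWEs selected this way should contribute to the scores for unseen VMWEs.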
What did the organisers do?
We re-evaluated all systems, passing train + dev (instead of train only) to the --train option of the evaluation script.
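In practice, this amounts to concatenating the two corpora before evaluation. A minimal sketch in Python (the file names are hypothetical, and it ignores format details such as repeated header lines):

    # Concatenate train and dev so that both corpora count as "seen"
    # when computing scores for unseen VMWEs.
    with open("train_plus_dev.cupt", "w", encoding="utf-8") as out:
        for path in ("train.cupt", "dev.cupt"):
            with open(path, encoding="utf-8") as f:
                out.write(f.read())

The resulting file is then passed to --train in place of the training corpus alone.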
The updated results were published on the official website with a note explaining the correction:
Previous results have been deprecated, but are still available here:
What has changed in the results?
Some tables have different scores, but the rankings remain stable:
In these three languages, not only scores but also rankings were affected:
Hindi: Seen2Seen (up: 3 -> 2), TRAVIS-multi (down: 2 -> 3)
Italian: TRAVIS-multi (up: 3 -> 2), MTLB-STRUCT (down: 2 -> 3)
Romanian: TRAVIS-mono (up: 2 -> 1), MTLB-STRUCT (down: 1 -> 2)
What should participants do?
Many of you have submitted system description papers to the MWE-LEX 2020 workshop.
These papers probably report results based on the previous, incorrect evaluation.
However, you should not re-submit your paper now.
Instead, we ask you to make sure the results are updated in the final, camera-ready versions.
You can start now or wait until you get the acceptance/rejection decision.
If you need help updating and/or checking the results, please don't hesitate to ask.
We are terribly sorry for the inconvenience, and we hope you understand that it is important to update this now so that we have more meaningful and consistent results.
All the best,
PARSEME ST core organizers