Dear shared task participants,
We discovered an error in the way we calculated P/R/F1 scores for unseen VMWEs.
As unseen VMWEs are the focus of this year's task, we decided to correct the official results.
What happened?
Unseen VMWEs were defined as those whose multi-set of lemmas is absent from the training corpus.
However, system developers could also use the development corpus to train their models.
Therefore, some VMWEs occurring in dev (but not in train) were wrongly counted as unseen.
Instead, we should define unseen VMWEs as those whose multi-set of lemmas is absent from both the training and development corpora.
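For illustration only, here is a minimal Python sketch of the corrected criterion, assuming the VMWEs of each corpus are already available as lists of lemmas (the function and variable names are hypothetical, not those of the official evaluation script):

    from collections import Counter

    def lemma_multiset(vmwe_lemmas):
        # Hashable, order-independent representation of the multi-set of lemmas.
        return frozenset(Counter(vmwe_lemmas).items())

    def unseen_vmwes(test_vmwes, train_vmwes, dev_vmwes):
        # Corrected criterion: a test VMWE counts as unseen only if its
        # multi-set of lemmas occurs neither in train nor in dev.
        known = {lemma_multiset(v) for v in train_vmwes} | \
                {lemma_multiset(v) for v in dev_vmwes}
        return [v for v in test_vmwes if lemma_multiset(v) not in known]

Only the VMWEs selected this way should contribute to the scores for unseen VMWEs.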
What did the organisers do?
We re-evaluated all systems, passing train + dev (instead of train only) to the --train option of the evaluation script.
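In practice, this amounts to concatenating the two corpora before evaluation. A minimal sketch in Python (the file names are hypothetical, and it ignores format details such as repeated header lines):

    # Concatenate train and dev so that both corpora count as "seen"
    # when computing scores for unseen VMWEs.
    with open("train_plus_dev.cupt", "w", encoding="utf-8") as out:
        for path in ("train.cupt", "dev.cupt"):
            with open(path, encoding="utf-8") as f:
                out.write(f.read())

The resulting file is then passed to --train in place of the training corpus alone.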
The updated results were published on the official website with a note explaining the correction:
Previous results have been deprecated, but are still available here:
What has changed in the results?
Some tables have different scores, but the rankings remain stable:
In these three languages, not only scores but also rankings were affected:
Hindi: Seen2Seen (up: 3 -> 2), TRAVIS-multi (down: 2 -> 3)
Italian: TRAVIS-multi (up: 3 -> 2), MTLB-STRUCT (down: 2 -> 3)
Romanian: TRAVIS-mono (up: 2 -> 1), MTLB-STRUCT (down: 1 -> 2)
What should participants do?
Many of you have submitted system description papers to the MWE-LEX 2020 workshop.
These papers probably report results based on the previous, incorrect evaluation.
However, you should not re-submit your paper now.
Instead, we ask you to make sure the results are updated in the final, camera-ready versions.
You can start now or wait until you get the acceptance/rejection decision.
If you need help updating and/or checking the results, please don't hesitate to ask.
We are terribly sorry for the inconvenience, and we hope you understand that it is important to update this now so that we have more meaningful and consistent results.
All the best,
PARSEME ST core organizers