WMT22 Shared Task on Unsupervised MT and Very Low Resource Supervised MT: Call for Participation


Marion Di Marco

Jun 28, 2022, 5:42:34 AM
to Workshop on Statistical Machine Translation
Hi All,

I would like to announce the Shared Task on Unsupervised MT and Very Low Resource Supervised MT.

Like last year, the task will include translation of Upper Sorbian and Lower Sorbian, two minority Slavic languages spoken in Germany.

This year, the language pairs for Unsupervised MT and Low Resource Supervised MT are:

- Upper Sorbian to/from German
- Lower Sorbian to/from German
- Upper Sorbian to/from Lower Sorbian (new language pair)

More details are available at: 

Important dates:
Release of training/dev/test data: June 22, 2022
Release of blind test data: mid/end August 2022
Translation submission deadline: mid/end August 2022
Paper submission deadline: 7th September 2022

Best regards,
Marion Di Marco

ZongYao Li

Aug 23, 2022, 11:42:01 PM
to Workshop on Statistical Machine Translation
Hi Marion Di Marco,
Regarding the very low-resource task, we found two problems in the test sets.
1. After noticing a submission with a BLEU score of 100, we went through the test sets and found that they all use the same bilingual sentences, only in a different order. This makes it possible to recover the gold-standard translations.
2. Our submission achieves an extremely high BLEU score (80+) in the de-hsb language direction. After analysis, we found that 1000+ sentences in the test set are also in the training set.

The above two issues will lead to biased evaluation. Is there a way to address these problems?

Best regards,
Zongyao Li

Marion Di Marco

Aug 24, 2022, 10:12:17 AM
to wmt-...@googlegroups.com
Hi Zongyao Li,

thank you very much for pointing out a problem with the test sets.

There was indeed a large overlap with the training data for the DE-HSB language pair that went unnoticed.
We removed the overlapping sentences and re-uploaded the new set.
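
For readers who want to run a similar check on their own data, here is a minimal sketch of filtering test pairs whose source sentence also occurs in the training data. It is an illustration only, not the organizers' actual procedure: the function names, the whitespace normalization, and the choice to match on the source side are all assumptions, and the sentence pairs are placeholders rather than real task data.

```python
def normalize(sentence: str) -> str:
    """Collapse runs of whitespace so trivial formatting differences do not hide duplicates."""
    return " ".join(sentence.split())


def remove_overlap(train_pairs, test_pairs):
    """Split test pairs into (kept, removed).

    A test pair is removed when its source sentence, after normalization,
    also appears as a source sentence in the training pairs.
    Matching on the source side only is an assumption of this sketch.
    """
    seen_sources = {normalize(src) for src, _ in train_pairs}
    kept, removed = [], []
    for src, tgt in test_pairs:
        if normalize(src) in seen_sources:
            removed.append((src, tgt))
        else:
            kept.append((src, tgt))
    return kept, removed


# Placeholder data, invented for illustration only.
train = [("de sentence 1", "hsb sentence 1"), ("de sentence 2", "hsb sentence 2")]
test = [("de  sentence 1", "hsb sentence 1"), ("de sentence 3", "hsb sentence 3")]

kept, removed = remove_overlap(train, test)
print(len(kept), len(removed))
```

A stricter or looser notion of overlap (for example, matching full sentence pairs, or lowercasing before comparison) would change which sentences are dropped.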

The system that received a perfect BLEU score was not a real submission, just a test upload to check the technical functionality.
(It has now been removed.)

Please excuse the confusion and thank you again for your help.


Best regards,
Marion Di Marco
--
You received this message because you are subscribed to the Google Groups "Workshop on Statistical Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wmt-tasks/01784d06-4717-4a91-a821-5c39708566a8n%40googlegroups.com.

Ayman Khalafallah

Aug 26, 2022, 9:46:54 AM
to Workshop on Statistical Machine Translation
Hi Marion,

For the unsupervised task, can we use a pretrained model (either supervised or unsupervised) whose pre-training objective did not include the language pairs in question? It may have had German on one side, but not paired with hsb or dsb.

Very best,
Ayman

Marion Di Marco

Aug 28, 2022, 3:29:13 PM
to wmt-...@googlegroups.com
Hi Ayman,

by the rules outlined on the web site, the use of further data, such as a pretrained model, is not allowed.

However, a comparison of the effect of adding pre-trained models might be interesting -- if you wish to upload such a system, please mark it accordingly;
it will then not be listed with the other systems, but regarded separately.

Best regards,
Marion Di Marco
