WMT General MT - the submission week starts now

Tom Kocmi

Jul 22, 2022, 6:36:06 AM
to wmt-...@googlegroups.com

Hi All,

 

We have just released the source files of the WMT testsets, which are to be translated and submitted via OCELoT. The deadline is 28th July (Anywhere on Earth).

Here are instructions for submission (in recommended order):

 

1) Register your team at https://ocelot-wmt22.mteval.org/

2) Send an email with your name, affiliation, and OCELoT username to tomk...@microsoft.com to get your team activated (it is not possible to submit translations before team validation).

3) Download testsets https://www.statmt.org/wmt22/wmttest2022.src.zip

4) You may want to use XML wrapping and unwrapping scripts here: https://github.com/wmt-conference/wmt-format-tools/tree/main/wmtformat

5) Translate testsets

6) Upload your submissions to OCELoT. Each team is allowed at most 7 submissions per language pair. Scores in the system do not reflect actual system performance; they are mainly for validation purposes.

- NOTE: we need to validate a few details in OCELoT and will open submissions later today (we won’t activate any team until OCELoT is fully tested)

7) Before 4th August: Prepare an abstract of your system (it may be a brief half-page or one-page description, or already the full system description paper) and upload it here: https://www.softconf.com/emnlp2022/wmt/

 

Notes:

Translations should be “human-ready”, i.e. in the form in which text is normally published, so Latin-script languages should be recased and detokenised, Chinese and Japanese should be unsegmented, etc.

Testsets contain multiple domains, but we do not provide additional information about the data on purpose.

Sources may contain anonymization placeholders.

Only primary systems of teams that submit an abstract paper will be included in the human evaluation.
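As a rough illustration of the “human-ready” note above, here is a minimal sketch of a check you could run over your output before uploading. The patterns and the name find_issues are assumptions made for this example and are not part of any official WMT or OCELoT tooling. They only flag common leftovers such as subword markers, tokenised punctuation, or segmented Chinese/Japanese text, and they can give false positives.

```python
import re
from typing import List

# Heuristic patterns that often indicate output that was not post-processed.
# These are illustrative assumptions, not official WMT checks.
SUSPICIOUS = {
    # BPE continuation marker left in the text, e.g. "Hel@@ lo"
    "bpe_marker": re.compile(r"@@( |$)"),
    # SentencePiece word-boundary symbol (U+2581)
    "sentencepiece_marker": re.compile("\u2581"),
    # Comma or full stop preceded by whitespace, e.g. "Hello , world ."
    "space_before_punct": re.compile(r"\s[,.](\s|$)"),
    # A single space between two CJK/kana characters suggests segmented text
    "cjk_space": re.compile(r"[\u3040-\u30ff\u4e00-\u9fff] [\u3040-\u30ff\u4e00-\u9fff]"),
}


def find_issues(line: str) -> List[str]:
    """Return the names of all heuristics that fire on one output line."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(line)]


if __name__ == "__main__":
    for sample in ["Hello, world.", "Hel@@ lo , world .", "我 喜欢 翻译"]:
        print(repr(sample), find_issues(sample))
```

A line with no hits is not guaranteed to be fine (recasing, for example, is not checked at all), so treat this only as a quick sanity pass before a manual look at the output.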

 

Let us know if you run into any issues, and have a lovely day,

Tom

(in Germany, he/him)

 

曹智泉

Jul 23, 2022, 8:48:03 AM
to Workshop on Statistical Machine Translation
Hi Tom,

We are participating in the WMT22 English to Croatian translation task, but the source data for this translation direction was not provided when the test set was released.
We tried this link: https://www.statmt.org/wmt22/wmttest2022.src.zip

Best,
Zhiquan

曹智泉

Jul 25, 2022, 6:44:57 AM
to Workshop on Statistical Machine Translation
Hi Tom,

We are participating in the WMT22 Livonian to/from English translation task, but we found no option for the Livonian to English track when creating a new submission.

Best,
Zhiquan
On Friday, July 22, 2022 at 18:36:06 UTC+8, Tom Kocmi wrote:

Tom Kocmi

Jul 25, 2022, 6:57:21 AM
to wmt-...@googlegroups.com

Hi Zhiquan,

 

We had to update the reference files internally, so you will find the testsets at the end of the list (after Biomedical testsets).

 

Let me know if you run into any other issues,

Tom

--
You received this message because you are subscribed to the Google Groups "Workshop on Statistical Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wmt-tasks/54f17e4c-2033-43f9-8668-b502ae2600d9n%40googlegroups.com.

vinc...@yahoo.com

Jul 25, 2022, 9:49:03 AM
to Workshop on Statistical Machine Translation
Hello Tom,
the source testsets for en-de and de-en are much bigger than all the others. Are we supposed to translate and submit all of the content (i.e. including biomedical) when participating in the General MT track only?
thanks,
Vincent

Tom Kocmi

Jul 25, 2022, 10:07:16 AM
to wmt-...@googlegroups.com

Hi Vincent,

 

Yes, the testsets for General MT contain testsuites and biomedical testsets. Please translate everything (OCELoT will not accept partial translations).

The DE-EN testsets are larger than those of the other language pairs, which contain only General MT.

 

A note on a submission issue that some teams may run into: the submission must be in XML format (use wrap.py) with the “.xml” extension.
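The two requirements above (a “.xml” file with no missing translations) can be checked locally before uploading. The following is only a sketch under stated assumptions: the function names are made up for this example, and the element names ("src", "system", "seg") are an assumption about the WMT XML layout that should be verified against the actual output of wrap.py. It is not OCELoT's own validation.

```python
import xml.etree.ElementTree as ET
from typing import List


def count_segments(xml_text: str, container_tag: str) -> int:
    """Count <seg> elements nested under every <container_tag> element."""
    root = ET.fromstring(xml_text)
    return sum(1 for block in root.iter(container_tag) for _ in block.iter("seg"))


def check_submission(file_name: str, src_xml: str, hyp_xml: str,
                     hyp_tag: str = "system") -> List[str]:
    """Return a list of problems; an empty list means no problem was found.

    hyp_tag is the element assumed to hold the translations (hypothetical
    default, adjust it to whatever wrap.py actually writes).
    """
    problems = []
    if not file_name.endswith(".xml"):
        problems.append("file name does not end in .xml")
    try:
        n_hyp = count_segments(hyp_xml, hyp_tag)
    except ET.ParseError as err:
        problems.append(f"submission is not well-formed XML: {err}")
        return problems
    n_src = count_segments(src_xml, "src")
    if n_hyp != n_src:
        problems.append(f"expected {n_src} translated segments, found {n_hyp}")
    return problems
```

For real files, the two XML arguments can be read with pathlib.Path(path).read_bytes(). A matching segment count does not prove that every segment is non-empty or correctly aligned, so this only catches the most obvious cases of a partial translation.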

 

Have a lovely day,

Tom

Hui Zeng

Jul 29, 2022, 4:24:05 AM
to Workshop on Statistical Machine Translation
Hi Tom,

The OCELoT submission website is not working.
Could you check it and let us know if we could make the final submission before the deadline?

Best regards,
Hui Zeng


On Friday, July 22, 2022 at 18:36:06 UTC+8, Tom Kocmi wrote:

Tom Kocmi

Jul 29, 2022, 6:04:56 AM
to wmt-...@googlegroups.com

Hi,

 

The system got overwhelmed because most people tried to submit at the last minute. However, it is still running. Have you managed to upload your systems?

 

Best,

Tom

