Hi All,
We have just released the sources of the WMT testsets, to be translated and submitted via OCELoT. The deadline is 28th July (Anywhere on Earth).
Here are instructions for submission (in recommended order):
1) Register your team at https://ocelot-wmt22.mteval.org/
2) Send an email with your name, affiliation, OCELoT username to tomk...@microsoft.com to get your team activated (it is not possible to submit translations before team validation).
3) Download the testsets from https://www.statmt.org/wmt22/wmttest2022.src.zip
4) You may want to use XML wrapping and unwrapping scripts here: https://github.com/wmt-conference/wmt-format-tools/tree/main/wmtformat
5) Translate testsets
6) Upload your submissions to OCELoT. Each team is allowed at most 7 submissions per language pair. Scores in the system do not reflect actual system performance; they are mainly for validation purposes.
- NOTE: we need to validate a few details in OCELoT and will open submissions later today (we won’t activate any team until OCELoT is fully tested)
7) Before 4th August: Prepare an abstract of your system (it may be a brief half-page or one-page description, or an already complete system description paper) and upload it here: https://www.softconf.com/emnlp2022/wmt/
Notes:
Translations should be “human-ready”, i.e. in the form in which text is normally published, so Latin-script languages should be recased and detokenised, Chinese and Japanese should be unsegmented, etc.
Testsets contain multiple domains, but we do not provide additional information about the data on purpose.
Sources may contain anonymization placeholders.
Only primary systems of teams that submit an abstract paper will be included in the human evaluation.
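As a rough illustration of the “human-ready” note above, here is a minimal Python sketch that flags output lines which may still be tokenised or lowercased. The function names and the heuristics are assumptions made for this example; they are not part of OCELoT or wmt-format-tools, and they will give false positives (for instance, French legitimately puts a space before some punctuation), so treat any hit only as a prompt for a manual look.

```python
import re

# Hypothetical heuristics for illustration; NOT an official WMT/OCELoT check.
# Whitespace directly before closing punctuation often means tokenised text.
_SPACE_BEFORE_PUNCT = re.compile(r"\s[,.;:!?](\s|$)")


def looks_tokenised(line: str) -> bool:
    """True if the line has whitespace right before , . ; : ! or ?"""
    return bool(_SPACE_BEFORE_PUNCT.search(line))


def looks_lowercased(line: str) -> bool:
    """True if the line has letters and every letter is lowercase.

    Caseless scripts (e.g. Chinese, Japanese) are never flagged, because
    their characters are alphabetic but not lowercase.
    """
    letters = [ch for ch in line if ch.isalpha()]
    return bool(letters) and all(ch.islower() for ch in letters)


def flag_lines(lines):
    """Return (line_number, reason) pairs for lines worth a manual look."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if looks_tokenised(line):
            flagged.append((number, "possibly tokenised"))
        elif looks_lowercased(line):
            flagged.append((number, "possibly lowercased"))
    return flagged
```

One possible use is to run `flag_lines` over the plain-text translations before wrapping them into XML and to inspect whatever it reports by hand.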
Let us know if you run into any issues, and have a lovely day,
Tom
(in Germany, he/him)
Hi Zhiquan,
We had to update the reference files internally, so you will find the testsets at the end of the list (after Biomedical testsets).
Let me know if you run into any other issues,
Tom
--
You received this message because you are subscribed to the Google Groups "Workshop on Statistical Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
wmt-tasks+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/wmt-tasks/54f17e4c-2033-43f9-8668-b502ae2600d9n%40googlegroups.com.
Hi Vincent,
Yes, the testsets for General MT contain testsuites and biomedical testsets. Please translate everything (OCELoT will not accept partial translations).
The DE-EN testset is larger than those of the other languages, which contain only General MT.
A note on a submission issue that some teams may run into: the submission must be in XML format (use wrap.py) with the “.xml” extension.
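For teams who want to sanity-check a file locally before uploading, a minimal Python sketch of the two requirements above (the “.xml” extension and parseable XML) might look like the following. The function name `check_submission` is hypothetical, and the check only tests well-formedness with the standard library; it does not validate against the WMT XML format and does not replace OCELoT's own validation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_submission(path):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration: it checks the file extension and
    XML well-formedness only, not the WMT schema.
    """
    problems = []
    path = Path(path)
    if path.suffix != ".xml":
        problems.append(f"expected '.xml' extension, got {path.suffix!r}")
    try:
        ET.parse(path)  # raises ParseError if the XML is not well-formed
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems
```

Calling `check_submission` on the output of wrap.py and getting an empty list back only means the file has the expected extension and parses as XML.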
Have a lovely day,
Tom
Hi,
The system got overwhelmed because most people tried to submit at the last minute. However, it is still running. Have you managed to upload your systems?
Best,
Tom
From: wmt-...@googlegroups.com <wmt-...@googlegroups.com>
On Behalf Of Hui Zeng
Sent: Friday, July 29, 2022 10:24 AM
To: Workshop on Statistical Machine Translation <wmt-...@googlegroups.com>
Subject: [EXTERNAL] Re: WMT General MT - the submission week starts now