Dear All,
I am happy to announce plans for the WMT22 news translation shared task. We are preparing to incorporate significant changes for this year and we want to share our plans with the community and collect feedback.
Here is a list of the main changes:
We welcome any comments and suggestions from the community regarding our plans and will consider all such feedback. As there could be a lot of questions and detailed discussion, you may use this document for giving feedback and reading our replies:
https://1drv.ms/w/s!Aq0goPMF_LnlhPYwO46qJGJFcq51ig?e=j6ofH4
The confirmed language pairs for the WMT22 General MT task are: Chinese-EN, Czech-EN, German-EN, German-French, Japanese-EN, Russian-EN, and several low-resource languages (TBA). The deadline for system submissions is planned for June (the webpage https://www.statmt.org/wmt22/ will be available within the next week).
Thank you and we are looking forward to your system submissions,
On behalf of General MT task organizers,
Tom Kocmi
Hi Adam,
those are great questions.
> domains
We do not plan to disclose any information about the domains; we only provided a rough estimate of what could be expected, as this is the first year. The goal of the General MT shared task is to investigate the general capabilities of MT systems (how to define these is itself an open question). We decided to simplify it by selecting a few domains, which we won't be evaluating individually. Moreover, domains can differ slightly across languages, as it is difficult to obtain monolingual resources (we are targeting data created in 2021 and 2022 whenever possible).
Regarding ParaCrawl and biomedical data, at this moment only what is listed on the webpage is allowed for the constrained task. However, we are still discussing extending the constrained task and allowing larger quantities of training data, so we may extend the training set.
As for monolingual data for Ukrainian, if you know of any publicly available Ukrainian data (both mono and parallel) that is not currently allowed for the constrained task, send me a message and we can add it (if it is of reasonable quality and quantity). Martin Popel may know about other efforts for Ukrainian.
Have a lovely day,
Tom
From: wmt-...@googlegroups.com <wmt-...@googlegroups.com>
On Behalf Of Adam Dobrowolski
Sent: Friday, April 8, 2022 2:44 PM
To: Workshop on Statistical Machine Translation <wmt-...@googlegroups.com>
Subject: [EXTERNAL] Re: WMT22 - News task is going towards multidomain