Hi all,
The details of the WMT23 Metrics Task are up at
https://wmt-metrics-task.github.io/. We are looking for both reference-based metrics and reference-free metrics to evaluate the quality of MT systems. We’ll be using expert-based MQM annotations on Chinese-English, English-German and Hebrew-English as the primary gold standard for evaluating metrics.
We’ll be continuing the challenge sets subtask this year: we invite anyone to submit a new test suite and/or an analysis paper on metric behaviour for specific perturbations/phenomena (you’re welcome to resubmit last year’s challenge set!).
New this year:
En-De will be evaluated at the paragraph level rather than the sentence level, and we encourage you to develop metrics that evaluate at this level
New language pair: Hebrew-English
We will be distinguishing between public and closed metrics; please release your code + weights or LLM prompts so the MT community can easily adopt your metrics
Improved meta-evaluation methodology to enable better evaluation of metrics that predict many ties in segment scores: https://arxiv.org/pdf/2305.14324.pdf
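As a rough illustration of why ties matter in meta-evaluation: a pairwise ranking accuracy that never credits predicted ties penalises metrics that output many equal segment scores. A minimal sketch of a tie-aware variant is below; the function name and the `eps` tie threshold are hypothetical, not part of the official methodology — see the linked paper for the actual approach.

```python
from itertools import combinations

def tie_aware_accuracy(metric_scores, human_scores, eps=0.0):
    """Illustrative sketch: pairwise accuracy that also credits
    correctly predicted ties. `eps` is a hypothetical threshold
    below which a metric score difference counts as a tie."""
    correct = total = 0
    for i, j in combinations(range(len(metric_scores)), 2):
        m = metric_scores[i] - metric_scores[j]
        h = human_scores[i] - human_scores[j]
        # Map each difference to -1 / 0 / +1 (0 = tie).
        m_sign = 0 if abs(m) <= eps else (1 if m > 0 else -1)
        h_sign = 0 if h == 0 else (1 if h > 0 else -1)
        correct += (m_sign == h_sign)
        total += 1
    return correct / total
```

With this definition, a metric that correctly predicts two segments as tied is rewarded rather than penalised for the tie.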
Important dates:
Challenge sets submission deadline: 20th July
Metrics inputs ready to download: 10th August
Metric submission deadline: 17th August
Metric scores for challenge sets distributed: 24th August
Paper submission deadline to WMT: 5th September
Please register your metric submissions here and challenge set submissions here so we can keep track of participants.
Looking forward to your submissions,
Metrics 2023 team
Dear all,
Happy to announce that the metrics task inputs are now available! We have 14 language pairs available in the generaltest2023 test set, as well as 3 additional challenge sets; for general-purpose metrics, we expect participation in all language pairs.
This year, we’ll be using the Codalab platform for submissions: https://codalab.lisn.upsaclay.fr/competitions/15074
Submission Deadline: 17th August, 2023, 11:59pm AoE (UTC-12)
Process:
Register your metric here, if you haven’t already
Create an account on Codalab.
You’re allowed one primary submission for a reference-based metric and one primary submission for a reference-free metric. If you are submitting two metrics with widely different approaches (for example, one LLM-based metric and one lexical metric), create two accounts on Codalab.
Download the data (link; link also available on Codalab)
Prepare your scores:
Please follow the guidelines on submission format as described on the website.
The metric inputs download includes sample metrics as well as helper scripts to prepare your scores.
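The authoritative submission format is the one described on the website and in the bundled sample files; the sketch below only illustrates the general shape of a tab-separated segment-score file. The column layout (metric name, language pair, test set, system, segment index, score) and the helper name are assumptions for illustration — check the provided samples before submitting.

```python
import csv

def write_seg_scores(path, metric, lang_pair, testset, system, scores):
    """Illustrative sketch: write one tab-separated row per segment.
    The exact column layout is an assumption; follow the official
    guidelines and sample files from the metric inputs download."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for i, score in enumerate(scores, start=1):
            writer.writerow([metric, lang_pair, testset, system, i, f"{score:.6f}"])
```

For example, `write_seg_scores("scores.tsv", "MyMetric", "en-de", "generaltest2023", "systemA", [0.5, 0.75])` would produce one scored row per segment for that system.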
Submit your scores via Codalab:
When you submit your metric, Codalab might take some time to process your submission; we’ve seen processing times ranging from a few minutes to two hours during testing. Codalab records the submission time, so don’t panic if your last-minute submission isn’t processed before the deadline! Please contact us if it has been longer than 3 hours.
After uploading your submission, check its status (under Submit / View Results). It will return an error if there is a problem with your submission, such as a formatting issue.
If your submission is successful, the Codalab leaderboard currently displays correlations with an automatic metric.
You can have a maximum of 10 submissions. Don’t try to optimise your metric to have a higher correlation on the leaderboard, as this won’t generally improve your correlation with human evaluation.
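The leaderboard correlation is against an automatic metric, while the final evaluation correlates against human judgements, so leaderboard-chasing is unlikely to help. As a purely illustrative example of the kind of correlation involved (the official meta-evaluation uses its own tooling and statistics), here is a dependency-free Pearson correlation:

```python
import math

def pearson_r(xs, ys):
    """Illustrative sketch: Pearson correlation between two score
    lists (e.g. metric scores vs. human scores). Not the official
    meta-evaluation code."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A metric tuned to correlate with another automatic metric can score `pearson_r` near 1.0 against that metric while its correlation with human scores stays flat, which is exactly the failure mode the note above warns about.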
Finally, the current he-en reference is highly likely to have been created by post-editing MT output rather than by translating from scratch. The WMT organisers are sourcing a higher-quality translation, and we will require metrics to be rescored at some point in the future. We appreciate the additional effort this will require from participants, and will keep you posted.
Please contact us if you see any issue or have any other questions.
--
You received this message because you are subscribed to the Google Groups "WMT: Workshop on Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wmt-tasks/CAM87YmfRJ%3D4Wfcr0Obn0DMZaj6DC4UXMi9_%2BpPv5wpOf%3Dq0UBw%40mail.gmail.com.
Thanks Fred! +Ananya