System outputs

Barry Haddow

Jul 30, 2021, 11:35:51 AM

to Workshop on Statistical Machine Translation

Hi All

The system outputs (de-anonymised) for primary are now available here:

https://github.com/wmt-conference/wmt21-news-systems

You should note that several submissions are unconstrained, and this
information is not recorded in the repository yet. The
constrained/unconstrained information is available in Ocelot, which
should be back up next week.

The scores (chrF and BLEU) in the above repository are not the official
scores. The official scores will come from human evaluation, which we
expect to start shortly.

best,

Barry


Sandeep Subramanian

Jul 30, 2021, 10:09:41 PM

to Workshop on Statistical Machine Translation

Hi,

Thanks for putting these scores together! Is there a reference for what BLEU-all, BLEU-A, and BLEU-B mean?

Sandeep

Barry Haddow

Jul 31, 2021, 6:08:08 AM

to wmt-...@googlegroups.com, Sandeep Subramanian

Hi Sandeep

BLEU-A is scored against the A reference, BLEU-B against the B reference, and BLEU-all against both references, where two references exist. All systems are scored with sacreBLEU using 13a tokenisation, except for Japanese (char-based) and Chinese (zh tokenisation).
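For anyone trying to reproduce the three variants, here is a rough sketch of how the sacreBLEU command lines could be assembled. It is not the script used for the repository. The file names are hypothetical placeholders, and mapping "char-based" to sacreBLEU's `char` tokeniser is an assumption.

```python
import shlex

# Tokeniser per target language, following the description above:
# 13a by default, "zh" for Chinese. Mapping Japanese to sacreBLEU's
# "char" tokeniser is an assumption about what "char-based" means.
TOKENIZER_BY_TARGET = {"ja": "char", "zh": "zh"}


def sacrebleu_commands(hyp, ref_a, ref_b=None, target_lang="en"):
    """Return {score name: argv list} for BLEU-A, BLEU-B and BLEU-all."""
    tok = TOKENIZER_BY_TARGET.get(target_lang, "13a")

    def cmd(refs):
        # sacrebleu takes one or more reference files as positional arguments.
        return ["sacrebleu", *refs, "-i", hyp, "-tok", tok, "-m", "bleu", "chrf"]

    commands = {"BLEU-A": cmd([ref_a])}
    if ref_b is not None:
        # BLEU-B and BLEU-all only exist when a second reference is available.
        commands["BLEU-B"] = cmd([ref_b])
        commands["BLEU-all"] = cmd([ref_a, ref_b])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name, argv in sacrebleu_commands("hyp.en", "ref-A.en", "ref-B.en").items():
        print(name, "->", shlex.join(argv))
```

Passing both reference files in one call is what makes sacreBLEU score against multiple references, so BLEU-all is not an average of BLEU-A and BLEU-B.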

I should emphasise that the automatic scores are just for guidance. We will publish only human evaluation in the overview,

best

Barry


Jeremy Gwinnup

Aug 2, 2021, 8:28:44 PM

to Workshop on Statistical Machine Translation

Hi Barry,

We're trying to replicate our ru-en system scores for newstest2021 using sacreBLEU. We've noticed that there are quite a few lines in reference-B that say "NO TRANSLATION AVAILABLE" - is that by design? We're working with a copy of test.tgz downloaded on 27 July.

Thanks!

-Jeremy

Barry Haddow

Aug 3, 2021, 2:33:20 AM

to wmt-...@googlegroups.com, Jeremy Gwinnup

Hi Jeremy

This should only happen for lines in the test suite documents, which sometimes do not have a reference. You can add the --no-testsuites option to wmt-unwrap in order to ignore the test suites,
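If the files have already been unwrapped with the test suites included, a post-hoc filter along these lines might serve as a sanity check. This is only a sketch and not what wmt-unwrap does internally. It assumes line-aligned plain-text files and that the placeholder text is exactly the string quoted earlier in the thread.

```python
# Assumed exact placeholder text, as quoted earlier in the thread.
PLACEHOLDER = "NO TRANSLATION AVAILABLE"


def drop_unreferenced(hyps, refs, placeholder=PLACEHOLDER):
    """Keep only (hypothesis, reference) pairs whose reference is a real translation."""
    if len(hyps) != len(refs):
        raise ValueError("hypothesis and reference lists must be line-aligned")
    kept = [(h, r) for h, r in zip(hyps, refs) if r.strip() != placeholder]
    return [h for h, _ in kept], [r for _, r in kept]


if __name__ == "__main__":
    # Toy data, for illustration only.
    hyps = ["a cat", "test suite output", "a dog"]
    refs = ["a cat", "NO TRANSLATION AVAILABLE", "the dog"]
    kept_hyps, kept_refs = drop_unreferenced(hyps, refs)
    print(len(kept_hyps), "of", len(hyps), "lines kept")
```

Regenerating the files with --no-testsuites remains the cleaner route, since it excludes every test suite segment and not only those without a reference.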

best

Barry


Jeremy Gwinnup

Aug 3, 2021, 11:30:18 PM

to Workshop on Statistical Machine Translation

Barry,

Thanks for the insight - the test suite documents were in fact the culprit. We're now running with the 1000-line --no-testsuites set.

-Jeremy
