sacrebleu 2.3.0


Matt Post

Oct 18, 2022, 5:15:15 PM
to wmt-...@googlegroups.com
Hello,

I released sacrebleu 2.3.0 today, which normally wouldn’t merit a broadcast email, but the following features might be of interest for reasons explained below.

* New test sets, including WMT22
* Metadata fields
* System outputs
* SPM-based tokenization

## New test sets

At long last, you can get WMT22 data from sacrebleu.

sacrebleu -t wmt22 -l en-de --echo src

## Metadata fields

The `--echo` option can now output other fields, such as “origlang” and “docid”. These are available even for test sets in the older formats, e.g.,

sacrebleu -t wmt15 -l en-fr --echo origlang docid src
# list all fields
sacrebleu -t wmt15 --list

This makes it easy to, say, grep out non-source-language data, or group sentences by document.
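
Here is a rough sketch of both uses. It assumes `--echo` prints the requested fields tab-separated, one segment per line, in the order given on the command line (origlang, docid, src). The function names `keep_origlang` and `count_by_doc` are hypothetical helpers, not part of sacrebleu:

```shell
# Hypothetical helpers; they assume tab-separated columns: origlang, docid, src.

# Print docid and src for segments whose original language matches $1.
keep_origlang() {
    awk -F'\t' -v lang="$1" '$1 == lang { print $2 "\t" $3 }'
}

# Print each docid with its segment count, in order of first appearance.
count_by_doc() {
    awk -F'\t' '
        !($2 in n) { order[++k] = $2 }
        { n[$2]++ }
        END { for (i = 1; i <= k; i++) print order[i] "\t" n[order[i]] }
    '
}
```

For example, `sacrebleu -t wmt15 -l en-fr --echo origlang docid src | keep_origlang en` would keep only the segments originally written in English, provided the output format matches the assumption above.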

## System outputs

Since the matrix is no longer with us (RIP), there has been no good way to get scores for system submissions. Thanks to the new XML format that Barry developed and deployed last year, this is now easy. WMT21 system outputs are available through the `wmt21/systems` test set, and WMT22 system outputs through `wmt22` itself.

To see a list of systems, give some invalid output to `--echo`:

sacrebleu -t wmt21/systems -l en-de --echo i-miss-the-matrix
sacrebleu -t wmt22 -l uk-cs --echo CharlesTranslator

You can then easily score them by piping the output right back into sacrebleu, e.g.,

$ sacrebleu -t wmt22 -l en-cs --echo Online-B | sacrebleu -t wmt22 -l en-cs -tok flores101
{
 "name": "BLEU",
 "score": 53.2,
 "signature": "nrefs:1|case:mixed|eff:no|tok:flores101|smooth:exp|version:2.3.0",
 "verbose_score": "72.5/57.6/48.1/41.0 (BP = 0.992 ratio = 0.992 hyp_len = 52182 ref_len = 52591)",
 "nrefs": "1",
 "case": "mixed",
 "eff": "no",
 "tok": "flores101",
 "smooth": "exp",
 "version": "2.3.0"
}

(Add “-f text” if you prefer the old text-based signature).

Here is a script to score every WMT22 de-en system:

# System names start at the 7th space-separated field of the de-en line; strip the commas
for system in $(sacrebleu -t wmt22 --list | grep ^de-en | cut -d" " -f 7- | perl -pe "s/,//g"); do
    echo -ne "$system\t"
    sacrebleu -t wmt22 -l de-en --echo "$system" | sacrebleu -t wmt22 -l de-en | jq -r .score
done

JDExploreAcademy 33.7
LT22 26
Lan-Bridge 33.4
Online-A 33.3
Online-B 33.3
Online-W 32.6
Online-Y 32.9
Online-G 33.7
PROMT 32.5

(In retrospect, perhaps `--list` should work with `-l`, and output without commas…)
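
To rank the systems, the loop's output can be piped through `sort`. A minimal sketch, assuming each line is "system, tab, score" as printed by the loop; `rank_by_score` is a hypothetical helper name:

```shell
# Hypothetical helper: sort "system<TAB>score" lines by score, highest first.
# LC_ALL=C keeps the ordering of tied scores independent of the locale.
rank_by_score() {
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

Appending `| rank_by_score` after the `done` of the loop would print the highest-scoring system first.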

## SPM-based tokenizers

I added support for the flores200 tokenizer (flores101 was already there, thanks to James Cross). These tokenizers replace the original 13a tokenizer with big multilingual SPM models. You can use them like this:

cat output.txt | sacrebleu -t wmt21 -l en-de -tok flores200

The default remains 13a. Why? Because we’ve always done it that way!

matt