V0.5.0 - 11/29/2012
* Bug fix in the BLEU implementation affecting only scoring with multiple reference translations (see below)
IMPORTANT: Scores from MultEval BLEU 0.5.0 are *NOT* comparable to previous versions.
Please score all of your experiments with a consistent version of all metrics.
NOTE: Jon rescored several results using the fixed version of BLEU and the differences
between systems remained virtually unchanged, even though the absolute scores changed.
* Added ability to produce sentence-level scores via the --sentLevelDir option (see the example invocation below)
* More verbose output for BLEU
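For example, sentence-level scores might be requested with an invocation along
these lines (file names here are placeholders, and the flags other than
--sentLevelDir reflect MultEval's usual "eval" usage):

    ./multeval.sh eval --refs refs.tok.en.* \
                       --hyps-baseline hyps.baseline.tok.en.* \
                       --meteor.language en \
                       --sentLevelDir sentScores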
Examples of the BLEU bug fix's effects on an Arabic-English 4-reference set ("+/-" denotes no meaningful difference):
=================== V0.4.3 =================== ||| ============== V0.5.0 =============== ||| == Comparison ===
Set    | Baseline | Experimental | Improvement ||| Baseline | Experimental | Improvement ||| Improvement Delta
MT08nw |     47.8 |         47.8 |         +/- |||     48.3 |         48.4 |         +/- |||               0.0
MT08wb |     30.5 |         31.0 |        +0.5 |||     31.2 |         31.5 |        +0.3 |||               0.2
MT09nw |     51.6 |         51.5 |         +/- |||     53.2 |         53.1 |         +/- |||               0.0
MT09wb |     31.6 |         32.3 |        +0.7 |||     33.5 |         34.1 |        +0.6 |||               0.1
This same trend held in several Chinese-English experiments with multiple
references: absolute scores increased while relative differences remained nearly identical.
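For background, the multiple-reference case is exactly where BLEU does
per-reference bookkeeping: each hypothesis n-gram's count is clipped to the
maximum count observed in any single reference, and the brevity penalty uses
the reference length closest to the hypothesis length. The Python sketch below
illustrates those two standard components (Papineni et al., 2002); it is an
illustration only, not MultEval's implementation, and does not reproduce the
specific bug that was fixed.

    from collections import Counter

    def ngrams(tokens, n):
        # All n-grams of a token list, as a multiset.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def clipped_matches(hyp, refs, n):
        # Modified n-gram precision counts: each hypothesis n-gram is
        # credited at most as many times as it appears in any ONE reference.
        hyp_counts = ngrams(hyp, n)
        max_ref = Counter()
        for ref in refs:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        matched = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        return matched, sum(hyp_counts.values())

    def closest_ref_len(hyp_len, ref_lens):
        # Brevity penalty uses the reference length closest to the
        # hypothesis length; ties go to the shorter reference.
        return min(ref_lens, key=lambda r: (abs(r - hyp_len), r))

    hyp = "the cat sat on the mat".split()
    refs = ["the cat is on the mat".split(),
            "there is a cat on the mat".split()]
    print(clipped_matches(hyp, refs, 1))  # -> (5, 6): matched/total unigrams
    print(closest_ref_len(len(hyp), [len(r) for r in refs]))  # -> 6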