BLEU and ROUGE scores for DART

Nivranshu Pasricha

Mar 29, 2021, 6:14:52 AM
to gem-benchmark
Hi,

Can you please confirm whether the results for DART in Table 2 of the GEM paper are correctly reported? The BLEU and ROUGE scores are reported as 0.02 and 0.0 respectively for the BART model, with similar scores for the T5 model.

However, adapting the starter code (with the BART-base model) to the DART dataset yields a ROUGE-2 score of 44.7 and a ROUGE-L score of 54.9 on the validation set, while the BLEU score from sacrebleu is 44.18. Running the evaluation script from gem-metrics gives slightly different scores, which can be seen here.
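
For reference, here is a minimal sketch of what the ROUGE-2 and ROUGE-L numbers above measure. It is not the gem-metrics or starter-code implementation: it assumes plain whitespace tokenization, a single reference per prediction, and F-scores reported on a 0-100 scale. The function names (`rouge_n`, `rouge_l`, and the helpers) are hypothetical and chosen only for illustration. Real evaluation scripts typically differ in tokenization, stemming, and multi-reference handling, which can account for small differences in scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=2):
    """ROUGE-N F-score (0-100) from clipped n-gram overlap.

    Assumption: whitespace tokenization and a single reference.
    """
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((pred & ref).values())  # counts clipped to the minimum
    return 100 * f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F-score (0-100) based on the longest common subsequence."""
    pred = prediction.split()
    ref = reference.split()
    return 100 * f1(lcs_length(pred, ref), len(pred), len(ref))
```

With this sketch, an exact match scores 100 on both metrics and a prediction that shares no tokens with the reference scores 0, so corpus-level averages near 0.0 would mean almost no overlap between predictions and references.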

Regards,
Nivranshu

