BLEU and ROUGE scores for DART

Nivranshu Pasricha

Mar 29, 2021, 6:14:52 AM

to gem-benchmark

Hi,

Could you please confirm whether the DART results in Table 2 of the GEM paper are reported correctly? For the BART model, the BLEU and ROUGE scores are reported as 0.02 and 0.0 respectively, and the T5 scores are similar.

However, when I adapt the starter code (with the BART-base model) to the DART dataset, I get a ROUGE-2 score of 44.7, a ROUGE-L score of 54.9, and a BLEU score of 44.18 from sacrebleu on the validation set. Running the evaluation script from gem-metrics gives slightly different scores, which can be seen here.
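For anyone wanting to sanity-check the scale of these numbers, here is a minimal pure-Python sketch of how ROUGE-2 and ROUGE-L F-measures are typically computed for a single prediction/reference pair. It is not the gem-metrics or starter-code implementation: the function names and the toy sentences are hypothetical, and it assumes plain whitespace tokenization with no stemming, so its values will not match the official scorers exactly.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=2):
    """ROUGE-N F-measure from clipped n-gram overlap (whitespace tokens)."""
    pred = ngram_counts(prediction.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum counts
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F-measure based on the longest common subsequence."""
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    return f_measure(lcs_length(pred_tokens, ref_tokens),
                     len(pred_tokens), len(ref_tokens))


if __name__ == "__main__":
    # Toy example, not taken from DART.
    prediction = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    # Scores are fractions in [0, 1]; multiply by 100 for the 0-100 scale.
    print(round(100 * rouge_n(prediction, reference, n=2), 1))
    print(round(100 * rouge_l(prediction, reference), 1))
```

The functions return fractions between 0 and 1, so they need to be multiplied by 100 before being compared with figures such as 44.7 or 54.9.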

Regards,

Nivranshu
