Missing references in DART Test set

Nivranshu Pasricha

Jun 9, 2021, 2:15:35 PM
to gem-benchmark

It looks like some targets/references are missing from the DART test set. The examples whose references are an empty list have a 'dart_id' in the range 1391-3252 (inclusive).
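
A check like the one described above could be sketched roughly as follows. This is only a minimal illustration, assuming each example is a dict-like record with a 'dart_id' field and a 'references' list; the 'references' key name, both helper functions, and the toy records are assumptions for illustration, not the actual DART data or GEM loading code.

```python
def find_missing_references(examples, id_key="dart_id", refs_key="references"):
    """Return the sorted ids of examples whose reference list is empty or absent.

    `id_key` and `refs_key` are assumed field names; adjust them to the data.
    """
    missing = []
    for example in examples:
        # An empty list, None, or a missing key all count as "no references".
        if not example.get(refs_key):
            missing.append(example[id_key])
    return sorted(missing)


def id_span(ids):
    """Return (lowest id, highest id), or None when there are no ids."""
    return (min(ids), max(ids)) if ids else None


# Toy records for illustration only, not actual DART test examples.
toy_examples = [
    {"dart_id": 1390, "references": ["A reference sentence."]},
    {"dart_id": 1391, "references": []},
    {"dart_id": 1392, "references": []},
    {"dart_id": 3253, "references": ["Another reference sentence."]},
]

missing_ids = find_missing_references(toy_examples)
print(missing_ids, id_span(missing_ids))
```

Running the same function over the loaded test split (however it is loaded) would list every affected 'dart_id' and the span they cover.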

Nivranshu Pasricha

Jun 9, 2021, 2:48:29 PM
to gem-benchmark
There are missing references in the CommonGen test set and challenge set too.

I am using the gem_metrics evaluation script to compute the scores for our system description, but I cannot get correct scores because the references are missing. Would it be possible for the GEM benchmark organisers to provide results on our submissions before the system description deadline? We submitted outputs on the test set and the challenge sets a few days ago, on May 31st.


Sebastian Gehrmann

Jun 10, 2021, 9:49:08 AM
to Nivranshu Pasricha, gem-benchmark
Hi Nivranshu, 

Sorry about the issues with DART. The file in the original repository was changed, which means that the examples don't match up anymore. This is something we cannot easily fix, but I am currently looking into the issue. CommonGen and ToTTo have private test sets, so we unfortunately cannot distribute any information here.

For your system description, are you able to get scores on the validation set? If not, please open an issue on the metrics repository and we will try to help / run the evaluation for you. 


Nivranshu Pasricha

Jun 10, 2021, 1:07:08 PM
to gem-benchmark
Thanks for your reply, Sebastian. Yes, I can get the scores on the validation set and I will add these to our system description. A few weeks ago I posted an issue about BERTScore in the evaluation script, but I have it working fine now. Thanks!