Dataset creation for GEM in local language

A Shvets

Jun 7, 2021, 5:59:31 AM
to gem-benchmark
Dear GEM organizers,

The management of the company I work for is interested in creating an open-source dataset, similar to CommonGen, in French. I am therefore asking what requirements this dataset must meet in order to be included in the GEM benchmark.

Thank you in advance for your answer,

Sebastian Gehrmann

Jun 7, 2021, 8:55:27 AM
to A Shvets, gem-benchmark
Hi Anna, 

Our plan is to have a yearly selection process for GEM (similar to our initial one) that decides which datasets we want to focus on. This year's process will start after the workshop in August. While I do not dictate the criteria, I expect that we will prefer datasets with interesting and challenging test splits, good documentation, and non-English languages.

Besides that, even if a dataset is not selected for the main challenges, it is fairly easy to make a task compatible with our evaluation framework and with the one for creating challenge sets. This can be done independently of the selection process (and we are happy to help).

