Final CfP for the Workshop on Generation, Evaluation, and Metrics (GEM) at ACL ’21


Sebastian Gehrmann

Apr 22, 2021, 3:37:46 PM

Final call for papers and shared task submissions for the Workshop on Generation, Evaluation, and Metrics (GEM) at ACL ’21

Update April 22: The paper submission deadline has been extended to May 3! Please submit your papers at this SoftConf link. The shared task submission deadline is May 14.

Update March 29: We have released our challenge sets! You can inspect and load them using HuggingFace Datasets or TFDS. For details, please see our updated writeup.


Call for Participation


Natural language generation is one of the most active research fields in NLP. As such, the number of available datasets, metrics, models, and evaluation strategies is rising rapidly. Consequently, new models are often evaluated on different Anglo-centric tasks with incompatible evaluation setups. With GEM, we aim to tackle this problem by standardizing and improving the corpora on which NLG models are evaluated, and by supporting the development of better evaluation approaches. In our shared task, models will be applied to a wide set of NLG tasks. These tasks cover challenges that measure specific generation aspects, such as content selection and planning, surface realization, paraphrasing, and simplification. To avoid hill-climbing on automated metrics, a second part of the shared task focuses on an in-depth analysis of submitted model outputs using both human and automatic evaluation, with the aim of uncovering shortcomings and opportunities for progress. The GEM Workshop is a SIGGEN-endorsed event.


Shared Tasks


The shared task is described in depth here:

It includes two parts:

  1. In the first part, participants are encouraged to apply their models to as many of the included tasks as possible and submit their formatted outputs. We provide GEM-specific test sets that will be used to evaluate specific generation aspects.

  2. In the second part, all submitted and baseline outputs will be released for an evaluation shared task. Participants can submit analyses and evaluations of the model outputs.

During the GEM workshop, shared task participants will come together to discuss their findings which will inform future iterations of GEM.


Call for Papers


All papers are allowed unlimited space for references and appendices. For papers associated with the shared task, we also strongly encourage publishing the code used to generate the results. We ask for papers in the following categories:

- System Descriptions

Participants of the modeling shared task are invited to submit a system description of 4-8 pages.

- System Evaluation Descriptions

Participants of the evaluation shared task are invited to submit a paper of 4-8 pages describing their analysis approach and findings.

- Research Papers

We welcome papers discussing any of the following topics:

  • Automatic evaluation of NLG systems

  • Creating challenge sets for NLG corpora

  • Critiques of benchmarking efforts (including ours)

  • Crowdsourcing strategies to improve the inclusiveness of NLG research

  • Measuring progress in NLG / What should GEM 2.0 look like?

  • Modeling and data-augmentation strategies for training effective and/or efficient NLG systems that can be applied to a wide range of tasks

  • Standardizing human evaluation and making it more robust

We additionally invite every group that contributed to the creation and organization of GEM to submit a description of their considerations and contributions.

These submissions can take either of the following forms:

  • Archival Papers: Papers describing original and unpublished work can be submitted in either a short (4-page) or a long (8-page) format.

  • Non-Archival Abstracts: To discuss work already presented or under review at a peer-reviewed venue, we allow the submission of 2-page abstracts.

Please note that we are not looking for submissions that focus on specific modeling challenges or introduce new model architectures, which would be a better fit for conferences such as ACL or INLG.


All submissions should conform to ACL 2021 style guidelines. Archival long and short paper submissions must be anonymized. Abstracts and shared task submission descriptions should include author information. Please submit your papers at the SoftConf link.


Important Dates



✅ February 2 First Call for Shared Task Submissions and Papers, Release of the Training Data

May 3 (extended from April 26) Workshop Paper Due Date (excl. shared tasks)

May 28 Notification of Acceptance (excl. shared tasks)

June 7 Camera-ready papers due (excl. shared tasks)

Shared Task Dates


February 2 Release of the training data

March 29 Release of the test sets

May 14 Modeling submissions due


April 2 (moved from March 29) Release of the baseline outputs

May 17 Release of the submission outputs

System Descriptions and Analyses

June 11 System Descriptions and Analyses due

June 25 Notification of Acceptance (shared task)

July 9 Camera-ready papers and task descriptions due

August 5-6 Workshop Dates




The workshop is organized by

The shared task and the GEM environment are organized by a larger team, which is listed on this page.
