1st CFP: Call for SemEval Task Proposals 2027

Ekaterina Kochmar
Jan 20, 2026, 6:50:19 AM
to ml-...@googlegroups.com

Introduction


We invite proposals for tasks to be run as part of SemEval-2027. SemEval (the International Workshop on Semantic Evaluation) is an ongoing series of evaluations of computational semantics systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics.


SemEval tasks investigate the nature of meaning in natural languages, exploring how to characterize and compute meaning. This is pursued in practical terms, using shared datasets and standardized evaluation metrics to quantify the strengths and weaknesses of possible solutions. SemEval tasks encompass a broad range of semantic topics, from the lexical level to the discourse level, including word sense identification, semantic parsing, coreference resolution, and sentiment analysis, among others.


For SemEval-2027, we welcome tasks that test an automatic system for semantic analysis of text (e.g., an intrinsic semantic evaluation or an application-oriented evaluation). We especially encourage tasks for languages other than English, cross-lingual tasks, and tasks that develop novel applications of computational semantics. See the websites of previous editions of SemEval to get an idea of the range of tasks explored, e.g., SemEval-2020 (http://alt.qcri.org/semeval2020/) and SemEval-2021 through SemEval-2026 (https://semeval.github.io).


We strongly encourage proposals based on pilot studies that have already generated initial data, evaluation measures, and baselines. In this way, we can avoid unforeseen challenges down the road that may delay the task. We suggest providing a reasonable baseline (e.g., a Transformer / LLM baseline for a classification task) in addition to the majority-vote / random-guess baseline.
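To make the baseline expectation concrete, here is a minimal sketch assuming a hypothetical classification task whose data ships as train.csv / dev.csv files with "text" and "label" columns (the file names and format are illustrative only). It compares a majority-class baseline with a TF-IDF + logistic regression model, which stands in for the stronger Transformer / LLM baseline a proposal would ideally provide:

    # Minimal baseline comparison for a hypothetical text classification task.
    # File names and column names ("text", "label") are illustrative assumptions.
    import pandas as pd
    from sklearn.dummy import DummyClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    train = pd.read_csv("train.csv")
    dev = pd.read_csv("dev.csv")

    # Trivial baseline: always predict the most frequent training label.
    majority = DummyClassifier(strategy="most_frequent")
    majority.fit(train["text"], train["label"])

    # A slightly stronger, cheap baseline; a fine-tuned Transformer or an
    # LLM-based system would typically be stronger still.
    tfidf_lr = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))
    tfidf_lr.fit(train["text"], train["label"])

    for name, model in [("majority class", majority), ("tfidf + logreg", tfidf_lr)]:
        preds = model.predict(dev["text"])
        print(f"{name}: macro-F1 = {f1_score(dev['label'], preds, average='macro'):.3f}")

The point is simply that the stronger baseline should leave visible headroom above the trivial one; reporting both numbers in the proposal helps show the task is neither already solved nor hopeless.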


In case you are not sure whether a task is suitable for SemEval, please feel free to get in touch with the SemEval organizers at <semevalo...@gmail.com> to discuss your idea.


Task Selection


Task proposals will be reviewed by experts, and reviews will serve as the basis for acceptance decisions. Everything else being equal, more innovative new tasks will be given preference over task reruns. Task proposals will be evaluated on:

  1. Novelty: Is the task on a compelling new problem that has not been explored much in the community? Is the task a rerun, but covering substantially new ground (new subtasks, new types of data, new languages, etc. - one addition is not sufficient)?

  2. Interest: Is the proposed task likely to attract a sufficient number of participants?

  3. Data: Are the plans for collecting data convincing? Will the resulting data be of high quality? Will annotations have meaningfully high inter-annotator agreements? Have all appropriate licenses for use and re-use of the data after the evaluation been secured? Have all international privacy concerns been addressed? Will the data annotation be ready on time?

  4. Evaluation: Is the methodology for evaluation sound? Is the necessary infrastructure available, or can it be built in time for the shared task? Will research inspired by this task be able to evaluate in the same manner and on the same data after the initial task? Is the task significantly challenging (e.g., room for improvement over the baselines)? 

  5. Impact: What is the expected impact of the data in this task on future research beyond the SemEval Workshop?

  6. Ethics: The data must comply with relevant privacy policies. For example:

    1. Avoid personally identifiable information (PII). Tasks aimed at identifying specific people will not be accepted.

    2. Avoid medical decision making (comply with HIPAA and do not attempt to replace medical professionals, especially where mental health is concerned).

    3. These examples are representative, not exhaustive.


Roles


  • Lead Organizer - the main point of contact; expected to ensure deliverables are met on time and to contribute to the task duties (see below).

  • Co-Organizers - provide significant contributions to ensuring the task runs smoothly. Examples include maintaining communication with task participants, preparing data, creating and running evaluation scripts, and leading paper reviewing and acceptance decisions.

  • Advisory Organizers - a more supervisory role; they may not contribute to the detailed work but will provide guidance and support.


New Tasks vs. Task Reruns


We welcome both new tasks and task reruns. For a new task, the proposal should address whether the task would be able to attract participants. Preference will be given to novel tasks that have not received much attention yet.


For reruns of previous shared tasks (whether or not the previous task was part of SemEval), the proposal should address the need for another iteration of the task. Valid reasons include: a new form of evaluation (e.g., a new evaluation metric, a new application-oriented scenario), new genres or domains (e.g., social media, domain-specific corpora), or a significant expansion in scale. We further discourage carrying over a previous task and just adding new subtasks, as this can lead to the accumulation of too many subtasks. Evaluating on a different dataset with the same task formulation, or evaluating on the same dataset with a different evaluation metric, typically should not be considered a separate subtask.


Task Organization


We welcome people who have never organized a SemEval task before, as well as those who have. Apart from providing a dataset, task organizers are expected to:

- Verify that the data annotations have sufficiently high inter-annotator agreement (a simple pairwise agreement check is sketched after this list).

- Verify licenses for the data allow its use in the competition and afterwards. In particular, text that is publicly available online is not necessarily in the public domain; unless a license has been provided, the author retains all rights associated with their work, including copying, sharing and publishing. For more information, see: https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter 

- Resolve any potential security, privacy, or ethical concerns about the data.

- Commit to making the data available after the task ends in a long-term repository under an appropriate license, preferably using Zenodo: https://zenodo.org/communities/semeval/

- Provide task participants with format checkers and standard scorers (a minimal scorer skeleton is sketched after this list).

- Provide task participants with baseline systems to use as a starting point (in order to lower the obstacles to participation). A baseline system typically contains code that reads the data, creates a baseline response (e.g., random guessing, majority class prediction), and outputs the evaluation results. Whenever possible, baseline systems should be written in widely used programming languages and/or should be implemented as a component for standard NLP pipelines.

- Create a mailing list and website for the task and post all relevant information there.

- Create a CodaLab or other similar competition for the task and upload the evaluation script.

- Manage submissions on CodaLab or a similar competition site.

- Write a task description paper to be included in SemEval proceedings, and present it at the workshop.

- Manage participants’ submissions of system description papers, manage participants’ peer review of each other’s papers, and possibly shepherd papers that need additional help in improving the writing.

- Review other task description papers.
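To make two of the duties above concrete, the following sketches assume a simple setup in which labels are stored one per line; the file names, label format, and thresholds are illustrative, not prescribed. First, a pairwise inter-annotator agreement check using Cohen's kappa:

    # Pairwise inter-annotator agreement between two hypothetical annotators,
    # each providing one label per line for the same items (illustrative format).
    from sklearn.metrics import cohen_kappa_score

    with open("annotator_a.txt") as fa, open("annotator_b.txt") as fb:
        labels_a = [line.strip() for line in fa if line.strip()]
        labels_b = [line.strip() for line in fb if line.strip()]

    kappa = cohen_kappa_score(labels_a, labels_b)
    # What counts as "sufficient" agreement is task-dependent; report and justify it.
    print(f"Cohen's kappa: {kappa:.3f}")

Second, a scorer skeleton laid out in the directory convention of classic CodaLab competitions (reference data under input/ref, the submission under input/res, scores written to output/scores.txt); Codabench and other platforms use different conventions, so treat this purely as a sketch:

    # evaluate.py -- scoring-program skeleton in the classic CodaLab layout (illustrative).
    # Usage: python evaluate.py <input_dir> <output_dir>
    import os
    import sys
    from sklearn.metrics import f1_score

    input_dir, output_dir = sys.argv[1], sys.argv[2]

    def read_labels(path):
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    gold = read_labels(os.path.join(input_dir, "ref", "gold.txt"))         # hidden reference labels
    pred = read_labels(os.path.join(input_dir, "res", "predictions.txt"))  # participant submission

    if len(gold) != len(pred):
        sys.exit("Submission length does not match the reference.")

    with open(os.path.join(output_dir, "scores.txt"), "w") as out:
        out.write(f"macro_f1: {f1_score(gold, pred, average='macro'):.4f}\n")

Releasing the same scorer to participants as the standard scorer helps avoid discrepancies between local development scores and the leaderboard.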




Desk Rejects


- To ensure tasks have sufficient support, we require a minimum of two organizers at the time of proposal submission. A task proposal with only one organizer will be desk-rejected. Running a SemEval task is a significant time commitment; therefore, we highly recommend that a task have at least three to four organizers.

- A person can be a lead organizer on only one task. The second mandatory organizer on the task must be committed to the task as a key co-organizer. Any other organizers (beyond the lead and co-organizer) can participate in other tasks.

- All data should have a research-friendly license. The licensing must be provided in the proposal.

- Task organizers must commit to keeping the data available after the task, either by keeping the task site alive or by uploading the data to Zenodo or another permanent public repository and sharing the link with the SemEval organizers.



=== Important dates ===


- Task proposals due 13 April 2026 (Anywhere on Earth)

- Task selection notification 25 May 2026


=== Preliminary timetable ===


- Sample data ready 15 July 2026

- Training data ready 1 September 2026

- Evaluation data ready 1 December 2026 (internal deadline; not for public release)

- Evaluation start 10 January 2027

- Evaluation end by 31 January 2027 (latest date; task organizers may choose an earlier date)

- Paper submission due February 2027

- Notification to authors March 2027

- Camera ready due April 2027

- SemEval workshop Summer 2027 (co-located with a major NLP conference)


Tasks that fail to keep up with crucial deadlines (such as the dates for having the task and CodaLab website up and dates for uploading sample, training, and evaluation data) may be cancelled at the discretion of SemEval organizers. While consideration will be given to extenuating circumstances, our goal is to provide sufficient time for the participants to develop strong and well-thought-out systems. Cancelled tasks will be encouraged to submit proposals for the subsequent year’s SemEval. To reduce the risk of tasks failing to meet the deadlines, we are unlikely to accept multiple tasks with overlap in the task organizers.



Submission Details


The task proposal should be a self-contained document of no longer than 3 pages (plus additional pages for references). All submissions must be in PDF format, following the ACL template: https://github.com/acl-org/acl-style-files 


Each proposal should contain the following:

- Overview

  - Summary of the task

  - Why this task is needed and which communities would be interested in participating

  - Expected impact of the task

- Data & Resources

  - How the training/testing data will be produced. Please discuss whether existing corpora will be reused.

  - Details of copyright and license, so that the data can be used by the research community both during the SemEval evaluation and afterwards

  - How much data will be produced

  - How data quality will be ensured and evaluated

  - An example of what the data would look like

  - Resources required to produce the data and prepare the task for participants (annotation cost, annotation time, computation time, etc.)

  - Assessment of any concerns with respect to ethics, privacy, or security (e.g., personally identifiable information of private individuals; potential for systems to cause harm)

- Pilot Task (strongly recommended)

  - Details of the pilot task

  - What lessons were learned, and how these will impact the task design

- Evaluation

  - The evaluation methodology to be used, including clear evaluation criteria

- For Task Reruns

  - Justification for why a new iteration of the task is needed (see criteria above)

  - What will differ from the previous iteration

  - Expected impact of the rerun compared with the previous iteration

- Task organizers

  - Names, affiliations, email addresses

  - (optional) brief description of relevant experience or expertise

  - (if applicable) years and task numbers of any SemEval tasks you have run in the past


Proposals will be reviewed by an independent group of area experts who may not be familiar with recent SemEval tasks; therefore, all proposals should be self-explanatory and contain sufficient examples.


The submission webpage is: [TBC]

 

=== Chairs ===


Debanjan Ghosh, Analog Devices, USA

Kai North, Cambium Assessment, USA

Shervin Malmasi, Amazon Inc., USA

Ekaterina Kochmar, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE

Mamoru Komachi, Hitotsubashi University, Japan

Marcos Zampieri, George Mason University, USA



Contact: semevalo...@gmail.com

Ekaterina Kochmar
Jan 20, 2026, 8:03:56 AM
to ml-...@googlegroups.com

First Call for Papers

The 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)

San Diego, California, United States and online

Thursday, July 2 and Friday, July 3, 2026

(co-located with ACL 2026)

https://sig-edu.org/bea/current

 Submission Deadline: Thursday, March 5, 2026, 11:59pm UTC-12

 

WORKSHOP DESCRIPTION

The BEA Workshop is a leading venue for NLP innovation in the context of educational applications. It is one of the largest one-day workshops in the ACL community, with over 100 registered attendees in each of the past several years. The growing interest in educational applications and the diverse community of researchers involved led to the creation of the Special Interest Group in Educational Applications (SIGEDU) in 2017, which currently has over 400 members.

The 21st BEA will be a 2-day workshop, with one in-person workshop day and one virtual workshop day. The workshop will feature oral presentation sessions and large poster sessions to facilitate the presentation of a wide array of original research. Moreover, there will be a panel discussion on “Transitioning from Academia to the EdTech Industry”, a half-day tutorial on “Theory of Mind and its Applications in Educational Contexts”, and two shared tasks, on Vocabulary Difficulty Prediction for English Learners and on Rubric-based Short Answer Scoring for German, each comprising an oral overview presentation by the shared task organizers and several poster presentations by the shared task participants.

 

The workshop will accept submissions of both full papers and short papers, eligible for either oral or poster presentation. We solicit papers that incorporate NLP methods, including, but not limited to:

· use of generative AI in education and its impact;

· automated scoring of open-ended textual and spoken responses;

· automated scoring/evaluation for written student responses (across multiple genres);

· game-based instruction and assessment;

· educational data mining;

· intelligent tutoring;

· collaborative learning environments;

· peer review;

· grammatical error detection and correction;

· learner cognition;

· spoken dialog;

· multimodal applications;

· annotation standards and schemas;

· tools and applications for classroom teachers, learners and/or test developers; and

· use of corpora in educational tools.

 

SHARED TASKS

 

Vocabulary Difficulty Prediction for English Learners

Organizers: Mariano Felice (British Council) and Lucy Skidmore (British Council).

Description: This shared task aims to advance research into vocabulary difficulty prediction for learners of English with diverse L1 backgrounds, an essential step towards custom content creation, computer-adaptive testing and personalised learning. In a context where traditional item calibration methods have become a bottleneck for the implementation of digital learning and assessment systems, we believe predictive NLP models can provide a more scalable, cost-effective solution. The goal of this shared task is to build regression models to predict the difficulty of English words given a learner’s L1. We believe this new shared task provides a novel approach to vocabulary modelling, offering a multidimensional perspective that has not been explored in previous work. To this aim, we will use the British Council’s Knowledge-based Vocabulary Lists (KVL), a multilingual dataset with psychometrically calibrated difficulty scores. We believe this unique dataset is not only an invaluable contribution to the NLP community but also a powerful resource that will enable in-depth investigations into how linguistic features, L1 background and contextual cues influence vocabulary difficulty.

 

For more information on how to participate and the latest updates, please refer to the shared task website: https://www.britishcouncil.org/data-science-and-insights/bea2026st
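Purely to illustrate the task formulation described above (the actual KVL data format, feature set, and official evaluation metric will be specified by the organizers), a word-difficulty regression that conditions on the learner's L1 might look roughly like this; the file names, column names, and features are assumptions:

    # Hypothetical sketch of a word-difficulty regression baseline; the column names
    # ("word", "l1", "difficulty") and features do not reflect the actual KVL format.
    import pandas as pd
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline

    train = pd.read_csv("train.csv")
    dev = pd.read_csv("dev.csv")

    def featurize(row):
        # Simple surface features plus the learner's L1 as a categorical feature.
        word = row["word"]
        return {"length": len(word),
                "vowel_ratio": sum(c in "aeiou" for c in word.lower()) / max(len(word), 1),
                "l1": row["l1"]}

    model = make_pipeline(DictVectorizer(), Ridge())
    model.fit([featurize(r) for _, r in train.iterrows()], train["difficulty"])
    preds = model.predict([featurize(r) for _, r in dev.iterrows()])
    print(f"RMSE: {mean_squared_error(dev['difficulty'], preds) ** 0.5:.3f}")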

 

Rubric-based Short Answer Scoring for German

Organizers: Sebastian Gombert (DIPF), Zhifan Sun (DIPF), Fabian Zehner (DIPF), Jannik Lossjew (IPN), Tobias Wyrwich (IPN), Berrit Katharina Czinczel (IPN), David Bednorz (IPN), Sascha Bernholt (IPN), Knut Neumann (IPN), Ute Harms (IPN), Aiso Heinze (IPN), and Hendrik Drachsler (DIPF)

 

Description: Short answer scoring is a well-established task in educational natural language processing. In this shared task, we introduce and focus on rubric-based short-answer scoring, a task formulation in which models are provided with a question, a student answer, and a textual scoring rubric that specifies criteria for each possible score level. Successfully solving this task requires models to interpret the semantics of scoring rubrics and apply their criteria to previously unseen answers, closely mirroring how human raters assign scores in educational assessment. Although rubrics have been used as auxiliary information in prior work on free-text scoring and LLM-based approaches, there has been little focused investigation of rubric-based short-answer scoring as a task in its own right. This setting poses distinct challenges, including ambiguous or underspecified rubric criteria and a wide range of valid student responses. With this shared task, we aim to stimulate systematic research on rubric-based scoring, assess how well current NLP methods can reason over rubrics, and identify promising modeling strategies. Additionally, by providing a German-language dataset, the shared task contributes a new non-English benchmark to the field.

 

For more information on how to participate and the latest updates, please refer to the shared task website: https://edutec.science/bea-2026-shared-task/
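As a purely illustrative sketch of the rubric-based formulation described above (the field names, prompt wording, and any official baseline are assumptions, not the organizers' setup), a participant system might assemble a rubric-conditioned prompt for an instruction-tuned model as follows; the model call itself is left to the participant:

    # Illustrative prompt construction for rubric-based short answer scoring.
    # Field names and prompt wording are assumptions, not the official task setup.

    def build_prompt(question: str, rubric: str, answer: str) -> str:
        """Assemble a grading prompt that conditions on the scoring rubric."""
        return (
            "You are grading a short answer against a scoring rubric.\n\n"
            f"Question:\n{question}\n\n"
            f"Rubric (criteria per score level):\n{rubric}\n\n"
            f"Student answer:\n{answer}\n\n"
            "Return only the score level that best matches the rubric."
        )

    if __name__ == "__main__":
        # Toy German example ("Why does ice float on water?"); the actual model call
        # (API or locally hosted LLM) is left to participants.
        print(build_prompt(
            question="Warum schwimmt Eis auf Wasser?",
            rubric="2: nennt die geringere Dichte von Eis im Vergleich zu Wasser; "
                   "1: nennt Dichte ohne Vergleich; 0: sonst.",
            answer="Weil Eis eine geringere Dichte als Wasser hat.",
        ))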

 

TUTORIAL

 

Theory of Mind and its Applications in Educational Contexts

Organizers: Effat Farhana (Auburn University), Maha Zainab (Auburn University), Qiaosi Wang (Carnegie Mellon University), Niloofar Mireshghallah (Carnegie Mellon University), Ramira van der Meulen (Leiden University), Max van Duijn (Leiden University).

Description: This tutorial examines the integration of Theory of Mind (ToM) into AI-driven online tutoring systems, focusing on how advanced technologies, such as Large Language Models (LLMs), can model learners’ cognitive and emotional states to provide adaptive, personalized feedback. Participants will learn foundational principles of ToM from cognitive science and psychology and how these concepts can be operationalized in AI systems. We will discuss mutual ToM, where both AI tutors and learners maintain models of each other’s mental states, and address challenges such as detecting learner misconceptions, modeling meta-cognition, and maintaining privacy in data-driven tutoring. The tutorial also presents hands-on demonstrations of Machine ToM applied to programming education using datasets such as CS1QA and CodeQA, which contain Java and Python samples. By combining conceptual foundations, research insights, and practical exercises, this tutorial provides a comprehensive overview of designing human-centered, ethically aware, and cognitively informed AI tutoring systems.

 

 

IMPORTANT DATES

All deadlines are 11:59 pm UTC-12 (Anywhere on Earth).

· Submission deadline: Thursday, March 5, 2026

· Notification of acceptance: Tuesday, April 28, 2026

· Camera-ready papers due: Tuesday, May 12, 2026

· Workshop: Thursday, July 2, and Friday, July 3, 2026 

SUBMISSION INFORMATION

We will be using the ACL Submission Guidelines for the BEA Workshop this year. Authors are invited to submit a long paper of up to eight (8) pages of content, plus unlimited references; final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers’ comments can be taken into account. We also invite short papers of up to four (4) pages of content, plus unlimited references. Upon acceptance, short papers will be given five (5) content pages in the proceedings; authors are encouraged to use this additional page to address reviewers’ comments in their final versions. We generally follow the ACL submission guidelines and require that all submitted papers include a dedicated "Limitations" section, which does not count toward the page limit.

Authors of papers that describe systems are also invited to give a demo of their system. If you would like to present a demo in addition to presenting the paper, please make sure to select either “long paper + demo” or “short paper + demo” under “Submission Category” on the START submission page.

Previously published papers cannot be accepted. The submissions will be reviewed by the program committee. As reviewing will be blind, please ensure that papers are anonymous. Self-references that reveal the author’s identity, e.g., “We previously showed (Smith, 1991) …”, should be avoided. Instead, use citations such as “Smith previously showed (Smith, 1991) …”.

We have also included a conflict-of-interest section in the submission form. You should mark all potential reviewers who have been authors on the paper, who are from the same research group or institution, or who have seen versions of this paper or discussed it with you.

We will be using the START conference system to manage submissions. The link will be provided soon.

 

DOUBLE SUBMISSION POLICY

We will follow the official ACL double-submission policy. Specifically, papers being submitted both to BEA and another conference or workshop must:

· Note on the title page the other conference or workshop to which they are being submitted.

· State on the title page that if the authors choose to present their paper at BEA (assuming it was accepted), then the paper will be withdrawn from other conferences and workshops.

 

ORGANIZING COMMITTEE

· Ekaterina Kochmar, MBZUAI

· Andrea Horbach, Hildesheim University

· Ronja Laarmann-Quante, Ruhr University Bochum

· Marie Bexte, FernUniversität in Hagen

· Anaïs Tack, KU Leuven, imec

· Victoria Yaneva, National Board of Medical Examiners

· Bashar Alhafni, MBZUAI

· Zheng Yuan, University of Sheffield

· Jill Burstein, Duolingo

· Stefano Bannò, Cambridge University

Workshop contact email address: bea.nlp....@gmail.com

 

 

PROGRAM COMMITTEE

Tazin Afrin; David Alfter; Bashar Alhafni; Maaz Amjad; Nischal Ashok Kumar; Stefano Bannò; Michael Gringo Angelo Bayona; Lee Becker; Beata Beigman Klebanov; Luca Benedetto; Bhavya Bhavya; Serge Bibauw; Ted Briscoe; Dominique Brunato; Jie Cao; Dan Carpenter; Jeevan Chapagain; Guanliang Chen; Mei-Hua Chen; Christopher Davis; Orphee De Clercq; Kordula De Kuthy; Jasper Degraeuwe; Dushyanta Dhyani; Yuning Ding; Rahul Divekar; Kosuke Doi; Mohsen Dorodchi; Yo Ehara; Hamza El Alaoui; Sarra El Ayari; Andrew Emerson; Yao-Chung Fan; Mariano Felice; Nigel Fernandez; Michael Flor; Thomas François; Thomas Gaillat; Ananya Ganesh; Ritik Garg; Sebastian Gombert; Samuel González López; Cyril Goutte; Abigail Gurin Schleifer; Na-Rae Han; Ching Nam Hang; Jiangang Hao; Aki Härmä; Hasnain Heickal; Chieh-Yang Huang; Chung-Chi Huang; Radu Tudor Ionescu; Elsayed Issa; N J Karthika; Anisia Katinskaia; Elma Kerz; Fazel Keshtkar; Grandee Lee; Ji-Ung Lee; Arun Balajiee Lekshmi Narayanan; Jiazheng Li; Anastassia Loukina; Wanjing Anya Ma; Jakub Macina; Lieve Macken; Nitin Madnani; Arianna Masciolini; Detmar Meurers; Michael Mohler; Phoebe Mulcaire; Ricardo Muñoz Sánchez; Sungjin Nam; Diane Napolitano; Huy Nguyen; S Jaya Nirmala; Sergiu Nisioi; Michael Noah-Manuel; Adam Nohejl; Amin Omidvar; Daniel Oyeniran; Robert Östling; Ulrike Pado; Yannick Parmentier; Ted Pedersen; Mengyang Qiu; Martí Quixal; Chatrine Qwaider; Arjun Ramesh Rao; Vivi Peggie Rantung; Manikandan Ravikiran; Hanumant Redkar; Robert Reynolds; Saed Rezayi; Frankie Robertson; Aiala Rosá; Andreas Säuberli; Nicy Scaria; Ronald Seoh; Pritam Sil; Astha Singh; Lucy Skidmore; Maja Stahl; Katherine Stasaski; Helmer Strik; Hakyung Sung; Sowmya Vajjala; Elena Volodina; Nikhil Wani; Alistair Willis; Fabian Zehner.

 
