Final CFP: Multilingual Automatic Clinical Corpus Generation and Entity Extraction
https://temu.bsc.es/MultiClinAI/
*Updates: Evaluation library released
MultiClinAI is the first shared task focused on (1) the automatic creation of comparable multilingual corpora and (2) the automatic detection of key clinical concepts (diseases, symptoms, and procedures) in seven languages: Spanish, English, Italian, Dutch, Romanian, Swedish and Czech. MultiClinAI will be held as part of the #SMM4H-HeaRD Workshop at the ACL 2026 conference (online).
Key information:
Annotation guidelines: https://zenodo.org/records/13151040
Registration: https://temu.bsc.es/MultiClinAI/registration/
*Evaluation Library: https://github.com/nlp4bia-bsc/MultiClinAIEval
Motivation
Despite recent progress in clinical language technology, few high-quality annotated corpora, datasets, and annotation guidelines are available for training and evaluating NLP- or LLM-based clinical entity recognition systems beyond English.
There is a need to foster the creation of annotated datasets in multiple languages that follow the same annotation criteria, so that both the resulting labeled datasets and the entity extraction systems built on them are comparable across languages. Developing multilingual models helps reduce linguistic bias and improves the global applicability of clinical language technologies. Such models enable more equitable AI deployment across different regions and healthcare systems.
Multilingual clinical NLP has numerous important use cases, including:
- In international clinical trials, it can be used to extract structured data from trial sites across different countries and to ensure consistent outcome definitions across languages.
- For cohort identification, it enables the identification of eligible patients from unstructured electronic health records (EHRs) and the extraction of phenotypes for observational studies.
- In disease surveillance, multilingual systems can help detect rare diseases or emerging health trends and identify post-marketing drug safety signals.
In this context, the MultiClinAI (Multilingual Clinical Entity Annotation Projection and Extraction) shared task addresses the creation and evaluation of comparable multilingual clinical resources across seven languages, focusing on three key entity types: diseases, symptoms, and procedures. It comprises two subtasks:
- MultiClinNER subtask: multilingual clinical named entity recognition across expert-annotated gold-standard datasets.
- MultiClinCorpus subtask: automatic generation of comparable multilingual clinical corpora through annotation projection techniques.
This setup will enable a robust benchmarking scenario for multilingual clinical NLP approaches.
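For readers unfamiliar with annotation projection, the idea can be illustrated with a minimal sketch. It assumes token-level entity spans and a precomputed source-to-target word alignment; the function name, the span format, and the label string are hypothetical and are not taken from the task data or from the official evaluation library.

```python
from typing import Dict, List, Tuple

# (start_token, end_token_exclusive, label); the format is an assumption for illustration
Span = Tuple[int, int, str]


def project_spans(source_spans: List[Span],
                  alignment: Dict[int, List[int]]) -> List[Span]:
    """Project entity spans from source tokens onto target tokens.

    `alignment` maps each source token index to the target token
    indices it is aligned with (e.g. produced by a word aligner).
    """
    projected: List[Span] = []
    for start, end, label in source_spans:
        # Collect every target token aligned to a source token inside the span.
        targets = [t for s in range(start, end) for t in alignment.get(s, [])]
        if not targets:
            continue  # no aligned tokens: the entity cannot be projected
        # Keep the smallest contiguous window that covers all aligned tokens.
        projected.append((min(targets), max(targets) + 1, label))
    return projected


# Toy example (invented sentence pair, not task data).
en = ["The", "patient", "has", "severe", "chest", "pain"]
es = ["El", "paciente", "tiene", "dolor", "torácico", "intenso"]
alignment = {0: [0], 1: [1], 2: [2], 3: [5], 4: [4], 5: [3]}

spans_es = project_spans([(3, 6, "SYMPTOM")], alignment)
for start, end, label in spans_es:
    print(label, " ".join(es[start:end]))
```

Real projection pipelines typically also have to handle noisy or missing alignments and discontinuous mentions; this sketch simply drops unaligned entities and takes the covering window.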
Schedule
MultiClinAI Shared Task – training set release (February 6, 2026)
MultiClinNER test set release (March 18, 2026)
MultiClinNER test set prediction submissions (March 25, 2026)
MultiClinCorpus test set release (March 27, 2026)
MultiClinCorpus test set prediction submissions (April 9, 2026)
Results / evaluation returned to teams (April 14, 2026)
Participant proceedings due (April 24, 2026)
Notification of acceptance (May 15, 2026)
Camera-ready papers due (May 25, 2026)
ACL Proceedings due (hard deadline) (June 1, 2026)
Workshop (online) (July 2–3, 2026)
Publications and the SMM4H-HeaRD Workshop at ACL 2026
Teams participating in MultiClinAI will be invited to contribute a system description paper to the ACL 2026 Working Notes proceedings and to give a short online presentation of their approach at the ACL 2026 workshop (online).
Main Organizers
Salvador Lima-López, Barcelona Supercomputing Center (BSC), Spain.
Fernando Gallego-Donoso, Barcelona Supercomputing Center (BSC), Spain.
Jan Rodríguez-Miret, Barcelona Supercomputing Center (BSC), Spain.
Judith Rosell, Barcelona Supercomputing Center (BSC), Spain.
Martin Krallinger, Barcelona Supercomputing Center (BSC), Spain.
Scientific Committee
Francisco M. Couto, Universidade de Lisboa, Portugal.
Ulf Leser, Humboldt-Universität zu Berlin, Germany.
Guergana Savova, Boston Children’s Hospital, United States.
Lourdes Araujo, Universidad Nacional de Educación a Distancia, Spain.
Pavel Pecina, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
Halil Kilicoglu, University of Illinois at Urbana-Champaign, United States.
Rodrigo Agerri, HiTZ Centre of the University of the Basque Country, Spain.