
Important UPDATES/EXTENSION: ClinSpEn medical machine translation sub-track (Biomedical WMT Task, EMNLP 2022)


Antonio Miranda

Aug 4, 2022, 8:41:46 AM
Important UPDATES/EXTENSION: ClinSpEn sub-track (Biomedical WMT Task, EMNLP 2022)

Machine Translation of Clinical cases, ontologies & medical entities: Spanish - English

https://temu.bsc.es/clinspen/

The evaluation period has been extended, the test and background data are available on Zenodo, and submissions are open on CodaLab.

The ClinSpEn track of the Biomedical WMT 2022 shared task tries to address a pressing need and emerging research topic related to the development and exploitation of multilingual clinical NLP and text mining applications.

Recent advances in neural machine translation (MT) approaches adapted to specific domains and text genres have produced promising results that facilitate the processing of healthcare and clinical data beyond language silos.

The ClinSpEn sub-track tries to promote the use of advanced machine translation technologies applied to three high impact healthcare application scenarios:

(1) automatic translation of clinical case documents, which is important for examining how MT could be further applied to clinical records

(2) automatic translation of clinical terms and entity mentions extracted directly from medical records and literature to improve multilingual semantic annotation technologies

(3) automatic translation of ontologies and controlled vocabulary concepts, which is of utmost importance for multilingual data and concept normalization



These three scenarios will be addressed by three specific benchmark data collections used for evaluation purposes by the ClinSpEn biomedical WMT track:

ClinSpEn-CC (Clinical Cases): EN>ES translation of clinical case documents.

ClinSpEn-CT (Clinical Terms): ES>EN translation of clinical terms and entity mentions extracted from records and literature.

ClinSpEn-OC (Ontology Concepts): EN>ES translation of highly used open clinical controlled vocabularies and ontology concepts.



Important links:

ClinSpEn web: https://temu.bsc.es/clinspen/

Biomedical WMT web: https://statmt.org/wmt22/biomedical-translation-task.html

WMT2022: https://statmt.org/wmt22/

EMNLP conference: https://2022.emnlp.org/

Data (NEW!):

Clinical Cases: https://doi.org/10.5281/zenodo.6497350

Clinical Terms: https://doi.org/10.5281/zenodo.6497372

Ontology Concepts: https://doi.org/10.5281/zenodo.6497388

CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696

Team Registration (mandatory): https://temu.bsc.es/clinspen/registration/

For the ClinSpEn track, professional medical translators have produced Gold Standard manual translations against which participating teams will be evaluated. The primary evaluation metric for this track will be SacreBLEU.
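For teams who want a rough sanity check before submitting, the idea behind a BLEU-style score can be sketched in plain Python. This is only a simplified illustration, not the SacreBLEU tool itself: it assumes whitespace tokenization, one reference per segment and no smoothing, so its numbers will not match official SacreBLEU scores. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions: whitespace tokenization, exactly one reference per
    hypothesis, no smoothing (any n-gram order with zero matches gives 0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Hypothetical system output and reference translation.
system_output = ["the patient was admitted with chest pain"]
gold_standard = ["the patient was admitted with severe chest pain"]
print(round(corpus_bleu(system_output, gold_standard), 2))
```

Official rankings depend on the organizers' SacreBLEU configuration, so a score from this sketch is only useful for comparing two of your own systems on the same data.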

Participants will also have access to a larger background collection to promote scalability and robustness assessment of machine translation technology.


Updated schedule:

Participant Predictions Due: August 30th, 2022 (UPDATED EXTENSION!)

Paper Submission: September 7th, 2022

Acceptance notification: October 9th, 2022

Camera-ready version: October 16th, 2022

WMT workshop at EMNLP: December 7th and 8th, 2022





Publications and workshop


Participating teams will be invited to contribute a system description paper for the WMT 2022 Working Notes proceedings. The workshop will be part of the prestigious EMNLP 2022 conference. More information on paper specifications, formatting guidelines and the review process is available at: https://statmt.org/wmt22/index.html.




ClinSpEn Track Organizers

Salvador Lima-López (BSC)

Darryl Johan Estrada (BSC)

Eulàlia Farré-Maduell (BSC)

Martin Krallinger (BSC)


Biomedical WMT Organizers

Rachel Bawden (University of Edinburgh, UK)

Giorgio Maria Di Nunzio (University of Padua, Italy)

Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)

Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)

Cristian Grozea (Fraunhofer Institute, Germany)

Antonio Jimeno Yepes (University of Melbourne, Australia)

Salvador Lima-López (Barcelona Supercomputing Center, Spain)

Martin Krallinger (Barcelona Supercomputing Center, Spain)

Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)

Mariana Neves (German Federal Institute for Risk Assessment, Germany)

Roland Roller (DFKI, Germany)

Amy Siu (Beuth University of Applied Sciences, Germany)

Philippe Thomas (DFKI, Germany)

Federica Vezzani (University of Padua, Italy)

Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)

Dina Wiemann (Novartis, Switzerland)

Lana Yeganova (NCBI/NLM/NIH, USA)
