CFP MESINESP2 track

José María Fernández

Apr 12, 2021, 11:44:35 AM

to biohac...@googlegroups.com

Hi everyone,
A colleague has asked me to distribute the announcement of a Spanish-language text-mining challenge related to DeCS (i.e. the Spanish version of MeSH).

*** CFP MESINESP2 track: Medical Semantic Indexing (BioASQ – CLEF 2021) ***

https://temu.bsc.es/mesinesp2/ (under maintenance until April 14th 19:00 CEST)

Plan TL Award for MESINESP2

There is a pressing need to improve information access, retrieval, classification, semantic annotation as well as integration across multiple document types, in particular for health-related content such as literature, clinical trials and medicinal patents.

This is especially true for multilingual content from heterogeneous sources (cross-genre). For instance, many of the early COVID-19 case reports were published in a variety of languages, a considerable fraction of them not in English.

Given the significant practical impact of advanced semantic indexing technologies in health, and the direct interest in the results from collaborating international and national healthcare organizations (BIREME/WHO, ISCIII/Spain), we are organizing the MESINESP2 shared task in collaboration with the well-established BioASQ (CLEF2021) initiative.

A variety of complementary strategies have been explored so far for semantic indexing of health content, including (extreme) multi-label classification, multilingual X-BERT, transformers, graph matching, text similarity, string matching/term indexing, named entity recognition and machine translation components.
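To make the simplest of these strategies concrete, here is a minimal sketch of string matching/term indexing. It is only an illustration, not the organizers' baseline: the vocabulary, its codes (T001, T002, T003) and the function names are hypothetical placeholders, not real DeCS entries or an official API.

```python
import re
import unicodedata

# Hypothetical toy vocabulary: codes and terms are illustrative placeholders,
# not real DeCS identifiers.
TOY_VOCABULARY = {
    "T001": ["diabetes mellitus", "diabetes"],
    "T002": ["hipertensión", "presión arterial alta"],
    "T003": ["infecciones por coronavirus", "covid-19"],
}


def normalize(text):
    """Lowercase and strip diacritics so 'Hipertensión' matches 'hipertension'.

    Note that this also maps 'ñ' to 'n', which is a simplification.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_document(text, vocabulary):
    """Return the sorted codes whose terms occur as whole words in the text."""
    normalized = normalize(text)
    matched = set()
    for code, terms in vocabulary.items():
        for term in terms:
            # Lookarounds prevent matches inside longer words (e.g. 'prediabetes').
            pattern = r"(?<!\w)" + re.escape(normalize(term)) + r"(?!\w)"
            if re.search(pattern, normalized):
                matched.add(code)
                break
    return sorted(matched)


abstract = "Estudio sobre pacientes con Hipertensión y COVID-19 en Madrid."
print(index_document(abstract, TOY_VOCABULARY))  # ['T002', 'T003']
```

A lookup like this only finds terms that appear literally in the text, which is one reason it is usually combined with the classification-based approaches listed above.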

Inspired by the settings of past BioASQ tracks and our BioCreative corpora (CHEMPROT, BC4CHEMD/CHEMDNER) included in popular benchmark datasets like BioBERT, we propose the following three MESINESP2 subtracks:

MESINESP-L – Scientific Literature (sub-track 1): This track will require automatic indexing with DeCS terms of abstracts using two highly used databases in Spanish (IBECS and LILACS).
MESINESP-T – Clinical trials (sub-track 2): This track will require automatic indexing with DeCS terms of clinical trials from REEC (Registro Español de Estudios Clínicos).
MESINESP-P – Patents (sub-track 3): This track will require automatic indexing with DeCS terms of the content of Spanish patents extracted from Google Patents.

Key information

MESINESP2 web, info & detailed description: https://temu.bsc.es/mesinesp2
Registration for MESINESP2: go to http://clef2021-labs-registration.dei.unipd.it/ and register for Task 3 – Task MESINESP: Medical Semantic Indexing In Spanish (which is part of the workshop “BioASQ - Large-scale biomedical semantic indexing and question answering”)
Datasets: https://zenodo.org/record/4634129#.YFu0MZ1KiUl

Task impact

There is a pressing need to improve access to health- and biomedicine-related documents, not only for professional medical users but also for researchers, public healthcare decision-makers, and other healthcare professionals. Information access is essential to improve knowledge of new infectious diseases such as COVID-19, where researchers must efficiently access new research, but also to improve the competitiveness of the healthcare industry by improving patent intelligence processes.

Content indexing is fundamental to ensure access to relevant information. In recent years, Information Retrieval systems applied to search engines have been improved through query expansion approaches, which are often based on previous manual indexing of records with structured vocabularies that facilitate the use of more powerful document search engines. However, manual indexing is highly time-consuming, expensive, and laborious.

Semantic indexing with medical vocabularies has proven to be a good solution for reducing the costs and time bottlenecks of document indexing. The importance of these technologies motivated several shared tasks in the past, in particular the BioASQ tracks, with a considerable number of participants and impact in the field for medical literature in English.

However, there are many other languages widely used in the biomedical field. For example, Spanish is a language spoken by more than 572 million people in the world today, either as a native, second or foreign language. According to results derived from WHO statistics, just in Spain there are over 180 thousand practicing physicians, more than 247 thousand nursing and midwifery personnel, and 55 thousand pharmaceutical personnel. These professionals use their mother tongue in the performance of their work, communicating, producing and demanding documents in their own language. For that reason, and following the outline of previous medical indexing efforts, in particular, the success of the BioASQ tracks centered on PubMed, we propose to carry out this year the second edition of the MESINESP task on semantic indexing of Spanish health-related texts.

The MESINESP2 shared task invites researchers, medical, and industry professionals to develop automatic semantic indexing systems with structured medical vocabularies for Spanish documents. The main aim of MESINESP2 is to promote the development of semantic indexing tools of practical relevance for non-English content, determining the current state of the art, identifying challenges, and comparing the strategies and results to those published for English data.

We foresee that the systems resulting from MESINESP2 will prove directly useful for a variety of use case scenarios beyond literature indexing, including competitive intelligence, prior art searches, complex search queries for systematic reviews, evidence-based medicine, decision making, as well as database curation and the elaboration of clinical practice guidelines. Moreover, the document selection criteria of MESINESP2 considered additional scenarios for future tasks on semantic indexing of medical records.

Important dates

March 17: Train set and guidelines release
March 17: First development set release
April 15: Test and Background set release
April 30: BioASQ9 Lab @CLEF 2021 Registration Deadline
April 30: End of the evaluation period
May 28: Submission of Participant Papers at CLEF2021
July 2: Camera-ready paper submission
Sep 21-24: CLEF 2021 Conference

Publications and workshop

The MESINESP2 track results will be presented at the BioASQ workshop held at CLEF 2021 (http://clef2021.clef-initiative.eu). Participating teams will be invited to present their systems and results. Moreover, participating teams will be invited to submit their system description papers for publication in the CLEF 2021 Working Notes proceedings.

MESINESP2 awards

There will be awards for the top-scoring teams promoted by the Spanish Plan for the Advancement of Language Technology (Plan TL) and the Barcelona Supercomputing Center (BSC). We are currently managing the creation of an additional award (economic and technology transfer prize) that will allow the winning team to work with an important institution in the transfer of the results of their models to a Spanish Scientific Literature database.

Main Track organizers

Martin Krallinger, Barcelona Supercomputing Center (BSC), Spain.
Luis Gascó, Barcelona Supercomputing Center (BSC), Spain.
Anastasios Nentidis, National Center for Scientific Research Demokritos, Greece.
Elena Primo-Peña, Biblioteca Nacional de Ciencias de la Salud. Instituto de Salud Carlos III, Spain.
Cristina Bojo Canales, Biblioteca Nacional de Ciencias de la Salud. Instituto de Salud Carlos III, Spain.
George Paliouras, National Center for Scientific Research Demokritos, Greece.
Anastasia Krithara, National Center for Scientific Research Demokritos, Greece.
Renato Murasaki, BIREME – Organización Panamericana de la Salud (WHO), Brazil.

Scientific Committee

David Camacho, Applied Intelligence and Data Analysis Research Group, Universidad Politécnica de Madrid (Spain)
Oscar Corcho, Ontology Engineering Group, Universidad Politécnica de Madrid (Spain)
Parminder Batia, Amazon Health AI (USA)
Irena Spasic, School of Computer Science & Informatics, co-Director of the Data Innovation Research Institute, Cardiff University (UK)
Jose Luis Redondo García, Amazon Alexa, Amazon (UK)
Carlos Badenes-Olmedo, Ontology Engineering Group, Universidad Politécnica de Madrid (Spain)
Xavier Tannier, Sorbonne Université and LIMICS (France)
Tristan Naumann, Microsoft Research (USA)
Allan Hanbury, E-Commerce Research Unit in the Faculty of Informatics, TU Wien (Austria)
Alfonso Valencia, Barcelona Supercomputing Center (Spain)
Jesús Tramullas, Departamento de Ciencias de la Documentación e Historia de la Ciencia, Universidad de Zaragoza (Spain)

Best,
José María

"There is no reason why anybody would want a computer in their home" -
	Ken Olson, founder of DEC 1977
"640K ought to be enough for anybody" - Bill Gates, 1981 
"Nobody will ever outgrow a 20Mb hard drive." - ???

"The three virtues of a programmer: laziness, impatience and hubris" -
	Larry Wall, creator of Perl
"Premature optimization is the root of all evil." - Donald Knuth

"Computers are useless. They can only give you answers" - Pablo Ruíz Picasso

José María Fernández González
Senior Research Scientist
e-mail: jose.m.f...@bsc.es
INB Node, Life Sciences Department
Torre Girona Building, 1st floor, Barcelona Supercomputing Center
C/. Jordi Girona, 31
Zip Code: 08034 Barcelona (Spain)
Phone: (+34) 934117074
