A colleague has asked me to distribute the announcement of a Spanish-language text-mining challenge related to DeCS (i.e. the Spanish version of MeSH).
Medical Semantic Indexing (BioASQ – CLEF 2021)
Plan TL Award for MESINESP2
There is a pressing need to improve information access, retrieval, classification, semantic annotation as well as integration across multiple document types, in particular for health-related content such as literature, clinical trials and medicinal patents.
This is especially true for multilingual content from heterogeneous sources (cross-genre). For instance, many of the initial COVID-19 case reports were published in a variety of languages, a considerable fraction of them in languages other than English.
Given the significant practical impact of advanced semantic indexing technologies in health, and the direct involvement of, and interest in the results from, international and national healthcare organizations (BIREME/WHO, ISCIII/Spain), we are organizing the MESINESP2 shared task together with the well-established BioASQ (CLEF 2021) initiative.
A variety of complementary strategies have been explored so far for semantic indexing of health content, including (extreme) multi-label classification, multilingual X-BERT, transformers, graph matching, text similarity, string matching/term indexing, named entity recognition, and machine translation components.
Inspired by the settings of past BioASQ tracks and by our BioCreative corpora (CHEMPROT, BC4CHEMD/CHEMDNER), which are included in popular benchmarks such as those used to evaluate BioBERT, we propose three MESINESP2 subtracks, one for each of the document types mentioned above: scientific literature (MESINESP-L), clinical trials (MESINESP-T) and patents (MESINESP-P).
Access to health and biomedical documents must improve, not only for medical professionals but also for researchers, public healthcare decision-makers and other healthcare workers. Efficient information access is essential for building knowledge about new infectious diseases such as COVID-19, where researchers must be able to find new research quickly, and also for making the healthcare industry more competitive through better patent intelligence processes.
Content indexing is fundamental to ensuring access to relevant information. In recent years, the information retrieval systems behind search engines have been improved through query expansion, which often relies on prior manual indexing of records with structured vocabularies and makes more powerful document search possible. However, manual indexing is time-consuming, expensive and laborious.
Semantic indexing with medical vocabularies has proven to be a good way to reduce the cost and time bottlenecks of document indexing. The importance of these technologies has motivated several shared tasks in the past, in particular the BioASQ tracks, which attracted a considerable number of participants and had a notable impact on the indexing of English medical literature.
However, many other languages are widely used in the biomedical field. For example, Spanish is spoken by more than 572 million people in the world today, whether as a native, second or foreign language. According to WHO statistics, Spain alone has over 180,000 practicing physicians, more than 247,000 nursing and midwifery personnel, and 55,000 pharmaceutical personnel. These professionals use their native language in their daily work, communicating, producing and requesting documents in Spanish. For that reason, and building on previous medical indexing efforts, in particular the success of the BioASQ tracks centered on PubMed, this year we are organizing the second edition of the MESINESP task on semantic indexing of Spanish health-related texts.
The MESINESP2 shared task invites researchers and medical and industry professionals to develop automatic semantic indexing systems that use structured medical vocabularies for Spanish documents. Its main aim is to promote the development of practically relevant semantic indexing tools for non-English content, to determine the current state of the art, to identify open challenges, and to compare strategies and results with those published for English data.
We expect the systems resulting from MESINESP2 to prove directly useful in a variety of use cases beyond literature indexing, including competitive intelligence, prior art searches, complex search queries for systematic reviews, evidence-based medicine and decision making, as well as database curation and the elaboration of clinical practice guidelines. Moreover, the MESINESP2 document selection criteria took into account scenarios for future tasks on semantic indexing of medical records.
Publications and workshop
The MESINESP2 track results will be presented at the BioASQ workshop held at CLEF 2021 (http://clef2021.clef-initiative.eu). Participating teams will be invited to present their systems and results, and to submit system description papers for publication in the CLEF 2021 Working Notes proceedings.
Awards for the top-scoring teams will be sponsored by the Spanish Plan for the Advancement of Language Technology (Plan TL) and the Barcelona Supercomputing Center (BSC). We are also arranging an additional award (an economic and technology transfer prize) that will allow the winning team to work with a major institution on transferring the results of their models to a Spanish scientific literature database.
Main Track organizers
"There is no reason why anybody would want a computer in their home" - Ken Olson, founder of DEC, 1977
"640K ought to be enough for anybody" - Bill Gates, 1981
"Nobody will ever outgrow a 20Mb hard drive." - ???
"The three virtues of a programmer: laziness, impatience and hubris" - Larry Wall, creator of Perl
"Premature optimization is the root of all evil." - Donald Knuth
"Computers are useless. They can only give you answers" - Pablo Ruiz Picasso

José María Fernández González
Senior Research Scientist
e-mail: jose.m.f...@bsc.es
INB Node, Life Sciences Department
Torre Girona Building, 1st floor, Barcelona Supercomputing Center
C/. Jordi Girona, 31
Zip Code: 08034 Barcelona (Spain)
Phone: (+34) 934117074