3rd International Workshop on Natural Scientific Language Processing (NSLP 2026)
12 May 2026 – Co-located with LREC 2026
Palma, Mallorca (Spain)
NSLP 2026 features three shared tasks: ClimateCheck, SciVQA, and SOMD (see below).
NSLP 2026 website (including the shared tasks and important dates):
https://nfdi4ds.github.io/nslp2026
Scientific research has grown steeply over the last decades. The number of scholarly publications is growing exponentially, doubling every 15-17 years. Consequently, both general and specialised repositories, databases, knowledge graphs, and digital libraries have been developed to publish and manage scientific artefacts. Examples include the Open Research Knowledge Graph (ORKG), the Semantic Scholar Academic Graph (S2AG), PubMed Central, and the ACL Anthology. These resources enable the collection, reuse, tracking, and expansion of scientific findings, and facilitate downstream applications such as scientific search engines.
However, developing robust systems that deal with scholarly text requires addressing several challenges. The status quo of scientific communication mostly consists of scholarly articles published as unstructured PDF documents, which are not machine-readable in the sense that relevant scientific information could be extracted easily; extracting and utilising this information as part of the scientific process is therefore laborious and time-consuming. Developing methods for converting unstructured information into structured formats is one of the major challenges in the field of Natural Scientific Language Processing (NSLP). This goal encompasses related challenges such as detecting, disambiguating, and linking mentions of scientific artefacts (e.g., software tools, datasets, or language resources), and tracking state-of-the-art models and their evaluation scores (including new versions of existing models). Effectively extracting and managing heterogeneous scientific knowledge remains an open research area: existing efforts are often fragmented, addressing separate issues with distinct datasets and conceptual approaches.
NSLP 2026 addresses current topics and issues in Natural Scientific Language Processing. It is proposed and organised with the support of NFDI for Data Science and Artificial Intelligence (NFDI4DS), a long-term project with approx. 20 partners working towards a German national research data infrastructure for DS and AI. The workshop aims to bring together the international community of researchers working on NSLP and related topics (including research knowledge graphs) to discuss current issues and possible solutions. NSLP 2026 includes two keynote speakers, oral and poster presentations of accepted papers, and three shared tasks.
Topics of interest include, but are not limited to:
Important Dates
Submission Guidelines
The NSLP 2026 workshop invites submissions of regular long papers, short papers, and position papers. We especially encourage submissions from junior researchers and students from diverse backgrounds.
When submitting a paper through START, authors will be asked to provide essential information about resources (in a broad sense, i.e., also technologies, standards, evaluation kits, etc.) that were used for the work described in the paper or that are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).
Keynote Speakers
Shared Tasks
ClimateCheck 2026: Scientific Fact-Checking of Social Media Claims
The rise of climate discourse on social media offers new channels for public engagement, but it also amplifies mis- and disinformation. As online platforms increasingly shape public understanding of science, tools that ground claims in trustworthy, peer-reviewed evidence are necessary. ClimateCheck 2026 builds on the results and insights of the 2025 edition (run at SDP 2025, co-located with ACL 2025) and offers the following subtasks:
Subtask 1: Abstract retrieval and claim verification: given a claim and a corpus of publications, retrieve the top 10 most relevant abstracts and classify each claim-abstract pair as supports, refutes, or not enough information (a baseline sketch follows below).
Subtask 2: Disinformation narrative classification: given a claim, predict which climate disinformation narrative it expresses, according to a predefined taxonomy.
New training data will be released for both subtasks, with roughly three times as much data for Subtask 1 as in the previous iteration. The new iteration will also focus on sustainability, emphasising the need to build climate-friendly NLP systems with minimal environmental impact.
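For illustration, Subtask 1 could be approached by pairing a bi-encoder retriever with an off-the-shelf NLI classifier. The following is a minimal sketch, not an official baseline: the model names (all-MiniLM-L6-v2, facebook/bart-large-mnli) and the mapping of generic NLI labels onto the task labels are our own assumptions.

    from sentence_transformers import SentenceTransformer, util
    from transformers import pipeline

    # Assumed off-the-shelf models; not part of the task definition.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    nli = pipeline("text-classification", model="facebook/bart-large-mnli")

    # Assumed mapping of generic NLI labels onto the three task labels.
    LABELS = {"entailment": "supports",
              "contradiction": "refutes",
              "neutral": "not enough information"}

    def verify(claim: str, abstracts: list[str], k: int = 10):
        """Retrieve the top-k abstracts for a claim and label each claim-abstract pair."""
        claim_emb = encoder.encode(claim, convert_to_tensor=True)
        abstract_embs = encoder.encode(abstracts, convert_to_tensor=True)
        scores = util.cos_sim(claim_emb, abstract_embs)[0]
        top = scores.topk(min(k, len(abstracts)))
        results = []
        for idx in top.indices.tolist():
            # NLI with the abstract as premise and the claim as hypothesis.
            pred = nli({"text": abstracts[idx], "text_pair": claim})[0]
            results.append((idx, LABELS.get(pred["label"].lower(),
                                            "not enough information")))
        return results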
Shared task co-organisers: Raia Abu Ahmad, Aida Usmanova, Max Upravitelev, Georg Rehm
SciVQA 2026: Scientific Visual Question Answering
Scientific papers communicate information through unstructured text as well as (semi-)structured figures and tables. Jointly reasoning over both modalities benefits downstream applications such as visual question answering (VQA). SciVQA 2026 builds on the insights from SciVQA 2025 (run at SDP 2025, co-located with ACL 2025), shifting the focus toward evaluating the ability of multimodal LLMs to reason over combined modalities (figures, tables, text). SciVQA 2026 will include a new set of papers and entirely new annotations, featuring two subtasks:
Subtask 1: Context retrieval: given a question, a paper, and its corpus of paragraphs and images, retrieve the relevant context (tables, figures, paragraphs from the main text) required to answer it (a figure-retrieval sketch follows below).
Subtask 2: Answer generation: given a question and the context retrieved in the first subtask, generate an answer.
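For illustration, the figure side of Subtask 1 could be approached with a cross-modal encoder such as CLIP, ranking a paper's figures by their similarity to the question; paragraphs could be ranked with any text retriever. A minimal sketch, assuming the openai/clip-vit-base-patch32 checkpoint; this is not an official baseline.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Assumed cross-modal encoder; not part of the task definition.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def rank_figures(question: str, image_paths: list[str], k: int = 3):
        """Rank a paper's figures by CLIP similarity to the question."""
        images = [Image.open(p).convert("RGB") for p in image_paths]
        inputs = processor(text=[question], images=images,
                           return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            out = model(**inputs)
        # logits_per_text has shape (1, n_images): question-to-figure similarity.
        top = out.logits_per_text[0].topk(min(k, len(images)))
        return [image_paths[i] for i in top.indices.tolist()]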
Shared task co-organisers: Ekaterina Borisova, Georg Rehm
SOMD 2026: Software Mention Detection & Coreference Resolution
Understanding software mentions is crucial for reproducibility and for interpreting experimental results. Software is often cited informally, without persistent identifiers, which makes it hard to infer and disambiguate knowledge about software efficiently. This task builds on SOMD 2025 (run at SDP 2025, co-located with ACL 2025) and focuses on entity disambiguation as an under-investigated problem in this context. More precisely, we address the task of coreference resolution of software mentions across multiple documents: given a set of software mentions extracted from multiple scientific publications, cluster these mentions so that all mentions in a particular cluster refer to the same real-world software. We define three subtasks with varying challenges:
Subtask 1: Software coreference resolution over gold standard mentions. Addresses the task based on high-quality (gold standard) mentions of software, expert-annotated in multiple publications (a clustering sketch follows this list).
Subtask 2: Software coreference resolution over predicted mentions. Addresses the task on software mentions that are automatically extracted using a baseline model, i.e., reflecting a typical information extraction scenario in which upstream pipelines (such as entity and metadata extraction) are imperfect.
Subtask 3: Software coreference resolution at scale. Addresses the task using predicted mentions of software and metadata at a larger scale. This challenges models to scale effectively, maintain accuracy, and distinguish among an increasingly dense field of similar or overlapping software mentions.
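For illustration, cross-document coreference of this kind is often approached by embedding each mention in context and clustering the embeddings. A minimal sketch, assuming the all-MiniLM-L6-v2 encoder, an untuned distance threshold, and scikit-learn >= 1.2; not an official baseline.

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import AgglomerativeClustering

    # Assumed encoder; the threshold would need tuning on training data.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def cluster_mentions(mention_contexts: list[str], threshold: float = 0.4):
        """Cluster software mentions so each cluster maps to one real-world software."""
        embeddings = encoder.encode(mention_contexts, normalize_embeddings=True)
        clustering = AgglomerativeClustering(
            n_clusters=None,               # let the distance threshold decide
            distance_threshold=threshold,
            metric="cosine",               # scikit-learn >= 1.2 ("affinity" before)
            linkage="average",
        )
        return clustering.fit_predict(embeddings)

    mentions = [
        "We used SPSS 25 for all statistical analyses.",
        "Analyses were performed in IBM SPSS Statistics.",
        "Figures were produced with matplotlib.",
    ]
    print(cluster_mentions(mentions))  # e.g., [0, 0, 1]: both SPSS mentions cluster together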
Shared task co-organisers: Sharmila Upadhyaya, Stefan Dietze, Frank Krüger, Wolfgang Otto
Organisers
Programme Committee