CFP: Help Build FAIR Scientific Process Schemas (schema.org–Inspired, ORKG-Based)

Jennifer D'Souza

Dec 9, 2025, 7:45:50 AM
to Machine Learning News

Dear colleagues,

We are writing to invite your collaboration in creating a community-driven collection of scientific process schemas, inspired by the spirit of schema.org but focused on capturing experimental and simulation workflows across scientific domains. These schemas will be openly published as ORKG templates and will form the basis of a paper planned for Nature Scientific Data.

Motivation

Machine learning increasingly depends on scientific data from diverse fields—materials science, chemistry, biology, environmental science, psychology, engineering, and more. Yet the processes underlying these datasets (e.g., ALD, CVD, PCR, CRISPR, tensile testing, Fischer–Tropsch synthesis, RCTs, fMRI task protocols) remain almost entirely unstructured in the literature.

This makes it extremely difficult to compare experiments, reproduce results, build FAIR datasets, or train ML systems that can reason about variations in scientific methods. If we want reliable scientific ML systems—whether for retrieval, inference, prediction, simulation, or automated experiment design—we first need standardized, machine-actionable descriptions of scientific processes.

Our goal is to create a community library of such schemas, covering many research workflows. These schemas will support FAIR metadata, reproducible benchmarks, and ML models capable of understanding experimental conditions rather than treating papers as raw text.

Why Collaborate

We are seeking contributors who can provide a collection of roughly 50 or more full-text articles describing a specific experimental or simulation process in their field. You may also offer expert feedback on automatically mined schemas or run schema-miner yourself. Individual or small-team participation is welcome, and co-authorship opportunities are available depending on involvement.

A wide variety of processes can be included—thin-film deposition, synthetic chemistry reactions, gene editing workflows, fatigue testing, soil leaching experiments, drug dissolution assays, fMRI tasks, cognitive experiments, and many more.
A broader list (non-exhaustive) is here:
https://docs.google.com/document/d/1iyL1l9vCXhnQ0To7j79vlr-pW4JvPlQC95svygqRDfg/edit

How to Participate

Please register your interest using this short form:
👉 https://forms.gle/9WEdouw4yMyNHcn19

We will notify selected contributors by January 31, 2026. Data collection and schema mining will conclude by April 30, 2026, followed by manuscript preparation.

We hope you will consider contributing to this effort to build FAIR, comparable representations of scientific processes, an essential step toward more transparent and trustworthy scientific ML. Please also help us spread the word!

Best regards,
Jennifer D’Souza, TIB Hannover
(on behalf of the schema-miner coordination team)

