Paper Share: Open-source models for development of data and metadata standards

Mathew Biddle - NOAA Federal

Aug 4, 2025, 8:33:45 AM
to ioos...@googlegroups.com
Thought this might be of interest to the broader IOOS community.

Rokem, A., Mandava, V., Cristea, N., Tambay, A., Bouchard, K., Berys-Gonzalez, C., & Connolly, A. (2025). Open-source models for development of data and metadata standards. Patterns, 6(7). https://doi.org/10.1016/j.patter.2025.101316

Title: Open-source models for development of data and metadata standards

The bigger picture
Data standards are the rules and agreements that govern how data from a particular field are stored and described. These often also extend to metadata—“data about data”—which are associated information about the data that tells users how they were collected and how different parts should be interpreted. Standards are increasingly important in an age in which researchers in various fields are interested in accessing large amounts of data and integrating data of different kinds. There are many ways to develop standards for data and metadata. This review focuses on the adoption of approaches and methods that are taken from the field of open-source software (OSS) development. OSS is a dominant software development mode, and many of the world’s largest commercial, academic, and governmental computer systems run on OSS. One of the reasons for its success is the technical tools and social frameworks established by the community that allow different stakeholders to contribute collaboratively. Here, we describe some of the ways in which the communities that create and evolve data and metadata standards for different types of data have been inspired by the way OSS evolves. Standards creators have borrowed social and technical arrangements from OSS, such as the use of versioning systems or approaches to community consensus. Based on a workshop that the authors organized in April 2024, we provide a multidisciplinary and multisector perspective on this topic. We survey instances of such crossover in several different domains: neuroscience, astronomy, high-energy physics, and earth science. We identify challenges for developing data and metadata standards with the OSS model and distill recommendations for various stakeholders about how to leverage OSS approaches toward standards development.

Summary
Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories. Leveraging efforts from disparate data collection sources, however, requires interoperable and adaptable standards for data description and storage. Through the synthesis of experiences in astronomy, high-energy physics, earth science, and neuroscience, we contend that the open-source software (OSS) model provides significant benefits for standard creation and adaptation. We highlight resultant issues, such as balancing flexibility vs. stability and utilizing new computing paradigms and technologies, that must be considered from both the user and developer perspectives to ensure pathways for recognition and sustainability. We recommend supporting and recognizing the development and maintenance of OSS data standards and software consistent with widely adopted scientific tools.

Matt
--
Mathew Biddle, Physical Scientist
NOAA/NOS
US Integrated Ocean Observing System Office
1315 East-West Highway
Silver Spring MD 20910