Mission Accomplished for Vectors of Human Disease Data Mobilization
The final papers coming out in the GigaByte-GBIF-TDR Vectors of Human Disease series provide a nice opportunity to look back at the lessons learned from this targeted data mobilization approach.
Regular readers may have seen my previous post covering some of the outputs from the sponsored data mobilization campaigns I facilitated at GigaScience Press. These campaigns helped share vectors of human disease data that are crucial for understanding and tackling some of the major killer diseases and new zoonotic outbreaks, such as the Oropouche virus that is spreading across the Americas. The last two papers from the three successive calls for papers have finally come out, so this milestone provides a nice opportunity to look back at the lessons learned from this targeted data mobilization approach. With that in mind, our technical partners at River Valley Technologies have acknowledged this milestone on their own and the ALPSP news pages, and they also kindly helped me publish a piece in Editorial Office News (EON) on how this was achieved from an infrastructure perspective. For those of you who may not keep up with the editorial office infrastructure literature, I thought it could be useful to reshare an adapted version of these news items here.

Data sharing is important in any field, and open biodiversity data are arguably among the most important datasets for understanding the planet: they help prevent diseases, model the effects of climate change, and figure out where we need to tackle these problems. The de facto home for these data is GBIF (the Global Biodiversity Information Facility), an international network and data infrastructure funded by the world’s governments to provide open access to data about all types of life on Earth.

To create global impact, public health datasets have to be discoverable, citable, and easy to reuse (see also the FAIR principles). In practice, valuable datasets remain under-utilised because publication and data-sharing steps can be too expensive, too slow, or operationally too complex.
This is particularly true for researchers in parts of the world that are disproportionately affected by these public health challenges but have traditionally lacked the resources and expertise to tackle them effectively. This is where we stepped in, with a WHO-supported collaboration with GBIF and our GigaByte journal that proactively tackled the problem of under-utilised and under-shared data. To reduce the burden on contributors, the programme sponsored publication costs, a GBIF health data helpdesk, and hands-on support from the GigaScience Press GigaDB team for curation and data audits.

The practical outcomes of these efforts have been improved accessibility, better discovery, stronger linking, and better reuse. Sponsorship removed the cost barrier (with WHO covering the APCs), but scalability also came from making the process genuinely easier for contributors, through helpdesk support, data audits, and a publishing workflow we could run repeatedly. This programme enabled the following:
This was truly a global effort, working with and sharing data from over 70 countries. The first call had a majority of papers from Latin America, and the final call mostly published work from African authors (with authors from the Democratic Republic of Congo winning a Ben Barres Spotlight Award for their submission).

Why metadata matters in global health publishing
Metadata quality is not merely a technical detail. It enables discovery, linking, indexing, and machine-readability. In data-rich publishing, strong metadata practices help ensure that outputs travel properly through scholarly infrastructure and remain reusable long after publication. This programme mobilized data and papers relevant to disease surveillance and response, including the first large-scale and open disease vector datasets for the following:
River Valley supported this work through its end-to-end publishing platform, enabling repeatable workflows across calls and supporting structured publishing. The result was multilingual and machine-readable outputs, interactive content, and the richest metadata in the publishing industry (which enabled us to win an inaugural Crossref Excellence in Metadata Award). You can see a video covering the features of the workflow we put together, and if you are looking to mobilise high-value datasets in a similar manner, please get in touch with RVT and myself.

To explore the final outputs from this effort, check out the 31 data papers (and one Editorial) collected together via the series page here: https://doi.org/10.46471/GIGABYTE_SERIES_0002

Further Reading
Edmunds SC et al. Publishing data to support the fight against human vector-borne diseases. GigaScience, Volume 11, 2022, giac114. https://doi.org/10.1093/gigascience/giac114
Shimabukuro S et al. Bridging Biodiversity and Health: The Global Biodiversity Information Facility’s initiative on open data on vectors of human diseases. GigaByte, 2024. https://doi.org/10.46471/gigabyte.117
Edmunds C. Scaling Equitable Data Publishing: Insights for Editorial Teams From a WHO-Sponsored Program. EON, 2026. https://doi.org/10.18243/eon/2026.19.3.5