JSON-LD for datasets with many(!) files


Sebastian Karcher

Aug 19, 2025, 2:47:03 PM
to dataverse...@googlegroups.com
Dear all,

We recently published a fairly massive dataset with around 10,000 files at QDR. While our Dataverse, with some initial creaking and whining, is handling this mostly OK, we did notice that the generated JSON-LD file is massive, so we're loading a ~4MB JSON file into the site header for everyone who looks at the dataset. This seems... suboptimal (it also appears to break Google's parsing of the file).

Do any metadata experts here have an opinion on a better option for this -- like a shortened JSON-LD that takes it easy on the related items or so? Or even better, any of the repositories with large deposits like this, do you have a strategy for this?
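To make the "shortened JSON-LD" idea concrete, here is a rough sketch of capping the per-file entries before the document goes into the page header. It assumes the export is a Schema.org `Dataset` whose file entries sit under `distribution`; `shorten_jsonld` and `max_files` are hypothetical names for illustration, not anything Dataverse provides, and the toy data is made up.

```python
import copy
import json


def shorten_jsonld(doc: dict, max_files: int = 100) -> dict:
    """Return a copy of a Dataset JSON-LD with at most max_files file entries.

    Assumption: per-file entries are listed under the "distribution" key.
    """
    short = copy.deepcopy(doc)  # leave the full export untouched
    files = short.get("distribution")
    if isinstance(files, list) and len(files) > max_files:
        short["distribution"] = files[:max_files]
    return short


# Toy dataset with 10,000 file entries (made-up values, not a real export).
full = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "distribution": [
        {"@type": "DataDownload", "name": f"file_{i}.txt"} for i in range(10_000)
    ],
}

short = shorten_jsonld(full, max_files=100)
print(len(json.dumps(full)), "bytes before;", len(json.dumps(short)), "bytes after")
```

Whether a truncated file list is acceptable to harvesters is exactly the open question; the sketch only shows that the size problem is concentrated in that one list.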

Thanks in advance!
Sebastian
--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Philip Durbin

Aug 19, 2025, 4:57:54 PM
to dataverse...@googlegroups.com
Hi Sebastian,

It's definitely a problem. Please see this related issue I opened: guidance on large Croissant files, especially in <head> - https://github.com/mlcommons/croissant/issues/646

I opened that issue in the Croissant repo because, as I wrote in the guides, Google is recommending Croissant over what we call Schema.org JSON-LD: https://guides.dataverse.org/en/6.7.1/admin/discoverability.html#schema-org-json-ld-croissant-metadata

That said, both formats are huge when there are many files.

Long term I'm hoping we can convince Google et al. to adopt Signposting. Stian Soiland-Reyes introduced it to them back in January: https://groups.google.com/g/dataverse-community/c/JI8HPgGarr8/m/IdDqDNV_AwAJ
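For anyone unfamiliar with Signposting: the idea is that the landing page points to the metadata with typed links (for example an HTTP `Link` header) instead of embedding the whole document in `<head>`. A minimal sketch of building such a header follows; the `example.org` URL and the `signposting_link_header` helper are placeholders made up for illustration, not Dataverse code.

```python
from typing import List, Optional, Tuple


def signposting_link_header(links: List[Tuple[str, str, Optional[str]]]) -> str:
    """Build an HTTP Link header value from (url, rel, media_type) triples."""
    parts = []
    for url, rel, media_type in links:
        value = f'<{url}>; rel="{rel}"'
        if media_type:
            value += f'; type="{media_type}"'
        parts.append(value)
    return ", ".join(parts)


# Placeholder URL; "describedby" is the relation Signposting uses for metadata.
header = signposting_link_header(
    [("https://example.org/metadata.jsonld", "describedby", "application/ld+json")]
)
print("Link:", header)
```

The page itself stays small because a crawler fetches the metadata document only if it wants it.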

I hope this helps!

Phil
