Hello all,
I understand from this discussion that Dataverse is able to handle uploads of very big files.
How many of these files is it safe to upload to a single dataset, to avoid problems viewing the dataset and downloading the files?
Are there limits on the size of a single file that can be uploaded, or on the total size of the files uploaded to a single dataset?
Thanks and best regards
Stefano
Stefano,
There are no fixed limits on data file size or number of files, and there are configurations that can handle very large files (TBs+). In general, people try to limit datasets to hundreds or thousands of files, but whether a given instance can support that depends on its resources and configuration. It's definitely true that larger files, and more of them, decrease performance.
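If you want to see what a particular installation is configured to accept, here is a minimal Python sketch (assuming the installation exposes the native API info endpoint for the :MaxFileUploadSizeInBytes setting; the demo URL is just a placeholder):

import requests

# Placeholder installation URL -- point this at your own Dataverse instance.
BASE_URL = "https://demo.dataverse.org"

# Ask the installation for its configured per-file upload limit.
# If the admins have not set a limit, the call may return nothing useful --
# treat that as "no fixed limit", as described above.
resp = requests.get(f"{BASE_URL}/api/info/settings/:MaxFileUploadSizeInBytes")

if resp.ok:
    print("Configured max upload size (bytes):", resp.json().get("data"))
else:
    print(f"No per-file limit reported (HTTP {resp.status_code})")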
I’m currently working on a Big Data Admin Guide page (supported by The Texas Digital Library) – it might give you a better sense of all the factors that contribute. You can read the draft page here: https://dataverse-guide--11850.org.readthedocs.build/en/11850/admin/big-data-administration.html
(It’s a work in progress – happy to have anyone’s feedback or additions.)
-- Jim
Hi Stefano,
I’ll speak from my experience; as Jim mentioned, it highly depends on the resources allocated to your Dataverse instance.
We currently host datasets of hundreds of gigabytes. From what I've learned, it's not the size of the files but the number of them that causes problems. To avoid that, we zip multiple files together.
The first time I experimented, I uploaded a dataset containing around 9,000 files, none of which were zipped, with a total size of about 200 GB. The record became unusable, as Dataverse was taking a long time to pull up the information about all those files. I then zipped the files in chunks and reduced the count to 380. Dataverse liked that a lot better and now shows the record and lets users download everything.
Here is that record with 380 files: https://doi.org/10.26027/DATAZEWSOH
Here is an example of a dataset where we have a few files that are over 100 GB: https://doi.org/10.26027/DATAD05EAS
Based on that lesson, we now try to limit the number of files to no more than 100; the file size hasn't been an issue.
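For anyone who wants to script that bundling step, here is a minimal sketch using only the Python standard library (the 100-files-per-zip figure just mirrors our current rule of thumb; adjust it to your own instance):

import zipfile
from pathlib import Path

def zip_in_chunks(src_dir: str, out_dir: str, files_per_zip: int = 100) -> None:
    """Bundle everything under src_dir into numbered zip archives, each
    holding at most files_per_zip files, so the dataset ends up with a
    small number of file entries instead of thousands."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    files = sorted(p for p in src.rglob("*") if p.is_file())
    for i in range(0, len(files), files_per_zip):
        archive = out / f"bundle_{i // files_per_zip + 1:04d}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files[i:i + files_per_zip]:
                # Keep paths relative to the source directory inside the zip.
                zf.write(f, arcname=f.relative_to(src))

zip_in_chunks("raw_data", "zipped_for_upload", files_per_zip=100)

One caveat: a regular upload through the web UI will typically unpack a single zip into its individual files, so this approach pairs with direct/Globus uploads (or double-zipping) where the archives are kept intact.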
We transfer large datasets with Globus, using the dataverse-globus app, and store them in a VAST S3 bucket.
Hope this helps somewhat. I know this will vary greatly depending on the infrastructure set up for the system.
Systems Librarian
MBLWHOI Library
Data Library and Archives
Woods Hole Oceanographic Institution
mblwhoilibrary.org -- whoi.edu