dataset with large files


Stefano Bolelli Gallevi

Sep 29, 2025, 4:09:24 PM
to Dataverse Users Community

Hello all, 

I understand from this discussion that Dataverse is able to handle uploads of very big files.

How many such files can safely be uploaded to a single dataset without running into problems viewing the dataset or downloading the files?

Are there limits on the size of a single uploaded file, or on the total size of the files uploaded to a single dataset?

Thanks and best regards
Stefano

 


James Myers

Sep 29, 2025, 4:24:09 PM
to dataverse...@googlegroups.com

Stefano,

There are no fixed limits on data file size or number of files. There are configurations that can handle very large files (TBs+). In general, people try to limit datasets to hundreds or thousands of files, but whether a given instance can support that depends on its resources and configuration. It’s definitely true that larger files and more files decrease performance.

 

I’m currently working on a Big Data Admin Guide page (supported by The Texas Digital Library) – it might give you a better sense of all the factors that contribute. You can read the draft page here: https://dataverse-guide--11850.org.readthedocs.build/en/11850/admin/big-data-administration.html

 

(It’s a work in progress – happy to have anyone’s feedback or additions.)

 

-- Jim


Kirill Batyuk

Sep 30, 2025, 8:36:01 AM
to dataverse...@googlegroups.com

Hi Stefano,

I’ll speak from my experience; as Jim mentioned, it depends heavily on the resources allocated to your Dataverse instance.

We currently host datasets of hundreds of gigabytes. From what I’ve learned, it’s not the size of the files but the number of them that causes problems. To avoid that, we zip multiple files together.

The first time I experimented, I uploaded a dataset that contained around 9000 files, none of which were zipped, with a total size of about 200GB. The record became unusable, as it took Dataverse a long time to pull up the information about all those files. I then zipped the files in chunks and reduced the count to 380. Dataverse handled that much better and now shows the record and lets users download everything.

Here is that record with 380 files: https://doi.org/10.26027/DATAZEWSOH

Here is an example of a dataset with a few files that are over 100GB: https://doi.org/10.26027/DATAD05EAS

Based on that lesson, we now try to limit the number of files to no more than 100; the file size doesn’t seem to matter.
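For anyone who wants to automate the batching, here is a minimal sketch of the idea using only Python’s standard zipfile module; the directory names and chunk size are placeholders, not the exact script we use:

```python
import zipfile
from pathlib import Path

SOURCE_DIR = Path("dataset_files")   # placeholder: directory holding the raw files
OUTPUT_DIR = Path("zipped_chunks")   # placeholder: where the chunk zips are written
FILES_PER_CHUNK = 25                 # keeps a ~9000-file dataset under ~400 zips

OUTPUT_DIR.mkdir(exist_ok=True)
files = sorted(p for p in SOURCE_DIR.rglob("*") if p.is_file())

for i in range(0, len(files), FILES_PER_CHUNK):
    chunk = files[i:i + FILES_PER_CHUNK]
    zip_path = OUTPUT_DIR / f"chunk_{i // FILES_PER_CHUNK:04d}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in chunk:
            # store paths relative to the source dir so the archive layout stays tidy
            zf.write(f, arcname=f.relative_to(SOURCE_DIR))
```

One caveat: depending on the upload path, Dataverse may unpack an uploaded zip back into its individual files, so you may need to double-zip each chunk (or use an upload method that preserves the zip) to actually keep the file count down.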

We transfer large datasets with Globus via the dataverse-globus app and store them in a VAST S3 bucket.
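For smaller zipped batches (not the really large transfers we push through Globus), the Dataverse native API file-upload endpoint also works. Here is a rough sketch using Python’s requests library; the server URL, API token, DOI, and file name are placeholders to replace with your own:

```python
import requests

SERVER = "https://demo.dataverse.org"            # placeholder Dataverse installation
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"   # placeholder API token
PERSISTENT_ID = "doi:10.5072/FK2/EXAMPLE"        # placeholder dataset DOI

url = f"{SERVER}/api/datasets/:persistentId/add"
with open("chunk_0000.zip", "rb") as fh:
    resp = requests.post(
        url,
        params={"persistentId": PERSISTENT_ID},
        headers={"X-Dataverse-key": API_TOKEN},
        files={"file": ("chunk_0000.zip", fh, "application/zip")},
        # jsonData is optional; it lets you set a per-file description/tags
        data={"jsonData": '{"description": "Zipped chunk 0 of the dataset"}'},
    )
resp.raise_for_status()
print(resp.json()["status"])
```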

Hope this helps somewhat. I know this will vary greatly depending on the infrastructure set up for the system.

 

Kirill Batyuk

Systems Librarian

MBLWHOI Library

Data Library and Archives

Woods Hole Oceanographic Institution

508-289-2850

kba...@whoi.edu

mblwhoilibrary.org -- whoi.edu

 

 

