Largest Number of Files in a Dataset


Sherry Lake

Jul 27, 2021, 9:51:25 AM
to Dataverse Users Community
For curiosity's sake...

What is the largest number of files you have seen in a dataset?

I know there are time-outs, limits to the number of files zipped for download, and other problems related to file size and numbers of files. I also know there are practical reasons for not having "lots" of files in a dataset.

I am trying to find a good way to tell my researcher, who wants to upload 200,000 files (yes, that is a six-digit number of files) to our repository, "don't do it". Oh, and of course the Dataverse UI times out on the zip upload. I don't blame it.

Thanks for listening.
Sherry Lake



Sebastian Karcher

Jul 27, 2021, 9:53:21 AM
to dataverse...@googlegroups.com
We have one with about 2,000 files, and the app creaked mightily not just during upload but also in every subsequent action with the dataset. I don't think you can make 200k files happen and be happy ;)



--
Sebastian Karcher, PhD
www.sebastiankarcher.com

danny...@g.harvard.edu

Jul 27, 2021, 9:59:16 AM
to Dataverse Users Community
Hi - on the Harvard Dataverse Repository we have some datasets around the 5,000-file mark, and there are issues with creating, editing, displaying, and publishing these datasets. In these cases we usually work with the person to upload one or several zip files instead of individual files.

- Danny 
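
As an illustration of the zip-instead-of-individual-files approach, here is a minimal Python sketch that packs a directory tree into a handful of numbered zip archives before upload. The function name `bundle_into_zips`, the `batch_NNNN.zip` naming, and the default of 1,000 files per archive are all hypothetical choices made for this example; none of them comes from Dataverse or from the posts above.

```python
import zipfile
from pathlib import Path


def bundle_into_zips(source_dir, output_dir, files_per_zip=1000):
    """Pack every file under source_dir into numbered zip archives.

    files_per_zip is an illustrative assumption, not a Dataverse limit.
    output_dir should live outside source_dir so that archives from an
    earlier run are not picked up as input.
    Returns the list of archive paths that were written.
    """
    source = Path(source_dir)
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)

    # Sort so that the batching is reproducible between runs.
    files = sorted(p for p in source.rglob("*") if p.is_file())

    archives = []
    for start in range(0, len(files), files_per_zip):
        batch = files[start:start + files_per_zip]
        archive_path = output / f"batch_{start // files_per_zip + 1:04d}.zip"
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                # Store paths relative to source_dir to keep the folder layout.
                zf.write(path, arcname=str(path.relative_to(source)))
        archives.append(archive_path)
    return archives
```

With these assumptions, a call such as `bundle_into_zips("raw_data", "zipped", files_per_zip=1000)` (both directory names are placeholders) would turn 200,000 files into 200 archives. Whether 200 zip files is comfortable for a given installation is something to check with the repository's support team.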

Philipp at UiT

Aug 14, 2021, 11:30:16 AM
to Dataverse Users Community
Hi, on DataverseNO we have a few datasets containing more than 1,000 files, roughly 1,000 to 5,000. Since we activated DOIs at the file level, we have experienced some issues when publishing datasets with many files. Therefore, we advise our depositors not to include more than about 500 files in a single dataset. If they have more than approx. 500 files, they should zip them, and if that still leaves more than approx. 500 zip files, they should consider splitting the dataset into multiple datasets.

Best, Philipp
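
The rule of thumb in the post above (about 500 items per dataset, zip beyond that, split into several datasets if there are still too many zips) can be worked through as simple arithmetic. The sketch below is only an illustration: `plan_deposit`, `DepositPlan`, and the `files_per_zip` parameter are hypothetical names introduced here, and the 500-item default mirrors an approximate guideline, not a hard Dataverse limit.

```python
import math
from typing import NamedTuple


class DepositPlan(NamedTuple):
    zip_files: int   # 0 means "upload the files individually"
    datasets: int


def plan_deposit(total_files, files_per_zip, max_items_per_dataset=500):
    """Estimate how many zip files and datasets a deposit would need.

    files_per_zip is the depositor's own choice (an assumption here);
    max_items_per_dataset follows the approximate 500-item guideline.
    """
    if total_files <= max_items_per_dataset:
        # Small enough to deposit as individual files in one dataset.
        return DepositPlan(zip_files=0, datasets=1)

    zip_files = math.ceil(total_files / files_per_zip)
    # If there are still too many zips, spread them over several datasets.
    datasets = math.ceil(zip_files / max_items_per_dataset)
    return DepositPlan(zip_files=zip_files, datasets=datasets)


print(plan_deposit(200_000, files_per_zip=100))
# DepositPlan(zip_files=2000, datasets=4)
print(plan_deposit(200_000, files_per_zip=400))
# DepositPlan(zip_files=500, datasets=1)
```

For the 200,000-file case from the original question, the number of datasets depends entirely on how many files go into each zip, which is why `files_per_zip` is left as an explicit input.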
