Direct download and multi-file download behavior in Dataverse

Laura Lo Gerfo

Feb 4, 2026, 11:48:39 AM
to Dataverse Users Community

Hi everyone,

I’m relatively new to Dataverse, so apologies in advance if some of my questions are basic. I’m looking for some clarification on how direct download works. Specifically, I would like to understand whether, when enabling direct download:

  • Can you confirm that direct download is only active for single files, and that when multiple files are selected a ZIP file is generated instead, so no separate URLs are created for downloading each selected file individually?
  • Is there a specific configuration that allows downloading files one by one using their respective S3 URLs (similar to how Dataverse handles file uploads), even when multiple files are selected, instead of generating a ZIP file?

If there is no alternative configuration, I would really appreciate a detailed explanation of how the download logic works, so that I can better understand the limit imposed by :ZipDownloadLimit.

Additionally, I’m wondering about best practices for large datasets in Dataverse:

Would it make sense, or is it recommended, to upload both the original dataset (with all the standard Dataverse features such as metadata, versioning, and searchability) and a ZIP bundle containing the same files for easier bulk download?

Thanks in advance for any clarification or pointers to documentation.

Best regards,
Laura

Philip Durbin

Feb 4, 2026, 3:23:59 PM
to dataverse...@googlegroups.com
Hi Laura,

Welcome. Good questions.

Direct download is only for single files.

To download many single files at once, you could try the "download files from a dataset" script* at https://github.com/gdcc/dataverse-recipes
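For readers who want to see the idea behind such a script, here is a minimal Python sketch (not the script from the repository above). It assumes a published dataset with unrestricted files, lists the files through the native API, and then fetches each one through the file access endpoint, which is where a redirect to the storage URL can happen when direct download is enabled. The function names, the base URL and the target directory are illustrative assumptions.

```python
import json
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def list_files_url(base_url, persistent_id, version=":latest-published"):
    """Build the native API URL that lists the files of a dataset version."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    return (f"{base_url.rstrip('/')}/api/datasets/:persistentId"
            f"/versions/{version}/files?{query}")


def build_download_plan(base_url, files_response):
    """Turn the file listing JSON into (download_url, relative_path) pairs."""
    plan = []
    for entry in files_response.get("data", []):
        data_file = entry["dataFile"]
        name = entry.get("label") or data_file["filename"]
        folder = entry.get("directoryLabel", "")  # absent for top-level files
        url = f"{base_url.rstrip('/')}/api/access/datafile/{data_file['id']}"
        plan.append((url, posixpath.join(folder, name) if folder else name))
    return plan


def download_all(base_url, persistent_id, target_dir):
    """Download every file one by one, recreating the dataset's folders."""
    with urllib.request.urlopen(list_files_url(base_url, persistent_id)) as response:
        listing = json.load(response)
    for url, relative_path in build_download_plan(base_url, listing):
        destination = os.path.join(target_dir, *relative_path.split("/"))
        os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
        # urlopen follows HTTP redirects, so a redirect to storage is transparent.
        with urllib.request.urlopen(url) as source, open(destination, "wb") as out:
            shutil.copyfileobj(source, out)
```

Usage would look like `download_all("https://demo.example.org", "doi:10.5072/FK2/ABC", "downloads")`, where both the host and the DOI are placeholders. Restricted or draft files would additionally need an API token, which this sketch leaves out.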

As for duplicating the files as a zip file, I don't love the idea, especially for large files, but, yes, it would be a workaround. When the user clicks the "download all (or some) files as a zip" button, the zip file is created on-the-fly (which can take a while) on the Dataverse server and it isn't served over direct download. That's why we have this :ZipDownloadLimit threshold you mentioned to prevent a poor user experience when many large files would be zipped up.
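To make the effect of the threshold concrete, here is a small client-side sketch in Python. It is not the server's logic, which this thread does not describe. It only assumes that the limit is expressed in bytes and is compared against the summed file sizes, and it shows how a tool could check in advance which selected files fit into one zip request. The function name and the sample sizes are made up for illustration.

```python
def split_by_zip_limit(files, zip_limit_bytes):
    """Greedy pre-check: which files fit into one zip request under the limit.

    `files` is a list of (name, size_in_bytes) pairs. The function returns the
    names that fit, the names left over, and the total size of the files that fit.
    This is an illustration only, not how the Dataverse server decides.
    """
    fits, left_over, total = [], [], 0
    for name, size in files:
        if total + size <= zip_limit_bytes:
            fits.append(name)
            total += size
        else:
            left_over.append(name)
    return fits, left_over, total


selection = [("a.csv", 40), ("b.bin", 70), ("c.txt", 50)]
fits, left_over, total = split_by_zip_limit(selection, zip_limit_bytes=100)
print(fits, left_over, total)
```

Files that end up in the left-over list would be candidates for individual download instead of zipping.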

Perhaps Dataverse should work like Google Takeout, which generates a zip for you and then emails you when it's ready. 

I hope this helps! Please keep the questions coming!

Thanks,

Phil

* The download script was originally added in this pull request: https://github.com/gdcc/dataverse-recipes/pull/17


James Myers

Feb 4, 2026, 3:48:03 PM
to dataverse...@googlegroups.com

One related option: if you upload a zip only, the Zip Previewer can be used to let people download individual files without doubling the storage. That’s also a recommended option for reducing the file count in general.
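The reason a single uploaded zip can still serve individual files is a property of the ZIP format itself: the archive carries a directory of its entries, so one entry can be listed and extracted without unpacking the rest. The Python sketch below illustrates only that property, using an in-memory archive. How the Zip Previewer is actually implemented is not described in this thread, and the helper names and sample entries are assumptions.

```python
import io
import zipfile


def build_example_zip():
    """Create a small in-memory zip that stands in for an uploaded bundle."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data/a.csv", "x,y\n1,2\n")
        archive.writestr("docs/readme.txt", "hello")
    buffer.seek(0)
    return buffer


def list_entries(zip_file):
    """Return (name, uncompressed_size) for each entry without extracting it."""
    with zipfile.ZipFile(zip_file) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def read_entry(zip_file, name):
    """Extract the bytes of one entry, leaving the other entries untouched."""
    with zipfile.ZipFile(zip_file) as archive:
        return archive.read(name)


bundle = build_example_zip()
print(list_entries(bundle))
print(read_entry(bundle, "docs/readme.txt"))
```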

-- Jim

Laura Lo Gerfo

Feb 11, 2026, 3:06:35 AM
to dataverse...@googlegroups.com

Hi Jim and Philip,

Thank you for your responses and for sharing the scripts link.

I have two quick questions for clarification:

  1. When ZIP files are uploaded via "direct upload," does Dataverse index the contents of these ZIPs?
  2. Regarding downloading multiple files, would it make sense to develop an external tool that allows users to download individual files directly from S3 rather than only generating a ZIP? I’ve already worked on a solution for improving the upload process and could potentially extend it to include a download feature as well, if there are no technical concerns from your side.

 

Best Regards,
Laura

James Myers

Feb 11, 2026, 12:07:52 PM
to dataverse...@googlegroups.com
  1. When ZIP files are uploaded via "direct upload," does Dataverse index the contents of these ZIPs?

With full-text indexing turned on, I think the contents of zip files are indexed. I haven’t tried it, but it should be easy to verify. We use Apache Tika for indexing, and it understands how to handle zip files. It is possible to set a maximum file size for indexing (the limit would apply to the zip itself), and I’ve seen Tika fail on larger files by running out of memory, in which case indexing is simply skipped for that file.

  2. Regarding downloading multiple files, would it make sense to develop an external tool, to allow users to download individual files directly from S3 rather than only generating a ZIP? I’ve already worked on a solution for improving the upload process and could potentially extend this to include a download feature as well, if there are no technical concerns from your side.

FWIW: There are a variety of tools for upload (DVUploader, pyDataverse, …), including the browser-based DVWebloader, which are more efficient/scalable than the standard interface. For download, I think we only have non-browser tools (such as pyDataverse again), with https://github.com/gdcc/dataverse-recipes/tree/main/shell/download being the simplest. I did look at a non-zipping download solution from the browser at one point, but, at the time, there was no standardized and broadly supported way to download multiple files and write them to specified path/file names (e.g. to replicate the dataset structure). That may have changed – I know there were some proposed standards to allow it.

 

-- Jim

James Myers

Feb 11, 2026, 12:12:25 PM
to dataverse...@googlegroups.com

Without full-text indexing, I don’t think the file names of the entries in the zip would be indexed. That wouldn’t be too hard to add to the code.

-- Jim

Laura Lo Gerfo

Feb 12, 2026, 3:17:11 AM
to dataverse...@googlegroups.com

Hi Jim!

We will verify the indexing behavior on our side as you suggested.

Regarding the possible implementation of an external tool, yes, I was referring specifically to a browser-based external tool for download (for our end users). I’ve started an Angular tool connecting to Dataverse (using the parameter-passing mechanism). Being new, I was unsure about limitations, but your explanations clarified things, and I’m now considering continuing this approach for multi-file downloads via the browser.
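As a rough illustration of the parameter-passing step, the Python sketch below shows the logic a tool could run when it is launched: read the query parameters from the launch URL and derive the API URL that lists the dataset's files. It is a sketch under assumptions, not the actual tool: the parameter names `siteUrl` and `datasetPid` are whatever the tool's manifest declares (assumed here), the hosts are placeholders, and the same logic would be written in TypeScript inside an Angular app.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed parameter names; the real ones are set in the tool's manifest.
REQUIRED_PARAMETERS = ("siteUrl", "datasetPid")


def parse_launch_url(launch_url):
    """Extract the Dataverse site URL and dataset PID from the launch URL."""
    query = parse_qs(urlsplit(launch_url).query)
    missing = [key for key in REQUIRED_PARAMETERS if key not in query]
    if missing:
        raise ValueError(f"missing launch parameters: {', '.join(missing)}")
    return {
        "site_url": query["siteUrl"][0].rstrip("/"),
        "dataset_pid": query["datasetPid"][0],
    }


def file_listing_url(params, version=":latest-published"):
    """Build the native API URL that lists the files of the dataset."""
    query = urlencode({"persistentId": params["dataset_pid"]})
    return (f"{params['site_url']}/api/datasets/:persistentId"
            f"/versions/{version}/files?{query}")


launch = ("https://tool.example.org/download"
          "?siteUrl=https%3A%2F%2Fdemo.example.org"
          "&datasetPid=doi%3A10.5072%2FFK2%2FABC")
print(file_listing_url(parse_launch_url(launch)))
```

From the listing, the tool could then offer one download link per file instead of requesting a zip.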


Thank you again.

Best regards,
Laura

Philip Durbin

Feb 12, 2026, 3:07:12 PM
to dataverse...@googlegroups.com
An external tool to download multiple files sounds great! Please keep us posted! Since you're using Angular, you might find this library that we've created for the new frontend useful: https://github.com/IQSS/dataverse-client-javascript
