Hi everyone,
I’m relatively new to Dataverse, so apologies in advance if some of my questions are basic. I’m looking for some clarification on how direct download works. Specifically, I would like to understand whether, when enabling direct download:
If there is no alternative configuration, I would appreciate a detailed explanation of how the download logic works, so that I can better understand the limitation imposed by :ZipDownloadLimit.
Additionally, I’m wondering about best practices for large datasets in Dataverse:
Would it make sense, or is it recommended, to upload both the original dataset (with all the standard Dataverse features such as metadata, versioning, and searchability) and a ZIP bundle containing the same files for easier bulk download?
Thanks in advance for any clarification or pointers to documentation.
Best regards,
Laura
--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dataverse-community/39fcc689-4262-4234-82e1-dae312589ebbn%40googlegroups.com.
One related option: if you upload a zip only, the Zip Previewer can be used to let people download individual files without doubling the storage. That’s also a recommended option for reducing the file count in general.
-- Jim
Hi Jim and Philip,
Thank you for your responses and for sharing the scripts link.
I have two quick questions for clarification:
Best Regards,
Laura
From: dataverse...@googlegroups.com <dataverse...@googlegroups.com>
On Behalf Of James Myers
Sent: Wednesday, February 4, 2026 9:48 PM
To: dataverse...@googlegroups.com
Subject: RE: [Dataverse-Users] Direct download and multi-file download behavior in Dataverse
With full-text indexing turned on, I think the contents of zip files are indexed. I haven’t tried it, but it should be easy to verify. We use Apache Tika for indexing and it understands how to handle zip files. It is possible to set a maximum file size for indexing (the limit would apply to the zip itself), and I’ve seen Tika fail on larger files by running out of memory (in which case it just skips indexing that file).
FWIW: There are a variety of tools for upload (DVUploader, pyDataverse, …), including the browser-based DVWebloader, which are more efficient/scalable than the standard interface. For download, I think we only have non-browser tools (such as pyDataverse again), with https://github.com/gdcc/dataverse-recipes/tree/main/shell/download being the simplest. I did look at a non-zipping download solution from the browser at one point, but, at the time, there was no standardized and broadly supported way to download multiple files and write them to specified path/file names (e.g. to replicate the dataset structure). That may have changed – I know there were some proposed standards to allow it.
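For illustration, a non-browser download that replicates the dataset's folder structure could look roughly like the sketch below. It is not the linked dataverse-recipes script, only a minimal sketch under some assumptions: it uses the native API file listing (`/api/datasets/:persistentId/versions/:latest/files`) and the access endpoint (`/api/access/datafile/{id}`), and the function names, server URL, and DOI are hypothetical placeholders.

```python
import json
import os
import shutil
import urllib.parse
import urllib.request


def list_files_url(base_url, persistent_id, version=":latest"):
    """Build the native API URL that lists the files of one dataset version."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    base = base_url.rstrip("/")
    return f"{base}/api/datasets/:persistentId/versions/{version}/files?{query}"


def plan_downloads(file_entries, base_url, target_dir):
    """Map each listed file to (download URL, local path).

    The optional "directoryLabel" of each entry is used as the local
    sub-folder, so the dataset's folder structure is reproduced on disk.
    """
    base = base_url.rstrip("/")
    plan = []
    for entry in file_entries:
        data_file = entry["dataFile"]
        folder = entry.get("directoryLabel", "")
        local_path = os.path.join(target_dir, folder, data_file["filename"])
        plan.append((f"{base}/api/access/datafile/{data_file['id']}", local_path))
    return plan


def download_dataset(base_url, persistent_id, target_dir, api_token=None):
    """Download every file one by one (no zipping, so no zip size limit applies)."""
    # The API token is only needed for restricted or unpublished files.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    listing = urllib.request.Request(
        list_files_url(base_url, persistent_id), headers=headers
    )
    with urllib.request.urlopen(listing) as response:
        entries = json.load(response)["data"]
    for url, local_path in plan_downloads(entries, base_url, target_dir):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as src, open(local_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream to disk instead of loading into memory
```

A call such as `download_dataset("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE", "out")` (placeholder values) would fetch the files individually. For ingested tabular files the access endpoint returns the archival version by default, so the names on disk may differ from the originally uploaded ones.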
-- Jim
Without full text indexing, I don’t think the file names for the entries in the zip would be indexed. That wouldn’t be too hard to add to the code.
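To give a sense of why that would be a small change: the entry names can be read from the zip's directory without extracting anything. The sketch below is in Python purely for illustration (Dataverse itself is written in Java and this is not its code), and the function name is hypothetical.

```python
import io
import zipfile


def zip_entry_names(zip_source):
    """Return the names of the file entries in a zip, skipping directory entries.

    Only the archive's directory is read; no file contents are decompressed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    # Build a small in-memory zip to demonstrate the listing.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/measurements.csv", "x,y\n1,2\n")
        archive.writestr("README.txt", "example")
    print(zip_entry_names(buffer))
```

The resulting list of names is what an indexer could add as searchable text for the zip file.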
-- Jim
Hi Jim!
We will verify the indexing behavior on our side as you suggested.
Regarding the possible implementation of an external tool, yes, I was referring specifically to a browser-based external tool for download (for our end users). I’ve started an Angular tool connecting to Dataverse (using the parameter-passing mechanism). Being new, I was unsure about limitations, but your explanations clarified things, and I’m now considering continuing this approach for multi-file downloads via the browser.
Thank you again.
Best regards,
Laura