Re: [Dataverse-Users] Dataverse GUI Downloader


Sergej Zr

May 8, 2026, 10:13:07 AM
to dataverse...@googlegroups.com
Hi Phil,

Thanks for your feedback — that’s a very interesting observation.

For tabular files, Dataverse apparently computes the MD5 checksum from the original TSV file that it stores. However, downloading the file via:

https://dataverse.harvard.edu/api/access/datafile/6867331

returns a modified TAB version of the file, which naturally results in a different checksum.

It would probably be more consistent if Dataverse did not serve a transformed file by default while still exposing the checksum of the original file. On the other hand, this may very well be an intentional design decision in Dataverse.

In any case, thanks for pointing this out! I’ve just fixed the issue by adding format=original&gbrecs=true to the download URL, so the original file is downloaded and checksum validation now succeeds.
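For anyone who wants to reproduce the fix outside the tool, here is a minimal sketch of the idea. It is not the downloader's actual code: the helper names `original_file_url` and `md5_matches` are made up for illustration, and only the endpoint path and the two query parameters come from this thread.

```python
import hashlib
from urllib.parse import urlencode


def original_file_url(base_url: str, file_id: int) -> str:
    """Build a download URL that asks for the originally uploaded file.

    Hypothetical helper; the query parameters are the ones mentioned above.
    """
    query = urlencode({"format": "original", "gbrecs": "true"})
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}?{query}"


def md5_matches(data: bytes, expected_md5: str) -> bool:
    """Compare the MD5 of the downloaded bytes with the checksum from the metadata."""
    return hashlib.md5(data).hexdigest() == expected_md5.strip().lower()


print(original_file_url("https://dataverse.harvard.edu", 6867331))
```

Without `format=original`, the bytes passed to `md5_matches` would be those of the transformed TAB file, so the comparison against the stored checksum would fail as Phil observed.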

Best regards,
Sergej

P.S. Below is the metadata for the file. The checksum is clearly associated with the original TSV file rather than the transformed TAB file returned by the default download endpoint:

On Fri, May 8, 2026 at 11:49 AM 'Philip Durbin' via Dataverse Users Community <dataverse...@googlegroups.com> wrote:
Hi Sergej!

This download tool seems great, especially since it has a graphical user interface (GUI). The only other download tool I'm aware of is also great, but it's a command line interface (CLI) tool. It was added in this pull request: https://github.com/gdcc/dataverse-recipes/pull/17

Back to your tool, it mostly worked but I did get a "checksum validation failed" on the one tabular file in https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TJCLKP (which is my dataset). I'll attach a screenshot.

Thanks and see you in Barcelona!

Phil

On Thu, May 7, 2026 at 3:53 PM Sergej Zr <serg...@gmail.com> wrote:

Dear Dataverse Community,

We would like to share a small open-source tool that we are developing at the University of Bonn to simplify downloading large and complex datasets from Dataverse installations.

We are the Service Center for Research Data at the University of Bonn and operate our Dataverse installation within the university's IT center.

The motivation behind the tool was a problem we repeatedly encountered with large or complex datasets: downloads may fail, interrupted transfers are difficult to resume, and it can become unclear which files have already been downloaded successfully. This is especially true when datasets are organized in hierarchical folder structures, where partial downloads are difficult.

To address this, we developed a lightweight desktop downloader with features such as:

  • resumable downloads
  • selective file and folder downloads
  • checksum verification
  • progress tracking
  • preservation of folder structures
  • support for large datasets and unstable connections
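To illustrate what "resumable downloads" can mean in practice, here is a minimal sketch based on HTTP Range requests. It is an assumption about one possible approach, not a description of how the tool is implemented; the function names are hypothetical, and it only works if the server honors the `Range` header.

```python
import os
import urllib.error
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Return request headers asking only for the bytes that are still missing."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Continue a partial download of `url` into `dest`, or start from scratch."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    try:
        with urllib.request.urlopen(request) as response:
            # 206 means the server sent only the missing part, so append.
            # Any other success status means Range was ignored, so overwrite.
            mode = "ab" if response.status == 206 else "wb"
            with open(dest, mode) as out:
                while chunk := response.read(chunk_size):
                    out.write(chunk)
    except urllib.error.HTTPError as err:
        # 416 means the requested range starts at or past the end of the file,
        # which is what happens when the local copy is already complete.
        if err.code != 416:
            raise
```

A checksum comparison after the transfer finishes is what confirms that the resumed file is actually intact.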

The project is fully open source and available here:

https://github.com/sergejzr/harvard-dataverse-downloader

We have already received initial positive feedback from the German Dataverse community and would be very happy to hear from others as well:

  • Are you already using similar tools, and if so, which ones?
  • Which download workflows work well for your users?
  • Are there features you would consider important?

We are also currently testing direct Dataverse integration via custom deep links (e.g. opening datasets directly from the browser into the downloader application).

Feedback, ideas, and contributions are very welcome (here or directly on GitHub).

Best regards from Bonn,

Sergej Zerr & the RDM Team
University of Bonn
Service Center for Research Data / University IT

PS: See you in Barcelona next week! :)

[Attachment: image.png]



--
-- 
Dr. Sergej Zerr
Hochschulrechenzentrum Bonn 
Servicestelle Forschungsdatenmanagement - SFD
Tel: +49 228 73-4121
Raum: 3.011
Wegelerstrasse 6
53115 Bonn
www.hrz.uni-bonn.de

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dataverse-community/CACdcJw2qcqTTueDtuTi%2Bhu2O8Ay60xM3hNABhJV_Hvvsd1dqsg%40mail.gmail.com.



