Is anyone using the rsync feature?

Philip Durbin

Oct 17, 2023, 7:40:59 AM
to dataverse...@googlegroups.com
Is anyone using the rsync feature? I'm asking because we're talking about deprecating it, first by removing it from the guides and list of features*, then by not offering it in the new (React) UI**, then by removing it from the backend. Something like that.

Here's the issue to remove rsync from the guides: https://github.com/IQSS/dataverse/issues/8985

Thanks,

Phil



p.s. Here's an example of how having rsync in the guides is causing confusion: https://groups.google.com/g/dataverse-community/c/TMiv80BmpPA/m/1jUb64ODAQAJ

--

Sergej Zr

Oct 17, 2023, 12:51:51 PM
to dataverse...@googlegroups.com
Hello Philip,
we are actually receiving requests for interfaces (other than the web interface) to upload/download large files. For now, I have postponed the internal discussion on it until we receive a concrete use case.

When the rsync feature is deprecated, will there be any alternative features for such tasks, or will handling large files (we are talking about several TB) no longer be possible in Dataverse?

Thanks
Sergej

-- 
Dr. Sergej Zerr
Hochschulrechenzentrum Bonn 
Servicestelle Forschungsdatenmanagement - SFD
Tel: +49 228 73-4121
Raum: 3.011
Wegelerstrasse 6
53115 Bonn
www.hrz.uni-bonn.de

Philip Durbin

Oct 17, 2023, 1:50:43 PM
to dataverse...@googlegroups.com
Great question!

At Harvard Dataverse we frequently use DVUploader (by Jim Myers) to upload large files or many files (or both): https://guides.dataverse.org/en/6.0/user/dataset-management.html#command-line-dvuploader



You could also push the files into Dataverse from a variety of tools such as GitHub, GitLab, OSF, RSpace, iRODS, etc: https://guides.dataverse.org/en/6.0/admin/integrations.html#getting-data-in

All that said, I'd advise starting with DVUploader. Oh and you'll want direct file upload enabled (requires S3-compatible storage): https://guides.dataverse.org/en/6.0/developers/big-data-support.html#s3-direct-upload-and-download

I hope this helps! Please keep the questions coming.

Thanks,

Phil

Sebastian Karcher

Oct 17, 2023, 2:12:37 PM
to dataverse...@googlegroups.com
As I understand it, AWS only supports up to 5TB for a single file. It's also going to be on the pricey side if you're running standard AWS S3 (>US$1k for a 5TB file + hundreds of dollars for each write out).

For genuinely large files as mentioned above, I think Globus would be the only available option with rsync out of the picture. FWIW, we don't handle individual files that large, so it's not a priority for us, but it does come up in lots of discussions whenever physicists, chemists, and the like are involved :).



--
Sebastian Karcher, PhD
www.sebastiankarcher.com

James Myers

Oct 17, 2023, 4:57:40 PM
to dataverse...@googlegroups.com

FWIW: At some level, it will also still be possible to use rsync or another mechanism to move files – it just wouldn’t be something embedded in the UI as a special mechanism as it is now (if/when it’s been set up). The new option involves using the new ‘upload-out-of-band’ option for stores – that allows you to move files into position in that store’s storage (i.e. into the file system for a file store) and then call a method in the direct upload API to tell Dataverse that the file is in place. That process could be scripted or included in some external transfer app. (The S3 direct upload works this way, with the addition that Dataverse gives your script/app signed URLs to do the upload to S3. The Globus transfer mechanism we have is also similar in that there’s an external app that starts the Globus transfers and then an API call is made to tell Dataverse to add them to the dataset. In this case though, there’s support for Dataverse to track the progress of the transfer and wait until it succeeds before adding the file(s) to the dataset.)
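
A rough sketch of what the scripted "register a file that is already in place" step might look like, in Python. It only prepares the request and does not send anything. The endpoint path and the `jsonData` field names follow the direct upload API section of the guides as I understand it. The `file://...` storage identifier, the example DOI, and the server URL are made-up placeholders, so please check the guides for your Dataverse version before relying on any of this.

```python
import hashlib
import json
from pathlib import Path
from urllib.parse import quote


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so very large files are never read into memory at once."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_json_data(path: Path, storage_identifier: str, mime_type: str) -> str:
    """Build the jsonData form field describing a file already moved into the store.

    storage_identifier is an assumption here: its exact format depends on how
    the store is configured, so treat the value passed in as a placeholder.
    """
    return json.dumps({
        "storageIdentifier": storage_identifier,
        "fileName": path.name,
        "mimeType": mime_type,
        "checksum": {"@type": "SHA-1", "@value": sha1_of(path)},
    })


def add_file_url(server_url: str, persistent_id: str) -> str:
    """URL of the 'add file to dataset' call (assumed from the direct upload docs)."""
    pid = quote(persistent_id, safe=":/")
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?persistentId={pid}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    demo = Path("demo-upload.bin")
    demo.write_bytes(b"example payload")
    print(add_file_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE"))
    print(build_json_data(demo, "file://placeholder-id", "application/octet-stream"))
    demo.unlink()
```

A script or transfer app would then POST that `jsonData` value as a multipart form field to the printed URL, with the API token in the `X-Dataverse-key` header, after rsync (or whatever tool you use) has finished moving the file into the store.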

 

-- Jim
