--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/39dbb215-9ff9-45a6-8e42-233e548f1111n%40googlegroups.com.
Sherry,
Yes, roughly: the basic split is that there are parameters you need to set as an admin to enable direct upload/download, and then there are the API calls you use to actually do a direct upload/download.
The basic documentation of direct upload/download started in the Big Data section, as it is primarily used for that. As direct download is both simple and generally useful even for smaller files, I think it got documented in the basic install guide as well. I added the JVM option needed for direct upload to the table there, but there’s no discussion of direct upload in the install guide.
For the install/config documentation, I think things could get rearranged/merged as Phil said. I suspect, though, that more people will want to enable direct download, and that direct upload will continue to be specific to places that want to support larger data. So if we move discussion of it out of the dev/Big Data part, some way to indicate that direct upload is an ‘advanced option’ would be useful.
For using them:
Both download and upload work in the UI and API.
Download is simpler: when you use the normal download API call and direct download is enabled, the response is an HTTP ‘redirect’ rather than the bytes from the file. In the UI, your browser automatically follows the redirect and starts getting the file bytes from S3. For the API, your code has to be smart enough to follow the redirect (for example, curl has a ‘-L’ flag that means follow redirects, and some Java libraries can automatically follow the redirect as well). In either case, though, it’s one simple, web-standard, easy-to-automate step.
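To make the redirect behaviour concrete, here is a minimal Python sketch. It does not talk to a real Dataverse: `FakeDataverse` is a hypothetical local stand-in that answers the download path with a redirect and serves the bytes from a second, made-up `/fake-s3/object` path. The 303 status code and the file id `42` are assumptions for illustration only. The point is that the client makes one call and the library follows the redirect, much like `curl -L`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FILE_BYTES = b"example file contents"


class FakeDataverse(BaseHTTPRequestHandler):
    """Hypothetical stand-in: the API path redirects, the storage path serves bytes."""

    def do_GET(self):
        if self.path.startswith("/api/access/datafile/"):
            # Redirect-enabled store: no file bytes, only a Location header.
            self.send_response(303)  # status code assumed for this demo
            self.send_header("Location", "/fake-s3/object")
            self.end_headers()
        elif self.path == "/fake-s3/object":
            self.send_response(200)
            self.send_header("Content-Length", str(len(FILE_BYTES)))
            self.end_headers()
            self.wfile.write(FILE_BYTES)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


def download(url):
    # An opener with no proxies, so the local demo ignores proxy env variables.
    # The default opener chain still follows 30x redirects for GET requests.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.geturl(), resp.read()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeDataverse)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return download(f"http://127.0.0.1:{port}/api/access/datafile/42")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns the final URL (the storage path, not the API path) together with the file bytes, which is all a client sees of the redirect.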
Direct upload is a bit more complex (the three-step process): basically you have to ask Dataverse to let you upload a file, do the direct upload to S3, and then tell Dataverse you did it. (That’s not counting the multi-part upload for even larger files, where the upload step itself is actually multiple calls.) As with direct upload overall, the API is somewhat ‘advanced’ and probably only worth dealing with if you have big data and/or are going to use a toolkit/app (like DVUploader, and hopefully pyDataverse at some point). So the documentation is still in the dev guide. It could move, but it would still be good to make it clear that it is ‘advanced’.
The UI works for direct upload because it is doing something similar to those three steps – it basically does the first two and then relies on the ‘save’ button to tell Dataverse everything is done – the same as with normal uploads.
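The three-step API flow described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the definitive client: the endpoint paths, the `x-amz-tagging` header, and the `jsonData` fields are written from memory of the developer guide and should be checked against the guide for your Dataverse version. `send` is a hypothetical transport callable (wrap `requests` or `urllib` in real use), and multi-part upload is left out.

```python
import hashlib
import json
from urllib.parse import quote


def direct_upload(send, server, pid, api_token, file_name, data, mime_type):
    """Run the three steps through send(method, url, headers, body).

    `send` is a hypothetical helper returning the decoded JSON reply (or None).
    """
    auth = {"X-Dataverse-key": api_token}
    pid_q = quote(pid, safe="")

    # Step 1: ask Dataverse to let us upload; it replies with a presigned URL
    # and the storage identifier the file will have (paths are assumptions).
    reply = send(
        "GET",
        f"{server}/api/datasets/:persistentId/uploadurls"
        f"?persistentId={pid_q}&size={len(data)}",
        auth,
        None,
    )
    upload_url = reply["data"]["url"]
    storage_id = reply["data"]["storageIdentifier"]

    # Step 2: send the bytes straight to S3, bypassing the Dataverse server.
    send("PUT", upload_url, {"x-amz-tagging": "dv-state=temp"}, data)

    # Step 3: tell Dataverse the upload happened so it registers the file.
    json_data = {
        "storageIdentifier": storage_id,
        "fileName": file_name,
        "mimeType": mime_type,
        "checksum": {"@type": "SHA-1", "@value": hashlib.sha1(data).hexdigest()},
    }
    send(
        "POST",
        f"{server}/api/datasets/:persistentId/add?persistentId={pid_q}",
        auth,
        {"jsonData": json.dumps(json_data)},
    )
    return json_data
```

Keeping the transport as a parameter makes the order of the calls easy to see and to test without a live server; a tool like DVUploader handles the same sequence for you.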
Probably more info than you wanted. I guess the bottom line is that direct upload is probably advanced enough to keep at least the API info separate in an advanced/Big Data related section somewhere. The main docs could probably say more about how you set it up and point to any tools that support the API rather than going into details.
Hope that helps,
-- Jim
Those two options are set per store: if you set both to true, then any datasets in Dataverse collections using that store will use both direct upload and direct download. Note the CORS setting needed on your S3 bucket to do direct upload (and to enable previewers to work with direct download), which is described in the Big Data section you linked to. Also note that you can set up two S3 stores using the same bucket, one with direct upload (and probably a higher size limit) and one without. That’s a way to limit who gets to do direct upload (and use the higher limit) without opening up big data for everyone.
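As a rough illustration of the two-stores-one-bucket idea, the Python sketch below only builds the JVM option strings for each store; it does not apply them. The option names (`type`, `label`, `bucket-name`, `upload-redirect`, `download-redirect`) are my recollection of the install guide and should be verified there. The store ids `s3` and `s3big` and the bucket name `my-bucket` are made up for the example.

```python
def store_jvm_options(store_id, bucket, direct_upload, direct_download):
    """Build JVM option strings for one S3 store (names assumed, see lead-in)."""
    prefix = f"-Ddataverse.files.{store_id}"

    def flag(on):
        return "true" if on else "false"

    return [
        f"{prefix}.type=s3",
        f"{prefix}.label={store_id}",
        f"{prefix}.bucket-name={bucket}",
        f"{prefix}.upload-redirect={flag(direct_upload)}",
        f"{prefix}.download-redirect={flag(direct_download)}",
    ]


# Two stores on the same (hypothetical) bucket: only "s3big" gets direct upload.
regular = store_jvm_options("s3", "my-bucket", False, True)
bigdata = store_jvm_options("s3big", "my-bucket", True, True)

for option in regular + bigdata:
    print(option)
```

Collections that should allow big uploads would then be assigned the `s3big` store, while everything else stays on `s3`.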
-- Jim
From: dataverse...@googlegroups.com [mailto:dataverse...@googlegroups.com]
On Behalf Of Sherry Lake
Sent: Friday, July 9, 2021 8:54 AM
To: dataverse...@googlegroups.com
Subject: Re: [Dataverse-Users] Different Documentation for S3 Direct Upload
Thanks Phil and Jim,