Dataverse upload methods

j-n-c

Dec 13, 2021, 12:03:59 PM
to Dataverse Big Data

Hi,

I have been testing the file upload methods in a Dataverse 5.8 installation. So far, I have been able to upload (.zip-compressed) files of up to 8 GB. Beyond that size, the upload fails consistently without any error message. Since I want users to be able to upload larger files, I have been looking for a way to do this (possibly by temporarily changing the dataset upload method).

In that regard, I have seen in the documentation that it should be possible to change the upload method for a dataset that already contains files: "If you need to switch upload methods for a dataset that already contains files, then please contact Support by clicking on the Support link at the top of the application". However, that link only opens a form to contact the Dataverse administrator. I have searched the docs but could not find the procedure for making this change.

Could anyone please point me to the relevant documentation or offer advice on best practices for uploading large files?

Best Regards,

José

James Myers

Dec 13, 2021, 12:56:24 PM
to j-n-c, Dataverse Big Data

José,

There’s some info in the big data support guide: https://guides.dataverse.org/en/latest/developers/big-data-support.html?highlight=big%20data

For the normal upload, there is no hard limit, but at some point timeout settings (how long upload connections can stay open) and/or disk space limitations can cause issues. Some of those issues just end the connection, so there is little information in the Dataverse server.log, but the developer console in the uploading browser usually has some information about what went wrong (and often which component timed out, etc.). There has been a lot of discussion over time on this list and the community mailing list; searching the archives might help.
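
As a rough illustration of why a timeout can look like a size limit: transfer time grows linearly with file size, so any fixed connection timeout translates into a maximum file size at a given bandwidth. A minimal sketch in Python follows; the 100 Mbit/s link and the 15-minute timeout are hypothetical numbers chosen for the arithmetic, not values taken from any Dataverse, proxy, or load balancer configuration.

```python
GB = 10**9  # decimal gigabyte, in bytes


def upload_seconds(size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Time to transfer size_bytes over a link of the given speed."""
    return size_bytes * 8 / bandwidth_bits_per_s


def max_size_bytes(timeout_s: float, bandwidth_bits_per_s: float) -> int:
    """Largest upload that finishes before a fixed connection timeout."""
    return int(timeout_s * bandwidth_bits_per_s / 8)


# Hypothetical values: 100 Mbit/s sustained, 15-minute connection timeout.
bandwidth = 100e6
timeout = 15 * 60

print(upload_seconds(8 * GB, bandwidth))         # 640.0 seconds for 8 GB
print(max_size_bytes(timeout, bandwidth) / GB)   # 11.25 GB fits in 15 minutes
```

If the size at which uploads start failing shifts with the uploader's connection speed, that points to a timeout somewhere in the chain rather than to a size limit.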

Aside from debugging normal uploads, the easiest way (IMO) to upload larger files is to enable direct uploads to S3. That requires that you use S3 rather than file storage, though. The big data support guide also lists some other options that don’t require S3. (FWIW: I suspect the message you quoted is there so that users make a request and admins can then switch their collection/dataset to a direct-upload S3 store or another method. The assumption is that the site admin has configured, or will configure, that or some other method but doesn’t open big data support to everyone all the time, which is a good idea to avoid surprise large uploads.)
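
For illustration, the admin-side switch for a collection could look roughly like the sketch below. It assumes the admin API endpoint `/api/admin/dataverse/{alias}/storageDriver` (PUT, with the store label as the request body) described in the Admin Guide, and a store that has already been configured for direct upload. The helper name `storage_driver_request`, the server URL, the collection alias, the store label, and the token are all placeholders; please check the guides for your Dataverse version before relying on any of it.

```python
import urllib.parse
import urllib.request


def storage_driver_request(server_url: str, collection_alias: str,
                           store_label: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that assigns a store to a collection."""
    alias = urllib.parse.quote(collection_alias, safe="")
    url = f"{server_url.rstrip('/')}/api/admin/dataverse/{alias}/storageDriver"
    return urllib.request.Request(
        url,
        data=store_label.encode("utf-8"),          # body is the store label
        headers={"X-Dataverse-key": api_token},
        method="PUT",
    )


# Placeholder values; the admin API is typically reachable only from localhost.
req = storage_driver_request("http://localhost:8080", "bigdata-collection",
                             "s3-direct", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a production installation.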

If S3 isn’t an option for you, I/we can probably help debug specific issues with the normal upload at your site, but we’d need more specifics. (Please also search the email archives for ‘timeout’, ‘big’, etc. to see if anything there helps.)

Lastly, I think S3 direct uploads have been around long enough that moving more/all of the big data guidance to the installation guide would make sense. FYI: There’s also ongoing work to add Globus support for big data uploads, which could/should be documented when it becomes available; that too should be in, or referenced from, the installation guide. As usual, suggestions/help with that would be welcome.

Hope that helps,

-- Jim
