S3 Direct Uploads and Max Upload Filesize


Sherry Lake

Nov 28, 2023, 9:55:32 AM
to Dataverse Users Community
Hello,

Since we have S3 direct uploads configured, should I increase our :MaxFileUploadSizeInBytes setting? It is currently set to 6 GB. According to this page, direct upload can handle more: https://guides.dataverse.org/en/latest/developers/big-data-support.html

Are there API upload commands or upload features that bypass the :MaxFileUploadSizeInBytes limit?

For those using S3 direct uploads, what is your max upload file size?

I can see a scenario where we keep our limit "small" but can also bypass that limit on a case-by-case basis.

Discussion, advice welcome.

Thanks,
Sherry Lake

Jim Myers

Nov 28, 2023, 10:29:20 AM
to Dataverse Users Community

Sherry,

You can limit the max size per store (see the JSON example at https://guides.dataverse.org/en/latest/installation/config.html?highlight=maxfileuploadsizeinbytes#maxfileuploadsizeinbytes), and then assign stores per collection or per dataset so that different projects/groups get different limits. Harvard is doing this, with stores pointing to the same bucket (not sure what their max is). QDR sets 2GB, with a higher max (~20GB) on a store that uses a cheaper S3 option (Storj).
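
As a rough illustration of the per-store form, the sketch below builds such a JSON object in Python. The store ids "default" and "s3cheap" and the sizes are made up for this example, and the string-valued shape is assumed from the guide's example, so check it against the linked page before applying anything.

```python
import json

GIB = 1024 ** 3  # bytes in one gibibyte


def build_limit_payload(limits_bytes):
    """Serialize {store id: max bytes} as a JSON object with string values.

    The string-valued shape is an assumption based on the guide's example.
    """
    return json.dumps({store: str(size) for store, size in limits_bytes.items()})


# Hypothetical store ids and limits, chosen only for illustration.
payload = build_limit_payload({"default": 2 * GIB, "s3cheap": 20 * GIB})
print(payload)

# The resulting string would be sent as the body of a PUT to the
# :MaxFileUploadSizeInBytes database setting (assumed here to go through
# the admin settings API, as for other database settings).
```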

That said, right now the max file size is a somewhat crude way to limit overall dataset size, and Leonid is working on actual quotas. With those in place, a better approach would be to set the max file size to something very large, limited only by the max allowed in the store (AWS is 5 TB, I think) and/or by reasonable upload times for your users (given their average bandwidth), and to use quotas. That would reduce the need for stores with different max sizes.
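
To get a feel for what "reasonable upload times" means, here is a back-of-the-envelope estimate in Python. It assumes a sustained rate with no protocol overhead or retries, and the 100 Mbit/s figure is only an example, not a measurement of anyone's users.

```python
def upload_hours(size_bytes, mbps):
    """Hours needed to transfer size_bytes at a sustained rate of mbps megabits/s."""
    if mbps <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_bytes * 8 / (mbps * 1_000_000)
    return seconds / 3600


# Example sizes in decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
for label, size in [("20 GB", 20 * 10**9), ("5 TB", 5 * 10**12)]:
    print(f"{label} at 100 Mbit/s: about {upload_hours(size, 100):.1f} hours")
```

At that example rate a 20 GB file takes under half an hour, while a 5 TB file takes several days, so the practical ceiling is likely to come from bandwidth well before the store's own maximum.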


-- Jim

Philip Durbin

Nov 29, 2023, 3:35:46 PM
to dataverse...@googlegroups.com
Here's the pull request Leonid is working on for per-collection storage quotas: https://github.com/IQSS/dataverse/pull/10144

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/56cb8c49-609d-413c-849e-1694f3f51780n%40googlegroups.com.



Sherry Lake

Feb 23, 2024, 8:30:41 AM
to Dataverse Users Community
Going back to this discussion about max file sizes.

I'm experimenting on our test server, v5.14 (using S3 with direct upload enabled).

This is our setting: ":MaxFileUploadSizeInBytes": "2147483648",
and when uploading a larger file via the UI, I get this warning:
(attachment: Screenshot 2024-02-23 at 7.38.25 AM.png)

But I was able to upload this 6.2 GB file using DVUploader with this command:
java -jar DVUploader-v1.2.0beta3.jar -key=$API_TOKEN -did=doi:10.80100/FK2/LJMVKJ -server=https://dataversedev.internal.lib.virginia.edu rf-model-large.joblib

How do I set limits on DVUploader or on the API "add"? (The API "add" didn't work, due to space and timeout errors in the log.)
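
Whatever the server enforces, one client-side stopgap would be to check file sizes before handing them to DVUploader. The sketch below is a hypothetical wrapper step in Python, not a DVUploader feature; the limit is the setting value quoted above, and it does nothing to stop someone who calls the uploader directly.

```python
import os

# Value of :MaxFileUploadSizeInBytes quoted above (2 GiB).
LIMIT_BYTES = 2147483648


def files_over_limit(paths, limit_bytes=LIMIT_BYTES):
    """Return the paths whose on-disk size is larger than limit_bytes."""
    return [p for p in paths if os.path.getsize(p) > limit_bytes]


# Usage idea: run files_over_limit(["rf-model-large.joblib"]) first and
# only invoke the uploader if the returned list is empty.
```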

Thanks,
Sherry

Philip Durbin

Feb 23, 2024, 11:21:05 AM
to dataverse...@googlegroups.com
Hi Sherry,

Good catch. Please feel free to open an issue about this. At the very least, we'd be happy to investigate and confirm that the latest release isn't affected. (We're pretty sure it isn't, judging from the code, which was refactored after 5.14.)

Thanks,

Phil




Sherry Lake

Feb 23, 2024, 11:38:42 AM
to dataverse...@googlegroups.com
Hi Phil,

So you are saying that the maxfilesize setting is checked for DVUploader uploads in DV versions after 5.14?

Then will we need to set a maxfilesize for S3 (command-line direct upload) that is different from the UI? Or is the maxfilesize just on S3?

I thought the UI created other bottlenecks for uploads of "large" files, but maybe not if direct upload is set?

Hmm... I think I will need a whiteboard, and you and Jim at the DV Community Meeting, to talk me through the scenarios.

Thanks,
Sherry
