Setting up Virtual S3 Buckets

Sherry Lake

May 30, 2025, 12:41:45 PM
to Dataverse Users Community
UVa wants to create a "virtual" S3 bucket... or an alias??

We only have one S3 bucket, but we want to allow users of one collection to upload files up to 100 GB, while keeping the default 6 GB limit for all other collections.

I'm not sure which JVM options are needed to define this virtual pool.

And would CORS need to be set up on the "virtual" bucket, or since it is already set up on the real bucket, will the virtual one use it?

Thanks,
Sherry Lake
University of Virginia

James Myers

May 30, 2025, 12:53:20 PM
to dataverse...@googlegroups.com

Sherry,

You shouldn’t need to do anything to the bucket definition – just create another S3 store (different id, different label) in Dataverse pointing to the same bucket and having a different max upload size.
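As a sketch, a second store reusing the same physical bucket could be defined with JVM options like the following. The store id "big", the label, and the bucket name here are placeholders, not real values; any other dataverse.files.<id>.* options the existing store defines would need to be duplicated under the new id as well.

```shell
# Sketch only: a second S3 store that points at the same physical bucket.
# The store id "big", its label, and the bucket name are placeholders --
# the bucket name must match the bucket the existing store already uses.
./asadmin create-jvm-options "-Ddataverse.files.big.type=s3"
./asadmin create-jvm-options "-Ddataverse.files.big.label=Big Uploads"
./asadmin create-jvm-options "-Ddataverse.files.big.bucket-name=your-existing-bucket"
```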

With that second store, admins can use the UI (Dataverse/Edit/GeneralInfo) to select a different store or can use the API to change the store just for a single dataset. When this is done, new files go to the new store (which is the same bucket as before) – no existing files are affected.
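For the API route, I believe there is a superuser admin endpoint for pointing a collection at a store. A rough sketch, where the collection alias "mycoll" and the store label "Big Uploads" are made-up values and the exact payload may vary by version (check the Admin API section of the guides):

```shell
# Sketch: assign a storage driver to a collection (requires a superuser token).
# "mycoll" and "Big Uploads" are made-up example values.
curl -X PUT -H "X-Dataverse-key: $API_TOKEN" \
  -d "Big Uploads" \
  "$SERVER_URL/api/admin/dataverse/mycoll/storageDriver"

# Inspect the current assignment:
curl -H "X-Dataverse-key: $API_TOKEN" \
  "$SERVER_URL/api/admin/dataverse/mycoll/storageDriver"
```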

-- Jim

Sherry Lake

Jun 17, 2025, 4:40:53 PM
to Dataverse Users Community
I'm still having trouble. I can't figure out which JVM storage lines to add. Can someone show me how Harvard Dataverse has set up virtual storage "pools"? Maybe just the output of the files-related JVM settings?

The output of this command:
bin/asadmin list-jvm-options | grep files

I'm really having trouble figuring out which JVM lines to use for the storage alias. I tried just two, "type" and "label", and got things really messed up. Then I added a third defining the bucket name (same as our S3 bucket), which made our system happy again... but I'm still getting errors about the storage and AWS; see the error from the log file at the end.

I added these three commands; it seems I am missing a few options (see the error message below):
./asadmin create-jvm-options "-Ddataverse.files.3d.type=s3"

./asadmin create-jvm-options "-Ddataverse.files.3d.label=3D Cultural Heritage"

./asadmin create-jvm-options "-Ddataverse.files.3d.bucket-name=dataverse-storage-production"

[2025-06-17T20:18:39.212+0000] [Payara 6.2025.2] [WARNING] [] [edu.harvard.iq.dataverse.ThumbnailServiceWrapper] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1750191519212] [levelValue: 900] [[
getDatasetCardImageAsUrl(): Failed to initialize dataset StorageIO for 3d://10.80100/FK2/EUJETR (S3AccessIO: Failed to cache auxilary object : dataset_logo_original)]]

James Myers

Jun 20, 2025, 11:37:49 AM
to dataverse...@googlegroups.com

Sherry,

Sorry for the delay – hopefully you’ve solved this by now.

You need to duplicate all the entries you have for the other store, i.e. if your main store is called "regular" and you have jvm options like "-Ddataverse.files.regular.*=<value>", you can do your grep:

bin/asadmin list-jvm-options | grep regular

and just create new jvm options that mirror all your entries – except for the label, which should still be unique. (There are some, like dataverse.files.<id>.ingestsizelimit, that you might also want to differ between the stores.)

The list of possible values for these jvm options (hopefully complete) is at https://guides.dataverse.org/en/latest/installation/config.html#list-of-s3-storage-options. The ones related to profile and custom-endpoint, if missing, might result in an error like the one you saw.
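In other words, the fix is likely to copy whatever profile/endpoint-style options the working store has over to the new id. A sketch, assuming the existing store id is "regular", the new one is "3d", and the option values shown are placeholders (note that asadmin needs colons escaped as \: inside option values):

```shell
# See everything the working store defines (store id "regular" assumed):
bin/asadmin list-jvm-options | grep files.regular

# Mirror each entry under the new id "3d" -- the values below are placeholders
# standing in for whatever your existing store actually uses:
./asadmin create-jvm-options "-Ddataverse.files.3d.profile=default"
./asadmin create-jvm-options "-Ddataverse.files.3d.custom-endpoint-url=https\://s3.example.org"
./asadmin create-jvm-options "-Ddataverse.files.3d.download-redirect=true"
```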

The change in the max file upload size per store is all handled by the :MaxFileUploadSizeInBytes setting which can be configured to have different limits per store. Harvard has something like

{
  "s3m": "5000000000",
  "s3l": "10000000000",
  "s3xl": "25000000000",
  "s3xxl": "50000000000"
}

for that setting (there are some others – this is a subset).
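Applied to UVa's case, the setting could be sketched like this. The store id "3d" is an assumption, values are in bytes, and as I understand it the "default" key covers stores without their own entry; check the :MaxFileUploadSizeInBytes section of the guides for the exact shape on your version.

```shell
# Sketch: roughly 6 GB for stores without their own entry,
# roughly 100 GB for the new "3d" store (values in bytes).
curl -X PUT -d '{"default":"6000000000","3d":"100000000000"}' \
  "http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes"
```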
