merging s3 buckets

Jamie Jamison

Jan 26, 2026, 2:28:28 PM
to Dataverse Users Community
I've also asked this on zulip so apologies in advance for the cross-posting.

Our UCLA Dataverse has multiple S3 buckets, dating back to when regular and direct-upload storage needed to be separate.

Has anyone had experience merging their s3 buckets together?  

Thank you in advance,

Jamie/UCLA

James Myers

Jan 26, 2026, 5:34:58 PM
to dataverse...@googlegroups.com

Jamie,

I haven’t merged buckets, but I have moved content from one store to another. There isn’t any automated way to do it, but it is relatively straightforward since all of the file/s3 stores lay out content the same way. Basically you need to copy the content from one store to the other, keeping the same path/file names, and then edit the database to change the storageidentifier to match the new location. Because all of the data paths include the PID, e.g. a dataset’s files will be in /10.5072/FK2/ABCDEF, there won’t be any collisions with files for other datasets. With s3, the AWS client has a sync command that can be used to move content, and the change in the db is a simple query: storageidentifiers for datasets prepend the store id to the PID, e.g. s31://10.5072/FK2/ABCDEF, and the ones for files look like the store id and bucket followed by a number, e.g. s31://test-bucket:<number> . In both cases you just replace the old prefix with the one for the new store/bucket.
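
As a rough illustration of that prefix swap, here is a minimal Python sketch. It only shows the string rewrite, not the database update itself. The store ids (s31, s3), the bucket names (old-bucket, new-bucket) and the sample identifiers are hypothetical placeholders, and the two identifier shapes are assumed to be exactly the ones described above, so check them against your own storageidentifier values before relying on it.

```python
# Hypothetical values: substitute the store ids and bucket names from your
# own installation. None of these come from a real configuration.
OLD_STORE = "s31"
NEW_STORE = "s3"
OLD_BUCKET = "old-bucket"
NEW_BUCKET = "new-bucket"


def swap_prefix(identifier: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix only when it starts the identifier."""
    if identifier.startswith(old_prefix):
        return new_prefix + identifier[len(old_prefix):]
    return identifier


def migrate_identifier(identifier: str) -> str:
    """Rewrite one storageidentifier from the old store/bucket to the new one.

    Assumed shapes (taken from the description above):
      file:    <store id>://<bucket>:<number>
      dataset: <store id>://<PID>
    Identifiers belonging to any other store are returned unchanged.
    """
    old_file_prefix = f"{OLD_STORE}://{OLD_BUCKET}:"
    new_file_prefix = f"{NEW_STORE}://{NEW_BUCKET}:"
    # Check the more specific file prefix first so the bucket name is swapped too.
    if identifier.startswith(old_file_prefix):
        return swap_prefix(identifier, old_file_prefix, new_file_prefix)
    # Otherwise treat it as a dataset identifier and swap only the store id.
    return swap_prefix(identifier, f"{OLD_STORE}://", f"{NEW_STORE}://")


if __name__ == "__main__":
    samples = [
        "s31://old-bucket:1234567890",  # hypothetical file identifier
        "s31://10.5072/FK2/ABCDEF",     # dataset identifier
        "file://10.5072/FK2/XYZ123",    # different store, left alone
    ]
    for sample in samples:
        print(sample, "->", migrate_identifier(sample))
```

Anchoring the match to the start of the string avoids touching an identifier that merely contains the old store id or bucket name somewhere in the middle. Running the same rewrite over a dump of the existing identifiers first is a cheap way to preview the change before editing the database.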

 

Hope that helps,

-- Jim


Jamie Jamison

Mar 25, 2026, 8:11:08 PM
to Dataverse Users Community
This has also been posted to zulip.

An update of sorts: I'm merging the files in s3:dataverse-files-direct-upload into s3:dataverse-files. Four datasets are involved. We are still on Dataverse 5.14.

1) copied the files from the direct-upload bucket to the dataverse-files bucket (copied, so the files still exist in both the direct-upload bucket and the dataverse-files bucket)

2) updated postgresql

3) reindexed the datasets

Now the datasets in https://dataverse.ucla.edu/dataverse/textmining are giving 500 errors.

I've checked the database (no datasets with 'direct-upload' remain) and Solr, so it looks like they are reindexed.

Could it be a problem that the files exist in two locations even though the database points to the new location? Put another way, should they be moved rather than copied to the new location?
