** Cross-posted from Zulip, in the hope of getting more visibility for these questions **
Hello,
We have a few collections we'd like to migrate into Dataverse where the files are already in an S3 bucket and curated by another application. Ideally we wouldn't have to move the files, since in theory they could be accessed from where they already are, and they already have Handles pointing to them there (not that we couldn't change those pointers, I think). We'd like to just give Dataverse access to this other bucket, in addition to its other datastores.
I know it'd be straightforward, via the native API, to move the *metadata* into Dataverse. For the files, if we didn't want to migrate them, would we essentially be following the process for moving a large dataset? Roughly:
0) ensure that the second S3 bucket is configured to be accessible to Dataverse (see the config sketch after this list)
1) have the metadata migration create placeholder files for the datasets (API sketch below)
2) have a script that manipulates the Dataverse database so each file record points to the right S3 bucket and location within it (DB sketch below). (This would be more than just replacing a placeholder, as the files wouldn't be where the placeholder was set.)
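For step 0, I'm assuming this is the "multiple stores" configuration from the Installation Guide, i.e. JVM options along these lines, where the store id "curated" and the bucket name are placeholders for ours (there are further endpoint/credential settings to check in the guide):

```
-Ddataverse.files.curated.type=s3
-Ddataverse.files.curated.label=curated
-Ddataverse.files.curated.bucket-name=our-curated-bucket
```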
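For the metadata migration (and the placeholder datasets in step 1), I'm picturing the native API's create-dataset call, something like this Python sketch, where the server URL, token, and collection alias are placeholders and dataset.json holds metadata exported from our curation tool in the native API's JSON format:

```python
import requests

SERVER = "https://dataverse.example.edu"  # placeholder
API_TOKEN = "xxxx-xxxx-xxxx"              # placeholder
COLLECTION = "ourcollection"              # placeholder collection alias

# dataset.json: dataset metadata in the native API's JSON format
with open("dataset.json", "rb") as f:
    resp = requests.post(
        f"{SERVER}/api/dataverses/{COLLECTION}/datasets",
        headers={"X-Dataverse-key": API_TOKEN,
                 "Content-Type": "application/json"},
        data=f,
    )
resp.raise_for_status()
print(resp.json()["data"]["persistentId"])  # PID of the new dataset
```

(I gather there's also a /datasets/:import variant of this call for keeping an existing dataset-level PID, but our Handles are on the files, not the datasets.)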
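And for step 2, the part we're least sure about, something roughly like the sketch below. My understanding is that file locations live in dvobject.storageidentifier, and that for S3 stores the value looks something like storeId://bucket:objectKey, but we'd verify the exact format against a test file uploaded through Dataverse into the new store before touching anything. All names here (database, user, store id, bucket, keys, ids) are placeholders:

```python
import psycopg2  # assuming the standard PostgreSQL backend

# Dataverse datafile id -> object key in the curated bucket,
# built from the curation tool's records (placeholder values)
mapping = {
    1234: "collectionA/file1.nc",
    1235: "collectionA/file2.nc",
}

conn = psycopg2.connect(dbname="dvndb", user="dvnapp")
with conn, conn.cursor() as cur:  # commits on success, rolls back on error
    for datafile_id, key in mapping.items():
        cur.execute(
            "UPDATE dvobject SET storageidentifier = %s "
            "WHERE id = %s AND dtype = 'DataFile'",
            (f"curated://our-curated-bucket:{key}", datafile_id),
        )
conn.close()
```

(Presumably the filesize and checksum columns on the datafile rows would also need to reflect the real files rather than the placeholders.)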
Would this work?
There are a few unknowns for us:
1. Can Dataverse link to multiple S3 buckets? Yes; Phil Durbin already confirmed this.
2. Is manipulating the database the only way to make the connection from the datasets to the files in S3?
Note: As mentioned, we do have Handles that point directly to the files in the bucket, and one thought we've had is to just use those as links to the data in the Dataverse record.
(I don't think OAI-PMH harvesting would be enough for this collection, because the datasets wouldn't technically be hosted elsewhere for Dataverse to point to. The goal here is to keep the data in one place and have both the curation tool and the public-access website (Dataverse) access it from there.)
I'm still very new to Dataverse, so there might be other options I'm missing. Would love to hear some perspectives on this.
Best,
Bethany