How to reference multiple S3 buckets in Dataverse


Michel Bamouni

Feb 21, 2019, 7:01:31 AM
to Dataverse Users Community
Hi,

In my Dataverse installation (4.10.1), I use a custom S3 service to store the files that users upload. To set this up, I followed the Dataverse documentation, which says to specify the S3 bucket name.
This works fine. Let's call "My bucket" the bucket that Dataverse reads uploaded files from and writes them to. I also have other buckets (e.g. bucket 2, bucket 3) on my S3, and these buckets already contain data.

So I would like your opinions on the best way to reference the other buckets (in my example, bucket 2 and bucket 3) from my Dataverse.

The attached picture summarizes my problem.

Best regards,

Michel

Crosas, Mercè

Feb 21, 2019, 8:01:37 AM
to dataverse...@googlegroups.com
I'll let others reply, but great picture, Michel!

Merce

--
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University

danny...@g.harvard.edu

Feb 22, 2019, 2:28:45 PM
to Dataverse Users Community
Hey Michel, as we mention in the docs, we at IQSS are most familiar with AWS S3 and we can't provide good support for custom S3 configurations. Any information that other community members can share would be a helpful learning experience for all of us. :)

- Danny

o.be...@fz-juelich.de

Feb 25, 2019, 9:45:25 AM
to Dataverse Users Community
Hey Michel,

I just took another look at the S3AccessIO.java class. As far as I can see, you won't be able to use different bucket names at the same time, unless you fiddle around with database entries.

The bucket name is part of the storage identifier saved for every dvObject in the database. Without having tested this, you should be able to configure a new bucket name for new files and still retrieve data from the old bucket, as long as you don't update the DB records. (IIRC there should be some scripts somewhere in the codebase for these tasks.) Be aware that Dataverse expects a certain structure for the keys, which is not explained in detail in the docs AFAIK. (Maybe open an issue for that?)
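
To illustrate what "the bucket name is part of the storage identifier" means, here is a minimal Python sketch. It assumes the identifier has the form `s3://<bucket>:<key>`; that layout, the helper name `split_storage_identifier`, and the sample value are assumptions for illustration only, so please check the actual values in your own database before relying on it.

```python
from typing import Tuple


def split_storage_identifier(storage_identifier: str) -> Tuple[str, str]:
    """Split an identifier of the ASSUMED form 's3://<bucket>:<key>'.

    The layout is an assumption, not something confirmed by the docs;
    verify it against real storage identifiers in your installation.
    """
    prefix = "s3://"
    if not storage_identifier.startswith(prefix):
        raise ValueError("not an S3 storage identifier: %r" % storage_identifier)
    # Everything up to the first ':' after the prefix is taken as the bucket.
    bucket, sep, key = storage_identifier[len(prefix):].partition(":")
    if not sep or not bucket or not key:
        raise ValueError("expected 's3://<bucket>:<key>': %r" % storage_identifier)
    return bucket, key


# Hypothetical sample value, made up for this example.
bucket, key = split_storage_identifier("s3://my-bucket:16907b1c2d3-abcdef012345")
print(bucket, key)
```

If the assumption holds, each file record carries its own bucket name, which is why old files could stay readable after the configured bucket changes.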

What are you actually trying to do with the data? If you need the buckets choosable from the UI, that will be tough work. Updating or tampering with this via some clean API would be IMHO easier and cleaner. But it really depends on what you want to achieve...
Maybe you could elaborate a bit more on your use case and open an issue on GitHub?

Cheers,
Oliver

Michel Bamouni

Mar 11, 2019, 10:31:56 AM
to Dataverse Users Community
Hi Oliver,

First of all, thanks for your answer.

To answer your "What are you actually trying to do with the data?":
I don't need the ability to choose the bucket name in Dataverse. I just want to describe the data contained in my S3 buckets in Dataverse using metadata, plus some kind of link (or something else) to access the S3 data from Dataverse. My first idea was to add the link to S3 in the Dataverse "Alternative URL" field.

I came here to find out whether there is another solution.
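
For illustration, the kind of link I have in mind could be built roughly as below. This is only a sketch assuming path-style addressing (`<endpoint>/<bucket>/<key>`) on a custom S3 service; the function name `s3_object_url`, the endpoint, the bucket, and the key are all hypothetical, and such a link only works for a reader if the object is actually accessible to them.

```python
from urllib.parse import quote


def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for an object on an S3-compatible endpoint.

    Assumes path-style addressing; some services use virtual-hosted
    style (bucket in the host name) instead.
    """
    # quote() keeps '/' by default, so key "folders" stay readable.
    return "%s/%s/%s" % (endpoint.rstrip("/"), quote(bucket), quote(key))


# Hypothetical endpoint, bucket, and key, made up for this example.
url = s3_object_url("https://s3.example.org/", "bucket2", "survey/2018 data.csv")
print(url)
```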

Best regards,