Using alternative S3 storage locations


Deirdre Kirmis

Jul 7, 2020, 12:40:16 PM
to dataverse...@googlegroups.com

Hi all … another question this week!

 

I am trying to set up a “compatible” S3 storage location for Dataverse, pointing to an S3 bucket on a Wasabi account.

Wasabi is supposedly 100% AWS S3 compatible …

 

I have created the bucket, created the .aws config files with my Wasabi keys and region, etc., and set the following JVM options:

 

./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"

./asadmin create-jvm-options "-Ddataverse.files.s3.label=s3"

./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<mybucketname>"

 

Restarted the GlassFish server …

 

But I can’t upload a file … I get the error “Failed to add files to the dataset …”

 

I know that the instructions say that any aws CLI commands should also include the Wasabi region endpoint …

I am able to list buckets, list the contents of my bucket, etc. using that format (i.e. aws s3 ls --endpoint-url=https://s3.us-west-1.wasabisys.com).
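
(A quick write test in the same style would be one way to rule out bucket permissions; the bucket name below is a placeholder:)

# hypothetical write probe against the same Wasabi endpoint
echo test > /tmp/probe.txt
aws s3 cp /tmp/probe.txt s3://<mybucketname>/probe.txt --endpoint-url=https://s3.us-west-1.wasabisys.com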

 

Any ideas what other configuration is needed to make Dataverse see the bucket?

 

Night Owl

 

James Myers

Jul 7, 2020, 12:55:31 PM
to dataverse...@googlegroups.com

You’ll need to set

dataverse.files.s3.custom-endpoint-url=https://s3.us-west-1.wasabisys.com

 

and possibly

 

dataverse.files.s3.custom-endpoint-region=

dataverse.files.s3.path-style-access=true

dataverse.files.s3.payload-signing=true

dataverse.files.s3.chunked-encoding=true

 

where I’ve used ‘s3’ as the <id> of the store you’re configuring, as in your examples below.
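
In create-jvm-options form that would look something like the sketch below (the region value is my guess from your endpoint, so adjust as needed):

# colons inside a value must be escaped as \: for create-jvm-options
./asadmin create-jvm-options "-Ddataverse.files.s3.custom-endpoint-url=https\://s3.us-west-1.wasabisys.com"
./asadmin create-jvm-options "-Ddataverse.files.s3.custom-endpoint-region=us-west-1"
./asadmin create-jvm-options "-Ddataverse.files.s3.path-style-access=true"
./asadmin create-jvm-options "-Ddataverse.files.s3.payload-signing=true"
./asadmin create-jvm-options "-Ddataverse.files.s3.chunked-encoding=true"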

 

(FYI: the guides should have a table of all the possible S3 options, but I don’t see it in the latest release, so here’s a direct link to the latest pre-release version on GitHub: https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/installation/config.rst#s3-storage-options)

 

-- Jim


Deirdre Kirmis

Jul 15, 2020, 11:04:52 AM
to dataverse...@googlegroups.com

I got the primary storage working with Wasabi using the JVM options mentioned here (thank you!) … but now I am trying to configure AWS S3 as a secondary store. On the DV site, I can choose it in the General settings for a particular dataverse, but when I try to upload a file to a dataset in that dataverse, I get the error “Failed to add files to dataset”. In the logs, there is an error message that looks as though the prefix s3aws:// (the id of the AWS storage location) is being added to the file name, so I must have an option wrong somewhere. Here are my JVM options related to storage:

 

Wasabi storage:

-Ddataverse.files.storage-driver-id=s3

-Ddataverse.files.s3.type=s3

-Ddataverse.files.s3.label=s3

-Ddataverse.files.s3.bucket-name=bucket1

-Ddataverse.files.s3.custom-endpoint-url=s3.us-west-1.wasabisys.com

-Ddataverse.files.s3.custom-endpoint-region=us-west-1

-Ddataverse.files.s3.path-style-access=true

-Ddataverse.files.s3.payload-signing=true

-Ddataverse.files.s3.chunked-encoding=true

 

AWS storage:

-Ddataverse.files.s3aws.type=s3

-Ddataverse.files.s3aws.profile=aws

-Ddataverse.files.s3aws.label=s3aws

-Ddataverse.files.s3aws.bucket-name=bucket2
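
(To double-check what actually got registered, the standard asadmin listing command can be filtered for these, e.g.:)

# show every storage-related JVM option currently set
./asadmin list-jvm-options | grep dataverse.files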

 

Night Owl

Deirdre Kirmis

Jul 15, 2020, 11:06:43 AM
to dataverse...@googlegroups.com

Oops, sorry, I accidentally hit “send” before finishing that … any ideas what I have wrong in my JVM options (or otherwise) to get a secondary storage location configured correctly?

James Myers

Jul 15, 2020, 11:11:48 AM
to dataverse...@googlegroups.com

Nothing obviously wrong that I can see. One possibility is a profile issue, e.g. the [aws] profile isn’t in your credentials file, is incorrect, or the ~/.aws/credentials file doesn’t exist for the unix user running GlassFish, etc.
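
For reference, a minimal ~/.aws/credentials with that profile would look something like this (key values are placeholders):

[aws]
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>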

 

If that doesn’t suggest a fix, it would be helpful to see more of the log messages.

Daniel Marques

Aug 5, 2020, 8:48:44 PM
to Dataverse Users Community
Hi Jim!

Sorry to revive this thread. I am using the Ceph storage solution and am trying to configure it in my Dataverse instance, but I am having some issues. This is the error that I am receiving:
Failed to save the file, storage id 173c11e84c9-5091f1aa2c74 (ERROR: S3AccessIO - Failed to look up bucket bucket-02 (is AWS properly configured?))

I don't know what is causing this error, since the AWS credentials seem to be properly created and the awscli seems to be working fine.

Below is the configuration that I am using in my JVM.

-Ddataverse.files.s3.type=s3
-Ddataverse.files.s3.label=s3
-Ddataverse.files.s3.custom-endpoint-url=https://gti-rec-ceph-gw.rnp.br
-Ddataverse.files.storage-driver-id=s3
-Ddataverse.files.s3.payload-signing=true
-Ddataverse.files.s3.bucket-name=bucket-02
-Ddataverse.files.s3.chunked-encoding=false
-Ddataverse.files.s3.profile=glassfish

AWS credentials path: /home/glassfish/.aws

AWS credentials file:

[glassfish]
aws_access_key_id = REDACTED
aws_secret_access_key = REDACTED

AWS config file:

[profile glassfish]

Do you know how I can debug this? For instance, how do I verify that the JVM is accessing the credentials file?

Thanks for the help.

Regards,
Daniel Marques


James Myers

Aug 6, 2020, 10:43:03 AM
to dataverse...@googlegroups.com

Daniel,

One quick sanity check – is glassfish really running as the same user you run the awscli as (the glassfish unix user), and with the glassfish profile?

 

Another thing to check would be the S3 settings. I don’t know what Ceph requires w.r.t. payload signing and chunked encoding, and there are also settings for path-style access and a custom region (see the table in https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/installation/config.rst#amazon-s3-storage-or-compatible). (If you find that setting changes are needed for Ceph, that would be good to add to the guides with a pull request.)

 

Beyond that, I’m not sure what to suggest – perhaps upping the log level for the AWS library (setting the level for its classes to FINE, as in http://guides.dataverse.org/en/latest/developers/debugging.html). The “ERROR: S3AccessIO” message is just repeating the message from the exception the AWS library throws, so that library is the only place to get more info about the problem.
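
For example, with the standard asadmin logging command (assuming the SDK classes live under com.amazonaws, the v1 Java SDK package):

# raise the AWS SDK classes to FINE so the underlying S3 error gets logged
./asadmin set-log-levels com.amazonaws=FINE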

 

Hope that helps!


Daniel Marques

Aug 7, 2020, 2:42:14 PM
to Dataverse Users Community
Hi Jim!

Thanks for your clues! I managed to properly configure my Dataverse instance with Ceph.

My real problem was that the bucket name was incorrect. After fixing that, I tried the settings and found out which ones were right for me.

Below is the configuration that I used:
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.label=s3</jvm-options>
<jvm-options>-Ddataverse.files.storage-driver-id=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3.chunked-encoding=false</jvm-options>
<jvm-options>-Ddataverse.files.s3.payload-signing=false</jvm-options>
<jvm-options>-Ddataverse.files.s3.custom-endpoint-region=</jvm-options>
<jvm-options>-Ddataverse.files.s3.custom-endpoint-url=https://gti-rec-ceph-gw.rnp.br</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=bucket-02</jvm-options>
<jvm-options>-Ddataverse.files.s3.path-style-access=true</jvm-options>

For my configuration, I decided to disable the chunked-encoding and payload-signing parameters, but found that it was necessary to enable path-style-access. In addition, I decided to use a simpler version of the AWS credentials file, so I am only using the default profile.
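
In other words, the credentials file now just contains the default profile (keys redacted as before):

[default]
aws_access_key_id = REDACTED
aws_secret_access_key = REDACTED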

I hope this can help others with their configurations.

Once again, thanks for the help.

Regards,
Daniel


