Cannot use minio for deep storage


Arda Savran

Apr 12, 2019, 8:54:47 AM
to Druid User
Hello folks:

I built this lab system with a single node hosting all the Druid components plus MinIO, for testing. Tasks seem to be working fine, but the hand-offs are failing at the end of the day: I am not seeing any segments in my MinIO.

I checked the indexing logs for the tasks and I keep seeing this message: 2019-04-12T02:05:03,185 INFO [forking-task-runner-4] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. 

I am not sure why it is complaining about a key. I configured my common.runtime.properties as follows:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXX

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=/var/druid/indexing-logs

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid
druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info

# Storage type of double columns
# omitting this will lead to doubles being indexed as floats at the storage layer

druid.indexing.doubleStorage=double

I also created the jets3t.properties file under the same /usr/local/share/druid/conf/druid/_common folder with the following content:

s3service.s3-endpoint=collector1.XXXXXX.com
s3service.s3-endpoint-http-port=9000
s3service.https-only=false
s3service.disable-dns-buckets=true

The documentation says I need to add jets3t.properties to my Java classpath. Is that the missing piece here? How can I do that?

Thanks

Gian Merlino

Apr 12, 2019, 2:34:45 PM
to druid...@googlegroups.com
Hey Arda,

The jets3t properties might indeed be the missing piece. It should go in your _common config directory, next to common.runtime.properties.
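
With the stock conf layout, that would be something like this (paths are the defaults from the distribution; adjust for your install):

conf/druid/_common/
    common.runtime.properties
    jets3t.properties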

Gian



Arda Savran

Apr 13, 2019, 10:27:27 AM
to druid...@googlegroups.com
Thanks Gian.

I already have all those settings, but how can I add jets3t.properties to my classpath? Is there a special setting for that?

Arda

Gian Merlino

Apr 13, 2019, 12:43:02 PM
to druid...@googlegroups.com
Oh, actually, I should have thought about this for more than 15 seconds :)

Starting in Druid 0.13.0 we don't use jets3t anymore -- we use the aws-java-sdk. If you're on that version or newer, put the settings in your normal Druid properties files instead, using the properties documented here: http://druid.io/docs/latest/development/extensions-core/s3.html. For example, druid.s3.protocol=http replaces s3service.https-only=false, and druid.s3.enablePathStyleAccess=true replaces s3service.disable-dns-buckets=true.
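
Putting those together, a minimal sketch of the deep-storage block for a MinIO endpoint on 0.13+ would be something like the following (the bucket name, keys, host, and port are placeholders, and it assumes druid-s3-extensions is in your druid.extensions.loadList):

# placeholders -- substitute your own MinIO bucket, credentials, and endpoint
druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=YOUR_MINIO_ACCESS_KEY
druid.s3.secretKey=YOUR_MINIO_SECRET_KEY
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.url=your-minio-host:9000
druid.s3.endpoint.signingRegion=us-east-1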

Gian


Arda Savran

Apr 16, 2019, 7:07:26 PM
to druid...@googlegroups.com
Still no luck. My tasks are still failing at the end of the day, and I can only pull the real-time data from my Druid.

I changed my S3 configuration under _common as follows:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXX
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.signingRegion=us-east-1
druid.s3.endpoint.url=collector1.abc.com:9000

I reviewed the attached log from a task that failed but couldn't find any clues. Am I missing something?
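
For what it's worth, this is how I have been checking whether any segments land in the bucket, using the MinIO client with an alias I named myminio (set up earlier with mc config host add):

mc ls myminio/druid/druid/segments

Nothing shows up under that prefix.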

Thanks

log.zip

Arda Savran

Apr 22, 2019, 10:28:04 AM
to druid...@googlegroups.com
I noticed that my middleManager is not able to connect to MinIO and is getting "Connection Refused". I am able to connect to MinIO over http://IP:9000. Has anyone had the same issue before?

I am using Druid 0.13, and the following is my _common for deep storage:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.signingRegion=us-east-1
druid.s3.endpoint.url=collector1.companyabc.com:9000

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=/var/druid/indexing-logs

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid
druid.indexer.logs.s3Prefix=druid/indexing-logs

I confirmed my credentials for MinIO. One more thing I noticed: I can access the UI remotely with my browser, but when I enter "curl http://{local minio IP}:9000" I get an access denied.

Any ideas?

Thanks
connection refused.txt