Cannot use minio for deep storage


Arda Savran

Apr 12, 2019, 8:54:47 AM
to Druid User
Hello folks:

I built this lab system with a single node hosting all the Druid components plus MinIO, for testing. Tasks seem to be working fine, but the hand-offs are failing at the end of the day: I am not seeing any segments in my MinIO.

I checked the indexing logs for the tasks and I keep seeing this message: 2019-04-12T02:05:03,185 INFO [forking-task-runner-4] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. 

I am not sure why it is complaining about a key. I configured my common.runtime.properties as follows:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXX

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=/var/druid/indexing-logs

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid
druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info

# Storage type of double columns
# omitting this will lead to doubles being indexed as floats at the storage layer

druid.indexing.doubleStorage=double

I also created the jets3t.properties file under the same /usr/local/share/druid/conf/druid/_common folder with the following content:

s3service.s3-endpoint=collector1.XXXXXX.com
s3service.s3-endpoint-http-port=9000
s3service.https-only=false
s3service.disable-dns-buckets=true

The documentation says I need to add jets3t.properties to my Java classpath. Is that the missing piece here? How can I do that?

Thanks

Gian Merlino

Apr 12, 2019, 2:34:45 PM
to druid...@googlegroups.com
Hey Arda,

The jets3t properties might indeed be the missing piece. It should go in your _common config directory, next to common.runtime.properties.
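
With the stock conf layout, that would be something like this (paths are the defaults from the distribution; adjust for your install):

conf/druid/_common/
    common.runtime.properties
    jets3t.properties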

Gian



Arda Savran

Apr 13, 2019, 10:27:27 AM
to druid...@googlegroups.com
Thanks Gian.

I already have all those settings, but how can I add jets3t.properties to my classpath? Is there a special setting for that?

Arda

Gian Merlino

Apr 13, 2019, 12:43:02 PM
to druid...@googlegroups.com
Oh, actually, I should have thought about this for more than 15 seconds :)

Starting in Druid 0.13.0 we don't use jets3t anymore -- we use the aws-java-sdk. If you're on that version or newer, put the settings in your normal Druid properties files instead, using the properties documented here: http://druid.io/docs/latest/development/extensions-core/s3.html. For example, druid.s3.protocol=http replaces s3service.https-only=false, and druid.s3.enablePathStyleAccess=true replaces s3service.disable-dns-buckets=true.
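
Putting those together, a minimal sketch of the deep-storage block for a MinIO endpoint on 0.13+ would be something like the following (the bucket name, keys, host, and port are placeholders, and it assumes druid-s3-extensions is in your druid.extensions.loadList):

# placeholders -- substitute your own MinIO bucket, credentials, and endpoint
druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=YOUR_MINIO_ACCESS_KEY
druid.s3.secretKey=YOUR_MINIO_SECRET_KEY
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.url=your-minio-host:9000
druid.s3.endpoint.signingRegion=us-east-1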

Gian


Arda Savran

Apr 16, 2019, 7:07:26 PM
to druid...@googlegroups.com
Still no luck. My tasks are still failing at the end of the day, and I can only pull the real-time data from my Druid.

I changed my S3 configuration under _common as follows:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXX
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.signingRegion=us-east-1
druid.s3.endpoint.url=collector1.abc.com:9000

I reviewed the attached log from a task that failed but couldn't find any clues. Am I missing something?
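
For what it's worth, this is how I have been checking whether any segments land in the bucket, using the MinIO client with an alias I named myminio (set up earlier with mc config host add):

mc ls myminio/druid/druid/segments

Nothing shows up under that prefix.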

Thanks

log.zip

Arda Savran

Apr 22, 2019, 10:28:04 AM
to druid...@googlegroups.com
I noticed that my middleManager is not able to connect to MinIO and is getting "Connection Refused". I am able to connect to MinIO over http://IP:9000. Has anyone had the same issue before?

I am using Druid 0.13, and the following is my _common for deep storage:

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=/var/druid/segments

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=druid/segments
druid.s3.accessKey=XXXXXXXXXXXXXXXXXXXX
druid.s3.secretKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.signingRegion=us-east-1
druid.s3.endpoint.url=collector1.companyabc.com:9000

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=/var/druid/indexing-logs

# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the classpath):
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid
druid.indexer.logs.s3Prefix=druid/indexing-logs

I confirmed my credentials for MinIO. One more thing I noticed: I can access the UI remotely with my browser, but when I enter "curl http://{local minio IP}:9000" I get an access denied.

Any ideas?

Thanks
connection refused.txt