Is it possible to work with S3 via the plugin using "assume role" instead of credentials?


oleksandr...@ua.glomex.com

Mar 27, 2017, 11:18:20 AM
to Druid User
Hi everyone!

As we are using EC2 instances for our Druid cluster, it is not secure to use an "accessKey" and "secretKey" to access the S3 bucket.
We have generated a policy and role that allow our instances to work with S3 buckets.
So is it possible to work with S3 buckets by providing only the "druid.storage.bucket" parameter, and possibly the role name if it is required?

Also, it would be cool if someone could share init.d scripts for a Druid cluster (CentOS).

Best regards,
Oleksandr.

Robert Ervin

Mar 28, 2017, 4:28:52 PM
to Druid User
Hi Oleksandr,

We ran into the same issue recently when setting it up as well. Druid does support this, but it is currently not documented.

In your `_common/common.runtime.properties` file, instead of using druid.s3.accessKey and druid.s3.secretKey, use the following:

druid.s3.fileSessionCredentials=<IAM_ROLE_NAME>

where IAM_ROLE_NAME is the name of the role, which Druid will extract from the instance metadata.

To answer your other question: we also use druid.storage.baseKey for the base file path within the specified bucket.
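For reference, here is a minimal sketch of what the relevant section of `_common/common.runtime.properties` could look like with role-based credentials. The bucket name, base key, and role name below are placeholder values, not ones from this thread:

```
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
# Instead of druid.s3.accessKey / druid.s3.secretKey:
druid.s3.fileSessionCredentials=my-druid-role
```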

oleksandr...@ua.glomex.com

Mar 29, 2017, 5:18:50 AM
to Druid User
Thank you for your reply!
Let me clarify a bit.
So I should just specify:
druid.s3.fileSessionCredentials=<IAM_ROLE_NAME>

Is "<IAM_ROLE_NAME>" an ARN like "arn:aws:iam::account_id:role/role_name"?

Robert Ervin

Mar 29, 2017, 11:05:31 AM
to Druid User
The <IAM_ROLE_NAME> is the name without the ARN, so in your example it would simply be role_name :)
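To illustrate the relationship: the role name is just the last path segment of the role ARN. A throwaway snippet (the ARN below is a made-up example, not one from this thread):

```python
# druid.s3.fileSessionCredentials expects the bare role name,
# i.e. everything after the final "/" of the role ARN.
def role_name_from_arn(arn: str) -> str:
    return arn.rsplit("/", 1)[-1]

print(role_name_from_arn("arn:aws:iam::123456789012:role/role_name"))
# -> role_name
```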

oleksandr...@ua.glomex.com

Mar 30, 2017, 5:00:16 AM
to Druid User
Many thanks Robert!

It works for me!

oleksandr...@ua.glomex.com

Apr 13, 2017, 11:15:39 AM
to Druid User
Hi Robert!

I am using druid.s3.fileSessionCredentials=<IAM_ROLE_NAME> as you suggested.
Everything is fine for logs; I see them in the S3 bucket.
But now I am trying to load test data, and I am getting an error:
“AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively)”

It seems Hadoop needs additional configuration. Is it possible to use the role for that too?
Do you have any experience with that?

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}

Robert Ervin

Apr 13, 2017, 8:16:41 PM
to Druid User
Hi Oleksandr,

Great question! This is something we ran into as well. The root cause is that Druid does not support the s3a protocol, which would allow Hadoop to authenticate via the EC2 instance's metadata.

For now, we are sending the access key and secret key in the index request.

E.g. 

"tuningConfig" : {
 
"type" : "hadoop",
 "jobProperties" : {
   
"fs.s3.awsAccessKeyId" : "ACCESS_KEY_ID",
   
"fs.s3n.awsAccessKeyId" : "ACCESS_KEY_ID",
   
"fs.s3.awsSecretAccessKey" : "SECRET_ACCESS_KEY",
   
"fs.s3n.awsSecretAccessKey" : "SECRET_ACCESS_KEY",
   
"fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
   
"fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
 
}
}

A solution to this (s3a support) is in the works at the moment, and development is quite active.

We are following
https://github.com/druid-io/druid/pull/4116

I hope this will be in 10.1, but we shall see.

Also, for security, I recommend you lock down access to port 8090, since your access/secret keys will be sitting out in the open. You should also create an IAM user whose keys can only be used from the IP of the box you're running on, so that if someone does steal them, they only gain HTTP access to that box and nothing else in your cloud infrastructure.
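For what it's worth, here is a sketch of the kind of IP-restricted policy described above, attached to that dedicated IAM user. The bucket name and IP are placeholders, and the exact S3 actions you need may differ:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::MY_DRUID_BUCKET",
        "arn:aws:s3:::MY_DRUID_BUCKET/*"
      ],
      "Condition": {
        "IpAddress": { "aws:SourceIp": "203.0.113.10/32" }
      }
    }
  ]
}
```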

oleksandr...@ua.glomex.com

Apr 18, 2017, 10:32:04 AM
to Druid User
Hi Robert!
Thank you for your reply!
It works for me, but now I am getting a strange error:
2017-04-18T14:25:33,246 DEBUG [pool-23-thread-1] org.apache.hadoop.fs.s3native.NativeS3FileSystem - getFileStatus could not find key 'druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip.0'
2017-04-18T14:25:33,246 DEBUG [pool-23-thread-1] org.apache.hadoop.fs.s3native.NativeS3FileSystem - Renaming 's3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip.0' to 's3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip' - returning false as src does not exist
2017-04-18T14:25:33,250 INFO [Thread-61] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2017-04-18T14:25:33,254 WARN [Thread-61] org.apache.hadoop.mapred.LocalJobRunner - job_local1076469020_0002
java.lang.Exception: java.io.IOException: Unable to rename [s3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip.0] to [s3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: Unable to rename [s3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip.0] to [s3n://BUCKET_NAME/druid/segments/wikiticker-s3-new/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2017-04-18T14:25:09.949Z/0/index.zip]
    at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:452) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:727) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:478) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473) ~[?:1.7.0_131]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_131]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_131]

The file really is there.
Have you ever seen something like that?

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker-s3-new",

        "fs.s3.awsAccessKeyId" : "***",
        "fs.s3n.awsAccessKeyId" : "***",
        "fs.s3.awsSecretAccessKey" : "***",
        "fs.s3n.awsSecretAccessKey" : "***",

Robert Ervin

Apr 18, 2017, 7:22:04 PM
to Druid User
Hey Oleksandr,

I haven't seen anything like that before, but it looks like it renames a file in S3 and then attempts to access it via the pre-rename file name.

In our system, we use Spark to upload the data file to S3 before we send the request to index the data; Druid then pulls the data file down from S3 into Hadoop for indexing. Perhaps it has to do with the fact that it's a zip? Either way, it would take some experimentation to get right. Please open a new question for this one, since it appears your initial problem was solved :)

oleksandr.shkovyra

Apr 19, 2017, 6:36:59 AM
to druid...@googlegroups.com
Thank you for your reply!
I resolved this problem by updating to druid-0.10.0.

Best regards,
Oleksandr.


On 4/19/17 02:22, Robert Ervin wrote:
-- You received this message because you are subscribed to a topic in the Google Groups "Druid User" group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-user/Lu_3XDi2l4w/unsubscribe. To unsubscribe from this group and all its topics, send an email to druid-user+...@googlegroups.com. To post to this group, send email to druid...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/bd094c92-9739-4ba1-89b3-16b810b52975%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.

oleksandr...@ua.glomex.com

Apr 19, 2017, 7:06:27 AM
to Druid User
Nope, the problem was not resolved yet :(

oleksandr.shkovyra

Apr 19, 2017, 9:49:49 AM
to Druid User

Now the problem is resolved; incorrect permissions had been set for the user keys.


On 4/19/17 14:06, oleksandr...@ua.glomex.com wrote: