S3 deep storage setup errors

Sunil Nair

Aug 17, 2016, 6:58:44 AM
to Druid User
Hi All,

I am getting some errors while trying to ingest the sample wiki data. The job is trying to read data from S3.
We have verified that the user has access to read from and write to the S3 bucket.

--------------------------------------------------------------------------------------

2016-08-17T09:49:35,326 ERROR [pool-22-thread-1] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.NullPointerException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:341) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:868) ~[hadoop-common-2.3.0.jar:?]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:368) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_65]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_65]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_65]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_65]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
        at com.sun.proxy.$Proxy194.push(Unknown Source) [?:?]
        at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_65]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
--------------------------------------------------------------------------------------

I have set up S3 as my deep storage and configured access keys and secret keys.

This is how the common.runtime.properties file looks; the same configuration is copied to all the nodes (coordinator, 2 historicals, middle manager & broker):

druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=data/druid/segments
druid.s3.accessKey={access_key}
druid.s3.secretKey={secret_key}


druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=bucket-name
druid.indexer.logs.s3Prefix=data/druid/indexing-logs
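
For these settings to take effect, the S3 extension also has to be loaded; a minimal sketch of the relevant line, assuming Druid 0.9.x with the bundled druid-s3-extensions in the default extensions directory:

druid.extensions.loadList=["druid-s3-extensions"]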

Attached is the ingestion spec (wikiticker-index.json) which I am using.

For the first run the job properties were left empty and the job failed with an access denied error.
After that I set the job properties as below.

"jobProperties" : {
        "fs.s3.awsAccessKeyId" : "{access-key}",
        "fs.s3.awsSecretAccessKey" : "{secret_key}",
        "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId" : "{access-key}",
        "fs.s3n.awsSecretAccessKey" : "{secret-key}"   
}

In fact, I played around with including and removing some of these properties to see if the job would work, but it didn't.
Sorry for posting multiple questions here; I am pretty new to Druid and am trying to set up a high-availability cluster for a POC.

Any help would be appreciated.

Regards
Sunil

wikiticker-index.json

Jonathan Wei

Aug 17, 2016, 5:17:01 PM
to druid...@googlegroups.com
Can you try adding the following to your existing jobProperties?

"fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",


Fangjin Yang

Aug 25, 2016, 5:00:56 PM
to Druid User

Pritesh Damani

Sep 6, 2016, 1:00:55 PM
to Druid User
Hi Sunil,

Did you figure this out? We are having similar issues. Anything you can share would be nice.

Thanks,

Pritesh Damani

Sep 6, 2016, 11:36:51 PM
to Druid User
For anyone who cares: you need the hadoop-aws jar (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.6.0) in the lib folder for this to work.
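
Concretely, that means dropping the jar onto the Druid classpath, something like the following (a sketch; /opt/druid is just an example install path, adjust it for your setup, and the URL below is the Maven Central copy of the artifact linked above):

# download hadoop-aws 2.6.0 into Druid's lib directory
cd /opt/druid/lib
curl -O https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
# then restart the Druid services that run indexing tasks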

Sunil Nair

Sep 7, 2016, 1:18:28 AM
to Druid User
Hi Pritesh,

We resolved it by using the regular indexing task instead of the Hadoop indexing task.
Also, we wanted to use the IAM roles assigned to the EC2 instances rather than access keys and secret keys.


Regards
Sunil

Ravali Sabbineni

Sep 20, 2016, 7:42:31 PM
to Druid User
Hi Sunil,

We are new to Druid and are facing a similar issue while writing the segment files to S3.

Please let us know how the IAM role is used in the properties instead of the access and secret keys. Also, can you attach your JSON where you used the regular indexing task instead of the Hadoop indexing task?

Thanks,
Ravali

Sunil Nair

Sep 21, 2016, 10:03:48 AM
to Druid User
Hi Ravali,

The access keys and secret keys were omitted from common.runtime.properties and we included only the following:

# For S3:
druid.storage.type=s3
druid.storage.bucket=abcd
druid.storage.baseKey=/druid/segments

# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=abcd
druid.indexer.logs.s3Prefix=druid/indexing-logs

We also placed a jets3t.properties file on the Druid classpath (under the conf/druid/_common folder) with the following contents:

s3service.https-only=true
s3service.s3-endpoint=s3.amazonaws.com
s3service.s3-endpoint-https-port=443
s3service.server-side-encryption=AES256
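
As for the regular indexing task: the exact spec depends on your dataSchema, but the part that changes from the Hadoop spec looks roughly like this (a sketch assuming the quickstart sample file on local disk, not the exact JSON we used):

"type" : "index",
"spec" : {
  "dataSchema" : { ... same dataSchema as in the Hadoop spec ... },
  "ioConfig" : {
    "type" : "index",
    "firehose" : {
      "type" : "local",
      "baseDir" : "quickstart",
      "filter" : "wikiticker-2015-09-12-sampled.json"
    }
  },
  "tuningConfig" : {
    "type" : "index",
    "targetPartitionSize" : 5000000
  }
}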

Hope this helps!

Sunil