Hadoop Ingestion With an EMR cluster


Torche Guillaume

Jan 31, 2015, 10:01:59 PM
to druid-de...@googlegroups.com
Hi all,

I am trying to configure my indexing service so that I can run Hadoop index tasks. I created an EMR cluster for test purposes (it uses AMI version 3.3.1 with the Amazon Hadoop 2.4.0 distribution).

I have copied the Hadoop XML config files from my Hadoop master node to my middle managers' config path, which is on the classpath.

I have set the hadoopCoordinates property to "org.apache.hadoop:hadoop-client:2.4.0" for my Hadoop index task, as sketched below.
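For reference, here is a trimmed sketch of my task spec. The dataSource and input paths are placeholders, the parser/granularity/tuning sections are omitted, and the exact spec layout varies by Druid version; the relevant point is that hadoopCoordinates sits at the top level of the task, next to the spec:

{
  "type" : "index_hadoop",
  "hadoopCoordinates" : "org.apache.hadoop:hadoop-client:2.4.0",
  "spec" : {
    "dataSchema" : { "dataSource" : "rtb_auctions" },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : { "type" : "static", "paths" : "s3n://my-bucket/events/*.gz" }
    }
  }
}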

However, the task is failing with the following log:

2015-01-31 18:37:34,607 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_rtb_auctions_2015-01-31T18:37:26.116-08:00, type=index_hadoop, dataSource=rtb_auctions}]
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:227)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:218)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:197)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:505)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:470)
	at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
	at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:312)
	at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:123)
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:55)
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:324)
	... 11 more
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
	... 25 more
2015-01-31 18:37:34,612 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_rtb_auctions_2015-01-31T18:37:26.116-08:00",
  "status" : "FAILED",
  "duration" : 2893
}


I am not sure what is happening; any suggestions? Do I have to recompile Druid against Hadoop 2.4.0, since Druid ships with Hadoop 2.3? It seems Hadoop 2.3 is not supported by Amazon...
I am also not sure whether I set hadoopCoordinates properly. These coordinates refer to the regular Apache Hadoop distribution, but I am using the Amazon Hadoop distribution...
I tried to find a Maven repository for the EMR Hadoop 2.4.0 artifacts but could not find one. Has anyone successfully set up a batch pipeline with EMR/Druid who could help me with this?


Thanks!

Guillaume


Fangjin Yang

Feb 1, 2015, 12:47:29 AM
to druid-de...@googlegroups.com
Torche, are you including the Amazon Hadoop 2.4 jar as part of the classpath for the middle manager?


Torche Guillaume

Feb 1, 2015, 4:58:34 PM
to druid-de...@googlegroups.com

Hi Fangjin,

No, I'm not including it. I specified the Hadoop 2.4 Maven coordinates for the Hadoop task, as described in the documentation. Isn't that how we tell the middle manager where to find the Hadoop classes it needs to launch the job?


Fangjin Yang

Feb 2, 2015, 2:20:02 PM
to druid-de...@googlegroups.com
Hi Torche, I believe that to get EMR working you will probably need to include the EMR jars on the classpath of the middle manager (something like the sketch below) and also override the hadoopCoordinates. We need to make supporting different versions of Hadoop easier.
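As a rough illustration only (I have not verified this on EMR, and the /usr/share/aws/emr paths vary by AMI version), that means launching the middle manager with the EMR jar directories appended to its classpath:

java -server -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/middleManager:lib/*:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/emrfs/lib/* io.druid.cli.Main server middleManager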

Torche Guillaume

Feb 2, 2015, 9:53:58 PM
to druid-de...@googlegroups.com
I actually gave up trying to use the indexing service for batch ingestion, as we don't want to rely on our middle managers for our daily jobs.

I am therefore using the HadoopDruidIndexer directly from my EMR cluster. I downloaded the Druid services onto my master node and tried to run the following command:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/hadoop/conf/:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emr-metrics/lib/* io.druid.cli.Main index hadoop /home/hadoop/hadoop_index_task_hadoopDruidIndexer.json 


The task is now failing with the following error:

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:122)
at io.druid.cli.Main.main(Main.java:92)
Caused by: java.lang.NoSuchMethodError: com.amazonaws.auth.AWSCredentialsProviderChain.setReuseLastProvider(Z)V
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.getAwsCredentialsProvider(EmrFSProdModule.java:93)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.getAwsCredentialsProvider(EmrFSProdModule.java:81)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3(EmrFSProdModule.java:99)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3(EmrFSBaseModule.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105)
at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:54)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:132)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:117)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.InjectorImpl$3$1.call(InjectorImpl.java:1005)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1051)
at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1001)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1040)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:98)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2445)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:466)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:431)
at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:312)
at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:123)
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:55)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:57)
at io.druid.cli.Main.main(Main.java:92)


As you can see, I included the EMR jars in the classpath. However, I cannot override the Hadoop coordinates with the EMR core jar because there is no Maven repository for the Amazon Hadoop distribution.
I am kind of lost; I have spent all day trying to figure out how to make the Druid indexer work with EMR, without success...

First of all, I don't understand what is behind the hadoopCoordinates task property. Can anyone explain exactly how it works and what happens when you override this property?
Secondly, if you already have a batch pipeline for Druid using EMR, can you give me some details about the following points:

  • Which AMI version and Hadoop version is your EMR cluster running?
  • How are you overriding the Hadoop coordinates to match the configuration of your EMR cluster?

This would help me understand what I'm doing wrong. Any help would be greatly appreciated!

Thanks.

Guillaume 








--
Guillaume Torche
Computer engineering student - Université Technologique de Compiègne (UTC)
0033 6 60 48 51 23

Gian Merlino

Feb 3, 2015, 2:33:58 PM
to druid-de...@googlegroups.com
It looks like this may be a conflict between the AWS SDK included with Druid and the one included with EMR. Can you try putting "lib/*" last on the classpath instead of first and seeing if that works? If it doesn't, you may need to recompile Druid using versions of things more similar to what EMR is using.

Torche Guillaume

Feb 3, 2015, 6:15:26 PM
to druid-de...@googlegroups.com
Hi Gian,

After investigation, it seems Druid was not able to pick up the right AWS S3N access key and secret key. I'm now running this command:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dhadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dfs.s3n.awsAccessKeyId=xxxxxx -Dfs.s3n.awsSecretAccessKey=xxxxxxxx -classpath /home/hadoop/conf/:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/lib/*:lib/* io.druid.cli.Main index hadoop /home/hadoop/goodJsonTask.json
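(As an aside: fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are standard Hadoop properties, so instead of -D flags they could equally go into core-site.xml on the cluster; the values below are placeholders:)

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>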

It seems the job is now able to get files from my S3N path and set the input path correctly. However, it's failing just after:

2015-02-03 23:08:26,012 INFO [main] io.druid.indexer.path.StaticPathSpec - Adding paths[s3n://gumgum-elastic-mapreduce/druid-test/rtbevents/druidIngestTest/*.gz]
2015-02-03 23:08:26,063 INFO [main] org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /10.166.129.212:9022
2015-02-03 23:08:26,273 WARN [main] org.apache.hadoop.mapreduce.JobSubmitter - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2015-02-03 23:08:26,289 WARN [main] org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2015-02-03 23:08:27,280 INFO [main] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-02-03 23:08:27,295 INFO [main] com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library from the embedded binaries
2015-02-03 23:08:27,297 INFO [main] com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized native-lzo library [hadoop-lzo rev 77cfa96225d62546008ca339b7c2076a3da91578]
2015-02-03 23:08:27,342 INFO [main] org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-02-03 23:08:27,527 INFO [main] org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1422982363212_0030
2015-02-03 23:08:27,666 INFO [main] org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-02-03 23:08:27,671 INFO [main] org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1422982363212_0030
2015-02-03 23:08:27,677 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:122)
at io.druid.cli.Main.main(Main.java:92)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.util.Apps.addToEnvironment(Ljava/util/Map;Ljava/lang/String;Ljava/lang/String;)V
at org.apache.hadoop.mapreduce.v2.util.MRApps.setClasspath(MRApps.java:213)
at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:445)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:283)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at io.druid.indexer.DeterminePartitionsJob.run(DeterminePartitionsJob.java:148)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86)
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:57)
at io.druid.cli.Main.main(Main.java:92)
... 6 more


Do you think I have to recompile Druid using the AWS EMR SDK?





Gian Merlino

Feb 4, 2015, 11:59:04 AM
to druid-de...@googlegroups.com
I think you're getting these link errors because Druid is pulling down its default Hadoop version (2.3.0, Apache distribution), which is probably not compatible with EMR. So, I would try one of these things next.

1) Run the hadoop indexer with the --no-default-hadoop option. This will prevent it from pulling down the default Hadoop version and may allow things to work (see the sketches after this list).

2) If you still get link errors from #1, it means there is some incompatibility between something Druid was compiled against and something that EMR is providing. You can try getting classloader isolation working by copying the EMR jars from the EMR machines into a private Maven repository, configuring Druid to use that repository, and running the hadoop indexer with --coordinate foo:bar:x.y.z, where foo, bar, and x.y.z are the groupId, artifactId, and version of the jars you uploaded. You can pass --coordinate multiple times to pull multiple jars. These will be loaded with classloader isolation, which should prevent the link errors.

3) Another option, if #1 didn't work and you don't want to do #2, is to recompile Druid with EMR's preferred versions of things that are used by both (like the aws-java-sdk). This should get rid of the incompatibilities and let you run successfully with --no-default-hadoop.
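Roughly, #1 and #2 would look like the following. The classpath entries, jar file, repository URL, and the com.example.emr coordinates are all illustrative, not real values:

# Option 1: skip Druid's default Hadoop and rely on jars already on the classpath
java -classpath "conf:lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/*" io.druid.cli.Main index hadoop --no-default-hadoop task.json

# Option 2: first publish an EMR jar to your private Maven repository...
mvn deploy:deploy-file -Dfile=/usr/share/aws/emr/emrfs/lib/emrfs.jar -DgroupId=com.example.emr -DartifactId=emrfs -Dversion=1.0 -Dpackaging=jar -Durl=http://repo.example.com/releases

# ...then pull it in with classloader isolation (repeat --coordinate for each jar)
java -classpath "conf:lib/*" io.druid.cli.Main index hadoop --coordinate com.example.emr:emrfs:1.0 task.json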

Torche Guillaume

Feb 5, 2015, 9:32:36 PM
to druid-de...@googlegroups.com
Thanks a lot for your answer Gian!

The --no-default-hadoop option did the trick! I am now able to make the hadoop indexer work with this option.
I only have to put the Hadoop classes used by my EMR cluster on the classpath, roughly as shown below.
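For anyone who finds this thread later, the working invocation is essentially my earlier command with the flag added (classpath abbreviated here; use the full one from my previous message):

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /home/hadoop/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/lib/*:lib/* io.druid.cli.Main index hadoop --no-default-hadoop /home/hadoop/goodJsonTask.json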

Thank you again!

Gian Merlino

Feb 5, 2015, 10:11:42 PM
to druid-de...@googlegroups.com
Awesome!

Udayakumar Pandurangan

Dec 21, 2016, 9:39:34 PM
to Druid Development
Hi Gian & Torche,
       I have the exact same use case and I followed the exact steps above, but I'm running into the following exception.

Here is the command-line:

java -Xmx4096m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dhadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem -Dfs.s3n.awsAccessKeyId=XXXXXXXX -Dfs.s3n.awsSecretAccessKey=XXXXXXXX -classpath /home/hadoop/conf/:/home/ec2-user/druid-0.9.1.1/conf-aws-edit/:/home/ec2-user/druid-0.9.1.1/conf-aws-edit/druid/_common/:/home/ec2-user/druid-0.9.1.1/conf-aws-edit/druid/middleManager/:/etc/hadoop/conf/:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/share/aws/emr/emrfs/lib/*:/home/ec2-user/druid-0.9.1.1/lib/* io.druid.cli.Main index hadoop /home/ec2-user/gbi-ica-index-aws-s3.json --no-default-hadoop


But I'm getting the following error and exception:

16/12/22 02:38:03 INFO config.ConfigurationObjectFactory: Using method itself for [${base_path}.fifo] on [io.druid.query.DruidProcessingConfig#isFifo()]
16/12/22 02:38:03 INFO config.ConfigurationObjectFactory: Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()]
Dec 22, 2016 2:38:03 AM com.google.inject.internal.MessageProcessor visit
INFO: An exception was caught and reported. Message: java.lang.NullPointerException
java.lang.NullPointerException
        at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:95)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.spi.Elements.getElements(Elements.java:101)
        at com.google.inject.spi.Elements.getElements(Elements.java:92)
        at com.google.inject.util.Modules$RealOverriddenModuleBuilder$1.configure(Modules.java:172)
        at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.spi.Elements.getElements(Elements.java:101)
        at com.google.inject.spi.Elements.getElements(Elements.java:92)
        at com.google.inject.util.Modules$RealOverriddenModuleBuilder$1.configure(Modules.java:152)
        at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.spi.Elements.getElements(Elements.java:101)
        at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
        at io.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:367)
        at io.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:60)
        at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:108)
        at io.druid.cli.Main.main(Main.java:105)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:115)
        at io.druid.cli.Main.main(Main.java:105)

16/12/22 02:38:03 ERROR cli.CliHadoopIndexer: failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:115)
        at io.druid.cli.Main.main(Main.java:105)
Caused by: com.google.inject.CreationException: Guice creation errors:

1) An exception was caught and reported. Message: null
  at com.google.inject.util.Modules$RealOverriddenModuleBuilder$1.configure(Modules.java:172)

2) Binding to null instances is not allowed. Use toProvider(Providers.of(null)) if this is your intended behaviour.
  at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:93)

3) Binding to null instances is not allowed. Use toProvider(Providers.of(null)) if this is your intended behaviour.
  at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:94)

4) Could not find a suitable constructor in io.druid.metadata.MetadataStorageTablesConfig. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at io.druid.metadata.MetadataStorageTablesConfig.class(MetadataStorageTablesConfig.java:34)
  at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:95)

4 errors
        at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
        at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
        at com.google.inject.Guice.createInjector(Guice.java:95)

Any help from here would be greatly appreciated!

Thanks,
Uday.

Fangjin Yang

Dec 22, 2016, 3:02:54 PM
to Druid Development
Hi,


They apply to stock Druid as well.

-- FJ