Batch Ingestion Failed


Kien Trinh

Nov 1, 2018, 4:19:46 AM
to Druid User
Dear all,

I'm running a batch ingestion and it fails with the following exception:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:238) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	... 7 more
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:391) ~[druid-indexing-hadoop-0.12.3.jar:0.12.3]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.3.jar:0.12.3]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	... 7 more



When I view the log file in YARN, I see this exception:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1455)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1452)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1385)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
	... 8 more
Caused by: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
	... 10 more


I know that IndexGeneratorJob is in druid-indexing-hadoop-0.12.3.jar, and I put druid-indexing-hadoop-0.12.3.jar under hadoop-dependencies and extensions/druid-hdfs-storage/, but it doesn't help.
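
One way to confirm which jar really contains the missing class is to list the jar's entries, since a jar is a zip archive. The following is a minimal sketch and not part of the original post; the function name `class_in_jar` and the jar path in the usage comment are hypothetical, chosen only for illustration.

```python
import zipfile


def class_in_jar(jar_path: str, class_name: str) -> bool:
    """Return True if the jar at jar_path contains the compiled class."""
    # A class a.b.Outer$Inner is stored in the jar as a/b/Outer$Inner.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Usage (hypothetical path; adjust to your installation):
# class_in_jar(
#     "lib/druid-indexing-hadoop-0.12.3.jar",
#     "io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat",
# )
```

This only shows that the class is present in a jar on the machine where the script runs. It says nothing about whether that jar is visible to the YARN application master, which is where the ClassNotFoundException above is raised.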

This issue has taken me several days. Please help me out, thanks.

Regards,
Kien 




Jonathan Wei

Nov 1, 2018, 4:44:19 PM
to druid...@googlegroups.com
It seems like your classpath may be in a strange state, or the cluster is misconfigured. I would recommend trying out the Druid quickstart and Hadoop tutorial for an example of a working Druid+Hadoop deployment:


Thanks,
Jon


Kien Trinh

Nov 2, 2018, 3:22:32 AM
to druid...@googlegroups.com
Thanks very much for your response, Jon. I appreciate your time.
I ran it successfully locally, but when I ran it on production, I got this issue.
The difference is that locally we can define yarn.application.classpath in yarn-site.xml, but in production the MiddleManager is not co-located with Hadoop, so if we define it like this, the Druid job fails.

<property>
  <name>yarn.application.classpath</name>
  <value>/usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/*</value>
</property>
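
To see why a classpath like the one above can work on one machine and not on another, it can help to check which entries resolve on a given host. The sketch below is an illustration only and not something from the thread; `classpath_entries` and `missing_on_host` are hypothetical helper names. It assumes the property is written as shown above, with comma-separated entries.

```python
import glob
import xml.etree.ElementTree as ET
from typing import List


def classpath_entries(yarn_site_xml: str) -> List[str]:
    """Extract the yarn.application.classpath entries from yarn-site.xml text."""
    root = ET.fromstring(yarn_site_xml)
    # iter() also visits the root, so a bare <property> element works too.
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "yarn.application.classpath":
            value = prop.findtext("value", "")
            return [e.strip() for e in value.split(",") if e.strip()]
    return []


def missing_on_host(entries: List[str]) -> List[str]:
    """Return the entries that match nothing on the local filesystem."""
    # glob expands a trailing /*; an entry with no match (including an
    # empty directory followed by /*) is reported as missing.
    return [e for e in entries if not glob.glob(e)]


# Usage (hypothetical file location):
# with open("/usr/local/hadoop/etc/hadoop/yarn-site.xml") as f:
#     print(missing_on_host(classpath_entries(f.read())))
```

The check is local to the host where the script runs, so it would need to be run on each machine of interest. Note also that glob matches any file, whereas a Java classpath wildcard only picks up jar files, so this is a rough check rather than an exact one.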

I see that Druid ingestion submits two jobs to YARN. The first one, determine-partitions, runs successfully, but the second one, index-generator, fails.

Thanks,
Kien


Jonathan Wei

Nov 2, 2018, 8:45:31 PM
to druid...@googlegroups.com
Hm, what version of Hadoop are you using?

It seems like there is a Hadoop bug that can cause the issue you're seeing:


> I know that IndexGeneratorJob is in druid-indexing-hadoop-0.12.3.jar, and I put druid-indexing-hadoop-0.12.3.jar under hadoop-dependencies and extensions/druid-hdfs-storage/, but it doesn't help.

This step isn't needed. Druid normally copies the druid-indexing-hadoop jar to Hadoop, so it doesn't need to be in hadoop-dependencies or the HDFS storage extension.

Thanks,
Jon



Kien Trinh

Nov 7, 2018, 5:39:14 AM
to druid...@googlegroups.com
Dear Jon,
Sorry for my late response.
I'm using Hadoop 2.4.1, but upgrading our production cluster is too risky :(

Thanks,
Kien

Adithya Shetty

Jan 9, 2025, 5:36:05 AM
to Druid User
Hi Kien,
I know it's been a long time since you posted this issue. Were you able to fix it? If so, could you please send over the steps?