Batch ingestion issue


Jaewon Chung

29 Mar 2016, 11:53:42 pm
to Druid User
I am using Druid 0.9.0-rc3 and trying to make sure my AWS S3 and RDS (Postgres) instances are working properly. I have modified the _common config accordingly. I am getting the error below when I try to batch ingest the wikipedia example JSON. Any ideas on fixing the issue?
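For reference, the relevant parts of my _common/common.runtime.properties are roughly along these lines (a sketch with placeholder values, not my exact file):

# load the S3 and PostgreSQL extensions
druid.extensions.loadList=["druid-s3-extensions", "postgresql-metadata-storage"]

# deep storage on S3
druid.storage.type=s3
druid.storage.bucket=my-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=REDACTED
druid.s3.secretKey=REDACTED

# metadata storage on RDS Postgres
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://my-rds-host:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=REDACTED

# task logs on S3
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=my-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs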

2016-03-30T01:21:14,105 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2016-03-30T01:20:57.548Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
... 7 more
2016-03-30T01:21:14,113 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2016-03-30T01:20:57.548Z",
  "status" : "FAILED",
  "duration" : 12287
}

Fangjin Yang

1 Apr 2016, 7:57:50 pm
to Druid User
Are you using EMR for your batch ingestion? 
If so, there are more configs you'll have to override: http://imply.io/docs/latest/ingestion-batch
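Roughly, the overrides go in the jobProperties of your tuningConfig and look like this (a sketch from memory of what those docs describe, not verified against your setup; the access keys are placeholders):

"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "fs.s3.awsAccessKeyId" : "YOUR_ACCESS_KEY",
    "fs.s3.awsSecretAccessKey" : "YOUR_SECRET_KEY",
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
    "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
  }
}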

Otherwise, can you post the full task log?

anindita dey

20 Apr 2016, 11:31:43 pm
to Druid User
Hi Yang,

I am trying to perform batch ingestion using EMR and am getting the same error. I have attached my full log; the keys were removed on purpose.

Thanks
Anindita Dey
hadoop-druid-index-error

Fangjin Yang

26 Apr 2016, 8:48:10 pm
to Druid User
Anindita, please try the docs I linked in my previous response.

anindita dey

26 Apr 2016, 8:55:10 pm
to Druid User
Thanks Yang, I was able to do the batch ingestion once I placed both Hadoop and Druid in the same cluster.

However, whenever I try to connect to Hadoop remotely, the Hadoop container throws the error below.

Diagnostics:
Application application_1430204789433_38485 failed 2 times due to AM Container for appattempt_1430204789433_38485_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
	at org.apache.hadoop.util.Shell.run(Shell.java:418)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1. Failing this attempt. Failing the application.

anindita dey

26 Apr 2016, 9:11:08 pm
to Druid User
Hi Yang,

Another question: when I tried the job with a normal .gz file, it worked, but when I try it with an encrypted .gz file, the job fails.

I know you have the following note on your site:

" Note that this method uses Hadoop's builtin S3 filesystem rather than Amazon's EMRFS, and is not compatible with Amazon-specific features such as S3 encryption and consistent views. If you need to use those features, you will need to make the Amazon EMR Hadoop JARs available to Druid through one of the mechanisms described in the Using other Hadoop distributions section. "

But do you think there is any workaround I can use?

Thanks a lot.

Fangjin Yang

29 Apr 2016, 8:52:17 pm
to Druid User
Anindita, did you place all your Hadoop configuration files on the classpath of the server you are trying to run?
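If not, one way to do it is something like this (paths are illustrative, adjust them to your Hadoop and Druid install):

cp /etc/hadoop/conf/core-site.xml \
   /etc/hadoop/conf/hdfs-site.xml \
   /etc/hadoop/conf/yarn-site.xml \
   /etc/hadoop/conf/mapred-site.xml \
   conf/druid/_common/

The _common directory is already on the classpath of every Druid process, so the indexing task can pick up the cluster's Hadoop client settings from there.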

Fangjin Yang

29 Apr 2016, 8:53:02 pm
to Druid User
What do you mean by workaround? The workarounds are described in the linked document.

kl...@sertiscorp.com

12 June 2017, 1:49:50 am
to Druid User
Hi all,

I experienced the same error as Anindita, but the solution link doesn't work.
Could you please share the solution here again?

Best Regards,
Kamolphan

kl...@sertiscorp.com

15 June 2017, 7:18:00 am
to Druid User
Dear Fangjin Yang,

I really appreciate your contribution to the Druid community.
I would like to bother you with a question.

I'm now facing a problem submitting the wikiticker job with the Druid 0.9.2 bundled in HDP 2.6.0.3. (There isn't any problem with the standalone 0.10.0 on my local PC.)
However, I must use the HDP version for compatibility with Apache Ambari and the rest of my existing cluster.
This is how I submit the job:

[centos@dev-server1 druid]$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task
{"task":"index_hadoop_wikiticker_2017-06-15T11:04:18.145Z"}

But it shows as FAILED in the Coordinator console with the following error (I tried many times with a few different settings that I thought might solve it, e.g. changing the UNIX timezone to my local time or adding mapred.job.classloader to jobProperties as described in this link, but to no avail):

2017-06-15T11:04:31,361 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2017-06-15T11:04:18.145Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:208) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) ~[?:?]
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) ~[?:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:116) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
2017-06-15T11:04:31,375 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wikiticker_2017-06-15T11:04:18.145Z] status changed to [FAILED].
2017-06-15T11:04:31,378 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2017-06-15T11:04:18.145Z",
  "status" : "FAILED",
  "duration" : 6650
}

Thank you very much.
log.txt

mstephe...@gmail.com

21 June 2017, 10:49:11 am
to Druid User
Forgive me for asking something obvious, but I experienced something similar when going through the wikiticker example. Are you certain the input JSON was placed into HDFS at the location you specified (e.g. /user/druid/...)? FWIW, I got all of that working with the files on the local filesystem, even though that's not ideal because you have to distribute them to all the nodes doing the indexing if you go that route.
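A quick way to check, using the path from the error in your log:

hdfs dfs -ls /user/druid/quickstart/wikiticker-2015-09-12-sampled.json

If that reports "No such file or directory", that would explain the failure.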

Slim Bouguerra

21 June 2017, 11:38:39 am
to druid...@googlegroups.com
Your task is failing due to:

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json

This means you are asking Druid to index a file that does not exist on HDFS.
To fix this, copy the file from your local machine to HDFS under /user/druid/quickstart/wikiticker-2015-09-12-sampled.json.
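For example (run from the directory where the Druid quickstart files live, as a user that can write to /user/druid):

hdfs dfs -mkdir -p /user/druid/quickstart
hdfs dfs -put quickstart/wikiticker-2015-09-12-sampled.json /user/druid/quickstart/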

-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______


kl...@sertiscorp.com

22 June 2017, 11:28:33 pm
to Druid User
Thank you both, Slim and mstephe, for your kind replies.

I did put the file into HDFS after that, and it turned out to be another error caused by a security setting in my cluster, which is separate from Druid.
I will deal with that problem later since it's not a Druid problem.

Thanks! 

Best Regards,
Kamolphan