Batch ingestion issue


Jaewon Chung

29 Mar 2016, 11:53:42 pm
to Druid User
I am using Druid 0.9.0-rc3 and trying to make sure my AWS S3 and RDS (Postgres) instances are working properly. I have modified the _common config accordingly. I am getting the error below when I try to batch ingest the wikipedia example JSON. Any ideas on fixing the issue?
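For reference, the relevant parts of my _common/common.runtime.properties are roughly along these lines (a sketch with placeholder values, not my exact file):

# load the S3 and PostgreSQL extensions
druid.extensions.loadList=["druid-s3-extensions", "postgresql-metadata-storage"]

# deep storage on S3
druid.storage.type=s3
druid.storage.bucket=my-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=REDACTED
druid.s3.secretKey=REDACTED

# metadata storage on RDS Postgres
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://my-rds-host:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=REDACTED

# task logs on S3
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=my-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs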

2016-03-30T01:21:14,105 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2016-03-30T01:20:57.548Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
... 7 more
2016-03-30T01:21:14,113 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2016-03-30T01:20:57.548Z",
  "status" : "FAILED",
  "duration" : 12287
}

Fangjin Yang

1 Apr 2016, 7:57:50 pm
to Druid User
Are you using EMR for your batch ingestion? 
If so, there are more configs you'll have to override: http://imply.io/docs/latest/ingestion-batch
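Roughly, the overrides go in the jobProperties of your tuningConfig and look like this (a sketch from memory of what those docs describe, not verified against your setup; the access keys are placeholders):

"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "fs.s3.awsAccessKeyId" : "YOUR_ACCESS_KEY",
    "fs.s3.awsSecretAccessKey" : "YOUR_SECRET_KEY",
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
    "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
  }
}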

Otherwise, can you post the full task log?

anindita dey

20 Apr 2016, 11:31:43 pm
to Druid User
Hi Yang,

I am trying to perform batch ingestion using EMR and am getting the same error. I have attached my full log; the keys were removed on purpose.

Thanks
Anindita Dey
hadoop-druid-index-error

Fangjin Yang

26 Apr 2016, 8:48:10 pm
to Druid User
Anindita, please try the docs I linked in my previous response.

anindita dey

26 Apr 2016, 8:55:10 pm
to Druid User
Thanks Yang, I was able to do the batch ingestion once I placed both Hadoop and Druid in the same cluster.

However, whenever I try to connect to Hadoop remotely, the Hadoop container throws the error below.

Diagnostics:
Application application_1430204789433_38485 failed 2 times due to AM Container for appattempt_1430204789433_38485_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
	at org.apache.hadoop.util.Shell.run(Shell.java:418)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1. Failing this attempt. Failing the application.

anindita dey

26 Apr 2016, 9:11:08 pm
to Druid User
Hi Yang,

Another question: when I tried the job with a normal .gz file, it worked, but when I try it with an encrypted .gz file, the job fails.

I know you have the following note on your site:

" Note that this method uses Hadoop's builtin S3 filesystem rather than Amazon's EMRFS, and is not compatible with Amazon-specific features such as S3 encryption and consistent views. If you need to use those features, you will need to make the Amazon EMR Hadoop JARs available to Druid through one of the mechanisms described in the Using other Hadoop distributions section. "

But do you think there is any workaround I can use?

Thanks a lot.

Fangjin Yang

29 Apr 2016, 8:52:17 pm
to Druid User
Anindita, did you place all your Hadoop configuration files on the classpath of the server you are trying to run?
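If not, one way to do it is something like this (paths are illustrative, adjust them to your Hadoop and Druid install):

cp /etc/hadoop/conf/core-site.xml \
   /etc/hadoop/conf/hdfs-site.xml \
   /etc/hadoop/conf/yarn-site.xml \
   /etc/hadoop/conf/mapred-site.xml \
   conf/druid/_common/

The _common directory is already on the classpath of every Druid process, so the indexing task can pick up the cluster's Hadoop client settings from there.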

Fangjin Yang

29 Apr 2016, 8:53:02 pm
to Druid User
What do you mean by workaround? The workarounds are described in the linked document.

kl...@sertiscorp.com

12 June 2017, 1:49:50 am
to Druid User
Hi all,

I experienced the same error as Anindita, but the solution link doesn't work.
Could you please share the solution here again?

Best Regards,
Kamolphan

kl...@sertiscorp.com

15 June 2017, 7:18:00 am
to Druid User
Dear Fangjin Yang,

I really appreciate your contribution to the Druid community.
I would like to bother you with a question.

I'm now facing a problem submitting the wikiticker job with the Druid 0.9.2 bundled in HDP 2.6.0.3. (There isn't any problem with the standalone 0.10.0 on my local PC.)
However, I must use the HDP version for compatibility with Apache Ambari and the rest of my existing cluster.
This is how I submit the job:

[centos@dev-server1 druid]$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task
{"task":"index_hadoop_wikiticker_2017-06-15T11:04:18.145Z"}

But it shows as FAILED in the Coordinator console with the following error (I tried many times with a few different settings that I thought might solve it, e.g. changing the UNIX timezone to my local time or adding mapred.job.classloader to jobProperties as described in this link, but to no avail):

2017-06-15T11:04:31,361 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2017-06-15T11:04:18.145Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:208) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) ~[?:?]
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) ~[?:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:116) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
2017-06-15T11:04:31,375 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wikiticker_2017-06-15T11:04:18.145Z] status changed to [FAILED].
2017-06-15T11:04:31,378 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2017-06-15T11:04:18.145Z",
  "status" : "FAILED",
  "duration" : 6650
}

Thank you very much.
log.txt

mstephe...@gmail.com

21 June 2017, 10:49:11 am
to Druid User
Forgive me for asking something obvious, but I experienced something similar when going through the wikiticker example. Are you certain the input JSON was placed into HDFS at the location you specified (e.g. /user/druid/...)? FWIW, I got all of that working with the files on the local filesystem, even though that's not ideal because you have to distribute them to all the nodes doing the indexing if you go that route.
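A quick way to check, using the path from the error in your log:

hdfs dfs -ls /user/druid/quickstart/wikiticker-2015-09-12-sampled.json

If that reports "No such file or directory", that would explain the failure.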

Slim Bouguerra

21 June 2017, 11:38:39 am
to druid...@googlegroups.com
Your task is failing due to:

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json

This means you are asking Druid to index a file that does not exist on HDFS.
To fix this, copy the file from your local machine to HDFS under /user/druid/quickstart/wikiticker-2015-09-12-sampled.json.
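For example (run from the directory where the Druid quickstart files live, as a user that can write to /user/druid):

hdfs dfs -mkdir -p /user/druid/quickstart
hdfs dfs -put quickstart/wikiticker-2015-09-12-sampled.json /user/druid/quickstart/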

-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______


kl...@sertiscorp.com

22 June 2017, 11:28:33 pm
to Druid User
Thank you both, Slim and mstephe, for your kind replies.

I did put the file into HDFS after that, and it turned out to be another error caused by a security setting in my cluster, which is separate from Druid.
I will deal with that problem later since it's not a Druid problem.

Thanks! 

Best Regards,
Kamolphan