Failed Hadoop indexing task


George Hant

Jun 6, 2016, 2:50:22 AM
to Druid User
Hi,

I'm trying to run a batch ingestion task and it keeps failing with this error:

2016-06-06T06:35:08,521 INFO [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T06:35:09,085 WARN [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - job_local1392479455_0001
java.lang.Exception: java.io.IOException: No such file or directory
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.8.0_91]
        at java.io.File.createTempFile(File.java:2024) ~[?:1.8.0_91]
        at java.io.File.createTempFile(File.java:2070) ~[?:1.8.0_91]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:558) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
2016-06-06T06:35:09,650 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-06-06T06:35:09,807 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 99%
2016-06-06T06:35:09,808 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local1392479455_0001 failed with state FAILED due to: NA
2016-06-06T06:35:09,912 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 38


What is causing this?

Thanks

George Hant

Jun 6, 2016, 3:15:21 AM
to Druid User
In the task log I see
  druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
but nothing is ever written to that directory. The MapReduce temp files are being written to
  /tmp/hadoop-root/mapred/
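Since that path is relative, I assume it resolves against whatever directory the Druid process was started from, so a check along these lines shows whether it even exists there (the install directory below is just a placeholder):

  cd /opt/druid-0.9.0            # placeholder: the directory the overlord/middleManager was launched from
  ls -ld var/druid/hadoop-tmp    # does the relative working path exist here at all?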



Scott Kinney

Jun 6, 2016, 2:35:13 PM
to Druid User
I just moved from 0.9.0 to 0.9.1-rc1 and now I'm getting:

2016-06-06T18:11:58,758 INFO [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T18:11:58,759 WARN [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - job_local149987833_0002
java.lang.Exception: java.io.IOException: No such file or directory
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.7.0_101]

This is the same schema and the datafile hasn't moved.
Are you using 0.9.1-rc1?

Scott Kinney

Jun 6, 2016, 3:04:10 PM
to Druid User
I'm running Druid in local "quickstart" mode. I forgot to run 'bin/init' before starting all the processes; doing that seems to have fixed my problem.
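For anyone else hitting this, the steps were roughly the following (assuming the stock quickstart layout; the install path is a placeholder):

  cd /opt/druid-0.9.1    # placeholder for wherever the quickstart tarball was unpacked
  bin/init               # creates the var/ directories the quickstart config expects (var/druid/hadoop-tmp among them, I believe)
  # then start the overlord, middleManager, etc. as described in the quickstart docs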



Scott Kinney

Jun 7, 2016, 3:11:20 PM
to Druid User
Did you ever find out what was causing this?
I'm running into this now with S3, when Druid is halfway through ingesting a gz file.
Not sure which file it's not finding.



Jonathan Wei

Jun 7, 2016, 6:25:22 PM
to druid...@googlegroups.com
Seems like you might be running into the same issue here, where the tmp directory needs to exist beforehand:


Also, it looks like your hadoopWorkingPath is a relative path; could that be causing problems?

  druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
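If you want to rule that out, you could point it at an absolute location in whichever runtime.properties currently sets it (the middleManager config in the quickstart, if I remember correctly). The directory below is just an example; any path the peon process can write to should do:

  # example only: any writable absolute path works
  druid.indexer.task.hadoopWorkingPath=/tmp/druid/hadoop-tmp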



Thanks,
Jon


