Failed Hadoop indexing task


George Hant

Jun 6, 2016, 2:50:22 AM
to Druid User
Hi,

I'm trying to run a batch ingestion task and it keeps failing with this error:

2016-06-06T06:35:08,521 INFO [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T06:35:09,085 WARN [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - job_local1392479455_0001
java.lang.Exception: java.io.IOException: No such file or directory
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.8.0_91]
        at java.io.File.createTempFile(File.java:2024) ~[?:1.8.0_91]
        at java.io.File.createTempFile(File.java:2070) ~[?:1.8.0_91]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:558) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
2016-06-06T06:35:09,650 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-06-06T06:35:09,807 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 99%
2016-06-06T06:35:09,808 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local1392479455_0001 failed with state FAILED due to: NA
2016-06-06T06:35:09,912 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 38


What is causing this?

Thanks

George Hant

Jun 6, 2016, 3:15:21 AM
to Druid User
In the task log I see
  druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
but nothing is ever written to that directory. The MapReduce temp files are being written to
  /tmp/hadoop-root/mapred/
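Since that path is relative, I assume it resolves against whatever directory the Druid process was started from, so a check along these lines shows whether it even exists there (the install directory below is just a placeholder):

  cd /opt/druid-0.9.0            # placeholder: the directory the overlord/middleManager was launched from
  ls -ld var/druid/hadoop-tmp    # does the relative working path exist here at all?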



Scott Kinney

Jun 6, 2016, 2:35:13 PM
to Druid User
I just moved from 0.9.0 to 0.9.1-rc1 and now I'm getting:

2016-06-06T18:11:58,758 INFO [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T18:11:58,759 WARN [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - job_local149987833_0002
java.lang.Exception: java.io.IOException: No such file or directory
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.7.0_101]

This is the same schema and the datafile hasn't moved.
Are you using 0.9.1-rc1?

Scott Kinney

Jun 6, 2016, 3:04:10 PM
to Druid User
I'm running Druid in local "quickstart" mode. I forgot to run 'bin/init' before starting all the processes; doing that seems to have fixed my problem.
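For anyone else hitting this, the steps were roughly the following (assuming the stock quickstart layout; the install path is a placeholder):

  cd /opt/druid-0.9.1    # placeholder for wherever the quickstart tarball was unpacked
  bin/init               # creates the var/ directories the quickstart config expects (var/druid/hadoop-tmp among them, I believe)
  # then start the overlord, middleManager, etc. as described in the quickstart docs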



Scott Kinney

Jun 7, 2016, 3:11:20 PM
to Druid User
Did you ever find out what was causing this?
I'm running into this now with S3, when Druid is halfway through ingesting a gz file.
Not sure which file it's not finding.



Jonathan Wei

Jun 7, 2016, 6:25:22 PM
to druid...@googlegroups.com
Seems like you might be running into the same issue here, where the tmp directory needs to exist beforehand:


Also, it looks like your hadoopWorkingPath is a relative path; could that be causing problems?

  druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
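If you want to rule that out, you could point it at an absolute location in whichever runtime.properties currently sets it (the middleManager config in the quickstart, if I remember correctly). The directory below is just an example; any path the peon process can write to should do:

  # example only: any writable absolute path works
  druid.indexer.task.hadoopWorkingPath=/tmp/druid/hadoop-tmp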



Thanks,
Jon


