local step failed exception on wordcount example

Chris Guthrie

Jun 29, 2012, 2:31:18 PM
to cascadi...@googlegroups.com
This started when I hit similar errors in a Cascading program I wrote myself, so I decided to try to get the wordcount example working first.

I compiled wordcount's Main.java by hand against Cascading 2.0.1 and Cloudera's CDH4, then packaged it into a jar along with the Cascading library.
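
For reference, here is roughly how I understand the example wires things up (a sketch from my reading of it, not the exact Main.java; the tab-split regex is just illustrative):

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.regex.RegexSplitter;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.SequenceFile;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tap.hadoop.Lfs;
import cascading.tuple.Fields;

public class Main
  {
  public static void main( String[] args )
    {
    String inputPath = args[ 0 ];  // e.g. data/url+page.200.txt
    String outputPath = args[ 1 ]; // e.g. output

    // Lfs reads from the local filesystem, Hfs writes to HDFS
    Tap source = new Lfs( new TextLine(), inputPath );
    Tap sink = new Hfs( new SequenceFile( new Fields( "url", "page" ) ), outputPath + "/pages" );

    // split each input line into "url" and "page" fields
    Pipe importPipe = new Each( "import pages", new Fields( "line" ),
      new RegexSplitter( new Fields( "url", "page" ), "\t" ), Fields.RESULTS );

    Flow flow = new HadoopFlowConnector( new Properties() ).connect( source, sink, importPipe );
    flow.complete();
    }
  }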

Running: $ hadoop jar CascadingJob.jar Main data/url+page.200.txt output local

It fails with the output below. The error messages are extremely unhelpful, and since I'm new to Hadoop/Cascading I can't tell whether any of the INFO messages below are relevant. The Hadoop logs show no errors.

Any thoughts would be great. Thanks in advance, and sorry if this is a total newbie question.

12/06/29 11:28:06 INFO util.HadoopUtil: resolving application jar from found main method on: Main
12/06/29 11:28:06 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:06 INFO property.AppProps: using app.id: 58992DBF194B209F568EA9053FD38B5A
12/06/29 11:28:06 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 WARN conf.Configuration: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->['url', 'word', 'count']]"]["local/urls"]"]
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->['word', 'count']]"]["local/words"]"]
12/06/29 11:28:07 INFO util.Version: Concurrent, Inc - Cascading 2.0.1
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...] starting
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  parallel execution is enabled: true
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  starting flows: 4
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  allocating threads: 4
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: import pages
12/06/29 11:28:07 INFO flow.Flow: [import pages] at least one sink does not exist
12/06/29 11:28:07 INFO flow.Flow: [import pages] starting
12/06/29 11:28:07 INFO flow.Flow: [import pages]  source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/06/29 11:28:07 INFO flow.Flow: [import pages]  sink: Hfs["SequenceFile[['url', 'page']]"]["output/pages"]"]
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO flow.Flow: [import pages]  parallel execution is enabled: true
12/06/29 11:28:07 INFO flow.Flow: [import pages]  starting jobs: 1
12/06/29 11:28:07 INFO flow.Flow: [import pages]  allocating threads: 1
12/06/29 11:28:07 INFO flow.FlowStep: [import pages] starting step: (1/1) output/pages
12/06/29 11:28:08 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:08 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/06/29 11:28:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/06/29 11:28:08 WARN snappy.LoadSnappy: Snappy native library is available
12/06/29 11:28:08 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/29 11:28:08 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/29 11:28:08 INFO mapreduce.JobSubmitter: number of splits:2
12/06/29 11:28:08 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/06/29 11:28:08 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/06/29 11:28:08 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/06/29 11:28:08 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0015 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:08 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0015/
12/06/29 11:28:08 INFO flow.FlowStep: [import pages] submitted hadoop job: job_1340992436467_0015
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] task completion events identify failed tasks
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] task completion events count: 7
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_0, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_0, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_1, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_1, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_2, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_2, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_3, Status : TIPFAILED
12/06/29 11:28:34 INFO flow.Flow: [import pages] stopping all jobs
12/06/29 11:28:34 INFO flow.FlowStep: [import pages] stopping: (1/1) output/pages
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0015
12/06/29 11:28:34 INFO flow.Flow: [import pages] stopped all jobs
12/06/29 11:28:34 INFO util.Hadoop18TapUtil: deleting temp path output/pages/_temporary
12/06/29 11:28:34 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: import pages
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:34 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export word
12/06/29 11:28:34 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export url
12/06/29 11:28:34 INFO flow.Flow: [export word] at least one sink does not exist
12/06/29 11:28:34 INFO flow.Flow: [export url] at least one sink does not exist
12/06/29 11:28:34 INFO flow.Flow: [export word] starting
12/06/29 11:28:34 INFO flow.Flow: [export word]  source: Hfs["SequenceFile[['word', 'count']]"]["output/words"]"]
12/06/29 11:28:34 INFO flow.Flow: [export word]  sink: Lfs["TextLine[['offset', 'line']->['word', 'count']]"]["local/words"]"]
12/06/29 11:28:34 INFO flow.Flow: [export word]  parallel execution is enabled: true
12/06/29 11:28:34 INFO flow.Flow: [export word]  starting jobs: 1
12/06/29 11:28:34 INFO flow.Flow: [export word]  allocating threads: 1
12/06/29 11:28:34 INFO flow.FlowStep: [export word] starting step: (1/1) local/words
12/06/29 11:28:34 INFO flow.Flow: [export url] starting
12/06/29 11:28:34 INFO flow.Flow: [export url]  source: Hfs["SequenceFile[['url', 'word', 'count']]"]["output/urls"]"]
12/06/29 11:28:34 INFO flow.Flow: [export url]  sink: Lfs["TextLine[['offset', 'line']->['url', 'word', 'count']]"]["local/urls"]"]
12/06/29 11:28:34 INFO flow.Flow: [export url]  parallel execution is enabled: true
12/06/29 11:28:34 INFO flow.Flow: [export url]  starting jobs: 1
12/06/29 11:28:34 INFO flow.Flow: [export url]  allocating threads: 1
12/06/29 11:28:34 INFO flow.FlowStep: [export url] starting step: (1/1) local/urls
12/06/29 11:28:34 INFO mapred.FileInputFormat: Total input paths to process : 0
12/06/29 11:28:34 INFO mapred.FileInputFormat: Total input paths to process : 0
12/06/29 11:28:34 INFO mapreduce.JobSubmitter: number of splits:0
12/06/29 11:28:34 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:34 INFO mapreduce.JobSubmitter: number of splits:0
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0016 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:34 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0016/
12/06/29 11:28:34 INFO flow.FlowStep: [export word] submitted hadoop job: job_1340992436467_0016
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0017 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:34 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0017/
12/06/29 11:28:34 INFO flow.FlowStep: [export url] submitted hadoop job: job_1340992436467_0017
12/06/29 11:28:45 WARN flow.FlowStep: [export word] task completion events identify failed tasks
12/06/29 11:28:45 WARN flow.FlowStep: [export word] task completion events count: 0
12/06/29 11:28:45 INFO flow.Flow: [export word] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0016
12/06/29 11:28:45 INFO flow.Flow: [export word] stopped all jobs
12/06/29 11:28:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export word
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:45 WARN flow.FlowStep: [export url] task completion events identify failed tasks
12/06/29 11:28:45 WARN flow.FlowStep: [export url] task completion events count: 0
12/06/29 11:28:45 INFO flow.Flow: [export url] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0017
12/06/29 11:28:45 INFO flow.Flow: [export url] stopped all jobs
12/06/29 11:28:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export url
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping all flows
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export url
12/06/29 11:28:45 INFO flow.Flow: [export url] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0017
12/06/29 11:28:45 INFO flow.Flow: [export url] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export word
12/06/29 11:28:45 INFO flow.Flow: [export word] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/06/29 11:28:45 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (799490076) connection to guthriec-ThinkPad-T420s/127.0.1.1:53224 from guthriec,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0016
12/06/29 11:28:45 INFO flow.Flow: [export word] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: url pipe+word pipe
12/06/29 11:28:45 INFO flow.Flow: [url pipe+word pipe] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [url pipe+word pipe] stopping: (2/2) output/words
12/06/29 11:28:45 INFO flow.FlowStep: [url pipe+word pipe] stopping: (1/2) output/urls
12/06/29 11:28:45 INFO flow.Flow: [url pipe+word pipe] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: import pages
12/06/29 11:28:45 INFO flow.Flow: [import pages] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [import pages] stopping: (1/1) output/pages
12/06/29 11:28:46 INFO ipc.Client: Retrying connect to server: guthriec-ThinkPad-T420s/127.0.1.1:35123. Already tried 0 time(s).
12/06/29 11:28:46 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0015
12/06/29 11:28:46 INFO flow.Flow: [import pages] stopped all jobs
12/06/29 11:28:46 INFO cascade.Cascade: [import pages+url pipe+...] stopped all flows
Exception in thread "main" cascading.cascade.CascadeException: flow failed: import pages
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:771)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
... 5 more

Chris K Wensel

Jun 29, 2012, 3:12:36 PM
to cascadi...@googlegroups.com

two issues here..

one, there is evidence, per earlier messages in this thread, that Cloudera's distribution isn't byte code (binary) compatible with Apache Hadoop. so you may need to recompile Cascading against CDH4, or just use stock Apache Hadoop.

two, if you use Lfs as a source, it forces the first MR job to run in hadoop local mode (so it can see the local file), but it will push the results onto hdfs and run all the other jobs on the cluster against Hfs taps.
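
one way to sidestep the local-mode hop (a sketch, assuming you copy the input onto hdfs first with something like "hadoop fs -put data/url+page.200.txt data/"):

import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

// an Hfs source resolves against fs.default.name (hdfs on a configured
// cluster), so the planner keeps every job on the cluster instead of
// forcing the first one into local mode
Tap source = new Hfs( new TextLine(), "data/url+page.200.txt" );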

so the error messages are hard to unwind, but i'm getting the impression you have a cluster configured. whether or not the above is the cause, you will need to look at the cluster logs for the actual failure. it could also just be a local misconfiguration of your cluster, or the cluster may be configured but not actually running..

12/06/29 11:28:45 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (799490076) connection to guthriec-ThinkPad-T420s/127.0.1.1:53224 from guthriec,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)

if AWS doesn't freak you out, just boot a small EMR cluster and keep it alive, ssh in, and run stuff from there. you are guaranteed a working cluster that way.

ckw




Philippe Laflamme

Jun 29, 2012, 3:59:59 PM
to cascadi...@googlegroups.com
There seems to be an issue with serialization when using Cascading 2.0 on CDH4.

I opened an issue at Cloudera; you can contribute what you know or vote for it there.

I'm not sure what's happening, but it seems like the Cascading serializers aren't registered correctly when the job moves around the cluster.
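
If you want to poke at it, here is a hedged sketch of a workaround: pin Cascading's tuple serialization into Hadoop's io.serializations list yourself before connecting the flow (cascading.tuple.hadoop.TupleSerialization is the class Cascading 2.0 normally registers on its own):

import java.util.Properties;

Properties properties = new Properties();

// explicitly register the tuple serialization alongside Hadoop's
// default Writable serialization, in case the registration is lost
// when the job is shipped around the cluster
properties.setProperty( "io.serializations",
  "cascading.tuple.hadoop.TupleSerialization,"
    + "org.apache.hadoop.io.serializer.WritableSerialization" );

// then pass these properties to the HadoopFlowConnector as usual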

Hope that helps,
Philippe