local step failed exception on wordcount example

Chris Guthrie

Jun 29, 2012, 2:31:18 PM
to cascadi...@googlegroups.com
This started when I hit similar errors in a Cascading program I wrote myself, so I decided to try to get the wordcount example working first.

I compiled wordcount's Main.java by hand against Cascading 2.0.1 and Cloudera's CDH4, then packaged it into a jar along with the Cascading library.
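
For reference, here is roughly how I understand the example wires things up (a sketch from my reading of it, not the exact Main.java; the tab-split regex is just illustrative):

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.regex.RegexSplitter;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.SequenceFile;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tap.hadoop.Lfs;
import cascading.tuple.Fields;

public class Main
  {
  public static void main( String[] args )
    {
    String inputPath = args[ 0 ];  // e.g. data/url+page.200.txt
    String outputPath = args[ 1 ]; // e.g. output

    // Lfs reads from the local filesystem, Hfs writes to HDFS
    Tap source = new Lfs( new TextLine(), inputPath );
    Tap sink = new Hfs( new SequenceFile( new Fields( "url", "page" ) ), outputPath + "/pages" );

    // split each input line into "url" and "page" fields
    Pipe importPipe = new Each( "import pages", new Fields( "line" ),
      new RegexSplitter( new Fields( "url", "page" ), "\t" ), Fields.RESULTS );

    Flow flow = new HadoopFlowConnector( new Properties() ).connect( source, sink, importPipe );
    flow.complete();
    }
  }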

Running: $ hadoop jar CascadingJob.jar Main data/url+page.200.txt output local

It fails with the output below. The error messages are extremely unhelpful, and since I'm new to Hadoop/Cascading I can't tell whether any of the INFO messages below are relevant. The Hadoop logs show no errors.

Any thoughts would be great. Thanks in advance, and sorry if this is a total newbie question.

12/06/29 11:28:06 INFO util.HadoopUtil: resolving application jar from found main method on: Main
12/06/29 11:28:06 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:06 INFO property.AppProps: using app.id: 58992DBF194B209F568EA9053FD38B5A
12/06/29 11:28:06 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 WARN conf.Configuration: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->['url', 'word', 'count']]"]["local/urls"]"]
12/06/29 11:28:07 INFO planner.HadoopPlanner: using application jar: /home/guthriec/cascadingManual/CascadingJob.jar
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->['word', 'count']]"]["local/words"]"]
12/06/29 11:28:07 INFO util.Version: Concurrent, Inc - Cascading 2.0.1
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...] starting
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  parallel execution is enabled: true
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  starting flows: 4
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...]  allocating threads: 4
12/06/29 11:28:07 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: import pages
12/06/29 11:28:07 INFO flow.Flow: [import pages] at least one sink does not exist
12/06/29 11:28:07 INFO flow.Flow: [import pages] starting
12/06/29 11:28:07 INFO flow.Flow: [import pages]  source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/06/29 11:28:07 INFO flow.Flow: [import pages]  sink: Hfs["SequenceFile[['url', 'page']]"]["output/pages"]"]
12/06/29 11:28:07 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:07 INFO flow.Flow: [import pages]  parallel execution is enabled: true
12/06/29 11:28:07 INFO flow.Flow: [import pages]  starting jobs: 1
12/06/29 11:28:07 INFO flow.Flow: [import pages]  allocating threads: 1
12/06/29 11:28:07 INFO flow.FlowStep: [import pages] starting step: (1/1) output/pages
12/06/29 11:28:08 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:08 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/06/29 11:28:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/06/29 11:28:08 WARN snappy.LoadSnappy: Snappy native library is available
12/06/29 11:28:08 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/29 11:28:08 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/29 11:28:08 INFO mapreduce.JobSubmitter: number of splits:2
12/06/29 11:28:08 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/06/29 11:28:08 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/06/29 11:28:08 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/29 11:28:08 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/06/29 11:28:08 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/06/29 11:28:08 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0015 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:08 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0015/
12/06/29 11:28:08 INFO flow.FlowStep: [import pages] submitted hadoop job: job_1340992436467_0015
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] task completion events identify failed tasks
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] task completion events count: 7
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_0, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_0, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_1, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_1, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_2, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000000_2, Status : FAILED
12/06/29 11:28:34 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1340992436467_0015_m_000001_3, Status : TIPFAILED
12/06/29 11:28:34 INFO flow.Flow: [import pages] stopping all jobs
12/06/29 11:28:34 INFO flow.FlowStep: [import pages] stopping: (1/1) output/pages
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0015
12/06/29 11:28:34 INFO flow.Flow: [import pages] stopped all jobs
12/06/29 11:28:34 INFO util.Hadoop18TapUtil: deleting temp path output/pages/_temporary
12/06/29 11:28:34 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: import pages
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:34 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export word
12/06/29 11:28:34 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export url
12/06/29 11:28:34 INFO flow.Flow: [export word] at least one sink does not exist
12/06/29 11:28:34 INFO flow.Flow: [export url] at least one sink does not exist
12/06/29 11:28:34 INFO flow.Flow: [export word] starting
12/06/29 11:28:34 INFO flow.Flow: [export word]  source: Hfs["SequenceFile[['word', 'count']]"]["output/words"]"]
12/06/29 11:28:34 INFO flow.Flow: [export word]  sink: Lfs["TextLine[['offset', 'line']->['word', 'count']]"]["local/words"]"]
12/06/29 11:28:34 INFO flow.Flow: [export word]  parallel execution is enabled: true
12/06/29 11:28:34 INFO flow.Flow: [export word]  starting jobs: 1
12/06/29 11:28:34 INFO flow.Flow: [export word]  allocating threads: 1
12/06/29 11:28:34 INFO flow.FlowStep: [export word] starting step: (1/1) local/words
12/06/29 11:28:34 INFO flow.Flow: [export url] starting
12/06/29 11:28:34 INFO flow.Flow: [export url]  source: Hfs["SequenceFile[['url', 'word', 'count']]"]["output/urls"]"]
12/06/29 11:28:34 INFO flow.Flow: [export url]  sink: Lfs["TextLine[['offset', 'line']->['url', 'word', 'count']]"]["local/urls"]"]
12/06/29 11:28:34 INFO flow.Flow: [export url]  parallel execution is enabled: true
12/06/29 11:28:34 INFO flow.Flow: [export url]  starting jobs: 1
12/06/29 11:28:34 INFO flow.Flow: [export url]  allocating threads: 1
12/06/29 11:28:34 INFO flow.FlowStep: [export url] starting step: (1/1) local/urls
12/06/29 11:28:34 INFO mapred.FileInputFormat: Total input paths to process : 0
12/06/29 11:28:34 INFO mapred.FileInputFormat: Total input paths to process : 0
12/06/29 11:28:34 INFO mapreduce.JobSubmitter: number of splits:0
12/06/29 11:28:34 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/29 11:28:34 INFO mapreduce.JobSubmitter: number of splits:0
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0016 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:34 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0016/
12/06/29 11:28:34 INFO flow.FlowStep: [export word] submitted hadoop job: job_1340992436467_0016
12/06/29 11:28:34 INFO mapred.ResourceMgrDelegate: Submitted application application_1340992436467_0017 to ResourceManager at /0.0.0.0:8032
12/06/29 11:28:34 INFO mapreduce.Job: The url to track the job: http://guthriec-ThinkPad-T420s:8088/proxy/application_1340992436467_0017/
12/06/29 11:28:34 INFO flow.FlowStep: [export url] submitted hadoop job: job_1340992436467_0017
12/06/29 11:28:45 WARN flow.FlowStep: [export word] task completion events identify failed tasks
12/06/29 11:28:45 WARN flow.FlowStep: [export word] task completion events count: 0
12/06/29 11:28:45 INFO flow.Flow: [export word] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0016
12/06/29 11:28:45 INFO flow.Flow: [export word] stopped all jobs
12/06/29 11:28:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export word
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:45 WARN flow.FlowStep: [export url] task completion events identify failed tasks
12/06/29 11:28:45 WARN flow.FlowStep: [export url] task completion events count: 0
12/06/29 11:28:45 INFO flow.Flow: [export url] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0017
12/06/29 11:28:45 INFO flow.Flow: [export url] stopped all jobs
12/06/29 11:28:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export url
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping all flows
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export url
12/06/29 11:28:45 INFO flow.Flow: [export url] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0017
12/06/29 11:28:45 INFO flow.Flow: [export url] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export word
12/06/29 11:28:45 INFO flow.Flow: [export word] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/06/29 11:28:45 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (799490076) connection to guthriec-ThinkPad-T420s/127.0.1.1:53224 from guthriec,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
12/06/29 11:28:45 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0016
12/06/29 11:28:45 INFO flow.Flow: [export word] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: url pipe+word pipe
12/06/29 11:28:45 INFO flow.Flow: [url pipe+word pipe] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [url pipe+word pipe] stopping: (2/2) output/words
12/06/29 11:28:45 INFO flow.FlowStep: [url pipe+word pipe] stopping: (1/2) output/urls
12/06/29 11:28:45 INFO flow.Flow: [url pipe+word pipe] stopped all jobs
12/06/29 11:28:45 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: import pages
12/06/29 11:28:45 INFO flow.Flow: [import pages] stopping all jobs
12/06/29 11:28:45 INFO flow.FlowStep: [import pages] stopping: (1/1) output/pages
12/06/29 11:28:46 INFO ipc.Client: Retrying connect to server: guthriec-ThinkPad-T420s/127.0.1.1:35123. Already tried 0 time(s).
12/06/29 11:28:46 INFO mapred.ResourceMgrDelegate: Killing application application_1340992436467_0015
12/06/29 11:28:46 INFO flow.Flow: [import pages] stopped all jobs
12/06/29 11:28:46 INFO cascade.Cascade: [import pages+url pipe+...] stopped all flows
Exception in thread "main" cascading.cascade.CascadeException: flow failed: import pages
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:771)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
... 5 more

Chris K Wensel

Jun 29, 2012, 3:12:36 PM
to cascadi...@googlegroups.com

two issues here..

one, there is evidence, per earlier messages in this thread, that Cloudera's distribution isn't byte code (binary) compatible with Apache Hadoop. so you may need to recompile Cascading against CDH4, or just use stock Apache Hadoop.

two, if you use Lfs as a source, it forces the first MR job to run in hadoop local mode (so it can see the local file), but it will push the results onto hdfs and run all the other jobs on the cluster against Hfs taps.
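
one way to sidestep the local-mode hop (a sketch, assuming you copy the input onto hdfs first with something like "hadoop fs -put data/url+page.200.txt data/"):

import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

// an Hfs source resolves against fs.default.name (hdfs on a configured
// cluster), so the planner keeps every job on the cluster instead of
// forcing the first one into local mode
Tap source = new Hfs( new TextLine(), "data/url+page.200.txt" );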

so the error messages are hard to unwind, but i'm getting the impression you have a cluster configured. whether or not the above is the cause, you will need to look at the cluster logs for the actual failure. it could also just be a local misconfiguration of your cluster, or the cluster may be configured but not actually running..

12/06/29 11:28:45 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (799490076) connection to guthriec-ThinkPad-T420s/127.0.1.1:53224 from guthriec,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)

if AWS doesn't freak you out, just boot a small EMR cluster and keep it alive, ssh in, and run stuff from there. you are guaranteed a working cluster that way.

ckw




Philippe Laflamme

Jun 29, 2012, 3:59:59 PM
to cascadi...@googlegroups.com
There seems to be an issue with serialization when using Cascading 2.0 on CDH4.

I opened an issue at Cloudera; you can contribute what you know or vote for it there.

I'm not sure what's happening, but it seems like the Cascading serializers aren't registered correctly when the job moves around the cluster.
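
If you want to poke at it, here is a hedged sketch of a workaround: pin Cascading's tuple serialization into Hadoop's io.serializations list yourself before connecting the flow (cascading.tuple.hadoop.TupleSerialization is the class Cascading 2.0 normally registers on its own):

import java.util.Properties;

Properties properties = new Properties();

// explicitly register the tuple serialization alongside Hadoop's
// default Writable serialization, in case the registration is lost
// when the job is shipped around the cluster
properties.setProperty( "io.serializations",
  "cascading.tuple.hadoop.TupleSerialization,"
    + "org.apache.hadoop.io.serializer.WritableSerialization" );

// then pass these properties to the HadoopFlowConnector as usual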

Hope that helps,
Philippe