Hi Philippe,
By switching to Hfs(), did you change the wordcount code itself, or only the output path on the command line to point at HDFS?
Here is what I did, but I still get an error. It looks like output/urls was never created in HDFS: the "import pages" flow fails first, so the downstream export flows can't find hdfs://0.0.0.0:8020/user/cloudera/output/words or .../urls.
[cloudera@localhost wordcount]$ hadoop jar wordcount.jar data/url+page.200.txt hdfs:///user/cloudera/output local
12/09/07 16:11:46 INFO util.HadoopUtil: resolving application jar from found main method on: wordcount.Main
12/09/07 16:11:46 INFO planner.HadoopPlanner: using application jar: /home/cloudera/Cascading-2.0-SDK-20120822/source/wordcount/wordcount.jar
12/09/07 16:11:46 INFO property.AppProps: using app.id: 37481464AE115BB68BF9D659CA662E12
12/09/07 16:11:46 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 INFO hadoop.Hfs: forcing job to local mode, via source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 INFO planner.HadoopPlanner: using application jar: /home/cloudera/Cascading-2.0-SDK-20120822/source/wordcount/wordcount.jar
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 INFO planner.HadoopPlanner: using application jar: /home/cloudera/Cascading-2.0-SDK-20120822/source/wordcount/wordcount.jar
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 WARN conf.Configuration: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
12/09/07 16:11:47 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls"]"]
12/09/07 16:11:47 INFO planner.HadoopPlanner: using application jar: /home/cloudera/Cascading-2.0-SDK-20120822/source/wordcount/wordcount.jar
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 INFO hadoop.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words"]"]
12/09/07 16:11:47 INFO cascade.Cascade: [import pages+url pipe+...] starting
12/09/07 16:11:47 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:47 INFO cascade.Cascade: [import pages+url pipe+...] parallel execution is enabled: true
12/09/07 16:11:47 INFO cascade.Cascade: [import pages+url pipe+...] starting flows: 4
12/09/07 16:11:47 INFO cascade.Cascade: [import pages+url pipe+...] allocating threads: 4
12/09/07 16:11:47 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: import pages
12/09/07 16:11:48 INFO flow.Flow: [import pages] at least one sink does not exist
12/09/07 16:11:48 INFO flow.Flow: [import pages] starting
12/09/07 16:11:48 INFO flow.Flow: [import pages] source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["data/url+page.200.txt"]"]
12/09/07 16:11:48 INFO flow.Flow: [import pages] sink: Hfs["SequenceFile[['url', 'page']]"]["hdfs:/user/cloudera/output/pages"]"]
12/09/07 16:11:48 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:48 INFO flow.Flow: [import pages] parallel execution is enabled: true
12/09/07 16:11:48 INFO flow.Flow: [import pages] starting jobs: 1
12/09/07 16:11:48 INFO flow.Flow: [import pages] allocating threads: 1
12/09/07 16:11:48 INFO flow.FlowStep: [import pages] starting step: (1/1) ...ser/cloudera/output/pages
12/09/07 16:11:48 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/09/07 16:11:48 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/09/07 16:11:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/09/07 16:11:48 WARN snappy.LoadSnappy: Snappy native library is available
12/09/07 16:11:48 INFO snappy.LoadSnappy: Snappy native library loaded
12/09/07 16:11:48 INFO mapred.FileInputFormat: Total input paths to process : 1
12/09/07 16:11:48 INFO mapreduce.JobSubmitter: number of splits:2
12/09/07 16:11:48 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/09/07 16:11:48 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
12/09/07 16:11:48 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/09/07 16:11:48 WARN conf.Configuration: mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
12/09/07 16:11:48 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/09/07 16:11:48 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/09/07 16:11:48 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/09/07 16:11:48 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/09/07 16:11:48 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/09/07 16:11:48 INFO mapred.ResourceMgrDelegate: Submitted application application_1347025361263_0013 to ResourceManager at /0.0.0.0:8032
12/09/07 16:11:49 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1347025361263_0013/
12/09/07 16:11:49 INFO flow.FlowStep: [import pages] submitted hadoop job: job_1347025361263_0013
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] task completion events identify failed tasks
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] task completion events count: 7
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000000_0, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000001_0, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000001_1, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000000_1, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000001_2, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000000_2, Status : FAILED
12/09/07 16:12:31 WARN flow.FlowStep: [import pages] event = Task Id : attempt_1347025361263_0013_m_000001_3, Status : TIPFAILED
12/09/07 16:12:31 INFO flow.Flow: [import pages] stopping all jobs
12/09/07 16:12:31 INFO flow.FlowStep: [import pages] stopping: (1/1) ...ser/cloudera/output/pages
12/09/07 16:12:31 INFO mapred.ResourceMgrDelegate: Killing application application_1347025361263_0013
12/09/07 16:12:31 INFO flow.Flow: [import pages] stopped all jobs
12/09/07 16:12:31 INFO util.Hadoop18TapUtil: deleting temp path hdfs:/user/cloudera/output/pages/_temporary
12/09/07 16:12:31 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: import pages
cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
12/09/07 16:12:31 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export word
12/09/07 16:12:31 INFO flow.Flow: [export word] at least one sink does not exist
12/09/07 16:12:31 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export url
12/09/07 16:12:31 INFO flow.Flow: [export url] at least one sink does not exist
12/09/07 16:12:31 INFO flow.Flow: [export word] starting
12/09/07 16:12:31 INFO flow.Flow: [export word] source: Hfs["SequenceFile[['word', 'count']]"]["hdfs:/user/cloudera/output/words"]"]
12/09/07 16:12:31 INFO flow.Flow: [export word] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words"]"]
12/09/07 16:12:31 INFO flow.Flow: [export word] parallel execution is enabled: true
12/09/07 16:12:31 INFO flow.Flow: [export word] starting jobs: 1
12/09/07 16:12:31 INFO flow.Flow: [export word] allocating threads: 1
12/09/07 16:12:32 INFO flow.Flow: [export url] starting
12/09/07 16:12:32 INFO flow.Flow: [export url] source: Hfs["SequenceFile[['url', 'word', 'count']]"]["hdfs:/user/cloudera/output/urls"]"]
12/09/07 16:12:32 INFO flow.Flow: [export url] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls"]"]
12/09/07 16:12:32 INFO flow.Flow: [export url] parallel execution is enabled: true
12/09/07 16:12:32 INFO flow.Flow: [export url] starting jobs: 1
12/09/07 16:12:32 INFO flow.Flow: [export url] allocating threads: 1
12/09/07 16:12:32 INFO flow.FlowStep: [export url] starting step: (1/1) local/urls
12/09/07 16:12:32 INFO flow.FlowStep: [export word] starting step: (1/1) local/words
12/09/07 16:12:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1347025361263_0014
12/09/07 16:12:32 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/words
12/09/07 16:12:32 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/words
12/09/07 16:12:32 INFO flow.Flow: [export word] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/09/07 16:12:32 INFO flow.Flow: [export word] stopped all jobs
12/09/07 16:12:32 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export word
cascading.flow.FlowException: unhandled exception
at cascading.flow.BaseFlow.complete(BaseFlow.java:840)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:762)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/words
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:251)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:194)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:130)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:478)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:360)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:104)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:174)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
... 5 more
12/09/07 16:12:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1347025361263_0015
12/09/07 16:12:32 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/urls
12/09/07 16:12:32 ERROR security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/urls
12/09/07 16:12:32 INFO flow.Flow: [export url] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/09/07 16:12:32 INFO flow.Flow: [export url] stopped all jobs
12/09/07 16:12:32 INFO flow.Flow: [export url] shutting down job executor
12/09/07 16:12:32 INFO flow.Flow: [export url] shutdown complete
12/09/07 16:12:32 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export url
cascading.flow.FlowException: unhandled exception
at cascading.flow.BaseFlow.complete(BaseFlow.java:840)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:762)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://0.0.0.0:8020/user/cloudera/output/urls
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:251)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:194)
at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:130)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:478)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:360)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:104)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:174)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
... 5 more
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopping all flows
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export url
12/09/07 16:12:32 INFO flow.Flow: [export url] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [export url] stopping: (1/1) local/urls
12/09/07 16:12:32 INFO flow.Flow: [export url] stopped all jobs
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export word
12/09/07 16:12:32 INFO flow.Flow: [export word] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [export word] stopping: (1/1) local/words
12/09/07 16:12:32 INFO flow.Flow: [export word] stopped all jobs
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: url pipe+word pipe
12/09/07 16:12:32 INFO flow.Flow: [url pipe+word pipe] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [url pipe+word pipe] stopping: (2/2) ...user/cloudera/output/urls
12/09/07 16:12:32 INFO flow.FlowStep: [url pipe+word pipe] stopping: (1/2) ...ser/cloudera/output/words
12/09/07 16:12:32 INFO flow.Flow: [url pipe+word pipe] stopped all jobs
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: import pages
12/09/07 16:12:32 INFO flow.Flow: [import pages] stopping all jobs
12/09/07 16:12:32 INFO flow.FlowStep: [import pages] stopping: (1/1) ...ser/cloudera/output/pages
12/09/07 16:12:32 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (1043744321) connection to localhost.localdomain/127.0.0.1:41379 from cloudera,5,main]
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781)
12/09/07 16:12:32 INFO mapred.ResourceMgrDelegate: Killing application application_1347025361263_0013
12/09/07 16:12:32 INFO flow.Flow: [import pages] stopped all jobs
12/09/07 16:12:32 INFO cascade.Cascade: [import pages+url pipe+...] stopped all flows
Exception in thread "main" cascading.cascade.CascadeException: flow failed: import pages
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:771)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: cascading.flow.FlowException: local step failed
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:191)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
... 5 more