Problems with TemplateTap on HDFS on Amazon EMR?


nird

Jul 12, 2012, 2:31:36 PM
to cascadi...@googlegroups.com
I have a simple TemplateTap that worked fine when I tested it locally. It just appends the value of one field from the tuple to the end of a path (in this case "good" or "bad", for data I'm flagging).
I ran my job on Amazon EMR, and it seems to have trouble finding the path created by the TemplateTap. I would expect this if I were writing out to S3, but I'm just writing locally (to HDFS).

Here is the error message:

Exception in thread "main" cascading.cascade.CascadeException: flow failed: catchUnmatched+catchUndated+page
	at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:714)
	at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: cascading.flow.FlowException: unhandled exception
	at cascading.flow.Flow.complete(Flow.java:821)
	at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)
	... 6 more
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://10.4.217.249:9000/user/hadoop/output/flavored_access_logs/good
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
	at cascading.tap.hadoop.MultiInputFormat.getSplits(MultiInputFormat.java:240)
	at cascading.tap.hadoop.MultiInputFormat.getSplits(MultiInputFormat.java:180)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1036)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1028)
	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:897)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:871)
	at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)
	at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)
	at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)
	at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)
	... 5 more

I had a look at the contents of HDFS, and by the time the job had failed that path did exist, which was odd. This is Cascading 1.2. Is this possibly a known EMR issue, like the one with S3? Or do I need to change a setting in my Hadoop site config?

Chris K Wensel

Jul 12, 2012, 3:35:15 PM
to cascadi...@googlegroups.com
Well, TemplateTap is for writing, and you are getting a failure on reading (the InvalidInputException is raised for an input path).

So I'm thinking you are trying to use the results of a TemplateTap as the source Tap of a new Flow.

And if so, both Flows are being run in a Cascade. Unfortunately you can't do that, because Cascading has no way to know that the TemplateTap is the upstream sink feeding the source Tap in a downstream Flow.

It probably only worked in testing because the jobs were run sequentially, but on the cluster they were run in parallel (because there was no way to identify the relationship between them).

The short answer is that you can't use a TemplateTap in a Cascade if another Flow depends on its output.

ckw
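
To make the ordering problem concrete, here is a toy model in plain Java. It is not Cascading's API or its actual scheduler: FlowSpec, dependencies(), the flow names flagRecords and readGood, and the paths input/access_logs and output/report are all hypothetical. The model assumes two flows are linked only when a declared sink path exactly equals a declared source path. Under that assumption, a sink declared at the parent directory (roughly how a TemplateTap would look from the outside) never matches a source that reads the "good" subdirectory, so the two flows appear independent and could be started together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CascadeOrderingSketch {

    /** Hypothetical stand-in for a Flow: a name plus the paths it declares as sources and sinks. */
    static final class FlowSpec {
        final String name;
        final List<String> sources;
        final List<String> sinks;

        FlowSpec(String name, List<String> sources, List<String> sinks) {
            this.name = name;
            this.sources = sources;
            this.sinks = sinks;
        }
    }

    /**
     * For each flow, lists the flows it has to wait for.
     * Assumption: a dependency exists only if some upstream sink path
     * is exactly equal to one of the downstream source paths.
     */
    static Map<String, List<String>> dependencies(List<FlowSpec> flows) {
        Map<String, List<String>> deps = new LinkedHashMap<String, List<String>>();
        for (FlowSpec down : flows) {
            List<String> upstream = new ArrayList<String>();
            for (FlowSpec up : flows) {
                if (up == down) {
                    continue;
                }
                for (String sink : up.sinks) {
                    if (down.sources.contains(sink)) {
                        upstream.add(up.name);
                        break;
                    }
                }
            }
            deps.put(down.name, upstream);
        }
        return deps;
    }

    public static void main(String[] args) {
        String parent = "output/flavored_access_logs";
        String good = parent + "/good";

        // Downstream flow reads the "good" subdirectory.
        FlowSpec reader = new FlowSpec("readGood",
                Arrays.asList(good), Arrays.asList("output/report"));

        // Upstream sink declared only at the parent path, as with a templated sink.
        FlowSpec templated = new FlowSpec("flagRecords",
                Arrays.asList("input/access_logs"), Arrays.asList(parent));
        System.out.println(dependencies(Arrays.asList(templated, reader)));
        // prints {flagRecords=[], readGood=[]}

        // Upstream sinks declared at the exact paths the reader consumes.
        FlowSpec explicit = new FlowSpec("flagRecords",
                Arrays.asList("input/access_logs"), Arrays.asList(good, parent + "/bad"));
        System.out.println(dependencies(Arrays.asList(explicit, reader)));
        // prints {flagRecords=[], readGood=[flagRecords]}
    }
}
```

In the first case readGood has no recorded upstream flow, which mirrors the failure in this thread: the input path does not exist yet when the reading job is submitted. One way to sidestep this, consistent with why the local test passed, would be to run the two Flows one after the other yourself (letting the first finish before starting the second) instead of handing both to a Cascade.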
