How to use cascades with the sink of one flow as the source of another flow?

Kevin Pamplona

Dec 2, 2015, 8:14:37 PM
to cascading-user
I have two Flows that I want to cascade together via:

CascadeConnector connector = new CascadeConnector(); 
Cascade cascade = connector.connect(flowA, flowB);

flowA has an output sink tap that is a source tap for flowB. 

When I try to run it, the planner complains that the tap that is both a sink and a source doesn't exist. I suspect the planner is complaining in the context of flowB's source, since it doesn't exist yet (it will once flowA has finished). How do I get around this?

As flowA's sink, the path is defined as an S3 Hfs tap pointing to a directory. Since flowA writes multiple files into that directory, I define the same path as an S3 GlobHfs tap when declaring it as flowB's source.

Exception in thread "main" cascading.flow.planner.PlannerException: [KASKADE--Conversion_Tr...] could not build flow from assembly: [unable to find paths matching path pattern: s3n://tm-xfer/kaskade/runs/cascade/03-12-15__01-04-33/uu-codes/]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:748)
at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:208)
at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
at com.tm.ml.cascades.CentroidToUUCodeCascade.createCascadeUsing(CentroidToUUCodeCascade.java:42)
at com.tm.ml.Kaskade.doCompleteCascade(Kaskade.java:113)
at com.tm.ml.Kaskade.main(Kaskade.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: cascading.tap.TapException: unable to find paths matching path pattern: s3n://tm-xfer/kaskade/runs/cascade/03-12-15__01-04-33/uu-codes/
at cascading.tap.hadoop.GlobHfs.makeTaps(GlobHfs.java:136)
at cascading.tap.hadoop.GlobHfs.initTapsInternal(GlobHfs.java:112)
at cascading.tap.hadoop.GlobHfs.sourceConfInit(GlobHfs.java:157)
at cascading.tap.hadoop.GlobHfs.sourceConfInit(GlobHfs.java:59)
at cascading.flow.hadoop.HadoopFlowStep.initFromSources(HadoopFlowStep.java:351)
at cascading.flow.hadoop.HadoopFlowStep.createInitializedConfig(HadoopFlowStep.java:111)
at cascading.flow.hadoop.HadoopFlowStep.createInitializedConfig(HadoopFlowStep.java:75)
at cascading.flow.planner.BaseFlowStep.getCreateFlowStepJob(BaseFlowStep.java:860)
at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1289)
at cascading.flow.BaseFlow.initialize(BaseFlow.java:234)
at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:202)
... 10 more

Kevin Pamplona

Dec 2, 2015, 8:18:25 PM
to cascading-user
Also, I suspect it could be because I am using different Java objects for the source and sink taps, even though they use the same resource path.
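
To make the object-identity question concrete, here is a toy sketch in plain Java (not the Cascading API). It assumes, without confirmation from this thread, that two flows are linked by comparing tap identifier strings rather than Java object references. The class `IdentifierMatchSketch`, the method `dependsOn`, and the `example-bucket` paths are all hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class IdentifierMatchSketch {
    // Toy rule (an assumption, not Cascading's actual code): flow B depends on
    // flow A when at least one sink identifier of A equals a source identifier of B.
    static boolean dependsOn(Set<String> sourcesOfB, Set<String> sinksOfA) {
        return !Collections.disjoint(sourcesOfB, sinksOfA);
    }

    public static void main(String[] args) {
        // Two distinct objects that carry the same (hypothetical) resource path.
        String sinkId = new String("s3n://example-bucket/run/uu-codes");
        String sourceId = new String("s3n://example-bucket/run/uu-codes");

        System.out.println("same object: " + (sinkId == sourceId));     // false
        System.out.println("same identifier: " + sinkId.equals(sourceId)); // true

        Set<String> sinksOfA = new HashSet<>(Arrays.asList(sinkId));
        Set<String> sourcesOfB = new HashSet<>(Arrays.asList(sourceId));
        System.out.println("B depends on A: " + dependsOn(sourcesOfB, sinksOfA)); // true

        // A pattern-style identifier is a different string, so it would not match.
        Set<String> globSources =
                new HashSet<>(Arrays.asList("s3n://example-bucket/run/uu-codes/*"));
        System.out.println("glob depends on A: " + dependsOn(globSources, sinksOfA)); // false
    }
}
```

Under that assumption, separate Java objects would not matter as long as the identifiers are equal; a source declared with a different string (such as a pattern) would be the thing that fails to match.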

Ken Krugler

Dec 2, 2015, 10:38:06 PM
to cascadi...@googlegroups.com
Hi Kevin,

From four or five years ago, Chris Wensel said:

Sorry, GlobHfs won't really work in a Cascade. 

If the source doesn't exist because the previous flow didn't write it yet, we can't magically resolve the wildcards into paths to determine dependencies before determining execution order.

-- Ken
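
The stack trace in the original post shows the exception coming from GlobHfs.makeTaps while FlowConnector.connect is still building the flow, which fits the explanation above: the pattern is expanded before flowA has written anything. The following is a rough local-filesystem analogy in plain Java, not Cascading or S3 code. The class `GlobPlanningSketch`, the method `expand`, the `part-*` pattern, and the temp directory layout are hypothetical and only illustrate why a pattern resolves to nothing before the upstream writer has run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class GlobPlanningSketch {
    // Expands a glob inside dir into concrete paths, roughly what a glob-based
    // source has to do before it can be planned. Returns an empty list when the
    // directory does not exist yet or nothing matches.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return matches; // the upstream writer has not created the directory yet
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("cascade-sketch");
        Path sinkDir = root.resolve("uu-codes"); // hypothetical output directory

        // "Plan time": the first flow has not run, so the pattern matches nothing.
        System.out.println("before upstream runs: " + expand(sinkDir, "part-*").size()); // 0

        // "Run time": simulate the first flow writing its output files.
        Files.createDirectories(sinkDir);
        Files.createFile(sinkDir.resolve("part-00000"));
        Files.createFile(sinkDir.resolve("part-00001"));

        System.out.println("after upstream runs: " + expand(sinkDir, "part-*").size()); // 2
    }
}
```

The plain directory path, by contrast, is a fixed string that is known before anything is written, which is why it can be compared against an upstream sink ahead of time while a wildcard cannot.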

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr