HashJoin fails with PlannerException when the same input file is used multiple times

santlal gupta

Sep 6, 2016, 8:06:44 AM
to cascading-user
Hi, 

I am trying to use HashJoin, with a single input file used three times in the same flow, as in the scenario below.

Scenario : 

Input ----------> gather ----------> HashJoin ----------> output
  |                 ^                   ^
  |_________________|                   |
  |_____________________________________|

Both inputs to the HashJoin therefore have the same schema.

When I run this scenario I get the following exception:

log4j:WARN No appenders could be found for logger (cascading.util.Util).
log4j:WARN Please initialize the log4j system properly.
Exception in thread "main" cascading.flow.planner.PlannerException: no pipelines partitioned from node: MapReduceHadoopRuleRegistry
at cascading.flow.planner.FlowPlanner.verifyResultInternal(FlowPlanner.java:679)
at cascading.flow.planner.FlowPlanner.verifyResult(FlowPlanner.java:561)
at cascading.flow.planner.rule.RuleSetExec.execPlannerFor(RuleSetExec.java:163)
at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:336)
at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:328)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I am attaching sample Cascading code and an input file for this scenario.

I am using the following jar versions:
cascading-hadoop2-mr1-3.1.0.jar
cascading-local-3.1.0.jar
hadoop-mapreduce-client-common-2.6.0.jar

For 

Can someone please help me in resolving this?

Thanks
Santlal J. Gupta

LookupTest.java
lookupInput.txt

Chris K Wensel

Sep 9, 2016, 1:19:13 PM
to cascadi...@googlegroups.com
We have bug fixes across wip 3.1 and wip 3.2 that have not been released.

<dependency>
  <groupId>cascading</groupId>
  <artifactId>cascading-core</artifactId>
  <version>3.1.0-wip-+</version>
</dependency>


<dependency>
  <groupId>cascading</groupId>
  <artifactId>cascading-core</artifactId>
  <version>3.2.0-wip-+</version>
</dependency>

If you can test the latest wip releases across each of those minor releases that would be great.

If the bug persists, please open a pull request with your test against wip 3.2 on my public branch.


don’t worry about the contrib agreement for the test yet, I may end up adapting a different test.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/7b74c897-cbe6-477d-bfee-983dbd27fca2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

santlal gupta

Sep 14, 2016, 2:39:52 AM
to cascading-user
Hi, 

I have tested this against the latest wip releases of 3.1 and 3.2, and the issue persists.

So I have opened pull request https://github.com/cwensel/cascading/pull/56 against wip 3.2.

Thanks
Santlal J. Gupta

Chris K Wensel

Sep 22, 2016, 4:57:26 PM
to cascadi...@googlegroups.com
I’ve commented on the PR, but wanted to share some additional color.

what the test suite hasn’t covered very well are trivial but well connected assemblies, like reading a file, and merging it with itself to duplicate tuples.

mainly assemblies that, when reduced/planned, end up being multi-graphs, where one pipe/tap has two edges to another pipe.

the solution has always been (i think more obvious in 2.x as there was probably a reasonable error thrown) to add an Identity function after the head to prevent two edges from the same tap into a Merge/CoGroup/GroupBy/HashJoin.

in this case, that is the workaround.

leaving us with two solutions, one is to support multi-graphs directly in the internal physical assemblies, or to simply inject an IdentityFunction (it imparts zero cost).

the latter is the easiest, via the rule engine. the former may be reserved for a 3.5 release because it may introduce some edge case issues the suite isn’t catching.

regardless, i’ll explore both to see where my confidence levels are.
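
To make the multi-graph point concrete, here is a toy sketch in plain Java. It is not Cascading planner code, and `MultiGraphSketch`, `Edge`, `hasParallelEdges` and `injectIdentity` are hypothetical names made up for illustration. It assumes the flow from the original post can be read as a list of directed edges, flags any pair of nodes connected by more than one edge, and routes each duplicate edge through a fresh pass-through node, which is roughly the idea behind adding an Identity after the head.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Toy model only: a flow is a list of directed edges between named nodes.
class MultiGraphSketch {

    // Value type for one directed edge; two edges are equal when both endpoints match.
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return from.equals(other.from) && to.equals(other.to);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }

        @Override
        public String toString() {
            return from + " -> " + to;
        }
    }

    // True when some (from, to) pair occurs more than once, i.e. the flow is a multi-graph.
    static boolean hasParallelEdges(List<Edge> edges) {
        return new HashSet<>(edges).size() < edges.size();
    }

    // Keeps the first copy of each edge and routes every further copy
    // through its own pass-through node, so no two nodes share two edges.
    static List<Edge> injectIdentity(List<Edge> edges) {
        Set<Edge> seen = new HashSet<>();
        List<Edge> result = new ArrayList<>();
        int counter = 0;
        for (Edge e : edges) {
            if (seen.add(e)) {
                result.add(e);
            } else {
                String identity = "identity-" + (++counter);
                result.add(new Edge(e.from, identity));
                result.add(new Edge(identity, e.to));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One reading of the scenario: Input feeds gather twice and HashJoin once.
        List<Edge> flow = Arrays.asList(
            new Edge("Input", "gather"),
            new Edge("Input", "gather"),
            new Edge("Input", "HashJoin"),
            new Edge("gather", "HashJoin"),
            new Edge("HashJoin", "output"));

        System.out.println(hasParallelEdges(flow));   // prints true
        List<Edge> fixed = injectIdentity(flow);
        System.out.println(hasParallelEdges(fixed));  // prints false
        System.out.println(fixed);
    }
}
```

In an actual assembly, the pass-through node would presumably be something like `new Each( head, new Identity() )` on one of the duplicate branches, using `cascading.operation.Identity`; the thread itself does not show the exact code, so treat that as an assumption.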

ckw

Chris K Wensel

Oct 3, 2016, 6:22:40 PM
to cascadi...@googlegroups.com
turns out supporting multi-graphs was the better approach, and I have left the work in the 3.2 wip branch.

besides the ability to support trivial graphs (now resulting in trivial steps), I also had to make a concession for logical merges on tez (a merge that merges the same input). 

now some tez dags that were two or more nodes are a single node (with branching and merging within the node).

if things go well on the local regression tests, I’ll push a wip tonight that should publish tomorrow (tuesday)

ckw



Chris K Wensel

Oct 4, 2016, 11:26:34 AM
to cascadi...@googlegroups.com