[Cascading 3.0.0-wip-75] Attempting to give Tez a spin


Cyrille Chépélov

Mar 6, 2015, 7:44:08 AM
to cascadi...@googlegroups.com
Hello,

After running my Scalding workload more or less successfully under hadoop-on-YARN and hadoop2-mr1-on-YARN[*], I tried running the same workload on Tez.

I had to disable cascading.planner.plan.path because of an NPE thrown while the planner plan was being written.
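For readers who want to reproduce this step, here is a minimal sketch of switching the planner trace off. It assumes the property reaches the planner through a plain java.util.Properties object; how it was actually supplied in this job is not shown in the thread. The class name, the method name, and the /tmp/planner-trace path are hypothetical; only the property key comes from the message.

```java
import java.util.Properties;

public class PlannerTraceToggle {
    // Property key quoted in the message; everything else here is illustrative.
    static final String PLAN_PATH_KEY = "cascading.planner.plan.path";

    // Returns a copy of the given properties with planner trace output
    // enabled (path != null) or disabled (path == null).
    // The input object is left untouched.
    static Properties withPlanPath(Properties base, String path) {
        Properties copy = new Properties();
        copy.putAll(base);
        if (path == null) {
            copy.remove(PLAN_PATH_KEY);
        } else {
            copy.setProperty(PLAN_PATH_KEY, path);
        }
        return copy;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical trace directory.
        props.setProperty(PLAN_PATH_KEY, "/tmp/planner-trace");

        Properties disabled = withPlanPath(props, null);
        System.out.println(disabled.containsKey(PLAN_PATH_KEY)); // prints "false"
    }
}
```

Working on a copy keeps the original settings around, so the trace can be switched back on once the NPE is fixed.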

I now end up with this exception during the planning phase:
15/03/06 13:24:46 INFO util.Util: resolving application jar from found main method on: com.twitter.scalding.Tool$
15/03/06 13:24:46 INFO planner.Hadoop2TezPlanner: using application jar: /home/cchepelov/[REDACTED]
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: ConsecutiveGroupOrMergesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: ConsecutiveGroupOrMergesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: ConsecutiveGroupOrMergesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] executed rule registry: NoHashJoinHadoop2TezRuleRegistry, completed in: 00:00.053
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: BottomUpBoundariesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: BottomUpBoundariesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] partition rule created duplicate element graph to prior partitioner: BottomUpBoundariesNodePartitioner, replacing duplicate result
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] executed rule registry: HashJoinHadoop2TezRuleRegistry, completed in: 00:00.069
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] rule registry: NoHashJoinHadoop2TezRuleRegistry, found assembly to be malformed
15/03/06 13:24:47 INFO flow.Flow: [REDACTED] rule registry: HashJoinHadoop2TezRuleRegistry, found assembly to be malformed
Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javalangillegalargumentexception
    at com.twitter.scalding.Tool$.main(Tool.scala:132)
    at com.twitter.scalding.Tool.main(Tool.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: cascading.flow.planner.PlannerException: failed on rule: SplitJoinBoundariesNodeRePartitioner
    at cascading.flow.planner.rule.RuleExec.executeRulePhase(RuleExec.java:189)
    at cascading.flow.planner.rule.RuleExec.planPhases(RuleExec.java:122)
    at cascading.flow.planner.rule.RuleExec.exec(RuleExec.java:85)
    at cascading.flow.planner.rule.RuleSetExec.execPlannerFor(RuleSetExec.java:142)
    at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:298)
    at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:294)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: only supports single source and single sink graphs
    at cascading.flow.planner.iso.subgraph.iterator.UniquePathSubGraphIterator.advance(UniquePathSubGraphIterator.java:95)
    at cascading.flow.planner.iso.subgraph.iterator.UniquePathSubGraphIterator.hasNext(UniquePathSubGraphIterator.java:70)
    at cascading.flow.planner.iso.subgraph.iterator.IncludeRemainderSubGraphIterator.hasNext(IncludeRemainderSubGraphIterator.java:71)
    at cascading.flow.planner.iso.subgraph.partitioner.ExpressionGraphPartitioner.partition(ExpressionGraphPartitioner.java:83)
    at cascading.flow.planner.rule.RulePartitioner.partition(RulePartitioner.java:91)
    at cascading.flow.planner.rule.RuleExec.handleCurrentPartitioning(RuleExec.java:254)
    at cascading.flow.planner.rule.RuleExec.performPartition(RuleExec.java:230)
    at cascading.flow.planner.rule.RuleExec.executeRulePhase(RuleExec.java:179)
    ... 11 more
The "util.TezUtil: adding to cluster side classpath" message is never printed.

From what I can tell, the structure of my job is not yet supported by cascading-tez, perhaps because no *TezRuleRegistry is able to recognise a specific pattern in it?
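To make the condition named in the innermost exception ("only supports single source and single sink graphs") concrete, here is a minimal sketch of such a check on a plain adjacency map. It is not Cascading's implementation and does not use its graph classes; the class name, method names, and node names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingleSourceSinkCheck {
    // All nodes mentioned either as an edge origin or as an edge target.
    static Set<String> allNodes(Map<String, List<String>> graph) {
        Set<String> nodes = new TreeSet<>(graph.keySet());
        for (List<String> targets : graph.values()) {
            nodes.addAll(targets);
        }
        return nodes;
    }

    // Sources are nodes with no incoming edge.
    static Set<String> sources(Map<String, List<String>> graph) {
        Set<String> result = allNodes(graph);
        for (List<String> targets : graph.values()) {
            result.removeAll(targets);
        }
        return result;
    }

    // Sinks are nodes with no outgoing edge.
    static Set<String> sinks(Map<String, List<String>> graph) {
        Set<String> result = allNodes(graph);
        for (Map.Entry<String, List<String>> e : graph.entrySet()) {
            if (!e.getValue().isEmpty()) {
                result.remove(e.getKey());
            }
        }
        return result;
    }

    static boolean isSingleSourceSingleSink(Map<String, List<String>> graph) {
        return sources(graph).size() == 1 && sinks(graph).size() == 1;
    }

    public static void main(String[] args) {
        // One tap feeding one sink through a grouping step.
        Map<String, List<String>> linear = new HashMap<>();
        linear.put("tap", Arrays.asList("group"));
        linear.put("group", Arrays.asList("sink"));

        // Two taps merged by a join: two sources, one sink.
        Map<String, List<String>> joined = new HashMap<>();
        joined.put("tapA", Arrays.asList("join"));
        joined.put("tapB", Arrays.asList("join"));
        joined.put("join", Arrays.asList("sink"));

        System.out.println(isSingleSourceSingleSink(linear)); // prints "true"
        System.out.println(isSingleSourceSingleSink(joined)); // prints "false"
    }
}
```

A sub-graph shaped like the second case would fail a check of this kind, which is one plausible reading of why the partitioner rejects the assembly here.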

Chris, I'll now regenerate the --local, --hdfs and --hadoop2-mr1 planner outputs and send them to you, together with the complete output from the Tez attempt. Please let me know if you'd like more details, or if we can narrow this down to a test case.

    -- Cyrille


[*] For lack of time, I have not yet run the whole 15+ hour job or done a detailed accuracy check, but the behaviour over the first hour matches hadoop1.
I am using Scalding 0.13.1 patched with https://github.com/twitter/scalding/pull/1220 and am therefore running with --hadoop2-mr1 and --hadoop2-tez (respectively) instead of --hdfs.

Chris K Wensel

Mar 6, 2015, 12:33:52 PM
to cascadi...@googlegroups.com
Can you send me the NPE stack trace? That obviously shouldn't be happening. If I can fix it quickly, the Tez planner trace output will help immensely.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/54F9A10D.5080902%40transparencyrights.com.
For more options, visit https://groups.google.com/d/optout.


Cyrille Chépélov

Mar 6, 2015, 5:43:04 PM
to cascadi...@googlegroups.com
Here's the NPE. I'm not sure whether the job mentioned is the one causing the crash or a neighbouring one. /tmp/plan-hadoop2-tez.lst isn't empty (I'll send it your way off-list).

15/03/06 23:31:03 INFO flow.Flow: [c.t.s.o.stats.Statisti...] executed rule registry: HashJoinHadoop2TezRuleRegistry, completed in: 00:00.054
15/03/06 23:31:03 INFO util.TraceWriter: writing trace element plan: /tmp/plan-hadoop2-tez.lst/c.t.s.o.stats.StatisticsBeforeAndAfterJob/HashJoinHadoop2TezRuleRegistry/completed-flow-element-graph.dot

Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.

    at com.twitter.scalding.Tool$.main(Tool.scala:132)
    at com.twitter.scalding.Tool.main(Tool.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: cascading.flow.planner.PlannerException: [c.t.s.o.stats.Statisti...] could not build flow from assembly: [null]
    at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:627)
    at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:196)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
    at com.twitter.scalding.ExecutionContext$class.buildFlow(ExecutionContext.scala:53)
    at com.twitter.scalding.ExecutionContext$$anon$1.buildFlow(ExecutionContext.scala:100)
    at com.twitter.scalding.Job$$anonfun$buildFlow$1.apply(Job.scala:230)
    at com.twitter.scalding.Job$$anonfun$buildFlow$1.apply(Job.scala:230)
    at scala.util.Success.flatMap(Try.scala:230)
    at com.twitter.scalding.Job.buildFlow(Job.scala:230)
    at com.twitter.scalding.CascadeJob$$anonfun$1.apply(CascadeJob.scala:11)
    at com.twitter.scalding.CascadeJob$$anonfun$1.apply(CascadeJob.scala:11)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
    at scala.collection.immutable.List.map(List.scala:285)
    at com.twitter.scalding.CascadeJob.run(CascadeJob.scala:11)
    at com.twitter.scalding.Tool.start$1(Tool.scala:104)
    at com.twitter.scalding.Tool.run(Tool.scala:120)
    at com.twitter.scalding.Tool.run(Tool.scala:68)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.twitter.scalding.Tool$.main(Tool.scala:128)
    ... 7 more
Caused by: java.lang.NullPointerException
    at cascading.flow.planner.rule.util.TraceWriter.writeTracePlan(TraceWriter.java:258)
    at cascading.flow.planner.rule.RuleSetExec.execPlannerFor(RuleSetExec.java:146)

    at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:298)
    at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:294)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/03/06 23:31:03 INFO flow.Flow: [c.t.s.o.stats.Statisti...] executed rule registry: NoHashJoinHadoop2TezRuleRegistry, completed in: 00:00.206
15/03/06 23:31:03 INFO util.TraceWriter: writing trace element plan: /tmp/plan-hadoop2-tez.lst/c.t.s.o.stats.StatisticsBeforeAndAfterJob/NoHashJoinHadoop2TezRuleRegistry/completed-flow-element-graph.dot

    -- Cyrille



--

Cyrille CHÉPÉLOV
Chief Innovation Officer

Transparency Rights Management
15 rue Jean-Baptiste Berlier - Hall B, 75013 Paris
T : +33 1 84 16 52 74 / F : +33 1 84 17 83 34

Chris K Wensel

Mar 6, 2015, 6:24:31 PM
to cascadi...@googlegroups.com
I have some logging cleanup I'm about to check in; I'll include the fix for the NPE so we can get a step closer.

ckw
