I'm seeing a similar problem: the planning phase has been running for about 10 hours now without finishing. The stack trace is very similar to the one above. I ran jstack on the hadoop process roughly 100 times and the trace is always the same (see below). I can't tell whether a single call to RankingPathElementList.isDifferent is hanging, or whether that method is simply being called a very large number of times. This is using Cascading 2.2, called directly from Clojure (not using Cascalog). I will try to come up with a Java test case that reproduces it, but in the meantime I wanted to check whether anyone has any insight.
"main" prio=10 tid=0x00000000008bd800 nid=0x6070 runnable [0x00007fa14e0d5000]
java.lang.Thread.State: RUNNABLE
at org.jgrapht.alg.RankingPathElementList.isDifferent(Unknown Source)
at org.jgrapht.alg.RankingPathElementList.isAlreadyAdded(Unknown Source)
at org.jgrapht.alg.RankingPathElementList.addPathElements(Unknown Source)
at org.jgrapht.alg.KShortestPathsIterator.tryToAddNewPaths(Unknown Source)
at org.jgrapht.alg.KShortestPathsIterator.updateOutgoingVertices(Unknown Source)
at org.jgrapht.alg.KShortestPathsIterator.next(Unknown Source)
at org.jgrapht.alg.KShortestPaths.getPaths(Unknown Source)
at cascading.flow.planner.ElementGraphs.getAllShortestPathsBetween(ElementGraphs.java:53)
at cascading.flow.planner.ElementGraph.getAllShortestPathsFrom(ElementGraph.java:382)
at cascading.flow.planner.FlowPlanner.failOnLoneGroupAssertion(FlowPlanner.java:428)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:231)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at batcheetah.derivatives.platforms$eval12261$fn__12264.invoke(platforms.clj:156)
at clojure.lang.MultiFn.invoke(MultiFn.java:231)
at batcheetah.derivatives.main$run.invoke(main.clj:48)
at batcheetah.derivatives.main$main.invoke(main.clj:110)
at clojure.lang.Var.invoke(Var.java:415)
at batcheetah.derivatives.RunDerivatives.main(RunDerivatives.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
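
One unconfirmed guess on the "called many times" side: the trace goes through KShortestPaths.getPaths, and if the flow has many consecutive splits and merges, the number of equal-length paths between two elements could grow exponentially. The sketch below is only an illustration of that growth under this assumption. It does not use Cascading or JGraphT, and the class and method names (DiamondPathCount, buildDiamondChain, countPaths) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Cascading or JGraphT code.
public class DiamondPathCount {

    // Builds a DAG made of `diamonds` consecutive diamonds.
    // Diamond i has a head node 3*i, two branch nodes 3*i+1 and 3*i+2,
    // and a tail node 3*i+3 that is also the head of the next diamond.
    public static List<List<Integer>> buildDiamondChain(int diamonds) {
        int nodeCount = 3 * diamonds + 1;
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            adjacency.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < diamonds; i++) {
            int head = 3 * i;
            int left = head + 1;
            int right = head + 2;
            int tail = head + 3;
            adjacency.get(head).add(left);
            adjacency.get(head).add(right);
            adjacency.get(left).add(tail);
            adjacency.get(right).add(tail);
        }
        return adjacency;
    }

    // Counts every distinct path from `from` to `to` by brute-force
    // enumeration. The graph is acyclic, so the recursion terminates.
    public static long countPaths(List<List<Integer>> adjacency, int from, int to) {
        if (from == to) {
            return 1;
        }
        long total = 0;
        for (int next : adjacency.get(from)) {
            total += countPaths(adjacency, next, to);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int diamonds = 1; diamonds <= 16; diamonds++) {
            List<List<Integer>> graph = buildDiamondChain(diamonds);
            long paths = countPaths(graph, 0, 3 * diamonds);
            System.out.println(diamonds + " diamonds: " + paths + " paths");
        }
    }
}
```

In this toy graph every path from the first node to the last has the same length, so they all count as shortest paths, and each extra diamond doubles the total. If the real flow graph has a similar shape, an identical jstack trace on every sample would be consistent with a very large number of short calls rather than one stuck call, though I have not verified that this is what is happening here.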