Need some help with a [no such vertex in graph] error in scalding.avro

Christopher Severs

Jun 13, 2013, 4:03:42 PM
to cascadi...@googlegroups.com
I'm using Scalding 0.8.5 (so Cascading 2.1.6 by extension I believe). I compiled a local version of cascading.avro that uses Cascading 2.1.6 as well and then compiled scalding.avro against that and Scalding 0.8.5. I have a source like this:

case class PackedAvroSource[AvroType : Manifest : AvroSchemaType : TupleConverter](paths: Seq[String])
  extends FixedPathSource(paths: _*) with PackedAvroFileScheme[AvroType] {
  val schemaType = implicitly[AvroSchemaType[AvroType]]
  override val schema = schemaType.schema
  override val converter = implicitly[TupleConverter[AvroType]]
}

If I give it a paths argument that is a Seq of length 1 it works fine. If it is a Seq of length > 1 I get an error. Here is the stack trace:
cascading.flow.planner.PlannerException: could not build flow from assembly: [no such vertex in graph]
    at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:533)
    at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:237)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:454)
    at com.twitter.scalding.Job.buildFlow(Job.scala:93)
    at com.twitter.scalding.Job.run(Job.scala:126)
    at com.twitter.scalding.Tool.start$1(Tool.scala:109)
    at com.twitter.scalding.Tool.run(Tool.scala:125)
    at com.twitter.scalding.Tool.run(Tool.scala:72)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.twitter.scalding.Tool$.main(Tool.scala:133)
    at com.twitter.scalding.Tool.main(Tool.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sbt.Run.invokeMain(Run.scala:68)
    at sbt.Run.run0(Run.scala:61)
    at sbt.Run.execute$1(Run.scala:50)
    at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:54)
    at sbt.TrapExit$.executeMain$1(TrapExit.scala:33)
    at sbt.TrapExit$$anon$1.run(TrapExit.scala:42)
Caused by: java.lang.IllegalArgumentException: no such vertex in graph
    at org.jgrapht.graph.AbstractGraph.assertVertexExist(Unknown Source)
    at org.jgrapht.graph.AbstractBaseGraph$DirectedSpecifics.getEdgeContainer(Unknown Source)
    at org.jgrapht.graph.AbstractBaseGraph$DirectedSpecifics.inDegreeOf(Unknown Source)
    at org.jgrapht.graph.AbstractBaseGraph.inDegreeOf(Unknown Source)
    at org.jgrapht.traverse.TopologicalOrderIterator.initialize(Unknown Source)
    at org.jgrapht.traverse.TopologicalOrderIterator.<init>(Unknown Source)
    at org.jgrapht.traverse.TopologicalOrderIterator.<init>(Unknown Source)
    at org.jgrapht.traverse.TopologicalOrderIterator.<init>(Unknown Source)
    at cascading.flow.planner.BaseFlowStep.getTopologicalOrderIterator(BaseFlowStep.java:546)
    at cascading.flow.planner.BaseFlowStep.initConfFromProcessConfigDef(BaseFlowStep.java:692)
    at cascading.flow.hadoop.HadoopFlowStep.initFromProcessConfigDef(HadoopFlowStep.java:363)
    at cascading.flow.hadoop.HadoopFlowStep.getInitializedConfig(HadoopFlowStep.java:105)
    at cascading.flow.hadoop.HadoopFlowStep.createFlowStepJob(HadoopFlowStep.java:201)
    at cascading.flow.hadoop.HadoopFlowStep.createFlowStepJob(HadoopFlowStep.java:69)
    at cascading.flow.planner.BaseFlowStep.getFlowStepJob(BaseFlowStep.java:680)
    at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1148)
    at cascading.flow.BaseFlow.initialize(BaseFlow.java:198)
    at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:231)
    ... 19 more

I thought it might have something to do with the equals or hashCode methods in PackedAvroSource, so I tried making them essentially identical to the ones in TypedTsv, but no luck there. My guess then is that the problem is somewhere in cascading.avro. The PackedAvroScheme there doesn't do anything special for equals or hashCode; it just falls back to the ones in Scheme.

Any ideas?

Thanks,
Chris


Chris K Wensel

Jun 13, 2013, 4:14:58 PM
to cascadi...@googlegroups.com
I may have improved the error message in 2.2.

But typically this comes from the #hashCode value changing mid-flight, which could happen if the sink/source fields update after the planner has started.
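
To make the failure mode concrete, here is a minimal sketch of how a hash-based collection loses track of an element whose hashCode changes after insertion. FakeTap and HashDriftDemo are hypothetical stand-ins invented for illustration, not Cascading classes, and the sketch assumes the planner's graph keeps its vertices in a hash-based structure, modeled here with java.util.HashSet.

```scala
import java.util.{HashSet => JHashSet}

// Hypothetical stand-in for a tap whose hashCode depends on mutable fields.
final class FakeTap(var fields: List[String]) {
  override def hashCode: Int = fields.mkString(",").hashCode
  override def equals(other: Any): Boolean = other match {
    case t: FakeTap => t.fields == fields
    case _          => false
  }
}

object HashDriftDemo {
  // Returns (found before mutation, found after mutation).
  def run(): (Boolean, Boolean) = {
    val tap = new FakeTap(List("a"))
    val vertices = new JHashSet[FakeTap]()
    vertices.add(tap)                  // stored under the hash of "a"
    val before = vertices.contains(tap)
    tap.fields = List("a", "b")        // fields update "mid-flight"
    val after = vertices.contains(tap) // lookup now uses the hash of "a,b"
    (before, after)
  }
}
```

The same object is still in the set, but a lookup by its new hash no longer finds it, which is the kind of situation that would surface as a missing vertex.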


Christopher Severs

Jun 13, 2013, 4:41:09 PM
to cascadi...@googlegroups.com
That was super quick, thanks Chris. I changed hashCode to not worry about the sink/source and to look only at the Avro schema, and it works now. I'll have to think about whether that is a good idea, but at least I know why it wasn't working.
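
For anyone hitting the same thing, a rough sketch of the idea and its trade-off follows. SchemaKeyedSource is a hypothetical stand-in, not the actual PackedAvroSource code, and it assumes the schema can be represented as an immutable string.

```scala
// Hypothetical stand-in: equality and hashCode depend only on the schema,
// so later changes to other state cannot move the object to a new hash.
final class SchemaKeyedSource(val schemaJson: String, var paths: Seq[String]) {
  override def hashCode: Int = schemaJson.hashCode
  override def equals(other: Any): Boolean = other match {
    case s: SchemaKeyedSource => s.schemaJson == schemaJson
    case _                    => false
  }
}

object SchemaKeyedDemo {
  // The hash is unchanged after the mutable part of the source is updated.
  def hashIsStable(): Boolean = {
    val src = new SchemaKeyedSource("{\"type\":\"string\"}", Seq("in/a"))
    val h = src.hashCode
    src.paths = Seq("in/a", "in/b")
    src.hashCode == h
  }

  // The cost: sources with the same schema but different paths compare equal.
  def distinctCount(): Int = {
    val a = new SchemaKeyedSource("{\"type\":\"string\"}", Seq("in/a"))
    val b = new SchemaKeyedSource("{\"type\":\"string\"}", Seq("in/b"))
    Set(a, b).size
  }
}
```

The hash stays stable, but two sources that read different paths with the same schema collapse into one element of a set, which is why this may not be a good idea in general.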