The TinkerPop reference docs give a nice example of how to get the Hadoop result graph as an RDD for the PeerPressureVertexProgram:
gremlin> Spark.create('local[4]')
==>org.apache.spark.SparkContext@4293e066
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> graph.configuration().setProperty('gremlin.hadoop.graphWriter', PersistedOutputRDD.class.getCanonicalName())
gremlin> graph.configuration().setProperty('gremlin.spark.persistContext',true)
gremlin> graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
==>result[hadoopgraph[persistedinputrdd->persistedoutputrdd],memory[size:1]]
gremlin> spark.ls()
==>output/clusterCount [Memory Deserialized 1x Replicated]
==>output/~g [Memory Deserialized 1x Replicated]
gremlin> spark.ls('output')
==>output/clusterCount [Memory Deserialized 1x Replicated]
==>output/~g [Memory Deserialized 1x Replicated]
gremlin> Spark.getRDD('output/~g').getClass()
==>class org.apache.spark.rdd.MapPartitionsRDD
However, when you try to do the same for the TraversalVertexProgram, spark.ls() and Spark.getRDDs() return null. Looking at the code of TraversalVertexProgram, I found the following hardcoded values, which I believe are the cause of this behavior:
@Override
public GraphComputer.ResultGraph getPreferredResultGraph() {
    return GraphComputer.ResultGraph.ORIGINAL;
}

@Override
public GraphComputer.Persist getPreferredPersist() {
    return GraphComputer.Persist.NOTHING;
}
Of course, it is easy to subclass TraversalVertexProgram and change these values, but is there any reason for this fixed behaviour?
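For illustration, the subclass I have in mind would look roughly like the sketch below (Groovy, as in the gremlin console). This is an untested sketch and assumes TraversalVertexProgram exposes a constructor accessible to subclasses, which may not hold; the class name PersistingTraversalVertexProgram is made up here.

```groovy
// Hypothetical sketch: relax the hardcoded preferences so the result
// graph can be persisted, e.g. as a PersistedOutputRDD.
class PersistingTraversalVertexProgram extends TraversalVertexProgram {
    @Override
    GraphComputer.ResultGraph getPreferredResultGraph() {
        // Ask for a new result graph instead of ResultGraph.ORIGINAL.
        return GraphComputer.ResultGraph.NEW
    }
    @Override
    GraphComputer.Persist getPreferredPersist() {
        // Keep vertices and edges instead of Persist.NOTHING.
        return GraphComputer.Persist.EDGES
    }
}
```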
Cheers, Marc