How to create a TinkerGraph from Spark RDDs

Rob Keevil

Jan 21, 2017, 10:25:31 AM
to Gremlin-users
Hi,

I have an existing graph analysis program written in Spark, and I would like to load the resulting graphs into JanusGraph via a BulkLoaderVertexProgram. I'm struggling to find a nice way to do this!

As a test, I loaded the example Grateful Dead graph into JanusGraph from Spark. This went fine: I used loadGraphML to create a TinkerGraph from the test data, and then ran:
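import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

// Build the bulk loader against the target graph's configuration file
val blvp = BulkLoaderVertexProgram.build().writeGraph("conf/testConfig.properties").create(graph)
// Run the loader on Spark; compute() takes the GraphComputer class, not an instance
graph.compute(classOf[SparkGraphComputer]).program(blvp).submit().get()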
to push it to Janus.

The problem is that my Spark program's graph data is currently stored in two RDDs, one for edges and one for vertices. I think I need to build a TinkerGraph object from these RDDs first, but the only way I have found is to iterate over every vertex and add them one at a time, which is very slow.
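
For reference, a minimal sketch of that one-at-a-time approach. The helper name toTinkerGraph and the element types of the two RDDs, (id, label) for vertices and (outId, inId, label) for edges, are assumptions made for illustration and not my actual schema:

```scala
import org.apache.spark.rdd.RDD
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical helper: the RDD element types are assumed for this sketch,
// (id, label) for vertices and (outId, inId, label) for edges.
def toTinkerGraph(vertices: RDD[(Long, String)],
                  edges: RDD[(Long, Long, String)]): TinkerGraph = {
  val graph = TinkerGraph.open()

  // Pull the vertices back to the driver and add them one by one.
  vertices.toLocalIterator.foreach { case (id, label) =>
    graph.addVertex(T.id, Long.box(id), T.label, label)
  }

  // Look up both endpoints by id, then connect them.
  edges.toLocalIterator.foreach { case (outId, inId, label) =>
    val outV = graph.vertices(Long.box(outId)).next()
    val inV = graph.vertices(Long.box(inId)).next()
    outV.addEdge(label, inV)
  }

  graph
}
```

Everything is funnelled through the driver and inserted serially, so none of Spark's parallelism is used and the whole graph has to fit in the driver's memory.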

The only other way I can see is to dump the data into HDFS as CSV files and load it from there. I'd like to avoid that if possible, as I want to move to streaming input data at some point in the future.

Any help much appreciated,
Rob