Neo4j, OLAP & GraphComputer

drVillo

Jul 12, 2016, 4:04:45 PM

to Gremlin-users

I'm very interested in the approach proposed by GraphComputer, as it fills the gap of running graph algorithms that are otherwise very expensive to run via the standard traverser. At the same time, I'm a bit confused about how someone would implement a data pipeline that starts with Neo4j as the OLTP graph and ends in a graph that supports graph computers.

At first I thought I could just run

graph.compute().program(PageRankVertexProgram.build().create()).submit().get()

on a Neo4jGraph, but it turns out that isn't how things work...

I can only see two ways around this:

1. Do some ETL to move the graph from Neo4j to either TinkerGraph or HadoopGraph, which have GraphComputer implementations.
2. Write a Neo4jGraphComputer implementation, although I'm not sure distributed graph computations make a lot of sense there.
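For the ETL option, the core reshaping step is turning Neo4j's node/relationship data into per-vertex adjacency ("star") records, i.e. each vertex together with its incident edges, which is the shape HadoopGraph's input formats read (for example, one vertex per line in the GraphSON adjacency-list format). The sketch below shows only that regrouping, in plain Java over an in-memory edge list. `StarRecords`, `Edge` and `StarVertex` are hypothetical names; the actual Neo4j read and the GraphSON/Gryo write are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StarRecords {
    // Hypothetical edge type standing in for a relationship read from Neo4j.
    public record Edge(long outId, long inId, String label) {}

    // Hypothetical "star" record: one vertex plus all of its incident edges.
    public record StarVertex(long id, List<Edge> outEdges, List<Edge> inEdges) {}

    // Groups an edge list into one StarVertex per vertex id, in the order of vertexIds.
    // Assumes every edge endpoint appears in vertexIds.
    public static List<StarVertex> toStars(List<Long> vertexIds, List<Edge> edges) {
        Map<Long, StarVertex> stars = new LinkedHashMap<>();
        for (long id : vertexIds) {
            stars.put(id, new StarVertex(id, new ArrayList<>(), new ArrayList<>()));
        }
        for (Edge e : edges) {
            // Each edge is attached twice: as an out-edge of its tail and an in-edge of its head.
            stars.get(e.outId()).outEdges().add(e);
            stars.get(e.inId()).inEdges().add(e);
        }
        return new ArrayList<>(stars.values());
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge(1, 2, "knows"),
                new Edge(1, 3, "created"),
                new Edge(2, 3, "created"));
        for (StarVertex v : toStars(List.of(1L, 2L, 3L), edges)) {
            // prints "1 out=2 in=0", then "2 out=1 in=1", then "3 out=0 in=2"
            System.out.println(v.id() + " out=" + v.outEdges().size() + " in=" + v.inEdges().size());
        }
    }
}
```

Duplicating each edge into both endpoint records costs space, but it lets a worker process any vertex from its own record alone, without remote lookups.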

Another approach I had been looking at a while back was to ETL the Neo4j graph to Spark's GraphX and implement traversals there. This would effectively bypass TinkerPop; I'd be interested in hearing from people who have evaluated this option in comparison to the above.

Hope this makes sense somehow, thanks!

F

Marko Rodriguez

Jul 12, 2016, 11:55:06 PM

to gremli...@googlegroups.com

Hello,

> I'm very interested in the approach proposed by GraphComputer, as it fills the gap of running graph algorithms that are otherwise very expensive to run via the standard traverser. At the same time, I'm a bit confused about how someone would implement a data pipeline that starts with Neo4j as the OLTP graph and ends in a graph that supports graph computers.

Apache TinkerPop’s Neo4jGraph implementation does not have an OLAP GraphComputer. Someone can build one. Even easier, someone can create an InputRDD (Spark) or InputFormat (Giraph/Spark) that reads in Neo4j data and feeds it to SparkGraphComputer or GiraphGraphComputer. You can read more here:

http://tinkerpop.apache.org/providers.html
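The crux of such an InputRDD/InputFormat is handing out the source graph as disjoint partitions that workers can scan independently. In plain Java that capability is what `java.util.Spliterator.trySplit()` expresses. The sketch below assumes the node ids are already in an `ArrayList` (something Neo4j's `Iterator<Node>` does not give you) and splits them into partitions; `PartitionedScan`, `partition` and `drain` are hypothetical names, not TinkerPop or Neo4j API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Spliterator;

public class PartitionedScan {
    // Splits `source` into up to `target` disjoint parts by repeatedly halving the largest one.
    public static List<Spliterator<Long>> partition(Spliterator<Long> source, int target) {
        List<Spliterator<Long>> parts = new ArrayList<>();
        parts.add(source);
        while (parts.size() < target) {
            Spliterator<Long> largest = parts.stream()
                    .max(Comparator.comparingLong(Spliterator::estimateSize))
                    .orElseThrow();
            Spliterator<Long> prefix = largest.trySplit(); // null if it cannot split further
            if (prefix == null) {
                break;
            }
            parts.add(prefix);
        }
        return parts;
    }

    // Reads one partition to completion, as an independent worker (e.g. a Spark task) would.
    public static List<Long> drain(Spliterator<Long> part) {
        List<Long> out = new ArrayList<>();
        part.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Long> nodeIds = new ArrayList<>();
        for (long id = 0; id < 8; id++) {
            nodeIds.add(id);
        }
        // Each printed list is one partition; together they cover all ids exactly once.
        for (Spliterator<Long> part : partition(nodeIds.spliterator(), 4)) {
            System.out.println(drain(part));
        }
    }
}
```

Wrapping an `Iterator` with `Spliterators.spliteratorUnknownSize` does technically allow `trySplit`, but only by pulling batches off the one shared iterator into arrays, so the underlying read from the store remains sequential.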

I’m not too certain how to do parallel sequential reads from Neo4j, as its Java API only offers Iterator<Node> as the “stream the graph” method. I haven’t seen anything like List<Iterator<Node>> (partitions) or Spliterator<Node>, etc. If such things exist, then yes, InputRDD/InputFormat would immediately give Neo4jGraph OLAP capabilities, where Gremlin would be able to execute OLTP or OLAP against Neo4j by simply doing either:

g = graph.traversal()

or

g = graph.traversal().withComputer()

…and your snippet here:

graph.compute().program(PageRankVertexProgram.build().create()).submit().get()

…would be written as:

g.V().pageRank()
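For intuition about what `pageRank()` (or `PageRankVertexProgram`) does on a GraphComputer: in each superstep every vertex sends its current rank, divided by its out-degree, along its out-edges, and then sets its new rank from the sum it receives. The plain-Java sketch below runs that message-passing loop on a tiny in-memory graph. It is a toy illustration with an assumed damping factor of 0.85 and a fixed iteration count, not TinkerPop's implementation, which differs in details such as dangling-vertex handling and termination.

```java
import java.util.Arrays;

public class VertexCentricPageRank {
    // Runs `iterations` supersteps of PageRank over an adjacency list (out-edges per vertex).
    // Assumes every vertex has at least one out-edge (no dangling-vertex handling).
    public static double[] pageRank(int[][] outEdges, double damping, int iterations) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int step = 0; step < iterations; step++) {
            double[] incoming = new double[n]; // messages received in this superstep
            for (int v = 0; v < n; v++) {
                double share = rank[v] / outEdges[v].length;
                for (int w : outEdges[v]) {
                    incoming[w] += share; // "send" an equal rank share along each out-edge
                }
            }
            for (int v = 0; v < n; v++) {
                rank[v] = (1 - damping) / n + damping * incoming[v];
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Edges: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        int[][] graph = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 30)));
    }
}
```

Because each vertex only reads its own state and the messages addressed to it, the per-vertex loop bodies can run in parallel across partitions, which is what makes this shape a good fit for an OLAP engine rather than a single OLTP traverser.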

Easy peasy lemon oh so squeezy, geezy fo-sheezy my sleazy eazy,

Marko.

http://markorodriguez.com

drVillo

Jul 29, 2016, 9:57:51 AM

to Gremlin-users

Thanks Marko, makes sense!

F
