Use of graph computer

Lisa Fiedler

unread,
Jul 4, 2019, 4:41:10 AM7/4/19
to Gremlin-users
Hi,

It is possible to use Gremlin's graph computer for traversals, as explained in the TinkerPop reference documentation.

Based on that documentation I was under the impression that it should always be more performant to use a graph-computer approach for queries that concern the entire graph, such as:
g.V().groupCount().by(out().count())

However, timing it with the console's clock() utility (backed by TimeUtil), this query seems to perform far worse with a graph computer (i.e. with graph.traversal().withComputer()) than as a standard OLTP query (i.e. with graph.traversal()).
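
For reference, these are the two set-ups I am comparing, in Gremlin Console terms (assuming graph is the already-opened Graph instance; the loop count of 10 is arbitrary):

// OLTP: the traversal executes directly against the graph
oltp = graph.traversal()
// OLAP: the same traversal is compiled into a VertexProgram and run by a graph computer
olap = graph.traversal().withComputer()

clock(10) { oltp.V().groupCount().by(out().count()).next() }
clock(10) { olap.V().groupCount().by(out().count()).next() }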

Am I missing something here?

Thanks a lot!

Florian Hockmann

unread,
Jul 4, 2019, 5:30:04 AM7/4/19
to Gremlin-users
Hi Lisa,

How many vertices and edges are in your graph? It's expected that the graph computer is slower on small graphs, as it has to load your full graph into memory, spin up some Spark workers, let them run the jobs, and then aggregate the results. This overhead dominates the graph computer's runtime on very small graphs. OLTP traversals don't have that overhead, which makes them faster on small graphs, but they won't scale to big graphs.

Lisa Fiedler

unread,
Jul 4, 2019, 7:29:18 AM7/4/19
to Gremlin-users
Hi Florian,

Thanks for your comment, that explains my problem.
My working graph is rather large (billions of vertices alone); however, I was testing performance on a small subgraph.

But what precisely do you mean by "load it into memory"? It is not supposed to fit into RAM, is it?

Stephen Mallette

unread,
Jul 6, 2019, 11:45:17 AM7/6/19
to gremli...@googlegroups.com
>  But what precisely do you mean by "load it into memory"? It is not supposed to fit into RAM, is it?

In JVM terms, since you're using JanusGraph, it's basically the amount of memory you allot to the JVM (i.e. -Xmx), so, yes, RAM.
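
For example, if you run your traversals from the Gremlin Console, you can raise the heap before start-up; a sketch assuming the stock bin/gremlin.sh launcher (the 8g figure is just an example):

export JAVA_OPTIONS="-Xmx8g"
bin/gremlin.sh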

HadoopMarc

unread,
Jul 7, 2019, 3:22:55 PM7/7/19
to Gremlin-users
Hi Lisa,

If your graph does not fit in the memory/RAM of one system, using the default graph computer on this single system to divide the work over multiple workers is not going to fly. If you use SparkGraphComputer, you can assign enough total memory in the Spark cluster so that your large graph does fit into this total memory, and a groupCount can then be done in reasonable time.

Of course, to use SparkGraphComputer your graph implementation has to provide a TinkerPop-compliant Hadoop InputFormat.

Cheers,     Marc

Lisa Fiedler

unread,
Jul 8, 2019, 8:24:18 AM7/8/19
to Gremlin-users
Hey Marc,

As I am using JanusGraph with a Cassandra backend, I tried to realize this by using org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat for the gremlin.hadoop.graphReader. (The contents of my complete properties file are provided at the end of my message.)

I included the required dependencies in my pom file (like janusgraph-hadoop-2, hadoop-gremlin..) and it seems to run.
However, I am not entirely sure what is happening, since I did not actually need to set up a Hadoop cluster, which is always done in the related posts where OLAP queries are executed from the Gremlin Console.
Is this unnecessary in my case because Maven already downloads the required Hadoop jars, or am I missing something?

Here is my properties file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=localhost
janusgraphmr.ioformat.conf.storage.port=9160
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.executor.memory=4g
gremlin.spark.graphStorageLevel=MEMORY_AND_DISK
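
And for reference, I open it from the Gremlin Console roughly like this (the file name is illustrative, and this assumes the spark-gremlin plugin is active):

graph = GraphFactory.open('read-cassandra.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().groupCount().by(out().count())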

HadoopMarc

unread,
Jul 8, 2019, 9:24:32 AM7/8/19
to Gremlin-users
Hi Lisa,

First thing, least important: to run SparkGraphComputer you either have to specify the default graph computer in the properties file or select it explicitly with withComputer(SparkGraphComputer).
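
For the properties-file variant, that would be a single extra line (using TinkerPop's gremlin.hadoop.defaultGraphComputer key):

gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer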

The Spark local[4] cluster is not really going to help you if your graph is larger than RAM, as soon as the traversal requires the data to be shuffled. The data will mostly be written to disk, and each receiving shuffle task will have to read the data coming from each sending shuffle task back from disk. So, for OLAP Gremlin traversals with SparkGraphComputer you need a Spark cluster that can hold your graph in memory.
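
Just as an illustration, pointing your properties file at a standalone Spark cluster would look something like this (the host name and sizes are made up):

spark.master=spark://spark-master.example.com:7077
spark.executor.memory=16g
spark.cores.max=32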

To circumvent this memory problem, the most efficient way is to build a poor man's OLAP: you run g.V().id() once and store all vertex ids on disk. Afterwards you can run many threads, each doing a traversal on part of the ids: g.V(id_part1).groupCount(). The only problem left is that you have to do the final reduce step of the query yourself, that is, add up the groupCounts of the separate threads for your example query.
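
To make this concrete, a rough Gremlin Console sketch of the idea (written sequentially for readability; the chunk size of 10000 is arbitrary, and in practice each chunk would run in its own thread):

// step 1: collect all vertex ids once
ids = g.V().id().toList()
// step 2: run the example traversal separately on each chunk of ids
partials = ids.collate(10000).collect { part ->
    g.V(part.toArray()).groupCount().by(out().count()).next()
}
// step 3: the final reduce step, done by hand: sum the partial maps
total = [:]
partials.each { m -> m.each { k, v -> total[k] = (total[k] ?: 0L) + v } }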

HTH,   Marc

Lisa Fiedler

unread,
Jul 8, 2019, 9:51:38 AM7/8/19
to Gremlin-users
Hey Marc,

Yes, I did that (withComputer(SparkGraphComputer)); sorry, I forgot to mention it.

OK, so does the graph actually always need to fit into RAM? That is, if I had an actual cluster consisting of multiple computers, would the graph still need to fit into the total RAM available to the cluster?
I was hoping there was a way to tell the graph computer to only look at as much of the graph as fits into RAM and then aggregate the partial results (basically what you suggested doing manually). So this is not possible?

Thank you for your elaborate answers!!!

HadoopMarc

unread,
Jul 8, 2019, 3:26:17 PM7/8/19
to Gremlin-users
Hi Lisa,

Yes, you need that much cluster RAM for efficient OLAP querying on graphs, unless you are willing to wait ten times longer and put ten times more CO2 into the atmosphere because of all the additional disk reading and writing. Just give it a try and see what suits you best; some things you have to experience for yourself to get the right feeling for what is possible!

Cheers,     Marc

Lisa Fiedler

unread,
Jul 9, 2019, 4:46:48 AM7/9/19
to Gremlin-users
Hey Marc,

Yes, it seems reasonable that efficient queries require the graph to fit into RAM.
The problem is just that no adequate cluster is available to me, so I was hoping to still have a way to run OLAP queries that are not possible on split-up data (such as clustering algorithms).

I was hoping this was still possible, since the TinkerPop documentation on the Spark graph computer states:
  • The graph may fit within the total RAM of the cluster (supports larger graphs). Message passing is coordinated via Spark map/reduce/join operations on in-memory and disk-cached data (average speed traversals).

(http://tinkerpop.apache.org/docs/3.3.7/reference/#_olap_hadoop_gremlin)
And I thought a configuration of gremlin.spark.graphStorageLevel=MEMORY_AND_DISK would realize this.

Am I wrong about this?

Sorry to bother you for so long.

Mike Thomsen

unread,
Jul 9, 2019, 8:13:31 AM7/9/19
to Gremlin-users
I don't know about the JanusGraph use cases here, but in general Databricks is a very good option when you need to quickly test a very large data set against Spark.

HadoopMarc

unread,
Jul 9, 2019, 3:59:57 PM7/9/19
to Gremlin-users
Hi Lisa,

The setting of graphStorageLevel seems right. Whether a VertexProgram will finish in reasonable time depends on the number of data shuffles. For the TraversalVertexProgram, each additional out() step needs a shuffle, causing all graph data to be reread from disk (so a g.V().out().out() traversal already pays for two full rereads). For the ShortestPathVertexProgram you would be in serious trouble. I do not know the required number of message-passing steps for your clustering VertexProgram.

But again, give it a try and see what is possible!

Cheers,    Marc
