Gremlin-Server bulk addV VS Groovy bulk addV speed discrepancy

Carlos

Jun 22, 2017, 6:15:08 PM
to Gremlin-users
I'm running some benchmarks and I've noticed a huge speed discrepancy between doing thousands of vertex adds through Gremlin Server and through a Groovy script.
The Python script uses WebSockets and completes in about 250 seconds. The Groovy script completes in about 75 seconds.

I'm using the master branch build of JanusGraph (as of today, 6/22) with its InMemory backend. The gremlin-server.yaml hasn't been edited other than to point to the configuration file for the InMemory backend.

The test is being run on an Ubuntu 14.04 machine with 24GB of RAM and 4 CPUs. Can anyone explain why I'm seeing this discrepancy, and is there anything I can do to get Gremlin Server to perform closer to the Groovy script?

Attached are the scripts I've used to test with. 
bulk_python.py
bulk_groovy.groovy

Stephen Mallette

Jun 22, 2017, 7:24:04 PM
to Gremlin-users
I doubt you'll get complete parity here. Gremlin Server isn't really a bulk loading tool; look to OLAP and the specific tooling of graph providers for that. That said, you can probably get faster by not returning a result in add_node_message; it doesn't look like you use the result in either script. Change it to:

g.addV(__newVLabel).property(\\"$PROPERTY\\",___$PROPERTY).iterate()

Also, since you are using scripts, you could also consider issuing the entire script to the server. Send a binding that contains a list of Maps containing the vertex data to load. If that binding is called "data" you could then send a script like:

data.each {
  g.addV(it.label).property('value',it.value).iterate()
}

Then you kill the use of a session and use the size of "data" to control the commit batch size. That might speed things up too.
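
For illustration, here's a minimal sketch of what that batched submit could look like from the client side with the TinkerPop driver in Groovy; the host, port, batch size, and the 'person'/'value' names are just placeholders, not taken from the attached scripts:

import org.apache.tinkerpop.gremlin.driver.Cluster

def cluster = Cluster.build('localhost').port(8182).create()
def client = cluster.connect()   // sessionless: each request is its own transaction

// one "commit batch": a list of maps bound to the script as "data"
def batch = (1..1000).collect { [label: 'person', value: "name-${it}".toString()] }

client.submit("data.each { g.addV(it.label).property('value', it.value).iterate() }",
              [data: batch]).all().get()

cluster.close()

Because the request is sessionless, the server commits when the script finishes, so the size of "data" is effectively the commit batch size.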




Robert Dale

Jun 23, 2017, 7:07:31 AM
to gremli...@googlegroups.com
What are you intending to compare? When I first read it, it sounded like the tests compared Gremlin Server WebSocket vs. Gremlin Server script. However, after reading the scripts, they actually compare Gremlin Server WebSocket vs. Gremlin Console script. So maybe you're trying to measure the overhead of Gremlin Server? If so, this is not a good benchmark, because it is not a like-for-like comparison.

I'm assuming you're running with implicit schema enabled, because I ran your tests and got similar results. The first problem is that your test creates unique label and property keys on every call, so on every call JanusGraph has to check the schema and create new schema types. Second, the Python WebSocket script runs 10 concurrent writers. While they should end up in a single thread pool, that may cause some contention while servicing requests; IMO, concurrent writers to the same session isn't typical behavior. Next, the WebSocket script sometimes generates invalid requests, so the server has the extra overhead of handling those errors and serializing them back (think: stack traces).

That said, if the schema is created first, the WebSocket requests are sent serially, and the scripts are error-free, Gremlin Server takes only 1.5s on my machine. Gremlin Console takes 0.9s.
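
For reference, pre-creating the schema could look roughly like this, run once against the graph (e.g. from the Gremlin Console) before the benchmark; the 'person' label and 'value' key are just placeholders:

mgmt = graph.openManagement()
if (mgmt.getVertexLabel('person') == null) mgmt.makeVertexLabel('person').make()
if (mgmt.getPropertyKey('value') == null) mgmt.makePropertyKey('value').dataType(String.class).make()
mgmt.commit()

Setting schema.default=none in the JanusGraph configuration also makes implicit schema creation fail fast instead of silently creating new types on every call.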



Robert Dale

Carlos

Jun 27, 2017, 1:16:14 PM
to Gremlin-users
So, I decided to experiment with how I'm doing bindings, and I found that if I changed the binding variables to always be the same:
g.addV(__newVLabel).property(_prop_name, _prop_value).toList()
The rate at which the server processes my requests sped up significantly over WebSockets (250 seconds down to 6 seconds for 5000 adds). I intend to apply the change to the rest of my code, but are there any side effects I should be worried about?




Stephen Mallette

Jun 27, 2017, 4:51:05 PM
to Gremlin-users
That's the biggest mistake folks make when they use bindings: they don't keep the variable names the same. The script cache is keyed on the entire script text, so:

g.V(x) != g.V(y)

and the script gets recompiled for each new variable name, so performance is not as good.
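
To make that concrete, here's a rough sketch with the TinkerPop driver from Groovy (the client, the names list, and the 'person' label are assumed for illustration):

// Anti-pattern: a unique binding name per request means a unique script string
// per request, so the server compiles every single one:
//   g.addV(vLabel0).property('name', vName0).iterate()
//   g.addV(vLabel1).property('name', vName1).iterate()
//   ...

// Better: one constant script string; the compiled script is reused from the
// cache and only the bound values change:
names.each { name ->
  client.submit("g.addV(vLabel).property('name', vName).iterate()",
                [vLabel: 'person', vName: name]).all().get()
}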


Robert Dale

Jun 28, 2017, 8:32:49 AM
to gremli...@googlegroups.com
Remote bytecode has the best performance for straight Gremlin (I haven't benchmarked embedded lambdas yet). I would only use scripting where something can't be done in bytecode: arbitrary Groovy, lambdas, graph instances, graph management. And where possible, batch mutations so they land in the same transaction; that alone would be the biggest boost.
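
As a rough sketch of the bytecode route with the TinkerPop driver from Groovy (the host, port, and traversal source name 'g' are assumptions):

import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
import org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph

def cluster = Cluster.build('localhost').port(8182).create()
def g = EmptyGraph.instance().traversal().
          withRemote(DriverRemoteConnection.using(cluster, 'g'))

// each remote traversal is sent as bytecode and the server wraps it in its own
// transaction, so batching means chaining the mutations into one traversal
g.addV('person').property('name', 'marko').iterate()

cluster.close()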

Robert Dale

Robert Dale

Jun 28, 2017, 12:37:18 PM
to Gremlin-users

This script approach does perform faster than batched bytecode.

data.each {
  g.addV(it.label).property('value', it.value).iterate()
}

Carlos

Jun 28, 2017, 2:36:52 PM
to Gremlin-users
Does bytecode support the use of sessions/transactions? I would like to be able to control when I commit. Looking at the source code for the ByteCodeProcessor, it doesn't seem like it.


Robert Dale

Jun 28, 2017, 4:12:29 PM
to gremli...@googlegroups.com
No, but you can chain them together in the same request. For example: addV('person').property('name','x').addV('person').property('name','y')... At some point the cost of iterating that long chain becomes greater than sending smaller batches; I've found that to be around 500 steps for some use cases. Apparently, though, if you have a tight loop, the cost of parsing and executing a script is much less than sending small batches of long bytecode chains. I guess the script's advantage there is smaller code, more data, whereas with bytecode it's more bytecode (repetition of steps), less data. So with that in mind, I was exploring injecting a map of data, similar to the script approach, but with bytecode. The issue I have run into is how to select a single value from a map. It would look like:

g.inject(['name':'marko','age':100]).as('a').addV('person').property('name', select('a').values('name')).property('age', select('a').values('age'))

Then it becomes more like single instruction, multiple data.
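
A rough sketch of that chained batching, assuming g is a remote traversal source and batch is a list of [label: ..., value: ...] maps:

def t = g.inject(0)            // seed traverser to hang the chain on
batch.each { m ->
  t = t.addV(m.label).property('value', m.value)
}
t.iterate()                    // one request, one chained traversal, one transaction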


Robert Dale


Robert Dale

Jun 29, 2017, 8:58:25 AM
to Gremlin-users

This is exciting. Injecting the data into bytecode is 2x faster than the script approach. Bytecode wins again!

The test was 1,000,000 data points in batches of 1,000, on TinkerGraph with an index on 'name'. The first execution is against an empty database; the second was after population, to exercise both parts of the traversal.

Remote traversal (bytecode):

        g.inject(data).unfold().as('a').coalesce(__.V().has('person', 'name', select('a').select('name')), addV('person').property('name', select('a').select('name'))).iterate()

Rate: 72987 msgs/s
Rate: 89469 msgs/s

Remote script:

        client.submit("data.each { " +
                "g.inject(1).coalesce(V().has('person', 'name', it.name), addV('person').property('name', it.name)).iterate()" +
                "}", bindings).all().get()

Rate: 33710 msgs/s
Rate: 40290 msgs/s
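
For context, a rough sketch of the kind of batching harness assumed around the bytecode traversal above, run against a remote g (e.g. in the Gremlin Console, where the anonymous-traversal steps are already imported); the names list and the batch size of 1000 are placeholders:

names.collate(1000).each { chunk ->
  def data = chunk.collect { [name: it] }
  g.inject(data).unfold().as('a').
    coalesce(__.V().has('person', 'name', select('a').select('name')),
             addV('person').property('name', select('a').select('name'))).
    iterate()
}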

Robert Dale

Jun 29, 2017, 9:01:53 AM
to Gremlin-users
I should probably clarify that msgs/s is really data points per second, not traversals submitted per second.

Robert Dale

Jun 29, 2017, 12:27:22 PM
to gremli...@googlegroups.com
It just occurred to me that the script could also use the same traversal as the remote (bytecode) version, so this benchmark is more of a like-for-like comparison. The script wins on addV by about a 10% margin; I wonder if that's an advantage of compiled, cached scripts. They are roughly equivalent when a vertex already exists in the index.

Rate: 83535 msgs/s
Rate: 88331 msgs/s

Robert Dale
