gremlin-java repeated query profiling

43 views
Skip to first unread message

HadoopMarc

unread,
Apr 19, 2018, 5:49:54 AM4/19/18
to Gremlin-users
Probably a basic, but not often asked, question.

When running a profile on the most basic query g.V(x).profile().next() on the Tinkergraph modern graph, the output for repeated queries looks as follows:

Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1           1           9.281   100.00
                                           
>TOTAL                     -           -           9.281        -
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1           1           0.095   100.00
                                           
>TOTAL                     -           -           0.095        -
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1           1           0.077   100.00
                                           
>TOTAL                     -           -           0.077        -
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[2])                                            1           1           0.050   100.00
                                           
>TOTAL                     -           -           0.050        -
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[2])                                            1           1           0.050   100.00
                                           
>TOTAL                     -           -           0.050        -
Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[2])                                            1           1           0.097   100.00
                                           
>TOTAL                     -           -           0.097        -

This is for gremlin-java, but results are similar in the gremlin console. Also, preceding these queries with a different query does not change the behavior (so loading the graph is not the issue). Apparenly, the runtime binary knows to use the same object for these six queries and creating this object needs some time each time you use a new query.

I guess the initialization time for a new query derives from the query plan calculated by gremlin (applyStrategies()). The questions:
  1. Does gremlin cache query objects in some way?  Or is this a java bytecode compiler optimization?
  2. What should the developer do to be sure that query objects are reused (apart from profiling ...)?
Superficially, this question seems related to script parameterization for gremlin server but in this case g.V(1) and g.V(2) are said to considered as different queries for which the compiled queries are cached separately).

For completeness, the relevant part of the code run in gremlin-java:

        System.out.println(g.V().count().profile().next());
       
System.out.println(g.V(1).profile().next());
       
System.out.println(g.V(1).profile().next());
       
System.out.println(g.V(1).profile().next());
       
System.out.println(g.V(2).profile().next());
       
System.out.println(g.V(2).profile().next());
       
System.out.println(g.V(2).profile().next());
       
System.out.println(g.V(3).profile().next());
       
System.out.println(g.V(3).profile().next());
       
System.out.println(g.V(3).profile().next());

Cheers,        Marc


Robert Dale

unread,
Apr 19, 2018, 7:17:00 AM4/19/18
to gremli...@googlegroups.com
There's definitely some classloading. You can see it with: export JAVA_OPTIONS='-XX:+TraceClassLoading '


Robert Dale

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/58d9b3bb-9cf0-4bfc-9e3d-fab9d807c6ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephen Mallette

unread,
Apr 19, 2018, 7:52:24 AM4/19/18
to Gremlin-users
Just a word on caching, be careful not to conflate an embedded traversal versus one executed remotely. They have fairly different behaviors. For embedded and by that i just mean:

graph = TinkerGraph.open()
g = graph.traversal()
g.V(1).next()

There is Gremlin bytecode construction, but technically it goes unused for purpose of query execution. Traversals can be used one time and once iterated they are done. If you need to execute the exact same traversal again, it is 3x-4x faster to clone() the traversal than it is to reconstruct it from scratch. 

For remote traversals and by that i mean those executed by GLVs and those in java configured by:

graph = EmptyGraph.instance()
g = graph.traversal().withRemote(...)
g.V(1).next()

The same bytecode construction that happens for embedded traversals happens here, but in this case it will actually be used in execution when it is submitted to the server. There is no caching for bytecode so traversals reconstituted on the server from bytecode must be generated each time (for clarity, that caching that Robert linked to is to help in shortening reflective method lookups when reconstituting a traversal from bytecode). In other words, the server side system does not have the same behavior as scripts in the context of caching - it's not as though there is some bytecode to traversal cache.  Anyway, I don't think you were interested in the remoting path given the nature of your question so I'll just leave it at that. 

I think the difference you're seeing there is pretty much as Robert said - it's a bunch of classloading stuff going on in that first traversal.


HadoopMarc

unread,
Apr 19, 2018, 8:28:26 AM4/19/18
to Gremlin-users
Hi Robert, Stephen,

Thanks for the illuminating anwers!  This sufficiently explains TinkerGraph behaviour for me (as asked). Maybe I'll be back on the JanusGraph list because I observe a bit different behaviour for JG.

Cheers,    Marc  



Op donderdag 19 april 2018 13:52:24 UTC+2 schreef Stephen Mallette:

Robert Dale

To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages