why "g.V(99080632428550).repeat(out()).until(loops().is(eq(9))).limit(10)" is slow?

272 views
Skip to first unread message

kaiyang zhang

unread,
Jun 13, 2019, 9:35:57 AM6/13/19
to Gremlin-users
1、The amount of data
1 billion vertices, 20 billion edges, there can be 20 out-edge for a given vertex.
2、What is the difference between these two gremlin traversal statements?
(1)g.V(99080632428550).out().out().out().out().out().out().out().out().out().limit(10)  //Return results within 10 seconds
(2)g.V(99080632428550).repeat(out()).times(9).limit(10)                                          //Return results within 10 seconds
(3)g.V(99080632428550).repeat(out()).until(loops().is(eq(9))).limit(10)         //scriptEvaluationTimeout threshold of 180 seconds
3、operation result
gremlin> g.V(99080632428550).out().out().out().out().out().out().out().out().out().limit(10)
==>v[8830644770920764558]
==>v[5479581532318355670]
==>v[4016380489293982174]
==>v[2248543243211431414]
==>v[3949609383134423542]
==>v[674617584523921070]
==>v[3976404157965374286]
==>v[1457737582673669038]
==>v[6231037793945643166]
==>v[7276954905296467190]
gremlin> g.V(99080632428550).repeat(out()).times(9).limit(10)
==>v[8830644770920764558]
==>v[5479581532318355670]
==>v[4016380489293982174]
==>v[2248543243211431414]
==>v[3949609383134423542]
==>v[674617584523921070]
==>v[3976404157965374286]
==>v[1457737582673669038]
==>v[6231037793945643166]
==>v[7276954905296467190]
gremlin>  g.V(99080632428550).repeat(out()).until(loops().is(eq(9))).limit(10)
Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 180000 ms or evaluation was otherwise cancelled directly for request [ g.V(99080632428550).repeat(out()).until(loops().is(eq(9))).limit(10)]
Type ':help' or ':h' for help.
Display stack trace? [yN]

4、why "g.V(99080632428550).repeat(out()).until(loops().is(eq(9))).limit(10)" is slow?

Fabio Lorenzi

unread,
Jun 13, 2019, 11:04:40 AM6/13/19
to Gremlin-users
That is because of how it is expanded to bytecode.
This is my understanding so I would wait for someone whose more expert than me, but looking at the code and the docs I would say the following

using Tinkergraph and the grateful dead graph while profiling you would see that the times are significantly different on small graphs as well:

g.V(1L).repeat(out()).times(9).limit(10).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1           1           0.109     0.52
VertexStep(OUT,vertex)                                                 7           7           0.018     0.09
NoOpBarrierStep(2500)                                                  7           7           0.014     0.07
VertexStep(OUT,vertex)                                               190         190           0.052     0.25
NoOpBarrierStep(2500)                                                190         131           0.091     0.43
VertexStep(OUT,vertex)                                              7846        4610           1.130     5.36
NoOpBarrierStep(2500)                                               7846         357           1.064     5.05
VertexStep(OUT,vertex)                                            344093        7460           1.752     8.32
NoOpBarrierStep(2500)                                             344093         441           1.596     7.58
VertexStep(OUT,vertex)                                          15255311        7539           1.844     8.75
NoOpBarrierStep(2500)                                           15255311         450           1.623     7.70
VertexStep(OUT,vertex)                                         677077425        7547           2.177    10.33
NoOpBarrierStep(2500)                                          677077425         452           1.850     8.78
VertexStep(OUT,vertex)                                       30088639040        7551           2.109    10.01
NoOpBarrierStep(2500)                                        30088639040         452           1.983     9.41
VertexStep(OUT,vertex)                                     1336992404740        7551           1.973     9.36
NoOpBarrierStep(2500)                                         3113192661           1           1.659     7.87
VertexStep(OUT,vertex)                                        6226385322           2           0.008     0.04
RangeGlobalStep(0,10)                                                 10           1           0.017     0.08
                                            >TOTAL                     -           -          21.076        -

and

gremlin> g.V(1L).repeat(out()).until(loops().is(9)).limit(10).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1])                                            1           1           0.030     0.04
RepeatStep([VertexStep(OUT,vertex), ProfileStep...            6226385322           2          71.910    99.84
  LoopsStep                                                        42457       42457          23.038
  IsStep(eq(9))                                                                               22.944
  VertexStep(OUT,vertex)                                   1374000113974       42457           9.612
  RepeatEndStep                                               6226385322           2          62.276
RangeGlobalStep(0,10)                                                 10           1           0.086     0.12
                                            >TOTAL                     -           -          72.027        -

As you can see the expansion of the times() step is simply a concatenation of the repeat() pattern for the given amount of time which (not in great detail) translates to a bunch of sequential ops whose job is to follow "pointers" to out-coming edges;
whereas the expansion of the repeat-until (that is a do-while) has the overhead of the condition checking and accessing each traverser metadata loops().
Reply all
Reply to author
Forward
0 new messages