Gremlin query timing out via golang driver for gremlin


Amit Chandak

Feb 20, 2018, 4:38:21 PM
to Gremlin-users
Hi,
I have a Gremlin Server with the Neo4j plugin, and the database holds only 2 nodes. I am trying to delete all the nodes, first from the Gremlin console and then through the golang driver.

Using the Gremlin console, the query 'g.V().drop()' works fine:
gremlin> g.V()
==>v[61]
==>v[62]
gremlin> g.V().drop()
gremlin> g.V()
gremlin>

However, when I use the golang driver, the query times out:

java.util.concurrent.TimeoutException: Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 300000 ms or evaluation was otherwise cancelled directly for request [g.V().limit(100).drop()]

I don't think it's a timeout issue per se, since the query works fine from the Gremlin console.

Any pointers on how to debug this?

Thanks
Amit

Stephen Mallette

Feb 21, 2018, 6:41:16 AM
to Gremlin-users
I don't know much about the go driver (and there are several, afaik), so I'm not sure what might be happening there. Removing two vertices obviously shouldn't take long enough to trigger a timeout. Do other scripts run fine? If you just do g.V(), does that work OK with the go driver? Another idea: what happens if you force iteration inside the script and return a different value? I would try sending this:

g.V().drop().iterate();1+1

I'd also try to test all this with TinkerGraph and see what your results are....
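
A minimal sketch of how a client could apply that suggestion to any script before sending it: append `.iterate()` so the traversal runs to completion on the server, then return a simple constant instead of the traversal itself. The helper name `force_iterate` is hypothetical and not part of any driver; it only builds the script string.

```python
def force_iterate(traversal: str, marker: str = "1+1") -> str:
    """Return a script that iterates `traversal` server-side and yields `marker`.

    Hypothetical helper for illustration; it does not talk to a server.
    """
    script = traversal.strip().rstrip(";")
    # Avoid appending a second terminal step if the caller already added one.
    if not script.endswith(".iterate()"):
        script += ".iterate()"
    return f"{script};{marker}"


if __name__ == "__main__":
    print(force_iterate("g.V().drop()"))
```

Returning a constant also gives the driver a non-empty result to parse, which helps separate "the script hangs" from "the driver mishandles an empty response".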



--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/c6a8a874-5e8f-454c-b3f0-4626bb2742ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amit Chandak

Feb 21, 2018, 12:13:50 PM
to Gremlin-users
Thanks Stephen, as usual you're spot on. g.V().drop().iterate();1+1 works fine. In fact, g.V().drop().iterate() works fine too.
All other queries, like g.V(), also work fine with the golang driver. I am adding nodes/edges through it as well, and all of that works.

What could be the issue with just g.V().drop()?

Thanks again.
Amit



Amit Chandak

Feb 21, 2018, 12:20:05 PM
to Gremlin-users
Update: I just relaunched the Gremlin Server, tried g.V().drop() again, and it worked fine. So it looks like the Gremlin Server had gotten into a bad state, which is scary. I am using TinkerPop 3.3.1 with the Neo4j plugin.
I will try to reproduce this consistently. I didn't see any errors in the logs. How can I debug this kind of stuck Gremlin Server?

Amit

Stephen Mallette

Feb 22, 2018, 4:49:31 PM
to Gremlin-users
I had a hunch that the go driver isn't properly handling the empty result that returns from g.V().drop(), but that doesn't make much sense because you're getting a script timeout error which means that the problem is occurring at the time the script is executing (i.e. before the empty result is returned). Not sure what's going on there. If you can recreate the problem with the java driver I could try to look into it, but at this point I'm going to guess it is go driver related somehow......



Duc Trinh

Sep 11, 2018, 10:54:54 AM
to Gremlin-users
Hello,
I'm having a problem with this kind of timeout as well.
My query returns about 100k records, and when I debug it, it returns "SERVER TIMEOUT".
My code is in Golang, by the way.

Stephen Mallette

Sep 12, 2018, 11:26:17 AM
to Gremlin-users
You should be able to increase the timeout in your server settings: set scriptEvaluationTimeout in the server's YAML file.
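
Besides the server-wide value in the YAML file, Gremlin Server's `eval` request format also accepts a per-request `scriptEvaluationTimeout` argument in the 3.3.x line (I believe newer TinkerPop releases call it `evaluationTimeout`). A minimal sketch of such a request message, for a driver that lets you build requests by hand. The function name and the 10-minute value are assumptions for illustration, and how `requestId` is serialized depends on the serializer the driver uses.

```python
import json
import uuid


def build_eval_request(script: str, timeout_ms: int) -> dict:
    """Build a Gremlin Server 'eval' request with a per-request timeout override.

    Assumes the 3.3.x request layout; `scriptEvaluationTimeout` is in milliseconds.
    """
    return {
        "requestId": str(uuid.uuid4()),
        "op": "eval",
        "processor": "",
        "args": {
            "gremlin": script,
            "language": "gremlin-groovy",
            "scriptEvaluationTimeout": timeout_ms,
        },
    }


if __name__ == "__main__":
    # 10 minutes for this one request; the value is only an example.
    request = build_eval_request("g.V().count()", 600_000)
    print(json.dumps(request, indent=2))
```

Whether a given Go driver exposes these request arguments is something to check in that driver; if it does not, the YAML setting is the only knob.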


Olav Laudy

Sep 12, 2018, 11:41:52 AM
to gremli...@googlegroups.com
I have experienced the same with Neptune and come to the conclusion that returning 100K records might not be best practice.

I currently solve it by giving every edge a random number and then looping over that number to retrieve the partitions in series. So I return 10 sets of 10K records each.



Olav


Stephen Mallette

Sep 12, 2018, 11:52:55 AM
to Gremlin-users
Olav, could you explain your approach a bit more? Are you saying that the random number you create is a physical property on an edge?

Olav Laudy

Sep 12, 2018, 12:39:57 PM
to gremli...@googlegroups.com
I've tried two ways:

1) Using the IDs of the property:

g.V().hasId(between('a','b')).limit(1)

and then write an outer loop in the programming language of choice that visits all pairs (i.e. between (1,2), (2,3) etc).

This has the advantage that you don't need to define a property with a random number, but it is slow because it has to do the substring operation for every node/edge.

Another disadvantage is that you can only partition into 16, 16*16, etc. parts.
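
A minimal sketch of the outer loop for this first approach, assuming the element IDs are lowercase hexadecimal strings (for example UUIDs). That assumption is mine, based on the 16 / 16*16 partition counts, and the function names are hypothetical. Each range ends where the next one starts, which works because `between` includes the lower bound and excludes the upper one; the `limit(1)` from the query above is left out.

```python
from itertools import product
from typing import List, Tuple

HEX_DIGITS = "0123456789abcdef"


def id_prefix_ranges(width: int = 1) -> List[Tuple[str, str]]:
    """Return (lower, upper) string bounds covering every hex prefix of `width` digits.

    Assumes IDs are lowercase hex strings; adjust the alphabet for other ID schemes.
    """
    prefixes = ["".join(p) for p in product(HEX_DIGITS, repeat=width)]
    # The last range is closed with 'g', which sorts after every hex digit.
    uppers = prefixes[1:] + ["g"]
    return list(zip(prefixes, uppers))


def id_range_queries(width: int = 1) -> List[str]:
    """Build one hasId(between(...)) query per prefix range."""
    return [
        f"g.V().hasId(between('{lo}','{hi}'))"
        for lo, hi in id_prefix_ranges(width)
    ]


if __name__ == "__main__":
    for query in id_range_queries(1):
        print(query)
```

With `width=1` this yields 16 queries and with `width=2` it yields 256, which matches the partition counts mentioned above.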

2) Using a random number as property.

g.V().has('partition',12).limit(1), and go over all partitions using an external loop.

This has the advantage that you can flexibly determine the number of partitions you need (I have 25 and found that the fastest for my size db). It does require you to create this random number. I didn't find how to do that inside Gremlin, so I do it at graph creation time and when I create nodes/edges from other nodes/edges I always copy that property so every node has a partition.

Here's an example of the partitioning in action:

    taskList = []  # one query string per partition

    # range(26) yields the partition values 0..25
    for partition in range(26):
        taskList.append("g.V().has('object','partition',{}).as('o')  \
                          .outE('ACL_till_date').as('ACL').inV()  \
                          .map(out('observe').inE('ecl').has('relativeStrength',neq(0)).as('d','q')    \
                                             .where(outV().in('observe').as('o'))  \
                                             .dedup().by(select('d','q').by('intDate').by('quote'))  \
                               .values('relativeStrength').mean()).as('rs')  \
                          .select('ACL').property( 'strength all time', select('rs')).iterate()"  \
                       .format(partition))

The impact on the query syntax is rather small. If I ran it over the whole set, only the first line would change, to:

g.V().hasLabel('object').as('o')   


I use Python and wrote my own multi-threaded driver. I collect the queries in a task list and then empirically determine how many of those tasks can run in parallel. Sometimes queries are so heavy that I can really only run one at a time; other times I run 10 or 100 in parallel.

I use "from threading import Thread" & "from queue import Queue" rather than asyncio, because I felt I couldn't fully control that.  
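
A minimal sketch of what such a Thread/Queue task runner could look like. This is not the actual driver described above: `run_tasks` and `run_query` are hypothetical names, and `run_query` stands in for whatever function sends one query to the server. The `workers` argument is the number you would tune empirically.

```python
from queue import Queue
from threading import Thread
from typing import Callable, List


def run_tasks(tasks: List[str], run_query: Callable[[str], object], workers: int) -> List[object]:
    """Run every query in `tasks` with at most `workers` running at once.

    Results (or the exception a query raised) come back in task order.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")

    todo: Queue = Queue()
    for index, query in enumerate(tasks):
        todo.put((index, query))
    for _ in range(workers):
        todo.put(None)  # one stop marker per worker, queued after the real tasks

    results: List[object] = [None] * len(tasks)

    def worker() -> None:
        while True:
            item = todo.get()
            if item is None:  # stop marker: this worker is done
                return
            index, query = item
            try:
                results[index] = run_query(query)
            except Exception as exc:  # record the failure, keep the worker alive
                results[index] = exc

    threads = [Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in for a real network call: just report the query length.
    print(run_tasks(["g.V().count()", "g.E().count()"], len, workers=2))
```

Setting `workers=1` reproduces the "one at a time" case for heavy queries, and larger values cover the lighter ones.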







Stephen Mallette

Sep 12, 2018, 1:07:49 PM
to Gremlin-users
Interesting - I guess you could get a random number this way in Gremlin:

gremlin> g.withSack{rand.nextInt(128)}.addV('person').property('partition', sack())
==>v[0]
gremlin> g.withSack{rand.nextInt(128)}.addV('person').property('partition', sack())
==>v[2]
gremlin> g.V().hasLabel('person').valueMap()
==>[partition:[0]]
==>[partition:[33]]

Not sure if using sack() is much better than just generating the random number client side. Huh, maybe this works too and doesn't require the lambda:

gremlin> g.withSideEffect('r',(0..<128)).addV('person').property('partition', select('r').unfold().sample(1))
==>v[0]
gremlin> g.withSideEffect('r',(0..<128)).addV('person').property('partition', select('r').unfold().sample(1))
==>v[2]
gremlin> g.V().hasLabel('person').valueMap()
==>[partition:[18]]
==>[partition:[2]]

I swear sample() is sometimes weird though and doesn't give a good normal distribution. Or this, which is probably the least performant:

gremlin> g.withSideEffect('r',(0..<128)).addV('person').property('partition', select('r').unfold().order().by(shuffle).limit(1))
==>v[0]
gremlin> g.withSideEffect('r',(0..<128)).addV('person').property('partition', select('r').unfold().order().by(shuffle).limit(1))
==>v[2]
gremlin> g.V().hasLabel('person').valueMap()
==>[partition:[43]]
==>[partition:[60]]

Maybe kuppitz has something better... anyway, just thought I'd play with a few ideas for "random" integers in Gremlin.
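
For comparison, a minimal sketch of the client-side alternative mentioned above: draw the partition number in the client and pass it to the server as a script binding. The function name and the binding names `vlabel` and `p` are hypothetical; how bindings are submitted depends on the driver.

```python
import random
from typing import Dict, Tuple


def add_vertex_request(label: str, partitions: int, rng: random.Random) -> Tuple[str, Dict[str, object]]:
    """Return (script, bindings) for adding a vertex with a client-chosen partition.

    The partition is drawn uniformly from 0..partitions-1 on the client side.
    """
    partition = rng.randrange(partitions)
    script = "g.addV(vlabel).property('partition', p)"
    bindings = {"vlabel": label, "p": partition}
    return script, bindings


if __name__ == "__main__":
    rng = random.Random()  # pass a seeded Random for reproducible runs
    script, bindings = add_vertex_request("person", 128, rng)
    print(script, bindings)
```

Keeping the script text constant and varying only the bindings also lets the server reuse its compiled script.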





Duc Trinh

Sep 12, 2018, 10:09:51 PM
to Gremlin-users
Thank you guys, it's worth a try.
Indeed, when I run a query that returns 1k records, I see that Gremlin returns an interface holding 16 lists: the first 15 lists have 64 records each and the last one has 40.
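
The batch size of 64 matches what I believe is Gremlin Server's default `resultIterationBatchSize`, so 1,000 records arrive as 15 x 64 + 40. A minimal sketch of stitching such batches back into one list on the client; the function name is hypothetical and the data is a stand-in shaped like the response described above.

```python
from itertools import chain
from typing import Iterable, List


def flatten_batches(batches: Iterable[List[object]]) -> List[object]:
    """Concatenate the per-response batches into one flat list of records."""
    return list(chain.from_iterable(batches))


if __name__ == "__main__":
    # 15 batches of 64 records plus a final batch of 40.
    batches = [list(range(64)) for _ in range(15)] + [list(range(40))]
    records = flatten_batches(batches)
    print(len(batches), len(records))
```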