AWS Neptune query limit


Olav Laudy

Jan 7, 2018, 1:48:24 PM
to Gremlin-users
Hi,


I'm trying to populate the Neptune db with my own graph (30K nodes, 100K edges).

I find the CSV bulk load a bit burdensome, so I decided to populate the graph by sending queries.

If I send a query per node via the REST interface:

"g.addV('object').property('name','aaa').next()", it takes about 0.5 seconds per node.
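For reference, a one-node-per-request call over the HTTP (REST) interface might look like the minimal Python sketch below. It assumes a Gremlin Server style HTTP endpoint that accepts a POST whose JSON body carries the script in a `gremlin` field; `ENDPOINT` is a hypothetical placeholder, `build_script_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute your own cluster endpoint.
ENDPOINT = "https://your-neptune-endpoint:8182/gremlin"


def build_script_request(endpoint: str, script: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one Gremlin script."""
    body = json.dumps({"gremlin": script}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_script_request(ENDPOINT, "g.addV('object').property('name','aaa').next()")
# Sending it would be: urllib.request.urlopen(req).read()
```

At roughly 0.5 seconds per round trip, 30,000 nodes sent this way would take about 15,000 seconds, a little over four hours, before any edges are added.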

I can combine the queries:

g.addV('object').property('name','aaa').addV('object').property('name','bbb').next()

and it takes about the same amount of time.

But I found that queries are limited to 5,000 characters. Anything over 5,000 characters results in an error.
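Working within that limit, one way to generate the combined scripts is to chain `addV()` steps and start a new script whenever the next step would push it past 5,000 characters. This is a minimal Python sketch, assuming the limit counts characters of the script text and that names only need backslash and single-quote escaping; `quote`, `add_vertex_step`, and `batch_scripts` are hypothetical helpers, not part of any library.

```python
from typing import Iterable, List

MAX_SCRIPT_CHARS = 5000  # the limit observed above; assumed to count characters


def quote(value: str) -> str:
    """Render a Python string as a single-quoted Groovy/Gremlin string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def add_vertex_step(label: str, name: str) -> str:
    """One chained step, e.g. addV('object').property('name','aaa')."""
    return "addV(" + quote(label) + ").property('name'," + quote(name) + ")"


def batch_scripts(label: str, names: Iterable[str], limit: int = MAX_SCRIPT_CHARS) -> List[str]:
    """Chain addV() steps into as few scripts as possible, each at most `limit` chars."""
    prefix, suffix = "g.", ".next()"
    scripts: List[str] = []
    steps: List[str] = []
    length = len(prefix) + len(suffix)
    for name in names:
        step = add_vertex_step(label, name)
        extra = len(step) + (1 if steps else 0)  # '.' separator between steps
        if steps and length + extra > limit:
            # Flush the current script and start a new one with this step.
            scripts.append(prefix + ".".join(steps) + suffix)
            steps, length = [], len(prefix) + len(suffix)
            extra = len(step)
        if length + extra > limit:
            raise ValueError("a single addV step exceeds the limit: " + name)
        steps.append(step)
        length += extra
    if steps:
        scripts.append(prefix + ".".join(steps) + suffix)
    return scripts
```

With short names each chained step is about 40 characters, so one script holds roughly 120 vertices, and 30K nodes become about 250 requests instead of 30,000.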

Can I extend this? Any comments on the speed and the approach?


Thanks!


Olav



Stephen Mallette

Jan 8, 2018, 7:16:48 AM
to Gremlin-users
Generally speaking, when using string-based scripts, submitting massive scripts is a bad code smell. I don't know how Neptune processes things internally (it may just implement the Gremlin Server protocols and use a completely different method of processing traversals), but with Gremlin Server you can run into a number of problems doing that. Even if you get past the 5k character limit, you still hit the JVM's own 64K limit on the bytecode of a single method, so using code to generate big scripts to submit to the server may not work in that case either.

In Gremlin Server you would prefer to submit a general parameterized script that takes a large list of parameters and loops over them in the script, but I'm not sure that Neptune supports that. Someone with more Neptune experience would need to chime in on this topic; we actually had a separate thread going along these lines that dead-ended given the lack of knowledge of Neptune's internals and capabilities.
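In Gremlin Server terms, that parameterized approach might look like the minimal Python sketch below: the script text stays fixed and short, and the data travels in the `bindings` map of the HTTP request body. It assumes a Gremlin Server style HTTP endpoint with Groovy script evaluation; whether Neptune accepts `bindings` at all is exactly the open question here. `BULK_SCRIPT`, `build_bulk_payload`, and the `names` binding are hypothetical names.

```python
import json
from typing import Any, Dict, List

# Fixed Groovy script that loops over the bound list on the server side.
# iterate() is used because the caller does not need the vertices back;
# the script returns only the number of names processed.
BULK_SCRIPT = "names.each { n -> g.addV('object').property('name', n).iterate() }; names.size()"


def build_bulk_payload(names: List[str]) -> Dict[str, Any]:
    """JSON body for one request: constant script text, variable-size bindings."""
    return {"gremlin": BULK_SCRIPT, "bindings": {"names": names}}


payload = build_bulk_payload(["aaa", "bbb"])
body = json.dumps(payload).encode("utf-8")  # POST this to the /gremlin endpoint
```

Because the script text never changes, its length stays constant however many names are bound, so neither the script-size limit nor the bytecode limit grows with the batch. The batch size would instead be bounded by request-size limits, such as Gremlin Server's `maxContentLength` setting.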

>  "g.addV('object').property('name','aaa').next()", it consumes about .5 second per node.

That's pretty slow; it almost seems like something is wrong. Given that adding one vertex takes the same time as adding several, I guess you would build your Gremlin up to 5,000 characters and submit it, but that feels wrong. If you don't need the result back, you might also consider calling iterate() rather than next() to see if that shaves anything off the 0.5 seconds.

I just reached out to someone at AWS to see if we can get better help with these types of questions, or at least find out where such questions should be directed.


