[Gremlin] sort{}.reverse()_() taking longer

137 views
Skip to first unread message

Nikhil

unread,
Jun 8, 2012, 1:35:13 AM6/8/12
to gremli...@googlegroups.com
Hi,

I have been running a Gremlin traversal on my production instance for a couple of months now. The traversal fetches a set of vertices satisfying certain criteria, sorts them on their updated timestamp and then picks the latest 20 out of those. Something like:

g.v...(do something here).as('result_set')...(do something else).back('result_set').sort{it.updated_at}.reverse()_()[0..19].id

This worked fine initially but has now slowed down because of an increased count of the result_set. While trying to debug this delay, I figured out that removing sort{} fetched the results much faster. I'm assuming this is because sort{} takes place in Groovy space. It's like lazy evaluating everything before sorting and then take the whole bunch to sort, as opposed to emitting the sorted result set itself out of Gremlin Pipes.

Here are the benchmarks with sort:
Gremlin (35826.0ms)  g.v....as('result_set').....back('result_set').sort{it.updated_at}.reverse()_()[0..10].id
Gremlin (22239.4ms)  g.v....as('result_set').....back('result_set').sort{it.updated_at}.reverse()_()[0..10].id
Gremlin (20377.7ms)  g.v....as('result_set').....back('result_set').sort{it.updated_at}.reverse()_()[0..10].id

And here are the benchmarks after removing sort{}.reverse()
Gremlin (29.0ms)  g.v....as('result_set').....back('result_set')[0..10].id
Gremlin (18.7ms)  g.v....as('result_set').....back('result_set')[0..10].id
Gremlin (15.7ms)  g.v....as('result_set').....back('result_set')[0..10].id

As can be observed above, there's ~1000x increase in execution time when sort{} is applied. I don't think reverse() would be an expensive operation. result_set contains ~15k vertices. Sorting them is taking longer. Is there any way to speed things up?

I'm using Neo4j 1.7 stable release on production.

--
Nikhil Lanjewar
Engineering Lead at YourNextLeap
http://yournextleap.com

Marko Rodriguez

unread,
Jun 8, 2012, 9:56:15 AM6/8/12
to gremli...@googlegroups.com
Sup,

Gremlin (29.0ms)  g.v....as('result_set').....back('result_set')[0..10].id
As can be observed above, there's ~1000x increase in execution time when sort{} is applied. I don't think reverse() would be an expensive operation. result_set contains ~15k vertices. Sorting them is taking longer. Is there any way to speed things up?


There is one massive speed improvement I see that you can do. Do:

....sort{it.getProperty('updated_at')}.reverse()....

...in production systems, be wary of the "vertex.propertyName" shorthand in Gremlin-Groovy. See this page:
I recommend that you read it thoroughly so you know the "gotchas" when doing Gremlin-Groovy in production.

Next, in Gremlin 2.0, we introduced OrderPipe which does sorting native to Java, but I have a feeling that it will not be 29ms fast :) for 15k vertices. You might want to try playing with native sorting in Java:

List<Vertex> list = ... fill with 15k vertices.
Collectors.sort(list)

Get the runtimes and realize that is what order() step will do for you in Gremlin 2.0.

I hope the first trick does the job for you. Tell me how it goes.

Marko.

Nikhil

unread,
Jun 8, 2012, 10:42:09 AM6/8/12
to gremli...@googlegroups.com
Thanks for your inputs, Marko.

I tried benchmarking the sort{it.getProperty('updated_at')} traversal. Here are the results:

Gremlin (18391.3ms)  g.v...as('result_set')...back('result_set').sort{it.getProperty('updated_at')}.reverse()_()[0..10].id
Gremlin (17463.4ms)  g.v...as('result_set')...back('result_set').sort{it.getProperty('updated_at')}.reverse()_()[0..10].id

I would want the execution to be faster than this. In order to try native sorting in Java, do I need to implement a Comparator? Or is there a Comparator for Vertex which can accept a property name? It would be great if you could point me to some documentation which illustrates custom sorting on List<Vertex> collection. Else, I might as well write my own custom Comparator there. :)

I read about order(closure?) step (which is awesome!) being added in Blueprints 2.0. However, I don't think Neo4j 1.7 comes with Blueprints 2.0.

Finally, thanks for sharing the documentation for Path Optimizations, I'll make the required changes in my code.

--
Nikhil
Reply all
Reply to author
Forward
0 new messages