Traversal Response Times

Tomas Aftalion

Nov 28, 2016, 3:51:22 PM

to OrientDB

Hi,

I've recently set up a database with 500K nodes and 16M edges (distributed across 2 instances on AWS, Ubuntu 16, Java 8, version 2.2.13). I'm running:

select count(*) from (traverse out('MyEdge') from (SELECT FROM MyVertex WHERE id = <my-id> ) while $depth <= 2) where $depth >= 1

It can take anywhere from 1 to 60 seconds, averaging around 10 seconds. This is consistent with the power-law structure of the social graph, where nodes have between 0 and 500 edges each.

Roughly speaking, what performance would you expect from this sort of graph? Is something clearly wrong in my configuration that would cause these response times, or is this behavior to be expected?
I will be fetching and aggregating data from nodes at 1, 2, and 3 degrees (to generate ego network features, for example average/max/min age at the 2nd degree) and was hoping for response times within 5 seconds. Any help or suggestions?

Thanks in advance,
Tomas

Luigi Dell'Aquila

Nov 29, 2016, 2:05:39 AM

to orient-...@googlegroups.com

Hi Tomas,

Try replacing WHILE $depth <= 2 with MAXDEPTH 2.

You should see a significant improvement. The difference is that with a WHILE condition, TRAVERSE also performs the 3rd-level traversal and only then checks the condition, whereas with MAXDEPTH it simply stops at the specified level.

Thanks

Luigi
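
The difference Luigi describes can be illustrated with a toy model. The sketch below is not OrientDB's implementation; it is a minimal in-memory breadth-first walk, with hypothetical helper names (`build_tree`, `traverse`), that counts how many vertices are loaded when the depth check happens after a vertex is fetched (WHILE-style) versus when expansion stops at the limit (MAXDEPTH-style).

```python
from collections import deque


def build_tree(fan_out, levels):
    """Build a toy directed tree as an adjacency dict (root is vertex 0)."""
    graph, frontier, next_id = {}, [0], 1
    for _ in range(levels):
        new_frontier = []
        for vertex in frontier:
            children = list(range(next_id, next_id + fan_out))
            next_id += fan_out
            graph[vertex] = children
            new_frontier.extend(children)
        frontier = new_frontier
    return graph


def traverse(graph, start, max_depth, check_after_expand):
    """Return (vertices returned, vertices loaded) for a breadth-first walk.

    check_after_expand=True models a WHILE-style predicate: vertices one
    level past the limit are still loaded and only then rejected.
    check_after_expand=False models a MAXDEPTH-style hard stop.
    """
    visited = {start}
    queue = deque([(start, 0)])
    returned, loaded = 0, 0
    while queue:
        vertex, depth = queue.popleft()
        loaded += 1
        if depth > max_depth:
            continue  # loaded, then filtered out by the predicate
        returned += 1
        if not check_after_expand and depth == max_depth:
            continue  # hard stop: the next level is never touched
        for neighbour in graph.get(vertex, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return returned, loaded


tree = build_tree(fan_out=3, levels=3)
print(traverse(tree, 0, 2, check_after_expand=True))   # (13, 40)
print(traverse(tree, 0, 2, check_after_expand=False))  # (13, 13)
```

Both variants return the same 13 vertices (the root included), but the WHILE-style walk loads all 27 third-level vertices before discarding them. With a fan-out in the hundreds, that extra level would dominate the cost.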

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-database+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tomas Aftalion

Nov 29, 2016, 1:39:37 PM

to OrientDB

Thank you Luigi, it did help. Also, chaining the calls directly, for example out('MyEdge').out('MyEdge'), speeds things up considerably (~100x).

Best,
Tomas
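
Since the same chained pattern is needed for 1, 2, and 3 degrees, the query text could be generated rather than written by hand. The helper below is a hypothetical sketch: the function name is made up, and wrapping the chain in `SELECT expand(...)` with an outer `count(*)` is an assumption about how the chained form was run, not something stated in the thread. The `<my-id>` placeholder is kept exactly as in the original query.

```python
def chained_out_count_query(vertex_class, edge_class, id_placeholder, hops):
    """Build a count query that follows `edge_class` exactly `hops` times.

    Hypothetical helper: it only assembles a query string and does not
    talk to a database. Arguments are interpolated as-is, so they must
    come from trusted input.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    chain = ".".join(f"out('{edge_class}')" for _ in range(hops))
    return (
        f"SELECT count(*) FROM (SELECT expand({chain}) "
        f"FROM {vertex_class} WHERE id = {id_placeholder})"
    )


for hops in (1, 2, 3):
    print(chained_out_count_query("MyVertex", "MyEdge", "<my-id>", hops))
```

Note that a chained query of this shape is not a drop-in replacement for the TRAVERSE version: it targets vertices at exactly the given number of hops, and a vertex reachable through several paths may be counted more than once, so the totals can differ.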

Eric24

Nov 30, 2016, 10:05:14 AM

to OrientDB

@Luigi--

Hmmm. If the difference between traverse out('MyEdge') and out('MyEdge').out('MyEdge') is 100X, that sounds like an opportunity for an internal optimization. Unless there are dozens of other edges beyond the first level in this schema, it seems that the query optimizer could figure this out up-front. Thoughts?

--Eric

Luigi Dell'Aquila

Nov 30, 2016, 10:22:51 AM

to orient-...@googlegroups.com

Hi Eric,

With a schemaless or mixed schema it's hard to make assumptions, but some things can certainly be optimized.

Most of the difference is probably due to the number of traversed edges (N at the first level, probably N×N at the second), but part of it may also come from internal mechanisms of the ODocument structure (e.g. management of reference trees for save) that could be avoided in this case.

We'll do some profiling on this and I'll let you know.

Thanks

Luigi
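
To put rough numbers on the "N, then N×N" growth, here is a small back-of-the-envelope sketch. It assumes every vertex has the same out-degree, which is a simplification for a power-law graph; the degree of 500 is the maximum mentioned in the original post and 32 is the average implied by 16M edges over 500K nodes.

```python
def edges_per_level(degree, levels):
    """Edges followed at each level if every vertex had `degree` out-edges.

    This is a uniform-degree estimate, not a measurement. The real graph
    has 16M edges in total, which caps the true figure.
    """
    return [degree ** level for level in range(1, levels + 1)]


print(edges_per_level(500, 3))  # [500, 250000, 125000000]
print(edges_per_level(32, 3))   # [32, 1024, 32768]
```

Under these assumptions, each additional level multiplies the work by the degree, which is consistent with the large gap between stopping at level 2 and touching level 3.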
