Index creation speed is too slow

Suhas

May 8, 2019, 10:04:05 AM

to OrientDB

I'm creating an index on the keys (in, out) of an Edge class containing about 500 million records. Index creation progressed well at first, at about 20,000 items/sec, but after some time it dropped to under 1,000 items/sec.

2019-05-08 08:43:25:885 INFO  {db=cgraph} --> 37.00% progress, 177,405,476 indexed so far (855 items/sec) [OIndexRebuildOutputListener]
2019-05-08 08:43:35:899 INFO  {db=cgraph} --> 37.00% progress, 177,415,347 indexed so far (987 items/sec) [OIndexRebuildOutputListener] 
2019-05-08 08:43:45:902 INFO  {db=cgraph} --> 37.00% progress, 177,427,464 indexed so far (1,211 items/sec) [OIndexRebuildOutputListener]

At this speed, it will take 3-4 days!
Settings used (16 GB RAM, 300 GB SSD):
java -server -Xms2G -Xmx7G -Dstorage.diskCache.bufferSize=7200

Screenshot from 2019-05-08 09-06-47.png

Any idea why the speed of indexing decreased so drastically? And how can I increase the speed of indexing?

OrientDB 3.0.15

Jérôme Mainaud

May 8, 2019, 3:15:25 PM

to orient-...@googlegroups.com

Hello,

I don't know the exact implementation used by OrientDB, and it depends on the type of index you chose.

But it's no big surprise that the time to insert a key increases with the number of entries in the index.

Hash indexes should be less sensitive to this cost increase.

What is the purpose of indexing the in and out keys of your edge?

Queries won't benefit from them, because they traverse the graph through the links from vertex to edge, which is far more efficient.

Tell me if I'm wrong about that.

--
Jérôme Mainaud
jer...@mainaud.com

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orient-database/95597c3e-632b-4570-af51-f07227dc1965%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Suhas

May 8, 2019, 5:37:44 PM

to OrientDB

Hey Jerome,

Here are a few reasons why I needed an index:

1. Apply a unique constraint on the edge (no more than a single edge between a pair of vertices).

2. Compute incoming and outgoing edge counts faster.

3. Check whether two vertices are connected.

For now, I'm using an SB-Tree index.


Jérôme Mainaud

May 9, 2019, 11:21:47 AM

to orient-...@googlegroups.com

OK, I'm not surprised by the increasing SB-Tree insert cost, since the complexity of adding a key to such a tree is O(log(n)).

For your first case, I see no solution other than building an index, but you can do it with a UNIQUE_HASH_INDEX. If the implementation is good, adding a key should take constant time on average (some insertions are occasionally more expensive, when the index's underlying storage has to grow).

For the other cases, have you tried querying directly from the vertex?

Suppose we have this data:

create class Person extends V;
create property Person.name string;

create class Company extends V;
create property Company.name string;

create class WorkedAt extends E;

/* Add constraints on the edge. */
create property WorkedAt.out link Person;
create property WorkedAt.in link Company;

insert into Person (name) values ('jerome');
insert into Person (name) values ('john doe');

insert into Company (name) values ('Zeenea');
insert into Company (name) values ('Ippon Technologies');
insert into Company (name) values ('Klee Group');
insert into Company (name) values ('World Big Company');

create edge WorkedAt from (select from Person where name = 'jerome') to (select from Company where name = 'Zeenea');
create edge WorkedAt from (select from Person where name = 'jerome') to (select from Company where name = 'Ippon Technologies');
create edge WorkedAt from (select from Person where name = 'jerome') to (select from Company where name = 'Klee Group');
create edge WorkedAt from (select from Person where name = 'john doe') to (select from Company where name = 'World Big Company');
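
For use case 1 (the unique constraint), a minimal sketch on the example schema above. It relies on the out and in properties being declared on the edge class, as done earlier; the index name WorkedAt_out_in is only an illustration, and this is untested at the scale discussed in this thread:

```sql
/* Composite unique hash index: at most one WorkedAt edge per (out, in) pair. */
create index WorkedAt_out_in on WorkedAt (out, in) UNIQUE_HASH_INDEX;
```

With this in place, creating a second WorkedAt edge between the same Person and Company should be rejected as a duplicate key. Keep in mind that a hash index does not support range queries, only exact lookups.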

Use case 2

I can count the outgoing links from each Person with this query:

orientdb {db=tdb}> select name, out('WorkedAt').size() from Person

+----+--------+----------------------+
|#   |name    |out('WorkedAt').size()|
+----+--------+----------------------+
|0   |jerome  |3                     |
|1   |john doe|1                     |
+----+--------+----------------------+

Which can be further optimized as (if not already done by the optimizer):

orientdb {db=tdb}> select name, out_WorkedAt.size() from Person

+----+--------+-------------------+
|#   |name    |out_WorkedAt.size()|
+----+--------+-------------------+
|0   |jerome  |3                  |
|1   |john doe|1                  |
+----+--------+-------------------+

Those queries use direct links and don't need an index; the last one doesn't need the edge at all.
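
The same approach should carry over to incoming edges by starting from the other end. A sketch on the example data, counting the WorkedAt edges that arrive at each Company (in_WorkedAt is assumed to be the incoming counterpart of the out_WorkedAt field used above):

```sql
/* Count incoming WorkedAt edges per Company through the vertex links. */
select name, in('WorkedAt').size() from Company;

/* Same count, reading the link field on the vertex directly. */
select name, in_WorkedAt.size() from Company;
```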

Use case 3

I can test whether a person worked at a company with this query:

orientdb {db=tdb}> select count() from Person where name = 'jerome' and out('WorkedAt') contains (name = 'Zeenea')

+----+-------+
|#   |count()|
+----+-------+
|0   |1      |
+----+-------+

If the count is one or more, the items are linked.

This query uses direct links and doesn't need an index.

Of course, that's just to give you the idea. You have to adapt it to your use case.

Last but not least, just don't trust me. Test!

I don't have billions of edges.

Give me some feedback if I'm wrong or if I missed something. (I am learning while I respond to you.)

my 2 cents,

--
Jérôme Mainaud
jer...@mainaud.com


Suhas

May 10, 2019, 4:00:40 AM

to OrientDB

> For your first case, I see no solution other than building an index, but you can do it with a UNIQUE_HASH_INDEX. If the implementation is good, adding a key should take constant time on average (some insertions are occasionally more expensive, when the index's underlying storage has to grow).

I will try this solution and report back.

> Those queries use direct links and don't need an index; the last one doesn't need the edge at all.

True. No index is required for that.

> If the count is one or more, the items are linked.
> This query uses direct links and doesn't need an index.

I have tried that method. Although the results are not bad, it may not scale well, especially as the number of incoming and outgoing edges grows.

Anyway, I'll compare the results after the UNIQUE_HASH_INDEX is complete.

--

Suhas

May 14, 2019, 3:25:51 AM

to OrientDB

Response inline.

On Thursday, May 9, 2019 at 3:21:47 PM UTC, Jérôme Mainaud wrote:

> OK, I'm not surprised by the increasing SB-Tree insert cost, since the complexity of adding a key to such a tree is O(log(n)).
>
> For your first case, I see no solution other than building an index, but you can do it with a UNIQUE_HASH_INDEX. If the implementation is good, adding a key should take constant time on average (some insertions are occasionally more expensive, when the index's underlying storage has to grow).

Tried it. There is no difference. Starting at 20,000 items/sec, the speed dropped to 500 items/sec after about one and a half days.

Jérôme Mainaud

May 14, 2019, 4:14:26 AM

to orient-...@googlegroups.com

OK Suhas,

Thank you for the feedback.

I hope someone from the OrientDB team can provide you with better help.

--
Jérôme Mainaud
jer...@mainaud.com


Marek Bisz

Jun 19, 2019, 8:46:46 AM

to OrientDB

Hi there.

I would start with RAM tuning (a smaller Xmx, e.g. -Xmx800m).

https://orientdb.com/docs/2.1.x/Performance-Tuning.html
