How to properly index edges?

Sven Hodapp

Aug 6, 2015, 6:01:58 AM

to OrientDB

Hi there,

my data model looks like this:

V class Abstract with uuid (NOTUNIQUE index), model (Lucene index) and some other properties
V class NormalizedNamedEntity with the same properties and indexes as Abstract
E class uima_annotated, which connects Abstracts with NormalizedNamedEntities and also has some other properties

Currently there are 100k Abstracts, 22k NormalizedNamedEntities and nearly 2 million uima_annotated edges.

Now I'd like to perform queries like this:

select expand(out) from uima_annotated where in.uuid = "DBA002026" and in.uuid = "NO000357"

In other words: give me all Abstracts (or the first 20) that are annotated with both the DBA002026 NormalizedNamedEntity AND the NO000357 NormalizedNamedEntity.

The query is not efficient and returns nothing, because "fetched more than 50000 records: to speed up the execution, create an index or change the query to use an existent index".

I've tried to index uima_annotated.in, but this has no impact! Any ideas how to speed this up? Where should I place an index? Is it possible to index uima_annotated.in.uuid? Or is there a better way to express the query (maybe coming from NormalizedNamedEntity)?

Thanks for any advice!

SavioL

Aug 6, 2015, 6:20:05 AM

to OrientDB

Hi,
can you export your database and send it to me so we can test it (if that isn't a problem for you)?

regards,
Savio L.

Sven Hodapp

Aug 6, 2015, 6:51:34 AM

to OrientDB

Hi SavioL,

thanks for your reply. Here you can download it:

https://owncloud.scai.fraunhofer.de/public.php?service=files&t=ad73207d52d3ed6b163fa10f954823c6

The download resource expires on 2015-08-09.

Regards,
Sven

SavioL

Aug 6, 2015, 6:54:01 AM

to OrientDB

Perfect, I'm downloading it.

SavioL

Aug 6, 2015, 10:04:34 AM

to OrientDB

I'm back.
We tried to reconstruct the query; try this and see if it is what you need.

It returns the Abstract vertices that are connected to both the NormalizedNamedEntity with uuid = "DBA002026" and the one with uuid = "NO000357":

select expand($c)
let $a = (select expand(in('uima_annotated')) from (select from NormalizedNamedEntity where uuid = "DBA002026")),
    $b = (select expand(in('uima_annotated')) from (select from NormalizedNamedEntity where uuid = "NO000357")),
    $c = intersect($a, $b)

This is the result (elapsed time for the query: 0.441 sec).

Is this the result you were looking for?
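
To picture why the original per-edge filter can never match while the rewritten query does, here is a minimal Python sketch on a made-up toy graph. It is only an illustration of the logic, not how OrientDB executes either query; the Abstract ids A1 to A3 and the edge list are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for the graph: each uima_annotated edge is (out, in),
# i.e. (Abstract id, NormalizedNamedEntity uuid). Ids A1..A3 are made up.
edges = [
    ("A1", "DBA002026"),
    ("A1", "NO000357"),
    ("A2", "DBA002026"),
    ("A3", "NO000357"),
]

# Per-edge filter, as in the original query: a single edge has exactly one
# `in` vertex, so both conditions can never hold on the same edge.
per_edge = [out for out, inn in edges
            if inn == "DBA002026" and inn == "NO000357"]

# Rewritten form: start from each entity, collect the Abstracts pointing
# at it, then intersect the two sets.
incoming = defaultdict(set)
for out, inn in edges:
    incoming[inn].add(out)

both = incoming["DBA002026"] & incoming["NO000357"]

print(per_edge)  # []
print(both)      # {'A1'}
```

Starting from the two entities means only their incoming edges are touched, instead of every uima_annotated edge.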

Regards,
Savio L.

Sven Hodapp

Aug 6, 2015, 10:15:53 AM

to OrientDB

Thanks!

That's not bad. But at some point, if we have many more edges in the bag, won't the intersect function become slow? So is that query very sensitive to the amount of data?
(The entire dataset is at least 240x larger!)

Is it possible to use an index to speed things up?

SavioL

Aug 6, 2015, 10:55:07 AM

to OrientDB

As the number of vertices returned by the subqueries grows, the intersect takes longer to compute. How much slower, I honestly don't know; the only way to find out is to run some tests.
I believe the intersect function simply compares the RIDs of the vertices returned by the subqueries (certainly fewer than the total number of vertices), so it is faster than scanning millions of vertices.

Indexes are already present on the fields used by the subqueries.
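
The point about intersect only comparing RIDs can be sketched roughly as follows. This is an assumption-laden illustration in Python, not OrientDB's actual implementation: the function name `intersect_rids` and the sample RIDs are hypothetical.

```python
def intersect_rids(a, b):
    """Return the RIDs present in both iterables.

    Illustrative only: probes the larger set with each element of the
    smaller one, so the work depends on the subquery result sizes,
    not on the total number of vertices in the database.
    """
    small, large = sorted((set(a), set(b)), key=len)
    return {rid for rid in small if rid in large}


# Hypothetical RIDs returned by the two subqueries.
rids_a = ["#12:0", "#12:5", "#12:9"]
rids_b = ["#12:5", "#12:7", "#12:9", "#12:11"]

common = intersect_rids(rids_a, rids_b)
print(sorted(common))
```

Under this model the cost grows with the number of Abstracts attached to each entity, which is why testing on the full dataset is the only reliable way to judge it.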

regards
Savio L.

Sven Hodapp

Aug 7, 2015, 4:06:27 AM

to OrientDB

So it is not possible to create an automatic index to speed up the "uima_annotated where in.uuid =..." query?
I could think about adding an "inuuid" property to uima_annotated and placing an index on that. (But wouldn't this be a bit wasteful in terms of space?)

Regards,
Sven

SavioL

Aug 7, 2015, 4:41:34 AM

to OrientDB

Hi Sven,
Why do you need another index? Have you already tried the query against your complete database, and is it too slow?
If you copy the uuid into an "inuuid" property on the edge, you duplicate a value that already exists on the vertex.
And if you move it from the vertex to the edge, are you sure that doesn't bring other disadvantages?

regards,
Savio L.

Sven Hodapp

Aug 11, 2015, 3:09:18 PM

to orient-...@googlegroups.com

Hi Savio,

is OrientDB’s intersect algorithm optimized like the one in Lucene, e.g. using skip lists for fast intersection?

Are there no other efficient ways to express this kind of query with real graph queries (especially to avoid input-sensitive intersections)?
Is there an optimal way to model the graph for such queries, or does only an intersection make sense?
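
For reference, the kind of skipping intersection meant here can be sketched as below. This is a simplified Python stand-in that uses binary search over sorted lists instead of real skip lists, and it says nothing about what OrientDB does internally; `intersect_sorted` and the sample ids are hypothetical.

```python
from bisect import bisect_left


def intersect_sorted(short, long):
    """Intersect two sorted lists of unique ids.

    For each element of the shorter list, jump ahead in the longer list
    with a binary search instead of scanning it element by element.
    """
    if len(short) > len(long):
        short, long = long, short
    result = []
    lo = 0
    for x in short:
        # Skip everything in `long` that is smaller than x.
        lo = bisect_left(long, x, lo)
        if lo == len(long):
            break
        if long[lo] == x:
            result.append(x)
            lo += 1
    return result


print(intersect_sorted([3, 8, 21], [1, 2, 3, 5, 8, 13, 21, 34]))  # [3, 8, 21]
```

The benefit shows up when one list is much shorter than the other, because most of the longer list is never visited.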

Regards,
Sven


Özgür Sucu

May 31, 2017, 7:03:58 AM

to OrientDB

Hi Sven,

Did you have any performance issues with the query suggested above? If so, how did you handle them?


Sven Hodapp

May 31, 2017, 9:03:57 AM

to orient-...@googlegroups.com

Hi Özgür,

that was some time ago and I’m afraid we no longer use OrientDB. After an evaluation we switched to another solution.
So I’m sorry, but I can’t give you any useful suggestions on this topic.

Regards,
Sven

Özgür Sucu

May 31, 2017, 9:29:32 AM

to OrientDB

Hi Sven,

Which solution did you switch to? What did you use as the replacement for OrientDB?


Sven Hodapp

May 31, 2017, 9:38:48 AM

to orient-...@googlegroups.com

We’re using Apache Accumulo now. It fits our needs better.
