Relational model vs. Graph database model

160 views
Skip to first unread message

Fabio Rinnone

unread,
Jul 1, 2016, 4:13:27 AM7/1/16
to OrientDB
Hi there,

I'm yet a bit confused about differences beetween relational model and
graph database model. I have imported into OrientDB graph a relational
database but I'm not sure if my approach to query graph data is correct.
For instance, I use the following query with MySQL:

select count(*) as total from drug, demo where drug.primaryid =
demo.primaryid and demo.age > 29 and demo.age < 46 and drug.drugname =
'TRIZIVIR'

Is this following OrientDB query equivalent in OrientDB?

select count(*) as total from ( select expand(in()) from demo where age
> 29 and age < 46 ) where drugname = 'TRIZIVIR'

In this case I'm using a subquery: do you think is it more slowly than
using a join?

In this case, first query require 1 min 46.15 sec and the second one
127.496 sec(s). I think it is not a good result.

I yet don't understand if using of 'traverse' into 'select' is more
efficient than using of 'expand' and I don't understand how can I use
'dot notation' to query data, if possible.

I have imported graph database using Java Graph API: when I found some
value for foreign keys (in this case primaryid) I have created an edge
beetween vertices founded as follow:

graph.addEdge(null, sourceVertex, targetVertex, "E");

I'm using lightweighted edges.

I'm not sure but I think I can't use 'dot notation' because fields
(primaryid in this case) are not mapped somehow, right?. I have tried to
use in_ and out_ properties in query but I have no success.

Can you explain what is my issue?

--
Fabio Rinnone
Skype: fabiorinnone
Web: http://www.fabiorinnone.eu


signature.asc

Ivan Mainetti

unread,
Jul 1, 2016, 5:50:47 AM7/1/16
to OrientDB, fabio.r...@gmail.com
Hi,
could you post your RDBMS schema and the graph schema?

thx

Fabio Rinnone

unread,
Jul 1, 2016, 6:11:10 AM7/1/16
to OrientDB
The attachment contains a phpMyAdmin SQL Dump of a subset of my schema
with only two tables demo and drug.

The graph schema is similar: there is a class for every table, a
property for every attribute and when primaryid values matches between
classes there is an edge from drug to demo classes.

Il 01/07/2016 11:50, Ivan Mainetti ha scritto:
> Hi,
> could you post your RDBMS schema and the graph schema?
>
> thx

faers.sql
signature.asc

Ivan Mainetti

unread,
Jul 1, 2016, 6:49:55 AM7/1/16
to OrientDB, fabio.r...@gmail.com
please help me understand your use case. Which result are you expecting from the query above?


thx

Il giorno venerdì 1 luglio 2016 10:13:27 UTC+2, Fabio Rinnone ha scritto:

Ivan Mainetti

unread,
Jul 1, 2016, 7:01:40 AM7/1/16
to OrientDB, fabio.r...@gmail.com
which kind of relation drug-demo is? 1-1, 1-n, n-1, n-m?
which of the 2 classes contains more vertices?




Il giorno venerdì 1 luglio 2016 10:13:27 UTC+2, Fabio Rinnone ha scritto:

Fabio Rinnone

unread,
Jul 1, 2016, 8:55:59 AM7/1/16
to OrientDB
Ok, demo table contains patients demographic and administrative
information (age, sex, etc.) and drug table contains information about
drugs used by patients.

The relationship beetween demo-drug is 1-n (drug contains more vertices).

I don't understand if it is correct to use 'expand' rather than
'traverse' and sincerly I don't understand how 'traverse' works.

I try to use 'parallel' because I think it is useful with a quad core
processor and the result is better, but it is not always so, for others
sample queries it is not sufficient.

Il 01/07/2016 13:01, Ivan Mainetti ha scritto:
> which kind of relation drug-demo is? 1-1, 1-n, n-1, n-m?
> which of the 2 classes contains more vertices?

signature.asc

Fabio Rinnone

unread,
Jul 1, 2016, 8:57:58 AM7/1/16
to OrientDB
Ok, demo table contains patients demographic and administrative
information (age, sex, etc.) and drug table contains information about
drugs used by patients.

The relationship beetween demo-drug is 1-n (drug contains more vertices).

I don't understand if it is correct to use 'expand' rather than
'traverse' and sincerly I don't understand how 'traverse' works.

I try to use 'parallel' because I think it is useful with a quad core
processor and the result is better, but it is not always so, for others
sample queries it is not sufficient.

Il 01/07/2016 13:01, Ivan Mainetti ha scritto:
> which kind of relation drug-demo is? 1-1, 1-n, n-1, n-m?
> which of the 2 classes contains more vertices?

signature.asc

Luca Garulli

unread,
Jul 1, 2016, 9:13:13 AM7/1/16
to OrientDB
Hi Fabio,

Are you using v2.2x? Did you create the schema first (helps with performance and db size)?

If you have foreign keys, the best way to import from MySQL is Teleporter. Did you try using it?

Last: what query are you trying to extract?

Best Regards,

Luca Garulli
Founder & CEO


--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Fabio Rinnone

unread,
Jul 1, 2016, 9:43:15 AM7/1/16
to orient-...@googlegroups.com
Hi Luca,

Il 01/07/2016 15:12, Luca Garulli ha scritto:
> Are you using v2.2x? Did you create the schema first (helps with
> performance and db size)?

I'm yet using v.2.1.x. First I've created schema and after I've
populated it with data.

> If you have foreign keys, the best way to import from MySQL is
> Teleporter. Did you try using it?

I've written a customized Java loader[1] that creates classes,
properties, indexes and relationships (implemented using edges) via
parsing the phpMyAdmin dump SQL file and after load data from CSV dump
files. I think it's ok.

> Last: what query are you trying to extract?

I'm trying to use similar queries with joins to test performances of
both MySQL and OrientDB databases. For this moment I haven't more others
queries: first I would like to understand what is the best way to query
multiple classes on OrientDB.

[1] https://bitbucket.org/fabiorinnone/faers-loader
signature.asc

Luca Garulli

unread,
Jul 1, 2016, 1:28:48 PM7/1/16
to OrientDB
Please use 2.2 because it's much much faster, specially on scan + parallel.

I can help you with queries if you can tell me what do you want. Or the same query in MySQL?

Best Regards,

Luca Garulli
Founder & CEO


Fabio Rinnone

unread,
Jul 1, 2016, 4:05:29 PM7/1/16
to orient-...@googlegroups.com
Il 01/07/2016 19:28, Luca Garulli ha scritto:
> Please use 2.2 because it's much much faster, specially on scan + parallel.

Ok, Luca.

> I can help you with queries if you can tell me what do you want. Or the
> same query in MySQL?

This is a simple query that I've tested for this moment:

select count(*) as total from drug, demo where drug.primaryid =
demo.primaryid and demo.age > 29 and demo.age < 46 and drug.drugname =
'TRIZIVIR'

In OrientDB I've tried this:

select count(*) as total from ( select expand(in()) from demo where age
> 29 and age < 46 ) where drugname = 'TRIZIVIR'

that return correct result, but I think that the use of subquery is not
efficient.

What do you think, Luca?
signature.asc

Luca Garulli

unread,
Jul 1, 2016, 11:44:32 PM7/1/16
to OrientDB
On 1 July 2016 at 15:05, Fabio Rinnone <fabio.r...@gmail.com> wrote:
In OrientDB I've tried this:

select count(*) as total from ( select expand(in()) from demo where age
> 29 and age < 46 ) where drugname = 'TRIZIVIR'

that return correct result, but I think that the use of subquery is not
efficient.

Did you create an INDEX (non hash) against demo.age and an HASH_INDEX against drugname?

Fabio Rinnone

unread,
Jul 2, 2016, 5:17:40 AM7/2/16
to orient-...@googlegroups.com
Il 02/07/2016 05:44, Luca Garulli ha scritto:
> Did you create an INDEX (non hash) against demo.age and an HASH_INDEX
> against drugname?

Hmm no, I've automatically created indices only in properties where the
corresponding attribute on MySQL schema have an index. In this case only
in primaryid attribute.

Probably I will query on every properties of classes, do you suggest me
to create indices on all properties? Don't require too much disk space?
signature.asc

Fabio Rinnone

unread,
Jul 5, 2016, 5:13:07 AM7/5/16
to orient-...@googlegroups.com
Il 02/07/2016 05:44, Luca Garulli ha scritto:
> Did you create an INDEX (non hash) against demo.age and an HASH_INDEX
> against drugname?

I have created an INDEX against demo.age and an HASH_INDEX on drug.drugname.

Now following query on MySQL:

select count(*) as total from drug, demo where drug.primaryid =
demo.primaryid and demo.age > 29 and demo.age < 46 and drug.drugname =
'TRIZIVIR'

require 58.08 sec and I expected this result.

The corresponding query on OrientDB:

select count(*) as total from ( select expand(in()) from demo where age
> 29 and age < 46 ) where drugname = 'TRIZIVIR'

require 47.804 sec(s).

Now you should think it is a good result, but in MySQL I have not
implemented indexes on attributes: if I define a BTREE index on demo.age
and an HASH index on drug.drugame the first query on MySQL require even
0.02 sec.

Do you think is correct or my OrientDB query is not correct?
signature.asc

Fabio Rinnone

unread,
Jul 8, 2016, 6:25:47 AM7/8/16
to orient-...@googlegroups.com
Il 02/07/2016 05:44, Luca Garulli ha scritto:
> Did you create an INDEX (non hash) against demo.age and an HASH_INDEX
> against drugname?

I have tried to use out() instead of in() in OrientDB query and time
execution are better, but if I add indexes yet in MySQL schema execution
times in MySQL remains better than OrientDB.

First example (2 tables join):

select count(*) as total from ( select expand(out()) from drug where
drugname = 'TRIZIVIR' ) where age > 29 and age < 46

requires 1.698 sec in OrientDB.

Equivalent query in MySQL:

select count(*) as total from drug, demo where drug.primaryid =
demo.primaryid and demo.age > 29 and demo.age < 46 and drug.drugname =
'TRIZIVIR'

requires less than 0.01 sec.

Second example (3 tables join):

select count(*) as total from ( select expand(out()) from ( select
expand(out()) from indi where indi_pt = 'HIV infection' ) where drugname
= 'TRIZIVIR' ) where age > 29 and age < 46

requires 80.66 sec in OrientDB.

Equivalent query in MySQL:

select count(*) as total from indi, drug, demo where indi.primaryid =
drug.primaryid and indi.indi_drug_seq = drug.drug_seq and drug.primaryid
= demo.primaryid and indi.indi_pt = 'HIV infection' and drug.drugname =
'TRIZIVIR' and demo.age > 29 and demo.age < 46;

require also less than 0.01 sec.
signature.asc

Luigi Dell'Aquila

unread,
Jul 11, 2016, 2:46:25 AM7/11/16
to orient-...@googlegroups.com
Hi Fabrizio,

In my experience, there numbers seem a bit too high...
Just a couple of questions (sorry, I don't know if you already answered before, maybe I missed it...)
- how big is your DB?
- what is your configuration? (RAM, heap, off-heap and so on)
- I see you are using out() without any edge class names, do you have a schema defined? Can you try to use edge class names, eg out("TheEdgeClass")?


Could you also please try to rewrite the queries this way and let me know if they are better?

select 
   out()[drugname = 'TRIZIVIR'].out()[age > 29 and age < 46].size() 
from indi 
where 
   indi_pt = 'HIV infection'


or even better


select 
   out("TheEdgeClassName")[drugname = 'TRIZIVIR'].out("TheEdgeClassName")[age > 29 and age < 46].size() 
from indi 
where 
   indi_pt = 'HIV infection'



```````Thanks

Luigi


Fabio Rinnone

unread,
Jul 18, 2016, 5:30:53 AM7/18/16
to orient-...@googlegroups.com
Il 11/07/2016 08:46, Luigi Dell'Aquila ha scritto:
> In my experience, there numbers seem a bit too high...
> Just a couple of questions (sorry, I don't know if you already answered
> before, maybe I missed it...)
> - how big is your DB?

Size of DB is 19 GB.

> - what is your configuration? (RAM, heap, off-heap and so on)
> - I see you are using out() without any edge class names, do you have a
> schema defined? Can you try to use edge class names, eg out("TheEdgeClass")?

I have 12 GB of RAM, when I launch OrientDB console I use 8192 MB of
heap space, I dont' know if this information is useful for you.

> Could you also please try to rewrite the queries this way and let me
> know if they are better?
>
> select
> out()[drugname = 'TRIZIVIR'].out()[age > 29 and age < 46].size()
> from indi
> where
> indi_pt = 'HIV infection'
>
> or even better
>
> select
> out("TheEdgeClassName")[drugname =
> 'TRIZIVIR'].out("TheEdgeClassName")[age > 29 and age < 46].size()
> from indi
> where
> indi_pt = 'HIV infection'

Ok, but these queries returns me always 20 records with all values
setted to 0: they are insignificant.

In my first implementation I have loaded all edges into E class: I
reload the DB using more than one subclass for edges. I have rewrited my
query as follow:

select count(*) as total from ( select expand(out('drugToDemo')) from (
select expand(out('indiToDrug')) from indi where indi_pt = 'HIV
infection' ) where drugname = 'TRIZIVIR' ) where age > 29 and age < 46

and results are better: 1.444 sec.

But MySQL returns same result in 0.06 sec.
signature.asc
Reply all
Reply to author
Forward
0 new messages