Performance difference between relational and graph databases

charu tyagi

Apr 7, 2012, 1:46:43 AM
to Neo4j
Hi,
I am doing a test to see the difference between the relational database MySQL and a graph database. The problem is that for 100 nodes and above, the graph database's performance is better than the relational one, but if I take 10 nodes in both the relational and the graph database, the relational database's performance is better than the graph database's. Why is that?
Please reply.

Chris Albertson

Apr 7, 2012, 2:06:31 AM
to ne...@googlegroups.com
I bet with zero nodes the performance is equal for both types of databases. In fact, the zero-node case would tell you something: that there is a fixed time to respond to any query.

Those node counts are not realistic. I got hit by that hard a few years ago. What happens with these tiny test cases is that the ENTIRE database can be cached in RAM by the operating system. Then, when you build the system and put real-world data in the DBMS, you have more data than can fit in RAM, and the OS actually needs to read data from disk.

With the small number of nodes you used, the difference may be just the implementation and have nothing to do with database theory. Try another test with a few million nodes. Your first job is to write a random node generator and let it run a while.
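A random node generator along those lines could start as small as this sketch. It only produces a reproducible edge list between integer node ids, so the same data can be loaded into both MySQL and Neo4j; the function name, the uniform random wiring, and the fixed number of edges per node are illustrative assumptions, not something specified in the thread.

```python
import random
from typing import Iterator, Tuple


def random_edges(num_nodes: int, edges_per_node: int,
                 seed: int = 42) -> Iterator[Tuple[int, int]]:
    """Yield reproducible directed (source, target) edges between node ids.

    Every node gets `edges_per_node` outgoing edges to uniformly chosen
    other nodes (no self-loops), so the total is num_nodes * edges_per_node.
    A generator is used so millions of edges can be streamed to a file
    instead of being held in memory.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes to draw edges")
    rng = random.Random(seed)  # fixed seed: identical data for both databases
    for src in range(num_nodes):
        for _ in range(edges_per_node):
            dst = rng.randrange(num_nodes - 1)
            if dst >= src:  # shift past src so dst never equals src
                dst += 1
            yield src, dst
```

For a few million nodes, the generator can be fed straight into `csv.writer(...).writerows(...)` to produce a file that both databases are then loaded from.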

--

Chris Albertson
Redondo Beach, California

israe...@gmail.com

Apr 7, 2012, 1:21:52 PM
to ne...@googlegroups.com
What algorithms were you running on the nodes?

Craig Taverner

Apr 8, 2012, 8:46:43 AM
to ne...@googlegroups.com
Hi,

I think this is not a surprising result; it is actually a very normal expectation in many cases. However, it will not always be true. The situation is much more complex, and you will see many different performance results for different scenarios.

You did not explain the nature of your data, graph model, or queries, so I cannot explain exactly what you are seeing, but I can perhaps explain this behavior in a more general context. In principle, the graph database should have better-scaling query performance if your queries make use of the local graph, that is, if you are performing a query that benefits from traversal performance. In Neo4j, traversing a local sub-graph is a very fast operation. But far more important is the fact that the performance of the traversal depends only on the size of the sub-graph being traversed, not on the total database size. For example, if a traversal touches 1,000 nodes in a 10,000-node graph, it should take X ms. If you then load more data into the graph, perhaps getting it up to 1,000,000 nodes in total (but leaving the sub-graph at 1,000 nodes), the traversal should still take about X ms. This is because a traversal follows a set of explicit references, or pointers, to the next data, and the performance of that 'linked list' is unrelated to the total data size.
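The "cost depends on the sub-graph, not the database" point can be illustrated with a toy in-memory adjacency list. This is only a model: each node keeps a direct list of its neighbours and a breadth-first traversal counts what it visits. Neo4j's real record store is not modelled here, and a Python dict lookup is a hash-table lookup (expected O(1)) rather than a stored record offset.

```python
from collections import deque
from typing import Dict, List


def traverse(adjacency: Dict[int, List[int]], start: int, max_depth: int) -> int:
    """Breadth-first traversal up to max_depth hops; returns nodes visited.

    Each step only reads the neighbour list stored with the current node, so
    the work grows with the visited sub-graph, not with len(adjacency).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)


# Sub-graph of interest: a chain 0 -> 1 -> ... -> 999 (1,000 nodes).
graph: Dict[int, List[int]] = {i: [i + 1] for i in range(999)}
visited_small = traverse(graph, 0, max_depth=5_000)

# Grow the "database" with 99,000 nodes the traversal never reaches.
for i in range(1_000, 100_000):
    graph[i] = [i + 1]
visited_large = traverse(graph, 0, max_depth=5_000)

print(visited_small, visited_large)  # 1000 1000
```

The visited count, and therefore the traversal work, stays the same after the graph grows a hundredfold, which is the behaviour described above.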

In a relational database this is not true. Since all data is in tables, finding related data usually means following a foreign key, which is itself a column of the table. Finding the rows that match this key requires either an exhaustive search (O(N)) or an indexed key (perhaps O(log N)). Neither is as fast as following the graph's explicit reference (O(1)). So for this particular situation the graph scales very much better than the table.
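The three lookup strategies can be compared in a toy Python model. The "table" is a list of `(id, parent_id)` rows; a sorted list searched with `bisect` stands in for an index on the foreign-key column (an assumption: a real MySQL index is a B+tree with page I/O, which this ignores); and a dict of child lists stands in for explicit stored references.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (row id, foreign key referencing the parent row's id)


def children_by_scan(table: List[Row], parent: int) -> List[int]:
    """O(N): check the foreign key of every row in the table."""
    return [rid for rid, fk in table if fk == parent]


def build_fk_index(table: List[Row]) -> List[Tuple[int, int]]:
    """Sorted (fk, id) pairs: a stand-in for an index on the foreign-key column."""
    return sorted((fk, rid) for rid, fk in table)


def children_by_index(index: List[Tuple[int, int]], parent: int) -> List[int]:
    """O(log N + k): binary-search to the first matching key, then read k entries."""
    pos = bisect.bisect_left(index, (parent, float("-inf")))
    found = []
    while pos < len(index) and index[pos][0] == parent:
        found.append(index[pos][1])
        pos += 1
    return found


def build_adjacency(table: List[Row]) -> Dict[int, List[int]]:
    """Store each parent's children directly with it, like explicit references."""
    adjacency: Dict[int, List[int]] = {}
    for rid, fk in table:
        adjacency.setdefault(fk, []).append(rid)
    return adjacency


table = [(i, i // 3) for i in range(1, 30)]
index = build_fk_index(table)
adjacency = build_adjacency(table)
# The adjacency version needs one expected-O(1) lookup per hop, no search.
print(children_by_scan(table, 2), children_by_index(index, 2), adjacency[2])
# [6, 7, 8] [6, 7, 8] [6, 7, 8]
```

All three return the same rows; what differs is how the work grows as the table grows. Note that the adjacency lists cost extra memory and build time up front, which is the trade-off discussed further down.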

Of course, the real world is much more complex. You are most likely also using indices (e.g. Lucene) in Neo4j and therefore also getting some of the same performance characteristics you would see in a relational database. However, if you structure things well, using Lucene only to find a limited set of key start nodes in an index that is not too big and using mostly graph traversals from then on, you should hopefully get performance that consistently beats a relational database on large data.
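The recommended pattern, one index lookup to find a few start nodes and traversal for everything after that, can be sketched with plain dicts. The dict `name_index` stands in for the Lucene index and `friends` for the relationship lists; both names and the friends-of-friends query are hypothetical illustrations, not Neo4j's actual index API.

```python
from typing import Dict, List, Set


def friends_of_friends(name_index: Dict[str, List[int]],
                       friends: Dict[int, List[int]],
                       name: str) -> Set[int]:
    """Nodes two hops away from the indexed start node(s), excluding the start.

    Only the first step touches the (hypothetical) index; the rest just
    follows the neighbour lists, so index size does not affect the hops.
    """
    result: Set[int] = set()
    for start in name_index.get(name, []):        # single index lookup
        for friend in friends.get(start, []):     # hop 1
            for fof in friends.get(friend, []):   # hop 2
                if fof != start:
                    result.add(fof)
    return result
```

Keeping the index small and selective matters because it is the only part of the query whose cost grows with the total data size.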

There are, of course, many ways you could end up with even slower performance than relational databases, possibly even much slower. A discussion of those nasty cases is probably outside the scope of this thread, and I think we need more experts in the room to really illuminate the situation. I can make one small comment on one possible reason why Neo4j might have been slower in your small case. The structures that allow Neo4j to be faster for large traversals, the network of explicit references, also mean that it takes up more space for the same data. A relationship in Neo4j is 33 bytes, while a non-indexed foreign key could be only 4 bytes. I'm not sure of the effective size of an indexed foreign key, but I feel it is likely still smaller than 33 bytes. So, in effect, simply loading the relevant data into memory should take longer for Neo4j than for an RDBMS. This is a small effect and, as you see, it is very quickly overcome by the positive effects of the performance scaling. In your case this happened very soon, after only 100 nodes.
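As a back-of-the-envelope check of the size argument, using only the figures above (33 bytes per Neo4j relationship, 4 bytes per non-indexed foreign key) and writing R for the number of relationships; node records, properties, and index storage are ignored:

```latex
S_{\text{Neo4j}} = 33R \ \text{bytes}, \qquad
S_{\text{FK}} = 4R \ \text{bytes}, \qquad
\frac{S_{\text{Neo4j}}}{S_{\text{FK}}} = \frac{33}{4} = 8.25
```

For example, with R = 10^6 that is about 33 MB versus 4 MB of relationship data to bring into memory, a constant-factor cost that the better scaling of traversals has to pay back.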

Regards, Craig

gg4u

Oct 11, 2014, 7:21:18 AM
to ne...@googlegroups.com
Hi Craig,


I'd like to revive this thread with a test I am doing on a large dataset, i.e. a real case:
4M nodes
100M rels
11GB on disk

Query time is *big*, though indexes appear to be in place...
Would you please have a look at this thread too?

thank you!

Michael Hunger

Oct 11, 2014, 10:35:00 AM
to ne...@googlegroups.com
For your queries, make sure to provide the profiled output.

gg4u

Oct 11, 2014, 12:21:20 PM
to ne...@googlegroups.com
Hi Michael,

I did it here:
https://groups.google.com/forum/#!topic/neo4j/RFeQUoHB0Fk

I did not want to duplicate the link; I just wanted to engage more people in this discussion, because I am not yet able to get a response time comparable with a NoSQL store for accessing a node's first neighbors, and returning paths and traversals takes way too long.
