performance diff in relational and graph databases
From: charu tyagi <charu.tyag...@gmail.com>
Date: Fri, 6 Apr 2012 22:46:43 -0700 (PDT)
Local: Sat, Apr 7 2012 1:46 am
Subject: performance diff in relational and graph databases
Hi,
I am running a test to compare the relational database MySQL with a
graph database. The problem is that with 100 or more nodes the graph
database performs better than the relational one, but with only 10
nodes in both, the relational database performs better than the graph
database. Why is that?
Please reply.

 
From: Chris Albertson <albertson.ch...@gmail.com>
Date: Fri, 6 Apr 2012 23:06:31 -0700
Local: Sat, Apr 7 2012 2:06 am
Subject: Re: [Neo4j] performance diff in relational and graph databases
I bet that with zero nodes the performance is equal between both types
of databases.  In fact the zero-node case would tell you something:
that there is a fixed time to respond to any query.

Those node counts are not realistic.   I got hit by that hard a few
years ago.   What happens with these tiny test cases is that the
ENTIRE database can be cached in RAM by the operating system.  Then
when you build the system and place real-world data in the DBMS, you
have more data than can fit in RAM and the OS actually needs to read
data from disk.

With the small number of nodes you used, the difference may be just the
implementation and nothing to do with database theory.    Try another
test with a few million nodes.  Your first job is to write a random
node generator and let it run a while.
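
A random node generator along these lines could be sketched as follows.
This is only an illustration, not code from the thread: the function name
generate_random_graph, its parameters, and the choice of a fixed number of
outgoing edges per node are all assumptions. It produces plain Python lists
that would still need to be loaded into MySQL and the graph database.

```python
import random


def generate_random_graph(num_nodes, edges_per_node, seed=0):
    """Return (nodes, edges) for a random directed graph.

    nodes is a list of integer ids; edges is a list of (source, target) pairs.
    Hypothetical helper for building identical test data for both databases.
    """
    rng = random.Random(seed)  # fixed seed so both databases get the same data
    nodes = list(range(num_nodes))
    edges = []
    for source in nodes:
        for _ in range(edges_per_node):
            target = rng.randrange(num_nodes)
            if target != source:  # skip self-loops
                edges.append((source, target))
    return nodes, edges


# Small run for illustration; raise num_nodes to millions for a real test.
nodes, edges = generate_random_graph(num_nodes=1000, edges_per_node=3)
```

Using the same seed for every run keeps the two databases comparable,
since both are then loaded with exactly the same nodes and edges.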


--

Chris Albertson
Redondo Beach, California


 
From: israele...@gmail.com
Date: Sat, 7 Apr 2012 17:21:52 +0000
Local: Sat, Apr 7 2012 1:21 pm
Subject: Re: [Neo4j] performance diff in relational and graph databases
What algorithms were you running on the nodes?


 
From: Craig Taverner <cr...@amanzi.com>
Date: Sun, 8 Apr 2012 14:46:43 +0200
Local: Sun, Apr 8 2012 8:46 am
Subject: Re: [Neo4j] performance diff in relational and graph databases

Hi,

I think this is not a surprising result and is actually a very normal
expectation for many cases. However, it is not always going to be true. The
situation is much more complex and many different performance results will
be seen for different scenarios.

You did not explain the nature of your data, graph model or queries, so I
cannot explain explicitly what you are seeing, but I can perhaps explain
this behavior in a more general context. In principle the graph database
should have better scaling query performance if your queries make use of
the local graph. This means you are performing a query that benefits from
traversal performance. In Neo4j traversing a local sub-graph is a very fast
operation. But far more important than that is the fact that the
performance of the traversal is dependent only on the size of the sub-graph
being traversed, not the total database size. For example, if you have a
traversal that touches 1,000 nodes in a 10,000-node graph, it should take X ms.
Then when you load more data into the graph, perhaps getting it up to
1,000,000 nodes in total (but leaving the sub-graph at 1,000) the traversal
performance should remain about X ms. This is because traversal is
following a set of explicit references or pointers to the next data, and
the performance of that 'linked list' is unrelated to the total data size.
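
As a rough illustration of that point, here is a toy model in Python. It
is not Neo4j and says nothing about Neo4j's internals: the graph is just an
in-memory dict of adjacency lists, and the names traverse and build_graph
are made up for this sketch. The sub-graph is a 1,000-node chain, and the
larger graph is kept at 100,000 nodes (rather than 1,000,000) only so the
example runs quickly.

```python
from collections import deque


def traverse(adjacency, start):
    """Breadth-first traversal; returns the number of nodes visited."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:  # follow explicit references only
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen)


def build_graph(total_nodes, subgraph_size):
    """Chain of subgraph_size nodes starting at node 0; other nodes are isolated."""
    adjacency = {n: [] for n in range(total_nodes)}
    for n in range(subgraph_size - 1):
        adjacency[n].append(n + 1)
    return adjacency


small = build_graph(total_nodes=10_000, subgraph_size=1_000)
large = build_graph(total_nodes=100_000, subgraph_size=1_000)

# The traversal does the same amount of work in both graphs, because it
# never looks at nodes it cannot reach from the start node.
visited_small = traverse(small, 0)
visited_large = traverse(large, 0)
```

Both calls visit the same 1,000 nodes; the extra unreachable nodes in the
larger graph are never touched.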

In a relational database this is not true. Since all data is in tables,
finding data usually means following a foreign key, which is itself a
column of the table. Finding the row that key points to requires either an
exhaustive search (O(N)) or an index lookup (perhaps O(log N)). Neither is
as fast as the graph's explicit reference (O(1)). So for this particular
situation the graph scales very much better than the table.
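
The three lookup costs can be sketched in plain Python. This is a toy
model under stated assumptions, not how MySQL or Neo4j store data: the
"table" is a list of (id, foreign_key) tuples, the "index" is a sorted
list searched with bisect, and the Node class and function names are
invented for the example.

```python
import bisect

# A "table" of rows: (id, foreign_key). Ids are even numbers so that an id
# is never accidentally equal to its position in the list.
rows = [(i * 2, ((i + 1) % 1000) * 2) for i in range(1000)]


def scan_lookup(rows, key):
    """Exhaustive search over the table: O(N) comparisons."""
    for row in rows:
        if row[0] == key:
            return row
    return None


sorted_ids = [row[0] for row in rows]  # stand-in for an index (already sorted)


def index_lookup(rows, sorted_ids, key):
    """Binary search over a sorted index: O(log N) comparisons."""
    pos = bisect.bisect_left(sorted_ids, key)
    if pos < len(sorted_ids) and sorted_ids[pos] == key:
        return rows[pos]
    return None


class Node:
    """Graph-style record holding a direct reference to its neighbour."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.next = None


graph_nodes = [Node(row[0]) for row in rows]
for i, node in enumerate(graph_nodes):
    node.next = graph_nodes[(i + 1) % len(graph_nodes)]

first = rows[0]
by_scan = scan_lookup(rows, first[1])                # O(N)
by_index = index_lookup(rows, sorted_ids, first[1])  # O(log N)
by_reference = graph_nodes[0].next                   # O(1): one hop, no search
```

All three reach the same record; only the amount of searching needed to
get there differs.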

Of course the real world is much more complex. You are most likely also
using indices (e.g. Lucene) in Neo4j, and therefore also getting some of the
same types of performance characteristics you would see in an RDBMS. However,
if you structure things well, using Lucene only to find a limited set of
key start nodes in an index that is not too big, and using mostly graph
traversals from then on, you should hopefully get database performance that
consistently beats a relational database on large data.

There are, of course, many ways you could end up with even slower
performance than relational databases, possibly even much slower. A
discussion of those nasty cases is probably outside the scope of this
thread, and I think we need more experts in the room to really illuminate
the situation. I can make one small comment on one possible reason why
Neo4j might have been slower in your small case. The structures that allow
Neo4j to be faster for large traversals, the network of explicit
references, also mean that it will take up more space for the same data. A
relationship in Neo4j is 33 bytes, while a non-indexed foreign key could be
only 4 bytes. I'm not sure of the effective size of an indexed foreign key,
but feel it is likely still smaller than 33 bytes. So in effect, simply
loading the relevant data into memory should take longer for Neo4j than
RDBMS. This is a small effect, and as you see is very quickly overcome by
the positive effects of the performance scaling. In your case this happened
very soon, after only 100 nodes.
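
To put rough numbers on the size difference, the arithmetic can be
sketched as below. It uses only the two figures quoted above (33 bytes per
relationship, 4 bytes per non-indexed foreign key) and deliberately ignores
everything else, such as node records, properties, row overhead and
indexes, so it is a back-of-the-envelope estimate and not a measurement.
The function name storage_bytes is made up for the example.

```python
NEO4J_RELATIONSHIP_BYTES = 33  # figure quoted in the post
FOREIGN_KEY_BYTES = 4          # figure quoted in the post (non-indexed key)


def storage_bytes(num_links):
    """Return (graph_bytes, foreign_key_bytes) for num_links links."""
    graph_bytes = num_links * NEO4J_RELATIONSHIP_BYTES
    foreign_key_bytes = num_links * FOREIGN_KEY_BYTES
    return graph_bytes, foreign_key_bytes


for num_links in (10, 100, 1_000_000):
    graph_bytes, fk_bytes = storage_bytes(num_links)
    ratio = graph_bytes / fk_bytes
    print(f"{num_links} links: graph {graph_bytes} B, "
          f"foreign keys {fk_bytes} B, ratio {ratio:.2f}")
```

The ratio is the same at every size, so with only a handful of links the
absolute difference is tiny, which fits the description of this as a small
effect.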

Regards, Craig


 