Hi,
I think this is not a surprising result and is actually a very normal
expectation for many cases. However, it is not always going to be true. The
situation is much more complex and many different performance results will
be seen for different scenarios.
You did not explain the nature of your data, graph model or queries, so I
cannot explain explicitly what you are seeing, but I can perhaps explain
this behavior in a more general context. In principle the graph database
should have better scaling query performance if your queries make use of
the local graph. This means you are performing a query that benefits from
traversal performance. In Neo4j traversing a local sub-graph is a very fast
operation. But far more important than that is the fact that the
performance of the traversal is dependent only on the size of the sub-graph
being traversed, not the total database size. For example, if you have a
traversal that touches 1,000 nodes in a 10,000 graph, it should take X ms.
Then when you load more data into the graph, perhaps getting it up to
1,000,000 nodes in total (but leaving the sub-graph at 1,000) the traversal
performance should remain about X ms. This is because traversal is
following a set of explicit references or pointers to the next data, and
the performance of that 'linked list' is unrelated to the total data size.
In a relational database this is not true. Since all data is in tables,
finding data usually means following a foreign key, which is itself a
column of the table. To find this key requires an exhaustive search (O(N))
or using an indexed key (perhaps O(log N)). Neither is as fast as the graph
explicit reference (O(1)). So for this particular situation the graph
scales very much better than the table.
Of course the real world is much more complex, you are most likely also
using indices (eg. lucene) in Neo4j and therefor also getting some of the
same types of performance characteristics you would see in Neo4j. However,
of you structure things well, only use lucene for finding a limited set of
key start nodes in an index that is not too big, and using mostly graph
traversals from then on, you should hopefully get database performance that
consistently beats relational database on large data.
There are, of course, many ways you could end up with even slower
performance than relational databases, possible even much slower. A
discussion of those nasty cases is probably outside the scope of this
thread, and I think we need more experts in the room to really illuminate
the situation. I can make one small comment on one possible reason why
Neo4j might have been slower in your small case. The structures that allow
Neo4j to be faster for large traversals, the network of explicit
references, also mean that it will take up more space for the same data. A
relationship in Neo4j is 33 bytes, while a non-indexed foreign key could be
only 4 bytes. I'm not sure of the effective size of an indexed foreign key,
but feel it is likely still smaller than 33 bytes. So in effect, simply
loading the relevant data into memory should take longer for Neo4j than
RDBMS. This is a small effect, and as you see is very quickly overcome by
the positive effects of the performance scaling. In your case this happened
very soon, after only 100 nodes.
Regards, Craig
On Sat, Apr 7, 2012 at 7:46 AM, charu tyagi <charu.tyag
...@gmail.com> wrote:
> hii
> i am doing a test to see the difference between relational databse
> mysql and graph database.the problem is that for 100 and above nodes
> graph databases' performance is better than relational... but if i
> take 10 nodes both in relational and graph databse,relational
> database' performance is better than graph database why is it so?
> plzzzzz reply