Pros and cons of graph databases, especially Neo4j

Michel Domenjoud

Jun 4, 2012, 5:40:38 AM
to ne...@googlegroups.com
Hello,
I'm currently working on graph databases as an R&D subject, and I'm looking for good references on the pros and cons of graph databases.
I have already watched and read a lot of good articles about graph databases, about their position in the NoSQL ecosystem, and also some benchmarks and performance comparisons against relational databases. So I have a lot of arguments in favor of using graph databases (a good fit for highly connected data, powerful for traversals, etc.), and also some examples of use cases usually addressed with relational databases (CMS, e-commerce...).

Now I'm really convinced that graph databases, and especially Neo4j, fit a lot of use cases. But as I was trying to convince some colleagues of Neo4j's benefits, I realized that I lacked arguments against graph databases compared with relational databases, and I was almost presenting Neo4j as a silver bullet, which can't be true.

So here is my question: does anybody have references, precise arguments or use cases that don't fit graph databases but fit much better in relational databases?
I already have some, but I'm intentionally holding them back for now in order to start an open debate :)

Thanks in advance for your answers!

Johnny Weng Luu

Jun 4, 2012, 7:34:26 AM
to ne...@googlegroups.com
One con for me is that you can't shard Neo4j. That means you have your whole dataset on ONE server.

So you have to scale vertically each time you want to have more data capacity.

Johnny

Radhakrishna Kalyan

Jun 4, 2012, 8:20:19 AM
to ne...@googlegroups.com
Hi

This was my first question to Peter after his presentation at the 2011 Dev-Con in Karlskrona.

As is often said, and as I have come to realize too, NoSQL does not tell you not to use a relational database; rather, it suggests replacing the relational database with Neo4j where one sees relations (complex or not) among the data.
I hope you agree.

I do agree that neo4j is not a silver bullet for every case.

I see it like this:
I will NOT use Neo4J in an application if:

1) The application has only tables with no relations among them, i.e. no foreign-key relations among tables.
2) The application is a legacy application, like mainframes with DB2 containing stored procedures etc., where migrating to a new DB is a major issue.
3) The application code contains hard-coded SQL queries to fetch the data from the database, which makes it hard to migrate.

These are the few cases I found when I was looking to migrate my own application, built on Swing with SQLite as the backend. I used SQL queries within my code.

I would have been spared this if I had used JPA, because Spring-Data-Neo4J supports cross-store persistence.
This means the application can persist to Neo4j and any relational DB using the same entity.
Please consider looking at Spring-Data-Neo4J.

Please comment if there is any misconception in my opinion.

Kalyan
--
Thanks and Regards
N Radhakrishna Kalyan

Tero Paananen

Jun 4, 2012, 8:42:08 AM
to ne...@googlegroups.com
> I see it like this:
> I will NOT use Neo4J in an application if:
>
> 1) The application have only tables with no relations among them. i.e No
> foreign key relation among tables.
> 2) If the application is a legacy application like Mainframes and DB2
> containing stored procedures etc. where migrating to a new DB is a major
> issue.
> 3) If the application code contains hard coded SQL queries to fetch the data
> from the database which makes it hard to migrate.

I don't think reasons 2 and 3 are good reasons (or they're not formulated well)
to keep maintaining a legacy solution.

It's a cost vs. benefits thing. If you can gain benefits (performance, cost of
maintenance, etc.) at a cost that's acceptable, then you should migrate to
a different solution.

-TPP

Radhakrishna Kalyan

Jun 4, 2012, 9:09:51 AM
to ne...@googlegroups.com
I agree; my reasoning for 2 and 3 is really about cost. Sorry for not mentioning that.

Johnny Weng Luu

Jun 4, 2012, 9:40:45 AM
to ne...@googlegroups.com
It's hard to imagine data with no relations.

Sooner or later I think you would like to have relations between different data entities.

Everything is connected.

Johnny

Charles Bedon

Jun 4, 2012, 10:20:54 AM
to ne...@googlegroups.com
Hello

For me, the biggest advantage of a NoSQL approach is its schema-less nature. It's also a disadvantage, since it's now your responsibility to ensure the integrity of the model, but it gives you a lot of freedom if you have to change the model at runtime (I mean, if the application requires it).
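To make "it's now your responsibility to ensure the model integrity" concrete, here is a minimal sketch of an application-level check. The `Device` type, its required properties and the `validate` helper are all hypothetical; the database itself would happily store either property map.

```python
# Hypothetical application-level integrity rules for schema-less nodes.
# The "Device" type and its required properties are illustrative only.
REQUIRED = {
    "Device": {"name": str, "serial": str},
}

def validate(node_type, props):
    """Return a list of integrity violations for a node's property map."""
    errors = []
    for key, expected in REQUIRED.get(node_type, {}).items():
        if key not in props:
            errors.append(f"missing property '{key}'")
        elif not isinstance(props[key], expected):
            errors.append(f"property '{key}' should be {expected.__name__}")
    return errors

# A schema-less store accepts both maps; only the application rejects the second.
print(validate("Device", {"name": "core-switch", "serial": "A1"}))  # []
print(validate("Device", {"name": 42}))
# ["property 'name' should be str", "missing property 'serial'"]
```

The trade-off Charles describes shows up directly: the rules live in application code, so they are easy to change at runtime, but nothing stops a different client from writing invalid nodes.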

-------------------------------------
Charles Edward Bedón Cortázar
Network Management, Data Analysis and Free Software http://www.neotropic.co
Open Source Network Inventory for the masses!  http://kuwaiba.sourceforge.net
Linux Registered User #386666


---- On Mon, 04 Jun 2012 08:40:45 -0500 Johnny Weng Luu <johnny....@gmail.com> wrote ----

Johnny Weng Luu

Jun 4, 2012, 10:22:31 AM
to ne...@googlegroups.com
I personally think that all kinds of data should/could be saved in a graph database.

The world is one big graph and data is usually representing real world entities.

Johnny

Alan Robertson

Jun 4, 2012, 12:18:06 PM
to ne...@googlegroups.com
On 06/04/2012 05:34 AM, Johnny Weng Luu wrote:
> One con for me is the you can't shard Neo4j. That means that you have your whole dataset in ONE server.
>
> So you have to scale vertically each time you want to have more data capacity.
More specifically: more write throughput. If your write throughput is modest, you may never need to scale up.

For example, in my project, the information is dependency information about things in a data center. Although your data center might be large, you are unlikely to add dependency information rapidly (by database standards), and, more importantly, it does not change rapidly by computer standards.

A few thousand dependency updates per day represent a huge churn in the underlying infrastructure, but a very small number of database updates.

I doubt I'll run into write speed problems for this project.  [Famous Last Words!]
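A quick back-of-the-envelope check of the "few thousand updates per day" point. The figure of 5,000 updates per day is an assumption standing in for "a few thousand", and the result is an average: real writes may arrive in bursts, so peak rates can be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_writes_per_second(updates_per_day):
    """Average write rate implied by a daily update count."""
    return updates_per_day / SECONDS_PER_DAY

# "A few thousand" updates per day, taken here as 5,000 for illustration.
rate = avg_writes_per_second(5_000)
print(f"{rate:.3f} writes/s")  # 0.058 writes/s
```

Well under one write per second on average is far below what a single server typically handles, which is the sense in which vertical scaling may never become an issue for this kind of workload.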

The moral of the story: know your application. No benchmark, no matter how nice, looks very much like your application.
-- 
    Alan Robertson <al...@unix.sh> - @OSSAlanR

"Openness is the foundation and preservative of friendship...  Let me claim from you at all times your undisguised opinions." - William Wilberforce

Rick Otten

Jun 4, 2012, 12:49:56 PM
to ne...@googlegroups.com

Saying that you can't shard Neo4j is too general a statement.

I would argue that it is very difficult to shard a densely connected graph. There are, however, many graph data sets that are not densely connected and could be sharded (at the application layer). It depends on your data set and the types of queries you are running.

Another point from this discussion is that a table inherently has relationships. Each field is related to every other field by the fact that they share a row. The only data I can think of that has no inherent relationships would be a simple unordered list of objects. (As soon as you order them, you have a data structure with inherent relationships between the objects.)
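One possible application-layer strategy for the sparsely connected case Rick mentions, sketched under the assumption that the graph splits into disconnected components: put each whole component on one shard, so no relationship ever crosses a shard boundary. The function names are hypothetical, and real graphs usually have at least a few cross-component edges that would need extra handling.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())

def assign_shards(nodes, edges, shard_count):
    """Place whole components on shards so that no edge crosses a shard boundary."""
    shard_of = {}
    sizes = [0] * shard_count
    # Largest components first, each onto the currently smallest shard.
    for comp in sorted(connected_components(nodes, edges), key=len, reverse=True):
        target = sizes.index(min(sizes))
        sizes[target] += len(comp)
        for n in comp:
            shard_of[n] = target
    return shard_of

edges = [("a", "b"), ("b", "c"), ("d", "e")]
shards = assign_shards("abcdef", edges, shard_count=2)
```

For a densely connected graph this degenerates to a single giant component, which is exactly why such graphs are hard to shard this way.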

vinravind

Jun 4, 2012, 7:14:42 PM
to Neo4j
Neo4J only supports 1 billion nodes as of today.

Michael Hunger

Jun 4, 2012, 7:27:33 PM
to ne...@googlegroups.com
That is actually not correct: the limits are 2^35 (about 34 billion) nodes, the same number of relationships, and 2^36 (about 68 billion) properties.

And this is just the current store format; in principle, Neo4j's IDs are 64-bit longs, which could address up to about 2^63 nodes, relationships and properties if the need arises.
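A quick check of the arithmetic behind these limits. The last line assumes the IDs are signed 64-bit Java longs, whose largest value is 2^63 minus 1; that representation detail is an assumption here, not something stated in this thread.

```python
# Capacity implied by the ID widths quoted above.
node_limit = 2 ** 35      # nodes (and, equally, relationships)
property_limit = 2 ** 36  # properties
long_max = 2 ** 63 - 1    # largest value of a signed 64-bit long (assumed ID type)

print(f"{node_limit:,}")      # 34,359,738,368
print(f"{property_limit:,}")  # 68,719,476,736
print(f"{long_max:,}")        # 9,223,372,036,854,775,807
```

Either way, the store-format limit is more than 30 times the "1 billion nodes" figure.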

Michael

Benjamin Gehrels

Jun 4, 2012, 7:30:12 PM
to ne...@googlegroups.com
I would answer this depending on how you want to access your data:

In the relational model, all entities of the same kind are logically
stored together (in a table). So fetching all (or part) of them, or
aggregating over them, is relatively cheap, whereas doing this for the
relationships between entities is harder and needs some index structures
to optimize.

In the graph model, related entities (and their relationships) are
logically stored together. So traversing over them and aggregating over
the traversal results is relatively cheap, whereas doing this for all
elements of one type is harder and needs some index structures to optimize.
Access paradigm:

| model      | entities of the same type | relationships between entities |
|------------|---------------------------|--------------------------------|
| graph      | needs indexing            | cheap                          |
| relational | cheap                     | needs indexing                 |
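The table can be illustrated with a toy in-memory model. This is only a sketch of the two layouts Benjamin describes, using plain Python lists and dicts with hypothetical names; it does not reflect how Neo4j or any RDBMS actually stores data on disk.

```python
from collections import defaultdict

# Toy "relational" layout: rows grouped by type (one list per table).
tables = {
    "Person": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}],
    "Knows": [{"src": 1, "dst": 2}],
}

# Toy "graph" layout: each node carries its own adjacency list.
nodes = {
    1: {"type": "Person", "name": "Ann", "out": [2]},
    2: {"type": "Person", "name": "Bob", "out": []},
}

def all_people_relational():
    # Cheap: the table *is* the set of entities of one type.
    return [row["name"] for row in tables["Person"]]

def friends_relational(person_id):
    # Without an index on Knows.src this scans every relationship row.
    return [r["dst"] for r in tables["Knows"] if r["src"] == person_id]

def friends_graph(node_id):
    # Cheap: follow the node's own adjacency list.
    return list(nodes[node_id]["out"])

def all_people_graph():
    # Without a type index this scans every node in the graph.
    return [n["name"] for n in nodes.values() if n["type"] == "Person"]

# Building the "missing" index closes the gap in each model.
knows_by_src = defaultdict(list)
for r in tables["Knows"]:
    knows_by_src[r["src"]].append(r["dst"])

nodes_by_type = defaultdict(list)
for nid, n in nodes.items():
    nodes_by_type[n["type"]].append(nid)
```

Each model needs exactly one extra index to make its weak access path cheap, which matches the "needs indexing" cells above.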

Michel Domenjoud

Jun 5, 2012, 12:42:50 PM
to ne...@googlegroups.com
Hello,
thanks for your answers,

1 - Graphs are hard to shard: yes, but I don't think this is really an argument in a graph vs. relational database pros & cons debate, as relational databases can be really hard to shard too, depending on the connectivity of the schema.

2 - The main argument I'll keep is that graph databases are known to be more efficient when working mainly on relationships (with traversals), whereas relational databases can be more efficient at finding all elements of a type.

@Benjamin Thanks for your argument about the differences in storage model, it's pretty clear.
But given this argument about indexing needs, it seems that we can in fact use either a relational or a graph database, and we'll always have to define some indexes, as we never use only relationship fetches or only entity fetches.

Markus Gattol

Jun 20, 2012, 4:21:05 AM
to ne...@googlegroups.com
Hi Michael,

thanks for pointing this out... I hadn't looked at Neo4j for a while and also thought it was (still) limited to around 1 billion nodes per database. Did this change some time during the last six months or so, or was this limit just a myth from the start? I haven't checked yet, but I assume the info you just pointed out is documented?

Peter Neubauer

Jun 20, 2012, 4:22:15 AM
to ne...@googlegroups.com
Here you go :)

http://docs.neo4j.org/chunked/snapshot/capabilities-capacity.html#capabilities-data-size

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Niels Hoogeveen

Jun 20, 2012, 5:14:50 PM
to Neo4j
I'm only just reading this comment now, and I would like to counter the
claim that Neo4j can't be sharded.

While there are no specific facilities in Neo4j (yet) to do sharding,
it's not impossible to do so.

One solution is to write a wrapper around the Node#createRelationship
method that tests whether the two involved nodes reside in the same
database. If so, a normal relationship is created.

If the two databases are different, the following steps can be taken:

1) Look up in the source database (through an index or some other
mechanism) whether a shadow node exists for the node the relationship is
created to.
2) If no such shadow node exists, create it in the source database and
make sure it can be looked up in the future (add it to an index). Add
a property to the shadow node referencing the node id of the target
node.
3) Create a relationship from the source node to the shadow node.
4) Look up in the target database whether a shadow node exists for the
source node.
5) If no such shadow node exists, create it and make sure it can be
looked up in the future. Add a property to the shadow node
referencing the node id of the source node.
6) Create a relationship from the shadow node to the target node.

(NB. source node means the node the relationship is created from...
target node means the node the relationship is created to... source
database means the database the source node resides in... target
database means the database the target node resides in)

The remainder of the methods of Node and Relationship need to be
wrapped as well, so the lookup of properties and relationships is done
on the actual node and not on its shadow.
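The six steps above can be sketched with two in-memory stand-ins for separate databases. `Database`, `shadow_for` and `create_relationship` are hypothetical names, not Neo4j API; the shadow index key also records the remote database name, an assumption added so the idea extends beyond two shards. The sketch is single-threaded and does not wrap the remaining Node and Relationship methods.

```python
import itertools

class Database:
    """Hypothetical in-memory stand-in for one Neo4j instance (a shard)."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}         # node id -> property dict
        self.rels = []          # (start id, end id, type)
        self.shadow_index = {}  # (remote db name, remote node id) -> local shadow id
        self._ids = itertools.count()

    def create_node(self, **props):
        nid = next(self._ids)
        self.nodes[nid] = props
        return nid

    def shadow_for(self, remote_db, remote_id):
        """Steps 1-2 (and 4-5): find or create the local shadow of a remote node."""
        key = (remote_db.name, remote_id)
        if key not in self.shadow_index:
            self.shadow_index[key] = self.create_node(
                shadow_of_db=remote_db.name, shadow_of_id=remote_id)
        return self.shadow_index[key]

def create_relationship(src_db, src_id, dst_db, dst_id, rel_type):
    """Hypothetical wrapper following the shadow-node steps."""
    if src_db is dst_db:
        src_db.rels.append((src_id, dst_id, rel_type))
        return
    # Steps 1-3: source node -> shadow of target, inside the source database.
    shadow_of_dst = src_db.shadow_for(dst_db, dst_id)
    src_db.rels.append((src_id, shadow_of_dst, rel_type))
    # Steps 4-6: shadow of source -> target node, inside the target database.
    shadow_of_src = dst_db.shadow_for(src_db, src_id)
    dst_db.rels.append((shadow_of_src, dst_id, rel_type))

a, b = Database("a"), Database("b")
alice = a.create_node(name="Alice")
bob = b.create_node(name="Bob")
create_relationship(a, alice, b, bob, "KNOWS")
```

Note that `shadow_for` is a lookup followed by a create; with concurrent callers those two operations would need to be made atomic, or two shadows for the same remote node could be created.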

It would be neat if such features could be integrated somewhere in a
future Neo4j release.

Axel Morgner

Jun 20, 2012, 5:29:34 PM
to ne...@googlegroups.com
Here's another argument, not very much stressed in this thread so far:

Granted that most use cases can be modeled in either world, modeling a problem as a graph is easier for human beings. And the more 'graphy' a problem is, the better it fits into a graph database.

Don't underestimate the fact that you can make your managers understand your database!


On 04.06.2012 11:40, Michel Domenjoud wrote:

Chris Albertson

Jun 20, 2012, 5:49:55 PM
to ne...@googlegroups.com
On Wed, Jun 20, 2012 at 2:14 PM, Niels Hoogeveen <nielsh...@gmail.com> wrote:
> Just reading this comment now, and I would like to counter the claim
> made.
>
> While there are no specific facilities in Neo4j (yet) to do sharding,
> it's not impossible to do so.
>
> One solution is to write a wrapper around the Node#createRelationship
> method, where a test is added whether the two involved nodes have the
> same database. If so, then a normal relationship is created.
>
> If the two database are different, the following steps can be taken:
>
> 1) lookup in the source database (through an index or some other
> mechanism) if a shadow node exists for the node the relationship is
> created to.
> 2) if no such shadow node exists,


For the above to work, you MUST have some way to LOCK the database.
Any time you do a lookup and then make a decision based on that lookup,
you must hold a lock. This is what makes sharding hard. While the lock
is held, no other updates can be made. If there are many users, this is
a huge performance bottleneck. I once worked on a system where there
were a dozen users, and each needed to hold a lock for about 1/10th of a
second. It should be clear that doing 10 operations per second was not
possible. We had to give up and move to a "real" SQL database that had
better, finer-grained locking. We went with PostgreSQL. A web app can
potentially have very many users, and if only one can hold a lock at a
time you are "sunk".
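A minimal sketch of the two points above, under the assumption of a single process with threads (separate database servers would need a distributed lock or transaction, which this does not cover). The lookup and the create share one lock, so concurrent callers cannot both create a shadow for the same key; using one lock per key rather than one global lock is an example of the finer-grained locking Chris mentions. `ShadowIndex` is a hypothetical name.

```python
import threading
from collections import defaultdict

def max_serial_ops_per_second(lock_hold_seconds):
    """Upper bound on throughput when every operation takes the same single lock."""
    return 1.0 / lock_hold_seconds

class ShadowIndex:
    """Hypothetical get-or-create index with one lock per key, not one global lock."""

    def __init__(self):
        self._shadows = {}
        self._next_id = 0
        self._key_locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects _key_locks and _next_id

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks[key]

    def _allocate_id(self):
        with self._guard:
            nid = self._next_id
            self._next_id += 1
            return nid

    def get_or_create(self, key):
        # Lookup and create happen under the same lock, so two threads cannot
        # both see "missing" and both create a shadow for this key.
        # Callers working on different keys do not block each other.
        with self._lock_for(key):
            if key not in self._shadows:
                self._shadows[key] = self._allocate_id()
            return self._shadows[key]

print(max_serial_ops_per_second(0.1))  # 10.0
```

With a single global lock held for 0.1 s per operation, the whole system tops out at 10 operations per second no matter how many users are waiting; per-key locks only serialize callers that touch the same key.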
Chris Albertson
Redondo Beach, California

Niels Hoogeveen

Jun 21, 2012, 9:25:58 AM
to Neo4j
Hi Chris,

I am not sure why you would have to lock an entire database to make
sharding work, especially when it comes to simple lookups. Would you
care to expand?

On Jun 20, 11:49 pm, Chris Albertson <albertson.ch...@gmail.com>
wrote:

Ven Karri

Jul 12, 2014, 12:49:27 AM
to ne...@googlegroups.com, johnny....@gmail.com
Well said. I couldn't agree more.

Shireesh

Jul 14, 2014, 7:59:44 AM
to ne...@googlegroups.com, charle...@zoho.com

I am still confused about the schema-less nature.

As far as I can see, Neo4j still gives us a tightly coupled architecture.

Imagine the graph grows big as the project progresses, and one day we get a new requirement that makes us introduce a new node into the existing structure.
Now this will have a cascading effect all over the graph: all the existing traversals need to be reworked to include the new node and relationship.

That will have an impact on all the components, as the whole graph is connected.

Am I missing anything?

Thanks,
Shireesh.

Benjamin Makus

Jul 14, 2014, 11:02:54 AM
to ne...@googlegroups.com, charle...@zoho.com
That is a problem in your application's architecture. If you use MySQL and have a one-to-many relation between A and B, and now you need to store an entity C between A and B, you've got to alter the schema and run an update on all entries.
(If you can tell us a solution that works in SQL, then there's a 99% chance that it works in Neo4j, too.)

Schemaless means that each node (and relationship) can store whatever you want it to. Node 1 can have { a: true, b: "B" } and node 2 can have { a: 42, b: ["A", "B"], c: false }. So there's no schema; that means you can't say all a-properties are of type boolean, and you can't say every node has 3 properties.

Btw: if you need to add a new node between some existing nodes, Neo4j won't care. You can do whatever you want:
Node 1 is related to Node 2,
and after the update you can have: Node 1 is related to Node 2, and Node 1 is related to Node 3, which is related to Node 2. No problem.

Again: there's no schema that says Node 1 can only have 1 relationship; it can have as many as you like, and it can relate to any other node, no matter what that node is in your application.

For Neo4j, all your data are just nodes, nothing more. They've got no type and arbitrary content. If your application says that Node 1 is a Car and Node 2 is a CryptoKey, you can still tell the database to relate them.
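A minimal in-memory sketch of Benjamin's example, assuming a toy property graph where nodes are free-form property maps and relationships are (start, type, end) triples. The `RELATED_TO` type and the helper functions are hypothetical, not Neo4j API. Adding node 3 between 1 and 2 needs no schema change; whether existing traversals should now also pass through node 3 remains an application decision.

```python
# Toy property graph: nodes are free-form property maps, relationships are
# (start, type, end) triples. No schema constrains either.
nodes = {
    1: {"a": True, "b": "B"},
    2: {"a": 42, "b": ["A", "B"], "c": False},
}
rels = [(1, "RELATED_TO", 2)]

def add_node(node_id, **props):
    nodes[node_id] = props

def relate(start, rel_type, end):
    rels.append((start, rel_type, end))

def neighbours(node_id):
    return [end for start, _, end in rels if start == node_id]

# Insert node 3 "between" 1 and 2 while keeping the direct 1 -> 2 link.
add_node(3, kind="Car")
relate(1, "RELATED_TO", 3)
relate(3, "RELATED_TO", 2)

print(neighbours(1))  # [2, 3]
print(neighbours(3))  # [2]
```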

shireesh adla

Jul 14, 2014, 11:40:35 AM
to ne...@googlegroups.com

Thanks Benjamin.
I now follow what you explained about the schema.
Schemaless applies to individual nodes, and I [wrongly] assumed it applied to the graph as a whole.

Now, coming to the application architecture problem:
we know that we cannot freeze our architecture because requirements keep changing, so what can we do to make it more flexible?

I understand it's a different problem altogether, but can we come up with a "best-practice" graph structure that can handle these kinds of scenarios,
one that, if followed, gives a more flexible graph structure that is resilient to new changes?

Shireesh.



--
eThanks & eRegards,
Shireesh Adla.
09246081931