Hi - I'd like to understand the differences in implementation and performance between storing data in a graph database such as Neo4j vs. a document store like MongoDB. Can MongoDB be used to create user profile documents, where each profile may contain the user IDs of one or more other users (i.e., to represent a graph of users)? In that case, how does the performance of MongoDB compare with Neo4j? Also, are there areas where Neo4j and MongoDB complement each other (rather than compete as a graph solution)? I am trying to pick the ideal stack for building a social network. Any recommendations would be great! Thanks in advance!
MongoDB works much better when all of the data on a given partition fits in memory. If any of the nodes get bogged down in disk I/O, MongoDB doesn't do so well. What makes it scale is that many of the data collections that are easy to store in MongoDB are also easy to partition. This lets you spread your data across a number of servers, so a data set larger than the memory capacity of a single server can still be held almost entirely in memory.
Partitioning graph data can be hard – it depends on the graph – so it is particularly challenging to find ways to keep large graphs (in general) entirely in memory when you outgrow your commodity hardware resources.
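To make the partitioning point concrete, here is a rough sketch in plain Python (no database involved). It assumes profile documents that carry a list of friend IDs, as described in the question, and a hypothetical hash-by-user-ID sharding scheme; the names `shard_for`, `cross_shard_edges` and `shards_touched` are made up for illustration and are not part of MongoDB or Neo4j. It counts how many friend links cross shard boundaries and which shards a friends-of-friends lookup has to read.

```python
import zlib

# Hypothetical profile documents: each one stores the IDs of its friends.
profiles = {
    "alice": {"_id": "alice", "friends": ["bob", "carol"]},
    "bob": {"_id": "bob", "friends": ["alice", "dave"]},
    "carol": {"_id": "carol", "friends": ["alice"]},
    "dave": {"_id": "dave", "friends": ["bob"]},
}


def shard_for(user_id, num_shards):
    """Pick a shard from a stable hash of the user ID (assumed scheme)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


def cross_shard_edges(profiles, num_shards):
    """Return (links that cross shards, total links) for directed friend links."""
    crossing = 0
    total = 0
    for user_id, profile in profiles.items():
        for friend_id in profile["friends"]:
            total += 1
            if shard_for(user_id, num_shards) != shard_for(friend_id, num_shards):
                crossing += 1
    return crossing, total


def shards_touched(profiles, start, hops, num_shards):
    """Return the shards whose documents are read when expanding `hops` levels."""
    frontier = {start}
    seen = {start}
    touched = set()
    for _ in range(hops):
        next_frontier = set()
        for user_id in frontier:
            # Reading this profile means going to the shard that holds it.
            touched.add(shard_for(user_id, num_shards))
            for friend_id in profiles[user_id]["friends"]:
                if friend_id not in seen:
                    seen.add(friend_id)
                    next_frontier.add(friend_id)
        frontier = next_frontier
    return touched


if __name__ == "__main__":
    crossing, total = cross_shard_edges(profiles, num_shards=3)
    print(f"{crossing} of {total} friend links cross shards")
    print("shards read for friends-of-friends of alice:",
          sorted(shards_touched(profiles, "alice", hops=2, num_shards=3)))
```

Each profile lands on exactly one shard, so the documents themselves partition cleanly. The friend links do not respect that split, though, and each extra hop in a traversal tends to pull in more shards, which is the part that is hard to keep fast once the graph no longer fits on one machine.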
An interesting hybrid technology (a graph layer on a massively parallel datastore) I've been looking at lately is Titan. It lets you run the full TinkerPop suite on top of HDFS. Neo4j is way slicker and much more mature, but if scaling is a serious concern, Titan may be worth a look - http://thinkaurelius.github.com/titan/
A similar, competing project is Giraph - http://incubator.apache.org/giraph/
And for this type of graph on top of Hadoop, there is even InfoGrid...
/peter