Neo4J for "big data"


RickBullotta

Oct 12, 2012, 12:33:12 PM
to ne...@googlegroups.com
I was giving some thought to the capacities/limits of Neo4J, and I realized that the 32 billion node/relationship limit, while it seems large, is not really all that huge. Imagine a system (web server, app server, machine) that generates some interesting data at a fairly high rate (tweets, web hits, etc.). And let's be conservative and assume that each of those data instances requires a node and 3 relationships. The relationship limit then becomes the limiting factor, capping you at roughly 10 billion events. If you do the math, that only allows data to be stored at 300 or so events per second for one year, and then you're maxed out. If the data requires any more than a couple of properties, then property storage becomes the limiting factor and the capacity shrinks even further.
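
The back-of-the-envelope math above can be checked with a quick sketch. It takes the 32 billion figure from the post at face value and assumes 3 relationships per event and a 365-day year; the function names are illustrative only, and property storage is not modeled.

```python
# Rough capacity estimate; all inputs are assumptions taken from the post.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignores leap years)


def max_events(record_limit: int, records_per_event: int) -> int:
    """Number of whole events that fit under a record limit."""
    return record_limit // records_per_event


def sustainable_rate(record_limit: int, records_per_event: int,
                     years: float = 1.0) -> float:
    """Events per second that would fill the store in `years` years."""
    return max_events(record_limit, records_per_event) / (years * SECONDS_PER_YEAR)


RELATIONSHIP_LIMIT = 32_000_000_000  # figure quoted in the post
RELS_PER_EVENT = 3                   # the post's "conservative" assumption

events = max_events(RELATIONSHIP_LIMIT, RELS_PER_EVENT)
rate = sustainable_rate(RELATIONSHIP_LIMIT, RELS_PER_EVENT)

print(f"max events: {events:,}")
print(f"events/sec sustained for one year: {rate:.1f}")
```

With these assumptions the cap is about 10.7 billion events, which works out to a bit under 340 events per second for a year, in line with the "300 or so" estimate.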

Given that we live in an era of big (and getting bigger) data, are there plans to lift those limitations in future releases?

Wes Freeman

Oct 12, 2012, 12:40:19 PM
to ne...@googlegroups.com
I'm also interested in eliminating those limits. My current plan is to just spread my graph out to multiple instances (easy for my use case--not so easy for others). I've heard there is a team of people working on solving this for neo4j, so hopefully this won't be a concern for much longer. 
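
One way the multi-instance approach could look, as a minimal sketch only: it assumes the graph splits cleanly by some partition key (for example a source or customer ID) so that no relationship crosses instances, which is exactly the part that is not so easy in general. The instance names and the `shard_for` / `instance_for` helpers are hypothetical and not part of Neo4j.

```python
import hashlib

# Hypothetical list of independent graph instances; not real hosts.
INSTANCES = [
    "graph-0.example.internal",
    "graph-1.example.internal",
    "graph-2.example.internal",
]


def shard_for(partition_key: str, num_instances: int) -> int:
    """Map a partition key to an instance index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


def instance_for(partition_key: str) -> str:
    """Return the (hypothetical) instance that owns this key's subgraph."""
    return INSTANCES[shard_for(partition_key, len(INSTANCES))]


# Every node and relationship for a given key goes to the same instance.
print(instance_for("customer-42"))
```

Simple modulo routing like this reshuffles most keys if the number of instances changes, so it only suits a fixed instance count.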

Interested in hearing some official commentary as well. :)

Wes


abhijith K

Oct 12, 2012, 11:24:45 PM
to ne...@googlegroups.com
With the VFS in place (Linux) and NFS as the filesystem, can't we actually shard the graph DB across many machines? Just a thought that came to mind, hence sharing.