sharding problem about neo4j

Xiaobing LIU

Nov 5, 2014, 1:12:01 AM
to ne...@googlegroups.com
Hi all,
    I am facing the problem of sharding a huge database across different machines in a cluster. My application uses Neo4j, and the graph is on the scale of a billion nodes and relationships. The scale will increase in the future. I have browsed the Neo4j 2.1.5 documentation and found a section about cache-based sharding. According to the documentation, I think Neo4j can't shard automatically, so I may have to develop a wrapper to implement the sharding. Does anybody have experience with this problem? How did you develop the wrapper? Another problem: how can you guarantee that relationships between nodes that have been sharded to different machines are not broken? I also found that Neo4j can only be configured in HA mode (master-slaves). Can it be configured in cluster mode? Thanks in advance.

Andrii Stesin

Nov 5, 2014, 10:11:31 AM
to ne...@googlegroups.com
Just use a cluster with 3 Neo4j nodes.
Configure the application so that:
  • all writes go to a single master, node #0 (or teach HAProxy to determine dynamically which node is your master right now)
  • reads for the "north pole" part of your graph (or its 1st cluster) always go to node #1
  • reads for the "south pole" part of your graph (or its 2nd cluster) always go to node #2
  • ...etc...
The cluster will do the job for you, but ask your engineer to do his best at setting up HAProxy correctly. I've experimented with this setup for a while, evaluating its cost/benefit ratio; it works perfectly. Also, the 2.1.5 release notes tell us that some improvements were made.
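
For illustration only, the routing rule above could be sketched at the application level like this. It is a minimal sketch, not a tested setup: the host names, the "north"/"south" region keys and the `pick_endpoint` helper are all made up for the example, and the master address is hard-coded here, whereas in a real HA cluster it can change after a failover (which is the part the proxy would have to detect).

```python
import zlib

# Hypothetical addresses; replace with the real cluster members.
MASTER = "http://neo4j-0:7474"
READ_NODES = {
    "north": "http://neo4j-1:7474",  # serves reads for the "north pole" part
    "south": "http://neo4j-2:7474",  # serves reads for the "south pole" part
}


def pick_endpoint(region: str, is_write: bool) -> str:
    """Return the base URL that should serve a request for this region."""
    if is_write:
        # All writes go to the single master (assumed static in this sketch).
        return MASTER
    if region in READ_NODES:
        # Known region: always the same node, so its cache stays warm.
        return READ_NODES[region]
    # Unknown region: fall back to a stable hash, so repeated reads for the
    # same key still land on the same node.
    nodes = sorted(READ_NODES.values())
    return nodes[zlib.crc32(region.encode("utf-8")) % len(nodes)]
```

Every node in the cluster still holds the whole graph, so this only decides which node's cache serves which reads; no relationship is cut by the routing.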

WBR, Andrii

Michael Hunger

Nov 5, 2014, 11:09:41 AM
to ne...@googlegroups.com
Can you tell us more about your data model and the use-cases you run on top of the data?

Thanks a lot, Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Xiaobing LIU

Nov 5, 2014, 10:58:58 PM
to ne...@googlegroups.com
Hi Andrii,
    Did you try this with the Enterprise edition? I think the Community edition can't synchronize the master and slaves automatically. Is that right? If it can, how do you synchronize the master and slaves?

Thanks in advance

Andrii Stesin

Nov 6, 2014, 3:19:14 AM
to ne...@googlegroups.com
Yes, I'm evaluating the Enterprise package (startup license), and yes, the Community package doesn't include cluster functionality.

WBR,
Andrii

LIU Xiaobing

Nov 6, 2014, 3:26:27 AM
to ne...@googlegroups.com
Hi Andrii,
    Thanks for your reply. Do you think Neo4j can meet real-time data needs, for both writes and reads? Batch insertion can't be used under real-time conditions, even though its write performance is high. I found that write performance is a critical problem under real-time conditions, such as fetching data from the internet and doing real-time computation on it.

Regards 

--
Best Regards
LIU Xiaobing 刘小兵

Andrii Stesin

Nov 10, 2014, 11:15:36 AM
to ne...@googlegroups.com
Batch insertion is a very special and specific case, which is generally done once, when you initially populate your graph with data taken from elsewhere.

Whether Neo4j on your particular hardware, with your particular JVM settings, in your particular cluster configuration, with your particular graph and queries, will suit your needs, I have no idea :)

My tests on my datasets and my hardware taught me that real-world performance depends on your Cypher skills more than on any other factor. Once I was able to start with a query that ran for 5.5 seconds and handcraft it (indexing, query rethink and redesign) to run in under 50 milliseconds with a warm cache.
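
To make the "indexing, query rethink" part a bit more concrete, here is a small sketch. It is not the query from my tests: the `Person` label, the `name` property, the `KNOWS` relationship type and the `build_commit_payload` helper are all invented for the example. It only builds the JSON body intended for the transactional HTTP endpoint (`/db/data/transaction/commit`) and does not talk to a server.

```python
import json

# Neo4j 2.x schema index on a hypothetical label/property pair.
CREATE_INDEX = "CREATE INDEX ON :Person(name)"

# Anchor the match on the indexed label/property and pass the value as a
# parameter ({name} is the 2.x parameter syntax) instead of inlining it.
LOOKUP = (
    "MATCH (p:Person) WHERE p.name = {name} "
    "MATCH (p)-[:KNOWS]->(friend) "
    "RETURN friend.name"
)


def build_commit_payload(statement, parameters=None):
    """Build the JSON body for one Cypher statement (hypothetical helper)."""
    body = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return json.dumps(body)


# The index statement and the lookup are sent as two separate requests.
index_request = build_commit_payload(CREATE_INDEX)
lookup_request = build_commit_payload(LOOKUP, {"name": "Alice"})
```

Whether this helps on a given graph is something only profiling the real queries can tell.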

WBR,
Andrii 