Neo4j 3.1: Embedded Causal Clustering?

Timo Tiuraniemi

Dec 5, 2016, 2:26:29 PM
to Neo4j
Hi,

I have previously successfully set up a Neo4j HA cluster using embedded Neo4j. While it does work nicely, I could simplify things a lot if I could switch to causal clustering. The problem is that in the documentation for the new 3.1, there is no mention of embedded use with causal cluster, and furthermore, the Java driver is claimed to be responsible for consistency.

So to clarify: Is it possible to set up causal clustering using embedded Neo4j directly from my Java server, without any use of Cypher and without a Java driver? Or are you deliberately trying to remove embedded Neo4j use?

Cheers,
Timo

Michael Hunger

Dec 6, 2016, 3:46:32 AM
to ne...@googlegroups.com
Hi,

As I learned yesterday, you can use causal clustering in embedded mode (with the correct configs), but currently you have to make sure yourself that writes go to the leader; there is no auto-routing. It's not an officially suggested mode of operation.

The recommendation from the team is to only use read-replicas in embedded mode and run a regular core-cluster as neo4j server.

If you can make sure all your writes go to the leader always then it should also be possible (imho) to run all instances embedded.
Please try it out and let us know.

Michael


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Timo Tiuraniemi

Dec 6, 2016, 4:41:46 AM
to Neo4j
Thanks Michael, once again,  for a swift answer!

When you say "make sure yourself writes are going to the leader", does that mean that the "core servers" in the cluster are not all equivalent, but there is always a leader in the core group – similarly to the "master" in the HA setting – and others in the core group are "followers"? And I can't write to any of the core group servers, even though in the 3.1 blog posts they are depicted in the diagrams as all being read/write, but instead have to find the leader and write to it?

If that is the case, then I'll continue with my HA setup.

But if you mean that all core servers are "leaders" and I just have to make sure I'm not writing to the read replicas, then that's ok, and very much understandable. I can start off with just using 3-5 core servers – and utilize read replicas if need be in the future with custom load balancing – but being able to write to and read from any of them without any performance penalties is a big bonus. (Compared to the HA setting, where I had to jump through quite a few hoops to avoid writing to master.)

In embedded mode, I have been using neo4j.properties for HA settings. Can I just configure these:

http://neo4j.com/docs/operations-manual/3.1-beta/deployment/causal-cluster/settings-summary/

settings in neo4j.properties in place of HA settings? Another thing is that to be able to know if a server has successfully become part of the core servers, I would need to find that out from some Java class. Are there Causal Cluster equivalents for the "HighlyAvailableGraphDatabase.getInstanceState" method I'm using now?

--
Timo

Michael Hunger

Dec 6, 2016, 4:51:58 AM
to ne...@googlegroups.com, Mark Needham
Answers inline

On Tue, Dec 6, 2016 at 10:41 AM, Timo Tiuraniemi <timo.ti...@gmail.com> wrote:
Thanks Michael, once again,  for a swift answer!

When you say "make sure yourself writes are going to the leader", does that mean that the "core servers" in the cluster are not all equivalent, but there is always a leader in the core group – similarly to the "master" in the HA setting – and others in the core group are "followers"? And I can't write to any of the core group servers, even though in the 3.1 blog posts they are depicted in the diagrams as all being read/write, but instead have to find the leader and write to it?

If that is the case, then I'll continue with my HA setup.


The core servers all write data; only the leader coordinates it.

With Bolt, if you use bolt+routing, the smart client figures out (using the cluster topology) where to send read and write queries; in embedded mode you would have to do it yourself.
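
As a rough illustration of what "doing it yourself" could look like, here is a minimal sketch that picks the write target from a snapshot of cluster roles. Everything in it is an assumption made for illustration: the LeaderRouter class, the Role enum, and the addresses are hypothetical, and how the snapshot is obtained from the cluster (for example via a cluster overview procedure) is left out entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not a Neo4j API): chooses where writes should go. */
final class LeaderRouter {

    /** Assumed role names, used here only for illustration. */
    enum Role { LEADER, FOLLOWER, READ_REPLICA }

    /** Returns the address of the leader, or empty if the snapshot has none. */
    static Optional<String> writeTarget(Map<String, Role> topology) {
        return topology.entrySet().stream()
                .filter(e -> e.getValue() == Role.LEADER)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of instance address -> role.
        Map<String, Role> topology = new LinkedHashMap<>();
        topology.put("core-1:5000", Role.FOLLOWER);
        topology.put("core-2:5000", Role.LEADER);
        topology.put("replica-1:5000", Role.READ_REPLICA);

        System.out.println(writeTarget(topology).orElse("no leader known"));
    }
}
```

The leader can change after an election, so a real application would have to refresh the snapshot and retry a write that was sent to a former leader.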

But if you mean that all core servers are "leaders" and I just have to make sure I'm not writing to the read replicas, then that's ok, and very much understandable. I can start off with just using 3-5 core servers – and utilize read replicas if need be in the future with custom load balancing – but being able to write to and read from any of them without any performance penalties is a big bonus. (Compared to the HA setting, where I had to jump through quite a few hoops to avoid writing to master.)


Causal Clusters have several advantages over HA. The first is better stability at scale and under load, with no more branched data, because it is now a proper CP system. The second is read scalability through read replicas. The third is RYOW (read your own writes) with bookmarks, which is what provides the "causal consistency".
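
To make the bookmark idea concrete, here is a toy model of read-your-own-writes. It is only a conceptual sketch and not how Neo4j implements it: the BookmarkModel class and its methods are hypothetical, the bookmark is assumed to be a plain transaction number, and the replica refuses a read it cannot yet serve, whereas a real system might wait for the replica to catch up instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy model (not Neo4j code): one leader log and one lagging replica. */
final class BookmarkModel {
    private final List<String> leaderLog = new ArrayList<>();
    private int replicaApplied = 0; // number of transactions the replica has applied

    /** Commits a write on the leader; the bookmark is the transaction number. */
    int write(String value) {
        leaderLog.add(value);
        return leaderLog.size();
    }

    /** The replica applies one more transaction, if any are pending. */
    void replicate() {
        if (replicaApplied < leaderLog.size()) {
            replicaApplied++;
        }
    }

    /** Serves the latest replicated value only if the replica has reached the bookmark. */
    Optional<String> readFromReplica(int bookmark) {
        if (replicaApplied < bookmark) {
            return Optional.empty(); // replica is behind the caller's own write
        }
        return Optional.of(leaderLog.get(replicaApplied - 1));
    }

    public static void main(String[] args) {
        BookmarkModel cluster = new BookmarkModel();
        int bookmark = cluster.write("hello");

        System.out.println(cluster.readFromReplica(bookmark).isPresent()); // false: not replicated yet
        cluster.replicate();
        System.out.println(cluster.readFromReplica(bookmark).isPresent()); // true: replica caught up
    }
}
```

The point of the bookmark is that the reader, not the cluster, carries the "how far must you have caught up" information from the write to the next read.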

 
In embedded mode, I have been using neo4j.properties for HA settings. Can I just configure these:

http://neo4j.com/docs/operations-manual/3.1-beta/deployment/causal-cluster/settings-summary/

Yes. 


settings in neo4j.properties in place of HA settings? Another thing is that to be able to know if a server has successfully become part of the core servers, I would need to find that out from some Java class. Are there Causal Cluster equivalents for the "HighlyAvailableGraphDatabase.getInstanceState" method I'm using now?

There are some Cypher procedures you can call; they use an internal API whose name I'm not sure of.

See here for an example

 
The implementation of dbms.cluster.overview is here

Michael


Jim Webber

Dec 8, 2016, 3:16:32 AM
to Neo4j
Hello Timo,

I'm a member of the team that wrote the causal clustering code. Let me try to answer your questions.

I have previously successfully set up a Neo4j HA cluster using embedded Neo4j. While it does work nicely, I could simplify things a lot if I could switch to causal clustering. The problem is that in the documentation for the new 3.1, there is no mention of embedded use with causal cluster, and furthermore, the Java driver is claimed to be responsible for consistency.

The cluster exposes an API (actually a procedure) that allows anyone to achieve causal consistency. It's just that the Java driver has that little algorithm baked in (and it's supported by full-time Neo4j engineers etc). Community drivers will definitely follow the same pattern, and we'll be releasing supported drivers for JavaScript, Python and .NET soon too.

Causal clustering is not supported in embedded mode in Neo4j 3.1. We actually need to write a new locking module to make it work properly. We did write a prototype of that module during the 3.1 delivery timeframe, but we shelved it in the end and opted to get the server-based version out first, since most folks treat their database as a server. Embedded support will come once we dust off that locking module.
 
So to clarify: Is it possible to set up causal clustering using embedded Neo4j directly from my Java server, without any use of Cypher and without a Java driver? Or are you deliberately trying to remove embedded Neo4j use?

 Neo4j is always embeddable - it is even delivered in its component jars to help facilitate that. However when we have control over the running process (as we do in server mode) lots of things become easier to code (because we don't have to second-guess how an embedded app is behaving). Sometimes this means we can get features out more quickly for server-based systems (as was the case for causal clustering). I don't think this means we're deprecating embedded mode, but it's wrong for us to hold back a good quality server release waiting for the additional embedded code to be written.

Hope that helps,

Jim

Timo Tiuraniemi

Dec 12, 2016, 6:00:48 AM
to Neo4j
Hi Jim,

 Neo4j is always embeddable - it is even delivered in its component jars to help facilitate that. However when we have control over the running process (as we do in server mode) lots of things become easier to code (because we don't have to second-guess how an embedded app is behaving). Sometimes this means we can get features out more quickly for server-based systems (as was the case for causal clustering). I don't think this means we're deprecating embedded mode, but it's wrong for us to hold back a good quality server release waiting for the additional embedded code to be written.

Thank you for the reassurance about the future of embedded use of Neo4j. I fully understand your reasons for skipping embedded support for now, and it's great to hear embedded causal clustering is in the Neo4j pipeline! 

Cheers,
Timo

Timo Tiuraniemi

May 14, 2017, 3:48:18 AM
to Neo4j

Any updates on embedded causal clustering support? Is it still in the pipeline?

--
Timo

Aishwarya S

Jun 25, 2018, 1:44:56 AM
to Neo4j
Hi Jim,

I have a question on bookmarks in Neo4j. I want to fetch data from Neo4j using the bookmarkId which I got as a result of the previous write. But I am not sure whether Neo4j is using the bookmark which I pass. How can I make sure that my bookmarkId is used while querying? Can I verify it by

1. checking if the native Bolt transaction has the bookmarkId
2. or checking if the OGM or the Bolt driver session has the bookmark?

Please help.

Timo Tiuraniemi

Apr 4, 2019, 12:59:35 AM
to Neo4j
Hi!

Checking up on this: HA is being deprecated for 4.0 but there's still no embedded causal clustering. Are you planning on just forgetting about embedded users?

timo.ti...@gmail.com

May 7, 2019, 5:37:46 AM
to Neo4j
Hello,

Would it be possible to get some statement on your plans with this? Jim Webber? If you are removing HA from 4.0 and at the same time, for example, Scala 2.12 (https://github.com/neo4j/neo4j/issues/8832) is coming only in 4.0, that leaves every embedded user in a terrible place. Do note that there was a point when embedded was the suggested use of Neo4j, and there was no Cypher then, which means migrating to 4.0 without embedded causal clustering would be akin to a complete rewrite of the entire codebase.