Akka Cluster and Persistence Consistency

Oleg Mürk

unread,

Jan 12, 2015, 12:04:06 PM1/12/15

to akka...@googlegroups.com

Hello!

I would like to better understand and confirm consistency guarantees of Akka Cluster and Persistence (ideally in terms of CAP and CRDTs) and apologise if these questions have been discussed before :)

Akka Cluster:

Is it possible to configure Akka Cluster with auto-downing so that with 100% probability at each moment at most one cluster node thinks that he is the leader (or singleton) even in case of cluster partitions? Some emails/blog posts imply that requiring cluster partition to be majority quorum does the trick? How to do it with auto-scaling clusters?

Akka Persistence:

Am I correct that if two actors with the same persistence id work concurrently on different cluster partitions the sequence numbers of events will get mixed up? Does it imply that if Akka Persistence is used with Akka Cluster Sharding then Akka Cluster must guarantee uniqueness of ShardCoordinator?

Some relevant messages that I managed to find are:

* Quorum-based split brain resolution

https://groups.google.com/d/msg/akka-user/UBSF3QQnGaM/r8YUEzr2EtEJ

https://groups.google.com/d/msg/akka-user/UBSF3QQnGaM/Jsg30N9ORQQJ

* Exactly *one* persistence actor with the same persistenceId active ?

https://groups.google.com/d/msg/akka-user/9E65YaKI2BE/SA3px74h0ugJ

https://groups.google.com/d/msg/akka-user/9E65YaKI2BE/hZon4M_z0AgJ

Thank You!

Oleg Mürk

Akka Team

unread,

Jan 14, 2015, 11:02:33 AM1/14/15

to Akka User List

Hi Oleg,

Your understanding of both topics, in the current version of Akka, are correct.

More specifically:

Is it possible to configure Akka Cluster with auto-downing so that with 100% probability at each moment at most one cluster node thinks that he is the leader (or singleton) even in case of cluster partitions? Some emails/blog posts imply that requiring cluster partition to be majority quorum does the trick?

The auto-downing strategy currently provided is a rather naive one - just timeout based. It does not guard against split brains, which is why I would rather not encourage using auto-downing if your cluster needs any kind of "single" entity.

The timer based auto downing works well for clusters where you have "many workers, but no master" for example, since causing a split need not end in a wrong "leader" being elected (since there is no leader).

Yes, quorum would help making downing more safe (avoid split brains), however we have not implemented it yet (we are aware of the need and possibility of course).

How to do it with auto-scaling clusters?

In general it boils down to disallowing situations where the cluster can get "two majorities" (one side knows only old nodes, new side has more nodes and didn't reach quorum yet).

One of the methods I'm aware of is described in the raft paper - section 6: membership changes, which describes how it guarantees that at no point in time there can be "two majorities".

We have not looked into implementing a better downing strategy yet but it's "on the radar" - it's not a given it would be raft based, needs more thought put into it.

Am I correct that if two actors with the same persistence id work concurrently on different cluster partitions the sequence numbers of events will get mixed up?

Correct, currently these sequence numbers are "source of truth" and mixing them up causes problems during replay.

Does it imply that if Akka Persistence is used with Akka Cluster Sharding then Akka Cluster must guarantee uniqueness of ShardCoordinator?

In fact, the cluster sharding *uses* persistence in order to survive the leader going down - so we can restore the shard allocation information once the new coordinator boots up.

Yes, currently there is a hard requirement for "only one writer" in akka-persistence. This can be facilitated by either cluster-sharding or cluster-singletons.

Slightly related "future work": We do have some CRDT work by Patrik Nordwall stashed (it's public as akka-data-replication) and will want to move it into Akka main at some point,

those of course do not require any kind leaders in the cluster.

Hope this helps!

--

Konrad

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

--

Akka Team

Typesafe - The software stack for applications that scale

Blog: letitcrash.com
Twitter: @akkateam

Endre Varga

unread,

Jan 14, 2015, 11:05:44 AM1/14/15

to akka...@googlegroups.com

On Wed, Jan 14, 2015 at 5:02 PM, Akka Team <akka.o...@gmail.com> wrote:

Hi Oleg,
Your understanding of both topics, in the current version of Akka, are correct.
More specifically:

Is it possible to configure Akka Cluster with auto-downing so that with 100% probability at each moment at most one cluster node thinks that he is the leader (or singleton) even in case of cluster partitions? Some emails/blog posts imply that requiring cluster partition to be majority quorum does the trick?
The auto-downing strategy currently provided is a rather naive one - just timeout based. It does not guard against split brains, which is why I would rather not encourage using auto-downing if your cluster needs any kind of "single" entity.
The timer based auto downing works well for clusters where you have "many workers, but no master" for example, since causing a split need not end in a wrong "leader" being elected (since there is no leader).

Yes, quorum would help making downing more safe (avoid split brains), however we have not implemented it yet (we are aware of the need and possibility of course).

Well, to be precise, everything is there for anyone to implement their own downing strategy, we just not provided a prepackaged solution for example for quorum. Akka cluster is a toolbox, but it seems like there is high demand for pre-packaged "just works" solutions.

-Endre

Oleg Mürk

unread,

Jan 14, 2015, 12:58:52 PM1/14/15

to akka...@googlegroups.com

Hello!

Thanks for the answers! I have a few more questions/clarifications below, if You don't mind.

On Wednesday, January 14, 2015 at 6:02:33 PM UTC+2, Akka Team wrote:

Is it possible to configure Akka Cluster with auto-downing so that with 100% probability at each moment at most one cluster node thinks that he is the leader (or singleton) even in case of cluster partitions? Some emails/blog posts imply that requiring cluster partition to be majority quorum does the trick?
The auto-downing strategy currently provided is a rather naive one - just timeout based. It does not guard against split brains, which is why I would rather not encourage using auto-downing if your cluster needs any kind of "single" entity.
The timer based auto downing works well for clusters where you have "many workers, but no master" for example, since causing a split need not end in a wrong "leader" being elected (since there is no leader).

I'd like to better understand what happens to a Leader if it becomes unreachable. Auto-down is impossible because it can only be performed by the Leader, right? Can I execute Down command on non-leader member of cluster when the Leader is unreachable?

Am I correct that if two actors with the same persistence id work concurrently on different cluster partitions the sequence numbers of events will get mixed up?
Correct, currently these sequence numbers are "source of truth" and mixing them up causes problems during replay.

Does it imply that if Akka Persistence is used with Akka Cluster Sharding then Akka Cluster must guarantee uniqueness of ShardCoordinator?
In fact, the cluster sharding *uses* persistence in order to survive the leader going down - so we can restore the shard allocation information once the new coordinator boots up.

Yes, currently there is a hard requirement for "only one writer" in akka-persistence. This can be facilitated by either cluster-sharding or cluster-singletons.

So Persistence and consequently Sharding cannot be used in potential split-brain cluster configurations? Because ShardCoordinator could potentially run on both partitions simultaneously?

Also if Persistence is storing state in eventually consistent replicated journal (eg Cassandra) wouldn't there be inconsistencies in case of journal cluster network partition?

Slightly related "future work": We do have some CRDT work by Patrik Nordwall stashed (it's public as akka-data-replication) and will want to move it into Akka main at some point, those of course do not require any kind leaders in the cluster.

As I am sure You know, there is whole infrastructure for consistent highly reliable distributed coordination:

http://zookeeper.apache.org/

http://curator.apache.org/

http://helix.apache.org/

Zookeeper's async API at first glance looks like a perfect match for Akka and FSM?

But as outlined below You discovered that Zookeeper doesn't suit Your needs well. May be You could share Your findings?

http://letitcrash.com/post/26626939256/status-report-of-akka-cluster

> In the process we have implemented and thrown away prototypes using both ZooKeeper, JGroups and Hazelcast.

> None of them solved the problem optimally and each one imposed its own set of unnecessary constraints and drawbacks.

> We strongly believe that a loosely coupled, eventually consistent, fully decentralized and self-coordinating P2P solution

> is the right way to scale an actor system and to make it fully resilient to failure.

Thank You!

Oleg

Patrik Nordwall

unread,

Jan 15, 2015, 1:50:23 AM1/15/15

to akka...@googlegroups.com

On Wed, Jan 14, 2015 at 6:58 PM, Oleg Mürk <oleg...@gmail.com> wrote:

Hello!

Thanks for the answers! I have a few more questions/clarifications below, if You don't mind.

On Wednesday, January 14, 2015 at 6:02:33 PM UTC+2, Akka Team wrote:

Is it possible to configure Akka Cluster with auto-downing so that with 100% probability at each moment at most one cluster node thinks that he is the leader (or singleton) even in case of cluster partitions? Some emails/blog posts imply that requiring cluster partition to be majority quorum does the trick?
The auto-downing strategy currently provided is a rather naive one - just timeout based. It does not guard against split brains, which is why I would rather not encourage using auto-downing if your cluster needs any kind of "single" entity.
The timer based auto downing works well for clusters where you have "many workers, but no master" for example, since causing a split need not end in a wrong "leader" being elected (since there is no leader).

I'd like to better understand what happens to a Leader if it becomes unreachable. Auto-down is impossible because it can only be performed by the Leader, right? Can I execute Down command on non-leader member of cluster when the Leader is unreachable?

Akka Cluster doesn't require a strict leader for managing the cluster membership. The membership data is a Conflict Free Replicated Data Type (CRDT) so if there are conflicting updates they can be merged anyway. The leader is just a role for a node that performs certain actions and it is alright if several nodes thinks they have this role.

Am I correct that if two actors with the same persistence id work concurrently on different cluster partitions the sequence numbers of events will get mixed up?
Correct, currently these sequence numbers are "source of truth" and mixing them up causes problems during replay.

Does it imply that if Akka Persistence is used with Akka Cluster Sharding then Akka Cluster must guarantee uniqueness of ShardCoordinator?
In fact, the cluster sharding *uses* persistence in order to survive the leader going down - so we can restore the shard allocation information once the new coordinator boots up.
Yes, currently there is a hard requirement for "only one writer" in akka-persistence. This can be facilitated by either cluster-sharding or cluster-singletons.

So Persistence and consequently Sharding cannot be used in potential split-brain cluster configurations? Because ShardCoordinator could potentially run on both partitions simultaneously?

It is right that you must only be one active instance of a PersistentActor with a given persistenceId, i.e. single writer to the journal. That is true also for the ShardCoordinator, since it is a PersistentActor.

That means that you must handle network partitions carefully and use a proper downing strategy as discussed earlier. We have acknowledged that a packaged solution for improving this have been requested by many users.

Also if Persistence is storing state in eventually consistent replicated journal (eg Cassandra) wouldn't there be inconsistencies in case of journal cluster network partition?

Not if you use a proper consistency level for the write and reads, e.g. quorum.

Slightly related "future work": We do have some CRDT work by Patrik Nordwall stashed (it's public as akka-data-replication) and will want to move it into Akka main at some point, those of course do not require any kind leaders in the cluster.

As I am sure You know, there is whole infrastructure for consistent highly reliable distributed coordination:
http://zookeeper.apache.org/
http://curator.apache.org/
http://helix.apache.org/
Zookeeper's async API at first glance looks like a perfect match for Akka and FSM?

But as outlined below You discovered that Zookeeper doesn't suit Your needs well. May be You could share Your findings?
http://letitcrash.com/post/26626939256/status-report-of-akka-cluster
> In the process we have implemented and thrown away prototypes using both ZooKeeper, JGroups and Hazelcast.
> None of them solved the problem optimally and each one imposed its own set of unnecessary constraints and drawbacks.
> We strongly believe that a loosely coupled, eventually consistent, fully decentralized and self-coordinating P2P solution
> is the right way to scale an actor system and to make it fully resilient to failure.

We don't need strong coordination for cluster membership and we have the goal to support large clusters, which I believe would not be possible with Zookeeper. Zookeeper is great, but it is solving a different set of problems.

It's also possible to build coordination services on top of Akka Cluster, as illustrated by Konrad's akka-raft prototype.

Regards,

Patrik

Thank You!
Oleg

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

--

Patrik Nordwall
Typesafe - Reactive apps on the JVM
Twitter: @patriknw

Oleg Mürk

unread,

Jan 15, 2015, 10:47:10 AM1/15/15

to akka...@googlegroups.com

Hello!

On Thursday, January 15, 2015 at 8:50:23 AM UTC+2, Patrik Nordwall wrote:

On Wed, Jan 14, 2015 at 6:58 PM, Oleg Mürk <oleg...@gmail.com> wrote:
I'd like to better understand what happens to a Leader if it becomes unreachable. Auto-down is impossible because it can only be performed by the Leader, right? Can I execute Down command on non-leader member of cluster when the Leader is unreachable?

Akka Cluster doesn't require a strict leader for managing the cluster membership. The membership data is a Conflict Free Replicated Data Type (CRDT) so if there are conflicting updates they can be merged anyway. The leader is just a role for a node that performs certain actions and it is alright if several nodes thinks they have this role.

Got it, thanks!

So Persistence and consequently Sharding cannot be used in potential split-brain cluster configurations? Because ShardCoordinator could potentially run on both partitions simultaneously?

It is right that you must only be one active instance of a PersistentActor with a given persistenceId, i.e. single writer to the journal. That is true also for the ShardCoordinator, since it is a PersistentActor. That means that you must handle network partitions carefully and use a proper downing strategy as discussed earlier. We have acknowledged that a packaged solution for improving this have been requested by many users.

<...>

We don't need strong coordination for cluster membership and we have the goal to support large clusters, which I believe would not be possible with Zookeeper. Zookeeper is great, but it is solving a different set of problems.

I agree that current Akka's CRDT based group membership protocol serves its purpose well.

It's also possible to build coordination services on top of Akka Cluster, as illustrated by Konrad's akka-raft prototype.

I suppose one could use Zookeeper for ensuring uniqueness of a Singleton or ShardCoordinator/ShardRegion? AFAIK Zookeeper scales to 1000s of clients?

Another question I had is how to handle situations when eg ShardCoordinator/ShardRegion cannot communicate with Zookeeper, while ShardRegion's persistent Entities can communicate with, say, HBase. But I guess it a question for another mailig list :)