How to recover from network partition quarantine

Tom Pantelis

Jul 24, 2015, 3:34:02 AM
to Akka User List
During a network partition, the partitioned node is removed from the cluster after auto-down occurs and is quarantined, so that it must be restarted in order to rejoin the cluster once the partition heals. A manual restart after a temporary network outage is problematic when one is developing a commercial product with end users who will expect automatic recovery (and rightly so).

One option is to disable auto-down, but that introduces another issue. In lieu of that,

1) is there any way to disable the quarantine behavior?

2) is there any way for a node to know, or be notified, that it has been quarantined and must be restarted, so that it can be handled automatically?

Thanks,
Tom

Patrik Nordwall

Aug 5, 2015, 7:17:40 AM
to akka...@googlegroups.com
Hi Tom,

On Fri, Jul 24, 2015 at 4:24 AM, Tom Pantelis <tompa...@gmail.com> wrote:
During a network partition, the partitioned node is removed from the cluster after auto-down occurs and is quarantined, so that it must be restarted in order to rejoin the cluster once the partition heals. A manual restart after a temporary network outage is problematic when one is developing a commercial product with end users who will expect automatic recovery (and rightly so).

One option is to disable auto-down, but that introduces another issue. In lieu of that,

1) is there any way to disable the quarantine behavior?

No, because once it has been decided that a node is not part of the cluster any more, we don't want it to show up again. This is important for the correct semantics of watch. We don't allow zombies.
 

2) is there any way for a node to know, or be notified, that it has been quarantined and must be restarted, so that it can be handled automatically?

Subscribe to the cluster event MemberRemoved, but then the problem is that auto-down has also downed the nodes on the other side of the partition, and you end up with two separate clusters. Auto-down can handle crashed nodes, but it doesn't handle network partitions well. That is why we don't have it turned on by default and recommend against it when using Cluster Singleton and Persistence.
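A minimal sketch of that subscription, assuming the reaction to being removed is to terminate the ActorSystem and let an external supervisor (a wrapper script, systemd, etc.) restart the process. The actor name and wiring here are illustrative assumptions, not a recommended production recipe:

```scala
import akka.actor.{Actor, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberRemoved}

// Illustrative sketch: watch for this node's own MemberRemoved event and
// terminate the ActorSystem, relying on an external supervisor to restart
// the process (a restart is required because the quarantined system
// cannot rejoin on its own).
class SelfRemovedWatcher extends Actor {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, InitialStateAsEvents, classOf[MemberRemoved])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
      context.system.terminate() // exit and let the supervisor restart us
  }
}

// hypothetical wiring at startup:
// system.actorOf(Props[SelfRemovedWatcher], "self-removed-watcher")
```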

It's possible to implement smarter downing strategies, but it is rather difficult to implement them correctly. We are working on something to improve this. Stay tuned.

Regards,
Patrik
 

Thanks,
Tom

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Morten Kjetland

Aug 5, 2015, 7:58:19 AM
to akka...@googlegroups.com
Hi,

We have solved these issues like this:

We have a ClusterListener on each node that "pings" the database: as long as the node is "online" and a happy member of the cluster, it updates a timestamp in the database.

To detect split-brain scenarios, we do this:
The ClusterListener on each node keeps track of all members of the cluster in memory.
Periodically we check whether there are more alive nodes in the database than we know are members of our cluster.
If we see more alive nodes than we have in the cluster, we know we have a split-brain scenario.

To recover from it, the node waits a random number of seconds, then triggers its own restart (we spawn a process that executes "./application.sh restart").

When the node is starting up (again), we use the same "alive" mechanism in the database to find seed nodes, so we actually join the existing cluster. If no one is alive, we know we are the first one starting up, so we become our own seed node.

If the node decides to join a cluster but fails to do so, it starts over with another restart.
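The detection step described above can be sketched as pure logic: a node counts as "alive" if its heartbeat row in the database is fresh, and split brain is suspected when the database shows alive nodes this node doesn't know as members. All names here (AliveRow, the timeout, etc.) are illustrative assumptions, not Morten's actual code:

```scala
// Illustrative sketch of the split-brain check, assuming heartbeat rows
// like these are read back from the database.
final case class AliveRow(node: String, lastSeenMillis: Long)

object SplitBrainCheck {
  // A node counts as "alive" if its heartbeat is fresher than the timeout.
  def aliveNodes(rows: Seq[AliveRow], nowMillis: Long, timeoutMillis: Long): Set[String] =
    rows.filter(r => nowMillis - r.lastSeenMillis < timeoutMillis).map(_.node).toSet

  // Split brain is suspected when the DB shows alive nodes that are not
  // members of the cluster as this node sees it.
  def suspected(aliveInDb: Set[String], knownMembers: Set[String]): Boolean =
    (aliveInDb -- knownMembers).nonEmpty
}
```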

This has, at least for us, turned out to be a robust solution which supports:

* staged or instant startup of multiple nodes.
* auto-restarting multiple nodes when deploying new version.
* auto-healing when something odd happens in our data-center (like network-glitches or something causing the cpu to stall for too long)

We're planning to open-source this code soon.

I hope this info was helpful.

Regards,
Morten

leo...@liveperson.com

Aug 5, 2015, 4:29:11 PM
to Akka User List, m...@kjetland.com
Hi Morten,

We'd like to implement a solution similar to yours; can you elaborate on some details of it?

On Wednesday, August 5, 2015 at 2:58:19 PM UTC+3, Morten Kjetland wrote:
Hi,

We have solved these issues like this:

We have a ClusterListener on each node that "pings" the database: as long as the node is "online" and a happy member of the cluster, it updates a timestamp in the database.

To detect split-brain scenarios, we do this:
The ClusterListener on each node keeps track of all members of the cluster in memory.
Periodically we check whether there are more alive nodes in the database than we know are members of our cluster.
If we see more alive nodes than we have in the cluster, we know we have a split-brain scenario.

I think there might be a possible race condition here: what if one or more new nodes join the cluster and update the DB before all the other nodes have learned about them? In that case, the other nodes might think they are in a split-brain situation and restart themselves, right? How do you prevent this?
 

To recover from it, the node waits a random number of seconds, then triggers its own restart (we spawn a process that executes "./application.sh restart").

While waiting a random number of seconds, is the node still part of the cluster, or has it left the cluster already?


When the node is starting up (again), we use the same "alive" mechanism in the database to find seed nodes, so we actually join the existing cluster. If no one is alive, we know we are the first one starting up, so we become our own seed node.

"If no one is alive, we know we are the first one starting up" - have you implemented this with some atomic operation, like "check and set", to prevent starting 2 clusters? What DB do you use for this?


If the node decides to join a cluster but fails to do so, it starts over with another restart.

This has, at least for us, turned out to be a robust solution which supports:

* staged or instant startup of multiple nodes.
* auto-restarting multiple nodes when deploying new version.
* auto-healing when something odd happens in our data-center (like network-glitches or something causing the cpu to stall for too long)

Do you use akka persistence/cluster sharding? I'm asking because we do use both and have found them to be sensitive to split brain.
 

We're planning to open-source this code soon.

I hope this info was helpful.

Regards,
Morten

Thanks,
Leonid 


Morten Kjetland

Aug 6, 2015, 3:38:29 AM
to leo...@liveperson.com, Akka User List
On Wed, Aug 5, 2015 at 9:45 PM <leo...@liveperson.com> wrote:
Hi Morten,

We'd like to implement a solution similar to yours, can you elaborate on some details of your solution?

First, let me say that our solution is not optimal, but it works.
It was improved over time based on what we experienced in production, and it has ended up being robust enough - at least for now.


On Wednesday, August 5, 2015 at 2:58:19 PM UTC+3, Morten Kjetland wrote:
Hi,

We have solved these issues like this:

We have a ClusterListener on each node that "pings" the database: as long as the node is "online" and a happy member of the cluster, it updates a timestamp in the database.

To detect split-brain scenarios, we do this:
The ClusterListener on each node keeps track of all members of the cluster in memory.
Periodically we check whether there are more alive nodes in the database than we know are members of our cluster.
If we see more alive nodes than we have in the cluster, we know we have a split-brain scenario.

I think there might be a possible race condition here, what if one or more new node join the cluster and update the DB before all other nodes learned about the new nodes? In this case other nodes might think they are in a split brain situation and restart themselves, right? How do you prevent this?

I think you have a point here.
A node writes that it is alive once it knows that it has joined the cluster.
But you are right that a different node might see this alive message before it is itself aware of the other node having joined the cluster.

I'm adding a todo to improve it. Thanks :)
 
 

To recover from it, the node waits a random number of seconds, then triggers its own restart (we spawn a process that executes "./application.sh restart").

While waiting a random number of seconds, is the node still part of the cluster, or has it left the cluster already?

As we know, we do not want these error situations to happen, but when one does, we would like to recover from it, and we do that by restarting our app.
But to prevent a theoretical problem where all of our cluster nodes restart at the same time over and over again, I have introduced a random delay, to make sure not everything happens at the same time (*if* multiple nodes detect an error at the same time).
I guess this could be improved by having the node leave the cluster right away, then wait some time before restarting.
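The randomized delay could be sketched like this; the base and jitter bounds are illustrative assumptions, not the values Morten's system uses:

```scala
import scala.concurrent.duration._
import scala.util.Random

// Illustrative sketch: a base delay plus random jitter, so that nodes
// detecting the same problem at the same time do not all restart at once.
def restartDelay(base: FiniteDuration, maxJitter: FiniteDuration, rnd: Random): FiniteDuration =
  base + (rnd.nextDouble() * maxJitter.toMillis).toLong.millis
```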
 


When the node is starting up (again), we use the same "alive" mechanism in the database to find seed nodes, so we actually join the existing cluster. If no one is alive, we know we are the first one starting up, so we become our own seed node.

"If no one is alive, we know we are the first one starting up" - have you implemented this with some atomic operation, like "check and set", to prevent starting 2 clusters? What DB do you use for this?


It is not an atomic operation at this time, but should the situation you describe happen, the error detection would detect and fix it.

I guess this could also be improved.

We use Oracle (company decision).
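The check-and-set idea under discussion can be sketched with an in-memory stand-in for the database, where putIfAbsent plays the role of a conditional INSERT; all names here are illustrative, and a real implementation would need the same atomicity from the DB itself:

```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative sketch of atomic "first node wins" seed registration.
// A ConcurrentHashMap stands in for the database; with a real DB the same
// semantics would come from a conditional INSERT or SELECT ... FOR UPDATE.
class SeedRegistry {
  private val registry = new ConcurrentHashMap[String, String]()

  // Returns the seed node for the cluster: either ourselves (if we won the
  // race) or whoever registered first. putIfAbsent is the atomic check-and-set.
  def claimOrGetSeed(clusterName: String, selfAddress: String): String = {
    val prev = registry.putIfAbsent(clusterName, selfAddress)
    if (prev == null) selfAddress else prev
  }
}
```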
 

If the node decides to join a cluster but fails to do so, it starts over with another restart.

This has, at least for us, turned out to be a robust solution which supports:

* staged or instant startup of multiple nodes.
* auto-restarting multiple nodes when deploying new version.
* auto-healing when something odd happens in our data-center (like network-glitches or something causing the cpu to stall for too long)

Do you use akka persistence/cluster sharding? I'm asking because we do use both and have found them to be sensitive to split brain.
 

Yes, we have multiple micro-service applications, all using Akka Persistence with sharding.
We also experienced that they were sensitive to these problems, so what I have described is our way of working around them.

Patrik Nordwall

Sep 22, 2015, 5:23:56 AM
to akka...@googlegroups.com, leo...@liveperson.com
For the archives (and I promised to get back):

Akka Split Brain Resolver (Akka SBR) is a new commercial feature available exclusively to Typesafe PSS subscribers.

It's part of the Typesafe Reactive Platform and implements a number of strategies for performing downing more safely than just timeouts (auto-downing). The strategies are, for example, "static quorum" and "keep majority". Each of them has specific trade-offs, i.e. scenarios where it works well, and failure scenarios where the strategy would make a decision consistent with how it works, but maybe not what you need.
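Concretely, a strategy is chosen via configuration along these lines (the keys shown here are illustrative; check the linked docs for the exact names in your release):

```hocon
akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
akka.cluster.split-brain-resolver {
  active-strategy = keep-majority
  # or, for example:
  # active-strategy = static-quorum
  # static-quorum.quorum-size = 3
}
```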

The docs are available here: http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html and go pretty in-depth about how it all works.

Konrad did a webinar about new features in Akka 2.4 and Reactive Platform, and it also covered the Split Brain Resolver a bit: https://youtu.be/D3mPl8OUrjs?t=9m11s (the 9-minute mark is where SBR starts).

In order to use this in production you'll need to obtain a Reactive Platform subscription; more details here: http://www.typesafe.com/products/typesafe-reactive-platform (the page also explains at the bottom how you can try it out).

