Hazelcast Failover hooks and split brain support

Tim Fox

Mar 1, 2013, 2:45:30 AM
to Hazelcast
Hello All,

We are currently using Hazelcast in the Vert.x project to distribute
subscription information across the cluster.

I am currently looking at implementing HA support in Vert.x, and would
like to be able to use Hazelcast for that too.

I have a couple of questions.

1. For Vert.x HA, applications can be run on certain Vert.x nodes, and
if those nodes fail we need to be able to detect that so we can
restart those applications on different nodes. Consequently we need
some way of Hazelcast providing a hook into its internal group
management logic so it can notify us of cluster membership changes
(node join, node leave, etc.). I can't see any way of registering a
listener for these kinds of events on the Hazelcast API...

2. Split brain. I can see that Hazelcast supports *merging* of
partitions after they have healed from a network partition but it does
not appear to detect and prevent split brain in the first place.
Common techniques for doing this usually involve a quorum. I.e.
partitioned sub-clusters will automatically shut down if they detect
that less than a quorum of members is visible. I can't see any support
for that currently in Hazelcast. Are there plans to implement
something like this?

Thanks!

Peter Veentjer

Mar 1, 2013, 3:40:19 AM
to haze...@googlegroups.com
On Fri, Mar 1, 2013 at 9:45 AM, Tim Fox <timv...@gmail.com> wrote:
> Hello All,
>
> We are currently using Hazelcast in the Vert.x project to distribute
> subscription information across the cluster.
>
> I am currently looking at implementing HA support in Vert.x, and would
> like to be able to use Hazelcast for that too.
>
> I have a couple of questions.
>
> 1. For Vert.x HA, applications can be run on certain Vert.x nodes, and
> if those nodes fail we need to be able to detect that so we can
> restart those applications on different nodes. Consequently we need
> some way of Hazelcast providing a hook into its internal group
> management logic so it can notify us of cluster membership changes
> (node join, node leave, etc.). I can't see any way of registering a
> listener for these kinds of events on the Hazelcast API...

Have a look at the Cluster interface:

http://www.hazelcast.com/javadoc/com/hazelcast/core/Cluster.html

The Cluster can be obtained from the HazelcastInstance via getCluster(),
and membership listeners can be registered on it.


> 2. Split brain. I can see that Hazelcast supports *merging* of
> partitions after they have healed from a network partition but it does
> not appear to detect and prevent split brain in the first place.

I'm no expert in this area, so I could be wrong.

But a split brain can't be detected in the first place, since the cluster
only knows that a split brain has happened once the split parts of the
cluster find each other again.

> Common techniques for doing this usually involve a quorum. I.e.
> partitioned sub-clusters will automatically shut down if they detect
> that less than a quorum of members is visible. I can't see any support
> for that currently in Hazelcast. Are there plans to implement
> something like this?

For 2.x I don't think there will be support for that.

Hazelcast 3.x exposes a new feature: the SPI.
The SPI exposes the infrastructure used by e.g. the Map, Lock, Queue, etc.
Since the SPI is now exposed to the outside world, custom data structures
can be written on top of it, e.g. a quorum-based map implementation.

I think quorum-based functionality could be something very interesting
for the future of Hazelcast.


Tim Fox

Mar 1, 2013, 11:23:33 AM
to Hazelcast


On Mar 1, 8:40 am, Peter Veentjer <alarmnum...@gmail.com> wrote:
> On Fri, Mar 1, 2013 at 9:45 AM, Tim Fox <timvo...@gmail.com> wrote:
> > 2. Split brain. I can see that Hazelcast supports *merging* of
> > partitions after they have healed from a network partition but it does
> > not appear to detect and prevent split brain in the first place.
>
> I'm no expert in this area, so I could be wrong.
>
> But a split brain can't be detected in the first place, since the cluster
> only knows that a split brain has happened once the split parts of the
> cluster find each other again.

Split brains can be detected _before_ the partition has healed by
requiring a quorum.

It's a pretty standard technique and works something like this.

Let's say you have 5 nodes in the cluster, and you require a quorum of
3 nodes for the system to function.

Let's say there's a network partition that splits the system into two
networks - one containing 3 nodes and one containing 2 nodes. The
partition with just two nodes will automatically shut itself down
since it can only see itself and another node - this is less than the
quorum size of 3.

The other partition will carry on functioning since it can see itself
and two other nodes, a total of 3 which is >= the quorum size of 3.

It's a well-known technique that's been around for ages, and is
implemented by many applications and tools.
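
To make the arithmetic above concrete, here is a minimal sketch in plain Java. The class and method names (QuorumMath, majority, hasQuorum) are made up for illustration and are not part of Hazelcast, and deriving the quorum as a strict majority of the cluster size is an assumption; the example above only states that a quorum of 3 is required for 5 nodes.

```java
// Hypothetical helper for illustration only; not part of the Hazelcast API.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest strict majority of a cluster (assumed quorum rule), e.g. 3 for 5 nodes. */
    static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be >= 1");
        }
        return clusterSize / 2 + 1;
    }

    /** True if a sub-cluster that sees visibleMembers nodes (itself included) may keep running. */
    static boolean hasQuorum(int visibleMembers, int quorum) {
        return visibleMembers >= quorum;
    }

    public static void main(String[] args) {
        int quorum = majority(5);
        System.out.println("quorum = " + quorum);                                   // quorum = 3
        System.out.println("2-node side keeps running: " + hasQuorum(2, quorum));   // false
        System.out.println("3-node side keeps running: " + hasQuorum(3, quorum));   // true
    }
}
```

With a strict majority as the quorum, at most one side of any partition can reach it, so two halves never keep running at the same time.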

Fuad Malikov

Mar 1, 2013, 2:18:11 PM
to haze...@googlegroups.com
You can actually implement a quorum using membership listeners. Basically, upon receiving a member-left event, count the number of current members in the cluster, and if it is less than the quorum, shut down Hazelcast.

Very basic code would look something like this:

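import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;

// `node` is assumed to be a final reference to the local HazelcastInstance.
final int quorum = 3; // do the real calculation here, based on the number of initial nodes
node.getCluster().addMembershipListener(new MembershipListener() {
    @Override
    public void memberAdded(MembershipEvent membershipEvent) {
        // nothing to do when a member joins
    }

    @Override
    public void memberRemoved(MembershipEvent membershipEvent) {
        // a member left: shut this node down if fewer than `quorum` members remain
        if (node.getCluster().getMembers().size() < quorum) {
            node.getLifecycleService().shutdown();
        }
    }
});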
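
One possible refinement, sketched here under assumptions: keep the quorum decision in a small plain-Java class so it can be tested without a running cluster and so the shutdown action runs at most once even if several members leave in quick succession. QuorumGuard and onMemberCount are hypothetical names invented for this sketch; they are not part of the Hazelcast API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper for illustration only; not part of the Hazelcast API.
final class QuorumGuard {
    private final int quorum;
    private final Runnable shutdownAction;
    private final AtomicBoolean triggered = new AtomicBoolean(false);

    QuorumGuard(int quorum, Runnable shutdownAction) {
        if (quorum < 1) {
            throw new IllegalArgumentException("quorum must be >= 1");
        }
        this.quorum = quorum;
        this.shutdownAction = shutdownAction;
    }

    /** Call with the current member count; returns true only if this call ran the shutdown action. */
    boolean onMemberCount(int currentMembers) {
        if (currentMembers < quorum && triggered.compareAndSet(false, true)) {
            shutdownAction.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        QuorumGuard guard = new QuorumGuard(3, () -> System.out.println("shutting down"));
        guard.onMemberCount(4); // still at or above quorum, nothing happens
        guard.onMemberCount(2); // below quorum, prints "shutting down"
        guard.onMemberCount(1); // already triggered, nothing happens
    }
}
```

In the listener above, memberRemoved would then call guard.onMemberCount(node.getCluster().getMembers().size()), with node.getLifecycleService().shutdown() passed in as the shutdown action.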



Fuad Malikov
Co-founder & Managing Partner
Hazelcast | Open source in-memory data grid
575 Middlefield Rd, Palo Alto, CA 94301


Tim Fox

Mar 2, 2013, 2:45:30 AM
to Hazelcast
Awesome! I didn't realise the Cluster interface existed. This is
exactly what I need :)

innocent...@gmail.com

Feb 19, 2015, 8:53:11 AM
to haze...@googlegroups.com, timv...@gmail.com
I think they have now added the quorum functionality. Take a look here: http://docs.hazelcast.org/docs/latest-dev/manual/html/clusterquorum.html