Concurrent changes to group membership


Tim Fox

Mar 5, 2013, 8:09:52 AM
to Hazelcast
Hi all,

If I set a MembershipListener on the Cluster, and receive a member
added event, does Hazelcast guarantee that the member list won't get
updated concurrently during the execution of the handler, i.e.

@Override
public synchronized void memberAdded(MembershipEvent membershipEvent) {
    Set<Member> members1 = cluster.getMembers();

    // Now do something that may take some time and involve remote calls, e.g.
    map.get("somekey");

    Set<Member> members2 = cluster.getMembers();

    // Question: does Hazelcast guarantee that members1 and members2
    // are always identical?
}

Peter Veentjer

Mar 5, 2013, 8:22:37 AM
to haze...@googlegroups.com
Afaik it is done asynchronously.

Peter Veentjer

Mar 5, 2013, 8:30:34 AM
to haze...@googlegroups.com
So the first call can return different members than the second call.

PS:
There is a similar problem when using the member list in the Routers.
Perhaps it is a good time to get this issue solved once and for all.

Tim Fox

Mar 5, 2013, 8:38:39 AM
to Hazelcast
Can you confirm that? (By "asynchronous" I assume you mean that the
set can be modified concurrently while the handler is being executed?)

In order for each node to make deterministic calculations on the state
of the members it would be necessary for each node to always receive
each event in the exact same order as the others (this appears to be
guaranteed in the javadocs), however it would also be necessary to
guarantee that the set is not modified while the handler is being
processed.

With those guarantees it means that each node would see the exact same
set of members whenever any membership event is received.

This would be essential, for example, when computing which node
another node fails over onto.

For example, let's say I have N nodes, and one of them fails. I now
want to compute which node in the cluster will take over from the
failed node.

One way of doing this would be, in the memberRemoved handler:

public synchronized void memberRemoved(MembershipEvent membershipEvent) {
    // Sort members into a list based on their UUID
    ArrayList<String> sortedMembers = sortMembers(cluster.getMembers());
    // Now I know that every node has the exact same list of
    // sortedMembers when this event is handled.
    // Now I can choose a node by, say, hashing a string and choosing
    // one based on this (floorMod keeps the index non-negative even
    // when the hash code is negative)
    String chosen = sortedMembers.get(
            Math.floorMod(someHashCode, sortedMembers.size()));
    // Now I know every node in the cluster will have chosen the same node
    if (chosen.equals(myNodeID)) {
        // This means I am the node in the cluster that the failed node
        // will fail over onto
    }
}

I know that other libraries work the way I expect (e.g. JGroups), i.e.
the state of the members is only modified by the member added/removed
events, so each node will see the same state. Without such a guarantee
I'm not sure what use having Hazelcast expose a list of members is.
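
For readers following along, the failover calculation described above can be sketched in plain Java with no Hazelcast dependency. This is only an illustration under stated assumptions: member UUIDs are modelled as plain strings, and FailoverChooser and choose are hypothetical names, not part of any library. The result is only the same on every node if every node passes in the same set of surviving members, which is exactly the guarantee being asked about.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Hazelcast API: picks a failover target deterministically.
final class FailoverChooser {
    // Returns the UUID of the member that should take over for failedNodeId.
    // survivingUuids may arrive in any order; sorting makes the choice order-independent.
    static String choose(Collection<String> survivingUuids, String failedNodeId) {
        if (survivingUuids.isEmpty()) {
            throw new IllegalArgumentException("no surviving members");
        }
        List<String> sorted = new ArrayList<>(survivingUuids);
        Collections.sort(sorted);
        // floorMod keeps the index in range even when hashCode() is negative.
        int index = Math.floorMod(failedNodeId.hashCode(), sorted.size());
        return sorted.get(index);
    }
}
```

Two nodes that hold the same members, even in a different iteration order, pick the same target. Two nodes that hold different member sets may not, which is why the consistency of the member list matters here.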

Tim Fox

Mar 5, 2013, 12:32:18 PM
to Hazelcast
If anyone on the Hazelcast team is watching this, I would very much
like to hear what the "official" word on this is :)

Fuad Malikov

Mar 5, 2013, 11:47:39 PM
to haze...@googlegroups.com
Hi Tim,

Any listener notification is done asynchronously. And actually we require you not to do any heavyweight operations inside the listener. Upon receiving the event you should spin up your own thread and do all the processing there. In my previous code, to keep it simple, I didn't obey this rule: I called getCluster().getMembers() inside the memberAdded method. Normally you shouldn't do that. The main reason is not to block a Hazelcast thread.

The set of Members represents the current alive members. It will always return the current true state. 

-fuad

Tim Fox

Mar 6, 2013, 2:29:26 AM
to Hazelcast
Hi Fuad,



On Mar 6, 4:47 am, Fuad Malikov <f...@hazelcast.com> wrote:
> Hi Tim,
>
> Any listener notification is done asynchronous. And actually we require you
> not to do any heavy weight operation inside the listener. Upon receiving
> the event you should spin your own thread and do all the processing there.
> In my previous code to keep it simple I didn't obey this rule. I called the
> getCluster().getMembers() inside the memberAdded method. Normally you
> shouldn't do that. The main reason is not to block a Hazelcast thread.
>
> The set of Members represents the current alive members. It will always
> return the current true state.

Well, as you know "current" is something hard to define in a
distributed system ;)

I can work around not doing any "heavyweight" operations in an event
handler, but the issue remains that the membership might still
get updated while the handler is being executed, e.g.


Node 1 might see this sequence of events:

public synchronized void memberRemoved(MembershipEvent membershipEvent) {
    // membershipEvent A is received

    // <<===== The members set gets updated concurrently here

    Set<Member> members = instance.getCluster().getMembers();
}

But Node 2 might see this sequence of events:

public synchronized void memberRemoved(MembershipEvent membershipEvent) {
    // membershipEvent A is received

    Set<Member> members = instance.getCluster().getMembers();

    // <<===== The members set gets updated concurrently here
}

Resulting in Node 1 and Node 2 seeing _different_ sets of members for
the _same_ membership event that was received.

If different nodes see different sets of members for the same sequence
of membership events then any calculations that require each node to
see the same set of members (e.g. calculating a failover node by
hashing the list of members) don't work.

If Hazelcast cannot guarantee that different nodes see the same
membership set per sequence of member events then is there some other
way I can do this with Hazelcast? If not, this would be a showstopper
for Vert.x and I would have to start looking elsewhere for clustering
support in Vert.x.

If the members list cannot be trusted, one thing I thought of would be
to maintain the member state ourselves, i.e. each node maintains its
own set of Members and adds/removes them as it receives
MembershipEvents. This would mean the member state is a proper
deterministic state machine, which is what we want. However, the
problem then becomes how a new node syncs its initial state with
that of other nodes when it joins.

Any help greatly appreciated here. I don't want to have to look at
other clustering technologies at this point.

Tim Fox

Mar 6, 2013, 2:37:24 AM
to Hazelcast
TL;DR

What I'm basically saying is that the set of members provided by
Hazelcast should always be consistent with respect to the sequence of
membership events that are received by a node.

Peter Veentjer

Mar 6, 2013, 3:22:04 AM
to haze...@googlegroups.com
Hi Tim,

I understand your problem. We (Fuad and I) have run into this issue while
looking at the Router implementation for the client since it also relies on
having a consistent set of members.

I'll have a talk with Fuad.

Another problem is getting a correct initial list of members; you
don't want to miss any.

I think having the following API:

public interface MembershipListener extends EventListener {
    void init(Set<Member> initialMembers);

    void memberAdded(MembershipEvent membershipEvent);

    void memberRemoved(MembershipEvent membershipEvent);
}

and the following guarantees:
- init will be called before memberAdded/memberRemoved
- the membership listener will not be called concurrently
- the membership listener will be called in the order the events happened

would be sufficient to make the changes within the MembershipListener
deterministic.

Do you agree, or am I missing something?
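
As an illustration of how a listener could use such an API, here is a minimal sketch of a node keeping its own member view as a small state machine. It assumes the three guarantees listed above actually hold, models members as plain string IDs rather than Hazelcast Member objects, and uses a hypothetical class name (LocalMemberView); it is not the proposed Hazelcast interface itself.

```java
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical local view of the cluster, driven only by init/added/removed callbacks.
// Assumes the callbacks are never invoked concurrently, so the mutators are unsynchronized.
final class LocalMemberView {
    private final SortedSet<String> members = new TreeSet<>();

    // Called once, before any add/remove event, with the initial member IDs.
    void init(Set<String> initialMembers) {
        members.clear();
        members.addAll(initialMembers);
    }

    void memberAdded(String memberId) {
        members.add(memberId);
    }

    void memberRemoved(String memberId) {
        members.remove(memberId);
    }

    // Returns an immutable sorted copy, safe to keep after later events arrive.
    // Call it from within a callback; reads from other threads would need synchronization.
    SortedSet<String> snapshot() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(members));
    }
}
```

Because the view changes only in response to events, two nodes that receive the same initial set and the same event sequence end up with identical snapshots, regardless of what cluster.getMembers() would return at that moment.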

Tim Fox

Mar 6, 2013, 4:27:41 AM
to Hazelcast


On Mar 6, 8:22 am, Peter Veentjer <alarmnum...@gmail.com> wrote:
> And the following guarantees:
> - the init will be called before memberadded/memberremoved
> - the membership listener will not be called concurrently
> - the membership listener will be called in the order the events happened

Hi Peter,

Sounds reasonable :)

A small caveat: Regarding the last point - defining the order in which
events "happened" in a distributed system is quite difficult (which
clock do you use?), but I don't think it is necessary.

As long as all Membership listeners are called with the exact same
sequence of events, and none are omitted, that should be enough.

One thing to watch out for is that no events are omitted between
getting the init state and the first event on a node.


Peter Veentjer

Mar 6, 2013, 2:52:32 PM
to haze...@googlegroups.com
Hi Tim,

I had a talk with Fuad, comments are placed inline.

If all goes fine I'll implement it Friday or Saturday and it should be
available on the 2.x snapshot next week. It will also be ported to 3.x
since we need it with the client router (for load balancing)
implementations as well.

On Wed, Mar 6, 2013 at 11:27 AM, Tim Fox <timv...@gmail.com> wrote:
> A small caveat: Regarding the last point - defining the order in which
> events "happened" in a distributed system is quite difficult (which
> clock do you use?), but I don't think it is necessary.

There is a total ordering of membership events since they are sent from
the master node. So we get lucky here :)

> As long as all Membership listeners are called with the exact same
> sequence of events, and none are omitted, that should be enough.

You will get the full ordering of membership events and there won't be
any misses.

Behind the scenes the membership listener will be treated as an actor:
- every MembershipListener will have its own BlockingQueue as a mailbox,
so events will always be received in order.
- each MembershipListener will be executed by only one thread, so no
synchronization is required.
This will be completely invisible to the end user.

> One thing to watch out for is that no events are omitted between
> getting the init state and the first event on a node.

That was the tricky part, but Fuad and I found a good solution for
that as well. We need to verify it with Mehmet.
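
The actor arrangement described above (one mailbox and one thread per listener) can be sketched roughly as follows. This is a generic illustration in plain Java, not Hazelcast's implementation: ListenerActor, post and stopAndAwait are hypothetical names, and the sketch only preserves the order in which events are placed in the mailbox, so getting them into the mailbox in the right order remains the caller's problem.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical actor wrapper: one mailbox and one worker thread per listener.
final class ListenerActor<E> {
    private static final Object STOP = new Object(); // poison pill that ends the worker loop
    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    ListenerActor(Consumer<E> listener) {
        worker = new Thread(() -> {
            try {
                // Take events one at a time, so the listener never runs concurrently.
                for (Object next = mailbox.take(); next != STOP; next = mailbox.take()) {
                    @SuppressWarnings("unchecked")
                    E event = (E) next;
                    listener.accept(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly if interrupted
            }
        }, "membership-listener-actor");
        worker.setDaemon(true);
        worker.start();
    }

    // Enqueues an event; delivery order equals enqueue order.
    void post(E event) {
        mailbox.add(event);
    }

    // Delivers everything already enqueued, then stops the worker and waits for it.
    void stopAndAwait() {
        mailbox.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A slow listener then only delays its own mailbox rather than blocking the thread that produces the events.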

Tim Fox

Mar 6, 2013, 3:10:39 PM
to Hazelcast
Awesome! I look forward to it.

Peter Veentjer

Mar 10, 2013, 3:23:07 AM
to haze...@googlegroups.com
Hi Tim,

A short status update.

I'm working on adding it to the 3.x branch and it will be backported to 2.x.

I'm running into some issues with the actual registration of this new
'actor' member listener. This is a bit tricky since no messages should
be lost or received out of order. Once they are sent to the
'actor' listener, things will work as promised. But the current
difficulty is registering this actor listener.

Tim Fox

Mar 11, 2013, 4:05:08 AM
to Hazelcast
I've solved similar issues before in my messaging days. One way to do
this while ensuring no messages are omitted is for the sender to have
the ability to replay messages.

What you do is label each membership event with a global sequence
number, so everyone sees the events in the same sequence.

Then, when a new node joins it first asks for the "current" state. It
receives a tuple of the current state and a sequence number which
represents the last update to the state, e.g.:

[state, sequence_number]

Now, obviously once it has received that state and before it sets a
membership listener it might lose a few membership events. One way to
resolve this is to pass the next expected sequence number when setting
the listener, i.e.:

registerMembershipListener(listener, sequence_number + 1)

This means "register the listener and start sending me membership
events starting at sequence_number + 1 (because my state has events
only up to sequence_number)".

When the co-ordinator receives the request to set the listener, its
current sequence number might be sequence_number + n, so the first
thing it needs to do is immediately _replay_ the events from
sequence_number + 1 to sequence_number + n before sending any more
events.

This means the co-ordinator needs to remember the last m events
(where m is some "large enough" configurable number) so they can be
replayed when nodes set listeners.
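
The snapshot-plus-replay scheme above could look roughly like this on the co-ordinator side. It is a single-process sketch under stated assumptions, not anything Hazelcast provides: MembershipCoordinator, Event, Snapshot, publish, snapshot and register are all hypothetical names, members are plain string IDs, and listeners are invoked while holding the lock purely to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical co-ordinator that numbers events and can replay the last historySize of them.
final class MembershipCoordinator {
    static final class Event {
        final long seq;
        final boolean added;
        final String memberId;
        Event(long seq, boolean added, String memberId) {
            this.seq = seq; this.added = added; this.memberId = memberId;
        }
    }

    static final class Snapshot {
        final List<String> members;
        final long seq; // sequence number of the last event reflected in members
        Snapshot(List<String> members, long seq) { this.members = members; this.seq = seq; }
    }

    private final TreeSet<String> members = new TreeSet<>();
    private final Deque<Event> history = new ArrayDeque<>();
    private final List<Consumer<Event>> listeners = new ArrayList<>();
    private final int historySize;
    private long lastSeq = 0;

    MembershipCoordinator(int historySize) {
        if (historySize < 1) throw new IllegalArgumentException("historySize must be >= 1");
        this.historySize = historySize;
    }

    // Applies a membership change, records it for replay, and notifies listeners in order.
    synchronized void publish(boolean added, String memberId) {
        Event event = new Event(++lastSeq, added, memberId);
        if (added) members.add(memberId); else members.remove(memberId);
        history.addLast(event);
        if (history.size() > historySize) history.removeFirst();
        for (Consumer<Event> listener : listeners) listener.accept(event);
    }

    // Returns the [state, sequence_number] tuple.
    synchronized Snapshot snapshot() {
        return new Snapshot(new ArrayList<>(members), lastSeq);
    }

    // Replays retained events with seq >= fromSeq, then subscribes the listener.
    synchronized void register(Consumer<Event> listener, long fromSeq) {
        long oldestReplayable = history.isEmpty() ? lastSeq + 1 : history.peekFirst().seq;
        if (fromSeq < oldestReplayable) {
            throw new IllegalStateException("history too short; take a fresh snapshot");
        }
        for (Event event : history) {
            if (event.seq >= fromSeq) listener.accept(event);
        }
        listeners.add(listener);
    }
}
```

A joining node would call snapshot(), then register(listener, snapshot.seq + 1). If the retained history no longer reaches back that far, the sketch rejects the registration so the node can start over with a fresh snapshot rather than silently miss events.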


Tim Fox

Mar 17, 2013, 1:37:52 PM
to Hazelcast
Hi Peter,

Just wondering if you've made any progress on this? :)


Tim Fox

Mar 20, 2013, 12:22:39 PM
to Hazelcast
bump?

Peter Veentjer

Mar 20, 2013, 1:25:31 PM
to haze...@googlegroups.com
Hi Tim,

I have time for it again over the coming 3 days.

The 'actor' is already in place, but feeding the actor with the
events and making sure that the events don't get out of order
before they are placed in the mailbox needs a bit more investigation.

Tim Fox

May 1, 2013, 8:11:10 AM
to Hazelcast
Peter, Talip,

Can you tell me if this functionality is going to make it into a
Hazelcast release any time soon?

It's pretty critical for Vert.x, if we're going to use Hazelcast for
failover. I would very much like to stay with Hazelcast but if it's
not imminent I need to consider other options.

Thanks again
