Hazelcast hard stop - client hung operations


roman....@openmindnetworks.com

Mar 3, 2017, 10:57:49 AM
to Hazelcast
Hi,

I'm dealing with this situation: Let's say there are 3 Hazelcast member nodes running and a C++ client is connected to them.
There is a steady stream of inserts (put/replace) coming in.

Then one Hazelcast member node goes down. It is a hard stop, not a clean exit (a clean stop.sh works fine). Another case would be a network issue.
It takes "hazelcast.max.no.heartbeat.seconds" until the topology is updated correctly.
Until the topology is updated, the client application gets stuck: all in-flight operations are blocked.

I have two questions:

Q1. Is it possible to explicitly tell Hazelcast cluster, that a concrete node disappeared?
I've found the REST API for cluster management, but it seems to allow changing only a running node's own state.
I would need to tell nodes #1 and #2 that node #3 is down, so they don't have to wait until "hazelcast.max.no.heartbeat.seconds" times out.
(Let's assume I know that the node went down or is otherwise unavailable).

Q2. How can I optimize the Client side to avoid hung operations?
There is no obvious timeout setting in the ClientConfig configuration.
But perhaps I missed something. Maybe "hazelcast.client.invocation.timeout.seconds" ?
Or is there anything else on the client side that I could configure to avoid the hung operations, or at least to limit how long they hang?

Note: Using Hazelcast 3.7.4, Hazelcast Cpp Client 3.6.3

Thanks a lot,
Regards,
Roman





  

M. Sancar Koyunlu

Mar 6, 2017, 1:54:44 AM
to Hazelcast
Hi Roman, 

When a member goes down, the cluster should not have to wait for `hazelcast.max.no.heartbeat.seconds`; it should be able to act almost immediately.
Do you see the killed member removed from the member list on the other nodes immediately?
What exactly is a `hard stop`? `kill -9 <pid>`? I would like to double-check, because the symptoms you describe are much more likely to be the result of suspending the process than of killing it.

On the client side, operations that are supposed to go to the dead member will hang until that member is removed from the member list. You can set "hazelcast.client.invocation.timeout.seconds" to make invocations give up earlier, but in this case, finding the problem on the server side should be the way forward.
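As a rough illustration of what an invocation timeout buys, here is a minimal C++ sketch of bounding a blocking wait with `std::future::wait_for`: instead of waiting indefinitely for a response from a member that may be dead, the caller gives up after a fixed interval and gets an error. The names `await_with_timeout` and `InvocationTimeout` are hypothetical and are not part of the Hazelcast C++ client, whose internal handling of `hazelcast.client.invocation.timeout.seconds` may differ.

```cpp
#include <chrono>
#include <future>
#include <stdexcept>

// Hypothetical error type standing in for whatever a client raises when an
// invocation gives up. It is NOT the Hazelcast C++ client's exception class.
struct InvocationTimeout : std::runtime_error {
    InvocationTimeout() : std::runtime_error("invocation timed out") {}
};

// Wait for a pending response for at most `timeout`, then throw instead of
// hanging. A deferred future is also treated as "not ready" here.
template <typename T>
T await_with_timeout(std::future<T> &pending, std::chrono::milliseconds timeout) {
    if (pending.wait_for(timeout) != std::future_status::ready) {
        throw InvocationTimeout();
    }
    return pending.get();
}
```

The caller then decides whether to retry or fail the request. A timeout like this only limits how long the caller hangs; it does not speed up detection of the dead member on the server side.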
Regards.

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at https://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/d791c3de-80e4-422c-91bc-0881f60451d0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Sancar Koyunlu
Software Engineer, Hazelcast

roman....@openmindnetworks.com

Mar 6, 2017, 3:42:11 AM
to Hazelcast
Hi Sancar,

thank you for the quick response.

Well, this is a special case: Hazelcast runs in Docker. When the Docker container is stopped, I experience the issues above.
The problem might be that Hazelcast is started by another process inside the container, so I'm not sure it even receives the SIGTERM / SIGKILL signal when the container is stopped.

No, the member list is not updated immediately; it seems to be refreshed only after the hazelcast.max.no.heartbeat.seconds period.
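The behaviour described here is what a heartbeat-timeout failure detector produces. The C++ sketch below is not Hazelcast's implementation; it only assumes that a member is dropped either when an explicit failure signal arrives (modelled here as a closed connection) or when no heartbeat has been seen for `max_no_heartbeat`. A peer that disappears without its connections being closed (for example a suspended process, a network partition, or a power failure) only ever takes the second path, so it stays in the member list for up to the full timeout. `HeartbeatDetector` and its methods are hypothetical names.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical heartbeat-timeout failure detector, for illustration only.
class HeartbeatDetector {
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatDetector(std::chrono::seconds max_no_heartbeat)
        : max_no_heartbeat_(max_no_heartbeat) {}

    // Record a heartbeat from `member` received at time `now`.
    void on_heartbeat(const std::string &member, Clock::time_point now) {
        last_seen_[member] = now;
    }

    // Explicit failure signal: the member is removed at once, no waiting.
    void on_connection_closed(const std::string &member) {
        last_seen_.erase(member);
    }

    // Silent failure: remove and return every member whose last heartbeat
    // is older than max_no_heartbeat at time `now`.
    std::vector<std::string> evict_silent(Clock::time_point now) {
        std::vector<std::string> evicted;
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > max_no_heartbeat_) {
                evicted.push_back(it->first);
                it = last_seen_.erase(it);
            } else {
                ++it;
            }
        }
        return evicted;
    }

    bool is_member(const std::string &member) const {
        return last_seen_.count(member) != 0;
    }

private:
    std::chrono::seconds max_no_heartbeat_;
    std::map<std::string, Clock::time_point> last_seen_;
};
```

Lowering the timeout shortens the silent-failure path at the cost of more false suspicions during long GC pauses or transient network hiccups; that trade-off is why tuning such a property is usually combined with making clean shutdowns actually happen.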

Regards,
Roman


  

roman....@openmindnetworks.com

Mar 6, 2017, 7:09:40 AM
to Hazelcast
P.S.: In a client-server topology, where should I set the following properties: on the client side, on the server side, or both?

hazelcast.client.max.no.heartbeat.seconds
hazelcast.client.invocation.timeout.seconds

Thank you,
Regards,
Roman


  

M. Sancar Koyunlu

Mar 6, 2017, 7:19:11 AM
to Hazelcast
hazelcast.client.max.no.heartbeat.seconds is a server-side property and should be set on the members.

hazelcast.client.invocation.timeout.seconds is a client-side property and should be set on the client.

Note that killing the Java processes inside the container before stopping it is probably a better idea than tinkering with these properties.
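One common way to make sure the Java process actually sees the stop signal is to have the container's main process forward it. `docker stop` sends SIGTERM to the container's main process (PID 1) and, after a grace period (10 seconds by default), SIGKILL; if PID 1 is a wrapper that does not pass SIGTERM on, the member never gets the chance to shut down on its own. Below is a minimal POSIX C++ sketch of such a forwarding wrapper, assuming the member is launched as its direct child. `run_forwarding_signals` is a hypothetical helper, and unlike a full init it does not reap orphaned grandchildren; existing tools such as tini or dumb-init cover that, and using `exec` in a shell entrypoint avoids the intermediate process altogether.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// PID of the wrapped child (for example the JVM running a Hazelcast member).
static volatile sig_atomic_t g_child_pid = 0;

// Relay the received signal to the child. kill() is async-signal-safe.
extern "C" void forward_to_child(int sig) {
    if (g_child_pid > 0) {
        kill(static_cast<pid_t>(g_child_pid), sig);
    }
}

// Hypothetical helper: start argv[0] (searched on PATH), forward SIGTERM and
// SIGINT to it, and return its exit code (128 + signal number if it was
// killed by a signal, -1 on error).
int run_forwarding_signals(char *const argv[]) {
    struct sigaction sa {};
    sa.sa_handler = forward_to_child;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGTERM, &sa, nullptr);
    sigaction(SIGINT, &sa, nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: exec resets caught signals to their default action.
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    g_child_pid = pid;

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    g_child_pid = 0;
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);
    }
    return -1;
}
```

A wrapper's `main(int argc, char **argv)` could simply `return run_forwarding_signals(argv + 1);`, with the container entrypoint set to the wrapper followed by the Java command line. This only helps with regular `docker stop`; hardware failures and power outages still fall back to the heartbeat timeouts discussed above.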




roman....@openmindnetworks.com

Mar 6, 2017, 7:24:21 AM
to Hazelcast
Thank you very much.

Yeah, I'm aware of this, but the idea is to have a solution that is also robust against HW failures, power outages, etc.
So I will go with both options: stopping the process correctly on a regular docker stop, and tuning the configuration.

Roman


  