Problem with cluster shutdown via IExecutorService

Marek Šabo

Apr 28, 2016, 10:30:50 AM

to Hazelcast

Hi all,

I have a problem shutting down a cluster (4 nodes). My solution is very similar to this one: http://stackoverflow.com/a/35340118/303559

Basically, I send a Runnable that calls Hazelcast.shutdownAll() to every member via IExecutorService.executeOnAllMembers().

My problem is that Hazelcast then tries to send a response for the task after the node has already shut down. This fails, and on some nodes it leaves the process running with the application in an undefined state.

18:12:35.495 [hz....@wfmtowerb.generic-operation.thread-3] INFO  com.hazelcast.cluster.ClusterService - [172.17.0.1]:5708 [admin] [3.6] 

Members [3] {
	Member [172.17.0.1]:5706
	Member [172.17.0.1]:5707
	Member [172.17.0.1]:5708 this
}
18:12:38.542 [cached1] INFO  com.hazelcast.instance.NodeExtension - [172.17.0.1]:5708 [admin] [3.6] Destroying node NodeExtension.
18:12:38.543 [cached1] INFO  com.hazelcast.instance.Node - [172.17.0.1]:5708 [admin] [3.6] Hazelcast Shutdown is completed in 3071 ms.

18:12:38.543 [cached1] INFO  com.hazelcast.core.LifecycleService - [172.17.0.1]:5708 [admin] [3.6] Address[172.17.0.1]:5708 is SHUTDOWN

Exception in thread "cached1" com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: null to Address[172.17.0.1]:5705. Op: com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{identityHash=1153548074, serviceName='hz:impl:executorService', partitionId=-1, replicaIndex=0, callId=2117, invocationTime=1461255152460 (Thu Apr 21 18:12:32 CEST 2016), waitTimeout=-1, callTimeout=60000, name=jarvis.server.systemExecutor}
	at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54)
	at com.hazelcast.spi.Operation.sendResponse(Operation.java:277)
	at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.sendResponse(DistributedExecutorService.java:229)
	at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:215)
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:212)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)

Any advice on how to avoid this? Or is there a better way to shut down the whole cluster? For now I'm trying to avoid calling System.exit(); I want a really graceful shutdown.

Thanks for any advice.

Best Regards,

Marek

Peter Veentjer

Apr 29, 2016, 2:21:42 PM

to haze...@googlegroups.com

Quick hack: spawn a thread from the executor-task and delay it for a few seconds and then shut the cluster down from this spawned thread.

This way the system does not shut down before the executor task has completed.
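A minimal sketch of this hack, using only the JDK. The class name DelayedShutdown, the method schedule, and the thread name are made up for illustration; the shutdown action is passed in as a plain Runnable so the helper itself does not depend on Hazelcast.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Hazelcast): runs a shutdown action
// on a separate thread after a delay, so the calling executor task can
// return and send its response before the node goes down.
final class DelayedShutdown {
    private DelayedShutdown() {}

    // Starts a new thread that sleeps for delayMillis and then runs
    // shutdownAction. Returns the started thread so callers can join it.
    static Thread schedule(Runnable shutdownAction, long delayMillis) {
        Thread t = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and skip the shutdown.
                Thread.currentThread().interrupt();
                return;
            }
            shutdownAction.run();
        }, "delayed-shutdown");
        t.start();
        return t;
    }
}
```

Inside the run() method of the task sent via executeOnAllMembers(), this would presumably be called as DelayedShutdown.schedule(Hazelcast::shutdownAll, 3000). The 3000 ms delay is an arbitrary guess at "a few seconds"; the task then returns immediately and its response can be sent while the node is still up.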

Peter Veentjer

Apr 29, 2016, 2:22:29 PM

to haze...@googlegroups.com

Less hacky:

https://github.com/hazelcast/hazelcast/blob/v3.6.2/hazelcast/src/main/java/com/hazelcast/core/Cluster.java#L194

Marek Šabo

Apr 29, 2016, 3:22:13 PM

to Hazelcast

Hi Peter,

I've already tried the delay with another thread, and it solves the response problem. However, we are still seeing situations where the last node to go down logs all the proper shutdown messages (and invokes lifecycle events), yet the JVM stays up (we confirmed via JMX that no non-Hazelcast threads were blocking shutdown). This needs more investigation on our side.

My colleague discovered the Cluster.shutdown() method and we hoped for Christmas. The nodes shut down fine, but a lot of our services are deinitialized, and our executors shut down, in the HazelcastLifecycleListener. Cluster.shutdown() invokes node.shutdown() instead of lifecycleService.shutdown(), so we don't get notified via an event and our stuff keeps the JVM hanging.

Is it needed or intentional that Cluster.shutdown() does not call lifecycleService.shutdown() and send out events? Because that looks like exactly what we need.

TIA,

Marek

Marek Šabo

May 3, 2016, 7:22:00 AM

to Hazelcast

FYI, the lifecycle issue described above is now filed as a defect: https://github.com/hazelcast/hazelcast/issues/8070
