Node can't join after it is restarted (Hazelcast)

164 views
Skip to first unread message

David Klotz

unread,
Sep 22, 2016, 5:15:17 AM9/22/16
to vert.x
Hi,

we use Vert.x with the Hazelcast cluster manager (direct tcp connections with a fixed list of "seed" nodes, no multicast). We embed Vert.x (no verticles) and all our services run inside docker containers. This all works fine, most of the time, but on our test-cluster where most nodes are redeployed (at least) daily (after CI builds), we often experience problems with nodes not reconnecting after one or more nodes are restarted. On the running nodes we get warnings/errors like this from Hazelcast in the log:


[INFO ][] 2016-09-15 22:38:39.604 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10083] lost. Reason: java.io.EOFException[Remote socket closed!]
[INFO ][] 2016-09-15 22:38:43.015 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Removing Member [test-03.somedomain.com]:10083
[WARN ][] 2016-09-15 22:38:43.434 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-2 Closing socket to endpoint Address[test-03.somedomain.com]:10083, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:38:46.583 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Removing Member [test-03.somedomain.com]:10083
[INFO ][] 2016-09-15 22:38:48.683 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10083, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:38:59.624 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Established socket connection between /172.17.0.2:51448 and test-03.somedomain.com/192.168.0.251:10083
[INFO ][] 2016-09-15 22:39:01.640 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3]

Members [8] {
        Member [seed2-staging.somedomain.com]:5801
        Member [seed1-staging.somedomain.com]:5701 this
        Member [test-03.somedomain.com]:10072
        Member [test-03.somedomain.com]:10071
        Member [test-03.somedomain.com]:10089
        Member [test-03.somedomain.com]:7701
        Member [test-03.somedomain.com]:10082
        Member [test-03.somedomain.com]:10008
}

[INFO ][] 2016-09-15 22:39:02.949 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Committing/rolling-back alive transactions of Member [test-03.somedomain.com]:10083, UUID: de89a4fc-caa8-472b-a129-a147bbda66dd
[WARN ][] 2016-09-15 22:39:03.909 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Slow operation detected: com.hazelcast.cluster.impl.operations.MemberRemoveOperation
[INFO ][] 2016-09-15 22:39:05.320 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10083] lost. Reason: java.io.EOFException[Could not read protocol type!]
[WARN ][] 2016-09-15 22:39:05.756 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Slow operation detected: com.hazelcast.map.impl.operation.RemoveOperation
[INFO ][] 2016-09-15 22:39:05.795 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10089] lost. Reason: java.io.EOFException[Remote socket closed!]
[INFO ][] 2016-09-15 22:39:06.049 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10089, timeout: 0, bind-any: true
[WARN ][] 2016-09-15 22:39:06.999 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint Address[test-03.somedomain.com]:10089, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:39:07.268 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Invocations:1 timeouts:1 backup-timeouts:0
[INFO ][] 2016-09-15 22:39:09.527 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Could not connect to: test-03.somedomain.com/192.168.0.251:10089. Reason: SocketException[Connection refused to address test-03.somedomain.com/192.168.0.251:10089]
[INFO ][] 2016-09-15 22:39:09.987 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10089, timeout: 0, bind-any: true
[WARN ][] 2016-09-15 22:39:10.033 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=231975495, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8106, invocationTime=1473978835443 (Thu Sep 15 22:33:55 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.2:5701 -> /172.17.0.1:36899], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:39:10.066 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Established socket connection between /172.17.0.2:35541 and test-03.somedomain.com/192.168.0.251:10089
[WARN ][] 2016-09-15 22:39:10.071 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] SocketWriter is not set, creating SocketWriter with CLUSTER protocol!
[INFO ][] 2016-09-15 22:39:10.815 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Invocations:2 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:39:12.304 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=1364729747, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8113, invocationTime=1473978841788 (Thu Sep 15 22:34:01 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=4, callTimeout=60000, target=Address[test-03.somedomain.com]:10089, backupsExpected=0, backupsCompleted=0, connection=null}
[ERROR][] 2016-09-15 22:39:12.764 (Slf4jFactory.java:68): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Cannot send response: com.hazelcast.spi.impl.operationservice.impl.operations.PartitionIteratingOperation$PartitionResponse@70f1edbe to Address[test-03.somedomain.com]:10089. Op: com.hazelcast.spi.impl.operationservice.impl.operations.PartitionIteratingOperation{identityHash=1207550469, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=1514, invocationTime=1473978817212 (Thu Sep 15 22:33:37 GMT 2016), waitTimeout=-1, callTimeout=60000, operationFactory=com.hazelcast.multimap.impl.operations.MultiMapOperationFactory@3e4ded}
com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: com.hazelcast.spi.impl.operationservice.impl.operations.PartitionIteratingOperation$PartitionResponse@70f1edbe to Address[test-03.somedomain.com]:10089. Op: com.hazelcast.spi.impl.operationservice.impl.operations.PartitionIteratingOperation{identityHash=1207550469, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=1514, invocationTime=1473978817212 (Thu Sep 15 22:33:37 GMT 2016), waitTimeout=-1, callTimeout=60000, operationFactory=com.hazelcast.multimap.impl.operations.MultiMapOperationFactory@3e4ded}
        at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.sendResponse(OperationRunnerImpl.java:278) [node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.handleResponse(OperationRunnerImpl.java:251) [node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:173) [node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393) [node.jar:na]
        at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184) [node.jar:na]
        at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137) [node.jar:na]
        at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124) [node.jar:na]
        at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99) [node.jar:na]
[INFO ][] 2016-09-15 22:39:13.066 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10089] lost. Reason: java.io.EOFException[Could not read protocol type!]
[INFO ][] 2016-09-15 22:39:13.786 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Invocations:3 timeouts:2 backup-timeouts:0
[WARN ][] 2016-09-15 22:39:13.903 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=231975495, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8106, invocationTime=1473978835443 (Thu Sep 15 22:33:55 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.2:5701 -> /172.17.0.1:36899], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER}
[WARN ][] 2016-09-15 22:39:14.013 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=1364729747, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8115, invocationTime=1473978843260 (Thu Sep 15 22:34:03 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=5, callTimeout=60000, target=Address[test-03.somedomain.com]:10089, backupsExpected=0, backupsCompleted=0, connection=null}
[INFO ][] 2016-09-15 22:39:15.223 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10089, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:39:15.623 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Established socket connection between /172.17.0.2:55197 and test-03.somedomain.com/192.168.0.251:10089
[WARN ][] 2016-09-15 22:39:15.623 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] SocketWriter is not set, creating SocketWriter with CLUSTER protocol!
[WARN ][] 2016-09-15 22:39:15.760 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] No Invocation found for response: NormalResponse{callId=8114, urgent=true, value=false, backupCount=0} sent from Address[test-03.somedomain.com]:10071
[INFO ][] 2016-09-15 22:39:16.312 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10089] lost. Reason: java.io.EOFException[Could not read protocol type!]
[INFO ][] 2016-09-15 22:39:16.718 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10089, timeout: 0, bind-any: true
[WARN ][] 2016-09-15 22:39:16.719 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] No Invocation found for response: NormalResponse{callId=8106, urgent=false, value=com.hazelcast.map.impl.query.QueryResult@4308e8ca, backupCount=0} sent from Address[test-03.somedomain.com]:10071
[INFO ][] 2016-09-15 22:39:16.853 (Slf4jFactory.java:64): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Established socket connection between /172.17.0.2:42563 and test-03.somedomain.com/192.168.0.251:10089
[WARN ][] 2016-09-15 22:39:16.853 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] SocketWriter is not set, creating SocketWriter with CLUSTER protocol!
[WARN ][] 2016-09-15 22:39:17.520 (Slf4jFactory.java:66): [seed1-staging.somedomain.com]:5701 [dev] [3.6.3] Could not get results
java.util.concurrent.ExecutionException: com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=231975495, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8106, invocationTime=1473978835443 (Thu Sep 15 22:33:55 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.2:5701 -> /172.17.0.1:36899], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:360) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204) ~[node.jar:na]
        at com.hazelcast.map.impl.query.MapQueryEngineImpl.addResultsOfPredicate(MapQueryEngineImpl.java:528) ~[node.jar:na]
        at com.hazelcast.map.impl.query.MapQueryEngineImpl.invokeQueryAllPartitions(MapQueryEngineImpl.java:421) ~[node.jar:na]
        at com.hazelcast.map.impl.proxy.MapProxyImpl.entrySet(MapProxyImpl.java:617) [node.jar:na]
        at com.hazelcast.map.impl.proxy.MapProxyImpl.entrySet(MapProxyImpl.java:606) [node.jar:na]
        at io.vertx.core.impl.HAManager.nodeLeft(HAManager.java:282) [node.jar:na]
        at io.vertx.core.impl.HAManager.access$100(HAManager.java:95) [node.jar:na]
        at io.vertx.core.impl.HAManager$1.nodeLeft(HAManager.java:142) [node.jar:na]
        at io.vertx.spi.cluster.hazelcast.HazelcastClusterManager.memberRemoved(HazelcastClusterManager.java:286) [node.jar:na]
        at com.hazelcast.cluster.impl.ClusterServiceImpl.dispatchEvent(ClusterServiceImpl.java:775) [node.jar:na]
        at com.hazelcast.cluster.impl.ClusterServiceImpl.dispatchEvent(ClusterServiceImpl.java:87) [node.jar:na]
        at com.hazelcast.spi.impl.eventservice.impl.LocalEventDispatcher.run(LocalEventDispatcher.java:56) [node.jar:na]
        at com.hazelcast.util.executor.StripedExecutor$Worker.process(StripedExecutor.java:187) [node.jar:na]
        at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:171) [node.jar:na]
Caused by: com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryOperation{identityHash=231975495, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=8106, invocationTime=1473978835443 (Thu Sep 15 22:33:55 GMT 2016), waitTimeout=-1, callTimeout=60000, name=__vertx.haInfo}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.2:5701 -> /172.17.0.1:36899], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
        at com.hazelcast.spi.impl.operationservice.impl.Invocation.newOperationTimeoutException(Invocation.java:536) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.setOperationTimeout(IsStillRunningService.java:241) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.onResponse(IsStillRunningService.java:229) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture$1.run(InvocationFuture.java:127) ~[node.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_74]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_74]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_74]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[node.jar:na]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[node.jar:na]
        at ------ End remote and begin local stack-trace ------.(Unknown Source) ~[na:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:387) ~[node.jar:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:337) ~[node.jar:na]
        ... 15 common frames omitted



These other nodes keep running, while the restarted node can't connect to the cluster with errors like this leading up to the Vert.x specific "Failed to start event bus" at the end (which is also why I'm first posting this here and not on the Hazelcast group, since it might be specific to how the Hazelcast cluster manager/Vert.x uses Hazelcast):


[INFO ][] 2016-09-15 22:40:27.044 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads
[INFO ][] 2016-09-15 22:40:27.067 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58599
[INFO ][] 2016-09-15 22:40:27.069 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58602
[INFO ][] 2016-09-15 22:40:27.070 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58603
[INFO ][] 2016-09-15 22:40:27.070 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58606
[INFO ][] 2016-09-15 22:40:27.071 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58608
[INFO ][] 2016-09-15 22:40:27.077 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to seed2-staging.somedomain.com/192.168.0.251:5801, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:27.077 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to seed1-staging.somedomain.com/192.168.0.251:5701, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:27.089 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58603
[INFO ][] 2016-09-15 22:40:27.092 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58602
[INFO ][] 2016-09-15 22:40:27.097 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58606
[INFO ][] 2016-09-15 22:40:27.097 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:59291 and seed2-staging.somedomain.com/192.168.0.251:5801
[INFO ][] 2016-09-15 22:40:27.097 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58608
[INFO ][] 2016-09-15 22:40:27.097 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:57417 and seed1-staging.somedomain.com/192.168.0.251:5701
[INFO ][] 2016-09-15 22:40:27.097 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58599
[WARN ][] 2016-09-15 22:40:27.218 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation,
 service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.218 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation,
 service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.218 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation,
 service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.435 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.483 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.548 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.652 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:mapService
[WARN ][] 2016-09-15 22:40:27.961 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[test-03.somedomain.com]:10008, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:multiMapService
[WARN ][] 2016-09-15 22:40:27.962 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[test-03.somedomain.com]:10008, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:multiMapService
[WARN ][] 2016-09-15 22:40:28.143 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[test-03.somedomain.com]:10008, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:multiMapService
[WARN ][] 2016-09-15 22:40:28.244 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[test-03.somedomain.com]:10008, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:multiMapService
[WARN ][] 2016-09-15 22:40:28.536 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] com.hazelcast.spi.exception.CallerNotMemberException: Not Member! this: Address[test-03.somedomain.com]:10089, caller: Address[seed1-staging.somedomain.com]:5701, partitionId: -1, operation: com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation, service: hz:impl:mapService
[INFO ][] 2016-09-15 22:40:28.564 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10008] lost. Reason: java.io.EOFException[Remote socket closed!]
[INFO ][] 2016-09-15 22:40:28.564 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connection [Address[seed1-staging.somedomain.com]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
[WARN ][] 2016-09-15 22:40:28.569 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint Address[test-03.somedomain.com]:10008, Cause:java.io.EOFException: Remote socket closed!
[WARN ][] 2016-09-15 22:40:28.569 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[seed1-staging.somedomain.com]:5701, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:40:28.572 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:7701] lost. Reason: java.io.EOFException[Remote socket closed!]
[WARN ][] 2016-09-15 22:40:28.573 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[test-03.somedomain.com]:7701, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:40:28.575 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10082] lost. Reason: java.io.EOFException[Remote socket closed!]
[WARN ][] 2016-09-15 22:40:28.576 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[test-03.somedomain.com]:10082, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:40:28.587 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connection [Address[test-03.somedomain.com]:10071] lost. Reason: java.io.EOFException[Remote socket closed!]
[WARN ][] 2016-09-15 22:40:28.587 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] hz._hzInstance_1_dev.IO.thread-in-1 Closing socket to endpoint Address[test-03.somedomain.com]:10071, Cause:java.io.EOFException: Remote socket closed!
[INFO ][] 2016-09-15 22:40:28.860 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58612
[INFO ][] 2016-09-15 22:40:28.861 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58612
[INFO ][] 2016-09-15 22:40:32.102 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58633
[INFO ][] 2016-09-15 22:40:32.103 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58633
[INFO ][] 2016-09-15 22:40:32.318 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3]

Members [8] {
        Member [seed2-staging.somedomain.com]:5801
        Member [seed1-staging.somedomain.com]:5701
        Member [test-03.somedomain.com]:10072
        Member [test-03.somedomain.com]:10071
        Member [test-03.somedomain.com]:7701
        Member [test-03.somedomain.com]:10082
        Member [test-03.somedomain.com]:10008
        Member [test-03.somedomain.com]:10089 this
}

[INFO ][] 2016-09-15 22:40:32.335 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to seed1-staging.somedomain.com/192.168.0.251:5701, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:32.336 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:55523 and seed1-staging.somedomain.com/192.168.0.251:5701
[INFO ][] 2016-09-15 22:40:32.343 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10072, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:32.347 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:7701, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:32.349 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:59208 and test-03.somedomain.com/192.168.0.251:10072
[INFO ][] 2016-09-15 22:40:32.349 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10071, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:32.349 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:51002 and test-03.somedomain.com/192.168.0.251:7701
[INFO ][] 2016-09-15 22:40:32.349 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Connecting to test-03.somedomain.com/192.168.0.251:10082, timeout: 0, bind-any: true
[INFO ][] 2016-09-15 22:40:32.350 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:57635 and test-03.somedomain.com/192.168.0.251:10082
[INFO ][] 2016-09-15 22:40:32.350 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:60617 and test-03.somedomain.com/192.168.0.251:10071
[INFO ][] 2016-09-15 22:40:32.366 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58639
[INFO ][] 2016-09-15 22:40:32.367 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58639
[INFO ][] 2016-09-15 22:40:32.724 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:1 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:32.725 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=463864997, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=1, invocationTime=1473978920576 (Thu Sep 15 22:35:20 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[seed2-staging.somedomain.com]:5801, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:59291 -> seed2-staging.somedomain.com/192.168.0.251:5801], endpoint=Address[seed2-staging.somedomain.com]:5801, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:40:32.780 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58642
[INFO ][] 2016-09-15 22:40:32.782 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58642
[INFO ][] 2016-09-15 22:40:32.901 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58643
[INFO ][] 2016-09-15 22:40:32.903 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58643
[INFO ][] 2016-09-15 22:40:33.726 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:1 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:33.726 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=463864997, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=1473978920576 (Thu Sep 15 22:35:20 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[seed2-staging.somedomain.com]:5801, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:59291 -> seed2-staging.somedomain.com/192.168.0.251:5801], endpoint=Address[seed2-staging.somedomain.com]:5801, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:40:34.926 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58651
[INFO ][] 2016-09-15 22:40:34.927 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58651
[INFO ][] 2016-09-15 22:40:35.375 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58652
[INFO ][] 2016-09-15 22:40:35.377 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58652
[INFO ][] 2016-09-15 22:40:35.727 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:2 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:35.727 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=285902017, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=44, invocationTime=1473978926070 (Thu Sep 15 22:35:26 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=11, callTimeout=60000, target=Address[seed1-staging.somedomain.com]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:5701 -> /172.17.0.1:58633], endpoint=Address[seed1-staging.somedomain.com]:5701, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:40:36.727 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:3 timeouts:2 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:36.728 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=285902017, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=44, invocationTime=1473978926070 (Thu Sep 15 22:35:26 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=11, callTimeout=60000, target=Address[seed1-staging.somedomain.com]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:5701 -> /172.17.0.1:58633], endpoint=Address[seed1-staging.somedomain.com]:5701, alive=true, type=MEMBER}
[WARN ][] 2016-09-15 22:40:36.728 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=285902017, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=44, invocationTime=1473978926070 (Thu Sep 15 22:35:26 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=11, callTimeout=60000, target=Address[seed1-staging.somedomain.com]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:5701 -> /172.17.0.1:58633], endpoint=Address[seed1-staging.somedomain.com]:5701, alive=true, type=MEMBER}
[WARN ][] 2016-09-15 22:40:36.935 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] No Invocation found for response: NormalResponse{callId=44, urgent=true, value=null, backupCount=0} sent from Address[seed1-staging.somedomain.com]:5701
[INFO ][] 2016-09-15 22:40:37.721 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Accepting socket connection from /172.17.0.1:58664
[INFO ][] 2016-09-15 22:40:37.723 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Established socket connection between /172.17.0.10:5701 and /172.17.0.1:58664
[INFO ][] 2016-09-15 22:40:37.728 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:1 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:37.728 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=285902017, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=1473978926070 (Thu Sep 15 22:35:26 GMT 2016), waitTimeout=-1, callTimeout=60000, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1956436811, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=1355562486, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Wed Dec 31 23:59:59 GMT 1969), waitTimeout=-1, callTimeout=9223372036854775807}]}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=11, callTimeout=60000, target=Address[seed1-staging.somedomain.com]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:5701 -> /172.17.0.1:58633], endpoint=Address[seed1-staging.somedomain.com]:5701, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:40:38.143 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] HazelcastInstance starting after waiting for cluster size of 3
[INFO ][] 2016-09-15 22:40:38.144 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Address[test-03.somedomain.com]:10089 is STARTED
[DEBUG][] 2016-09-15 22:40:38.270 (BaseQueryBuilder.java:254): DB-Query for Company took 2066 ms
[DEBUG][] 2016-09-15 22:40:38.383 (BaseQueryBuilder.java:254): DB-Query for Contact took 2339 ms
[INFO ][] 2016-09-15 22:40:39.729 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:3 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:39.729 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:multiMapService', op=com.hazelcast.spi.impl.eventservice.impl.operations.RegistrationOperation{identityHash=77501496, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=58, invocationTime=1473978929635 (Thu Sep 15 22:35:29 GMT 2016), waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:60617 -> test-03.somedomain.com/192.168.0.251:10071], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER}
[INFO ][] 2016-09-15 22:40:40.730 (Slf4jFactory.java:64): [test-03.somedomain.com]:10089 [dev] [3.6.3] Invocations:3 timeouts:1 backup-timeouts:0
[WARN ][] 2016-09-15 22:40:40.731 (Slf4jFactory.java:66): [test-03.somedomain.com]:10089 [dev] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:impl:multiMapService', op=com.hazelcast.spi.impl.eventservice.impl.operations.RegistrationOperation{identityHash=77501496, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=58, invocationTime=1473978929635 (Thu Sep 15 22:35:29 GMT 2016), waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:60617 -> test-03.somedomain.com/192.168.0.251:10071], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER}
[ERROR][] 2016-09-15 22:40:40.750 (VertxImpl.java:193): Failed to start event bus
com.hazelcast.core.HazelcastException: java.util.concurrent.ExecutionException: com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{serviceName='hz:impl:multiMapService', op=com.hazelcast.spi.impl.eventservice.impl.operations.RegistrationOperation{identityHash=77501496, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=58, invocationTime=1473978929635 (Thu Sep 15 22:35:29 GMT 2016), waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:60617 -> test-03.somedomain.com/192.168.0.251:10071], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
        at com.hazelcast.spi.impl.eventservice.impl.FutureUtilExceptionHandler.handleException(FutureUtilExceptionHandler.java:41) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.util.FutureUtil.waitWithDeadline(FutureUtil.java:280) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.util.FutureUtil.waitWithDeadline(FutureUtil.java:252) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.eventservice.impl.EventServiceImpl.invokeRegistrationOnOtherNodes(EventServiceImpl.java:246) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.eventservice.impl.EventServiceImpl.registerListenerInternal(EventServiceImpl.java:200) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.eventservice.impl.EventServiceImpl.registerListener(EventServiceImpl.java:181) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.multimap.impl.MultiMapService.addListener(MultiMapService.java:211) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.multimap.impl.ObjectMultiMapProxy.addEntryListener(ObjectMultiMapProxy.java:228) ~[hazelcast-3.6.3.jar:3.6.3]
        at io.vertx.spi.cluster.hazelcast.impl.HazelcastAsyncMultiMap.<init>(HazelcastAsyncMultiMap.java:64) ~[vertx-hazelcast-3.3.2.jar:na]
        at io.vertx.spi.cluster.hazelcast.HazelcastClusterManager.lambda$getAsyncMultiMap$1(HazelcastClusterManager.java:160) ~[vertx-hazelcast-3.3.2.jar:na]
        at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$2(ContextImpl.java:303) ~[vertx-core-3.3.2.jar:na]
        at io.vertx.core.impl.OrderedExecutorFactory$OrderedExecutor.lambda$new$0(OrderedExecutorFactory.java:94) ~[vertx-core-3.3.2.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
Caused by: java.util.concurrent.ExecutionException: com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{serviceName='hz:impl:multiMapService', op=com.hazelcast.spi.impl.eventservice.impl.operations.RegistrationOperation{identityHash=77501496, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=58, invocationTime=1473978929635 (Thu Sep 15 22:35:29 GMT 2016), waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:60617 -> test-03.somedomain.com/192.168.0.251:10071], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:360) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.util.FutureUtil.executeWithDeadline(FutureUtil.java:294) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.util.FutureUtil.waitWithDeadline(FutureUtil.java:278) ~[hazelcast-3.6.3.jar:3.6.3]
        ... 13 common frames omitted
Caused by: com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! Invocation{serviceName='hz:impl:multiMapService', op=com.hazelcast.spi.impl.eventservice.impl.operations.RegistrationOperation{identityHash=77501496, serviceName='hz:impl:multiMapService', partitionId=-1, replicaIndex=0, callId=58, invocationTime=1473978929635 (Thu Sep 15 22:35:29 GMT 2016), waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[test-03.somedomain.com]:10071, backupsExpected=0, backupsCompleted=0, connection=Connection [/172.17.0.10:60617 -> test-03.somedomain.com/192.168.0.251:10071], endpoint=Address[test-03.somedomain.com]:10071, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
        at com.hazelcast.spi.impl.operationservice.impl.Invocation.newOperationTimeoutException(Invocation.java:536) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.setOperationTimeout(IsStillRunningService.java:241) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService$IsOperationStillRunningCallback.onResponse(IsStillRunningService.java:229) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture$1.run(InvocationFuture.java:127) ~[hazelcast-3.6.3.jar:3.6.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[hazelcast-3.6.3.jar:3.6.3]
        at ------ End remote and begin local stack-trace ------.(Unknown Source) ~[na:na]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:387) ~[hazelcast-3.6.3.jar:3.6.3]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:337) ~[hazelcast-3.6.3.jar:3.6.3]
        ... 16 common frames omitted


I know all this is still very unspecific, but so far I've been unable to reproduce it locally, so it's hard to really track it down. If you have any pointers on what I could look into to debug this further, I would be very grateful! It doesn't happen all the time and only on this test-cluster (thankfully not in production...). 

Could this also be related to high load (since the nodes are basically all running on the same physical machine for this test cluster and it sometimes experiences high loads because of it)?

Cheers, 
David


-- 
David Klotz, Software Engineering, fileee.com
Windthorststr. 68, 48143 Münster, Germany 

Thomas SEGISMONT

unread,
Sep 22, 2016, 12:02:16 PM9/22/16
to ve...@googlegroups.com
Make sure you Docker images are setup correctly to transfer the SIGTERM signal to the JVM. Otherwise nodes being shutdown cannot signal their leaving the cluster.

Also, try to not shutdown an entire cluster at once. Give the nodes some time to rebalance data and establish a new cluster view. You could wait a few seconds between node restarts.

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/196190f6-27e6-45e8-89c1-d3e05a8d5a38%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages