Could not join cluster. Shutting down now!


Puneet Sharma

Dec 4, 2021, 9:26:50 AM
to Hazelcast
Hi all,

I am trying to debug an issue where a running cluster stops unpredictably.
Hazelcast (v3.12.8) is embedded in a Spring Boot application, and 2 nodes form the cluster.

I have tried restarting the node, but that is too complex with Spring-managed beans.

I am looking for either of the following:
1) A way to fix this error, or
2) A clean way to restart the node.
Note: I tried restarting with Spring Boot's RestartEndpoint, but it does not work because the application is deployed on WebLogic using ServletInitializer.


I am attaching logs from both nodes. Please let me know if anyone has fixed this kind of issue before.

Puneet Sharma

Dec 4, 2021, 9:32:52 AM
to Hazelcast
Node 1
2021-12-01 04:55:55,093 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241363, invocationTime=1638308277402 (2021-11-30 22:37:57.402), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=190, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:55:58,101 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241373, invocationTime=1638308280411 (2021-11-30 22:38:00.411), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=200, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:56:00,607 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241383, invocationTime=1638308282917 (2021-11-30 22:38:02.917), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=210, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:56:05,610 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241394, invocationTime=1638308287919 (2021-11-30 22:38:07.919), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=220, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:56:09,613 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241404, invocationTime=1638308291923 (2021-11-30 22:38:11.923), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=230, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:56:11,125 WARN  []InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.Invocation - [10.217.87.58]:28175 [dev] [3.12.8] Retrying invocation: Invocation{op=com.hazelcast.cache.impl.operation.CacheGetOperation{serviceName='hz:impl:cacheService', identityHash=1963024162, partitionId=269, replicaIndex=0, callId=241414, invocationTime=1638308293435 (2021-11-30 22:38:13.435), waitTimeout=-1, callTimeout=60000, name=/hz/ProposalCache}, tryCount=250, tryPauseMillis=500, invokeCount=240, callTimeoutMillis=60000, firstInvocationTimeMs=1638330888988, firstInvocationTime='2021-12-01 04:54:48.988', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 01:00:00.000', target=null, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! local: Member [10.217.87.58]:28175 - 2d9c973b-4889-4d7e-bf9f-1ebe6855cb57 this, expected-target: null, partitionId: 269, replicaIndex: 0, operation: com.hazelcast.cache.impl.operation.CacheGetOperation, service: hz:impl:cacheService
2021-12-01 04:57:03,861 ERROR []cached.thread-10] com.hazelcast.instance.Node - [10.217.87.58]:28175 [dev] [3.12.8] Could not join cluster. Shutting down now!
2021-12-01 04:57:03,873 INFO  []cached.thread-10] com.hazelcast.core.LifecycleService - [10.217.87.58]:28175 [dev] [3.12.8] [10.217.87.58]:28175 is SHUTTING_DOWN
2021-12-01 04:57:03,880 WARN  []cached.thread-10] com.hazelcast.instance.Node - [10.217.87.58]:28175 [dev] [3.12.8] Terminating forcefully...
2021-12-01 04:57:03,880 INFO  []cached.thread-10] com.hazelcast.instance.Node - [10.217.87.58]:28175 [dev] [3.12.8] Shutting down connection manager...
2021-12-01 04:57:03,885 INFO  []cached.thread-10] com.hazelcast.nio.tcp.TcpIpConnection - [10.217.87.58]:28175 [dev] [3.12.8] Connection[id=71, /10.217.87.58:48470->/10.216.38.210:28175, qualifier=null, endpoint=[10.216.38.210]:28175, alive=false, type=NONE] closed. Reason: EndpointManager is stopping
2021-12-01 04:57:03,893 INFO  []cached.thread-10] com.hazelcast.instance.Node - [10.217.87.58]:28175 [dev] [3.12.8] Shutting down node engine...
2021-12-01 04:57:06,922 INFO  []cached.thread-10] com.hazelcast.instance.NodeExtension - [10.217.87.58]:28175 [dev] [3.12.8] Destroying node NodeExtension.
2021-12-01 04:57:06,925 INFO  []cached.thread-10] com.hazelcast.instance.Node - [10.217.87.58]:28175 [dev] [3.12.8] Hazelcast Shutdown is completed in 3045 ms.
2021-12-01 04:57:06,925 INFO  []cached.thread-10] com.hazelcast.core.LifecycleService - [10.217.87.58]:28175 [dev] [3.12.8] [10.217.87.58]:28175 is SHUTDOWN
2021-12-01 04:57:06,926 INFO  []cached.thread-10] com.hazelcast.core.LifecycleService - [10.217.87.58]:28175 [dev] [3.12.8] [10.217.87.58]:28175 is MERGE_FAILED






Node 2
2021-12-01 04:51:43,785 INFO  [] [] [hz.apt-uat-java-hazelcast-instance.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection - [10.216.38.210]:28175 [dev] [3.12.8] Initialized new cluster connection between /10.216.38.210:28175 and /10.217.87.58:48470
2021-12-01 04:57:41,553 INFO  [] [] [hz.apt-uat-java-hazelcast-instance.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection - [10.216.38.210]:28175 [dev] [3.12.8] Connection[id=100, /10.216.38.210:28175->/10.217.87.58:48470, qualifier=null, endpoint=[10.217.87.58]:28175, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2021-12-01 04:57:33,503 INFO  [] [] [hz.apt-uat-java-hazelcast-instance.cached.thread-19] com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager - [10.216.38.210]:28175 [dev] [3.12.8] System clock apparently jumped from 2021-12-01 04:52:39.438 to 2021-12-01 04:57:33.502 since last heartbeat (+289064 ms)
2021-12-01 04:57:41,553 WARN  [] [] [hz.apt-uat-java-hazelcast-instance.cached.thread-19] com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager - [10.216.38.210]:28175 [dev] [3.12.8] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: 289064 ms, Heartbeat-Timeout: 60000 ms
2021-12-01 04:57:33,180 WARN  [] [] [hz.apt-uat-java-hazelcast-instance.InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor - [10.216.38.210]:28175 [dev] [3.12.8] MonitorInvocationsTask delayed 288655 ms
2021-12-01 04:57:41,561 WARN  [] [] [hz.apt-uat-java-hazelcast-instance.InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor - [10.216.38.210]:28175 [dev] [3.12.8] BroadcastOperationControlTask delayed 297029 ms
2021-12-01 04:51:43,766 INFO  [] [] [hz.apt-uat-java-hazelcast-instance.IO.thread-in-1] com.hazelcast.nio.tcp.TcpIpConnection - [10.216.38.210]:28175 [dev] [3.12.8] Connection[id=99, /10.216.38.210:28175->/10.217.87.58:57854, qualifier=null, endpoint=[10.217.87.58]:28175, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2021-12-01 05:01:53,048 WARN  [] [] [hz.apt-uat-java-hazelcast-instance.InvocationMonitorThread] com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor - [10.216.38.210]:28175 [dev] [3.12.8] MonitorInvocationsTask delayed 169523 ms
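
On the second question (a clean way to restart the node), here is a minimal sketch of one possible pattern, not a confirmed fix: keep the node behind a small holder so it can be replaced without restarting the whole application. Everything in it is hypothetical and uses only the JDK. `Node`, `FakeNode`, and `NodeRestarter` are illustrative names, not Hazelcast API. In a real setup the factory would presumably call `Hazelcast.newHazelcastInstance(config)` and the running check would presumably map to `getLifecycleService().isRunning()`, but that wiring is an assumption and is not shown here.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Holds the current node and replaces it once it has stopped.
 * All names here are hypothetical; nothing below is Hazelcast API.
 */
class NodeRestarter {

    /** Hypothetical stand-in for the parts of a cluster member that matter here. */
    interface Node {
        boolean isRunning();
        void shutdown();
    }

    /** In-memory fake used only to exercise the holder. */
    static class FakeNode implements Node {
        private volatile boolean running = true;
        @Override public boolean isRunning() { return running; }
        @Override public void shutdown() { running = false; }
    }

    private final Supplier<Node> factory;
    private final AtomicReference<Node> current = new AtomicReference<>();

    NodeRestarter(Supplier<Node> factory) {
        this.factory = factory;
        this.current.set(factory.get());
    }

    /** Callers fetch the node through the holder instead of caching it. */
    Node get() {
        return current.get();
    }

    /** Creates a fresh node if the current one has stopped; returns true if it did. */
    synchronized boolean restartIfStopped() {
        if (current.get().isRunning()) {
            return false;
        }
        current.set(factory.get());
        return true;
    }

    public static void main(String[] args) {
        NodeRestarter holder = new NodeRestarter(FakeNode::new);
        Node first = holder.get();
        first.shutdown(); // simulate the forced shutdown seen in the Node 1 log
        boolean restarted = holder.restartIfStopped();
        System.out.println("restarted=" + restarted + ", running=" + holder.get().isRunning());
    }
}
```

The idea is that beans depend on the holder rather than on the instance itself, so a replacement node becomes visible to them without rebuilding the Spring context. Something would still have to call `restartIfStopped()`, for example a scheduled check or a lifecycle callback. This only covers the restart; it does not address why the node failed to rejoin in the first place.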

