OperationTimeoutException on a local node


ivenhov

Feb 29, 2020, 6:06:18 AM
to Hazelcast
I have a strange problem.

Hazelcast with 3 members, version 3.9.4

As part of an upgrade procedure, node 10.173.240.3 was stopped; after that, node 2 started having problems interacting with Hazelcast.

Node 3 logs
INFO  2020-02-25 14:01:43.707  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] [10.173.240.3]:7008 is SHUTTING_DOWN      (LifecycleService.java:65)
INFO  2020-02-25 14:01:43.708  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Shutdown request of [10.173.240.3]:7008 is handled      (MigrationManager.java:65)
INFO  2020-02-25 14:01:43.718  [_hzInstance_1_matrix.migration] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Re-partitioning cluster data... Migration queue size: 180      (MigrationManager.java:65)
INFO  2020-02-25 14:01:44.844  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Shutting down connection manager...      (Node.java:65)
INFO  2020-02-25 14:01:44.845  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Connection[id=4, /10.173.240.3:7008->/10.173.240.2:57241, endpoint=[10.173.240.2]:7008, alive=false, type=MEMBER] closed. Reason: TcpIpConnectionManager is stopping      (TcpIpConnection.java:65)
INFO  2020-02-25 14:01:44.845  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Connection[id=3, /10.173.240.3:7008->/10.173.240.1:51229, endpoint=[10.173.240.1]:7008, alive=false, type=MEMBER] closed. Reason: TcpIpConnectionManager is stopping      (TcpIpConnection.java:65)
INFO  2020-02-25 14:01:44.846  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Shutting down node engine...      (Node.java:65)
INFO  2020-02-25 14:01:44.920  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Destroying node NodeExtension.      (NodeExtension.java:65)
INFO  2020-02-25 14:01:44.921  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Hazelcast Shutdown is completed in 1213 ms.      (Node.java:65)
INFO  2020-02-25 14:01:44.921  [Thread-1                      ] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] [10.173.240.3]:7008 is SHUTDOWN      (LifecycleService.java:65)


Here is what I get in the log on node 10.173.240.2, 3 minutes later:

WARN  2020-02-25 14:04:23.499  [NodeListRebuilderTh           ] run()                          Exception during background node list rebuild. Happened: 1 in the last: 1m      (TeamNodesProvider.java:238)
com.hazelcast.core.OperationTimeoutException: PutOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2020-02-25 14:04:23.497. Start time: 2020-02-25 14:02:23.167. Total elapsed time: 120330 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2020-02-25 14:04:08.499. Invocation{op=com.hazelcast.map.impl.operation.PutOperation{serviceName='hz:impl:mapService', identityHash=1978584929, partitionId=27, replicaIndex=0, callId=-4197, invocationTime=1582639343165 (2020-02-25 14:02:23.165), waitTimeout=-1, callTimeout=60000, name=inmemMap}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1582639343167, firstInvocationTime='2020-02-25 14:02:23.167', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.173.240.2]:7008, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:164)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:106)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.putInternal(MapProxySupport.java:389)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.putInternal(NearCachedMapProxyImpl.java:164)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:131)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:122)
at com.some.package.write(Class.java:731)
This continues in the same fashion, with writes failing:

INFO  2020-02-25 14:14:30.497  [matrix.InvocationMonitorThread] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Invocations:32 timeouts:1 backup-timeouts:0      (InvocationMonitor.java:65)
WARN  2020-02-25 14:14:30.499  [NodeListRebuilderTh           ] run()                          Exception during background node list rebuild. Happened: 1 in the last: 2m1s      (TeamNodesProvider.java:238)
com.hazelcast.core.OperationTimeoutException: PutOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2020-02-25 14:14:30.497. Start time: 2020-02-25 14:12:30.499. Total elapsed time: 119998 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2020-02-25 14:14:23.499. Invocation{op=com.hazelcast.map.impl.operation.PutOperation{serviceName='hz:impl:mapService', identityHash=675701052, partitionId=27, replicaIndex=0, callId=-4620, invocationTime=1582639950497 (2020-02-25 14:12:30.497), waitTimeout=-1, callTimeout=60000, name=inmemMap}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1582639950499, firstInvocationTime='2020-02-25 14:12:30.499', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.173.240.2]:7008, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:164)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:106)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.putInternal(MapProxySupport.java:389)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.putInternal(NearCachedMapProxyImpl.java:164)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:131)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:122)
at com.some.package.write(Class.java:731)

If I understand correctly, the target for the operation is the node itself.
There is no backing store for this particular map, inmemMap.
It looks like the heartbeat from the member is received, but the operation heartbeat is not.
My question is: why does the operation time out?
What could be the reason, and how can I resolve it?

The relevant part of the configuration is below:

    <network>
        <port auto-increment="false">7008</port>
        <join>
            <multicast enabled="false">
                <multicast-group>227.227.227.225</multicast-group>
                <multicast-port>7007</multicast-port>
                <multicast-time-to-live>1</multicast-time-to-live>
                <multicast-timeout-seconds>15</multicast-timeout-seconds>
            </multicast>
            <tcp-ip enabled="true" connection-timeout-seconds="15">
                <interface>10.173.240.1-10</interface>
            </tcp-ip>
        </join>
        <interfaces enabled="true">
            <interface>10.173.240.*</interface>
        </interfaces>
    </network>
<properties>
        <property name="hazelcast.jmx">true</property>
        <property name="hazelcast.logging.type">slf4j</property>
        <property name="hazelcast.shutdownhook.enabled">false</property>
        <property name="hazelcast.health.monitoring.delay.seconds">180</property>        
        <property name="hazelcast.phone.home.enabled">false</property>
        <property name="hazelcast.socket.bind.any">false</property>
        <property name="hazelcast.prefer.ipv4.stack">true</property>
        <property name="hazelcast.initial.min.cluster.size">1</property>
        <property name="hazelcast.max.no.heartbeat.seconds">60</property>
        <property name="hazelcast.max.operation.timeout">15000</property>
        <property name="hazelcast.merge.first.run.delay.seconds">15</property>
        <property name="hazelcast.merge.next.run.delay.seconds">10</property>
        <property name="hazelcast.map.invalidation.batch.enabled">false</property>
    </properties>

    <executor-service>
        <pool-size>64</pool-size>
        <queue-capacity>0</queue-capacity>
    </executor-service>
<map name="inmemMap">
        <backup-count>2</backup-count>
        <eviction-policy>LRU</eviction-policy>
        <max-size policy="PER_NODE">10000</max-size>       
        <near-cache>
            <time-to-live-seconds>180</time-to-live-seconds>
            <max-idle-seconds>6000</max-idle-seconds>
            <eviction eviction-policy="LRU" max-size-policy="ENTRY_COUNT" size="5000"/>
            <invalidate-on-change>true</invalidate-on-change>
        </near-cache>
    </map>


A snippet of the log from node 2 is below, including the detection of node 3 going down:


INFO  2020-02-25 14:01:44.845  [stance_1_matrix.IO.thread-in-0] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connection[id=1, /10.173.240.2:57241->/10.173.240.3:7008, endpoint=[10.173.240.3]:7008, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side      (TcpIpConnection.java:65)
INFO  2020-02-25 14:01:44.850  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:44.851  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:44.951  [ance_1_matrix.cached.thread-65] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:44.951  [ance_1_matrix.cached.thread-65] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.052  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.052  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.151  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.151  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.153  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.153  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
WARN  2020-02-25 14:01:45.154  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Removing connection to endpoint [10.173.240.3]:7008 Cause => java.net.SocketException {Connection refused to address /10.173.240.3:7008}, Error-Count: 5      (TcpIpConnectionErrorHandler.java:67)
WARN  2020-02-25 14:01:45.155  [tance_1_matrix.cached.thread-9] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Member [10.173.240.3]:7008 - eb654341-523f-4400-b657-5ecb1f2f570c is suspected to be dead for reason: No connection      (MembershipManager.java:67)
WARN  2020-02-25 14:01:45.156  [ity-generic-operation.thread-0] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Mastership of [10.173.240.1]:7008 is accepted. Response: MembersView{version=54, members=[MemberInfo{address=[10.173.240.1]:7008, uuid=e71a0ccd-c0ac-413a-a4ed-d405230c435b, liteMember=false}, MemberInfo{address=[10.173.240.2]:7008, uuid=aae39c3e-1e88-474e-8c30-a181ed81a451, liteMember=false}]}      (ClusterService.java:67)
INFO  2020-02-25 14:01:45.247  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.247  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.252  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.252  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.254  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Connecting to /10.173.240.3:7008, timeout: 0, bind-any: false      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.254  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Could not connect to: /10.173.240.3:7008. Reason: SocketException[Connection refused to address /10.173.240.3:7008]      (TcpIpConnector.java:65)
INFO  2020-02-25 14:01:45.259  [ity-generic-operation.thread-0] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] 

Members {size:2, ver:55} [
Member [10.173.240.1]:7008 - e71a0ccd-c0ac-413a-a4ed-d405230c435b
Member [10.173.240.2]:7008 - aae39c3e-1e88-474e-8c30-a181ed81a451 this
]
      (ClusterService.java:65)
INFO  2020-02-25 14:01:45.260  [ance_1_matrix.cached.thread-38] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Committing/rolling-back alive transactions of Member [10.173.240.3]:7008 - eb654341-523f-4400-b657-5ecb1f2f570c, UUID: eb654341-523f-4400-b657-5ecb1f2f570c      (TransactionManagerService.java:65)
WARN  2020-02-25 14:01:58.538  [ix.SlowOperationDetectorThread] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Slow operation detected: com.hazelcast.map.impl.operation.DeleteOperation
Hint: You can enable the logging of stacktraces with the following system property: -Dhazelcast.slow.operation.detector.stacktrace.logging.enabled      (SlowOperationDetector.java:67)
WARN  2020-02-25 14:02:32.539  [ix.SlowOperationDetectorThread] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Slow operation detected: com.hazelcast.map.impl.operation.DeleteOperation (2 invocations)      (SlowOperationDetector.java:67)
INFO  2020-02-25 14:04:23.497  [matrix.InvocationMonitorThread] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Invocations:6 timeouts:1 backup-timeouts:0      (InvocationMonitor.java:65)
WARN  2020-02-25 14:04:23.499  [NodeListRebuilderTh           ] run()                          Exception during background node list rebuild. Happened: 1 in the last: 1m      (TeamNodesProvider.java:238)
com.hazelcast.core.OperationTimeoutException: PutOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2020-02-25 14:04:23.497. Start time: 2020-02-25 14:02:23.167. Total elapsed time: 120330 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2020-02-25 14:04:08.499. Invocation{op=com.hazelcast.map.impl.operation.PutOperation{serviceName='hz:impl:mapService', identityHash=1978584929, partitionId=27, replicaIndex=0, callId=-4197, invocationTime=1582639343165 (2020-02-25 14:02:23.165), waitTimeout=-1, callTimeout=60000, name=inmemMap}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1582639343167, firstInvocationTime='2020-02-25 14:02:23.167', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.173.240.2]:7008, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:164)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:106)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.putInternal(MapProxySupport.java:389)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.putInternal(NearCachedMapProxyImpl.java:164)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:131)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:122)
at com.some.package.write(Matrix.java:731)

INFO  2020-02-25 14:05:26.497  [matrix.InvocationMonitorThread] log()                          [10.173.240.2]:7008 [matrix] [3.9.4] Invocations:7 timeouts:1 backup-timeouts:0      (InvocationMonitor.java:65)


Jaromir Hamala

Feb 29, 2020, 7:40:57 AM
to Hazelcast
Hi,

"Last operation heartbeat from member: 2020-02-25 14:14:23.499." suggests the member is sending operation heartbeats. "Last operation heartbeat: never." indicates the PUT operation is currently not running.
The fact that it's not running right now is not concerning on its own. A PUT operation is meant to be extremely fast, and normally such operations are not even picked up by the operation heartbeating mechanism. Heartbeating is meant for (relatively) long-running operations: think of queries, aggregations, etc.
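To make the timeout condition concrete, here is a toy model (plain Java, not Hazelcast's actual code) of the decision visible in the exception text: the invocation fails once neither a response nor an operation heartbeat has arrived within a window derived from callTimeout. The 2× factor is an assumption inferred from the log above (callTimeout=60000, total elapsed ~120s), and the class and method names are hypothetical:

```java
// Simplified model of the invocation timeout decision, reconstructed from the
// fields visible in the exception (callTimeout, firstInvocationTime,
// lastHeartbeat). NOT Hazelcast's real implementation, just a sketch of the
// observable behaviour.
public final class InvocationTimeoutModel {

    static final long NEVER = 0L; // matches "lastHeartbeatMillis=0" in the log

    /** True when the invocation should fail with operation-heartbeat-timeout. */
    static boolean isHeartbeatTimedOut(long nowMs,
                                       long firstInvocationMs,
                                       long lastOpHeartbeatMs,
                                       long callTimeoutMs) {
        long deadline = (lastOpHeartbeatMs == NEVER)
                ? firstInvocationMs + 2 * callTimeoutMs // never started: allow one retry window
                : lastOpHeartbeatMs + callTimeoutMs;    // running: must heartbeat within callTimeout
        return nowMs > deadline;
    }

    public static void main(String[] args) {
        long start = 0, callTimeout = 60_000;
        // 119s elapsed, no heartbeat yet: still waiting
        System.out.println(isHeartbeatTimedOut(start + 119_000, start, NEVER, callTimeout)); // false
        // 121s elapsed, still no heartbeat: timed out, like the PutOperation above
        System.out.println(isHeartbeatTimedOut(start + 121_000, start, NEVER, callTimeout)); // true
    }
}
```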

I have 2 possible explanations:
1. The operation was never executed. Either the request was lost for some reason, or it is queuing in the operation execution queue. Losing a request is unlikely; that would be a major bug.
2. The operation was executed, but the response was lost. Again, losing a response is unlikely.

The log also says "Slow operation detected: com.hazelcast.map.impl.operation.DeleteOperation". It's possible the operation thread is stuck inside the DeleteOperation and cannot execute the PutOperation, which sits in the queue until the call times out. Given that your IMap has no backing store configured, the DeleteOperation should be super quick. The fact that it's even picked up by the slow operation detector is highly suspicious. I have 2 possible explanations for this:
1. A bug in the Delete operation.
2. Something environmental is causing the slowness.

You said this behaviour happened after you shut down 1 out of 3 nodes. In theory it could be caused by memory pressure, as the same IMap was now stored on just 2 instead of 3 nodes. Do you happen to have Garbage Collector logs? If not, I would strongly recommend enabling them. They have very low overhead and can help you massively with issues like this.
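For reference, GC logging can be enabled with standard JVM flags. This is a config fragment; log paths and rotation settings are examples to adjust for your environment:

```shell
# Java 8 (the JVM generation typically running Hazelcast 3.9.x)
java -Xloggc:/var/log/app/gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M \
     ...

# Java 9+ (unified logging replaces the flags above)
java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime:filecount=5,filesize=10m ...
```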

Cheers,
Jaromir

ivenhov

Feb 29, 2020, 11:50:53 AM
to Hazelcast
Thank you very much for your valuable insight.

I have a couple of other maps defined. I did not think this would matter, but I can see now that it may be important.
Apologies for that.
Some of those maps are backed by storage. Is it possible that a transfer to another map is blocking PUT operations on the in-memory map?
Is the operation queue common to all maps, lists, etc., or is there one per collection?

What is surprising is that 10.173.240.2 times out on an operation where target=[10.173.240.2]:7008.

The "Slow operation detected: com.hazelcast.map.impl.operation.DeleteOperation" warning is actually about another map, one with a backing store.
Apologies again for creating confusion.

I think what is happening in this case is that the operation is routed to a node that is in the process of being shut down.

An example of a failing operation:

WARN  2020-02-25 14:01:39.398  Invoke failed
java.lang.RuntimeException: Failed to delete Store, key: 41114891-71ad-4a5f-baad-949a2479f637
at com.store.MapStore.delete(AbstractHazelcastMapStore.java:99)
at com.hazelcast.map.impl.MapStoreWrapper.delete(MapStoreWrapper.java:107)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.remove(WriteThroughStore.java:56)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.remove(WriteThroughStore.java:28)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.removeRecord(DefaultRecordStore.java:1040)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.delete(DefaultRecordStore.java:624)
at com.hazelcast.map.impl.operation.DeleteOperation.run(DeleteOperation.java:38)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:194)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:409)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:115)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)
at ------ submitted from ------.(Unknown Source)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:127)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.deleteInternal(MapProxySupport.java:569)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.deleteInternal(NearCachedMapProxyImpl.java:327)
at com.hazelcast.map.impl.proxy.MapProxyImpl.delete(MapProxyImpl.java:231)
at com.api.Creds.delete()
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.NullPointerException
at com.logic.MapStore.delete(AbstractHazelcastMapStore.java:97)
at com.hazelcast.map.impl.MapStoreWrapper.delete(MapStoreWrapper.java:107)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.remove(WriteThroughStore.java:56)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.remove(WriteThroughStore.java:28)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.removeRecord(DefaultRecordStore.java:1040)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.delete(DefaultRecordStore.java:624)
at com.hazelcast.map.impl.operation.DeleteOperation.run(DeleteOperation.java:38)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:194)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:409)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:115)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)
Caused by: java.lang.NullPointerException: null
at (backend not configured)


Could it be that, with an operation timeout of 2 minutes and requests being routed to node 3 (which is going down), every operation pending on the queue is delayed by 2 minutes?
I've checked GC logs
node2 shows no collection till 14:09 which took
node3 shows pauses 20-30 ms until it was fully stopped at 14:01:44

What concerns me is that node 2 was still failing several minutes later and was unable to recover.

WARN  2020-02-25 14:18:33.498  [Thread-1           ] run()                          Exception during background node list rebuild
com.hazelcast.core.OperationTimeoutException: PutOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2020-02-25 14:18:33.497. Start time: 2020-02-25 14:16:33.498. Total elapsed time: 119999 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2020-02-25 14:18:23.499. Invocation{op=com.hazelcast.map.impl.operation.PutOperation{serviceName='hz:impl:mapService', identityHash=126643487, partitionId=27, replicaIndex=0, callId=-4736, invocationTime=1582640193497 (2020-02-25 14:16:33.497), waitTimeout=-1, callTimeout=60000, name=inmemMap}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1582640193498, firstInvocationTime='2020-02-25 14:16:33.498', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.173.240.2]:7008, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:164)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:106)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.putInternal(MapProxySupport.java:389)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.putInternal(NearCachedMapProxyImpl.java:164)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:131)
at com.hazelcast.map.impl.proxy.MapProxyImpl.put(MapProxyImpl.java:122)


After node 3 was started again, it also failed, because requests routed to node 2 were timing out.
Here's an example

INFO  2020-02-25 14:21:06.931  [matrix.InvocationMonitorThread] log()                          [10.173.240.3]:7008 [matrix] [3.9.4] Invocations:1 timeouts:1 backup-timeouts:0      (InvocationMonitor.java:65)
ERROR 2020-02-25 14:21:06.941  [main                          ] start()                        Failed to start       (CriticalCall.java:38)
com.hazelcast.core.OperationTimeoutException: GetOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2020-02-25 14:21:06.931. Start time: 2020-02-25 14:19:05.950. Total elapsed time: 120984 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2020-02-25 14:20:53.500. Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService', identityHash=133775180, partitionId=235, replicaIndex=0, callId=-147, invocationTime=1582640345946 (2020-02-25 14:19:05.946), waitTimeout=-1, callTimeout=60000, name=infoMap}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1582640345950, firstInvocationTime='2020-02-25 14:19:05.950', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.173.240.2]:7008, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=2, /10.173.240.3:36807->/10.173.240.2:7008, endpoint=[10.173.240.2]:7008, alive=true, type=MEMBER]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:164)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:106)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:346)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.getInternal(NearCachedMapProxyImpl.java:114)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:117)
at com.om.mxs.server.matrix.Matrix.get(Matrix.java:1552)

This was a critical failure, and the process on node 2 exited after that.
No matter how many times node 3 was restarted, it always failed in the same place.

To resolve the problem, our service was restarted on all 3 nodes at the same time.

Ozan Kılıç

Mar 4, 2020, 4:24:30 AM
to haze...@googlegroups.com
Hi, 

Data structure operations run on a thread group called partition operation threads.
Hazelcast divides most data structures into sections called partitions.
Each partition operation thread is assigned certain primary and backup partitions in the cluster.
Map entries are mapped to partitions via consistent hashing of the key.
When you delete key "1000" on map A or put "1000" on map B, the same partition operation thread will handle both operations.
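The routing described above can be sketched as a toy model. This is illustrative only: real Hazelcast hashes the serialized key with Murmur3 rather than `hashCode()`, 271 is just the default partition count, and the helper names below are hypothetical:

```java
// Toy model of key -> partition -> operation-thread routing.
public final class PartitionRoutingModel {

    static final int PARTITION_COUNT = 271; // Hazelcast's default

    // Real Hazelcast hashes the serialized key with Murmur3; hashCode()
    // is used here only to illustrate the deterministic routing.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Each partition is pinned to exactly one partition-operation thread.
    static int operationThread(int partitionId, int threadCount) {
        return partitionId % threadCount;
    }

    public static void main(String[] args) {
        int threads = 4;
        // The same key used on map A and map B gets the same partition, hence
        // the same operation thread: A's delete("1000") and B's put("1000")
        // queue behind each other.
        int p = partitionId("1000");
        System.out.println("key \"1000\" -> partition " + p
                + " -> opThread " + operationThread(p, threads));
    }
}
```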

If a partition operation thread is stuck on map store operations, then all the data structure operations mapped to that thread will be blocked.

We have plans to introduce offloading of map store operations, but for now you need to be careful about what runs on partition operation threads.
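The blocking effect is easy to demonstrate with plain JDK code, no Hazelcast required: a single-threaded executor stands in for one partition operation thread, a sleep stands in for a slow MapStore call, and the queued "put" is delayed by the full duration of the "delete":

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class PartitionThreadBlockingDemo {

    /** Returns how long the fast "put" waited behind the slow "delete", in ms. */
    static long runDemo() throws Exception {
        // One partition-operation thread draining its queue in order.
        ExecutorService partitionThread = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            // Slow DeleteOperation: stands in for MapStore.delete() hitting a slow backend.
            partitionThread.submit(() -> {
                try { Thread.sleep(300); } catch (InterruptedException ignored) { }
            });
            // Fast PutOperation on a DIFFERENT map, but the same partition thread.
            Future<Long> put = partitionThread.submit(() -> System.nanoTime() - start);
            return put.get() / 1_000_000;
        } finally {
            partitionThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("put waited ~" + runDemo() + " ms"); // at least ~300 ms
    }
}
```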

Best,




--
Ozan KILIC
Solutions Architect
   hazelcast®
 
 
2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
+1 (650) 521-5453 | hazelcast.com


ivenhov

Mar 5, 2020, 9:26:16 AM
to Hazelcast
Thanks Ozan, this is very useful.

I will definitely give it a read, especially that chapter.
Last time I checked, the Hazelcast documentation had 500+ pages, so it's difficult to keep up sometimes.
This is not a complaint; I definitely prefer that to having no docs at all.

For clarification, regarding partition awareness, does it also mean that part of a map may use the same thread that is used for part of a list?
Also, I was under the impression that operations on a map are first routed to the primary copy and then executed, potentially on a different member.
So member 1 executing map.put(keyA) could be routed to member 2 (where the primary partition lives), the put would be executed there, and then relayed to the backup partitions.
Or would the write happen through a backup partition (if one exists on member 1) and then be routed to the other partitions (including the primary)?

For simplicity I omit near-cache here.

Regards

Ozan Kılıç

Mar 6, 2020, 4:37:43 AM
to haze...@googlegroups.com
>For clarification, regarding partition awareness, does it also mean that part of a map may use the same thread that is used for part of a list?

Yes, non-partitioned structures like List are stored on a single partition, which may host map data too. So, yes to the above question.

>Also, I was under the impression that operations on a map are first routed to the primary copy and then executed, potentially on a different member.
>So member 1 executing map.put(keyA) could be routed to member 2 (where the primary partition lives), the put would be executed there, and then relayed to the backup partitions.
>Or would the write happen through a backup partition (if one exists on member 1) and then be routed to the other partitions (including the primary)?

The put operation is routed to the member that owns the primary partition for the key.
That member executes the put operation on the primary partition.
Then it sends backup operations to the backup partitions and waits for the ACKs (if sync backups are configured).
Finally, it returns the result of the operation.

Also, you can check this new feature to improve the performance of the client originating operations: https://docs.hazelcast.org/docs/latest/manual/html-single/index.html#configuring-backup-acknowledgment 
Basically, this feature changes the operation flow from this: 
client -> primary partition -> backup partition -> primary partition -> client 

To this: 

client -> primary partition -> backup partition -> client 



