We are copying over some 200K data objects(apprx 110 MB) into Hazelcast cache asynchronously. This process happens every 5 minutes during load testing in a cluster of 5 nodes.
policyMap.putAsync(key, policy);
After running load tests for couple of days, we are seeing couple of thousands of data objects being lost from cache. We see following warning in the logs, and we suspect that these warnings has something to do with data loss.
I suspect that Hazelcast threads are backing up as we are using putAsync, and maybe timing out, and causing data loss. I am wondering if you guys have seen something like this, or have any insight?
2017-06-12 16:17:43.154 WARN (hz._hzInstance_1_Secure Email Gateway.async.thread-8) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.operation.PutOperation{identityHash=1047065014, serviceName='hz:impl:mapService', partitionId=65, replicaIndex=0, callId=0, invocationTime=1497298625311 (Mon Jun 12 16:17:05 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=encryption.key.data.payload.info-index-1, name=key=encryption.key.data.payload.info-index-1}, partitionId=65, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/
192.168.123.123:5701 -> /
192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
2017-06-12 16:17:43.185 WARN (cached7) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryPartitionOperation{identityHash=1413917444, serviceName='hz:impl:mapService', partitionId=137, replicaIndex=0, callId=5794549, invocationTime=1497298633217 (Mon Jun 12 16:17:13 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=device.policy.info-index-0}, partitionId=137, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/
192.168.123.123:5701 -> /
192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
2017-06-12 16:17:43.547 WARN (hz._hzInstance_1_Secure Email Gateway.response) [c.h.s.OperationService] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] No Invocation found for response: NormalResponse{callId=5803881, urgent=true, value=false, backupCount=0} sent from Address[192.168.123.144]:5701
Seeing this too, but very few instances as compared to above warning messages.
2017-06-12 16:17:43.388 WARN (vert.x-eventloop-thread-8) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] While asking 'is-executing': Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.operation.SetOperation{identityHash=1267366088, serviceName='hz:impl:mapService', partitionId=39, replicaIndex=0, callId=5794633, invocationTime=1497298638468 (Mon Jun 12 16:17:18 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=account.policy.info-index-1, name=key=account.policy.info-index-1}, partitionId=39, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=1, connection=Connection [/
192.168.123.123:5701 -> /
192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
java.util.concurrent.TimeoutException: Call Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation{identityHash=501153906, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=6003108, invocationTime=1497298657291 (Mon Jun 12 16:17:37 EDT 2017), waitTimeout=-1, callTimeout=5000}, partitionId=-1, replicaIndex=0, tryCount=0, tryPauseMillis=0, invokeCount=1, callTimeout=5000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/
192.168.123.123:5701 -> /
192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER} encountered a timeout
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:369)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:337)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService.isOperationExecuting(IsStillRunningService.java:85)