Data loss while copying data from one cache map to another

30 views
Skip to first unread message

Manish Kumar

unread,
Jun 14, 2017, 12:43:49 PM6/14/17
to Hazelcast
We are copying over some 200K data objects(apprx 110 MB) into Hazelcast cache asynchronously. This process happens every 5 minutes during load testing in a cluster of 5 nodes.
 
policyMap.putAsync(key, policy);
 
After running load tests for couple of days, we are seeing couple of thousands of data objects being lost from cache. We see following warning in the logs, and we suspect that these warnings has something to do with data loss. 
I suspect that Hazelcast threads are backing up as we are using putAsync, and maybe timing out, and causing data loss. I am wondering if you guys have seen something like this, or have any insight?
 
 
2017-06-12 16:17:43.154 WARN (hz._hzInstance_1_Secure Email Gateway.async.thread-8) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] 'is-executing': false -> Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.operation.PutOperation{identityHash=1047065014, serviceName='hz:impl:mapService', partitionId=65, replicaIndex=0, callId=0, invocationTime=1497298625311 (Mon Jun 12 16:17:05 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=encryption.key.data.payload.info-index-1, name=key=encryption.key.data.payload.info-index-1}, partitionId=65, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/192.168.123.123:5701 -> /192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
 
 
2017-06-12 16:17:43.185 WARN (cached7) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] Asking if operation execution has been started: Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.query.QueryPartitionOperation{identityHash=1413917444, serviceName='hz:impl:mapService', partitionId=137, replicaIndex=0, callId=5794549, invocationTime=1497298633217 (Mon Jun 12 16:17:13 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=device.policy.info-index-0}, partitionId=137, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/192.168.123.123:5701 -> /192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
 
 
2017-06-12 16:17:43.547 WARN (hz._hzInstance_1_Secure Email Gateway.response) [c.h.s.OperationService] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] No Invocation found for response: NormalResponse{callId=5803881, urgent=true, value=false, backupCount=0} sent from Address[192.168.123.144]:5701
 
 
Seeing this too, but very few instances as compared to above warning messages.
 
2017-06-12 16:17:43.388 WARN (vert.x-eventloop-thread-8) [c.h.s.i.o.i.Invocation] - [192.168.123.123]:5701 [Secure Email Gateway] [3.6.3] While asking 'is-executing': Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.operation.SetOperation{identityHash=1267366088, serviceName='hz:impl:mapService', partitionId=39, replicaIndex=0, callId=5794633, invocationTime=1497298638468 (Mon Jun 12 16:17:18 EDT 2017), waitTimeout=-1, callTimeout=8000, name=key=account.policy.info-index-1, name=key=account.policy.info-index-1}, partitionId=39, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=1, connection=Connection [/192.168.123.123:5701 -> /192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER}
java.util.concurrent.TimeoutException: Call Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.spi.impl.operationservice.impl.operations.IsStillExecutingOperation{identityHash=501153906, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=6003108, invocationTime=1497298657291 (Mon Jun 12 16:17:37 EDT 2017), waitTimeout=-1, callTimeout=5000}, partitionId=-1, replicaIndex=0, tryCount=0, tryPauseMillis=0, invokeCount=1, callTimeout=5000, target=Address[192.168.123.144]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/192.168.123.123:5701 -> /192.168.123.144:5988], endpoint=Address[192.168.123.144]:5701, alive=true, type=MEMBER} encountered a timeout
                at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:369)
                at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:337)
                at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
                at com.hazelcast.spi.impl.operationservice.impl.IsStillRunningService.isOperationExecuting(IsStillRunningService.java:85)

Jaromir Hamala

unread,
Jun 15, 2017, 2:28:13 AM6/15/17
to Hazelcast
Hello Manish,

Hazelcast 3.6.3 is rather old. We changed the invocation system and timeout detection significaly in 3.7. Can you please try the latest version?

Cheers,
Jaromir
Reply all
Reply to author
Forward
0 new messages