Error in reading cache with 2 million objects, 400 requests/second

Manish Kumar

Jul 17, 2017, 11:53:57 AM

to Hazelcast

Hey Guys,

We have approximately 2 million distributed (not replicated) data objects, about 500 MB of data, cached in a 10-node cluster. The backup count is one. We are seeing the errors and warnings below.

Do you know under what conditions these errors occur? I have sanitized some of the logs to avoid sharing anything sensitive. Most of the time we do cache reads (around 400 requests/second), and the whole cache is reinitialized every 2 hours.

I know that we could use a replicated cache to improve performance, but I am wondering what is going wrong here.

Hazelcast version 3.6.3

Server size 8 core, 16 GB

Windows Server 2012 R2

IO input thread count is 30

IO output thread count is 50

This one:


2017-06-24 23:46:22.679 ERROR (hz._hzInstance_1_My-App.partition-operation.thread-5) [c.h.m.i.o.GetOperation] - [192.168.111.11]:5701 [My-App] [3.6.3] Cannot send response: HeapData{type=-2, hashCode=113248027, partitionHash=113248027, totalSize=722, dataSize=714, heapCost=742} to Address[192.168.111.13]:5701. Op: com.hazelcast.map.impl.operation.GetOperation{identityHash=1124265765, serviceName='hz:impl:mapService', partitionId=189, replicaIndex=0, callId=3490089, invocationTime=1498362385498 (Sat Jun 24 23:46:25 EDT 2017), waitTimeout=-1, callTimeout=8000, name=HKF/my-cache-id-3, name=HKF/my-cache-id-3}
com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: HeapData{type=-2, hashCode=113248027, partitionHash=113248027, totalSize=722, dataSize=714, heapCost=742} to Address[192.168.111.13]:5701. Op: com.hazelcast.map.impl.operation.GetOperation{identityHash=1124265765, serviceName='hz:impl:mapService', partitionId=189, replicaIndex=0, callId=3490089, invocationTime=1498362385498 (Sat Jun 24 23:46:25 EDT 2017), waitTimeout=-1, callTimeout=8000, name=HKF/my-cache-id-3, name=HKF/my-cache-id-3}
at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.sendResponse(OperationRunnerImpl.java:278)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.handleResponse(OperationRunnerImpl.java:251)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:173)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)

And this one:



com.hazelcast.core.OperationTimeoutException: No response for 16000 ms. Aborting invocation! Invocation{serviceName='hz:impl:mapService', op=com.hazelcast.map.impl.operation.GetOperation{identityHash=168383579, serviceName='hz:impl:mapService', partitionId=104, replicaIndex=0, callId=12830925, invocationTime=1498362385742 (Sat Jun 24 23:46:25 EDT 2017), waitTimeout=-1, callTimeout=8000, name=my-cache-id-3, name=my-cache-id-3}, partitionId=104, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=8000, target=Address[192.168.211.64]:5701, backupsExpected=0, backupsCompleted=0, connection=Connection [/192.168.111.59:5701 -> /192.168.111.64:38030], endpoint=Address[192.168.111.64]:5701, alive=true, type=MEMBER} No response has been received!  backups-expected:0 backups-completed: 0
at com.hazelcast.spi.impl.operationservice.impl.Invocation.newOperationTimeoutException(Invocation.java:536)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.waitForResponse(InvocationFuture.java:277)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:224)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:320)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:250)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
at com.airwatch.seg.cache.provider.impl.HazelcastCacheProvider.getMemPolicy(HazelcastCacheProvider.java:84)
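When correlating logs across nodes, it can help to convert the `invocationTime` field (epoch milliseconds) into the same form as the log line timestamps. The following is a minimal sketch using only the JDK, with values taken from the first log entry above. The class and method names are hypothetical, and the time zone is an assumption based on the "EDT" suffix in the log.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hazelcast.
final class InvocationTimeDecoder {

    // Assumption: the nodes log in US Eastern time, as the "EDT" suffix suggests.
    static final ZoneId LOG_ZONE = ZoneId.of("America/New_York");

    // Matches the timestamp prefix of the log lines, e.g. "2017-06-24 23:46:22.679".
    static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Converts the invocationTime field (epoch millis) to a zoned timestamp.
    static ZonedDateTime decode(long invocationTimeMillis) {
        return Instant.ofEpochMilli(invocationTimeMillis).atZone(LOG_ZONE);
    }

    // Parses the timestamp at the start of a log line into an Instant.
    static Instant parseLogTimestamp(String timestamp) {
        return LocalDateTime.parse(timestamp, LOG_FORMAT).atZone(LOG_ZONE).toInstant();
    }

    // Milliseconds from the invocation time to the moment the line was logged.
    static long millisBetween(long invocationTimeMillis, String logTimestamp) {
        return Duration.between(
                Instant.ofEpochMilli(invocationTimeMillis),
                parseLogTimestamp(logTimestamp)).toMillis();
    }

    public static void main(String[] args) {
        long invocationTime = 1498362385498L;          // from the GetOperation above
        String loggedAt = "2017-06-24 23:46:22.679";   // timestamp of the same log line
        System.out.println("invocation time: " + decode(invocationTime));
        System.out.println("gap in ms:       " + millisBetween(invocationTime, loggedAt));
    }
}
```

A large or negative gap between the two values may mean that the nodes' clocks disagree (or that timestamps were changed during sanitization), which is worth ruling out before reading too much into the timeout values.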

Ahmet Mircik

Jul 20, 2017, 6:06:34 AM

to Hazelcast

Hi Manish,

It seems some nodes are leaving the cluster. For example, ResponseNotSentException is thrown when the node that sent the invocation leaves the cluster.

Also, it would be better to migrate to the latest release if you can, since there are some improvements to the timeout logic.

alpars...@gmail.com

Jul 20, 2017, 8:06:45 AM

to Hazelcast

Hi Manish,

Your cluster seems to be having trouble responding to your invocations. Can you please answer the following questions:

1. What kind of serialization do you use in your cluster?

2. Can you share your Hazelcast cluster configuration?

3. Can you share all cluster logs (with sensitive data removed or replaced, of course)?

4. The IO thread counts you configured are very high, which can degrade cluster performance. Can you try the same setup with the default IO thread counts?
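For point 4, the simplest route is usually to delete the IO thread overrides from the configuration so that the defaults apply. If the values have to be set explicitly, a minimal sketch follows. It assumes the Hazelcast 3.x property names `hazelcast.io.input.thread.count` and `hazelcast.io.output.thread.count`, a default of 3 threads each, and that the member reads these as JVM system properties at startup. Please verify all three against the documentation for your version. The helper class itself is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hazelcast.
final class IoThreadDefaults {

    // Assumption: Hazelcast 3.x property names; check them for your version.
    static final String INPUT_PROP = "hazelcast.io.input.thread.count";
    static final String OUTPUT_PROP = "hazelcast.io.output.thread.count";

    // Assumption: 3 is the default IO thread count in Hazelcast 3.x.
    static final int ASSUMED_DEFAULT = 3;

    // Builds the overrides that replace the 30/50 values from the original post.
    static Map<String, String> defaultOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put(INPUT_PROP, Integer.toString(ASSUMED_DEFAULT));
        overrides.put(OUTPUT_PROP, Integer.toString(ASSUMED_DEFAULT));
        return overrides;
    }

    // Applies the overrides as JVM system properties.
    static void apply(Map<String, String> overrides) {
        overrides.forEach(System::setProperty);
    }

    public static void main(String[] args) {
        apply(defaultOverrides());
        // Start the Hazelcast member only after this point, so that it sees the values.
        System.out.println(INPUT_PROP + "=" + System.getProperty(INPUT_PROP));
        System.out.println(OUTPUT_PROP + "=" + System.getProperty(OUTPUT_PROP));
    }
}
```

The same values can be passed on the command line as `-D` flags instead. Either way, properties set in the member's own configuration may take precedence, so the 30/50 entries should be removed there as well.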