IndexOutOfBoundsException when members leave the cluster


Milinda Perera

Feb 7, 2016, 1:54:13 PM
to Hazelcast
Hi,
 
I have set up a WSO2 DAS cluster. WSO2 products use Hazelcast for clustering. I created a cluster with 6 nodes. When I shut down two nodes (let's say node1 and node2) with a gap of around 3-5 seconds between them, I get the following error on all the other nodes (no errors appear on one node, which is the master node):

[2016-02-08 00:10:21,061]  INFO {org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl} -  Retrying executing Check Group Member Removal Flow for : __ANALYTICS_INDEXING_GROUP__. Retry count : 13, Member : Member [192.168.1.2]:4004
[2016-02-08 00:11:21,067]  WARN {org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl} -  Exception while executing the check Group Member Removal flow .. Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at com.hazelcast.collection.impl.list.ListContainer.get(ListContainer.java:64)
    at com.hazelcast.collection.impl.list.operations.ListGetOperation.run(ListGetOperation.java:42)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:137)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:315)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:142)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:115)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:101)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:76)
    at ------ End remote and begin local stack-trace ------.(Unknown Source)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:384)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:334)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:204)
    at com.hazelcast.collection.impl.collection.AbstractCollectionProxyImpl.invoke(AbstractCollectionProxyImpl.java:231)
    at com.hazelcast.collection.impl.list.ListProxyImpl.get(ListProxyImpl.java:66)
    at org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl.getLeader(AnalyticsClusterManagerImpl.java:184)
    at org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl.checkLeader(AnalyticsClusterManagerImpl.java:142)
    at org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl.executeCheckGroupMemberRemovalFlow(AnalyticsClusterManagerImpl.java:352)
    at org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl.memberRemoved(AnalyticsClusterManagerImpl.java:407)
    at com.hazelcast.cluster.impl.ClusterServiceImpl.dispatchEvent(ClusterServiceImpl.java:1422)
    at com.hazelcast.cluster.impl.ClusterServiceImpl.dispatchEvent(ClusterServiceImpl.java:116)
    at com.hazelcast.spi.impl.eventservice.impl.LocalEventDispatcher.run(LocalEventDispatcher.java:63)
    at com.hazelcast.util.executor.StripedExecutor$Worker.process(StripedExecutor.java:190)
    at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:174)
[2016-02-08 00:11:21,068]  INFO {org.wso2.carbon.analytics.dataservice.core.clustering.AnalyticsClusterManagerImpl} -  Retrying executing Check Group Member Removal Flow for : __ANALYTICS_INDEXING_GROUP__. Retry count : 14, Member : Member [192.168.1.2]:4004


After I start node1 and node2 again, I see the following exception continuously on one of the nodes:

[2016-02-08 00:17:02,998] ERROR {com.hazelcast.collection.impl.list.operations.ListGetOperation} -  [192.168.1.2]:4004 [wso2.carbon.domain] [3.5.2] Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at com.hazelcast.collection.impl.list.ListContainer.get(ListContainer.java:64)
    at com.hazelcast.collection.impl.list.operations.ListGetOperation.run(ListGetOperation.java:42)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:137)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:315)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:142)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:115)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:101)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:76)
[2016-02-08 00:17:03,290] ERROR {com.hazelcast.collection.impl.list.operations.ListGetOperation} -  [192.168.1.2]:4004 [wso2.carbon.domain] [3.5.2] Index: 0, Size: 0



What could be the reason? Could it be a configuration flaw?
I would highly appreciate any advice on how to solve this.


Thanks,
Milinda

Jaromir Hamala

Feb 8, 2016, 4:18:15 AM
to Hazelcast
Hello Milinda,

To me this looks like an issue in the WSO2 integration/configuration. Hazelcast keeps a single backup copy of data by default, so it will survive one member going down. When 2 members go down within a short time interval, data loss is possible. I *assume* the data loss triggers another problem in the WSO2 code: it calls `list.get(0)` on an empty List, and this throws the IndexOutOfBoundsException. Do you have access to the Hazelcast configuration? It's usually in hazelcast.xml, but WSO2 could use some other file name or even a programmatic configuration.

Cheers,
Jaromir
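To make the backup arithmetic concrete, here is a simplified Java model. It is an assumption for illustration, not Hazelcast's actual partition-assignment algorithm: each of 271 partitions (Hazelcast's default partition count) has one owner plus `backupCount` backups on consecutive members, and a partition is lost only if every member holding one of its replicas stops before the data is re-replicated. The class and method names are hypothetical. In Hazelcast 3.x an IList is stored in a single partition (plus its backups), so losing that one partition would leave the list empty, which would be consistent with the `Size: 0` in the log.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BackupLossModel {
    // Hypothetical, simplified replica placement: partition p is owned by member
    // p % members and backed up on the next backupCount members (wrapping around).
    // This is NOT Hazelcast's real partition table; it only illustrates that a
    // partition survives while at least one of its (backupCount + 1) replicas is
    // on a member that is still running. Assumes backupCount + 1 <= members.
    static int lostPartitions(int members, int partitions, int backupCount, Set<Integer> stopped) {
        int lost = 0;
        for (int p = 0; p < partitions; p++) {
            boolean allReplicasGone = true;
            for (int r = 0; r <= backupCount; r++) {
                int holder = (p + r) % members;
                if (!stopped.contains(holder)) {
                    allReplicasGone = false; // at least one replica survives
                    break;
                }
            }
            if (allReplicasGone) {
                lost++;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Members 0 and 1 stand in for node1 and node2, stopped at the same time.
        Set<Integer> stopped = new HashSet<>(Arrays.asList(0, 1));
        // 6 members, 271 partitions.
        System.out.println("backup-count 1: " + lostPartitions(6, 271, 1, stopped) + " partitions lost");
        System.out.println("backup-count 2: " + lostPartitions(6, 271, 2, stopped) + " partitions lost");
    }
}
```

In this model, stopping members 0 and 1 together loses 46 of 271 partitions with backup-count 1 and none with backup-count 2; in general, stopping k members can only lose data when k exceeds the backup count. The model ignores the re-replication Hazelcast performs after each member leaves, so with a 3-5 second gap between shutdowns the real outcome also depends on whether that migration had finished.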

Milinda Perera

Feb 9, 2016, 1:37:26 AM
to Hazelcast
Hello Jaromir,

Thanks for the reply, 
Yes, the configuration is done programmatically, so I changed the source and increased the backup-count, but I still get that error when I shut down two or more servers simultaneously.

Thanks,
Milinda
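Whatever the backup-count, the calling code can also avoid assuming the list is non-empty. The sketch below is hypothetical: it is not WSO2's `AnalyticsClusterManagerImpl.getLeader` (whose source isn't shown in this thread), and `LeaderLookup`/`firstOrEmpty` are illustrative names. It treats an empty list as "no leader yet" instead of letting `get(0)` throw. It is written against `java.util.List`, which Hazelcast's `IList` extends; because `isEmpty()` and `get(0)` are two separate calls on a distributed list, it still catches `IndexOutOfBoundsException` in case another member empties the list in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LeaderLookup {
    // Hypothetical helper: returns the first entry of the group-member list
    // (treated here as the "leader"), or Optional.empty() if there is none.
    static <T> Optional<T> firstOrEmpty(List<T> members) {
        if (members.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(members.get(0));
        } catch (IndexOutOfBoundsException e) {
            // The list shrank between isEmpty() and get(0), e.g. on another member.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<String> members = new ArrayList<>();

        // An unguarded get(0) on an empty list reproduces the exception type from the log.
        try {
            members.get(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded get(0) threw: " + e.getClass().getSimpleName());
        }

        System.out.println("leader present: " + firstOrEmpty(members).isPresent());
        members.add("192.168.1.2:4004");
        System.out.println("leader: " + firstOrEmpty(members).orElse("none"));
    }
}
```

How the caller should react to an empty result (wait for the next membership event, re-run leader election, and so on) depends on WSO2's clustering logic and is left open here.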