Hi
Hazelcast 3.2.1
Two nodes on separate CentOS machines virtualized on Hyper-V
Java 7
ICMP enabled (I read elsewhere that this is recommended to get faster confirmation of whether a node is alive or dead).
My properties are:
hz.icmp.enabled=true
hz.icmp.timeout=3000
hz.icmp.ttl=1
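For reference, a minimal sketch of how these keys could end up in Hazelcast. It assumes the `hz.` prefix is an application-level alias for Hazelcast's own `hazelcast.icmp.*` property names, and that Hazelcast falls back to JVM system properties for settings not present in its own configuration; both are assumptions, and `IcmpSettings` is a hypothetical helper, not part of Hazelcast.

```java
import java.util.Properties;

// Hypothetical helper: maps application-level hz.icmp.* keys to the
// hazelcast.icmp.* names (assumed to be what Hazelcast itself reads).
final class IcmpSettings {
    static final String APP_PREFIX = "hz.";             // assumption: app-specific alias
    static final String HAZELCAST_PREFIX = "hazelcast.";

    // Returns a new Properties object containing only the translated ICMP keys.
    static Properties toHazelcastProperties(Properties app) {
        Properties out = new Properties();
        for (String key : app.stringPropertyNames()) {
            if (key.startsWith(APP_PREFIX + "icmp.")) {
                String translated = HAZELCAST_PREFIX + key.substring(APP_PREFIX.length());
                out.setProperty(translated, app.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties app = new Properties();
        app.setProperty("hz.icmp.enabled", "true");
        app.setProperty("hz.icmp.timeout", "3000");
        app.setProperty("hz.icmp.ttl", "1");

        Properties hz = toHazelcastProperties(app);
        // Export as system properties; this would have to happen before the
        // Hazelcast instance is created for the values to be picked up.
        for (String key : hz.stringPropertyNames()) {
            System.setProperty(key, hz.getProperty(key));
        }
        System.out.println(hz);
    }
}
```

Printing the translated properties at startup makes it easy to confirm that the QA cluster and the test cluster really run with the same ICMP settings.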
However, I get these messages in the log:
2014-05-13 10:59:26,997 WARN com.hazelcast.cluster.ClusterService - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 will ping Address[10.230.48.189]:5701
2014-05-13 10:59:27,002 WARN com.hazelcast.cluster.ClusterService - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 couldn't ping Address[10.230.48.189]:5701
2014-05-13 10:59:28,139 WARN com.hazelcast.spi.OperationService - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Member [10.230.48.189]:5701 has left cluster!
2014-05-13 11:02:18,959 WARN com.hazelcast.cluster.TcpIpJoiner - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 is merging [tcp/ip] to Address[10.230.48.189]:5701
2014-05-13 11:02:18,960 WARN com.hazelcast.cluster.PrepareMergeOperation - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Preparing to merge... Waiting for merge instruction...
2014-05-13 11:02:18,961 WARN com.hazelcast.cluster.MergeClustersOperation - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 is merging to Address[10.230.48.189]:5701, because: instructed by master Address[10.230.48.190]:5701
This is strange, as I am able to ping the address from the command line.
The REST interface also reports that both nodes are up and connected:
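A command-line ping and a ping issued from inside the JVM are not necessarily the same thing. The sketch below assumes that Hazelcast's ICMP check goes through `java.net.InetAddress.isReachable` (an assumption, not confirmed here). According to the JDK documentation, that method uses ICMP echo requests only if the process can obtain the required privilege and otherwise tries a TCP connection to port 7 (echo), so it can fail where the `ping` command succeeds. `ReachabilityProbe` is a hypothetical name; the TTL and timeout mirror the properties above.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical stand-alone probe to run on the QA machine as the same user
// that runs the Hazelcast node.
final class ReachabilityProbe {

    // Returns true if the JVM considers the host reachable within timeoutMillis,
    // allowing at most ttl hops (0 means the default TTL).
    static boolean probe(String host, int ttl, int timeoutMillis) {
        try {
            InetAddress address = InetAddress.getByName(host);
            // A null NetworkInterface lets the JVM choose any interface.
            return address.isReachable(null, ttl, timeoutMillis);
        } catch (IOException e) {
            System.err.println("Probe failed with I/O error: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "10.230.48.189";
        long start = System.nanoTime();
        boolean reachable = probe(host, 1, 3000); // ttl=1, timeout=3000 ms as configured
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        System.out.println(host + " reachable=" + reachable + " after " + elapsedMillis + " ms");
    }
}
```

Running this once as the Hazelcast user and once as root, on both the QA and the test machines, should show whether the difference lies in privileges or firewall rules rather than in Hazelcast. Note that in the log above "couldn't ping" follows "will ping" after about 5 ms, far less than the 3000 ms timeout.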
Members [2] {
Member [10.230.48.190]:5701 this
Member [10.230.48.189]:5701
}
ConnectionCount: 11
AllConnectionCount: 5
When running two nodes on my local machine, I do not get this error.
The problem only shows up on the QA cluster.
The test cluster, which runs on the exact same virtual machines with the same setup, does not produce these warnings either, which is strange.
The logs also report:
2014-05-13 11:05:17,762 WARN com.hazelcast.spi.impl.BasicInvocation - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Retrying invocation: BasicInvocation{ serviceName='hz:impl:mapService', op=com.hazelcast.map.operation.MapSizeOperation@8f40491, partitionId=270, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=110, callTimeout=60000, target=Address[10.230.48.190]:5701}, Reason: com.hazelcast.spi.exception.RetryableHazelcastException: Map is not ready!!!
2014-05-13 11:06:18,959 WARN com.hazelcast.cluster.TcpIpJoiner - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 is merging [tcp/ip] to Address[10.230.48.189]:5701
2014-05-13 11:06:18,959 WARN com.hazelcast.cluster.PrepareMergeOperation - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Preparing to merge... Waiting for merge instruction...
2014-05-13 11:06:18,959 WARN com.hazelcast.cluster.MergeClustersOperation - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Address[10.230.48.190]:5701 is merging to Address[10.230.48.189]:5701, because: instructed by master Address[10.230.48.190]:5701
2014-05-13 11:06:19,780 WARN com.hazelcast.partition.InternalPartitionService - [10.230.48.190]:5701 [qa-cluster] [3.2.1] Owner of partition is being removed! Possible data loss for partition[0]. PartitionReplicaChangeEvent{partitionId=0, replicaIndex=0, oldAddress=Address[10.230.48.190]:5701, newAddress=null}
Sending callables to the cluster also generates many "Map is not ready!!!" exceptions and "Owner of partition is being removed! Possible data loss for partition" warnings.
It finishes with:
2014-05-13 11:06:31,863 ERROR com.hazelcast.cluster.ClusterService - [10.230.48.190]:5701 [qa-cluster] [3.2.1] While merging...
java.util.concurrent.ExecutionException: com.hazelcast.core.HazelcastException: java.lang.ClassNotFoundException: LATEST_UPDATE
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at com.hazelcast.cluster.ClusterServiceImpl.waitOnFutureInterruptible(ClusterServiceImpl.java:675)
at com.hazelcast.cluster.ClusterServiceImpl.access$600(ClusterServiceImpl.java:86)
at com.hazelcast.cluster.ClusterServiceImpl$6.run(ClusterServiceImpl.java:656)
at com.hazelcast.instance.LifecycleServiceImpl.runUnderLifecycleLock(LifecycleServiceImpl.java:103)
at com.hazelcast.cluster.ClusterServiceImpl.merge(ClusterServiceImpl.java:629)
at com.hazelcast.cluster.MergeClustersOperation.run(MergeClustersOperation.java:54)
at com.hazelcast.spi.impl.BasicOperationService.processOperation(BasicOperationService.java:363)
at com.hazelcast.spi.impl.BasicOperationService.runOperation(BasicOperationService.java:228)
at com.hazelcast.cluster.AbstractJoiner.startClusterMerge(AbstractJoiner.java:256)
at com.hazelcast.cluster.TcpIpJoiner.searchForOtherClusters(TcpIpJoiner.java:472)
at com.hazelcast.cluster.SplitBrainHandler.searchForOtherClusters(SplitBrainHandler.java:47)
at com.hazelcast.cluster.SplitBrainHandler.run(SplitBrainHandler.java:37)
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:186)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
Caused by: com.hazelcast.core.HazelcastException: java.lang.ClassNotFoundException: LATEST_UPDATE
at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:45)
at com.hazelcast.map.MapService.getMergePolicy(MapService.java:268)
at com.hazelcast.map.MapService$Merger.run(MapService.java:289)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:57)
... 5 more
Caused by: java.lang.ClassNotFoundException: LATEST_UPDATE
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:113)
at com.hazelcast.nio.ClassLoaderUtil.newInstance(ClassLoaderUtil.java:63)
at com.hazelcast.map.MapService.getMergePolicy(MapService.java:264)
... 9 more
When I disable ICMP, all of these errors go away.
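The final `ClassNotFoundException: LATEST_UPDATE` looks like a separate issue that only surfaces once a merge is triggered: the stack trace shows `MapService.getMergePolicy` passing the configured value to `ClassLoaderUtil.newInstance`, so the merge policy is being loaded as a class name. A minimal sketch of a startup check follows. `MergePolicyCheck` is a hypothetical helper, and the fully qualified name `com.hazelcast.map.merge.LatestUpdateMapMergePolicy` is what I believe Hazelcast 3.x ships; treat it as an assumption to verify against the jar.

```java
// Hypothetical pre-flight check: verifies that a configured merge policy
// value can be loaded as a class, which is what the stack trace suggests
// Hazelcast attempts during a cluster merge.
final class MergePolicyCheck {

    // Returns true if className resolves to a class on the current classpath.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class.forName(className, false, MergePolicyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "LATEST_UPDATE", // the value from the stack trace
            "com.hazelcast.map.merge.LatestUpdateMapMergePolicy" // assumed 3.x class name
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " loadable=" + isLoadable(candidate));
        }
    }
}
```

With the Hazelcast jar on the classpath, the second candidate should load if the assumed name is right; the first cannot, which matches the exception seen during the merge.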