Cluster nodes connection loss

Paolo Di Tommaso

Mar 24, 2014, 5:08:21 PM

to haze...@googlegroups.com

Dear all,

I'm setting up a Hazelcast cluster in a cloud-based environment.

I've noticed that quite often some nodes suddenly lose their connection and then reconnect to the cluster after a few seconds.

I think it could be due to high latencies in the cloud network. What is your experience with that? Is there a suggested configuration for a cloud environment?

Below you can find the node log trace.

Thanks,

Paolo

Mar-24 12:40:03.507 [hz._hzInstance_1_nextflow.IO.thread-in-0] INFO com.hazelcast.nio.TcpIpConnection - [172.16.1.115]:5701 [nextflow] Connection [Address[172.16.1.92]:5701] lost. Reason: java.io.IOException[Connection reset by peer]

Mar-24 12:40:03.507 [hz._hzInstance_1_nextflow.IO.thread-out-0] WARN com.hazelcast.nio.WriteHandler - [172.16.1.115]:5701 [nextflow] hz._hzInstance_1_nextflow.IO.thread-out-0 Closing socket to endpoint Address[172.16.1.92]:5701, Cause:java.nio.channels.ClosedChannelException

Mar-24 12:40:03.508 [hz._hzInstance_1_nextflow.IO.thread-in-0] WARN com.hazelcast.nio.ReadHandler - [172.16.1.115]:5701 [nextflow] hz._hzInstance_1_nextflow.IO.thread-in-0 Closing socket to endpoint Address[172.16.1.92]:5701, Cause:java.io.IOException: Connection reset by peer

Mar-24 12:40:03.739 [hz._hzInstance_1_nextflow.cached.thread-1] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Connecting to /172.16.1.92:5701, timeout: 0, bind-any: false

Mar-24 12:40:03.741 [hz._hzInstance_1_nextflow.cached.thread-1] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Could not connect to: /172.16.1.92:5701. Reason: SocketException[Connection refused to address /172.16.1.92:5701]

Mar-24 12:40:04.739 [hz._hzInstance_1_nextflow.cached.thread-1] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Connecting to /172.16.1.92:5701, timeout: 0, bind-any: false

Mar-24 12:40:04.740 [hz._hzInstance_1_nextflow.cached.thread-1] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Could not connect to: /172.16.1.92:5701. Reason: SocketException[Connection refused to address /172.16.1.92:5701]

Mar-24 12:40:05.740 [hz._hzInstance_1_nextflow.cached.thread-2] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Connecting to /172.16.1.92:5701, timeout: 0, bind-any: false

Mar-24 12:40:05.741 [hz._hzInstance_1_nextflow.cached.thread-2] INFO com.hazelcast.nio.SocketConnector - [172.16.1.115]:5701 [nextflow] Could not connect to: /172.16.1.92:5701. Reason: SocketException[Connection refused to address /172.16.1.92:5701]

Mar-24 12:40:05.741 [hz._hzInstance_1_nextflow.cached.thread-2] WARN com.hazelcast.nio.ConnectionMonitor - [172.16.1.115]:5701 [nextflow] Removing connection to endpoint Address[172.16.1.92]:5701 Cause => java.net.SocketException {Connection refused to address /172.16.1.92:5701}, Error-Count: 5

Mar-24 12:40:05.742 [hz._hzInstance_1_nextflow.cached.thread-3] INFO com.hazelcast.cluster.ClusterService - [172.16.1.115]:5701 [nextflow] Master Address[172.16.1.92]:5701 left the cluster. Assigning new master Member [172.16.1.115]:5701 this

Mar-24 12:40:05.742 [hz._hzInstance_1_nextflow.cached.thread-3] INFO com.hazelcast.cluster.ClusterService - [172.16.1.115]:5701 [nextflow] Removing Member [172.16.1.92]:5701

Mar-24 12:40:05.765 [hz._hzInstance_1_nextflow.cached.thread-3] INFO com.hazelcast.cluster.ClusterService - [172.16.1.115]:5701 [nextflow]

Members [1] {

Member [172.16.1.115]:5701 this

}

Mar-24 12:40:05.765 [hz._hzInstance_1_nextflow.migration] INFO c.h.partition.PartitionService - [172.16.1.115]:5701 [nextflow] Partition balance is ok, no need to re-partition cluster data...

Mar-24 12:40:05.772 [hz._hzInstance_1_nextflow.cached.thread-4] INFO nextflow.executor.HzDaemon - Nextflow cluster member remove: Member [172.16.1.92]:5701

Mar-24 12:40:06.777 [hz._hzInstance_1_nextflow.IO.thread-Acceptor] INFO com.hazelcast.nio.SocketAcceptor - [172.16.1.115]:5701 [nextflow] Accepting socket connection from /172.16.1.92:36418

Mar-24 12:40:06.777 [hz._hzInstance_1_nextflow.IO.thread-Acceptor] INFO c.h.nio.TcpIpConnectionManager - [172.16.1.115]:5701 [nextflow] 5701 accepted socket connection from /172.16.1.92:36418

Mar-24 12:40:12.267 [hz._hzInstance_1_nextflow.cached.thread-5] INFO nextflow.executor.HzDaemon - Nextflow cluster member added: Member [172.16.1.92]:5701

Mar-24 12:40:12.268 [hz._hzInstance_1_nextflow.cached.thread-4] INFO com.hazelcast.cluster.ClusterService - [172.16.1.115]:5701 [nextflow]

Members [2] {

Member [172.16.1.115]:5701 this

Member [172.16.1.92]:5701

}
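
For reference, the length of the outage in the trace above can be read off its timestamps. The following is only a minimal sketch in plain Java: the class and method names are hypothetical (they are not part of Hazelcast or Nextflow), and the timestamps are copied from the log lines above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the trace; not part of Hazelcast or Nextflow.
final class OutageDuration {
    // Time-of-day part of the trace timestamps, e.g. "12:40:03.507".
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    // Milliseconds between two time-of-day stamps taken on the same day.
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from, TIME), LocalTime.parse(to, TIME)).toMillis();
    }

    public static void main(String[] args) {
        // "Connection ... lost" until "Removing Member [172.16.1.92]:5701"
        System.out.println(millisBetween("12:40:03.507", "12:40:05.742") + " ms until removal");
        // "Connection ... lost" until the member list shows two members again
        System.out.println(millisBetween("12:40:03.507", "12:40:12.268") + " ms until rejoin");
    }
}
```

For this trace the two intervals come out at roughly 2.2 and 8.8 seconds.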

Paolo Di Tommaso

Apr 2, 2014, 11:15:50 AM

to haze...@googlegroups.com

Hi all,

I'm testing a small Hazelcast cluster (both 3.1.6 and 3.2) in a cloud environment and I've noticed that nodes disconnect and reconnect quite frequently.

Does anybody have an idea why this happens? Is there any "magic" tuning required for the cloud? I suspect it could be caused by high latencies in the network.

Thanks,

Paolo

Enes Akar

Apr 2, 2014, 4:12:12 PM

to haze...@googlegroups.com

Which instance types do you use? For some types, Amazon does not guarantee a high-quality internal network.

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at http://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/58b8af6e-6bca-46b4-8756-c47f7e1afab3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Enes Akar

Apr 2, 2014, 4:13:07 PM

to haze...@googlegroups.com

Sorry, I assumed you use AWS. Do you?

Paolo Di Tommaso

Apr 2, 2014, 4:26:26 PM

to haze...@googlegroups.com

Hi, well no. I'm using "medium" instances on this platform: https://www.opensciencedatacloud.org

If the cause is network latency, is there any configuration property that can be tuned?

Moreover, I've noticed this warning on the client side; at the same time, I'm losing some items in my distributed data structure.

Apr-02 10:39:20.298 [InSelector] WARN c.h.c.c.nio.ClientConnection - Connection [/172.16.1.133:5701] lost. Reason: java.io.IOException[Connection reset by peer]

Apr-02 10:39:20.299 [InSelector] WARN c.h.c.c.nio.ClientReadHandler - InSelector Closing socket to endpoint Address[172.16.1.133]:5701, Cause:java.io.IOException: Connection reset by peer

Apr-02 10:39:34.868 [InSelector] WARN c.h.c.c.nio.ClientConnection - Connection [/172.16.1.159:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

Apr-02 10:39:34.869 [InSelector] WARN c.h.c.c.nio.ClientReadHandler - InSelector Closing socket to endpoint Address[172.16.1.159]:5701, Cause:java.io.EOFException: Remote socket closed!

Thanks for helping.

Cheers,

Paolo
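
On the question of tunable properties: Hazelcast 3.x has a set of `hazelcast.*` properties that control member heartbeats and connection monitoring, and they can be supplied as JVM system properties before the instance is created. The following is only a minimal sketch. The class and method names are hypothetical, the values are illustrative placeholders rather than recommendations, and nothing in this thread confirms that changing them resolves the disconnects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the property names come from Hazelcast 3.x.
final class CloudTuningSketch {
    // Candidate settings to experiment with. All values are assumptions.
    static Map<String, String> candidateProperties() {
        Map<String, String> p = new LinkedHashMap<>();
        // How often members send heartbeats to each other (seconds).
        p.put("hazelcast.heartbeat.interval.seconds", "5");
        // How long a member may stay silent before it is considered dead (seconds).
        p.put("hazelcast.max.no.heartbeat.seconds", "300");
        // Minimum interval (ms) for a connection error to be counted as critical.
        p.put("hazelcast.connection.monitor.interval", "1000");
        // Number of I/O faults tolerated before the connection to a member is dropped.
        p.put("hazelcast.connection.monitor.max.faults", "10");
        return p;
    }

    // Publishes the settings as system properties; call before creating the instance.
    static void apply(Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.setProperty(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        apply(candidateProperties());
        System.out.println(System.getProperty("hazelcast.connection.monitor.max.faults"));
    }
}
```

The same settings can be passed on the command line instead, for example `-Dhazelcast.connection.monitor.max.faults=10`. Comparing each value against the default of the Hazelcast version in use is advisable before changing it.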

besui...@gmail.com

Apr 29, 2016, 5:22:46 AM

to Hazelcast

Hi Paolo,

Did you ever find the root cause of this issue? We have been using Hazelcast 3.2.3 for our application and we observe the same issue, where the client disconnects and then reconnects every 2 hours. We are hosting our application on Linux servers:

[4/28/16 12:35:37:209 MST] 00000056 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [/xx.xx.xx.136:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

[4/28/16 12:35:37:210 MST] 00000056 ClientReadHan W com.hazelcast.client.connection.nio.ClientReadHandler InSelector Closing socket to endpoint Address[xx.xx.xx.136]:5701, Cause:java.io.EOFException: Remote socket closed!

[4/28/16 12:35:37:221 MST] 00000058 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [/xx.xx.xx.136:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

[4/28/16 12:35:37:222 MST] 00000058 ClientReadHan W com.hazelcast.client.connection.nio.ClientReadHandler InSelector Closing socket to endpoint Address[xx.xx.xx.136]:5701, Cause:java.io.EOFException: Remote socket closed!

[4/28/16 12:35:38:161 MST] 00000055 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [/xx.xx.xx.136:5701] lost. Reason: java.io.EOFException[Remote socket closed!]

[4/28/16 12:35:38:162 MST] 00000055 ClientReadHan W com.hazelcast.client.connection.nio.ClientReadHandler InSelector Closing socket to endpoint Address[xx.xx.xx.136]:5701, Cause:java.io.EOFException: Remote socket closed!

[4/28/16 12:36:52:165 MST] 00000050 ClientCluster W com.hazelcast.client.spi.ClientClusterService Error while listening cluster events! -> ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@44184418, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@43e543e5, connectionId=504, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/yy.yy.y.133:40888 remote=/xx.xx.xx.136:5701]}, remoteEndpoint=Address[xx.xx.xx.136]:5701}, Error: java.io.IOException: Connection timed out

[4/28/16 12:36:52:167 MST] 00000050 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [null] lost. Reason: Socket explicitly closed

[4/28/16 12:36:52:167 MST] 00000050 LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[hz.client_7_hzadmin][3.2.3] is CLIENT_DISCONNECTED

[4/28/16 12:36:52:173 MST] 00000052 ClientCluster W com.hazelcast.client.spi.ClientClusterService Error while listening cluster events! -> ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@401e401e, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@3feb3feb, connectionId=538, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/yy.yy.y.133:36997 remote=/xx.xx.xx.138:5702]}, remoteEndpoint=Address[xx.xx.xx.138]:5702}, Error: java.io.IOException: Connection timed out

[4/28/16 12:36:52:174 MST] 00000052 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [null] lost. Reason: Socket explicitly closed

[4/28/16 12:36:52:175 MST] 00000052 LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[hz.client_6_hzadmin][3.2.3] is CLIENT_DISCONNECTED

[4/28/16 12:36:53:118 MST] 00000051 ClientCluster W com.hazelcast.client.spi.ClientClusterService Error while listening cluster events! -> ClientConnection{live=true, writeHandler=com.hazelcast.client.connection.nio.ClientWriteHandler@2240224, readHandler=com.hazelcast.client.connection.nio.ClientReadHandler@1f101f1, connectionId=507, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/yy.yy.y.133:42665 remote=/xx.xx.xx.139:5702]}, remoteEndpoint=Address[xx.xx.xx.139]:5702}, Error: java.io.IOException: Connection timed out

[4/28/16 12:36:53:119 MST] 00000051 ClientConnect W com.hazelcast.client.connection.nio.ClientConnection Connection [null] lost. Reason: Socket explicitly closed

[4/28/16 12:36:53:120 MST] 00000051 LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[hz.client_8_hzadmin][3.2.3] is CLIENT_DISCONNECTED

[4/28/16 12:36:53:236 MST] 00000052 LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[hz.client_6_hzadmin][3.2.3] is CLIENT_CONNECTED

[4/28/16 12:36:53:236 MST] 00000050 LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[hz.client_7_hzadmin][3.2.3] is CLIENT_CONNECTED

[4/28/16 12:36:53:237 MST] 00000052 ClientCluster I com.hazelcast.client.spi.ClientClusterService

Members [8] {

Member [xx.xx.xx.136]:5701

Member [xx.xx.xx.136]:5702

Member [xx.xx.xx.137]:5701

Member [xx.xx.xx.137]:5702

Member [xx.xx.xx.138]:5701

Member [xx.xx.xx.138]:5702

Member [xx.xx.xx.139]:5701

Member [xx.xx.xx.139]:5702

}
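
To see how long each client stays disconnected in a trace like the one above, the `CLIENT_DISCONNECTED` and `CLIENT_CONNECTED` lifecycle lines can be paired per client. This is only a rough sketch in plain Java: the class name, the regular expression, and the timestamp pattern are assumptions derived from the log layout shown here, not anything provided by Hazelcast.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper; the format is inferred from the lines in this post.
final class ClientDowntime {
    // Captures: timestamp, client name, lifecycle state.
    private static final Pattern EVENT = Pattern.compile(
        "^\\[(\\S+ \\S+) \\w+\\] .*HazelcastClient\\[([^\\]]+)\\]\\[[^\\]]+\\] is (CLIENT_\\w+)$");
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("M/d/yy H:mm:ss:SSS");

    // Maps client name to the milliseconds between a CLIENT_DISCONNECTED
    // event and the next CLIENT_CONNECTED event of the same client.
    static Map<String, Long> downtimes(String... lines) {
        Map<String, LocalDateTime> downSince = new HashMap<>();
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line.trim());
            if (!m.matches()) {
                continue; // not a lifecycle line
            }
            LocalDateTime at = LocalDateTime.parse(m.group(1), STAMP);
            String client = m.group(2);
            if ("CLIENT_DISCONNECTED".equals(m.group(3))) {
                downSince.put(client, at);
            } else if ("CLIENT_CONNECTED".equals(m.group(3))) {
                LocalDateTime start = downSince.remove(client);
                if (start != null) {
                    result.put(client, Duration.between(start, at).toMillis());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String prefix = " LifecycleServ I com.hazelcast.core.LifecycleService HazelcastClient[";
        // Lifecycle lines copied from the trace above.
        System.out.println(downtimes(
            "[4/28/16 12:36:52:167 MST] 00000050" + prefix + "hz.client_7_hzadmin][3.2.3] is CLIENT_DISCONNECTED",
            "[4/28/16 12:36:52:175 MST] 00000052" + prefix + "hz.client_6_hzadmin][3.2.3] is CLIENT_DISCONNECTED",
            "[4/28/16 12:36:53:236 MST] 00000052" + prefix + "hz.client_6_hzadmin][3.2.3] is CLIENT_CONNECTED",
            "[4/28/16 12:36:53:236 MST] 00000050" + prefix + "hz.client_7_hzadmin][3.2.3] is CLIENT_CONNECTED"));
    }
}
```

For the lines in this post, both clients are back after roughly one second. Running the same pairing over a longer log would show whether the disconnects really recur on a fixed two-hour schedule.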
