tar -xzf combined_file.tgz
2016.03.17 23:44:46.952 DEBUG ClusterVNodeControll] There is NO MASTER or MASTER is DEAD according to GOSSIP. Starting new elections.
checking gossip at 192.168.147.212:2113
Checking for 3 nodes
Checking nodes for IsAlive state
CheckGossip CRITICAL: Only 2 alive nodes, should be 3 alive
Host: ggh-d-evtapp02
Timestamp: 2016-03-17 23:46:29 +0000
Address: 192.168.147.212
checking gossip at 192.168.147.213:2113
Checking for 3 nodes
Checking nodes for IsAlive state
CheckGossip CRITICAL: Only 2 alive nodes, should be 3 alive
Host: ggh-d-evtapp03
Timestamp: 2016-03-17 23:46:29 +0000
Address: 192.168.147.213
checking gossip at 192.168.147.211:2113
Checking for 3 nodes
Checking nodes for IsAlive state
CheckGossip CRITICAL: Only 1 alive nodes, should be 3 alive
Host: ggh-d-evtapp01
Timestamp: 2016-03-17 23:46:29 +0000
Address: 192.168.147.211
nodes found with states:
<State>PreReplica</State> when expected Master or Slave.
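The check output above (node count, IsAlive count, exactly one Master, Master/Slave states) can be sketched as a small parser over the `/gossip?format=xml` payload. This is a minimal sketch: the element names (`MemberInfoDto`, `IsAlive`) are assumptions inferred from the `<State>` fragment in the output, not confirmed against the real schema.

```python
import xml.etree.ElementTree as ET

def check_gossip(xml_text, expected_nodes=3):
    """Apply the same checks the plugin reports: node count, alive count,
    exactly one Master, and no states other than Master/Slave."""
    root = ET.fromstring(xml_text)
    members = root.findall(".//MemberInfoDto")  # element name is an assumption
    alive = [m for m in members if m.findtext("IsAlive") == "true"]
    states = [m.findtext("State") for m in alive]
    problems = []
    if len(members) != expected_nodes:
        problems.append("expected %d nodes, found %d" % (expected_nodes, len(members)))
    if len(alive) != expected_nodes:
        problems.append("Only %d alive nodes, should be %d alive" % (len(alive), expected_nodes))
    if states.count("Master") != 1:
        problems.append("expected exactly 1 master, found %d" % states.count("Master"))
    bad = [s for s in states if s not in ("Master", "Slave")]
    if bad:
        problems.append("nodes found with states %s when expected Master or Slave" % bad)
    return problems

# A hypothetical payload matching the failure above: 2 of 3 alive, one PreReplica.
sample = """<ClusterInfoDto><Members>
  <MemberInfoDto><IsAlive>true</IsAlive><State>Master</State></MemberInfoDto>
  <MemberInfoDto><IsAlive>true</IsAlive><State>PreReplica</State></MemberInfoDto>
  <MemberInfoDto><IsAlive>false</IsAlive><State>Slave</State></MemberInfoDto>
</Members></ClusterInfoDto>"""
print(check_gossip(sample))
```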
checking gossip at 192.168.147.212:2113
Checking for 3 nodes
Checking nodes for IsAlive state
Checking for exactly 1 master
Checking node state
Host: ggh-d-evtapp02
Timestamp: 2016-03-18 00:01:29 +0000
Address: 192.168.147.212
Status: WARNING
nodes found with states:
<State>PreReplica</State> when expected Master or Slave.
checking gossip at 192.168.147.213:2113
Checking for 3 nodes
Checking nodes for IsAlive state
Checking for exactly 1 master
Checking node state
Host: ggh-d-evtapp03
Timestamp: 2016-03-18 00:01:29 +0000
Address: 192.168.147.213
Status: WARNING
checking gossip at 192.168.147.211:2113
CheckGossip CRITICAL: Could not connect to http://192.168.147.211:2113/gossip?format=xml to check gossip, has event store fallen over on this node?
Host: ggh-d-evtapp01.dev.green.net
Timestamp: 2016-03-18 00:01:29 +0000
Address: 192.168.147.211
Status: CRITICAL
Recovers at 00:10:29
Repeats at 00:11:29
Recovers at 00:38:20
Repeats at 01:17:29
Continues flapping thereafter until 20:01, when node 211 no longer recovers and again gets into a time-difference error state. This continues until 20:42, when it reports:
[PID:16535:017 2016.03.18 20:42:14.809 ERROR GossipController   ] Received as POST invalid ClusterInfo from [http://192.168.147.211:2113/gossip]. Content-Type: application/json.
[PID:16535:017 2016.03.18 20:42:14.809 ERROR GossipController   ] Received as POST invalid ClusterInfo from [http://192.168.147.211:2113/gossip]. Body: .
This continues until
[PID:16535:007 2016.03.18 20:47:22.051 ERROR JsonCodec      ]
During this time, memory use was ballooning, and then
[PID:16535:018 2016.03.18 22:21:00.433 DEBUG MonitoringService  ] Could not get free mem on linux, received memory info raw string: []
# uptime
 15:15:19 up 150 days,  4:16,  2 users,  load average: 0.81, 0.89, 0.68
# /opt/sensu/embedded/bin/metrics-net.rb
ggh-d-evtapp01.net.eth0.tx_packets 358948797 1459520042
ggh-d-evtapp01.net.eth0.rx_packets 409045650 1459520042
ggh-d-evtapp01.net.eth0.tx_bytes 117129928809 1459520042
ggh-d-evtapp01.net.eth0.rx_bytes 125511090261 1459520042
ggh-d-evtapp01.net.eth0.tx_errors 0 1459520042
ggh-d-evtapp01.net.eth0.rx_errors 0 1459520042
ggh-d-evtapp01.net.eth0.if_speed 10000 1459520042
[root@ggh-d-evtapp01:2016-03-31]# netstat -s
Ip:
    375175628 total packets received
    0 forwarded
    0 incoming packets discarded
    375162614 incoming packets delivered
    359260474 requests sent out
    5 outgoing packets dropped
Icmp:
    31625 ICMP messages received
    116 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 2382
        echo requests: 29243
    30821 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 1586
        echo replies: 29235
IcmpMsg:
        InType3: 2382
        InType8: 29243
        OutType0: 29235
        OutType3: 1586
Tcp:
    1313017 active connections openings
    748250 passive connection openings
    138137 failed connection attempts
    374700 connection resets received
    21 connections established
    353046858 segments received
    362162781 segments send out
    2510208 segments retransmited
    527276 bad segments received.
    996872 resets sent
Udp:
    7202637 packets received
    249 packets to unknown port received.
    0 packet receive errors
    7207436 packets sent
UdpLite:
TcpExt:
    166485 invalid SYN cookies received
    120761 resets received for embryonic SYN_RECV sockets
    712704 TCP sockets finished time wait in fast timer
    11972772 delayed acks sent
    4321 delayed acks further delayed because of locked socket
    Quick ack mode was activated 290061 times
    4584990 times the listen queue of a socket overflowed
    4584990 SYNs to LISTEN sockets dropped
    3738 packets directly queued to recvmsg prequeue.
    333960446 bytes directly in process context from backlog
    3023346 bytes directly received in process context from prequeue
    108203159 packet headers predicted
    480190 packets header predicted and directly queued to user
    41099609 acknowledgments not containing data payload received
    108090553 predicted acknowledgments
    304 times recovered from packet loss by selective acknowledgements
    56 congestion windows recovered without slow start by DSACK
    3707 congestion windows recovered without slow start after partial ack
    TCPLostRetransmit: 26
    51 timeouts after SACK recovery
    3 timeouts in loss state
    1558 fast retransmits
    295 forward retransmits
    16805 retransmits in slow start
    1672168 other TCP timeouts
    TCPLossProbes: 355970
    TCPLossProbeRecovery: 212066
    16 SACK retransmits failed
    290101 DSACKs sent for old packets
    1 DSACKs sent for out of order packets
    241161 DSACKs received
    1 DSACKs for out of order packets received
    54257 connections reset due to unexpected data
    202077 connections reset due to early user close
    15591 connections aborted due to timeout
    TCPDSACKIgnoredNoUndo: 159022
    TCPSpuriousRTOs: 182
    TCPSackShifted: 643
    TCPSackMerged: 1418
    TCPSackShiftFallback: 6697
    TCPRetransFail: 39
    TCPRcvCoalesce: 44012879
    TCPOFOQueue: 1064
    TCPOFOMerge: 1
    TCPChallengeACK: 536811
    TCPSYNChallenge: 535124
    TCPSpuriousRtxHostQueues: 12215
IpExt:
    InNoRoutes: 2
    InMcastPkts: 739
    InBcastPkts: 14880564
    InOctets: -1302133305
    OutOctets: 1201931353
    InMcastOctets: 26604
    InBcastOctets: 1679428203
    InNoECTPkts: 374870674
    InECT0Pkts: 304957
    4584990 times the listen queue of a socket overflowed
    4584990 SYNs to LISTEN sockets dropped
    3266737 times the listen queue of a socket overflowed
    3266737 SYNs to LISTEN sockets dropped
    736787 times the listen queue of a socket overflowed
    736787 SYNs to LISTEN sockets dropped
This is listen sockets and the listen queue (e.g. accepting connections).
This would not explain TCP connections all dying at the same time.
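Since the listen-queue counters are cumulative since boot, the useful signal is the delta between two samples, not the absolute number. A small sketch that pulls the two relevant counters out of `netstat -s` text so they can be sampled and diffed:

```python
import re

def listen_overflows(netstat_s_output):
    """Extract (queue_overflowed, syns_dropped) from `netstat -s` output.
    Both counters are monotonic since boot, so sample twice and
    subtract to get a rate."""
    overflowed = dropped = 0
    for line in netstat_s_output.splitlines():
        m = re.match(r"\s*(\d+) times the listen queue of a socket overflowed", line)
        if m:
            overflowed = int(m.group(1))
        m = re.match(r"\s*(\d+) SYNs to LISTEN sockets dropped", line)
        if m:
            dropped = int(m.group(1))
    return overflowed, dropped

sample = """TcpExt:
    4584990 times the listen queue of a socket overflowed
    4584990 SYNs to LISTEN sockets dropped"""
print(listen_overflows(sample))  # (4584990, 4584990)
```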
How long have your ES nodes been running (an estimate on the machine is fine)? And
what are your uptimes?
211# uptime
 15:15:19 up 150 days,  4:16,  2 users,  load average: 0.81, 0.89, 0.68
eventst+  9556 16.7 13.4 1736832 545916 ?    Ssl  Mar20 2838:36 /usr/bin/eventstored
212# uptime
 16:08:21 up 151 days,  4:55,  1 user,  load average: 0.18, 0.31, 0.27
eventst+ 29897 14.4 12.8 1717196 519212 ?    Ssl  Mar20 2445:11 /usr/bin/eventstored
213# uptime
 16:10:32 up 151 days,  4:57,  1 user,  load average: 0.51, 0.53, 0.48
eventst+ 25009 15.3 13.0 1713800 530200 ?    Ssl  Mar31 189:04 /usr/bin/eventstored
[PID:17377:035 2016.03.31 18:38:09.661 ERROR QueuedHandlerThreadP] Error while processing message EventStore.Core.Messages.SystemMessage+BecomeShuttingDown in queued handler 'Subscriptions'.
System.InvalidOperationException: out of sync
  at System.Collections.Generic.Dictionary`2+Enumerator[System.Guid,EventStore.Core.Services.SubscriptionsService+Subscription].VerifyState () [0x00000] in <filename unknown>:0
  at System.Collections.Generic.Dictionary`2+Enumerator[System.Guid,EventStore.Core.Services.SubscriptionsService+Subscription].MoveNext () [0x00000] in <filename unknown>:0
  at System.Collections.Generic.Dictionary`2+ValueCollection+Enumerator[System.Guid,EventStore.Core.Services.SubscriptionsService+Subscription].MoveNext () [0x00000] in <filename unknown>:0
  at EventStore.Core.Services.SubscriptionsService.Handle (EventStore.Core.Messages.BecomeShuttingDown message) [0x00000] in <filename unknown>:0
  at EventStore.Core.Bus.MessageHandler`1[EventStore.Core.Messages.SystemMessage+BecomeShuttingDown].TryHandle (EventStore.Core.Messaging.Message message) [0x00000] in <filename unknown>:0
  at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message message) [0x00000] in <filename unknown>:0
  at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message message) [0x00000] in <filename unknown>:0
  at EventStore.Core.Bus.QueuedHandlerThreadPool.ReadFromQueue (System.Object o) [0x00000] in <filename unknown>:0
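The "out of sync" InvalidOperationException in that trace is the classic symptom of a dictionary being mutated while it is being enumerated (here, the subscriptions dictionary during shutdown). A minimal Python analogue of the same failure class, where the runtime raises rather than silently corrupting the iteration:

```python
def enumerate_while_mutating():
    """Mutate a dict mid-iteration; Python raises RuntimeError,
    the equivalent of .NET's 'out of sync' InvalidOperationException."""
    subscriptions = {1: "sub-a", 2: "sub-b"}
    try:
        for sub_id in subscriptions:        # enumeration in progress...
            subscriptions[3] = "sub-c"      # ...collection mutated underneath it
        return None
    except RuntimeError as exc:
        return str(exc)

print(enumerate_while_mutating())  # "dictionary changed size during iteration"
```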
checking gossip at 192.168.147.211:2113
CheckGossip CRITICAL: Could not connect to http://192.168.147.211:2113/gossip?format=xml to check gossip, has event store fallen over on this node?
Host: ggh-d-evtapp01.dev.green.net
Timestamp: 2016-03-18 00:01:29 +0000
Address: 192.168.147.211
Status: CRITICAL
The root cause of the blocking was in WebRequest. This is also
to be resolved in 3.6.0 with a move to HttpClient.
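Whatever HTTP stack the server side ends up on, the monitoring check itself can defend against a blocked endpoint by probing the gossip URL with a hard timeout. A sketch (the node address is illustrative; 192.0.2.1 is a reserved TEST-NET address used here to demonstrate the failure path):

```python
import urllib.request
import urllib.error

def probe_gossip(host, port=2113, timeout=2.0):
    """Fetch /gossip?format=xml with a hard timeout so the check can
    never hang on a wedged node. Returns the body, or None if unreachable."""
    url = "http://%s:%d/gossip?format=xml" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):  # covers timeouts and refusals
        return None

# Unreachable TEST-NET host: the probe gives up after ~1s instead of hanging.
print(probe_gossip("192.0.2.1", timeout=1.0))
```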