Aggressive Reconnections Lead to OOM - Hz 4.2.1

Juan Garcia-Matos

Sep 24, 2021, 10:04:40 AM
to Hazelcast

Hi folks, we are running a 3-node Hazelcast cluster in K8s via the standard Helm chart, with Istio enabled in the cluster. From time to time the Hazelcast members disconnect, and the reconnection that follows is very aggressive, eventually leading to an OOM. Our Grafana dashboard shows the same thing: a steady increase in used heap for the Hazelcast cluster.

In our ELK we found roughly 30k log entries within a 30-second interval.

Currently I am struggling to get a heap dump because the Helm chart is not picking up the JAVA_OPTS parameters as described here.

Any hint is appreciated.

>>>>
2021-09-24 12:03:29,287 [INFO] [hz.boring_clarke.IO.thread-in-2] [c.h.i.s.t.TcpServerConnection]: [10.0.5.146]:5701 [dev] [4.2.1] Initialized new cluster connection between /10.0.5.146:55547 and /10.0.70.177:5701

2021-09-24 12:03:29,287 [INFO] [hz.boring_clarke.IO.thread-in-2] [c.h.i.s.t.TcpServerConnection]: [10.0.5.146]:5701 [dev] [4.2.1] Initialized new cluster connection between /10.0.5.146:57197 and /10.0.70.177:5701

2021-09-24 12:03:29,285 [INFO] [hz.naughty_chaplygin.IO.thread-in-2] [c.h.i.s.t.TcpServerConnection]: [10.0.70.177]:5701 [dev] [4.2.1] Connection[id=100632, /127.0.0.1:5701->/127.0.0.1:43766, qualifier=null, endpoint=[10.0.5.146]:5701, alive=false, connectionType=MEMBER, planeIndex=0] closed. Reason: Connection closed by the other side
......
<<<<
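
To put a number on the connection churn, something like the following rough sketch could count opened and closed connections per time bucket from an exported log file. It assumes each log line starts with a timestamp in the format shown above and that the two message fragments it matches on ("Initialized new cluster connection" and "closed. Reason:") are the relevant ones; the function name `count_connection_events` and the 30-second default are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp at the start of a log line, e.g. "2021-09-24 12:03:29,287".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def count_connection_events(lines, bucket_seconds=30):
    """Count connection open/close log lines per time bucket (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        # Assumption: these fragments identify the events of interest.
        if "Initialized new cluster connection" in line:
            kind = "opened"
        elif "closed. Reason:" in line:
            kind = "closed"
        else:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the start of its bucket within the day.
        second_of_day = ts.hour * 3600 + ts.minute * 60 + ts.second
        start = second_of_day - second_of_day % bucket_seconds
        bucket = ts.replace(
            hour=start // 3600, minute=start % 3600 // 60, second=start % 60
        )
        counts[(bucket, kind)] += 1
    return counts
```

Calling `count_connection_events(open("hazelcast.log"))` (the file name is a placeholder) returns a `Counter` keyed by `(bucket_start, "opened" | "closed")`, which should make it easier to see whether the reconnect rate lines up with the heap growth.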
