HZMC metrics ttl


Mikhail Alekseev

Jan 16, 2023, 4:08:43 PM
to Hazelcast
Hi!

We use Hazelcast as an external distributed cache. Our cluster has 3 members and 3 clients.

While migrating to Hazelcast 4.2.6, we discovered a latency regression in our load tests: after the load scenario had been running for some time, request latency was 3-4 times higher than on Hazelcast 3.
The old gen of the Hazelcast JVM also grew rapidly from the very beginning of the load scenario run.

We then analysed a Hazelcast heap dump and found a large number of MetricDescriptorImpl instances and large byte[] arrays referenced from some internal Hazelcast metrics classes (I suppose these byte[] arrays hold compressed metrics).

Finally, we found an issue in the launch configuration of hzmc (Hazelcast Management Center) that caused hzmc to restart every hour.
After we fixed it, latency returned to normal and is now much closer to the values we saw on Hazelcast 3.

So my assumption is this: if hzmc crashes for some reason, nobody consumes the metrics, so they accumulate on the members. That leads to old gen growth and high GC pressure and, consequently, to higher latency for Hazelcast operations.
Am I right?

If so, is this intended behaviour?
Is there some kind of TTL for metrics that could prevent latency growth when hzmc crashes?
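A toy model of the two behaviours the question contrasts may help: metrics piling up with no consumer, versus a store that keeps only a fixed retention window and drops the oldest snapshot. This is hypothetical Java, not Hazelcast's implementation; the class names `UnboundedStore` and `RetentionStore` and the sizing rule (retention window divided by collection interval, at least one slot) are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (NOT Hazelcast code): a producer publishes one
 * compressed metrics blob per collection interval while no consumer reads them.
 */
public class MetricsRetentionSketch {

    /** Unbounded store: every blob is kept until someone consumes it. */
    static final class UnboundedStore {
        private final List<byte[]> blobs = new ArrayList<>();

        void publish(byte[] blob) { blobs.add(blob); }

        int size() { return blobs.size(); }
    }

    /** Bounded store: keeps only the blobs that fit in the retention window. */
    static final class RetentionStore {
        private final ArrayDeque<byte[]> blobs = new ArrayDeque<>();
        private final int capacity;

        RetentionStore(int retentionSeconds, int collectionFrequencySeconds) {
            // Assumed sizing rule: collection intervals per retention window, at least 1.
            this.capacity = Math.max(1, retentionSeconds / collectionFrequencySeconds);
        }

        void publish(byte[] blob) {
            if (blobs.size() == capacity) {
                blobs.pollFirst(); // evict the oldest snapshot instead of growing
            }
            blobs.addLast(blob);
        }

        int size() { return blobs.size(); }
    }

    public static void main(String[] args) {
        int collectionFrequencySeconds = 5;
        int retentionSeconds = 5;
        int collectionsPerHour = 3600 / collectionFrequencySeconds; // 720

        UnboundedStore unbounded = new UnboundedStore();
        RetentionStore bounded = new RetentionStore(retentionSeconds, collectionFrequencySeconds);

        // Simulate one hour during which no consumer (e.g. hzmc) reads anything.
        for (int i = 0; i < collectionsPerHour; i++) {
            byte[] blob = new byte[1024]; // stand-in for one compressed snapshot
            unbounded.publish(blob);
            bounded.publish(blob);
        }

        System.out.println("unbounded blobs retained: " + unbounded.size());
        System.out.println("bounded blobs retained:   " + bounded.size());
    }
}
```

With both `retention-seconds` and `collection-frequency-seconds` set to 5, as in the config below, `main` reports 720 retained blobs for the unbounded store and 1 for the bounded one: under a retention bound, memory stays flat no matter how long the consumer is gone.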

Our hazelcast.xml metrics section:

<metrics enabled="true">
  <management-center enabled="true">
    <retention-seconds>5</retention-seconds>
  </management-center>
  <jmx enabled="true"/>
  <collection-frequency-seconds>5</collection-frequency-seconds>
</metrics>

Thanks,
Mikhail.