Hazelcast IMap Dirty Entries and Locks


Thad Truman

Feb 14, 2020, 5:17:18 PM
to haze...@googlegroups.com

Hi All,

 

We are using Hazelcast version 3.12.3. We have an IMap with a 24-hour TTL set in the map config. The process that populates the map calls setAsync(key, value) about 50k times a second. The Hazelcast cluster is experiencing some GC/memory issues. We looked at the Management Center for clues as to what the problem could be and noticed that the number of dirty entries for the map is over 350 million and the number of locks is 390 million, with only 4 million records total in the map. We aren't taking any locks in our client code and we aren't using a MapStore, which, from my understanding, is the only case where dirty entries would come into play. Any idea why the number of dirty entries and locks is so high?

Thanks,

Thad

Sharath Sahadevan

Feb 17, 2020, 12:35:31 PM
to Hazelcast
Hi Thad,

If you could share code that reproduces the issue (with any proprietary parts removed), along with the Hazelcast config and the options you use to run the Hazelcast member or client, I'd be happy to look into it.

On your end, you can turn on additional diagnostics in a non-prod environment by passing -Dhazelcast.diagnostics.enabled=true when you run the Hazelcast member or client.
Thanks,
Sharath

Lucas Beeler

Feb 25, 2020, 4:47:01 PM
to haze...@googlegroups.com
Hi Thad,

Oftentimes, this issue is due to an insufficient number of physical CPU cores on the host machine; it is, more or less, a case of thread starvation. This becomes clearer once you understand how cache expiration works in Hazelcast. When a cache entry times out under a TTL policy, it is not immediately removed. The reason we don't synchronously remove cache entries the moment they time out is that doing so would incur CPU overhead that could impair the performance of client-initiated get() and put() operations (and these are what most users care about in terms of latency).

Instead, in Hazelcast, removal after a TTL timeout happens asynchronously. There is a "reaper" thread that runs in the background and is responsible for actually removing timed-out entries and then updating cache metadata and query indexes in response to the entry removal. If you're short on physical CPU cores, this thread has comparatively fewer chances of being scheduled, especially when your partition threads are slammed with 50k set() operations per second.
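
To make the mechanism above concrete, here is a minimal plain-Java sketch of lazy TTL expiration with a separate reaper pass. It is an illustration only and not Hazelcast's actual code: the class `LazyTtlCache` and its methods `set`, `get`, `physicalSize`, and `reap` are hypothetical names invented for this example, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: NOT Hazelcast's implementation. All names are hypothetical.
final class LazyTtlCache<K, V> {

    // A stored value together with the instant at which it stops being valid.
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;

        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Timed<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    LazyTtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Write path: only stamps a deadline on the entry; it does no cleanup work.
    void set(K key, V value, long nowMillis) {
        entries.put(key, new Timed<>(value, nowMillis + ttlMillis));
    }

    // Read path: an expired entry is hidden from the caller but left in place.
    V get(K key, long nowMillis) {
        Timed<V> t = entries.get(key);
        return (t == null || t.expiresAtMillis <= nowMillis) ? null : t.value;
    }

    // Entries physically held in memory, including expired ones not yet reaped.
    int physicalSize() {
        return entries.size();
    }

    // One reaper pass: physically removes expired entries and returns how many were removed.
    int reap(long nowMillis) {
        int removed = 0;
        for (Map.Entry<K, Timed<V>> e : entries.entrySet()) {
            // remove(key, value) avoids deleting an entry that was overwritten concurrently.
            if (e.getValue().expiresAtMillis <= nowMillis
                    && entries.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }
}
```

In a real system the `reap` pass would run periodically on a background thread. If that thread rarely gets scheduled, `physicalSize()` keeps growing even though `get` already treats the expired entries as gone, which is the kind of buildup that thread starvation would produce.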

I would try moving to physical servers or VMs with a higher core count and re-running your tests. In the future, we may introduce more sophisticated tuning parameters to control the number of reaper threads, etc., but they're just not in the product right now.

Take care,
Lucas

This message contains confidential information and is intended only for the individuals named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required, please request a hard-copy version. -Hazelcast

--
Lucas BEELER
Senior Solutions Architect


Emre Aydın

Feb 26, 2020, 12:54:35 AM
to Hazelcast
Hi Thad,

Sorry for the late answer, but this only caught my attention today after Lucas' answer. I hope it is not too late and that this still helps you resolve your issue. You haven't mentioned it explicitly, but I believe you are using Management Center 3.12.3. We had a UI bug in 3.12.3 where the map detail table had wrong column header labels. You can use the latest patch version of Management Center in the 3.12.z line, which is 3.12.8. It is compatible with Hazelcast 3.12.3. You can download it from here: https://download.hazelcast.com/management-center/hazelcast-management-center-3.12.8.zip

Regards,
Emre