Hazelcast goes OOM due to a large number of entrySet() calls


Dinesh Babu K G

Apr 28, 2017, 5:24:40 AM
to Hazelcast
Hi,

We have a 7-node Hazelcast cluster (v3.5) with a 16GB heap per node. Our backup config is 1 sync and 1 async.

We have 250+ clients, most of them hitting our heaviest map (~80,000 entries, ~500MB total map size). They use a custom eager-cache client that calls entrySet() on startup to get a full snapshot and then receives map changes via listeners.

The trouble is that whenever these clients (web servers) are restarted in batches (of 50, 75, etc.), all the entrySet() calls arrive in parallel, causing our Hazelcast cluster to go OOM.
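For context, the client-side pattern is roughly the following (a minimal sketch against the 3.x client API; the map name, key/value types, and local-cache shape are made up for illustration, and it needs a running cluster to do anything):

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.EntryAdapter;
import com.hazelcast.core.EntryEvent;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EagerCacheClient {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IMap<String, byte[]> map = client.getMap("heavy-map"); // hypothetical map name

        final Map<String, byte[]> local = new ConcurrentHashMap<>();

        // Register the listener BEFORE taking the snapshot so no update is missed.
        map.addEntryListener(new EntryAdapter<String, byte[]>() {
            @Override public void entryAdded(EntryEvent<String, byte[]> e)   { local.put(e.getKey(), e.getValue()); }
            @Override public void entryUpdated(EntryEvent<String, byte[]> e) { local.put(e.getKey(), e.getValue()); }
            @Override public void entryRemoved(EntryEvent<String, byte[]> e) { local.remove(e.getKey()); }
        }, true); // true = include values in the events

        // The expensive part: one call pulls the whole ~500MB map over the wire,
        // and the member materializes the full entry set in heap to serve it.
        for (Map.Entry<String, byte[]> e : map.entrySet()) {
            local.put(e.getKey(), e.getValue());
        }
    }
}
```

With 50+ clients running the loop above at the same time, the member-side materialization is what multiplies the heap pressure.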

For now, we have asked the clients not to restart in parallel (no more than 10 instances at a time), but we want to solve this problem the right way, for the long term and in a scalable fashion.

We have considered:
1. PagingPredicate - my understanding is that it will create the entire predicate's result snapshot in memory and stream it to the client page by page, so massive entrySet() loads would still make Hazelcast go OOM.
2. Enabling back pressure - our IMap.get() and put() calls are very cheap and we don't want to throttle them just for the sake of throttling entrySet(); is there a way to apply back pressure at the level of a single operation type?
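For reference, the back-pressure knobs in 3.x look like the sketch below (a guess at relevant settings, not a recommendation; as far as I can tell these are cluster-wide system properties that throttle all invocations, with no per-operation-type switch):

```xml
<hazelcast>
  <properties>
    <!-- Cluster-wide back pressure; applies to every invocation, not just entrySet() -->
    <property name="hazelcast.backpressure.enabled">true</property>
    <!-- Periodically turns an async backup into a sync one to slow producers down -->
    <property name="hazelcast.backpressure.syncwindow">100</property>
    <!-- How long an invocation backs off before failing when the system is saturated -->
    <property name="hazelcast.backpressure.backoff.timeout.millis">60000</property>
  </properties>
</hazelcast>
```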

Please advise.

Fuad Malikov

May 8, 2017, 4:35:51 PM
to Hazelcast
Hi Dinesh, 

You actually summarized the issue pretty well. 

1. A paging predicate would work the same way; it will have the same memory usage.
2. It depends: if you have control over when a client calls entrySet(), you could use distributed locking, or better a Semaphore, to make sure that only a few clients are loading at the same time.
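The semaphore idea can be sketched as follows. This runnable example uses a local java.util.concurrent.Semaphore as a stand-in; in a real deployment you would swap it for Hazelcast's distributed ISemaphore (e.g. hz.getSemaphore("snapshot-loader") in 3.x), which has the same acquire/release semantics but enforces the permit count cluster-wide. The class name, permit count, and sleep are illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledLoader {
    // Local stand-in for a distributed semaphore; cap concurrent snapshot loads at 5.
    static final Semaphore PERMITS = new Semaphore(5);

    static final AtomicInteger concurrent = new AtomicInteger();
    public static final AtomicInteger peak = new AtomicInteger();

    static void loadSnapshot() throws InterruptedException {
        PERMITS.acquire();               // block until a load slot frees up
        try {
            int now = concurrent.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(10);            // stands in for the expensive entrySet() call
        } finally {
            concurrent.decrementAndGet();
            PERMITS.release();           // hand the slot to the next waiting client
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate 50 clients restarting at once.
        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> {
                try { loadSnapshot(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("peak concurrent loads = " + peak.get()); // never exceeds the 5 permits
    }
}
```

The nice property of doing this with a distributed semaphore rather than client-side coordination is that the cap holds across all 250+ JVMs regardless of how many are restarted together.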

Dinesh Babu K G

May 10, 2017, 3:36:57 AM
to Hazelcast
Thanks Fuad. We'll look into constraining the clients somehow.

Can you explain the rationale behind letting Hazelcast go OOM under a huge number of parallel entrySet() calls? Shouldn't it fail the operations above a threshold instead of bringing the entire cluster down?

Fuad Malikov

May 10, 2017, 12:31:29 PM
to Hazelcast
Hazelcast behaves much like a Java Map: you can easily store more data than your heap allows, and it will throw an OOME. It is a question of where to draw the line. Hazelcast prefers to provide the API and functionality and expects users not to shoot themselves in the foot :)




