Hi,
We have a 7-node Hazelcast cluster (v3.5) with a 16 GB heap per node. Our backup configuration is 1 sync and 1 async.
We have 250+ clients, most of them hitting our heaviest map (~80,000 entries, ~500 MB total map size). They use a custom eager-cache client that calls entrySet() on startup to take a full snapshot and then receives map changes via entry listeners.
The trouble is that whenever these clients (web servers) are restarted in batches (of 50, 75, etc.), all of the entrySet() calls arrive in parallel, driving Hazelcast to OOM.
For now, we have asked the clients not to restart in parallel (no more than 10 instances at a time), but we want to solve this properly for the long term, in a scalable way.
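To make the "no more than 10 at a time" rule concrete, here is a minimal sketch of the gating pattern we have in mind: cap concurrent snapshot loads with a semaphore. This is an illustration only; the class name, the limit of 10, and loadSnapshot() are our own stand-ins for the real entrySet() call, and since our clients run in separate JVMs the real gate would have to be cluster-wide (e.g. a distributed semaphore) rather than the in-process java.util.concurrent.Semaphore used here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: allow at most MAX_CONCURRENT_SNAPSHOTS "clients" (threads here)
// to run their startup snapshot at once. loadSnapshot() simulates the
// expensive entrySet() fetch; peak records the highest observed concurrency.
public class ThrottledSnapshotLoad {
    static final int MAX_CONCURRENT_SNAPSHOTS = 10;      // illustrative limit
    static final Semaphore gate = new Semaphore(MAX_CONCURRENT_SNAPSHOTS);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    static void loadSnapshot() throws InterruptedException {
        gate.acquire();                                  // wait for a free permit
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);       // track peak concurrency
            Thread.sleep(5);                             // stand-in for entrySet()
        } finally {
            inFlight.decrementAndGet();
            gate.release();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate one restart batch of 75 clients all starting together.
        ExecutorService pool = Executors.newFixedThreadPool(75);
        for (int i = 0; i < 75; i++) {
            pool.submit(() -> {
                try {
                    loadSnapshot();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("peak concurrent snapshot loads: " + peak.get());
    }
}
```

Even with all 75 clients starting at once, the cluster only ever sees 10 snapshot loads in flight; the rest queue on the semaphore instead of piling onto Hazelcast.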
We have considered:
1. PagingPredicate - Our understanding is that it builds the entire predicate result set in memory and streams it to the client page by page, so a burst of entrySet()-scale queries would still drive Hazelcast to OOM.
2. Enabling back pressure - Our IMap.get() and put() calls are very cheap, and we don't want to throttle them just to rein in entrySet(). Is there a way to apply back pressure per operation type?
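For reference, this is how we understand back pressure would be enabled today; as far as we can tell it is cluster-wide, not per operation (property names as we read them in the 3.5 docs, values illustrative):

```xml
<hazelcast>
  <properties>
    <!-- Global switch: throttles all invocations, not just entrySet() -->
    <property name="hazelcast.backpressure.enabled">true</property>
    <!-- Every N-th async backup is forced sync to slow producers down -->
    <property name="hazelcast.backpressure.syncwindow">100</property>
  </properties>
</hazelcast>
```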
Please advise.