Hazelcast is taking too much memory


Arghya Kusum Das

unread,
Feb 18, 2015, 7:58:21 PM2/18/15
to haze...@googlegroups.com
Hazelcast is taking much more memory than I expected.
My data size is 31GB, with 642 million unique keys.
I am running Hazelcast on 32 machines, each with 32GB RAM.
I ran only one Hazelcast server instance per machine, with a 24GB heap.
Surprisingly, after loading all the data, the memory usage profile shows almost 15GB of memory used per machine (on average).
That is, the total memory usage for my 31GB of data is 15*32 = 480GB!
What is the reason for this? It does not make any sense. Is it a configuration issue?

Now, I have another data set of 300GB. Is it possible to handle it with the open-source version?

Jaromir Hamala

unread,
Feb 19, 2015, 5:55:57 AM2/19/15
to haze...@googlegroups.com
Hello Arghya,

I assume you are using IMap. Hazelcast keeps a backup copy of all your entries by default. You can disable this behaviour in MapConfig if you don't need it. That alone should halve your memory consumption.
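As a sketch, disabling the backup declaratively in hazelcast.xml might look like this (assuming the entries live in a map that picks up the default configuration; the same elements apply to any named map):

```xml
<map name="default">
    <!-- no synchronous backup copies -->
    <backup-count>0</backup-count>
    <!-- no asynchronous backup copies either -->
    <async-backup-count>0</async-backup-count>
</map>
```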

To help you further I'd need to understand the nature of your data a bit better. You have 31GB - is that just the pure payload? 31GB spread across 642 million keys means you have approx. 50 bytes per key. Each object on a 64-bit JVM has an overhead of approx. 12 bytes for its header - it can be even more with a larger maximum heap size, as compressed references will be disabled. This is not specific to Hazelcast; it's a property of the JVM implementation. Each key, value and entry requires its own object - that's 36 bytes just for object headers, which is almost as much overhead as your useful payload per entry.
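A quick back-of-envelope calculation based on those numbers (a sketch: the 12-byte header and the three-objects-per-entry count are the approximations above, not exact measurements):

```java
// Rough estimate of JVM object-header overhead for the data set in this
// thread: 642 million entries, ~31 GB payload, ~12 B header per object,
// three objects (key, value, entry) per map entry.
public class HeaderOverheadEstimate {
    static long headerOverheadBytes(long entries, int objectsPerEntry, int headerBytes) {
        return entries * objectsPerEntry * (long) headerBytes;
    }

    public static void main(String[] args) {
        long entries = 642_000_000L;                         // 642 million keys
        long payloadBytes = 31L * 1024 * 1024 * 1024;        // ~31 GB raw payload
        long overhead = headerOverheadBytes(entries, 3, 12); // 36 B of headers per entry
        System.out.println("payload per entry ~ " + (payloadBytes / entries) + " B");         // ~51 B
        System.out.println("header overhead ~ " + (overhead / (1024 * 1024 * 1024)) + " GB"); // ~21 GB
    }
}
```

So object headers alone account for roughly 21GB across the cluster, comparable to the payload itself.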

This by itself cannot explain the overhead you observed. Another source of overhead can be your serialization strategy. Are you using Java serialization? Java serialization is known for being inefficient in both time and space. You can usually gain quite a lot of performance and save space by implementing the DataSerializable interface in your domain objects.
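A minimal sketch of that approach, using a hypothetical Record domain object (the class and field names are illustrative; writeData/readData are the two methods the DataSerializable interface requires):

```java
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.DataSerializable;

import java.io.IOException;

// Only the declared fields are written to the wire, so the serialized form
// is far more compact than default Java serialization, which also records
// class metadata for every object.
public class Record implements DataSerializable {
    private String key;
    private byte[] value;

    public Record() {
        // no-arg constructor required for deserialization
    }

    public Record(String key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeByteArray(value);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        key = in.readUTF();
        value = in.readByteArray();
    }
}
```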

It's definitely possible to store 300GB of data in open-source Hazelcast; it shouldn't be a problem at all.

Cheers,
Jaromir

David Brimley

unread,
Feb 19, 2015, 5:57:56 AM2/19/15
to haze...@googlegroups.com
Hi There,

Can you tell me ...

(1) How are you loading the data into the cache? From a Hazelcast client doing puts into the cluster, or by using MapLoader.loadAllKeys()?
(2) Regarding the 15GB consumed after the load is complete: can you run a full GC via JConsole or similar? What is the memory consumption after that?
(3) What is the size of the objects, and how many entries are there? Are they all stored in one map or in many maps?

Additionally

I would advise against 24GB heaps, as this may cause GC pause issues. Instead, consider running a number of JVMs on each machine, each with around a 5GB heap.

Best Regards
David Brimley
Senior Solutions Architect @ Hazelcast

Arghya Kusum Das

unread,
Feb 19, 2015, 4:33:27 PM2/19/15
to haze...@googlegroups.com
Hi Jaromir and David,

Here are some details about our data and our procedure.
Insight into the 31GB of data that I have in ASCII files:
Each line consists of a key and a value.
The first 31 characters of each line form the key.
They are followed by at most 25 more characters that we store as the value for the corresponding key, as a byte array.
The files have 642 million lines/entries in total.
We are using replication factor 1.
We have 32 machines, each with 32GB RAM.
Servers and clients are collocated: each machine has 1 Hazelcast server instance and 5 clients. That means I have 32 (1*32) server instances, and 160 (5*32) clients are loading the data.
We are using client puts to load the data into the Hazelcast map.

Arghya Kusum Das

unread,
Feb 20, 2015, 11:49:07 AM2/20/15
to haze...@googlegroups.com
Hi,
One correction to the data description:
the value length should be 65, not 25 (each key is followed by at most 65 more characters - not the 25 mentioned in my last post - that we store as the value for the corresponding key, as a byte array).


Arghya Kusum Das

unread,
Feb 22, 2015, 5:38:25 AM2/22/15
to haze...@googlegroups.com
It seems like a memory leak.
Today, I tried the same 31GB of data on a single machine with 256GB RAM.
I instantiated 8 clients, each with a max heap of 8GB.
Then I tried uploading the data from only one client.
The memory usage was huge: for only one million keys (with at most 65 bytes each) it used up 2GB, whereas it should be in the MB range (at most 80-90MB). The same trend continued for the entire process.
Maybe the client connection is causing a memory leak in the server.
I would appreciate your opinion on that.


Arghya Kusum Das

unread,
Feb 22, 2015, 5:39:43 AM2/22/15
to haze...@googlegroups.com
I also tried a full GC, and used the parallel GC option while running the jar, but no luck.


Jean Luc

unread,
Feb 22, 2015, 10:24:33 AM2/22/15
to haze...@googlegroups.com
Hi Arghya,

Take a heap dump and load it into a profiler (YourKit, Eclipse MAT, or whatever you prefer); it will become readily apparent what's using up the heap.

JL





--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at http://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/a7d3251e-6521-4ee8-bfec-7bc2404923e7%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Arghya Kusum Das

unread,
Feb 22, 2015, 5:16:00 PM2/22/15
to haze...@googlegroups.com
Hi,
Can anyone look at the attached code and the sample data file? My 31GB of data is entirely like this.
Only 1 million of these keys is taking a huge amount of memory.

MyFirstHazelCast.java
MyFirstHazelCastClient.java
dbg.txt

David Brimley

unread,
Feb 23, 2015, 12:12:19 PM2/23/15
to haze...@googlegroups.com
Hi,

I ran a quick test and loaded 1 million entries into a 2-node cluster on Hazelcast 3.4.1 with backup-count set to 1, using Java 1.7.0_45 on a MacBook Pro.

The key was an Integer and the value was a String of 65 bytes.

The cost of each map entry is around 225 bytes; you have to remember that Hazelcast wraps these objects with a small amount of metadata for each entry, such as timestamps.

You can see this for yourself by getting the EntryView object for a map entry by using IMap.getEntryView(key).  On the EntryView there is a getCost() method which will show you exactly how many bytes each Map Entry should be consuming.

Anyway, my test showed that after 1 million records, each of my 2 server JVMs had consumed around 320MB.

Advice/Questions...

(1) What version of Hazelcast are you using? What version of Java? What OS?
(2) Can you do the EntryView check described above? What is the cost of your entries in bytes?
(3) Can you also generate a heap dump using something like VisualVM, which is free (http://visualvm.java.net/)?
(4) Also, doing individual puts over the network from a client for millions of entries is not very efficient. If you are unable to use MapLoader (as seems to be the case, since the source is a file), can I suggest you batch up your puts from the client using IMap.putAll(Map map), possibly every 1,000 or so entries?
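Point (4) might look roughly like the following sketch. The target is typed as plain java.util.Map so the example is self-contained; in the real client it would be the IMap returned by client.getMap(...), whose putAll ships each batch in bulk instead of one network round trip per entry. The 31-character key split follows the data format described earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Client-side batching: buffer entries locally and flush them to the
// target map every BATCH_SIZE entries via a single putAll call.
public class BatchedLoader {
    static final int BATCH_SIZE = 1000;

    static void load(Iterable<String> lines, Map<String, byte[]> target) {
        Map<String, byte[]> batch = new HashMap<>();
        for (String line : lines) {
            String key = line.substring(0, 31);           // first 31 chars = key
            byte[] value = line.substring(31).getBytes(); // remainder = value
            batch.put(key, value);
            if (batch.size() >= BATCH_SIZE) {
                target.putAll(batch); // one bulk operation instead of 1000 puts
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            target.putAll(batch);     // flush the final partial batch
        }
    }
}
```

Batching does not change what ends up in memory; it only reduces the per-entry network and operation overhead during the load.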

Arghya Kusum Das

unread,
Feb 23, 2015, 1:12:14 PM2/23/15
to hazelcast
Hi,

We made some modifications and encoded the string value into a byte array of size 20. The key is a 31-character String.
Now each entry takes 204 bytes in memory (according to IMap.getEntryView(key)), but it still takes a lot of memory in production.

Here are the details:
1. Hazelcast version: 3.4.1
2. Java version: tested on both 1.6 and 1.7
3. OS: tested on both RHEL and Ubuntu
4. If I batch the entries, will it take less memory?




--
Thanks and regards,
Arghya Kusum Das

David Brimley

unread,
Feb 23, 2015, 1:16:06 PM2/23/15
to haze...@googlegroups.com
Batching won't reduce memory, but it should drastically improve your load times.

Are you saying you still see more memory consumption than entries x entry cost (e.g. 204 bytes)?

Are you running a full GC after the load?

Arghya Kusum Das

unread,
Feb 23, 2015, 1:30:46 PM2/23/15
to hazelcast
Hi David,
We enabled parallel GC on the server.
Every time, we start the server with the command 'java -Xms4g -Xmx6g -XX:+UseParallelGC -XX:ParallelGCThreads=5 -jar server1.jar hazelcast.xml 1 &'.
Do I need to do anything else?
The hazelcast.xml is attached. I don't know if there is any bad configuration in it.


hazelcast.xml

David Brimley

unread,
Feb 23, 2015, 2:22:15 PM2/23/15
to haze...@googlegroups.com
Nothing seems out of the ordinary in the XML.

I think you're going to have to take a heap dump and let us know the results. Try it on a 1GB run and see if it tallies with the entry cost x 2 (for the backup).

Arghya Kusum Das

unread,
Feb 23, 2015, 2:55:45 PM2/23/15
to hazelcast
Do you want me to enable the backup for the heap dump?
We set both the sync and async backup counts to 0 in the XML.



Arghya Kusum Das

unread,
Feb 27, 2015, 8:59:31 PM2/27/15
to haze...@googlegroups.com
Hi,
I checked the memory usage. Yes, you were right: with a 204-byte entry cost, the memory usage is explainable.
But again, it is taking almost twice the memory it is supposed to take, although we disabled the backup.

The Java file (which starts the server as well as creating the map) looks like the following:

import com.hazelcast.config.Config;
import com.hazelcast.config.XmlConfigBuilder;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class Driver {
    public static IMap<String, byte[]> createMap(String mapName, HazelcastInstance hzc) {
        return hzc.getMap(mapName);
    }

    public static void main(String[] args) {
        try {
            Config config = new XmlConfigBuilder(args[0]).build();
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            IMap<String, byte[]> map = createMap("DBG", hz);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


In hazelcast.xml we made the following change (as you can see in the attached Java file, our map name is DBG):
 <map name="DBG">
        <in-memory-format>BINARY</in-memory-format>
        <backup-count>0</backup-count>
        <async-backup-count>0</async-backup-count>
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <eviction-policy>NONE</eviction-policy>
        <max-size policy="PER_NODE">0</max-size>
        <eviction-percentage>25</eviction-percentage>
        <min-eviction-check-millis>100</min-eviction-check-millis>
        <merge-policy>com.hazelcast.map.merge.PutIfAbsentMapMergePolicy</merge-policy>
    </map>

