Is there a difference (performance/memory-wise) between calling entrySet() and iterating over it, and calling keySet() and iterating over it with get()?

Peter Litvak

Nov 1, 2013, 10:41:53 AM
to haze...@googlegroups.com
Hi,

If I need to iterate over a map and perform some work for each entry, is there a difference in performance or memory use between getting the entrySet() and iterating over it, and getting the keySet() and iterating over it while fetching each value with get()?
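
For illustration, the two iteration styles can be sketched as follows. This is only a sketch on a plain java.util.HashMap, not on a Hazelcast map; the class IterationSketch and the counting subclass are hypothetical names made up for this example. The counter stands in for what would typically be a remote lookup when get() is called on a distributed map for a key owned by another member.

```java
import java.util.HashMap;
import java.util.Map;

final class IterationSketch {

    /** A HashMap that counts get() calls; each call stands in for a possible remote lookup. */
    static final class CountingMap<K, V> extends HashMap<K, V> {
        int lookups = 0;

        @Override
        public V get(Object key) {
            lookups++;
            return super.get(key);
        }
    }

    /** Style 1: entrySet(). Keys and values arrive together, no extra get() per key. */
    static int sumViaEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    /** Style 2: keySet() plus one get() per key. Only one value is held at a time. */
    static int sumViaKeySet(Map<String, Integer> map) {
        int sum = 0;
        for (String key : map.keySet()) {
            sum += map.get(key);
        }
        return sum;
    }

    public static void main(String[] args) {
        CountingMap<String, Integer> map = new CountingMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        System.out.println("entrySet sum: " + sumViaEntrySet(map) + ", lookups: " + map.lookups);
        System.out.println("keySet sum:   " + sumViaKeySet(map) + ", lookups: " + map.lookups);
    }
}
```

The trade-off this is meant to show: entrySet() needs no per-key lookups but materializes every value at once, while keySet() plus get() holds only the keys up front and pays one lookup per key.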

Thank you

Peter Veentjer

Nov 1, 2013, 10:43:54 AM
to haze...@googlegroups.com
The problem with either approach is that you pull the whole entry set or key set into memory.

E.g., if you have 1 million entries spread over 10 machines (so 100,000 entries per machine) and you call entrySet() or keySet(), you will pull 1,000,000 items into memory before you start working on them. This could lead to an OutOfMemoryError (OOME).

If you are using Hazelcast 3, you might consider the EntryProcessor. In the near future we'll also have a map/reduce implementation.
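
The difference between pulling everything to the caller and doing the work where the data lives can be sketched with a toy model. This is not the EntryProcessor API and uses no Hazelcast classes; the "cluster" is just a list of java.util maps, and all names (PullVersusLocalSketch, buildCluster, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PullVersusLocalSketch {

    /** Builds a toy cluster: `members` maps holding `perMember` entries each. */
    static List<Map<Integer, String>> buildCluster(int members, int perMember) {
        List<Map<Integer, String>> cluster = new ArrayList<>();
        int nextKey = 0;
        for (int m = 0; m < members; m++) {
            Map<Integer, String> local = new HashMap<>();
            for (int i = 0; i < perMember; i++) {
                local.put(nextKey, "value-" + nextKey);
                nextKey++;
            }
            cluster.add(local);
        }
        return cluster;
    }

    /** Mimics entrySet() on the caller: every entry is copied into one map first. */
    static Map<Integer, String> pullEverything(List<Map<Integer, String>> cluster) {
        Map<Integer, String> all = new HashMap<>();
        for (Map<Integer, String> local : cluster) {
            all.putAll(local);
        }
        return all;
    }

    /** Mimics member-side processing: each member reduces its own entries to one number. */
    static int countMatchesLocally(List<Map<Integer, String>> cluster, String suffix) {
        int total = 0;
        for (Map<Integer, String> local : cluster) {
            int localCount = 0;
            for (String value : local.values()) {
                if (value.endsWith(suffix)) {
                    localCount++;
                }
            }
            total += localCount; // only this small per-member result reaches the caller
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<Integer, String>> cluster = buildCluster(10, 1_000);
        System.out.println("entries held by caller after pull: " + pullEverything(cluster).size());
        System.out.println("matches counted member-side: " + countMatchesLocally(cluster, "7"));
    }
}
```

In the first style the caller's memory grows with the total number of entries; in the second it grows only with the number of members.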

Peter.


--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+...@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at http://groups.google.com/group/hazelcast.
For more options, visit https://groups.google.com/groups/opt_out.

Peter Litvak

Nov 1, 2013, 11:05:46 AM
to haze...@googlegroups.com
So my idea was that if I use keySet(), the amount of data pulled initially will be relatively small (for small keys). Then, as I iterate over the keys and pull out the values one by one, look at some property of each value and discard it, memory consumption should grow only slightly, since the values should be garbage collected soon.

Is this a valid assumption?

Peter Veentjer

Nov 1, 2013, 12:58:06 PM
to haze...@googlegroups.com
It is a valid assumption. For small keys it should work, but for larger key sets you don't want to use such a solution, because the full key set is kept in memory for as long as you are iterating.

Dinesh Kumar

Sep 21, 2015, 10:35:25 AM
to Hazelcast
Hi Peter,

A related question is posted here as well: http://stackoverflow.com/questions/32697445/hazelcast-imap-keyset-versus-localkeyset

I'm using Hazelcast 3.4 in a cluster with 5+ nodes. I have a map whose keys are complex objects (not primitive types). At any point in time, the size of the map may be around 200K.

I understand that the entries/values in a map will be stored in different partitions. However, I would like to know the following.

  1. Does each member in the Hazelcast cluster maintain information about the set of all keys in a given map, or only about the subset of those keys in the partition(s) that it owns?

Questions 2 and 3 are follow-ups.

  2. Is keySet() a distributed operation (i.e., will it always involve communication with remote machines in a cluster environment)?

  3. Is localKeySet() a distributed operation?

Some more background on the problem:


I need to inform a third-party program, at regular intervals, about the keys present in the Hazelcast map in my program by calling a service cachedKeys(K[] keys) exposed by the third-party program.


Option 1: call keySet() from any one of the nodes in the cluster to retrieve all the keys in the map, and then call the cachedKeys(K[] keys) service from that node.

Option 2: call localKeySet() from each of the nodes in the cluster to identify the local keys owned by each node, followed by a cachedKeys(K[] keys) call from each node.


The third-party program doesn't work on the cached objects (read the value with key 'k' -> update -> put the value back), so EntryProcessor is not very useful in this scenario.

Thanks,
Dinesh

Jaromir Hamala

Sep 22, 2015, 2:56:50 AM
to Hazelcast
Hi Dinesh,

Members do not maintain a global key set. Each member is aware of local entries only. Hence keySet() is a distributed operation hitting all members while localKeySet() is just a local operation. I hope this info helps you a bit.
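
The distinction can be sketched with a toy model. This does not use Hazelcast; KeySetSketch and its methods are hypothetical names, keys are placed on members by a simple hash with no partition layer, and the remoteCalls counter only stands in for network communication.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeySetSketch {
    private final List<Set<String>> keysByMember = new ArrayList<>();
    int remoteCalls = 0; // stands in for member-to-member communication

    KeySetSketch(int members) {
        for (int i = 0; i < members; i++) {
            keysByMember.add(new HashSet<>());
        }
    }

    /** Places a key on one member, chosen by hash (simplified: no partitions). */
    void put(String key) {
        int owner = Math.floorMod(key.hashCode(), keysByMember.size());
        keysByMember.get(owner).add(key);
    }

    /** Local operation: reads only the keys the given member owns. */
    Set<String> localKeySet(int member) {
        return new HashSet<>(keysByMember.get(member));
    }

    /** Distributed operation: the caller has to ask every other member for its keys. */
    Set<String> keySet(int caller) {
        Set<String> all = new HashSet<>();
        for (int m = 0; m < keysByMember.size(); m++) {
            if (m != caller) {
                remoteCalls++;
            }
            all.addAll(keysByMember.get(m));
        }
        return all;
    }

    public static void main(String[] args) {
        KeySetSketch cluster = new KeySetSketch(5);
        for (int i = 0; i < 100; i++) {
            cluster.put("k" + i);
        }
        System.out.println("keySet size: " + cluster.keySet(0).size()
                + ", remote calls: " + cluster.remoteCalls);
    }
}
```

In this model, Option 2 (each member reports its own localKeySet()) covers the same keys as Option 1 without any member having to gather the full key set.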

Cheers,
Jaromir

Dinesh Kumar

Sep 22, 2015, 5:30:15 AM
to haze...@googlegroups.com
Hi Jaromir,

Your reply is indeed helpful.

Out of curiosity, I now have a different question.

I understand that the key is serialized (converted into a byte[] array), then hashed, and the hash is taken modulo the number of partitions. This gives the ID of the partition where the data is stored, and the partition table on each member identifies the owner of that partition.
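
The routing steps described above can be sketched as follows. This is only an illustration: UTF-8 encoding and Arrays.hashCode are stand-ins for Hazelcast's actual serialization and hash function, and PartitionSketch, partitionId and keysPerPartition are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartitionSketch {

    /** Serialize the key, hash the bytes, and reduce the hash modulo the partition count. */
    static int partitionId(String key, int partitionCount) {
        byte[] serialized = key.getBytes(StandardCharsets.UTF_8); // stand-in for real serialization
        int hash = Arrays.hashCode(serialized);                   // stand-in for the real hash
        return Math.floorMod(hash, partitionCount);               // always in [0, partitionCount)
    }

    /** Counts how many of the keys "key-0" .. "key-(n-1)" land in each partition. */
    static int[] keysPerPartition(int keyCount, int partitionCount) {
        int[] counts = new int[partitionCount];
        for (int i = 0; i < keyCount; i++) {
            counts[partitionId("key-" + i, partitionCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitionCount = 271;
        System.out.println("partition of key-42: " + partitionId("key-42", partitionCount));
        int[] counts = keysPerPartition(100_000, partitionCount);
        System.out.println("keys in partition 0: " + counts[0]);
    }
}
```

With far more keys than partitions, every partition necessarily holds many keys, so the partition ID alone cannot locate an individual entry.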

I would like to know what happens after this step. I understand that keys are stored as com.hazelcast.nio.serialization.Data objects (binary form). Will separate hash buckets be maintained for each map whose keys are present in a given partition? If not, assuming a hashing algorithm that distributes 100K objects uniformly (in a 2-node cluster), and with the default partition count of 271, won't the number of collisions be very high?

I understand that partition count is configurable, but I would like to know some internals before I change any configuration.

Hope that I'm not messaging this group with wrong questions.

Thanks,
Dinesh
