Re: Does Hazelcast deserialize on *every* call to IMap.get()?


Peter Veentjer

Jan 19, 2013, 4:08:24 AM
to haze...@googlegroups.com
On Fri, Jan 18, 2013 at 8:25 PM, <cclev...@gmail.com> wrote:
> We've got an app that needs fast access to objects in a cache, several
> thousand times per second. I'm storing the objects in an IMap for
> distribution around the cluster. Strangely, the code to deserialize the
> object is getting called when I fetch it from the cache using IMap.get().
> Does this mean that the object is stored in serialized form and is
> deserialized on every call?

It depends.

The default setting on a map is that values are cached: once read, you
will get back the same object instance, so no deserialization is
needed after that.

Example:

<hazelcast>
    <map name="employees">
        <cache-value>false</cache-value>
    </map>
</hazelcast>

cache-value defaults to true, so if you don't specify anything else,
deserialization will only happen one time.

This works perfectly for immutable values, but you will run into data
races (and JMM issues) if you are storing mutable values in the map.
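To make the hazard concrete, here is a small JDK-only sketch (no Hazelcast API involved): a plain HashMap stands in for the cached-values behavior, since with cache-value left at true, repeated gets on the owning node hand back the same instance, just like a HashMap does. The class and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for cache-value=true behavior: on the owning node,
// repeated IMap.get() calls return the SAME deserialized instance, just like
// a plain HashMap returns the same reference on every get.
public class CacheValueRaceDemo {

    static class Article {
        String name;
        Article(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Map<Long, Article> cache = new HashMap<>(); // simulates the per-node cached values
        cache.put(1L, new Article("foo"));

        Article first = cache.get(1L);
        Article second = cache.get(1L);
        System.out.println(first == second);  // true: both references point at one object

        // A mutation through one reference is immediately visible through the
        // other, without any put() and, across threads, without any
        // happens-before edge -- exactly the data race hazard with mutable values.
        first.name = "bar";
        System.out.println(second.name);      // bar
    }
}
```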

> I did see the documentation on the near cache, but it isn't clear to me
> whether that cache holds objects or byte arrays.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Hazelcast" group.
> To post to this group, send email to haze...@googlegroups.com.
> To unsubscribe from this group, send email to
> hazelcast+...@googlegroups.com.
> Visit this group at http://groups.google.com/group/hazelcast?hl=en-US.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Peter Veentjer

Jan 19, 2013, 4:09:39 AM
to haze...@googlegroups.com
Correction:

cache-value needs to be set to false explicitly to prevent race
problems on mutable values.

Joe Planisky

Jan 21, 2013, 8:05:37 PM
to haze...@googlegroups.com
Are you sure that near-cache avoids deserialization? I was under the impression that near-cache only cached the serialized bytes, and that deserialization would happen on each call to IMap.get().

In Mehmet Dogan's answer at the end of this discussion https://groups.google.com/d/topic/hazelcast/_51hnq-2GOY/discussion he says:

> Near-cache stores serialized (binary) forms of values
> owned by other members. ... So, near-cache does not
> return the same value for subsequent calls.

That seems to indicate deserialization IS happening. That was back in October 2012. Has the behavior changed in the releases since then?

--
Joe


On Jan 21, 2013, at 12:35 AM, Lukas Blunschi wrote:

> Hi,
>
> I have not seen the cache-value property until now, but in 2.4 and 2.5 you
> can enable the near cache which avoids to deserialize data on every
> IMap.get() call.

> ...

> Best,
> Lukas
>

Lukas Blunschi

Jan 22, 2013, 3:53:43 AM
to haze...@googlegroups.com
Hi Joe,

Yup, I'm sure :-)

First, I saw it happen in my code; second, the Hazelcast source code in this case is easy enough to read:
lines 1091-1097

Best,
Lukas

Peter Veentjer

Jan 22, 2013, 4:19:35 AM
to haze...@googlegroups.com
On Tue, Jan 22, 2013 at 3:05 AM, Joe Planisky <joe.pl...@temboo.com> wrote:
> Are you sure that near-cache avoids deserialization?

Yes.

I have just verified it.

> I was under the impression that near-cache only cached the serialized bytes, and that deserialization would happen on each call to IMap.get().

It is a setting on the map that determines whether the same instance is
returned (i.e. whether deserialization happens or not).

It is the 'cache-value' property.

Try the following:

<hazelcast xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.hazelcast.com/schema/config
                               http://www.hazelcast.com/schema/config/hazelcast-config-2.5.xsd">

    <map name="articles">
        <near-cache>
            <max-size>5000</max-size>
            <time-to-live-seconds>60</time-to-live-seconds>
            <eviction-policy>NONE</eviction-policy>
            <invalidate-on-change>false</invalidate-on-change>
        </near-cache>
    </map>
</hazelcast>

import com.hazelcast.core.*;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        HazelcastInstance hzInstance = Hazelcast.newHazelcastInstance(null);
        Map<Long, Article> articles = hzInstance.getMap("articles");
        Article article = new Article("foo");
        articles.put(1L, article);

        Article found1 = articles.get(1L);
        Article found2 = articles.get(1L);
        System.out.println("found1 == found2: " + (found1 == found2));
    }
}

import java.io.Serializable;

public final class Article implements Serializable {
    private final String name;

    public Article(String name) { this.name = name; }

    public String getName() { return name; }
}


> In Mehmet Dogan's answer at the end of this discussion https://groups.google.com/d/topic/hazelcast/_51hnq-2GOY/discussion he says:
>
>> Near-cache stores serialized (binary) forms of values
>> owned by other members. ... So, near-cache does not
>> return the same value for subsequent calls.
>
> That seems to indicate deserialization IS happening. That was back in October 2012. Has the behavior changed in the releases since then?
>
> --
> Joe
>
>
> On Jan 21, 2013, at 12:35 AM, Lukas Blunschi wrote:
>
>> Hi,
>>
>> I have not seen the cache-value property until now, but in 2.4 and 2.5 you
>> can enable the near cache which avoids to deserialize data on every
>> IMap.get() call.
>
>> ...
>
>> Best,
>> Lukas
>>
>

Peter Veentjer

Jan 22, 2013, 4:20:26 AM
to haze...@googlegroups.com
And play with the cache-value property and see what happens:

<hazelcast>
    <map name="articles">
        <cache-value>false</cache-value> <!-- play with this setting; defaults to true -->
        <near-cache>
            <max-size>5000</max-size>
            <time-to-live-seconds>60</time-to-live-seconds>
            <eviction-policy>NONE</eviction-policy>
            <invalidate-on-change>false</invalidate-on-change>
        </near-cache>
    </map>
</hazelcast>

peerkeh...@gmail.com

Nov 20, 2013, 5:18:25 PM
to haze...@googlegroups.com
I am having trouble getting this behavior on Hazelcast 3.1.2. Would appreciate any help on this, maybe I am missing some simple setting:

Given the following sample code:

import java.io.*;
import java.util.Map.Entry;
import com.hazelcast.config.*;
import com.hazelcast.core.*;
import com.hazelcast.map.*;
import com.hazelcast.nio.*;
import com.hazelcast.nio.serialization.*;

class Article implements DataSerializable {

    private String name;

    public Article() {
    }

    public Article(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        System.out.println("serialized");
        out.writeUTF(name);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        System.out.println("deserialized");
        name = in.readUTF();
    }
}

public class Main {

    private static class MyEntryProcessor extends AbstractEntryProcessor<Long, Article> {

        public MyEntryProcessor() {
            super(false);
        }

        @Override
        public Object process(Entry<Long, Article> entry) {
            return entry.getValue();
        }
    }

    public static void main(String[] args) {
        Config conf = new Config();
        MapConfig mapConfig = new MapConfig();
        mapConfig.setName("articles");
        mapConfig.setNearCacheConfig(new NearCacheConfig());
        mapConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
        conf.addMapConfig(mapConfig);
        HazelcastInstance hzInstance = Hazelcast.newHazelcastInstance(conf);
        try {
            IMap<Long, Article> articles = hzInstance.getMap("articles");
            Article article = new Article("foo");
            articles.put(1L, article);

            System.out.println("simple get");

            Object found1 = articles.get(1L);
            Object found2 = articles.get(1L);
            System.out.println("found == article: " + (found1 == found2));
            System.out.println("entry processing");
            found1 = articles.executeOnKey(1L, new MyEntryProcessor());
            found2 = articles.executeOnKey(1L, new MyEntryProcessor());
            System.out.println("found == article: " + (found1 == found2));
        } finally {
            hzInstance.getLifecycleService().shutdown();
        }
    }
}

I get a significant number of serialization and deserialization calls, and no identity equality on the local node, for both plain gets and entry processing:

deserialized
simple get
serialized
deserialized
serialized
deserialized
found == article: false
entry processing
serialized
deserialized
serialized
serialized
deserialized
deserialized
deserialized
serialized
deserialized
serialized
serialized
deserialized
deserialized
deserialized
found == article: false

On Tuesday, January 22, 2013 at 10:20:26 AM UTC+1, peter veentjer wrote:

Enes Akar

Nov 21, 2013, 4:05:51 AM
to haze...@googlegroups.com
There is a problem there. We need to fix it.
Here is the issue you can track:





--
Enes Akar
Hazelcast | Open source in-memory data grid
Mobile: +90.507.150.56.71

Ahmet Mircik

Nov 21, 2013, 9:00:11 AM
to haze...@googlegroups.com
Since you are testing on a one-node cluster, an enabled near cache does not make any difference for gets; it just serializes & deserializes every time.
HZ is not super optimized for one-node local operations.

But in the entry-processing case your findings are correct, and it should be fixed.

orek...@evergage.com

Nov 22, 2013, 12:52:09 PM
to haze...@googlegroups.com
If you are not updating objects, cache-value=true will help you avoid deserialization; otherwise you are stuck. If you do any puts, serialization will always happen. Near cache solves remote gets, but does not solve deserialization.

Hazelcast is just not designed for fast access to objects like that, especially if you are doing any updates. You might be better off using Guava LoadingCache and then using Hazelcast as a partitioning mechanism for determining data affinity. Hazelcast supports many different ways to use a map, and for some narrow use cases, it has an expensive performance profile.

Another option that might hold you over is to use a much faster serialization algorithm, such as Kryo or Jackson Smile.
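On the faster-serializer route: Hazelcast 3.x lets you register a custom serializer per value type in the configuration, so a Kryo- or Smile-backed serializer can replace Java serialization without touching the domain class. A sketch of the registration follows; both class names under com.example are hypothetical placeholders for classes you would write yourself:

```xml
<serialization>
    <serializers>
        <!-- com.example.KryoArticleSerializer would implement
             com.hazelcast.nio.serialization.StreamSerializer<Article>
             and delegate to Kryo internally (hypothetical class names) -->
        <serializer type-class="com.example.Article"
                    class-name="com.example.KryoArticleSerializer"/>
    </serializers>
</serialization>
```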


On Friday, January 18, 2013 1:25:20 PM UTC-5, Chris Cleveland wrote:
We've got an app that needs fast access to objects in a cache, several thousand times per second. I'm storing the objects in an IMap for distribution around the cluster. Strangely, the code to deserialize the object is getting called when I fetch it from the cache using IMap.get(). Does this mean that the object is stored in serialized form and is deserialized on every call?

Peter Veentjer

Nov 22, 2013, 2:45:28 PM
to haze...@googlegroups.com


peerkeh...@gmail.com

Nov 22, 2013, 4:08:30 PM
to haze...@googlegroups.com, orek...@evergage.com
I am confused that the get() operation incurs a "serialized" call in OBJECT in-memory format, but not in BINARY. I am not updating the map in between. The number of "deserialized" calls when using EntryProcessors is significant, while the documentation states this should not happen.

For comparison, the above example code InMemoryFormat.BINARY vs OBJECT (single node) yields:

BINARY (put)
serialized

OBJECT (put)
serialized
deserialized

BINARY (simple get, 1 call)
deserialized

OBJECT (simple get, 1 call)
serialized
deserialized

BINARY (entry processor, 1 call)
deserialized
serialized
serialized
deserialized

OBJECT (entry processor, 1 call)
serialized
deserialized
serialized
serialized
deserialized
deserialized
deserialized

I do intend to use Hazelcast for a multi-node cache, so LoadingCache is not for me. I am just trying to understand when and how often I will need to expect a (de)serialization penalty. We are going to analyze large sets of data so this will matter to us. Why using OBJECT would incur an additional "serialization" on a get() operation is totally unexpected for me.

On Friday, November 22, 2013 at 6:52:09 PM UTC+1, orek...@evergage.com wrote:

Peter Veentjer

Nov 22, 2013, 4:22:23 PM
to haze...@googlegroups.com, orek...@evergage.com
With OBJECT, currently a serialize is always done to get a byte array, and then a deserialize is done to get the object back.

It feels strange because you already have the object right next to you :) But currently Hazelcast doesn't optimize for local calls, so it assumes the call is remote and therefore serializes + deserializes with the OBJECT in-memory format.

With BINARY, since the object is already stored as a byte array, only a deserialize is needed.

PS: I would love to see the optimization for local immutable objects btw.
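Given that trade-off, a map hit mostly by plain get()s can simply stay on the default BINARY format; only maps dominated by entry processors and queries are worth switching. As a sketch in 3.x XML config (the map name is just an example):

```xml
<map name="articles">
    <!-- BINARY is the default: get() pays one deserialize;
         OBJECT would pay serialize + deserialize on a local get -->
    <in-memory-format>BINARY</in-memory-format>
</map>
```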


peerkeh...@gmail.com

Nov 22, 2013, 5:27:52 PM
to haze...@googlegroups.com, orek...@evergage.com
That explains things, though it is unfortunate: we expect a mostly-local distribution of data. The docs state this:
  • OBJECT: Data will be stored in deserialized form. This configuration is good for maps where entry processing and queries form the majority of all operations and the objects are complex ones, so the serialization cost is respectively high. By storing objects, entry processing will not incur the deserialization cost.

This is why I tried to explore the entry-processing API. Thanks Enes for opening an issue on that problem! Another enhancement might be to explain in the docs that using OBJECT incurs additional serialization calls on simple get operations; this trade-off is not explicitly mentioned right now. Agreed that detecting an immutable object would open opportunities for even better optimizations! It would be nice if custom objects could be detected/flagged as immutable as well; I will add a comment to the issue.

On Friday, November 22, 2013 at 10:22:23 PM UTC+1, peter veentjer wrote:

Peter Veentjer

Nov 23, 2013, 12:27:00 AM
to haze...@googlegroups.com, Oleg Rekutin
On Sat, Nov 23, 2013 at 12:27 AM, <peerkeh...@gmail.com> wrote:
That explains things, though it is unfortunate: we expect a mostly-local distribution of data. The docs state this:
  • OBJECT: Data will be stored in deserialized form. This configuration is good for maps where entry processing and queries form the majority of all operations and the objects are complex ones, so the serialization cost is respectively high. By storing objects, entry processing will not incur the deserialization cost.

This is why I tried to explore using the entry processing API.

Yes. The EntryProcessor and the queries are able to access the actual stored object. 

 
Thanks Enes for opening an issue on that problem! Another enhancement might be to explain in the docs that using OBJECT incurs additional serialization calls on simple get operations; this trade-off is not explicitly mentioned right now. Agreed that detecting an immutable object would open opportunities for even better optimizations! It would be nice if custom objects could be detected/flagged as immutable as well; I will add a comment to the issue.

Yes. Some kind of registry or annotation.

To be honest, I think it is important. We already have functionality in place to control partitioning, so we should also make local data access a lot faster; the one deserves the other.

rob...@jpro.no

May 7, 2014, 8:20:16 AM
to haze...@googlegroups.com
I'm curious if any progress has been made on fast direct access to near-cached objects?

The reason I'm asking is that we're working on an application that wants to cache lots of small, read-mostly maps. All of them fit into memory on all nodes. We'll be using these maps *a lot*, with some code iterating over all entries, other code doing gets in tight loops in hotspots, etc. Fast access is crucial.

I thought this would be super-fast:
NearCacheConfig nearCacheConfig = new NearCacheConfig();
nearCacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
nearCacheConfig.setEvictionPolicy("NONE");
nearCacheConfig.setCacheLocalEntries(true);

But Hazelcast still takes the deserialization cost on every get.

Then I did some benchmarks, this is with all gets on locally cached objects:
JDK ConcurrentHashMap   5,500,000 gets/sec
Infinispan 6.0.2.Final  1,700,000 gets/sec
GridGain 6.1.0            360,000 gets/sec
Hazelcast 3.2.1            35,000 gets/sec

For our app this is pretty much a showstopper for Hazelcast.


Regards, Robert.

Noctarius

May 7, 2014, 10:27:28 AM
to haze...@googlegroups.com
Hi Robert,

Iterating over the map by requesting all values on a single node is bad practice anyway. You should split your operation up to work in parallel on the key-owning nodes. There is probably a chance to use the new MapReduce framework.

To your question:
No effort has been spent on it, since it is expected to return a new instance every time. This prevents users from creating inconsistent object state by changing values in the returned object without putting it back into the map (putting it back is what propagates the update to other nodes).

Chris


Robert Andersson

May 7, 2014, 11:05:02 AM
to haze...@googlegroups.com
Let's forget about iterates and map/reduce and all that for now, let's simplify...

Assume a cache consisting of one entry. It's fronted by a near cache using the OBJECT in-memory format. Let's get this single entry 100,000 times from the near cache.

Even in that silly simple case, Hazelcast wants to create a new object 100,000 times, using costly deserialization, running at 1/500th of the speed it could.

I would really like a near cache option 'return-the-damned-object-I-know-I-should-not-mutate-it'.

I googled around a bit and see it's been brought up before:


Regards, Robert.

Noctarius

May 7, 2014, 11:12:41 AM
to haze...@googlegroups.com
Yep, and the comments still stand. There was a way in Hazelcast 2 (that's why Peter brought it up), but currently there is no plan to bring it back. TBH, InMemoryFormat::OBJECT is kind of irritating.

Probably somewhere in the future; you might want to push this issue to get it on the roadmap, but for now I sadly can't tell you anything else.


Lukas Blunschi

May 7, 2014, 1:01:29 PM
to haze...@googlegroups.com, noctar...@googlemail.com
Hi Chris,

I'm a bit confused now, because in version 3.2.0 Hazelcast *does* return the same instance if the near cache is enabled, uses OBJECT format, and also caches local entries:

NearCacheConfig nearCacheConfig = new NearCacheConfig();
nearCacheConfig.setEvictionPolicy("NONE");
nearCacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
nearCacheConfig.setCacheLocalEntries(true);

See MapProxyImpl.getInternal().

I even suggested an optimization to this behavior in the following pull request: https://github.com/hazelcast/hazelcast/pull/2350 which would further decrease the number of deserializations upon IMap.get()...

And I strongly hope that this local caching and returning the same instance will also be possible in the future, because we are relying on fast access without deserialization upon every call.

Best,
Lukas

Noctarius

May 7, 2014, 1:06:58 PM
to Lukas Blunschi, haze...@googlegroups.com
Oh seems I missed that, I haven’t said anything - just ignore me :D Sorry

Lukas Blunschi

May 7, 2014, 1:09:30 PM
to haze...@googlegroups.com, Lukas Blunschi, noctar...@googlemail.com
:-) no problem, I'm relieved nonetheless.

The question remains: why does Robert see such bad performance numbers in his post from a few hours ago?

@Robert: could you share your performance test code?

Thanks,
Lukas

Ben Cotton

May 7, 2014, 3:01:41 PM
to haze...@googlegroups.com


Sent from my iPhone


Robert Andersson

May 8, 2014, 4:29:30 AM
to haze...@googlegroups.com
Problem solved!

Lukas' messages prompted me to take another look at my test code, and at MapProxySupport.getInternal in the Hazelcast source.

It turns out readBackupData does not put objects into the near cache (a bug?), basically disabling the cacheLocalEntries optimization.

And my test code had readBackupData enabled.

I turned it off, and the speed went up from 35,000 gets/sec to 1,200,000 gets/sec!
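For anyone landing here with the same symptom: the combination that made local gets fast here is a near cache in OBJECT format that caches local entries, with read-backup-data left disabled. A sketch of that in XML, assuming the 3.x schema element names:

```xml
<map name="articles">
    <!-- read-backup-data must stay false, otherwise local gets
         bypass the near cache (the behavior hit above) -->
    <read-backup-data>false</read-backup-data>
    <near-cache>
        <in-memory-format>OBJECT</in-memory-format>
        <eviction-policy>NONE</eviction-policy>
        <cache-local-entries>true</cache-local-entries>
    </near-cache>
</map>
```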

Thanks for the help.


Regards, Robert.


Noctarius

May 8, 2014, 4:33:25 AM
to haze...@googlegroups.com
Well, it doesn't really sound like a bug, but more like there should be validation of the configuration, because those two properties kind of exclude each other.


Anton Lisovenko

Aug 30, 2014, 8:35:20 AM
to haze...@googlegroups.com
Actually, this is a big stopper for many use cases. Even a distributed grid could get a lot of benefit from reading objects from the local map without deserialization (for a cluster of 3 machines this means that 1/3 of all objects could be returned as plain objects; furthermore, if backup copies are also kept deserialized, they could be written locally as well).

It seems only backup replication would suffer a bit, as the object would have to be serialized/deserialized. Local put operations would skip this cost as well.

In my case, I use one 50 GB data grid node (with the possibility to scale to more machines in the future) and suffered a lot from not being able to read objects directly. Thank goodness entry processors can read plain objects; I changed the logic of my application to use them and received a boost of about 100-150% compared with the BINARY version.

Though I regard this as a hack to gain acceptable performance, not a proper functional design. Implementing truly local objects would give a great performance boost for HZ applications. If I could vote somehow, I would give all my points to https://github.com/hazelcast/hazelcast/issues/1194 :)

On Friday, January 18, 2013 at 10:25:20 PM UTC+4, Chris Cleveland wrote: