[Performance] Is there any way to NOT serialize local entries from Hazelcast map


kraythe

unread,
Aug 6, 2015, 11:54:46 AM8/6/15
to Hazelcast
Greetings, 

I have been working with a Hazelcast cluster for about a year now and I am having some performance issues that I am trying to address. I understand that when an object is on a remote node, serialization must occur if I try to get it from an IMap. What I don't really get is why the local entry set has to be serialized. Why isn't it possible to have those objects available as plain objects in the local environment, as in any other hash map, without breaking the paradigm? The need to serialize anything and everything out of Hazelcast is presenting a huge performance problem, as I have some large objects in the map. In fact it's almost at the level where RDBMS access to partial objects is preferable from a performance point of view.

So I am wondering: perhaps I am doing something wrong? Is there any way to configure an IMap so that entries are not serialized to or from the map when accessed on the owner of the partition? I know that you can use OBJECT format and an entry processor to access objects, but the problems with that paradigm are extensive: 1) we are choking executors with hundreds of thousands of calls; 2) entry processors are EXTREMELY limited in that they can't access other maps, so any kind of composite operation, which constitutes the bulk of our use cases, is impossible in an EP.

We moved from direct DB access to a Hazelcast cache to get performance increases, but frankly I have not seen the increase I expected to see.

I would appreciate any constructive guidance on this issue. 

Peter Veentjer

unread,
Aug 6, 2015, 2:22:10 PM8/6/15
to haze...@googlegroups.com
On Thu, Aug 6, 2015 at 6:54 PM, kraythe <kra...@gmail.com> wrote:
> Greetings,
>
> I have been working with a Hazelcast cluster for about a year now and I am having some performance issues that I am trying to address. I understand that when an object is on a remote node, serialization must occur if I try to get it from an IMap. What I don't really get is why the local entry set has to be serialized.

In Hazelcast 2.x it worked that way. The problem was that if two threads read e.g. an Employee from an IMap, they both got the same Employee instance, and therefore you got concurrent access to one object.

But what has been on the radar for a long time is to instruct HZ to skip the serialization/deserialization barrier in the case of immutable objects.

 
> Why isn't it possible to have those objects available as plain objects in the local environment, as in any other hash map, without breaking the paradigm? The need to serialize anything and everything out of Hazelcast is presenting a huge performance problem, as I have some large objects in the map. In fact it's almost at the level where RDBMS access to partial objects is preferable from a performance point of view.

Most users are using quite small objects in the map, so the serialization/deserialization isn't an issue.
 

> So I am wondering: perhaps I am doing something wrong? Is there any way to configure an IMap so that entries are not serialized to or from the map when accessed on the owner of the partition? I know that you can use OBJECT format and an entry processor to access objects, but the problems with that paradigm are extensive: 1) we are choking executors with hundreds of thousands of calls.

Which executors are you choking? EntryProcessors run on the partition threads.


 
> 2) Entry processors are EXTREMELY limited in that they can't access other maps, so any kind of composite operation, which constitutes the bulk of our use cases, is impossible in an EP.

Within a partition this should not be a problem. Across multiple partitions it is indeed not allowed.
 

> We moved from direct DB access to a Hazelcast cache to get performance increases, but frankly I have not seen the increase I expected to see.
>
> I would appreciate any constructive guidance on this issue.


What you could do is hook into the SPI and create your own operations. You have a lot more flexibility there and you could work directly with deeper data structures, e.g. the RecordStore.

I don't know exactly what you are doing and how many problems it could cause to bypass the IMap and access the RecordStore directly.

But before starting on this adventure, I would suggest having another, closer look at your object structure. HZ is not really designed for 'big' values.
 


dsukho...@gmail.com

unread,
Aug 6, 2015, 4:51:51 PM8/6/15
to Hazelcast
Hi,

Yes, caching large objects is a well-known anti-pattern in the distributed cache world. This is not a Hazelcast issue; I've seen the same performance problems in Coherence, for instance. Try to split your structure into a number of smaller parts, all co-located via a partition key.
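To make the co-location idea concrete, here is a minimal plain-Java sketch of the routing principle (all names here are invented; in Hazelcast itself the analogous mechanism is giving the parts a key that implements PartitionAware, so that every part of one parent is routed to the same partition):

```java
// Sketch of co-location by partition key (plain Java, names invented).
public class CoLocationSketch {

    static final int PARTITION_COUNT = 271; // Hazelcast's default partition count

    // Composite key "parentId#part": only the parent id decides the partition,
    // so all parts of one split-up object land on the same partition.
    static int partitionOf(String compositeKey) {
        String routingKey = compositeKey.substring(0, compositeKey.indexOf('#'));
        return Math.floorMod(routingKey.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        int header  = partitionOf("order-42#header");
        int lines   = partitionOf("order-42#lines");
        int history = partitionOf("order-42#history");

        // A partition-local operation (e.g. an entry processor) can now reach
        // all three parts without any remote call.
        System.out.println(header == lines && lines == history); // prints true
    }
}
```

The point of routing on the parent id alone is that splitting a big value into parts costs you nothing in network hops: everything that used to be one object still lives on one member.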

kraythe

unread,
Aug 6, 2015, 5:18:52 PM8/6/15
to Hazelcast, dsukho...@gmail.com
I think one of the problems I am facing is not so much the large-object issue but rather that the objects are being accessed thousands at a time. Serializing one object may not be a big deal, whereas serializing tens of thousands of them is, especially just to get a single field out of each. For example, if we have A with 10 fields including a URL, and B which holds an id referencing A, getting the distinct set of A URLs for thousands of Bs is quite the serialization nightmare. We do have some rather large objects, and although breaking them up is a great idea, there are only so many resources to go around.

However, I thought the mantra of the NoSQL world was to store documents (or key-values) as cohesive units instead of spreading them among 5 maps. If A has 2000 Cs it's associated with and uses heavily, it would seem to be an anti-pattern to have to put all the Cs in another map. If so, the benefit of a cache leaps out the window, since DBs search far better than Hazelcast and have join logic that Hazelcast doesn't.

So are we saying the recommended approach for a Hazelcast system is to mimic an RDBMS with 10 times more complexity and 1/100th of the query ability? Also keep in mind that in a database, when I search I can select only 2 fields of 50; in Hazelcast I can't do that.

I hope you aren't in sales for Hazelcast. :) 

dsukho...@gmail.com

unread,
Aug 6, 2015, 6:27:29 PM8/6/15
to Hazelcast, dsukho...@gmail.com
OK, so you want to deserialize only certain fields of some large object. If the fields have different access patterns (say, 7 out of 10 are accessed/changed in one scenario and the other 3 only in a second scenario), then I'd split them into two caches. If there is no such clear boundary... well, in Coherence there is the ability to deserialize only certain fields: if you need the field at index 3 out of 10, the deserializer will read only the first 3 fields and will not touch the remaining 7. It should be possible to do the same via direct access to the RecordStore, as Peter suggested.
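As a plain-java.io illustration of that idea (this is not Hazelcast's serialization API, and all names are invented): if fields are written in a fixed order, a reader that only needs an early field can stop before touching the large tail.

```java
import java.io.*;

// Sketch: three fields written in a fixed order; a reader that only needs
// the first field returns before the large third field is ever read.
public class PartialReadSketch {

    static byte[] write(String url, long id, String bigBlob) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(url);      // field 1: small, frequently queried
            out.writeLong(id);      // field 2
            out.writeUTF(bigBlob);  // field 3: large, rarely needed
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not throw
        }
    }

    // Deserialize only the first field; fields 2 and 3 are never read.
    static String readUrlOnly(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write("http://example.com/a", 42L, "x".repeat(10_000));
        System.out.println(readUrlOnly(data)); // prints http://example.com/a
    }
}
```

This only works when the needed field sits before the big ones in the serialized layout, which is why field ordering matters if you go down this road.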

Lukas Blunschi

unread,
Aug 14, 2015, 5:54:16 AM8/14/15
to Hazelcast
Hi kraythe,

a different approach would be to use near caches, configured so that they also cache local entries and keep cached entries in OBJECT format:

NearCacheConfig nearCacheConfig = new NearCacheConfig();
nearCacheConfig.setCacheLocalEntries(true);
nearCacheConfig.setInvalidateOnChange(true);
nearCacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
mapConfig.setNearCacheConfig(nearCacheConfig);

if you warm up those near caches, you can execute all kinds of queries without having to deserialize anything.

best,
Lukas

Rakesh K

unread,
Aug 14, 2015, 9:59:05 AM8/14/15
to Hazelcast
Not sure if you have seen the documentation: http://docs.hazelcast.org/docs/3.5/manual/html-single/hazelcast-documentation.html#map-statistics
  • OBJECT: The data will be stored in deserialized form. This configuration is good for maps where entry processing and queries form the majority of all operations and the objects are complex ones, making the serialization cost respectively high. By storing objects, entry processing will not contain the deserialization cost.

It clearly says that you can store it in OBJECT format. This is in Hazelcast 3.5.
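For reference, setting that format on a map would look roughly like this (the map name is made up; this mirrors the MapConfig API used elsewhere in this thread):

```java
// Store values deserialized on the owning member instead of as binary.
MapConfig mapConfig = config.getMapConfig("largeObjects");
mapConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
```

Note this trades memory and put-cost for cheaper entry processing and queries, per the documentation quoted above.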

kraythe

unread,
Aug 15, 2015, 12:21:17 AM8/15/15
to Hazelcast
Setting the in-memory format on near caches is not in the documentation, so I didn't know you could even do it. Does this work?

kraythe

unread,
Aug 17, 2015, 12:13:21 PM8/17/15
to Hazelcast
In the map configs it's in the docs, but it's not mentioned in the near cache config. And I'm not sure it bypasses serialization anyway.

Lukas Blunschi

unread,
Aug 26, 2015, 11:12:18 AM8/26/15
to Hazelcast
Hi kraythe,

yes, it seems this is not in the documentation. However, looking at the API it is clearly there, and it works for me.

You can test it by enabling a near cache with the configuration above (cache local entries, InMemoryFormat.OBJECT) and executing a put followed by two IMap.get() operations. To see whether it worked, compare the LocalMapStatistics after each call and check your serialization counters (you will have to use a custom value type which allows you to count the number of serializations).

Here is an example test:
package com.nm.test.hazelcast.nearcache;

import com.hazelcast.config.*;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.nm.test.hazelcast.TestHazelcast;
import com.nm.test.hazelcast.utils.TestValue;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;
import junit.framework.TestCase;

public class TestLocalNearCache3 extends TestCase {

    private static final Logger logger = Logger.getLogger(TestLocalNearCache3.class);

    private static final String mapName = "testMap" + TestLocalNearCache3.class.getSimpleName();

    @Override
    protected void setUp() throws Exception {

        // configure logging
        if (!TestHazelcast.loggingInitialized) {
            TestHazelcast.loggingInitialized = true;
            BasicConfigurator.configure();
        }
    }

    public void testGet() {

        // create config
        Config config = new XmlConfigBuilder().build();
        config.setProperty("hazelcast.logging.type", "log4j");

        // configure map
        MapConfig mapConfig = config.getMapConfig(mapName);

        // configure near cache
        NearCacheConfig nearCacheConfig = new NearCacheConfig();
        nearCacheConfig.setEvictionPolicy("NONE");
        nearCacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);
        nearCacheConfig.setCacheLocalEntries(true);
        mapConfig.setNearCacheConfig(nearCacheConfig);

        // disable multicast for faster startup
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);

        // disable version check on start-up
        config.setProperty("hazelcast.version.check.enabled", "false");

        // create Hazelcast instance
        HazelcastInstance hcInstance = Hazelcast.newHazelcastInstance(config);

        // try-finally to ensure hazelcast is stopped
        try {

            // get map
            IMap<String, TestValue> map = hcInstance.getMap(mapName);

            // add map entry
            String key = "key1";
            TestValue value = new TestValue("value1");
            map.put(key, value);

            // get map entry and count readData() calls
            int countRead0 = TestValue.getNumReadData();
            map.get(key);
            int countRead1 = TestValue.getNumReadData();

            // ensure exactly one readData() call
            assertTrue("readData() called not exactly once (" + countRead0 + " -> " + countRead1 + ")", countRead0 + 1 == countRead1);

            // get map entry again
            map.get(key);
            int countRead2 = TestValue.getNumReadData();

            // ensure still only one readData() call in total
            assertTrue("readData() called not exactly once (" + countRead0 + " -> " + countRead2 + ")", countRead0 + 1 == countRead2);

        } finally {
            hcInstance.getLifecycleService().terminate();
        }

        logger.info("testGet() done.");
    }

}

package com.nm.test.hazelcast.utils;

import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.DataSerializable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * A test value which counts readData() and writeData() calls.
 */
public class TestValue implements DataSerializable {

    private static AtomicInteger numWriteData = new AtomicInteger();

    private static AtomicInteger numReadData = new AtomicInteger();

    public static int getNumWriteData() {
        return numWriteData.get();
    }

    public static int getNumReadData() {
        return numReadData.get();
    }

    private String value;

    /*
     * package-private constructor for deserialization
     */
    TestValue() {
    }

    public TestValue(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeUTF(value);

        // count
        numWriteData.incrementAndGet();
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        value = in.readUTF();

        // count
        numReadData.incrementAndGet();
    }

}



If you use IMap.get() operations to execute your queries, they will use this near cache and therefore no serialization and no network access are needed.

Best,
Lukas

