Hazelcast 4 slower than Hazelcast 3.8


Łukasz Kula

Aug 18, 2020, 9:03:45 AM
to Hazelcast
Hi,

After we upgraded Hazelcast from version 3.8 to the most recent version, we found that reading and writing data to distributed maps became slower.
We have quite a heavy load and a cluster of about 12 nodes. We are not using any special configuration.
Do you have a similar experience? We are trying to figure out what is causing this issue, but so far without success.

Thanks in advance, 
Lukasz

Neil Stevenson

Aug 18, 2020, 1:58:41 PM
to Hazelcast
Hi
  Only generic suggestions, but can you please try the following and share your findings:

  Hazelcast 3.8.7, 3.12.7 and 4.0.2, each against Java 8 and Java 11: six tests in total.

 The point here is to confirm whether the issue appears gradually as you advance through the Hazelcast 3 series, or only with the final jump from Hazelcast 3 to 4.

 If you've got any code that demonstrates the issue, that's even easier.

Neil

Jiri Holusa

Aug 19, 2020, 6:28:01 AM
to Hazelcast
One more comment to your question: "Do you have similar experience ?"

Not at all; 4.0.2 should actually be much faster, and we have plenty of performance test results that support this. Therefore, we're very interested in what you're seeing. What operations are you using? Puts/gets? Queries? EntryProcessors, etc.?

If you could share the code that is slower in your case, that would be very helpful. The configuration too, if possible. Something fishy is going on there :)

Łukasz Kula

Aug 19, 2020, 8:58:34 AM
to Hazelcast
Thanks a lot for the suggestions and information.
So far we have checked 3.12.7 and it looks OK, so no performance issues there.
Now we are trying Java 11, as we are currently using Java 8. We have quite a simple setup with IMaps and ReplicatedMaps, just set/get, and we removed backups etc. for testing.
We are testing with the full application setup, so it's hard to share the code. We will try to make a small isolated application and replicate the problem.

Example map configuration:

<map name="map1">
    <in-memory-format>OBJECT</in-memory-format>
    <eviction eviction-policy="LRU" max-size-policy="PER_NODE" size="5000"/>
    <max-idle-seconds>1800</max-idle-seconds>
    <backup-count>2</backup-count>
</map>

We experimented with the in-memory format (including switching it to BINARY) and with removing backups, but saw no visible improvement.
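To make the map1 settings concrete, here is a plain-JDK sketch of the semantics they ask for: least-recently-used eviction once more than 5000 entries are held (PER_NODE means the limit applies per cluster member) and expiry of entries left idle for longer than 1800 seconds. It is only an analogy built on a `LinkedHashMap` in access order, not how Hazelcast implements eviction; the class name `LruIdleCache` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical plain-JDK analogy of the map1 settings: LRU eviction above maxSize
// entries and expiry after maxIdleMillis without access. Not Hazelcast's implementation.
final class LruIdleCache<K, V> {
    private static final class Slot<T> {
        final T value;
        long lastAccessMillis;

        Slot(T value, long lastAccessMillis) {
            this.value = value;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final long maxIdleMillis;
    private final LongSupplier clock; // injectable clock, so idle expiry is testable
    private final LinkedHashMap<K, Slot<V>> map;

    LruIdleCache(final int maxSize, long maxIdleMillis, LongSupplier clock) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
        // accessOrder = true: iteration runs from least to most recently accessed
        this.map = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                return size() > maxSize; // like size="5000" with eviction-policy="LRU"
            }
        };
    }

    void set(K key, V value) {
        map.put(key, new Slot<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Slot<V> slot = map.get(key); // also moves the entry to the most-recent end
        if (slot == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - slot.lastAccessMillis > maxIdleMillis) { // like max-idle-seconds
            map.remove(key); // expire lazily, on access
            return null;
        }
        slot.lastAccessMillis = now;
        return slot.value;
    }

    int size() {
        return map.size();
    }
}
```

The injectable clock lets a test advance time without sleeping; in this sketch expiry is checked lazily on `get`, which is a simplification.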

I must say that the problem is more visible in production, under heavy load.

Example timings: a set operation takes 16 ms on version 3 and 25+ ms on version 4.

We are still preparing the Java 11 test; it's in progress.


Jaromir Hamala

Aug 19, 2020, 9:25:16 AM
to haze...@googlegroups.com
What's your load (TPS), and what's your object size like? Both 16 ms and 25 ms look awfully high for a simple set operation. Of course, there can be good reasons for this: the simplest ones are a large entry (hundreds of KB or more) or network congestion.

Cheers,
Jaromir
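Jaromir's question about load and object size can be turned into a rough estimate. The sketch below assumes each `set` ships the serialized value once to the partition owner (i.e. the caller is not the owner) and once more to each of the `backup-count` backups, and it ignores headers, responses and all other traffic. The class name and the example numbers are hypothetical.

```java
// Hypothetical back-of-envelope estimate of value bytes sent over the network by IMap.set.
// Assumes the caller is not the partition owner, so each set ships the value once to the
// owner plus once per backup; headers, responses and other traffic are ignored.
final class ReplicationTraffic {
    static long valueBytesPerSecond(long setsPerSecond, long valueBytes, int backupCount) {
        if (setsPerSecond < 0 || valueBytes < 0 || backupCount < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return setsPerSecond * valueBytes * (1L + backupCount);
    }

    public static void main(String[] args) {
        // Hypothetical load: 1,000 sets/s of 3 KB values with backup-count 2
        System.out.println(valueBytesPerSecond(1_000, 3_000, 2)); // 9000000
    }
}
```

For instance, 1,000 sets per second of 3,000-byte values with `backup-count` 2 gives 1,000 × 3,000 × 3 = 9,000,000 bytes per second before any protocol overhead; because the figure grows linearly with entry size, entries of 100 KB or more put the cluster in a very different regime.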

Marcin Mikolajczyk

Aug 20, 2020, 10:28:19 AM
to Hazelcast
Hi all, I'm from the OP's team, which is facing the issue.

As we are still migrating to Java 11, I've run some more tests with our application in the development environment.

There we have 4 servers running in a Hazelcast cluster, 2 of which receive events from Kafka and store some data (via set) in the distributed map (propagated to all 4 servers). During the test I sent a huge number of events to Kafka, so that the servers' CPUs were busy (90+%), and compared the network load on Hazelcast 3.12.7 and 4.0.2, both under a high event load and with no events (a few may have arrived from the test environment).

We noticed that the network load was 2-3 times higher after installing Hazelcast 4 compared to 3.12.7. Have you noticed anything like this during your performance tests?

Please find the screenshots attached. The last screenshot compares the 97th percentile of the set operation time alone in hz4 (first part of the chart) and hz3.12.7.
4_taskmanager_smallload.png
3_taskmanager_smallload.png
4_taskmanager_heavyload.png
setcomparison_perc97.png
3_taskmanager_heavyload.png
4_resourcemonitor_heavyload.png
4_resourcemonitor_smallload.png
3_resourcemonitor_heavyload.png
3_resourcemonitor_smallload.png
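The 97th-percentile chart above summarizes per-operation latencies. A minimal sketch of one common way to compute such a figure, the nearest-rank percentile, follows; which percentile method Marcin's monitoring actually used is not stated, so the choice of method is an assumption.

```java
import java.util.Arrays;

// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
final class LatencyPercentile {
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing so whole-number cases stay exact
        int rank = Math.max(1, (int) Math.ceil(p * sorted.length / 100.0));
        return sorted[rank - 1];
    }
}
```

With 100 samples, `percentile(samples, 97)` returns the 97th smallest value, so the three slowest operations do not affect it.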

peter veentjer

Aug 20, 2020, 11:38:25 AM
to Hazelcast
What is your value size?

Marcin Mikolajczyk

Aug 20, 2020, 1:25:13 PM
to Hazelcast
In this map, approximately 2-3 KB.

Neil Stevenson

Aug 26, 2020, 3:52:53 AM
to Hazelcast
Hi
 > We will try to make small isolated application 

 Are you any closer to a simpler reproducer?
 
 For example: default map configuration (1 backup, no eviction), and "map.set(K,V)" with integer keys and random strings in the right size range. No "map.get()", just the one operation type.

Neil
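A minimal sketch of the set-only reproducer described above: integer keys, random ASCII strings of 2,000-3,000 characters (roughly the 2-3 KB value size mentioned earlier), and a single operation type, timed per call. The `BiConsumer` hook is an assumption made so the sketch has no Hazelcast dependency; in a real test you would pass something like `map::set` for an `IMap<Integer, String>`. The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical set-only reproducer skeleton: integer keys, random ASCII values in a size
// range, one operation type. `set` stands in for something like map::set.
final class SetOnlyReproducer {
    static String randomValue(Random rnd, int minLen, int maxLen) {
        int len = minLen + rnd.nextInt(maxLen - minLen + 1); // inclusive range
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rnd.nextInt(26))); // ASCII letters only
        }
        return sb.toString();
    }

    /** Performs count sets and returns each call's latency in microseconds. */
    static long[] run(BiConsumer<Integer, String> set, int count, int minLen, int maxLen, long seed) {
        if (count < 0 || minLen < 0 || minLen > maxLen) {
            throw new IllegalArgumentException("invalid arguments");
        }
        Random rnd = new Random(seed); // fixed seed: same values for every version tested
        long[] latenciesMicros = new long[count];
        for (int i = 0; i < count; i++) {
            String value = randomValue(rnd, minLen, maxLen); // built outside the timed section
            long start = System.nanoTime();
            set.accept(i, value);
            latenciesMicros[i] = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
        }
        return latenciesMicros;
    }
}
```

Running the same seed against each Hazelcast and Java version combination keeps the workload identical, so differences in the returned latencies come from the versions rather than the data.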

Marcin Mikolajczyk

Aug 26, 2020, 6:17:29 AM
to Hazelcast

Hi,

We still can't reproduce the issue in a simpler setup, but we have noticed that when we start sending events (which trigger imap.set), in hz3 we can see 4 new async threads (named like "hz.%cache_name%.async.thread-N"), while in hz4 they don't show up. It could mean that something that was done asynchronously in hz3 now happens in a blocking thread.

Vassilis Bekiaris

Aug 26, 2020, 6:42:18 AM
to haze...@googlegroups.com
Hi Marcin,

The async executor was present in Hazelcast 3 to supply a default thread pool for executing async callbacks. This was necessary because Hazelcast 3 was compatible with Java 6, and that JDK did not supply a default thread pool for executing async callbacks. Since Hazelcast 4 requires Java 8, the async executor was removed. Async callbacks chained on CompletionStages returned from Hazelcast are executed according to the execution policy described in the JDK's CompletableFuture javadoc. See also https://docs.hazelcast.org/docs/4.0/manual/html-single/index.html#removal-of-icompletablefuture

Cheers,
Vassilis
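Following the JDK policy Vassilis refers to, a callback chained with a non-async method such as `thenApply` may run on whichever thread completes the future (or on the caller, if it is already complete), and for a plain `CompletableFuture` an `...Async` method without an explicit executor uses the default async executor (normally `ForkJoinPool.commonPool()`). If callbacks should run on a dedicated pool, similar to the old `hz.*.async.thread-N` threads, one option is to pass an executor explicitly, as in this plain-JDK sketch. The pool and class names are hypothetical, and the `CompletableFuture` merely stands in for a `CompletionStage` returned by a Hazelcast async call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: run chained callbacks on an application-owned, named pool
// instead of the completing thread or the default async executor.
final class CallbackThreads {
    static ExecutorService namedPool(String prefix, int threads) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive just for callbacks
            return t;
        });
    }

    /** Chains a callback with an explicit executor and reports which thread ran it. */
    static String callbackThreadName(CompletableFuture<String> future, ExecutorService pool) {
        return future
                .thenApplyAsync(value -> Thread.currentThread().getName(), pool)
                .join();
    }
}
```

Keeping slow callback work on an application pool like this avoids running it on whichever thread happens to complete the future.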

Neil Stevenson

Sep 1, 2020, 5:37:21 AM
to Hazelcast
Hi
 Are you getting anywhere with resolving this?

 One thing worth trying is a test with only "Map.set(K,V)" operations, to see whether that's slower or faster.

 We would expect it to be faster, but if it's not for you, that gives us something to focus on.

N

Marcin Mikolajczyk

Sep 1, 2020, 5:49:28 AM
to Hazelcast
Hi,

Thank you for the quick responses.

In the meantime, it turned out that the map we were looking at was not the problem. We've got another map which is more troublesome and impacts the whole Hazelcast instance.

While processing a Kafka event, after the set on the first map, there are a few more operations on another map: a few gets, followed by getCacheMap().getEntryView(key).getLastUpdateTime(), and finally a putAll operation (all of these operations are performed 1-10 times per event, i.e. per set on the first map).

The second map is much heavier than the first one (one entry can be 10-200 KB, or more in extreme cases); its key is a fairly simple 4-field object and its value is a TreeMap.

In Hazelcast 4 you added new default serializers for Java types, including TreeMap (https://github.com/hazelcast/hazelcast/pull/15371/files). Unfortunately, I can neither disable nor simply override this serializer, but it's our suspect for now, because the same setup works perfectly fine on hz3, where the TreeMap is serialized by the JavaDefaultSerializers.JavaSerializer class.

Do you think this could be the cause?

regards
Marcin

Marcin Mikolajczyk

Sep 4, 2020, 8:02:14 AM
to Hazelcast
Hi,

I forked Hazelcast and commented out the TreeMap serializer. That didn't help.

However, when I added a custom stream serializer for the object stored as the TreeMap value, that did the trick: I got times like in hz3.

So something must have changed in the internals of the serialization mechanism.
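The fix above was a custom stream serializer for the TreeMap value type. The sketch below shows the general shape of such a serializer, written against `java.io.DataOutput`/`DataInput` so it runs without Hazelcast; Hazelcast's `ObjectDataOutput` and `ObjectDataInput` extend those interfaces, so the same logic could sit inside a `StreamSerializer`'s `write`/`read`. `Item` and `ItemCodec` are hypothetical: `Item` is a heavily simplified stand-in for `AnObject`, including a list of its own type like `anObjectField4`, and the codec also reports the size of plain Java serialization for comparison.

```java
import java.io.*;
import java.math.BigDecimal;
import java.util.*;

// Hypothetical, heavily simplified stand-in for AnObject (including a list of its own type).
final class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    final Long id;             // nullable boxed field
    final String name;         // nullable
    final BigDecimal amount;   // nullable
    final List<Item> children; // same type as the enclosing class, like anObjectField4

    Item(Long id, String name, BigDecimal amount, List<Item> children) {
        this.id = id;
        this.name = name;
        this.amount = amount;
        this.children = new ArrayList<>(children); // ArrayList is Serializable
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Item)) return false;
        Item other = (Item) o;
        return Objects.equals(id, other.id) && Objects.equals(name, other.name)
                && Objects.equals(amount, other.amount) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name, amount, children);
    }
}

// Field-by-field encoding; write and read must use exactly the same order.
final class ItemCodec {
    static void write(DataOutput out, Item item) throws IOException {
        out.writeBoolean(item.id != null); // null flag, then the value if present
        if (item.id != null) out.writeLong(item.id);
        writeNullableString(out, item.name);
        writeNullableString(out, item.amount == null ? null : item.amount.toString());
        out.writeInt(item.children.size());
        for (Item child : item.children) {
            write(out, child); // recursion handles nested Items
        }
    }

    static Item read(DataInput in) throws IOException {
        Long id = in.readBoolean() ? Long.valueOf(in.readLong()) : null;
        String name = readNullableString(in);
        String amount = readNullableString(in);
        int count = in.readInt();
        List<Item> children = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            children.add(read(in));
        }
        return new Item(id, name, amount == null ? null : new BigDecimal(amount), children);
    }

    // writeUTF uses modified UTF-8 and is limited to 65,535 encoded bytes per string
    private static void writeNullableString(DataOutput out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) out.writeUTF(s);
    }

    private static String readNullableString(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    static byte[] toBytes(Item item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(out, item);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Item fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int javaSerializedSize(Item item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(item);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The null flags and field order are design choices of this sketch; a real serializer must keep `write` and `read` in exactly the same order and needs a versioning plan if fields change.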

Neil Stevenson

Sep 8, 2020, 3:58:21 AM
to Hazelcast
So, just to confirm: you think the problem occurs for a TreeMap that contains objects that aren't standard Java serializable?

If so, can you describe or provide the kind of object you're using, please?

Neil

Neil Stevenson

Sep 16, 2020, 12:00:25 PM
to Hazelcast
Hi
 Did you get anywhere with progressing this ?

N

Marcin Mikolajczyk

Sep 17, 2020, 3:32:55 PM
to Hazelcast
Hi,

I had to change the priorities of my tasks; I'll let you know the outcome of this change in a few days.

M

Neil Stevenson

Oct 1, 2020, 4:24:47 AM
to Hazelcast
Hi
 Did you get anywhere? We'd really like to get to the root cause here.

Marcin Mikolajczyk

Oct 15, 2020, 5:09:12 AM
to Hazelcast
Hi,

Sorry for the delay, but I have some other tasks on my shoulders, and after we went to production we suffered from some new issues with hz4.

A new issue came up once we had added custom serialization and the application no longer died within a few minutes/hours,
but it turned out to be our own bug (which didn't affect hz3, though): we sometimes didn't unlock a key on the cache after tryLock.
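The tryLock bug described above is usually avoided with a try/finally that unlocks only after a successful acquisition. With Hazelcast's `IMap` that shape is `if (map.tryLock(key, timeout, unit)) { try { ... } finally { map.unlock(key); } }`. The runnable sketch below shows the same pattern with per-key JDK `ReentrantLock`s; the `KeyLocks` helper is hypothetical and is not how Hazelcast's distributed locks work internally.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: per-key locks using "unlock only in finally, only if acquired".
final class KeyLocks<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs action under the key's lock; returns false without running it if not acquired. */
    boolean withLock(K key, long timeout, TimeUnit unit, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        try {
            if (!lock.tryLock(timeout, unit)) {
                return false; // not acquired, so there is nothing to unlock
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock(); // runs even if action throws, so the key never stays locked
        }
    }

    boolean isLocked(K key) {
        ReentrantLock lock = locks.get(key);
        return lock != null && lock.isLocked();
    }
}
```

The key point is the placement of `unlock` in `finally` after a successful `tryLock`: an exception in the guarded work can no longer leave the key locked.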

I have run some tests in my test environment, which consists of 4 servers clustered by Hazelcast. The scenario is a simulation of users logging in.
At the beginning the cache is empty (hence the higher times at the start of each run), but because I have a limited number of users
and their data is stored in the cache, times improve on subsequent logins of the same user.

This is the cache:

Map<Key, TreeMap<Long, AnObject>>

Key is a class with two String fields, a long field and an enum field.

The TreeMap value is quite a big object:

public class AnObject implements Serializable, Comparable<AnObject>, SomeInterface {

    private static final long serialVersionUID = someLongVal;

    private Long aLongField1;

    private Long aLongField2;

    private Long aLongField3;

    private Long aLongField4;

    private Long aLongField5;

    private LocalDateTime aLocalDateTimeField1;

    private LocalDateTime aLocalDateTimeField2;

    private Enum1 anEnumField1;

    private String aStringField1;

    private Enum2 anEnumField2;

    private BigDecimal aBigDecimalField1;

    private BigDecimal aBigDecimalField2;

    private String aStringField2;

    private BigDecimal aBigDecimalField3;

    private String aStringField3;

    private AnObject2 anObjectField1; //simple 4 String field object

    private String aStringField4;

    private AnObject2 anObjectField2;  //simple 4 String field object

    private AnObject2 anObjectField3;  //simple 4 String field object

    private String aStringField5;

    private BigDecimal aBigDecimalField4;

    private Long aLongField6;

    private String aStringField6;

    private String aStringField7;

    private Long aLongField7;

    private Long aLongField8;

    private Long aLongField9;

    private List<AnObject> anObjectField4; //AnObject of the same type as this class(!!!)

    private Set<AnObject3> anObjectField5; //a bit more complex: it consists of fields SomeClass (with 3 simple fields), 4 long fields, 1 enum, 1 BigDecimal, 3 String fields

    private Boolean aBooleanField1;

    private Long aLongField10;
}

To sum up, the screenshot below shows the test results. It shows the times of cache operations (every line except the green one): get, putAll and getLastUpdateTime (from the entry view).

Hz4 with custom serialization now has the lowest possible times; what we see is probably mostly network time.

Hz4 without custom serialization, but with the anObjectField4 field simplified: I temporarily turned it into a list of Strings,
because I thought the serializer might have trouble with a field of the same type as the class containing it.
As we can see, it's just slightly better than the next test run.

Hz4 without custom serialization and without the simplification is what killed us after switching from hz3
(note the times at the beginning, when the cache is empty: they're very high for new values).

Hz3 without custom serialization has slightly worse times than hz4 with custom serialization, but it was just fine.

Hz4 with custom serialization and the simplified value (anObjectField4) made no big difference compared with the first test run, as the times there are already minimal.

Unfortunately, I couldn't find anything suspicious while debugging, but after fixing the serialization and the not-unlocked tryLock issue, we are now running smoothly.

regards
Marcin Mikołajczyk

Marcin Mikolajczyk

Oct 15, 2020, 5:27:41 AM
to Hazelcast

Here's the missing screenshot:

s.png

Neil Stevenson

unread,
Oct 24, 2020, 2:45:00 AM10/24/20
to Hazelcast
Thanks, will try to find time to investigate

N

Neil Stevenson

Oct 27, 2020, 3:08:00 PM
to Hazelcast
Thanks! I can recreate the problem :-)

Marcin Mikolajczyk

Oct 28, 2020, 2:15:00 AM
to haze...@googlegroups.com
Wow, that's great! Please keep me posted on what was wrong and how to fix it, or what you will change in Hazelcast to fix it :) Glad I could help somehow.

Thanks, 
Marcin



Neil Stevenson

Oct 29, 2020, 3:33:38 AM
to Hazelcast
I've logged this as an issue on GitHub, https://github.com/hazelcast/hazelcast/issues/17788, with a subset of your code that demonstrates at least one problem.