Hazelcast performance tests


Wojciech Durczyński

Feb 12, 2010, 6:46:00 AM2/12/10
to haze...@googlegroups.com
Hello

I tested Hazelcast with your test class
(com.hazelcast.examples.SimpleMapTest).
Results and conclusions are attached to this message.
Could you post some comments on them?
Is it possible to improve the scalability of a Hazelcast instance on
multicore machines (higher CPU utilization and greater speed)?

Regards

Hazelcast tests.pdf

Talip Ozturk

Feb 12, 2010, 8:34:17 AM2/12/10
to haze...@googlegroups.com
Wojciech,

Excellent test.

S is a pretty powerful machine and I think the test is not able to
create enough work/load for it. Can you please test SS with 80 threads
on each and tell us the total operations?

Thanks,
-talip


Wojciech Durczyński

Feb 12, 2010, 10:16:17 AM2/12/10
to Hazelcast
I ran some tests with a varying number of threads on S (300000 records
x 1000 B).
Results:
S, 1 thread, 28k op/s
S, 2 threads, 40k op/s
S, 4 threads, 70k op/s
S, 8 threads, 70k op/s
S, 20 threads, 70k op/s
S, 40 threads, 70k op/s
S, 80 threads, 70k op/s

SS, 20 threads, 6k op/s overall
SS, 80 threads, 46k op/s overall
SS, 200 threads, 55k op/s overall



Talip Ozturk

Feb 13, 2010, 6:20:27 AM2/13/10
to haze...@googlegroups.com
Wojciech,

As you can see, Hazelcast can better utilize the CPU/network as you
increase the number of threads. The point is that your machine S is so
powerful that it is not fair to run only 40 threads on it. Forget about
Hazelcast: if you ran, say, WebLogic on that server and let it use only
40 threads, WebLogic wouldn't utilize the CPU/network either, because
you would not be able to create enough load for it.

The idea behind having a single thread (ServiceThread) handle all
operations is the following: we know that a single thread can handle
more than 100K operations per second, which is super good for a
distributed grid instance. So a single thread is already more than good
enough in terms of performance and utilization; all we have to do is
make sure the ServiceThread gets enough work/load and enough CPU
cycles. Since you have 2x4 cores on that machine, the ServiceThread is
getting enough CPU cycles, but we should also make sure it is getting
enough load. Network latency also lowers the work/load, and as your
cluster gets bigger, handling the network will slow us down even more.
So the enhancement we should make is not in the ServiceThread (it is so
fast already) but in handling the network: reading packets from the
sockets and writing packets to the sockets faster, so that we can
create more work for the ServiceThread.


What do you think?

-talip


Wojciech Durczyński

Feb 15, 2010, 11:36:05 AM2/15/10
to Hazelcast
Surely a slow network could be problematic, and increasing its speed
should be one of our goals (maybe by increasing the number of I/O
threads, although two threads using an NIO Selector should be fast
enough).
But there are some more ideas to consider. Currently each thread
performing an operation (let's call it a client thread) waits for the
result of that operation. The operation itself is performed by the
ServiceThread and the input/output threads. During this time the client
thread cannot generate further operations, which is why a large number
of threads is needed to generate enough workload for the service
thread. In a real application this is not a problem, so I think adding
more ServiceThreads or more communication threads is not really
important.
The following modifications could improve the performance of Hazelcast:
1. Ability to read from local backup copies
If the local node contains a backup copy of some data, the 'get' method
could be extremely fast and not communicate with the node responsible
for the particular key. It's easy to implement and should significantly
improve read speed.
2. Possibility to declare an auxiliary method that maps an object to a
specific region (and cluster member) - e.g. to store all of a
customer's orders on the same cluster member as the customer object
itself.
Sample implementation in Oracle Coherence:
http://coherence.oracle.com/display/COH35UG/Data+Affinity
3. Bulk operations
A thread calling 'get' or 'put' currently waits for the operation to
complete. It would be great to add putAll and getAll methods that can
speed up writing or reading multiple objects at once. For example:
IMap.putAll(Collection<Entry>)
IMap.putAll(Entry a, ...)
optionally HazelcastInstance.putAll(Map<IMap, Collection<Entry>>) -
modifying more than one map at once
IMap.getAll(Collection<Key>)
IMap.getAll(Key k, ...)
optionally HazelcastInstance.getAll(Map<IMap, Collection<Key>>) -
getting data from multiple maps at once
4. An API allowing asynchronous writes and reads from a distributed map
For example: java.util.concurrent.Future<Value> IMap.get(Key) - returns
a Future that will be completed with the received value.
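To make points 3 and 4 concrete, here is a standalone sketch of the scatter-gather pattern such an API would enable. It uses a plain ConcurrentHashMap and an ExecutorService as a stand-in for the distributed map; the asyncPut/asyncGet/getAll names are hypothetical, not real Hazelcast API:

```java
import java.util.*;
import java.util.concurrent.*;

// Stand-in for a distributed map: each operation is submitted to an
// executor and returns a Future, mimicking the proposed asynchronous
// IMap methods. The method names here are illustrative only.
class AsyncMapSketch<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<K, V>();
    private final ExecutorService io = Executors.newFixedThreadPool(4);

    public Future<V> asyncPut(final K key, final V value) {
        return io.submit(new Callable<V>() {
            public V call() { return store.put(key, value); }
        });
    }

    public Future<V> asyncGet(final K key) {
        return io.submit(new Callable<V>() {
            public V call() { return store.get(key); }
        });
    }

    // Bulk read built on the async primitive: issue all gets first,
    // then collect the results, so the requests overlap instead of
    // paying one round-trip per key.
    public Map<K, V> getAll(Collection<K> keys)
            throws InterruptedException, ExecutionException {
        Map<K, Future<V>> pending = new LinkedHashMap<K, Future<V>>();
        for (K key : keys) {
            pending.put(key, asyncGet(key));
        }
        Map<K, V> result = new LinkedHashMap<K, V>();
        for (Map.Entry<K, Future<V>> e : pending.entrySet()) {
            result.put(e.getKey(), e.getValue().get());
        }
        return result;
    }

    public void shutdown() { io.shutdown(); }
}
```

A client thread can then fire many operations and block only once at the end, which is what would make the bulk methods of point 3 cheap to build on top of the asynchronous API of point 4.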


Milenovic Family

Feb 16, 2010, 3:49:23 AM2/16/10
to Hazelcast
Very interesting results, thanks.

Just a quick question: did you monitor CPU usage and memory footprint
during those tests? If yes, are they significant?

Regards,
Aleks


Wojciech Durczyński

Feb 17, 2010, 7:01:33 AM2/17/10
to Hazelcast
I monitored CPU usage. It was jumping between about 30% idle and about
80% idle. Average idle values (0% means all processors are fully used,
100% means no CPU usage) are given below:

S, 1 thread, 80% idle
S, 2 threads, 75% idle
S, 4 threads, 65% idle
S, 8 threads, 50% idle
S, 20 threads, 70% idle
S, 40 threads, 70% idle
S, 80 threads, 55% idle

SS, 20 threads, 95% idle
SS, 40 threads, 80% idle
SS, 80 threads, 70% idle
SS, 200 threads, 60% idle

And some previous tests:

D, 55% idle
DD, 60% idle
DDD, 50% idle
DDDD, 40% idle
DDMM, 70% idle + 65% idle
S,S,S 35% idle
S,S,S,S 35% idle
S,S,S,S,S,S 20% idle
SS, 80% idle
SSSS, 70% idle
SSSSSS, 60% idle

The conclusion is that Hazelcast doesn't use the whole CPU power (one
node uses about 30% of all CPUs, and only with many testing threads)
and is still very fast.

I also tested Oracle Coherence and JBoss Infinispan.
Coherence has richer functionality than Hazelcast, but it has an
expensive commercial license and always uses about 100% of CPU power,
with performance results similar to Hazelcast's.
Infinispan has some bugs, and much of the expected functionality is not
ready yet (I didn't test its performance because of exceptions during
the tests: org.infinispan.util.concurrent.TimeoutException and
org.infinispan.remoting.transport.jgroups.SuspectException).
So at the moment Hazelcast seems to be the most reasonable choice for
my project.

Talip, could you comment on the modifications I proposed in this
thread? Can they improve performance?



Talip Ozturk

Feb 17, 2010, 8:12:35 AM2/17/10
to haze...@googlegroups.com
> 1. Ability to read from local backup copies
> If local node contains backup copy of some data, 'get' method should
> be extremely fast and do not communicate with node responsible for
> particular key. It's easy to implement and should significantly
> improve reads speed.

This can be made optional. Obviously the user would have to agree to
read an [almost]-up-to-date value, as the backup may not be the actual
value at some point in time. The behavior would be similar to
NearCache.

> 2. Possibility to declare auxiliary method to map object to specific
> region (and cluster member) - i.e. to store all customer's orders on
> the same cluster member as customer object itself.
> Sample implementation in Oracle Coherence:
> http://coherence.oracle.com/display/COH35UG/Data+Affinity

Data Affinity has been requested by many others. Yes, we will have it.

> 3. Bulk operations
> Thread calling 'get' or 'put' waits now for successful execution of
> operation. It'd be great to add methods putAll and getAll that can
> speed up writing or reading of multiple objects at once. In example:
> IMap.putAll(Collection<Entry>)
> IMap.putAll(Entry a,...)
> optionally HazelcastInstance.putAll(Map<IMap, Collection<Entry>>) -
> modifying more than one map at once
> IMap.getAll(Collection<Key>)
> IMap.getAll(Key k,...)
> optionally HazelcastInstance.getAll(Map<IMap, Collection<Key>>) - get
> data from multiple maps at once

We already have putAll(Map), and having getAll(Collection<Key>) would be nice.

> 4. API allowing asynchronous writes and reads from distributed map
> In example: java.util.concurrent.Future<Value> IMap.get(Key) - returns
> Future that will be filled with received value.

It is super easy to do this with the current codebase, but you might
get into starvation, where the user pushes so many asynchronous
write/read operations that Hazelcast gets no time to finish them all.
So performance is actually very unpredictable: you can easily slow
everything down while trying to speed it up with asynchronous calls.
But I surely agree that this would increase utilization and
responsiveness, as long as you don't cause starvation.

-talip

Avatar

Feb 25, 2010, 7:42:47 AM2/25/10
to Hazelcast
> > 1. Ability to read from local backup copies
>
> This can be made optional. Obviously user should agree to read
> [almost]-the-most-up-to-date value as backup may not be the actual
> value at some point in time. Behavior would be similar to NearCache.

Why may the backup not be the actual value at some point in time?
Backups are created synchronously, so they should be as up to date as
the original data.

> > 3. Bulk operations


> we already have putAll(Map) and having getAll(Collection<Key>) would be nice.
> > 4. Api allowing asynchronic writes and reads from distributed map

> It is super easy to do this with the current codebase but you might
> get into starvation where user pushes so many asynchronous write/read
> operations that Hazelcast can get no time finish them all. So
> performance is actually very unpredictable. you can easily slow down
> everything while trying to speed up with asynchronous calls. But I
> surely agree that this would increase the utilization and
> responsiveness if you don't cause starvation.

As a test I created an AsyncMPut class:

class AsyncMPut extends MBackupAndMigrationAwareOp {

    public Future put(String name, Object key, Object value) {
        return txnalPut(CONCURRENT_MAP_PUT, name, key, value);
    }

    private Future txnalPut(ClusterOperation operation, String name,
                            Object key, Object value) {
        setLocal(operation, name, key, value, -1, -1);
        request.longValue = (request.value == null)
                ? Integer.MIN_VALUE : request.value.hashCode();
        setIndexValues(request, value);
        request.setObjectRequest();
        doOp();
        return getResultAsFuture();
    }

    public Future getResultAsFuture() {
        return new Future() {
            public boolean cancel(boolean mayInterruptIfRunning) {
                return false; // cancellation not supported
            }

            public Object get() throws InterruptedException,
                    ExecutionException {
                Object result = getResultAsObject();
                backup(CONCURRENT_MAP_BACKUP_PUT);
                return result;
            }

            public Object get(long timeout, TimeUnit unit)
                    throws InterruptedException, ExecutionException,
                    TimeoutException {
                return get();
            }

            public boolean isCancelled() {
                return false;
            }

            public boolean isDone() {
                return false;
            }
        };
    }
}

This implementation is not final (backups are made synchronously, there
is no transaction support, etc.), but it should give us meaningful
performance results.

I also modified the IMap interface, adding the method

    Future<V> asyncPut(K key, V value);

with the implementation in MProxyReal:

    public Future asyncPut(Object key, Object value) {
        check(key);
        check(value);
        mapOperationStats.incrementPuts();
        AsyncMPut mput = concurrentMapManager.new AsyncMPut();
        return mput.put(name, key, value);
    }

And the putAll method implemented as:

    public void newPutAll(Map map) {
        Set<Entry> entries = map.entrySet();
        Set<Future> results = new HashSet<Future>();
        for (final Entry entry : entries) {
            results.add(asyncPut(entry.getKey(), entry.getValue()));
        }
        for (Future result : results) {
            try {
                result.get();
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
    }

Performance (in 40 threads and 1 Hazelcast instance):
old puts - 36000 puts/s
new puts implemented as map.asyncPut(...).get() - 36000 puts/s
old putAll (bulk puts of 100 entries) - 11000 puts/s !
new putAll (bulk puts of 100 entries) - 36000 puts/s

The same on 2 instances with backup count 0:
old puts - 6000 puts/s
new puts implemented as map.asyncPut(...).get() - 6000 puts/s
old putAll (bulk puts of 100 entries) - 9000 puts/s
new putAll (bulk puts of 100 entries) - 21000 puts/s

Conclusion:
the current putAll implementation is slow and should be changed
(creating a separate thread for writing every entry is very
resource-consuming and slows the whole operation down considerably)
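The same fan-out/collect idea can be shown in isolation, independent of Hazelcast internals: submit every entry to a small reusable pool and wait on the futures afterwards, instead of paying thread-creation cost per entry. A sketch, with a plain ConcurrentMap standing in for the distributed map:

```java
import java.util.*;
import java.util.concurrent.*;

// Generic fan-out/collect putAll over an asynchronous put primitive,
// mirroring the newPutAll idea: issue all puts through a shared pool
// first, then wait for every Future, rather than spawning one thread
// (or blocking one round-trip) per entry.
class BulkPutSketch {
    static <K, V> void putAll(ExecutorService pool,
                              final ConcurrentMap<K, V> target,
                              Map<K, V> entries)
            throws InterruptedException, ExecutionException {
        List<Future<?>> pending = new ArrayList<Future<?>>();
        for (final Map.Entry<K, V> e : entries.entrySet()) {
            pending.add(pool.submit(new Runnable() {
                public void run() { target.put(e.getKey(), e.getValue()); }
            }));
        }
        // Block only once, after everything is in flight; get() also
        // surfaces any failure from an individual put.
        for (Future<?> f : pending) {
            f.get();
        }
    }
}
```

With a fixed pool the per-entry cost is just a task submission, which is why the Future-based putAll above scales where the thread-per-entry version does not.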

Talip Ozturk

Feb 25, 2010, 1:59:23 PM2/25/10
to haze...@googlegroups.com
> Why backup may not be the actual value at some point in time? Backups
> are created synchronously so they should be as up-to-date as original
> data.

Yes. Backups are synchronous, meaning that when your put(key, value)
returns you can be sure that the backup has also been taken, but there
is still a delay between updating the entry on the owner member and on
the backup member; the backup and owner members are not the same JVMs.

This is great! You got it, my friend. This is the way putAll should be
implemented.

-talip

Wojciech Durczyński

Feb 25, 2010, 4:55:30 PM2/25/10
to Hazelcast
Of course, Avatar is my second Google account; my mistake.

I'm glad you like my implementation of putAll.
Take into account that my implementation of asynchronous put is very
simple; you must include transactions and asynchronous backups in the
final implementation (in my version backups are synchronous, so the
test results with backups were wrong).
Could you be so kind as to write asynchronous get (and getAll) as well?

Wojciech

Talip Ozturk

Feb 26, 2010, 10:23:37 AM2/26/10
to haze...@googlegroups.com
> Take into account that my implementation of asynchronous put is very
> simple. You must include transactions and asynchronous backups in
> final implementation (In my version backups are synchronous so test
> results with backups were wrong).

OK, we will do so.

> Could you be so kind and write asynchronous get (and getAll) also?

Yes, sounds good.

Thanks for the feedback,
-talip
