Heavy concurrent access and hazelcast.executor.client.thread.count

Алексей Крылов

Mar 1, 2012, 4:12:11 PM

to haze...@googlegroups.com

Hi Fuad,

Are there any firm recommendations or restrictions on the number of client threads per JVM that concurrently access queue/map/set structures?

I currently have a JUnit test that always hangs Hazelcast with the default settings. I've tested it on 1.9.4.8 and 2.0-RC2, and the results are the same.

The test has a client that performs 50,000 inserts into a queue from 250 threads, i.e. 200 inserts per thread.

It also has a single queue poller that counts the incoming objects.

With these parameters the poller retrieves about 48,000-49,000 objects, never the full 50,000.
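To make the shape of the test concrete, here is a minimal, self-contained sketch of the same load pattern (250 producer threads × 200 offers each, one counting poller). It is not the JUnit test attached to the issue: a local java.util.concurrent.LinkedBlockingQueue stands in for the Hazelcast client queue, and the class name QueueLoadSketch and the 5-second idle cut-off for the poller are illustrative choices. Against a local queue it should count all 50,000 items; passing a distributed queue to run() instead (assuming it is used through the BlockingQueue interface) is where the shortfall described above would show up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueLoadSketch {
    static final int THREADS = 250;
    static final int PER_THREAD = 200; // 250 * 200 = 50,000 offers in total

    // Runs the producers/poller pattern against any BlockingQueue and returns
    // how many items the poller received before it stopped waiting.
    static int run(BlockingQueue<Integer> queue) throws InterruptedException {
        AtomicInteger received = new AtomicInteger();
        Thread poller = new Thread(() -> {
            try {
                // Stop after 5 s without a new item instead of waiting forever.
                while (queue.poll(5, TimeUnit.SECONDS) != null) {
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        poller.start();

        ExecutorService producers = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            final int base = t * PER_THREAD; // unique values per thread
            producers.submit(() -> {
                for (int i = 0; i < PER_THREAD; i++) {
                    queue.offer(base + i);
                }
            });
        }
        producers.shutdown();
        producers.awaitTermination(1, TimeUnit.MINUTES);
        poller.join();
        return received.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int got = run(new LinkedBlockingQueue<>());
        System.out.println("received " + got + " of " + (THREADS * PER_THREAD));
    }
}
```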

In the log I see this:

INFO: There is no response for Call [196] operation=BLOCKING_QUEUE_OFFER in 5 seconds.

Mar 02, 2012 12:59:38 AM com.hazelcast.client.ProxyHelper

INFO: There is no response for Call [206] operation=BLOCKING_QUEUE_OFFER in 5 seconds.

Mar 02, 2012 12:59:38 AM com.hazelcast.client.ProxyHelper

INFO: There is no response for Call [216] operation=BLOCKING_QUEUE_OFFER in 5 seconds.

Mar 02, 2012 12:59:38 AM com.hazelcast.client.ProxyHelper

INFO: There is no response for Call [226] operation=BLOCKING_QUEUE_OFFER in 5 seconds.

….

Mar 02, 2012 1:02:49 AM com.hazelcast.client.ProxyHelper

INFO: There is no response for Call [87563] operation=BLOCKING_QUEUE_POLL in 190 seconds.

Mar 02, 2012 1:02:49 AM com.hazelcast.client.ProxyHelper

The process never finishes, so I have to stop the test manually.

The situation changes radically, and the test runs successfully, when I set hazelcast.executor.client.thread.count to 1000.
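The post does not say how the property was set. A minimal sketch of one common way, assuming that Hazelcast picks up hazelcast.executor.client.thread.count from JVM system properties and that it has to be in place in the server JVM before the HazelcastInstance is created (the same as passing -Dhazelcast.executor.client.thread.count=1000 on the command line). The class name is illustrative.

```java
public class ClientThreadCountSketch {
    // Property name taken from this thread; 1000 is the value reported to work.
    static final String PROP = "hazelcast.executor.client.thread.count";

    public static void main(String[] args) {
        // Assumption: Hazelcast falls back to system properties for this setting,
        // so this must run in the server JVM before the instance starts.
        System.setProperty(PROP, "1000");
        System.out.println(PROP + " = " + System.getProperty(PROP));
    }
}
```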

On 2.0-RC2 only, there are some warnings:

Mar 02, 2012 1:04:26 AM com.hazelcast.impl.ThreadContext

WARNING: 1014 ThreadContext is created!! You might have too many threads. Is that normal?

Mar 02, 2012 1:04:26 AM com.hazelcast.impl.ThreadContext

I also periodically see this in the server logs:

WARNING: /192.168.1.12:5701 [dev] null

java.lang.NullPointerException

at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)

at com.hazelcast.impl.ClientService.getClientEndpoint(ClientService.java:248)

at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:56)

at com.hazelcast.impl.ClientRequestHandler$1.run(ClientRequestHandler.java:53)

at com.hazelcast.impl.ClientRequestHandler.doRun(ClientRequestHandler.java:62)

at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:22)

at com.hazelcast.impl.ClientService$Worker.run(ClientService.java:224)

at java.lang.Thread.run(Thread.java:619)

01.03.2012 23:23:31 com.hazelcast.impl.ClientRequestHandler

WARNING: /192.168.1.12:5701 [dev] null

java.lang.NullPointerException

01.03.2012 23:23:48 com.hazelcast.impl.ClientRequestHandler

WARNING: /192.168.1.12:5701 [dev] null

java.lang.NullPointerException

01.03.2012 23:23:56 com.hazelcast.impl.ClientRequestHandler

WARNING: /192.168.1.12:5701 [dev] null
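The top frame of these traces is informative on its own: ConcurrentHashMap does not allow null keys, and get(null) throws NullPointerException. The snippet below only demonstrates that JDK behaviour; the idea that ClientService.getClientEndpoint was handed a null key (for example for a client connection that had already gone away) is an assumption, not something verified against the Hazelcast source. The names in the snippet are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookup {
    // Returns true if looking up a null key in a ConcurrentHashMap throws NPE.
    static boolean nullKeyLookupThrows() {
        ConcurrentHashMap<Object, String> endpoints = new ConcurrentHashMap<>();
        Object connection = null; // hypothetical: a request whose connection is already gone
        try {
            endpoints.get(connection);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullKeyLookupThrows()); // true
    }
}
```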

Any suggestions are welcome :)

Best regards,

Alexey

Алексей Крылов

Mar 1, 2012, 5:30:09 PM

to haze...@googlegroups.com

Maybe this can help:

While experimenting with the thread count and hazelcast.executor.client.thread.count, I found that the test always succeeds when

[concurrent thread count] <= [hazelcast.executor.client.thread.count / 4].

So with the default configuration I can have at most 10 accessing threads on the client side.
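The empirical rule can be written down as plain arithmetic. A sketch that assumes nothing beyond the observations in this thread: the default of 40 is inferred from "at most 10 threads with the default configuration" (40 / 4 = 10), and maxClientThreads is a hypothetical helper, not a Hazelcast API. Note that 1000 / 4 = 250, which matches the 250-thread test passing once the property was raised to 1000.

```java
public class EmpiricalThreadLimit {
    // Empirical rule reported in this thread, not documented Hazelcast behaviour:
    // the test stays stable while concurrent client threads <= executor threads / 4.
    static int maxClientThreads(int executorClientThreadCount) {
        return executorClientThreadCount / 4;
    }

    public static void main(String[] args) {
        // 40 is inferred from the "at most 10 threads by default" observation.
        System.out.println(maxClientThreads(40));   // 10
        System.out.println(maxClientThreads(1000)); // 250, the thread count of the passing test
    }
}
```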

Unfortunately this is purely empirical knowledge; I would appreciate a detailed explanation.

My main problem now is that limiting concurrent calls by hand on the client side is not effective, because the put operation returns immediately and its success depends only on the value of hazelcast.executor.client.thread.count on the server.

I think it's dangerous to set a large value, because executor threads are started for each client connection, and with many clients this can cause an 'Unable to create new native thread' error.
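The concern can be put in numbers. A rough sketch, taking the description above as its assumption (one executor pool of hazelcast.executor.client.thread.count threads per client connection, which has not been checked against the Hazelcast source); the figure of 20 connections is hypothetical. Each thread reserves its own stack, so a total in the tens of thousands can easily exceed native memory or OS thread limits.

```java
public class ServerThreadBudget {
    // Assumption from the post: each client connection gets its own pool of
    // hazelcast.executor.client.thread.count executor threads.
    static long estimatedExecutorThreads(int clientConnections, int threadsPerConnection) {
        return (long) clientConnections * threadsPerConnection;
    }

    public static void main(String[] args) {
        // Hypothetical: 20 client connections with the property raised to 1000.
        System.out.println(estimatedExecutorThreads(20, 1000)); // 20000
    }
}
```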

I also don't understand the reasoning behind this infinite wait loop in ProxyHelper:

for (int i = 0; ; i++) {
    final Object response = c.getResponse(timeout, TimeUnit.SECONDS);
    if (response != null) {
        c.replied = System.nanoTime();
        return (Packet) response;
    }
    if (i > 0) {
        logger.log(Level.INFO, "There is no response for " + c
                + " in " + (timeout * i) + " seconds.");
    }
    // The only other exit: the client has been shut down.
    if (!client.isActive()) {
        throw new RuntimeException("HazelcastClient is no longer active.");
    }
}

I think it should have a maximum wait period or retry count, with asynchronous, listener-based notification of failed puts. Such a listener would be very useful, because you would always know which put operations succeeded and which did not.
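A minimal sketch of the proposed behaviour, under stated assumptions: a BlockingQueue stands in for the call's response slot (what Call.getResponse waits on in ProxyHelper), and FailureListener and awaitResponse are hypothetical names, not part of Hazelcast. Unlike the quoted loop, it gives up after maxRetries timeout slices and reports the failure to a listener. For simplicity the listener runs on the waiting thread; the asynchronous notification suggested above would hand the callback to an executor instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedAwaitSketch {
    // Hypothetical callback; nothing like this exists in the quoted ProxyHelper.
    interface FailureListener {
        void onTimeout(String callName, int waitedSeconds);
    }

    // Waits in timeoutSeconds slices, but at most maxRetries slices.
    // Returns the response, or null after notifying the listener.
    static Object awaitResponse(BlockingQueue<Object> responses, String callName,
                                int timeoutSeconds, int maxRetries,
                                FailureListener listener) throws InterruptedException {
        for (int i = 1; i <= maxRetries; i++) {
            Object response = responses.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (response != null) {
                return response;
            }
        }
        listener.onTimeout(callName, timeoutSeconds * maxRetries);
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> neverAnswered = new LinkedBlockingQueue<>();
        Object r = awaitResponse(neverAnswered, "BLOCKING_QUEUE_OFFER", 1, 2,
                (name, secs) -> System.out.println("no response for " + name + " in " + secs + " s"));
        System.out.println(r); // null, after about 2 seconds
    }
}
```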

Best regards,

Alexey

Алексей Крылов

Mar 2, 2012, 7:34:34 AM

to haze...@googlegroups.com

Some new findings on this problem.

If the number of threads concurrently calling put on a HazelcastClient queue exceeds hazelcast.executor.client.thread.count / 4, the server goes into an unstable state: some of the put objects are simply lost, and the client's ProxyHelper never gets a response.

Here is the code:

protected Packet doCall(Call c) {
    sendCall(c);
    c.sent = System.nanoTime();
    int timeout = 5;

    for (int i = 0; ; i++) {
        Object response = c.getResponse(timeout, TimeUnit.SECONDS);

        //!!!

        if (response != null) {
            c.replied = System.nanoTime();
            return (Packet) response;
        }

        int awaitSeconds = timeout * i; // total time waited so far

        if (i > 0) {
            logger.log(Level.INFO, "!!!There is no response for " + c
                    + " in " + awaitSeconds + " seconds.");
        }

        if (!client.isActive()) {
            throw new RuntimeException("HazelcastClient is no longer active.");
        }
    }
}

At the !!! marker the response is always null, because for some reason the server has lost the request. This makes the client sender thread (the thread from which you call queue.offer) loop forever, which eventually leads to OutOfMemoryError or "Unable to create new native thread" errors.

For now I have a slightly hacky workaround for the queue. Here is the code:

ProxyHelper:

protected Packet doCall(Call c) {
    sendCall(c);
    c.sent = System.nanoTime();
    int timeout = 5;

    for (int i = 0; ; i++) {
        Object response = c.getResponse(timeout, TimeUnit.SECONDS);
        if (response != null) {
            c.replied = System.nanoTime();
            return (Packet) response;
        }

        int awaitSeconds = timeout * i; // total time waited so far

        // Workaround: give up on a queue offer that has waited too long instead of looping forever.
        // MAX_RESPONSE_AWAIT_SECONDS is a constant of the patched class; its value is not shown here.
        if (c.getRequest().getOperation().equals(ClusterOperation.BLOCKING_QUEUE_OFFER)
                && awaitSeconds >= MAX_RESPONSE_AWAIT_SECONDS) {
            String errorMessage = String.format("!!!Unable to receive response for %s in %d seconds. " +
                    "Stopping this process. Possibly you are using HazelcastClient from too many threads.", c, awaitSeconds);
            logger.log(Level.SEVERE, errorMessage);
            throw new IllegalStateException(errorMessage);
        }

        if (i > 0) {
            logger.log(Level.INFO, "!!!There is no response for " + c
                    + " in " + awaitSeconds + " seconds.");
        }

        if (!client.isActive()) {
            throw new RuntimeException("HazelcastClient is no longer active.");
        }
    }
}

This IllegalStateException propagates back to the code that calls put, so it's possible to catch it and re-put the object later.
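On the calling side, the workaround could be used roughly like this. A sketch only: the Consumer passed as put stands in for queue.offer/put on the patched client, RePutSketch and putAll are hypothetical names, and the fake put in main simply rejects each even item on its first attempt. One caveat: if the server did apply an offer and only the response was lost, re-putting that item would produce a duplicate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class RePutSketch {
    // Tries each item once; items whose put failed with IllegalStateException
    // (as thrown by the patched ProxyHelper) are returned for a later attempt.
    static <T> Deque<T> putAll(Iterable<T> items, Consumer<? super T> put) {
        Deque<T> failed = new ArrayDeque<>();
        for (T item : items) {
            try {
                put.accept(item);
            } catch (IllegalStateException e) {
                failed.add(item);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Integer> stored = new ArrayList<>();
        Set<Integer> rejectedOnce = new HashSet<>();
        // Fake put (stand-in for queue.offer): rejects each even item on its first attempt.
        Consumer<Integer> put = item -> {
            if (item % 2 == 0 && rejectedOnce.add(item)) {
                throw new IllegalStateException("no response for offer of " + item);
            }
            stored.add(item);
        };

        Deque<Integer> pending = putAll(Arrays.asList(1, 2, 3, 4, 5), put);
        for (int attempt = 0; attempt < 3 && !pending.isEmpty(); attempt++) {
            pending = putAll(pending, put); // re-put only the failed items
        }
        System.out.println(stored + " pending=" + pending);
    }
}
```

Running main prints `[1, 3, 5, 2, 4] pending=[]`: the two rejected items are stored on the second pass.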

Fuad, please advise on this situation.

Best regards,

Alexey

Алексей Крылов

Mar 2, 2012, 4:36:32 PM

to haze...@googlegroups.com

I have filed an issue with a JUnit test that reproduces this situation:

http://code.google.com/p/hazelcast/issues/detail?id=804

Best regards,
Alexey
