Do I understand these stack traces correctly? (and how do I speed things up?)


kjko...@gmail.com

Nov 9, 2013, 6:12:50 PM
to haze...@googlegroups.com
Dear All,

I need some help with some stack traces that I see when I put load on my system. On Peter Veentjer's advice I am using an EntryProcessor, and I am quite happy with its performance. I am now load testing my application and trying to explain some of the stack traces I am seeing. In particular, I would like to know how to speed this stuff up. :-)

In the stack dumps taken during the load test I find more or less equal portions of the two traces shown below. Some of the lines are suppressed to reduce the size and to improve readability of the stack traces.

stack trace form 1:
"http-bio-80-exec-915" daemon prio=10 tid=0x0000000002542000 nid=0x45c1 waiting on condition [0x00007fc301999000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000f9222898> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.waitForResponse(InvocationImpl.java:326)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:294)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:286)
at com.hazelcast.map.proxy.MapProxySupport.executeOnKeyInternal(MapProxySupport.java:592)
at com.hazelcast.map.proxy.MapProxyImpl.executeOnKeyInternal(MapProxyImpl.java:44)
at com.hazelcast.map.proxy.MapProxyImpl.executeOnKey(MapProxyImpl.java:485)
at org.kjkoster.foo.entities.SeatAllocator.buySeats(SeatAllocator.java:241)
at org.kjkoster.foo.api.HazelcastAPI.buySeats(HazelcastAPI.java:188)
at org.kjkoster.foo.servlets.TicketServlet.doPost(TicketServlet.java:97)
in tomcat...
- locked <0x00000000ff7634e8> (a org.apache.tomcat.util.net.SocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
in java...

stack trace form 2:
"http-bio-80-exec-896" daemon prio=10 tid=0x0000000001b1f000 nid=0x45a3 waiting on condition [0x00007fc302eae000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000f8551398> (a java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:588)
at com.hazelcast.spi.impl.OperationServiceImpl.waitForBackups(OperationServiceImpl.java:636)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.waitForBackupsAndGetResponse(InvocationImpl.java:393)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:286)
at com.hazelcast.map.proxy.MapProxySupport.executeOnKeyInternal(MapProxySupport.java:592)
at com.hazelcast.map.proxy.MapProxyImpl.executeOnKeyInternal(MapProxyImpl.java:44)
at com.hazelcast.map.proxy.MapProxyImpl.executeOnKey(MapProxyImpl.java:485)
at org.kjkoster.foo.entities.SeatAllocator.buySeats(SeatAllocator.java:241)
at org.kjkoster.foo.api.HazelcastAPI.buySeats(HazelcastAPI.java:188)
at org.kjkoster.foo.servlets.TicketServlet.doPost(TicketServlet.java:97)
in tomcat...
- locked <0x00000000f4a85918> (a org.apache.tomcat.util.net.SocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
in java...

There are other traces, but only one or two instances of each. The two above show up dozens of times in every stack dump I take under load.

From reading the code, I gather that my app ends up waiting for the other node to respond to executeOnKey() calls (well, there are only two nodes in the cluster). Is that observation correct?

Where do I start to improve performance here? Any ideas?

Kees Jan

Peter Veentjer

Nov 10, 2013, 3:25:25 AM
to haze...@googlegroups.com
Your observations are correct.

This is the Hazelcast 3 architecture for executing operations. Although I love its simplicity and the guarantees it provides, it has quite an influence on performance. Normally two queues pop up when I do benchmarking: the pending operation (request) queue and the response queue. I hope that in the future we can find solutions for that. I would also like to see optimizations for local calls: a local call should not need to go through the request/response queues, but should be able to execute the operation directly from the calling thread. But this would complicate the architecture, and currently everyone really enjoys the current architecture because it is easy to understand and we can very easily add new functionality without worrying.
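The pattern Peter describes can be sketched in plain Java. This is a simplified illustration, not Hazelcast's actual implementation: the caller hands an operation to a per-partition worker thread via a request queue and then parks polling a response queue, which is exactly where stack trace 1 spends its time (`LinkedBlockingQueue.poll` inside `InvocationFuture.get`).

```java
import java.util.concurrent.*;

// Simplified model of the request/response-queue pattern (NOT Hazelcast's
// actual code): a caller puts an operation on a request queue, a single
// worker thread per partition executes it, and the caller blocks polling
// a response queue until the result arrives.
public class RequestResponseModel {
    private final BlockingQueue<Runnable> requestQueue = new LinkedBlockingQueue<>();

    public RequestResponseModel() {
        Thread partitionThread = new Thread(() -> {
            try {
                while (true) {
                    // Only one operation runs at a time per partition thread.
                    requestQueue.take().run();
                }
            } catch (InterruptedException ignored) {
            }
        });
        partitionThread.setDaemon(true);
        partitionThread.start();
    }

    public <T> T invoke(Callable<T> operation) throws Exception {
        BlockingQueue<T> responseQueue = new LinkedBlockingQueue<>(1);
        requestQueue.put(() -> {
            try {
                responseQueue.put(operation.call());
            } catch (Exception e) {
                // In this sketch, failures simply surface on the worker thread.
                throw new RuntimeException(e);
            }
        });
        // The calling thread parks here, just like InvocationFuture.get()
        // polling its response queue in stack trace 1.
        T response = responseQueue.poll(5, TimeUnit.SECONDS);
        if (response == null) throw new TimeoutException("no response");
        return response;
    }

    public static void main(String[] args) throws Exception {
        RequestResponseModel model = new RequestResponseModel();
        System.out.println(model.invoke(() -> 2 + 2)); // prints 4
    }
}
```

The caller never executes the operation itself, even for a local partition; it always pays the queue round-trip, which is the local-call optimization Peter mentions.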

Another optimization I would like to add is the ability to execute concurrent operations within a partition, because currently only one operation can be executed at any given moment. E.g. a read could be executed concurrently with other reads, and potentially with writes: since the old byte array may still be lying around, a read could return the previous value while a write is in progress. But this would introduce a lot of complexity on all levels as well.

But what kind of performance do you currently get and why do you need more? And can you explain a bit more about your problem?



kjko...@gmail.com

Nov 10, 2013, 5:42:01 AM
to haze...@googlegroups.com
Dear Peter,


This is the Hazelcast 3 architecture for executing operations. Although I love its simplicity and the guarantees it provides, it has quite an influence on performance. Normally two queues pop up when I do benchmarking: the pending operation (request) queue and the response queue. I hope that in the future we can find solutions for that. I would also like to see optimizations for local calls: a local call should not need to go through the request/response queues, but should be able to execute the operation directly from the calling thread. But this would complicate the architecture, and currently everyone really enjoys the current architecture because it is easy to understand and we can very easily add new functionality without worrying.

The use-case I have forces the data on both nodes in my cluster to be in sync. They load balance, and one may be switched off at any time with no advance warning, and thus no time to signal the peer node.
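(For context: synchronous versus asynchronous backups are a per-map setting in Hazelcast 3. A sketch of the relevant hazelcast.xml fragment, with a hypothetical map name, showing the configuration whose cost is the waitForBackups() in trace 2:)

```xml
<hazelcast>
  <map name="seats"> <!-- hypothetical map name -->
    <!-- one synchronous backup: executeOnKey() blocks until the backup
         is acknowledged, which is the Semaphore wait in stack trace 2 -->
    <backup-count>1</backup-count>
    <!-- switching to async backups would remove that wait, but risks
         losing the most recent writes if a node dies before the backup
         lands -- not an option when both nodes must stay in sync -->
    <async-backup-count>0</async-backup-count>
  </map>
</hazelcast>
```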
 

Another optimization I would like to add is the ability to execute concurrent operations within a partition, because currently only one operation can be executed at any given moment. E.g. a read could be executed concurrently with other reads, and potentially with writes: since the old byte array may still be lying around, a read could return the previous value while a write is in progress. But this would introduce a lot of complexity on all levels as well.

Heinz Kabutz did a nice talk on Java 8's StampedLock at Jfokus this year; I just watched it last night. In particular, he shows that it is relatively easy to do optimistic reads safely using that construct: http://parleys.com/play/5148922b0364bc17fc56ca4f Even if you offered that today it would not help me, because I have write contention and hardly do any reads.
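The optimistic-read idiom from that talk looks roughly like this in plain Java 8 (a minimal sketch close to the StampedLock javadoc example; unrelated to Hazelcast internals):

```java
import java.util.concurrent.locks.StampedLock;

// Optimistic read with StampedLock: read the fields without locking,
// then validate the stamp; if a writer slipped in, retry under a real
// read lock.
public class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    public void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // no blocking, just a stamp
        double cx = x, cy = y;                 // possibly racy reads
        if (!lock.validate(stamp)) {           // did a write intervene?
            stamp = lock.readLock();           // pessimistic fallback
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }
}
```

The win is that uncontended reads never touch the lock's write state, which is exactly why it helps read-heavy workloads and not write-contended ones like the one described here.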
 
 
But what kind of performance do you currently get and why do you need more? And can you explain a bit more about your problem?

I'm using Hazelcast as part of a coding challenge: http://www.chess-ix.com/blog/chess-ix-challenge/ so there really is no upper limit on the throughput I need. :) I have no idea what solutions the other contestants use or what throughput they get, which of course has me worried that my current performance is not enough. Especially since my machines are now running at only 70% CPU, I feel there is 30% extra performance to be gained somewhere. Right now, I bottleneck on Hazelcast.

Speculative allocation has been ruled out by the event organiser (I asked to be sure), which is why I am stuck with having to write the data synchronously on two nodes.

Kees Jan

PS. Not asking anyone to do my challenge for me, of course. Just checking that there is nothing about Hazelcast or its use that I am missing.