High contention / locks on a high-concurrency system


sotretus

Jan 6, 2010, 22:56:13
to spymemcached
We have some heavily loaded Tomcats in our app and we use memcached to
cache important data (duh!). The thing is that almost every request we
get hits memcached at least a few times. We have split our data among
different memcached servers to reduce contention and allow better
performance, and each memcached cluster has 2 nodes.
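
For reference, wiring a spymemcached client against one of these 2-node
clusters looks roughly like the sketch below (hostnames and ports are
placeholders, not our real topology):
{{{
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.List;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

class ClusterAClient {
    // Placeholder hostnames; each of our clusters has 2 memcached nodes.
    static MemcachedClient build() throws IOException {
        List<InetSocketAddress> nodes =
            AddrUtil.getAddresses("cache-a-1:11211 cache-a-2:11211");
        return new MemcachedClient(nodes);
    }
}
}}}
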
With this configuration, we are having lock issues with the memcached
client. Here is the typical stack trace:
{{{
java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00002aab4b463758> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:947)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1239)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:253)
    at net.spy.memcached.MemcachedClient$OperationFuture.get(MemcachedClient.java:1655)
    at net.spy.memcached.MemcachedClient$GetFuture.get(MemcachedClient.java:1708)
}}}
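
From that trace, GetFuture.get() is parked on the CountDownLatch until the
operation completes or the wait runs out. For what it's worth, a get with an
explicitly bounded wait looks roughly like this using asyncGet() (just a
sketch; the 100 ms budget is a made-up example value):
{{{
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import net.spy.memcached.MemcachedClient;

class BoundedGet {
    // Sketch: cap how long a worker thread stays parked waiting for one lookup.
    static Object fetch(MemcachedClient client, String key) {
        Future<Object> f = client.asyncGet(key);
        try {
            return f.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(false);       // give up and treat it as a cache miss
            return null;
        } catch (Exception e) {    // interrupted / execution failure
            f.cancel(false);
            return null;
        }
    }
}
}}}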

A jstack dump might show over 200 of these at a given time (we have
300 worker threads per Tomcat).
At first I thought this could be a networking issue... more latency or
less bandwidth means longer response times, which could mean more time
blocked on I/O. But we ran some tests in our prod environment and the
network seems to be fine. We were able to test it from our client
Tomcat to all the target memcached servers and it looked OK. Now we are
having a hard time identifying what might be causing this issue.
Our memcached box holds 4 memcached servers and has a peak eth0
transfer of 30 Mbit/s (the link has proven to handle a lot more, over
100 Mbit/s of real data transfer).
Our cache servers in that box (all holding different data) handle the
following traffic:
{{{
Server   Hits/s   GET/s   SET/s   Misses/s
1           478     505     454         41
2           207     333     128        128
3          1350    1350    1480          0
4          2870    3210     836        339
}}}
Our CPU is about 95% idle all day, around 0.6% user and 0.35% system.
Load is 0.28 max (4-core machine) and 0.12 avg, with a max of 7.7L eth0
interrupts/second.
Additionally, we are measuring an aggregated 1-hour view of the time we
spend going to memcached (measured around the calls to the
MemcachedClient). These values are measured from a Tomcat, of course,
and include ALL get calls to memcached (all servers, all nodes):
{{{
Units=ms.: (Hits=3058393.0, Avg=43.88392073876706, Total=1.34214276E8,
Min=0.0, Max=3808.0)
}}}
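
To be clear about what that number covers: it is wall-clock time around the
blocking client call, so it includes the time parked on the latch plus the
actual network round trip. Roughly like this (recordMillis() is a made-up
placeholder for our real stats collector):
{{{
import java.util.concurrent.TimeUnit;

import net.spy.memcached.MemcachedClient;

class TimedGet {
    static Object get(MemcachedClient client, String key) {
        long start = System.nanoTime();
        try {
            return client.get(key);   // blocking get; this is what the stats cover
        } finally {
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            recordMillis("memcached.get", ms);
        }
    }

    // Placeholder: stands in for our real aggregating stats collector.
    static void recordMillis(String name, long millis) { }
}
}}}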

That 43 ms AVERAGE time is killing us... not to mention when we go
above it. I believe the value is that high because most of the time is
spent waiting for the lock to be released. Sadly, I have no data on how
much time the I/O part actually took, as that happens inside
spymemcached.

On the last website where we had this issue, we solved it by creating
more memcached clients for each server (around 5 to 10, along the lines
of the pool sketch below), and we stopped having the problem. But on
this site that does not seem to be solving it (or we haven't hit the
sweet spot for the number of clients we should use).
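
To be concrete, that workaround was basically several MemcachedClient
instances per destination with requests spread across them, something like
this sketch (pool size, addresses and names are illustrative, not our real
values):
{{{
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

class ClientPool {
    private final MemcachedClient[] clients;
    private final AtomicInteger next = new AtomicInteger();

    ClientPool(String addresses, int size) throws IOException {
        clients = new MemcachedClient[size];
        for (int i = 0; i < size; i++) {
            clients[i] = new MemcachedClient(AddrUtil.getAddresses(addresses));
        }
    }

    // Round-robin pick; each client has its own connection to the nodes.
    MemcachedClient pick() {
        int i = Math.abs(next.getAndIncrement() % clients.length);
        return clients[i];
    }
}
}}}
The idea is that each client keeps its own connection per node, so, at least
in theory, queueing behind one slow operation only parks a fraction of the
worker threads.
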

Any insight or tips on how to find our bottleneck would be greatly
appreciated. I haven't found a dedicated forum for spymemcached, so I'm
sorry if this is not the correct place to post it.

I've read several posts on this elsewhere
(http://groups.google.com/group/spymemcached/browse_thread/thread/93e100893c7ac778/54163aec33c43e97?lnk=raot&pli=1,
http://code.google.com/p/spymemcached/issues/detail?id=104 and others),
but last time I posted in the wrong place and didn't receive a proper
answer.

Regards
Andres B.

PS. All our servers run in a cloud, on high-I/O instances with the
equivalent of 8 CPUs.

Dustin

Jan 13, 2010, 22:14:54
to spymemcached

On Jan 6, 7:56 pm, sotretus <andres.bernasc...@gmail.com> wrote:
> We have some heavily loaded Tomcats in our app and we use memcached to
> cache important data (duh!). The thing is that almost every request we
> get hits memcached at least a few times. We have split our data among
> different memcached servers to reduce contention and allow better
> performance, and each memcached cluster has 2 nodes.
> With this configuration, we are having lock issues with the memcached
> client. Here is the typical stack trace:

Thanks for all the info. We'll be taking a deeper look into some of
the contention shortly as we're in the process of getting some more
test infrastructure built out.
