It's PHP. I have seen something similar, but in the last couple of weeks it has
"cleared" itself. That could be coincidental with moving to memcached 1.4.1,
code changes, etc. I actually have some Ganglia snapshots of the behavior
you are describing here:
http://2tu.us/pgr
The reason load goes to 35-50 is that Apache starts consuming greater
and greater amounts of memory, which indicates a PHP memory leak. Granted, it
could also have something to do with session garbage collection.
> I'm running memcached 1.2.5 currently (which looks to be a bit out of
> date at this point, so perhaps an update is in order).
>
I think that would be a wise choice.
Vladimir
If you discover this is a TIME_WAIT issue (too many TCP sockets
waiting around in kernel), you can tweak this in the kernel:
# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
# cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
61000 - 32768 = 28232
(these are the defaults on Debian Linux).
So you only have a pool of 28232 sockets to work with, and each will
linger around for 60 seconds in a TIME_WAIT state even after being
close()d on both ends. You can increase your port range and lower
your TIME_WAIT value to buy you a larger window. Something to keep
in mind though for any clients/servers that have a high connect rate.
-Eric
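To put a number on the window described above, here is a minimal Python sketch of the same arithmetic. It assumes, as a simplification, that every new outbound connection holds one ephemeral port for the full linger period; the function names are made up for illustration. One caveat: on stock Linux kernels, as far as I know, the TIME_WAIT period is a fixed 60 seconds and tcp_fin_timeout governs FIN_WAIT_2, so the linger value is passed in as a plain number here rather than read from that sysctl.

```python
from pathlib import Path


def port_pool_size(low: int, high: int) -> int:
    """Ephemeral ports available, using the same subtraction as above."""
    return high - low


def max_connect_rate(pool: int, linger_seconds: int) -> float:
    """Rough ceiling on sustained new connections per second before
    every port in the pool is held by a lingering socket."""
    return pool / linger_seconds


def read_port_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the (low, high) pair from the proc file shown above (Linux only)."""
    low, high = Path(path).read_text().split()
    return int(low), int(high)


if __name__ == "__main__":
    pool = port_pool_size(32768, 61000)
    print(pool, "ports,", round(max_connect_rate(pool, 60)), "connects/sec")
```

With the Debian defaults quoted above this works out to a few hundred new connections per second, which is why persistent connections matter for high connect rates.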
Can you troubleshoot it more carefully without thinking it's specific to
memcached? How'd you track it down to memcached in the first place?
When your load is spiking, what requests are hitting your server? Can you
look at an apache server-status page to see what's in flight, or
re-assemble such a view from the logs?
It smells like you're getting a short flood of traffic. If you can see
what type of traffic you're getting at the time of the load spike you can
reproduce it yourself... Load the page yourself, time how long it takes to
render, then break it down and see what it's doing.
If it's related to memcached, it's still likely to be a bug in how you're
using it internally (looping wrong, or something) - since your load is
related to the number of apache procs, and you claim it's not swapping,
it's either doing disk io or running CPU hard.
-Dormando
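One way to re-assemble such a view from the logs is to bucket access-log lines by minute and then look at which paths dominate the spike minute. The sketch below is only an illustration: it assumes Apache's common/combined access-log timestamp format (not the error-log format shown elsewhere in this thread), and the function names are hypothetical.

```python
import re
from collections import Counter

# Timestamp in Apache's common/combined log format, e.g.
# [13/Sep/2009:14:54:34 -0400]; seconds and zone are dropped from the key.
_MINUTE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")
# Request line, e.g. "GET /index.php HTTP/1.1"; captures the path.
_REQUEST = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+)')


def requests_per_minute(lines):
    """Count requests per minute so a short flood stands out."""
    counts = Counter()
    for line in lines:
        m = _MINUTE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def top_paths(lines, minute, n=5):
    """Most requested paths within one minute bucket."""
    paths = Counter()
    for line in lines:
        m = _MINUTE.search(line)
        r = _REQUEST.search(line)
        if m and r and m.group(1) == minute:
            paths[r.group(1)] += 1
    return paths.most_common(n)
```

Feed it the lines of the access log, find the minute with the highest count, then request the top paths yourself and time them.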
for omg in `seq 1 30` ; do yes > /dev/null & done
observe load hit 30.
-Dormando
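The loop above works because the Linux load average counts tasks that are runnable or in uninterruptible (usually disk) wait, so 30 busy processes give a load of about 30. A small Python sketch for reading the same numbers from /proc/loadavg follows; the function name and the sample line in the comment are illustrative, not taken from the thread.

```python
def parse_loadavg(text: str) -> dict:
    """Parse one line of /proc/loadavg, e.g. '30.12 12.40 5.01 31/412 9876'.

    Fields: 1, 5 and 15 minute load averages, then
    currently-runnable/total scheduling entities, then the last PID used.
    """
    one, five, fifteen, procs, _last_pid = text.split()
    running, total = procs.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "running": int(running),
        "total": int(total),
    }


if __name__ == "__main__":
    # Linux only: read the live values.
    with open("/proc/loadavg") as fh:
        print(parse_loadavg(fh.read()))
```

If load1 tracks the number of busy Apache workers, the workers are either burning CPU or blocked on I/O, which is the distinction drawn earlier in the thread.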
Smells like you're leaking memcached connection objects somewhere, or you
have a ton of servers? During these spikes, can you telnet to memcached
and run the 'stats' command, or can you not connect either?
Try restarting memcached with -c (connection limit) set to 32767 or
somesuch. See if that changes things.
Is your pecl/memcache library fully upgraded?
If you're using memcached 1.2.8 or later the 'stats' output has a value
'listen_disabled_num' - if that value is nonzero, or incrementing, you're
hitting the connection limit on memcached.
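The telnet check can also be scripted. Below is a minimal Python sketch that sends the text-protocol 'stats' command and pulls out listen_disabled_num. The helper names are made up for illustration, and the default port 11211 is assumed from the error messages in this thread.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Turn 'STAT <name> <value>' lines into a dict; other lines are ignored."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2].strip()
    return stats


def fetch_stats(host: str, port: int = 11211, timeout: float = 2.0) -> dict:
    """Connect, send 'stats', and read until the terminating END line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


def listen_disabled(stats: dict) -> int:
    """Value of listen_disabled_num, or 0 if the server does not report it."""
    return int(stats.get("listen_disabled_num", 0))
```

Sampling fetch_stats() twice during a spike and comparing listen_disabled() shows whether the counter is nonzero or still climbing. A connect timeout here answers the other question: whether memcached can be reached at all.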
On Tue, 22 Sep 2009, nsheth wrote:
>
> I've already looked in some detail at that, but haven't been able to
> discern any real pattern. I'll look again, though.
>
> I suspect memcache, as whenever I experience this, I get a flood of
> messages in my error log like:
>
> [Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> memcache_pconnect() [<a href='function.memcache-pconnect'>function.
> memcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown error
> (0) in /var/www/html/memcache.php on line 174, referer: xxxx
>
> On Sep 22, 5:31 pm, dormando <dorma...@rydia.net> wrote:
> > Hey,
> >
> > Can you troubleshoot it more carefully without thinking it's specific to
> > memcached? How'd you track it down to memcached in the first place?
> >
> > When your load is spiking, what requests are hitting your server? Can you
> > look at an apache server-status page to see what's in flight, or
> > re-assemble such a view from the logs?
> >
> > It smells like you're getting a short flood of traffic. If you can see
> > what type of traffic you're getting at the time of the load spike you can
> > reproduce it yourself... Load the page yourself, time how long it takes to
> > render, then break it down and see what it's doing.
> >
> > If it's related to memcached, it's still likely to be a bug in how you're
> > using it internally (looping wrong, or something) - since your load is
> > related to the number of apache procs, and you claim it's not swapping,
> > it's either doing disk io or running CPU hard.
> >
> > -Dormando
> >
> > On Tue, 22 Sep 2009, nsheth wrote:
> >
> > > Hmm, just saw the same issue occur again. Load spiked to 35-40.
> > > (I've set MaxClients to 40 in apache, and looking at the status page,
> > > I see it basically using every thread, so that may explain that load
> > > level).
> >
> > > Going back on the connections, it looks like we've got about 1.2k
> > > connections in various states, so nowhere near any of these limits.
> >
> > > Any other thoughts?
> >
> > > Thanks!
> >
> > > On Sep 18, 3:30 pm, nsheth <nsh...@gmail.com> wrote:
> > > > We weren't experiencing any abnormal connection levels.
> >
> > > > I did upgrade to the latest client and server version 1.4.1. So far
> > > > so good . . .
> >
> > > > On Sep 15, 10:36 pm, nsheth <nsh...@gmail.com> wrote:
> >
> > > > > The machine isn't swapping, actually. I'll try to "catch" it
> > > > > happening next time and see if I can get more information about the
> > > > > connections used . . . and also look into upgrading to 1.4.1,
> > > > > hopefully that helps.
> >
> > > > > On Sep 15, 6:19 pm, Vladimir <vli...@veus.hr> wrote:
> >
> > > > > > I do question whether those would actually cause load to spike up.
> > > > > > Perhaps connection refused but I suspect those two ie. load spike and
> > > > > > connection refused are linked. Please correct if I am wrong. I just
> > > > > > checked my tcp_time_wait metrics and they peak around 600 even during
> > > > > > these load spikes.
> >
> > > > > > Eric Day wrote:
> > > > > > > If you discover this is a TIME_WAIT issue (too many TCP sockets
> > > > > > > waiting around in kernel), you can tweak this in the kernel:
> >
> > > > > > > # cat /proc/sys/net/ipv4/tcp_fin_timeout
> > > > > > > 60
> >
> > > > > > > # cat /proc/sys/net/ipv4/ip_local_port_range
> > > > > > > 32768 61000
> >
> > > > > > > 61000-32768= 28232
> >
> > > > > > > (these are the defaults on Debian Linux).
> >
> > > > > > > So you only have a pool of 28232 sockets to work with, and each will
> > > > > > > linger around for 60 seconds in a TIME_WAIT state even after being
> > > > > > > close()d on both ends. You can increase your port range and lower
> > > > > > > your TIME_WAIT value to buy you a larger window. Something to keep
> > > > > > > in mind though for any clients/servers that have a high connect rate.
> >
> > > > > > > -Eric
> >
> > > > > > > On Tue, Sep 15, 2009 at 08:48:39PM -0400, Vladimir wrote:
> >
> > > > > > >> á áToo many connections in CLOSE_WAIT state ?
> >
> > > > > > >> á áAnyways I would highly recommend installing something like Ganglia to get
> > > > > > >> á ásome types of metrics.
> >
> > > > > > >> á áAlso at 35-50 machine is not doing much other than swapping.
> >
> > > > > > >> á áStephen Johnston wrote:
> >
> > > > > > >> á á áThis is a total long shot, but we spent alot of time figuring out a
> > > > > > >> á á ásimilar issue that ended up being ephemeral port exhaustion.
> >
> > > > > > >> á á áStephen Johnston
> >
> > > > > > >> á á áOn Tue, Sep 15, 2009 at 8:27 PM, Vladimir <vli...@veus.hr> wrote:
> >
> > > > > > >> á á á ánsheth wrote:
> >
> > > > > > >> á á á á áAbout once a day, usually during peak traffic times, I hit some
> > > > > > >> á á á á ámajor
> > > > > > >> á á á á áload issues. áI'm running memached on the same boxes as my
> > > > > > >> á á á á áwebservers. áLoad usually spikes to 35-50, and I see the apache
> > > > > > >> á á á á áerror
> > > > > > >> á á á á álog flooded with messages like the following:
> >
> > > > > > >> á á á á á[Sun Sep 13 14:54:34 2009] [error] [client 10.0.0.2] PHP Warning:
> > > > > > >> á á á á ámemcache_pconnect() [<a href='function.memcache-pconnect'>function.
> > > > > > >> á á á á ámemcache-pconnect</a>]: Can't connect to 10.0.0.5:11211, Unknown
> > > > > > >> á á á á áerror
> > > > > > >> á á á á á(0) in /var/www/html/memcache.php on line 174, referer: xxxx
> >
> > > > > > >> á á á á áAny thoughts? áRestart apache, and everything clears up.
> >
> > > > > > >> á á á áIt's PHP. I have seen something but in last couple weeks it has
> > > > > > >> á á á á"cleared" itself. It could be coincidental with using memcached 1.4.1,
> > > > > > >> á á á ácode changes etc. I actually have some Ganglia snapshots of the
> > > > > > >> á á á ábehavior you are describing here
> >
> > > > > > >> á á á áhttp://2tu.us/pgr
> >
> > > > > > >> á á á áReason why load goes to 35-50 is that Apache starts consuming greater
> > > > > > >> á á á áand greater amounts of memory indicating a PHP memory leak. Granted it
> > > > > > >> á á á ácould also have something to do with session garbage collection.
> >
> > > > > > >> á á á á áI'm running memcached 1.2.5 currently (which looks to be a bit out
> > > > > > >> á á á á áof
> > > > > > >> á á á á ádate at this point, so perhaps an update is in order).
> >
> > > > > > >> á á á áI think that would be a wise choice.
> > > > > > >> á á á áVladimir
>
I'd be interested in what the memory utilization is at the time.
Vladimir