Re: [gevent] gevent and high soft interrupts rate ?

Rayene Ben Rayana

Jul 17, 2012, 4:37:48 AM
to gev...@googlegroups.com
Hi,

I'm still having the problem. I tried monitoring the interrupts in /proc/interrupts. When the rate of "local timer interrupts" exceeds 250 per second and one of the CPU cores is completely busy handling these interrupts (100%), I start getting socket errors (Errno 113, no route to host, and Errno 110, operation timed out).
The system freeze happens because the crawlers retry after some time when they encounter an error, which results in even more interrupts (a snowball effect).

Is anyone having similar issues?

Cheers,
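
For anyone trying to reproduce the measurement described above, here is a minimal sketch of computing a per-CPU interrupt rate from two snapshots of /proc/interrupts. It assumes the usual Linux layout (a header row of CPU names, then one `LABEL: count count ...` row per interrupt source). The function names are hypothetical and not part of gevent or any library.

```python
import time


def parse_irq_counts(text, label="LOC"):
    """Return the per-CPU counters for one row of /proc/interrupts-style text."""
    lines = text.splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == label:
            # Only the first n_cpus fields are counters; the rest is a description.
            return [int(field) for field in rest.split()[:n_cpus]]
    raise KeyError(label)


def per_cpu_rates(before, after, interval):
    """Interrupts per second on each CPU between two snapshots."""
    return [(b - a) / interval for a, b in zip(before, after)]


def sample(path="/proc/interrupts", label="LOC", interval=1.0):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_irq_counts(f.read(), label)
    time.sleep(interval)
    with open(path) as f:
        second = parse_irq_counts(f.read(), label)
    return per_cpu_rates(first, second, interval)
```

Calling `sample()` on a Linux machine returns one rate per core for the "LOC" (local timer interrupts) row. Since /proc/softirqs uses the same column layout, `sample("/proc/softirqs", "NET_RX")` should give the soft interrupt rate for network receive in the same way.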


On Sat, Jul 14, 2012 at 10:45 AM, Rayene Ben Rayana <rayene.b...@gmail.com> wrote:
Hi,

I've developed a script that uses gevent and urllib3 to stress a web server.

I was expecting the bottleneck of this script to be user CPU or system load, but I was surprised by completely different behaviour.

As the number of parallel requests increases, the soft interrupt rate increases dramatically. At the same time, a process called ksoftirqd becomes very active (I'm on Linux/Debian).

I used mpstat to monitor the CPUs and noticed that when soft interrupts reach 100% of one of the CPU cores, the machine freezes for about 20 to 60 seconds. This behaviour is very annoying for my tests.

So my questions are:
1. Do you think gevent (0.13) or libevent can cause these soft interrupts?
2. If so, do you think things would be better with gevent 1.0/libev?
3. Which function calls should I avoid if I want to improve things?

Thanks,

Ian Epperson

Jul 17, 2012, 12:22:21 PM
to gev...@googlegroups.com
Hi Rayene,

I'm not familiar with your problem, but I suggest trying again with the latest version of gevent. Quite a lot has changed, and even if you did come across a bug in the old version, it would not be fixed there. The 1.0 branch is pretty darned stable and should probably be used over 0.13 anyway (there has been some debate on this list about when to make 1.0 "official").

Good luck!

Ian E.



--
This email is intended for the use of the individual addressee(s) named above and may contain information that is confidential, privileged or unsuitable for overly sensitive persons with low self-esteem, no sense of humor or irrational religious beliefs. If you are not the intended recipient, any dissemination, distribution or copying of this email is not authorized (either explicitly or implicitly) and constitutes an irritating social faux pas. Unless the word absquatulation has been used in its correct context somewhere other than in this warning, it does not have any legal or grammatical use and may be ignored. No animals were harmed in the transmission of this email, although the yorkshire terrier next door is living on borrowed time, let me tell you. Those of you with an overwhelming fear of the unknown will be gratified to learn that there is no hidden message revealed by reading this warning backwards, so just ignore that Alert Notice from Microsoft: However, by pouring a complete circle of salt around yourself and your computer you can ensure that no harm befalls you and your pets. If you have received this email in error, please add some nutmeg and egg whites and place it in a warm oven for 40 minutes. Whisk briefly and let it stand for 2 hours before icing.

Rayene Ben Rayana

Jul 18, 2012, 3:40:35 AM
to gev...@googlegroups.com
Thank you Ian, I'll give it a try!

Rayene,

Rayene Ben Rayana

Jul 19, 2012, 10:20:05 AM
to gev...@googlegroups.com
FYI, I just tried with the latest gevent version and I get the same issue. One of the CPU cores is completely consumed by soft interrupts.

root@m1:~# mpstat -P ALL 1
06:05:40 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:05:41 PM  all   20.90    0.00   12.44    0.00    0.00   49.75    0.00    0.00   16.92
06:05:41 PM    0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
06:05:41 PM    1   42.00    0.00   25.00    0.00    0.00    0.00    0.00    0.00   33.00

I will try with newer/better hardware.

Rayene,

Håkan Rosenhorn

Jul 19, 2012, 11:07:13 AM
to gev...@googlegroups.com
Hi,

I once had a similar problem where the number of context switches/interrupts went up. After some investigation I found some faulty code that tried to recv on a socket when there was no more data. Might be unrelated, but it crossed my mind.

Code was doing something like:
while True:
    data = sock.recv(4096)  # returns empty data once the peer has closed
    buffer += data          # bug: empty data is never checked, so the loop spins
    if len(buffer) >= expected:
        break

And regarding your 100% CPU usage, I assume you are using threads or fork() to utilize the other cores?
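
A corrected version of the receive loop above might look like the following. This is only a sketch: the helper name `recv_exactly` and the 4096-byte chunk size are arbitrary choices for illustration, and unlike the original loop it never reads past `expected` bytes. The essential change is the check for an empty result, which is how `recv()` signals that the peer has closed the connection.

```python
import socket


def recv_exactly(sock, expected, chunk_size=4096):
    """Read until `expected` bytes have arrived, or fail if the peer closes."""
    buffer = b""
    while len(buffer) < expected:
        data = sock.recv(min(chunk_size, expected - len(buffer)))
        if not data:
            # Empty result: the peer closed the connection. Without this
            # check the loop would keep calling recv() forever.
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (len(buffer), expected)
            )
        buffer += data
    return buffer
```

Raising an exception is one possible design; returning the partial buffer would also stop the busy loop, at the cost of making short reads easier to miss.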


Rayene Ben Rayana

Aug 22, 2012, 3:42:30 AM
to gev...@googlegroups.com
Hi Hakan,

Just for your information, the problem came from using too many virtual network interfaces on Linux (macvlan).
When I replaced the macvlan interfaces with simple IP aliases, I reached 40,000 simultaneous clients, whereas I barely reached 1,000 with macvlan.

Problem solved! I just lost the ability to have a dedicated MAC address for each client.

Cheers,
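
With IP aliases, one way to give each simulated client its own source address is to bind the socket to that address before connecting. The sketch below uses the standard library's `socket.create_connection` with its `source_address` parameter. The thread does not say how the source addresses were actually assigned, so this is an assumption, and the helper name `connect_from` is hypothetical. The alias addresses must already be configured on the interface.

```python
import socket


def connect_from(source_ip, host, port, timeout=10.0):
    """Open a TCP connection whose local end is bound to `source_ip`."""
    # Port 0 lets the kernel pick a free ephemeral port on that address.
    return socket.create_connection(
        (host, port), timeout=timeout, source_address=(source_ip, 0)
    )
```

The returned socket reports the chosen local address through `getsockname()`, which is a quick way to confirm that a given client really uses its alias.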

On Thu, Jul 19, 2012 at 5:07 PM, Håkan Rosenhorn <hakan.r...@esportnetwork.com> wrote:
> Hi,
>
> I once had a similar problem where the number of context switches/interrupts went up. After some investigation I found some faulty code that tried to recv on a socket when there was no more data. Might be unrelated, but it crossed my mind.

Yes, the voluntary context switch rate for this process is quite high (around 100/s), according to /proc/<pid>/status.

> Code was doing something like:
> while True:
>     data = sock.recv(4096)  # returns empty data once the peer has closed
>     buffer += data          # bug: empty data is never checked, so the loop spins
>     if len(buffer) >= expected:
>         break

I don't use sockets directly. I use urllib3, which is based on the standard httplib. However, your example helps me understand the problem.
Maybe these libraries call the socket API too often, which results in a large number of gevent context switches.

> And regarding your 100% CPU usage, I assume you are using threads or fork() to utilize the other cores?

I manually launch one process per CPU core.
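
For reference, the context switch counters mentioned in this message can be read from /proc/<pid>/status programmatically. The sketch below assumes the standard Linux field names `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches`; the function names are hypothetical. The counters are cumulative, so a rate such as "100/s" comes from the difference between two reads divided by the time between them.

```python
def parse_ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counters[key] = int(value)  # int() ignores the surrounding whitespace
    return counters


def read_ctxt_switches(pid="self"):
    """Read the counters for a process; "self" means the calling process (Linux only)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_ctxt_switches(f.read())
```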