Performance regression upgrading from gevent 0.13.8 to 1.0.2


Steve Niemitz

Jul 7, 2015, 5:05:38 PM
to gev...@googlegroups.com
I'm in the process of upgrading a fairly large Python web app (using gevent's WSGI server) to gevent 1.0.2.  In doing so, I noticed a fairly large performance regression in our response times.

Below is a comparison of the mean, 50th, 75th, 90th, and 99th percentile response times (in milliseconds) from a load test simulating 3 concurrent requests.  I'd like to emphasize that the code and environment (Ubuntu 12.04, x64) are identical between runs; the only thing that changed was the version of gevent installed.

0.13.8:
Done 998 requests (0 failures) in 34.99 seconds [28.52 req/sec]
Request percentiles:
 mean: 101.21
  50%: 90.28
  75%: 142.04
  90%: 200.37
  99%: 333.82

1.0.2:
Done 998 requests (0 failures) in 44.58 seconds [22.38 req/sec]
Request percentiles:
 mean: 133.20
  50%: 110.04
  75%: 179.77
  90%: 245.06
  99%: 766.39


Digging in more, I noticed that the performance of gevent.sleep(0) changed drastically between 0.13.8 and 1.0.2.  I wrote a very small test app, running in a bare Python interpreter with only gevent imported:

import gevent
import time

def test():
  # Time 100,000 consecutive gevent.sleep(0) calls (cooperative yields).
  start = time.time()
  for n in xrange(100000):
    gevent.sleep(0)
  return time.time() - start

# Run the measurement ten times and print each run's total time in seconds.
print [test() for _ in xrange(10)]

In gevent 0.13.8, I get a nice, consistent run time:
[0.9938888549804688, 0.9902729988098145, 0.9916880130767822, 0.9745519161224365, 0.9556670188903809, 0.9612908363342285, 0.959618091583252, 0.9578731060028076, 0.9588639736175537, 0.9585850238800049]

In gevent 1.0.2, the times are all over the place:
[0.5551230907440186, 0.5567431449890137, 0.5528249740600586, 0.5553970336914062, 0.5544919967651367, 0.5473408699035645, 0.5450968742370605, 3.153994083404541, 4.591259002685547, 3.9078118801116943]

[4.665871858596802, 3.5736589431762695, 3.468141794204712, 4.282025098800659, 4.229030132293701, 3.214601993560791, 0.5512881278991699, 0.5353419780731201, 0.5437362194061279, 0.5354230403900146]

...etc

If this holds true in our application, it could explain the large performance difference at the 99th percentile; we use gevent.sleep(0) frequently throughout our code.  Does anyone have any thoughts on what the issue might be?  I'm happy to provide more info/debugging if needed.

Jason Madden

Jul 8, 2015, 7:40:51 PM
to gev...@googlegroups.com
The generally recognized intent of `sleep(0)` is to yield time to other greenlets without actually sleeping any longer than required. In 0.13.8, this was implemented as a 0-second timer. However, that had some problems. First, it required a complete trip around the event loop, which, depending on how many callbacks were waiting, could take an arbitrary amount of time. Second, because of the way callbacks (which handle most greenlet switching) were implemented, callbacks and sleeps were mutually exclusive, meaning they could starve each other. Finally, on Windows, a small timer led libev to put the process to sleep for some minimum amount of time. (I'm getting much of this information from issue #202.)

This was changed a few times before 1.0, with a changelog entry for 1.0rc2: `sleep(0)` is now implemented as a callback itself, serving simply as a scheduling primitive. It switches the current greenlet to the end of the callback run list; if the event loop is currently running callbacks and there are no other callbacks to run, the current greenlet can run again immediately. On the other hand, if there are many callbacks ahead of it, it can sometimes mean a full trip around the loop. This introduces some variability to `sleep(0)`, but overall makes it more fair: I/O and other greenlets are less likely to be starved.
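
Roughly speaking, the difference looks like the sketch below. This is only an illustration in terms of gevent 1.x's hub APIs (`get_hub`, `Waiter`, `loop.timer`, `loop.run_callback`), not a copy of either version's actual source:

from gevent.hub import get_hub, Waiter

def sleep0_old_style():
    # 0.13.x-style: arm a 0-second timer; the greenlet only resumes after the
    # event loop completes an iteration and fires the timer.
    hub = get_hub()
    hub.wait(hub.loop.timer(0))

def sleep0_new_style():
    # 1.0.x-style: append a switch back to this greenlet to the end of the
    # loop's callback list; if the loop is already running callbacks and
    # nothing else is queued, it can resume without a full loop iteration.
    hub = get_hub()
    waiter = Waiter()
    hub.loop.run_callback(waiter.switch)
    waiter.get()

Both of these must be called from a regular greenlet (not the hub), just like `gevent.sleep(0)`.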

FWIW, with no other greenlets doing anything, I wouldn't expect the extreme kind of variability shown in your example. Running that example locally, so far I haven't seen it (on gevent 1.1a2); all of the times are around 0.6 seconds, give or take 0.1-0.2.

Hope this helps,
Jason

Steve Niemitz

Jul 8, 2015, 11:02:14 PM
to gev...@googlegroups.com
Interesting: I can reproduce your (much better) results on my Mac running gevent 1.0.2.  IIRC it's running Python 2.7.6 instead of 2.7.3, so that's a small difference; it's the same version of greenlet though (0.4.7).  Do you think having libev-dev installed prior to installing gevent (via pip) would make a difference?  It seems like it shouldn't?

Thanks for the reply!  I'll follow up with more info tomorrow.

Jason Madden

Jul 9, 2015, 8:56:00 AM
to gev...@googlegroups.com

> On Jul 8, 2015, at 22:02, Steve Niemitz <snie...@twitter.com> wrote:
>
> Interesting, I can reproduce your (much better) results on my mac running gevent 1.0.2. IIRC its running python 2.7.6 instead of 2.7.3, so that's a small difference, same version of greenlet though (0.4.7). Do you think having libev-dev installed prior to installing gevent (via pip) would make a difference? It seems like it shouldn't?

I don't see why it would. gevent ships with its own copy of libev that it should compile and use unless you go out of your way to set environment variables otherwise.
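
(If you want to double-check at runtime, `LIBEV_EMBED` is one of the constants gevent.core exposes, so something like this should tell you which libev you ended up with:)

from gevent import core
print core.LIBEV_EMBED  # True means gevent compiled and is using its bundled libev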

Steve Niemitz

Jul 9, 2015, 10:34:22 AM
to gev...@googlegroups.com
I was able to repro this with a very simple Dockerfile.  I also get similar results in my Ubuntu 12 VM on either Python 2.7.3 or 2.7.6.

vagrant@precise64:~$ docker build -t gevent-repro:latest https://github.com/steveniemitz/gevent-repro.git
<... snip ...>

vagrant@precise64:~$ docker run --rm gevent-repro:latest
[2.0547568798065186, 1.9169600009918213, 2.519369125366211, 1.7542948722839355, 0.9130430221557617, 1.1009140014648438, 1.0500757694244385, 0.9520280361175537, 1.048987865447998, 1.242807149887085]
vagrant@precise64:~$ docker run --rm gevent-repro:latest
[0.8453240394592285, 2.0958290100097656, 0.8345727920532227, 2.4779369831085205, 2.6614830493927, 2.288416862487793, 2.3216190338134766, 2.1939380168914795, 0.6827738285064697, 0.6621580123901367]

I'll keep investigating.

Steve Niemitz

Jul 9, 2015, 3:24:47 PM
to gev...@googlegroups.com
So the biggest difference I've seen so far between Ubuntu and OS X is the set of flags libev was built with, although I'm not sure of the significance of many of them.  I dumped the gevent.core constants with:

from pprint import pprint
from gevent import core

pprint({c: getattr(core, c) for c in dir(core)})

On OS X:
{'ASYNC': 524288,
 'BACKEND_EPOLL': 4,
 'BACKEND_KQUEUE': 8,
 'BACKEND_POLL': 2,
 'BACKEND_PORT': 32,
 'BACKEND_SELECT': 1,
 'CHECK': 32768,
 'CHILD': 2048,
 'CLEANUP': 262144,
 'CUSTOM': 16777216,
 'EMBED': 65536,
 'ERROR': -2147483648,
 'EVENTS': gevent.core.EVENTS,
 'EV_USE_4HEAP': 0,
 'EV_USE_CLOCK_SYSCALL': 0,
 'EV_USE_EVENTFD': 0,
 'EV_USE_FLOOR': 1,
 'EV_USE_INOTIFY': 0,
 'EV_USE_MONOTONIC': 0,
 'EV_USE_NANOSLEEP': 64,
 'EV_USE_REALTIME': 0,
 'EV_USE_SIGNALFD': 0,
 'FORK': 131072,
 'FORKCHECK': 33554432,
 'IDLE': 8192,
 'LIBEV_EMBED': True,
 'MAXPRI': 2,
 'MINPRI': -2,
 'NOINOTIFY': 1048576,
 'NONE': 0,
 'NOSIGMASK': 4194304,
 'PERIODIC': 512,
 'PREPARE': 16384,
 'READ': 1,
 'READWRITE': 3,
 'SIGNAL': 1024,
 'SIGNALFD': 2097152,
 'STAT': 4096,
 'TIMER': 256,
 'UNDEF': -1,
 'WRITE': 2,

And Ubuntu 12:
{'ASYNC': 524288,
 'BACKEND_EPOLL': 4,
 'BACKEND_KQUEUE': 8,
 'BACKEND_POLL': 2,
 'BACKEND_PORT': 32,
 'BACKEND_SELECT': 1,
 'CHECK': 32768,
 'CHILD': 2048,
 'CLEANUP': 262144,
 'CUSTOM': 16777216,
 'EMBED': 65536,
 'ERROR': -2147483648,
 'EVENTS': gevent.core.EVENTS,
 'EV_USE_4HEAP': 2,
 'EV_USE_CLOCK_SYSCALL': 1,
 'EV_USE_EVENTFD': 64,
 'EV_USE_FLOOR': 1,
 'EV_USE_INOTIFY': 64,
 'EV_USE_MONOTONIC': 1,
 'EV_USE_NANOSLEEP': 64,
 'EV_USE_REALTIME': 0,
 'EV_USE_SIGNALFD': 64,
 'FORK': 131072,
 'FORKCHECK': 33554432,
 'IDLE': 8192,
 'LIBEV_EMBED': True,
 'MAXPRI': 2,
 'MINPRI': -2,
 'NOINOTIFY': 1048576,
 'NONE': 0,
 'NOSIGMASK': 4194304,
 'PERIODIC': 512,
 'PREPARE': 16384,
 'READ': 1,
 'READWRITE': 3,
 'SIGNAL': 1024,
 'SIGNALFD': 2097152,
 'STAT': 4096,
 'TIMER': 256,
 'UNDEF': -1,
 'WRITE': 2,

Are any of those flags of any significance?  I noticed many of the EV_USE_XXX values differ between the two.
I've now also reproduced this in an Ubuntu 14.04 VM.
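
For quick reference, here are just the EV_USE_* values side by side (copied from the two dumps above), with a trivial snippet to print the ones that differ:

osx = {'EV_USE_4HEAP': 0, 'EV_USE_CLOCK_SYSCALL': 0, 'EV_USE_EVENTFD': 0,
       'EV_USE_FLOOR': 1, 'EV_USE_INOTIFY': 0, 'EV_USE_MONOTONIC': 0,
       'EV_USE_NANOSLEEP': 64, 'EV_USE_REALTIME': 0, 'EV_USE_SIGNALFD': 0}
ubuntu = {'EV_USE_4HEAP': 2, 'EV_USE_CLOCK_SYSCALL': 1, 'EV_USE_EVENTFD': 64,
          'EV_USE_FLOOR': 1, 'EV_USE_INOTIFY': 64, 'EV_USE_MONOTONIC': 1,
          'EV_USE_NANOSLEEP': 64, 'EV_USE_REALTIME': 0, 'EV_USE_SIGNALFD': 64}
for name in sorted(osx):
    if osx[name] != ubuntu[name]:
        print '%s: osx=%s ubuntu=%s' % (name, osx[name], ubuntu[name])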

Jason Madden

Jul 9, 2015, 4:13:12 PM
to gev...@googlegroups.com

> On Jul 9, 2015, at 14:24, Steve Niemitz <snie...@twitter.com> wrote:
>
> Are any of those flags of any significance?

It seems possible :)

> I noticed many of the EV_USE_XXX are different between the two

I'm not really a libev expert; I'm just learning my way around it (a maze of twisty little ifdefs and shortcut macros that all look alike :). But two flags jump out at me for this situation. EV_USE_MONOTONIC and EV_USE_CLOCK_SYSCALL control the way libev updates its timers on every iteration through the loop. If those flags aren't defined (as on OS X), a simpler algorithm is used, presumably in acknowledgment of lower clock resolution. If they are defined (as on Linux), a more complicated algorithm is used that involves multiple retries and extra system calls. I can't say for sure whether that's the source of the jitter you're observing, but it might be worth compiling with those flags turned off and seeing what happens. I think something like `CFLAGS="-DEV_USE_MONOTONIC=0 -DEV_USE_CLOCK_SYSCALL=0" python ./setup.py install` should work.
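
(After rebuilding, the same gevent.core introspection from earlier in the thread can confirm whether the override took effect:)

from gevent import core
# Both should print 0 if the CFLAGS override was picked up at compile time.
for name in ('EV_USE_MONOTONIC', 'EV_USE_CLOCK_SYSCALL'):
    print name, getattr(core, name)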

Steve Niemitz

Jul 9, 2015, 4:27:57 PM
to gev...@googlegroups.com
Awesome, trying now :)

Steve Niemitz

Jul 9, 2015, 4:43:52 PM
to gev...@googlegroups.com
No go on that one :(
I ended up with this config:

 'EV_USE_4HEAP': 0,
 'EV_USE_CLOCK_SYSCALL': 0,
 'EV_USE_EVENTFD': 0,
 'EV_USE_FLOOR': 1,
 'EV_USE_INOTIFY': 0,
 'EV_USE_MONOTONIC': 0,
 'EV_USE_NANOSLEEP': 64,
 'EV_USE_REALTIME': 0,
 'EV_USE_SIGNALFD': 0,

And still similar results.

Jason Madden

Jul 9, 2015, 5:36:22 PM
to gev...@googlegroups.com

> On Jul 9, 2015, at 15:43, Steve Niemitz <snie...@twitter.com> wrote:
>
> No go on that one :(
> I ended up with this config:
>
> 'EV_USE_4HEAP': 0,
> 'EV_USE_CLOCK_SYSCALL': 0,
> 'EV_USE_EVENTFD': 0,
> 'EV_USE_FLOOR': 1,
> 'EV_USE_INOTIFY': 0,
> 'EV_USE_MONOTONIC': 0,
> 'EV_USE_NANOSLEEP': 64,
> 'EV_USE_REALTIME': 0,
> 'EV_USE_SIGNALFD': 0,
>
> And still similar results.
>

Well, it was just a theory. Too bad it didn't pan out.

I checked on one of our own Linux boxes. Interestingly, I'm *unable* to produce this (that is, I get the same low variability as on OS X) on CentOS 6.6 (November 2014 release) using either the stock Python 2.6 or a custom-built Python 2.7.10, both 64-bit. The kernel there is 2.6.32 with glibc 2.12. The EV flags are the same as in the first example (CLOCK and MONOTONIC both 1).

It seems there's some environmental factor at play, but I don't know what it is. The EV constants look the same to me.

{'ASYNC': 524288,
'BACKEND_EPOLL': 4,
'BACKEND_KQUEUE': 8,
'BACKEND_POLL': 2,
'BACKEND_PORT': 32,
'BACKEND_SELECT': 1,
'CHECK': 32768,
'CHILD': 2048,
'CLEANUP': 262144,
'CUSTOM': 16777216,
'EMBED': 65536,
'ERROR': -2147483648,
'EVENTS': gevent.core.EVENTS,
'EV_USE_4HEAP': 2,
'EV_USE_CLOCK_SYSCALL': 1,
'EV_USE_EVENTFD': 64,
'EV_USE_FLOOR': 1,
'EV_USE_INOTIFY': 64,
'EV_USE_MONOTONIC': 1,
'EV_USE_NANOSLEEP': 64,
'EV_USE_REALTIME': 0,

Steve Niemitz

Jul 9, 2015, 5:52:25 PM
to gev...@googlegroups.com
Thanks again for investigating this with me so far.  Just to add some more data, my friend ran some tests on a few of our VMs; this is what we got:

boot2docker centos:
   1.0.2 0.832945299149
  0.13.8 0.579929566383
   1.0.2 0.865805792809
  0.13.8 0.586617588997
   1.0.2 0.909332203865
  0.13.8 0.600034832954
   1.0.2 0.872926926613
  0.13.8 0.641606879234
   1.0.2 0.850731801987
  0.13.8 0.611116147041


boot2docker 15.04:
   1.0.2 0.666646838188
  0.13.8 0.521508479118
   1.0.2 0.682622003555
  0.13.8 0.51507191658
   1.0.2 0.702557969093
  0.13.8 0.519368267059
   1.0.2 0.483837246895
  0.13.8 0.513943505287
   1.0.2 0.481173658371
  0.13.8 0.513987255096


os x:
   1.0.2 0.335381054878
  0.13.8 0.328310465813
   1.0.2 0.344401860237
  0.13.8 0.338456845284
   1.0.2 0.334801054001
  0.13.8 0.327352237701
   1.0.2 0.338637924194
  0.13.8 0.332820987701
   1.0.2 0.336233186722
  0.13.8 0.341444063187


Amazon Linux AMI release 2014.09:
   1.0.2 0.804312229156
  0.13.8 1.05503075123
   1.0.2 0.852017474174
  0.13.8 1.05490190983
   1.0.2 0.788119864464
  0.13.8 1.18320939541
   1.0.2 0.791809844971
  0.13.8 1.08416850567
   1.0.2 0.888424324989
  0.13.8 1.04688000679


And now the weird one, some other random 12.04 instance we have:
   1.0.2 0.587259078026
  0.13.8 0.774785208702
   1.0.2 0.561364197731
  0.13.8 0.746203279495
   1.0.2 0.560575842857
  0.13.8 0.775955438614
   1.0.2 0.594487857819
  0.13.8 0.779750490189
   1.0.2 0.57145447731
  0.13.8 0.763056874275


It seems like the kernel version doesn't matter (which makes sense; it's not making any kernel calls in the inner loop).  Our guess now is that it's some library dependency of libev.
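
For reference, one generic way (not gevent-specific) to see what the compiled core extension actually links against on a Linux box is something like:

import subprocess

import gevent.core

# gevent.core is a compiled extension module; ldd lists the shared libraries
# it links against (requires a Linux system with ldd on the PATH).
print subprocess.check_output(['ldd', gevent.core.__file__])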