cherrypy & concurrency


Mike Nolet

Feb 26, 2014, 1:06:24 PM
to cherryp...@googlegroups.com
First -> I'm new to cherrypy, have only been playing with it for a week or so and I am hugely impressed.  Fantastic.

I'm working on building a REST API on cherrypy and have run into some interesting unexpected behavior.  I searched around and didn't find anything on this so thought I'd come here. Hopefully I spent enough time RTFM...

The behavior I've noticed is cherrypy seems to hit a pretty serious performance wall when it comes to even reasonable levels of concurrency.  

My test function is pretty simple and first does the passed # of calls sequentially, and then in parallel by spawning off one thread per call.   (This is running in a CentOS VM on a Macbook Pro).

>>> for t in range(1,11):
...    test(t*10)
... 
count=10: blocking: 142ms, parallel: 37ms
count=20: blocking: 298ms, parallel: 62ms
count=30: blocking: 448ms, parallel: 84ms
count=40: blocking: 581ms, parallel: 103ms
count=50: blocking: 717ms, parallel: 9254ms
count=60: blocking: 863ms, parallel: 9451ms
count=70: blocking: 1014ms, parallel: 9583ms
count=80: blocking: 1142ms, parallel: 9726ms
count=90: blocking: 1314ms, parallel: 9869ms
count=100: blocking: 9034ms, parallel: 10023ms
>>> 




At first I thought my code must be doing something stupid... but even when I strip this down to the most basic object I get the same results...  in the above I'm calling '/dummy/1'... cherrypy code as follows:

import time

import cherrypy

cherrypy.config.update({'server.socket_host': my_ip_address,
                        'server.socket_port': my_port,
                        'server.thread_pool': 100,
                        'server.socket_queue_size': 50,
                       })


class DummyAPIServer2():

   exposed = True

   def GET(self, id=None):
      time.sleep(0.01)
      return 'Done sleeping!'


if __name__ == '__main__':
   cherrypy.tree.mount(
      DummyAPIServer2(), '/dummy',
      {'/':
         {'request.dispatch': cherrypy.dispatch.MethodDispatcher()}
      }
   )
   cherrypy.engine.start()  # start the server
   cherrypy.engine.block()


While the test is running, CPU on the server is low...  (PID 7308 is the client, 35347 the server)


[:~]$ top -d 0.5 | egrep '(python)'
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7308 mnolet    20   0 3192m  46m 4708 S  2.0  1.2   0:21.47 python3.3
35347 mnolet    20   0 3289m  45m 5656 S  7.9  1.2   0:07.48 python3.3
 7308 mnolet    20   0 3002m  39m 4708 S  4.0  1.0   0:21.49 python3.3
35347 mnolet    20   0 3289m  45m 5656 S 13.9  1.2   0:07.55 python3.3
 7308 mnolet    20   0 2652m  31m 4708 S  6.0  0.8   0:21.52 python3.3
35347 mnolet    20   0 3289m  45m 5656 S 13.9  1.2   0:07.62 python3.3
 7308 mnolet    20   0 2301m  30m 4708 S  5.9  0.8   0:21.55 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  2.0  0.8   0:21.56 python3.3
35347 mnolet    20   0 3289m  45m 5656 S  2.0  1.2   0:07.63 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  2.0  0.8   0:21.57 python3.3
35347 mnolet    20   0 3289m  45m 5656 S  2.0  1.2   0:07.64 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  4.0  0.8   0:21.59 python3.3
35347 mnolet    20   0 3289m  45m 5656 S  2.0  1.2   0:07.65 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  4.0  0.8   0:21.61 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  4.0  0.8   0:21.63 python3.3
35347 mnolet    20   0 3289m  45m 5656 S  2.0  1.2   0:07.66 python3.3
 7308 mnolet    20   0 2231m  30m 4708 S  4.0  0.8   0:21.65 python3.3



I've read plenty of the posts that say "put Apache in front of it" or "put nginx in front of it"... but it feels like something is hanging inside cherrypy here, no? Especially since this is consistently reproducible just by firing a decent # of concurrent requests.


Curious to hear thoughts!


-Mike




PS: My URL classes for reference... please be gentle on my python, still learning =)


class URLRequest(object):

   def __init__(self, url=None, type='GET', params=None, data=None, headers=None, timeout=2.5):
      super().__init__()
      self.type = type
      self.url = url
      self.params = params
      self.data = data
      self.headers = headers
      self.timeout = timeout  # was accepted but never stored

   def request(self):
      if self.type == 'GET':
         return requests.get(self.url, params=self.params, data=self.data, headers=self.headers, timeout=self.timeout)
      elif self.type == 'PUT':
         return requests.put(self.url, params=self.params, data=self.data, headers=self.headers, timeout=self.timeout)
      elif self.type == 'POST':
         return requests.post(self.url, params=self.params, data=self.data, headers=self.headers, timeout=self.timeout)
      elif self.type == 'DELETE':
         return requests.delete(self.url, params=self.params, data=self.data, headers=self.headers, timeout=self.timeout)


class bgURLRequest(URLRequest, Thread):

   callback_obj = None
   callback_func = None
   callback_message = None

   def __init__(self, **kwargs):
      super().__init__(**kwargs)

   def set_callback(self, obj, func, message=None):
      self.callback_obj = obj
      self.callback_func = func
      self.callback_message = message

   def run(self):
      try:
         self.response = self.request()
      except Exception as e:  # TODO -> Better error handling!!
         print("OOPS!")
         self.response = e

      f = getattr(self.callback_obj, self.callback_func)
      if self.callback_message is not None:
         f(self.response, message=self.callback_message)
      else:
         f(self.response)




And test function:


def test(count=10):
   r1 = []
   start_time = time.time()
   for i in range(count):
      u = multiurl.URLRequest(url='http://localhost:8080/dummy/1')
      d = Dummy(u.request())
      r1.append(d)  # was append(Dummy(d)), double-wrapping the result
   time_blocking = (time.time() - start_time) * 1000

   r2 = []
   start_time = time.time()
   for i in range(count):
      u = multiurl.bgURLRequest(url='http://localhost:8080/dummy/1')
      d = Dummy()
      u.set_callback(d, 'callback')
      u.start()
      r2.append(d)
   while len([i for i in r2 if i.waiting_on_response]) > 0:
      time.sleep(0.005)
   time_parallel = (time.time() - start_time) * 1000
   print('count=%s: blocking: %2dms, parallel: %2dms' % (count, time_blocking, time_parallel))
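For what it's worth, the same blocking-vs-parallel comparison can be driven with the standard library's `concurrent.futures`, which avoids the hand-rolled Thread subclass and the polling loop. This is a hedged sketch, not the original client: `fetch()` below is a stand-in for the HTTP call, sleeping as long as the server handler does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch():
    # Stand-in for requests.get('http://localhost:8080/dummy/1');
    # sleeps as long as the server's GET handler.
    time.sleep(0.01)
    return 'Done sleeping!'

def compare(count=10):
    # Blocking: one call after another.
    start = time.time()
    blocking = [fetch() for _ in range(count)]
    time_blocking = (time.time() - start) * 1000

    # Parallel: the executor replaces the Thread subclass, the callback
    # object, and the waiting_on_response polling loop.
    start = time.time()
    with ThreadPoolExecutor(max_workers=count) as pool:
        parallel = list(pool.map(lambda _: fetch(), range(count)))
    time_parallel = (time.time() - start) * 1000
    return time_blocking, time_parallel
```

With `pool.map` the results come back in order and exceptions re-raise on iteration, so none of the `set_callback` machinery is needed.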






David Bolen

Feb 26, 2014, 4:29:42 PM
to cherryp...@googlegroups.com
Mike Nolet <mno...@gmail.com> writes:

> The behavior I've noticed is cherrypy seems to hit a pretty serious
> performance wall when it comes to even reasonable levels of concurrency.

By default, CherryPy's internal web server (see wsgiserver2.py) is
going to create a thread pool with 10 worker threads, and it won't (I
don't believe) grow beyond that automatically. It also has a default
listen queue of 5, so essentially can backlog up to 50 requests before
they start being rejected. I suspect that explains most of the
results.

Given your request handler is guaranteed to take 10ms + "request
overhead" to run - that seems to match the ~14ms/req you see pretty
consistently in your blocking tests up to 90 - though I'm not sure I
understand why you got the jump at 100, but perhaps some other issue
was going on.

For parallel requests, with perfect parallelism (which won't happen
due to the GIL) you'd be looking at ~14ms/10req.  Given they have to be
processed at most 10 at a time due to the thread pool, from 10-40
requests you are landing in a range of 25-37ms per 10 requests (the
highest being at 10 itself which would also have the fewest requests
to amortize fixed overhead across) which seems fairly reasonable.

Now, at 50 you've got a huge leap upwards, but only for parallel
testing. But that's also right around the point where you'd expect to
start dropping requests due to the listen queue filling. So it sort
of feels like a retry mechanism may be involved somewhere. However,
you don't mention errors on the client side, nor am I positive what is
actually issuing the requests (the requests library?), so maybe
there's some internal retry logic built in that is hiding the initial
rejected connections?

So as written, you've got a server that can handle about 400 req/s (at
25ms/10req), but at higher loads than that it'll start rejecting new
connections.

Of course, you can also bump up the thread pool or listen queue size
(either initially or dynamically - there's a grow method you can use
though I don't think CherryPy ever does it itself), but there are
diminishing returns - unless it's to process requests that depend on
external resources (so don't run afoul of the GIL for pure Python
code), it's not necessarily going to help throughput all that much.

So this is where the recommendations to offload the static processing
to a front end server start to come in. For example, if you've got
nginx handling all the plain images, css, javascript, etc.. (which
it's extremely good at doing) and only letting through the dynamic
requests to CherryPy, you leverage the higher overhead of CherryPy's
processing (and threads) for what really needs it. And 400 dynamic
requests/s is a pretty decent load.

-- David

Eric Larson

Feb 27, 2014, 2:50:43 AM
to cherryp...@googlegroups.com

David Bolen writes:

> Mike Nolet <mno...@gmail.com> writes:
>
>> The behavior I've noticed is cherrypy seems to hit a pretty serious
>> performance wall when it comes to even reasonable levels of concurrency.
>
> By default, CherryPy's internal web server (see wsgiserver2.py) is
> going to create a thread pool with 10 worker threads, and it won't (I
> don't believe) grow beyond that automatically. It also has a default
> listen queue of 5, so essentially can backlog up to 50 requests before
> they start being rejected. I suspect that explains most of the
> results.
>

In the original post it looks like the threadpool was set to 100 and the
listen queue was set to 50.

>> cherrypy.config.update({'server.socket_host': my_ip_address,
>> 'server.socket_port': my_port,
>> 'server.thread_pool': 100,
>> 'server.socket_queue_size': 50,
>> })

With that said, I've done similar tests and found that bumping the
threadpool stops being productive after a while. Unfortunately, I don't
have any specific tests or examples to prove this.

That said, in our production apps that are served behind a load
balancer, we typically will have at least one process per core and 30
threads per process. This number *was* found at one point to be optimal
for our uses, but again, I don't have the tests or scripts to prove
this.

It does make some sense that a higher threadpool doesn't automatically
amount to better performance. Each thread is still utilizing the same
CPU. At some point adding more threads really is just adding more
context switches. Essentially this means more time is spent switching
between threads than doing the actual work.
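Eric's point can be demonstrated with a small, purely illustrative experiment (`busy()` is a made-up CPU-bound task, not anything from the thread): on CPython the GIL serializes pure-Python work, so splitting the same computation across threads doesn't finish sooner.

```python
import time
from threading import Thread

def busy(n):
    # CPU-bound stand-in: sum of squares below n.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(fn):
    start = time.time()
    fn()
    return time.time() - start

N = 2_000_000

# Same total work, single-threaded...
t_single = timed(lambda: busy(N))

# ...and split evenly across 8 threads.
def run_threaded(parts=8):
    threads = [Thread(target=busy, args=(N // parts,)) for _ in range(parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

t_threads = timed(run_threaded)
# On CPython, t_threads is typically no better than t_single: only one
# thread runs Python bytecode at a time, and the switches add overhead.
```

(An I/O-bound handler like the `time.sleep` in the test server is the opposite case: sleeping releases the GIL, which is why the parallel numbers in the original test can beat the blocking ones at all.)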

I'd experiment with some sort of load balancer: start a couple of
cherrypy processes with lower thread counts and see if that helps. If
you're still having a problem, you can also try something like uWSGI,
which can be configured with more than one process as well as threads. I
believe you'd lose the benefit of the cherrypy bus (cherrypy.engine),
but that might be reasonable for your use case.

All that said, there are tons of projects that can help manage processes
and load balancing. CherryPy is plenty fast, and the lack of complexity
around multiple processes, alongside a really solid process-handling
model (via cherrypy.engine), makes developing with cherrypy a huge win
over time.

I say this as we have apps that are getting close to 10 years old that
haven't had to go through any major upgrades due to cherrypy. Upgrading
python versions, databases, template systems, etc. have all been
reasonably easy thanks to CherryPy's unrelenting focus on allowing you
to write plain Python code. In all that time, CherryPy has never once
been the bottleneck for any performance problems.

Hope this helps and best of luck!

Eric

David Bolen

Feb 27, 2014, 2:12:22 PM
to cherryp...@googlegroups.com
Eric Larson <er...@ionrock.org> writes:

> In the original post it looks like the threadpool was set to 100 and the
> listen queue was set to 50.

Crud, how did I miss that? My apologies to the OP.

It does seem odd that the stats appear to be suggestive of the default
values - I wonder if somehow that thread configuration isn't actually
taking effect.

Or maybe I was just finding an expected pattern in the data.

> With that said, I've done similar tests and found that bumping the
> threadpool stops being productive after a while. Unfortunately, I don't
> have any specific tests or examples to prove this.

This is similar for me. I do find it strange that there wasn't at
least a little better parallelism in the test case if that many
threads existed, even if it wouldn't help as much in a real production
code base.  Each request handler is sleeping, which should release the
GIL, so quite a few requests should be able to overlap in parallel.

> I say this as we have apps that are getting close to 10 years old that
> haven't had to go through any major upgrades due to cherrypy. Upgrading
> python versions, databases, template systems, etc. have all been
> reasonably easy thanks to CherryPy's unrelenting focus on allowing you
> to write plain Python code. In all that time, CherryPy has never once
> been the bottleneck for any performance problems.

That's been my experience to date as well (though only 5 years on the main
app so far). I do admit, however, that outside of development testing, I
always offload the static stuff to nginx.

-- David

Sylvain Hellegouarch

Feb 27, 2014, 2:42:39 PM
to cherryp...@googlegroups.com

That's been my experience to date as well (though only 5 years on the main
app so far).  I do admit, however, that outside of development testing, I
always offload the static stuff to nginx.

As do I. First of all, because I'm on a shared host most of the time I always go through a reverse proxy anyway, so I might as well. But it also just makes more sense: CherryPy doesn't know about sendfile and other goodies that web servers like nginx or lighttpd know about.

This is why I don't need to raise my thread pool much either. The socket backlog is trickier and requires a lot of testing with the right hardware/OS before finding the sweet spot. I'd be happy to see a CherryPy using asyncio someday; that'd be fun. I wanted to give it a spin, but wsgiserver's code is just... too tangled with the socket interface to make it simple. Sadly.

--
- Sylvain
http://www.defuze.org
http://twitter.com/lawouach

Robert Brewer

Feb 28, 2014, 7:12:57 PM
to cherryp...@googlegroups.com

I don't mean to excuse CherryPy here: it may actually be slower than what you need. But *never* trust a benchmark or load test that is:

    a. run on the same host as the server,
    b. run with multiple Python threads (instead of separate processes),
    c. run over localhost, or
    d. hand-written on the spur of the moment ;)

You have no idea whether the bottlenecks are in your code, the client request library, the GIL in the client, or memory/CPU pressure from the client skewing the performance of the server; nor whether/how much your loopback interface mocks real network latency. Your mileage WILL vary, greatly.


Robert Brewer
fuma...@aminus.org


Mike Nolet

Mar 1, 2014, 3:04:47 PM
to cherryp...@googlegroups.com
On Saturday, March 1, 2014 1:12:57 PM UTC+13, fumanchu wrote:

I don't mean to excuse CherryPy here: it may actually be slower than what you need. But *never* trust a benchmark or load test that is:

    a. run on the same host as the server,
    b. run with multiple Python threads (instead of separate processes),
    c. run over localhost, or
    d. hand-written on the spur of the moment ;)

 
I wouldn't really call this a benchmark.  I ran into the timeouts when I was playing with my client library where I was trying to simulate a heavy web page by throwing 10 parallel requests at the server (just once, not in any kind of benchmark/load test) and every now and then one or two would hang sporadically.



On Thursday, February 27, 2014 10:29:42 AM UTC+13, David wrote:

By default, CherryPy's internal web server (see wsgiserver2.py) is
going to create a thread pool with 10 worker threads, and it won't (I
don't believe) grow beyond that automatically.  It also has a default
listen queue of 5, so essentially can backlog up to 50 requests before
they start being rejected.  I suspect that explains most of the
results.

I upped the thread pool, and it definitely took effect.  When I first started to see the slow requests I upped the pool from 10->50, and the sporadic hanging I saw with 10 parallel requests stopped completely.

 
Now, at 50 you've got a huge leap upwards, but only for parallel
testing.  But that's also right around the point where you'd expect to
start dropping requests due to the listen queue filling.  So it sort
of feels like a retry mechanism may be involved somewhere.  However,
you don't mention errors on the client side, nor am I positive what is
actually issuing the requests (the requests library?), so maybe
there's some internal retry logic built in that is hiding the initial
rejected connections?

No errors are coming back, which is interesting.  I don't think requests has a retry mechanism (I will dig), as I've seen connection errors and the like come back.

You did motivate me to do some thinking, and I've noticed that sometimes my server is waiting for threads to finish before shutting down.  

I ran a few more tests, and with 50 threads in the pool, on initial server *start* I never get a timeout.  If I run test() 2-3 times back to back I consistently get the issue.  Apologies if this sounds stupid as I'm not at all familiar with the cherrypy internals, but this sounds like threads aren't being freed back to the pool fast enough for processing.


So as written, you've got a server that can handle about 400 req/s (at
25ms/10req), but at higher loads than that it'll start rejecting new
connections.

I'm happy with the load.  If my API needs more than 400 rps then I'm a very happy camper.  And of course, if I run two that makes for even higher load =).
 

Of course, you can also bump up the thread pool or listen queue size
(either initially or dynamically - there's a grow method you can use
though I don't think CherryPy ever does it itself), but there are
diminishing returns - unless it's to process requests that depend on
external resources (so don't run afoul of the GIL for pure Python
code), it's not necessarily going to help throughput all that much.

I tried bumping up the pool in my testing and noticed the same thing: past 50, no real change happened.
 
So this is where the recommendations to offload the static processing
to a front end server start to come in.  For example, if you've got
nginx handling all the plain images, css, javascript, etc.. (which
it's extremely good at doing) and only letting through the dynamic
requests to CherryPy, you leverage the higher overhead of CherryPy's
processing (and threads) for what really needs it.  And 400 dynamic
requests/s is a pretty decent load.


Agree with the load point, BUT... sporadic 7-8 second requests are not ok, especially when they happen with relatively low concurrency.   

My spidey sense here says cherrypy is hanging on something in the thread and waiting on a timeout.  Perhaps requests does not close the connection properly?

Aha!  As I type this... I vaguely recall that the requests library supports keep-alive...!!  I bet that explains it. All the threads are sitting there waiting for their client's second request, or for the connection to close, but it never comes.

Will do more testing and update....


 

Mike Nolet

Mar 3, 2014, 2:40:03 PM
to cherryp...@googlegroups.com
Looks like it was indeed a keep-alive issue...  I added a Connection: close header to the requests, and now things scale up as expected, with no timeouts/slow requests.
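For reference, the fix amounts to disabling keep-alive on the client side. A minimal sketch (the helper name is made up; the header works with the requests library):

```python
def with_connection_close(headers=None):
    # Merge a "Connection: close" header into whatever headers the
    # request already carries, so the CherryPy worker thread is freed
    # as soon as the response is sent instead of idling on keep-alive.
    merged = dict(headers or {})
    merged['Connection'] = 'close'
    return merged

# usage with the requests library:
#   requests.get('http://localhost:8080/dummy/1',
#                headers=with_connection_close())
```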

Thanks for all the help.

>>> test(50)
count=50: blocking: 738ms, parallel: 143ms
>>> test(100)
count=100: blocking: 1477ms, parallel: 305ms
>>> test(100)
count=100: blocking: 1484ms, parallel: 291ms
>>> test(1000)
count=1000: blocking: 14619ms, parallel: 3158ms



Separate question --> The implication here is that cherrypy is a bit vulnerable with lots of dangling connections... is there a config/setting to change the behavior to close connections if the thread pool gets low?  Or is this where the answer becomes... "use nginx in front"?

-Mike

Michiel Overtoom

Mar 4, 2014, 4:40:31 AM
to cherryp...@googlegroups.com

On Mar 3, 2014, at 20:40, Mike Nolet wrote:

> "use nginx in front"?

That's what I do. And I start/stop/restart my CherryPy apps using Supervisord.

Greetings,
