Tuning the number of waitress threads for production


Tom Wiltzius

Mar 11, 2016, 4:08:16 PM
to pylons-discuss
Hi there,

Recently I've been going through a round of (attempted) performance tuning of our Pyramid/waitress application (we use it as an API server, with SqlAlchemy as the ORM and nginx sitting in front of waitress as an SSL proxy), and this has left me with a few questions about the nature of the Pyramid/Waitress thread model. These questions are more about waitress than about Pyramid, but I don't see any dedicated waitress discussion lists so thought I'd try here. Please feel free to redirect me if I'm asking in the wrong place.

When loading complex pages, browser clients will send as many as 10 API requests in parallel. I've noticed that when this happens, requests that I know return quickly on their own get "blocked" behind requests that take longer -- the first byte of the response for the later requests arrives only after the earlier responses have finished downloading.

According to the info I can find on the waitress design [1], it has a fixed thread pool to service requests (defaulting to 4). My theory is that if the threads get tied up with a few slow requests, the server can no longer service the faster ones. Bumping the number of waitress threads to 20 (more than the number of requests we ever make in parallel) seems to mitigate the issue from a single client; the faster requests no longer block behind the slower requests.
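For reference, here's roughly how I'm bumping the thread count in our PasteDeploy ini (host and port here are from our setup; adjust as needed):

```ini
[server:main]
use = egg:waitress#main
host = 127.0.0.1
port = 6543
# default is 4; raised so fast requests aren't stuck behind slow ones
threads = 20
```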

However, this "solution" leaves me with more questions than answers. That same design document [1] indicates that waitress worker threads never do any I/O. But our application logic does lots of I/O to talk to the database server on another machine (through SqlAlchemy). So...

- Am I misunderstanding the waitress design? Or are we doing it wrong? 
- Is the Pyramid initialization code only run once (setting up routes, etc.), or is it run once per worker thread? We have a bunch of our own services we initialize at the same time as route registration. We try to run them as singletons, and it all seems to work, but now I'm in doubt about when/where this code is executed (is it in the waitress master process?). I read through the only other topic I could find discussing this [2], but it mostly discusses manually spinning up threads for slow tasks -- I'd like to avoid doing that for all database operations if at all possible.
- Other than increased memory consumption, are there any significant downsides to increasing the number of threads? I thought I read somewhere to set the number of worker threads to the number of CPU cores available, which would make sense if the workload were CPU bound, but our workload is about 50% CPU and 50% database (i.e. I/O) by wall time.
- Is it possible I'm looking in the wrong place entirely, and nginx is actually what's causing request serialization? We're using 4 nginx worker processes with the default (512) number of concurrent connections, so my assumption is this is not the bottleneck.

Any guidance, insight, or further documentation references greatly appreciated!

Thanks,
Tom


Michael Merickel

Mar 11, 2016, 4:34:41 PM
to Pylons
Waitress is multithreaded (it does not fork) with the main thread using asyncore to accept new sockets. A new socket is accepted and dispatched to a free thread in the threadpool. The configured WSGI app is shared between all threads. Since each request is managed in its own thread it is synchronous and free to do synchronous IO.
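A rough stdlib sketch of that shape (this is not waitress's actual code, just an illustration of the dispatch model): one dispatcher feeds a fixed pool of worker threads through a queue, and each worker handles one request at a time, synchronously.

```python
import queue
import threading

def worker(jobs):
    # Each pool thread blocks here until the dispatcher hands it a request.
    while True:
        handler = jobs.get()
        if handler is None:  # shutdown sentinel
            break
        handler()  # synchronous work, including blocking IO, is fine here

jobs = queue.Queue()
pool = [threading.Thread(target=worker, args=(jobs,)) for _ in range(4)]
for t in pool:
    t.start()

results = []
for i in range(8):
    # "dispatch" 8 requests; a slow handler would occupy one worker
    # while the other three keep draining the queue
    jobs.put(lambda i=i: results.append(i))

for _ in pool:
    jobs.put(None)
for t in pool:
    t.join()

print(sorted(results))
```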

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylons-discus...@googlegroups.com.
To post to this group, send email to pylons-...@googlegroups.com.
Visit this group at https://groups.google.com/group/pylons-discuss.
For more options, visit https://groups.google.com/d/optout.

Jonathan Vanasco

Mar 11, 2016, 5:44:24 PM
to pylons-discuss
> My theory is that if the threads get tied up with a few slow requests, the server can no longer service the faster ones.

That's usually the issue.  It's compounded further when you don't pipe things through something like nginx; slow or dropped connections can otherwise tie up resources in the app server.

A few ideas come to mind:

i'd take a look at your nginx config.  there are options to throttle the number of connections per client. (upstream and WAN)
your browser could also have a limit on requests as well, and the keepalive implementation (if enabled on nginx) could be a factor.   are you sure they're being sent in parallel and not serial?
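e.g. something along these lines in nginx (the directives are real, but the addresses and numbers are just placeholders to illustrate):

```nginx
upstream waitress_app {
    server 127.0.0.1:6543;
    keepalive 16;            # reuse connections to the upstream
}

# cap concurrent connections per client IP on the WAN side
limit_conn_zone $binary_remote_addr zone=perip:10m;

server {
    listen 443 ssl;
    location / {
        limit_conn perip 10;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://waitress_app;
    }
}
```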

it's possible that you're having issues with database blocking. 

it's also possible, though i doubt it, that you're running into issues with the GIL. you could try using uwsgi to see if there is any difference.


Tom Wiltzius

Mar 12, 2016, 7:02:35 PM
to pylons-discuss
Thank you both for the information!

It sounds like there isn't any significant downside to increasing the number of waitress threads beyond the number of available CPU cores if we expect them to be I/O bound rather than CPU bound. Is that true?


I will investigate our nginx configuration; perhaps it's limiting the number of requests per client to the upstream server. Thanks for that tip. We're using SPDY 3.1 and I'm testing in Chrome, so I don't think the number of requests should be throttled by the client or by nginx on the WAN side (it should be one, persistent TCP connection).

I haven't tried uWSGI, but I did try gunicorn and switched to using multiple processes instead of multiple threads. That doesn't seem to have changed the timings much, so I don't think we're blocking on the GIL.

The last option is the database or SqlAlchemy; I have not ruled that out yet but I can write a script completely outside the context of the web server that makes similar requests and see how it performs.
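Something like this is the shape of the script I have in mind; the query function here is just a stand-in (in the real script it would run one of our SQLAlchemy queries against the database):

```python
import time

def time_calls(fn, n=20):
    """Call fn() n times and return the per-call wall times in seconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

def fake_query():
    # stand-in for a real SQLAlchemy query; sleeps to simulate DB latency
    time.sleep(0.01)

timings = time_calls(fake_query, n=5)
print("max: %.3fs  avg: %.3fs" % (max(timings), sum(timings) / len(timings)))
```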

Thank you both again for the help.

Bert JW Regeer

Mar 12, 2016, 7:10:39 PM
to pylons-...@googlegroups.com

> On Mar 12, 2016, at 17:02, Tom Wiltzius <tom.wi...@gmail.com> wrote:
>
> Thank you both for the information!
>
> It sounds like there isn't any significant downside to increasing the number of waitress threads beyond the number of available CPU cores if we expect them to be I/O bound rather than CPU bound. Is that true?
>

This is correct.

>
> I will investigate our nginx configuration; perhaps it's limiting the number of requests per client to the upstream server. Thanks for that tip. We're using SPDY 3.1 and I'm testing in Chrome, so I don't think the number of requests should be throttled by the client or by nginx on the WAN side (it should be one, persistent TCP connection).

Depending on where your app is blocked, you may be able to have NGINX buffer the request/response which may help clear out connections within waitress faster.
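Something along these lines (directive names from memory; the upstream address is a placeholder):

```nginx
location / {
    # buffer the full response from waitress so a slow client
    # doesn't hold a worker thread while it trickles bytes out
    proxy_buffering on;
    # likewise buffer the request body before handing it upstream
    proxy_request_buffering on;
    proxy_pass http://127.0.0.1:6543;
}
```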


pnkk...@gmail.com

Apr 28, 2016, 11:50:11 AM
to pylons-discuss
What can happen if we increase the number of waitress threads beyond the number of CPU cores?

Bert JW Regeer

Apr 28, 2016, 11:55:45 AM
to pylons-...@googlegroups.com
Your system will schedule each thread as it sees fit. There is nothing that will “happen”. If you are waiting on IO completion, having more threads than cores could be a way to handle more requests.
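A quick way to see that, with time.sleep standing in for a blocking database call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io():
    # stands in for a blocking call such as a database query
    time.sleep(0.05)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(8):
        pool.submit(blocking_io)
elapsed = time.perf_counter() - start
# 8 overlapping 50 ms waits complete in far less than 8 * 50 ms,
# even on a single core, because threads waiting on IO release the GIL
print("elapsed: %.2fs" % elapsed)
```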

Bert

Otto Krüse

Nov 5, 2016, 12:21:14 PM
to pylons-discuss
I believe that significantly scaling up the number of threads is almost never really useful in CPython.

This is because of the Python GIL (Global Interpreter Lock). In CPython, within a single process, only one thread executes Python bytecode at a time. See http://www.dabeaz.com/GIL/

Having a small number of threads may be useful to prevent blocking your program while waiting on I/O. But this only gets you so far.

Another setup, not running into GIL limitations, would be to run a couple of pserve instances, each one in its own process, and load balance these with nginx. See http://docs.pylonsproject.org/projects/pyramid-cookbook/en/latest/deployment/nginx.html
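For example, something like this in nginx (the ports are placeholders; each pserve instance would listen on its own):

```nginx
upstream pyramid_cluster {
    # one pserve process per core, each on its own port
    server 127.0.0.1:6543;
    server 127.0.0.1:6544;
    server 127.0.0.1:6545;
    server 127.0.0.1:6546;
}

server {
    listen 80;
    location / {
        proxy_pass http://pyramid_cluster;
    }
}
```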

Am interested in how you have fared so far.

Otto