Hi there,
Recently I've been going through a round of (attempted) performance tuning of our Pyramid/waitress application (we use it as an API server, with SQLAlchemy as the ORM and nginx sitting in front of waitress as an SSL-terminating proxy), and this has left me with a few questions about the Pyramid/waitress threading model. These questions are more about waitress than about Pyramid, but I don't see a dedicated waitress discussion list, so I thought I'd try here. Please feel free to redirect me if I'm asking in the wrong place.
When loading complex pages, browser clients will send as many as 10 API requests in parallel. I've noticed that when this happens, requests that I know return quickly on their own get "blocked" behind requests that take longer -- the first byte of the response for the later requests arrives only after the earlier requests have finished downloading.
According to the info I can find on the waitress design [1], it has a fixed thread pool to service requests (defaulting to 4). My theory is that if the threads get tied up with a few slow requests, the server can no longer service the faster ones. Bumping the number of waitress threads to 20 (more than the number of requests we ever make in parallel) seems to mitigate the issue from a single client; the faster requests no longer block behind the slower requests.
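To make the theory concrete, here is a small stdlib sketch (plain `concurrent.futures`, not waitress itself -- the durations and names are made up) that reproduces the behavior I'm describing: with a fixed pool of 4 workers, a fast request queues behind slow ones and its end-to-end latency balloons.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(request_id, duration):
    """Simulate a request handler that blocks for `duration` seconds."""
    time.sleep(duration)
    return request_id

# 4 "worker threads", like waitress's default thread pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    # Four slow requests arrive first and occupy every worker...
    slow = [pool.submit(handle, f"slow-{i}", 0.5) for i in range(4)]
    # ...so this fast request sits in the queue until a worker frees up.
    fast = pool.submit(handle, "fast", 0.01)
    fast.result()
    fast_latency = time.monotonic() - start

# Despite the ~0.01 s handler, end-to-end latency is ~0.5 s.
print(f"fast request latency: {fast_latency:.2f}s")
```

(For reference, the workaround I applied was either `waitress.serve(app, threads=20)` or `threads = 20` under `[server:main]` in the PasteDeploy ini.)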
However, this "solution" leaves me with more questions than answers. That same design document [1] indicates that waitress worker threads never do any I/O, yet our application logic does lots of I/O to talk to the database server on another machine (through SQLAlchemy). So...
- Am I misunderstanding the waitress design? Or are we doing it wrong?
- Is the Pyramid initialization code (setting up routes, etc.) run only once, or once per worker thread? We initialize a number of our own services at the same time as route registration. We try to run them as singletons, and it all seems to work, but now I'm in doubt about when and where this code is executed (is it in the waitress master process?). I read through the only other topic I could find discussing this [2], but it mostly covers manually spinning up threads for slow tasks -- I'd like to avoid doing that for all database operations if at all possible.
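To show exactly what we're relying on, here's a minimal stdlib sketch (no Pyramid; `ServiceRegistry` and `db_engine` are made-up stand-ins for our services) of the pattern in question: one startup-time initialization whose result is shared by all worker threads.

```python
import threading

class ServiceRegistry:
    """App-wide singletons, built once at startup (hypothetical example)."""
    def __init__(self):
        self.init_count = 0
        self.db_engine = None

    def configure(self):
        # In our app this runs alongside route registration in main(),
        # which we *hope* executes exactly once, before any worker
        # thread starts handling requests.
        self.init_count += 1
        self.db_engine = object()  # stand-in for e.g. an SQLAlchemy engine

registry = ServiceRegistry()
registry.configure()  # startup: runs once, in the main thread

seen = []
def worker():
    # Every worker thread sees the same shared registry object.
    seen.append(registry.db_engine)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(registry.init_count)      # 1: configuration ran once
print(len(set(map(id, seen))))  # 1: all threads share one engine
```

If initialization instead ran once per worker thread, `init_count` would equal the thread count and we'd have N copies of each service -- that's the scenario I want to rule out.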
- Other than increased memory consumption, are there any significant downsides to increasing the number of threads? I thought I read somewhere to set the number of worker threads to the number of CPU cores available, which would make sense if the workload were CPU-bound, but our workload is about 50% CPU and 50% database (i.e. I/O) by wall time.
- Is it possible I'm looking in the wrong place entirely, and nginx is actually what's serializing the requests? We're running 4 nginx worker processes with the default number of concurrent connections per worker (512), so my assumption is that nginx is not the bottleneck.
Any guidance, insight, or further documentation references greatly appreciated!
Thanks,
Tom