I have some ideas (below) for how to move forward with this application, but I'd
be interested if anyone has any better ones, especially with respect to getting
the best out of CherryPy.
...
Here's the situation:
I'm trying to understand how best to use CherryPy to handle multiple overlapping
HTTP requests, maybe 200 at a time. The scenario is something like this (use
fixed font for diagram):
+---------+          +----------+          +--------+
| Browser | -->------| CherryPy |---->---- | Web    |
| client  |        | | gateway  | |        | server |
+---------+        | +----------+ |        +--------+
                   |              |
+---------+        |              |        +--------+
| client  | -->----|              |-->---- | server |
+---------+        |              |        +--------+
                   |              |
+---------+        |              |        +--------+
| client  | -->----+              +-->---- | server |
+---------+                                +--------+
     :                                         :
    etc.                                      etc.
Most requests are very lightweight (almost no processing in the CherryPy
gateway), but the response may be delayed by the time taken to get a response
from the ultimate servers. Each client may have several outstanding requests
(Ajax).
Currently I'm using the regular Python library to make the outgoing HTTP
requests from the gateway component; these calls block until the request is
complete (or time out after about 20 seconds). When I load up the system by
generating lots of
browser client requests, the gateway stalls (stops servicing new incoming
requests) and the requests start timing out (my Ajax code times out a request
after about 5 seconds). When the load is lifted, it can take several minutes
for the gateway to recover -- though, eventually, it usually does recover :).
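In outline, each gateway handler does a blocking fetch along these lines (the
library call and URL below are illustrative, not my actual code):

    import urllib2

    def handle_browser_request(backend_url):
        # Illustrative only.  The CherryPy worker thread running this
        # handler stays blocked inside urlopen() until the backend
        # answers, or until the call eventually times out (around 20
        # seconds in the failure case).
        return urllib2.urlopen(backend_url).read()

So with a couple of hundred overlapping browser requests and only a small pool
of handler threads, the stall described above is pretty much what you'd expect
to see.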
...
Here's how I'm thinking of developing this application:
1. Increase the number of CherryPy handler threads. I'm wary of depending
entirely on this "obvious" approach because I don't know how expensive this
would be: is it practical to have 100s or 1000s of such threads?
2. Use Twisted instead of Python libraries for the HTTP requests issued from the
gateway. (Ideally, I'd like to be able to release the CherryPy request handling
thread and pick up the request context when the Twisted "deferred" calls back,
but I'm guessing that's not possible with unmodified CherryPy.) What I believe
Twisted can give me is better control over request timeout: I don't want to
wait 20 seconds to learn that a request has failed; more like 1-2 seconds.
3. Modify the Ajax browser request protocol so that a "pending" response can be
returned quickly, with final information returned in response to a subsequent
request. This presumes some kind of buffering in the gateway (a rough sketch of
what I have in mind follows below).
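To make idea 3 a little more concrete, here is the kind of gateway-side
buffering I have in mind, as a rough framework-agnostic sketch rather than real
CherryPy handler code; every name in it (RequestBuffer, start, poll) is
invented for illustration:

    import threading

    class RequestBuffer(object):
        """Illustrative only (idea 3): reply "pending" at once, buffer the
        real answer in the gateway, and let the browser poll for it later."""

        def __init__(self):
            self._lock = threading.Lock()
            self._results = {}   # token -> ('pending', None) or ('done', result)
            self._counter = 0

        def start(self, work):
            """Run 'work' (the slow backend call) on another thread and
            return a token that the browser can poll with."""
            self._lock.acquire()
            try:
                self._counter += 1
                token = str(self._counter)
                self._results[token] = ('pending', None)
            finally:
                self._lock.release()

            def run():
                result = work()
                self._lock.acquire()
                try:
                    self._results[token] = ('done', result)
                finally:
                    self._lock.release()

            t = threading.Thread(target=run)
            t.setDaemon(True)
            t.start()
            return token

        def poll(self, token):
            """Return ('pending', None) until the result arrives, then
            ('done', result) exactly once."""
            self._lock.acquire()
            try:
                state, result = self._results.get(token, ('pending', None))
                if state == 'done':
                    del self._results[token]
                return state, result
            finally:
                self._lock.release()

The first Ajax request would get the token and a "pending" status back
immediately, freeing the CherryPy handler thread; a later request exchanges the
token for the real result. Note that this on its own only frees the front-end
thread (the slow backend call still occupies a thread inside start()), so it
really wants combining with the non-blocking client of idea 2, and a real
version would also need to expire stale entries.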
Where all this eventually leads is a kind of pubsub event distribution network
built using HTTP protocol, but I'm not going all-out for that immediately.
...
It has been suggested that I develop an alternative HTTP server to the CherryPy
default for handling these asynchronous requests. This might be a choice I must
make, but I'm put off by two factors: (a) the immediate extra effort required,
and (b) "forking" away from the standard CherryPy (actually: TurboGears)
distribution. If I go this route, I'd rather do it as an option within
CherryPy, but I don't yet have enough familiarity with CP internals. I would be
interested if anyone else has done (say) a CherryPy dispatcher based on Twisted
(I couldn't find anything with Google). I fear that this may be difficult,
because it seems to me on brief acquaintance that HTTP request handling context
is bound up with (not easily separated from) the handler's thread context - but
I could be wrong.
...
Any thoughts?
#g
--
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
> 1. Increase the number of CherryPy handler threads. I'm wary of depending
> entirely on this "obvious" approach because I don't know how expensive this
> would be: is it practical to have 100s or 1000s of such threads?
Hundreds, yes. For example, Apache2 on winnt defaults to ThreadsPerChild
= 64 and ThreadLimit = 1920 (max 15000).
I'd recommend you put Apache in front of CP anyway, to get the benefit
of persistent connections. Use mod_python so that Apache creates and
controls the threads--when you get up into the thousands of threads, you
might start running into memory issues (IIRC, NT defaults to 1M of stack
space per thread), and you might be able to tweak that with Apache's
ThreadStackSize setting. If you proxy CP behind Apache (on a different
port), CP will then create its own threads and you'll have *two*
problems. ;)
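For example (untested, and the exact directives depend on which MPM and which
mod_python version you're running), the relevant knobs in httpd.conf look
something like this; the handler and interpreter names below are made up:

    # mpm_winnt: one process, many threads.  Values are illustrative only.
    # 256KB of stack per thread instead of the 1MB default.
    ThreadsPerChild  512
    ThreadLimit      1920
    ThreadStackSize  262144

    # Hand requests to the CherryPy/TurboGears app in-process via mod_python.
    <Location "/">
        SetHandler python-program
        PythonHandler my_cp_gateway
        PythonInterpreter my_cp_app
    </Location>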
> 2. Use Twisted instead of Python libraries for the HTTP
> requests issued from the gateway... What I believe
> Twisted can give me is better control over request
> timeout: I don't want to wait 20 seconds to learn
> that a request has failed; more like 1-2 seconds.
If you're using httplib to generate the requests now, you should be able
to extend HTTPConnection to call settimeout(secs) on the socket.
import socket
import httplib

class Conn(httplib.HTTPConnection):

    def connect(self):
        """Connect to the host and port specified in __init__."""
        msg = "getaddrinfo returns an empty list"
        for res in socket.getaddrinfo(self.host, self.port, 0,
                                      socket.SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            try:
                self.sock = socket.socket(af, socktype, proto)
                # Set a timeout on the socket
                self.sock.settimeout(2)
                if self.debuglevel > 0:
                    print "connect: (%s, %s)" % (self.host, self.port)
                self.sock.connect(sa)
            except socket.error, msg:
                if self.debuglevel > 0:
                    print 'connect fail:', (self.host, self.port)
                if self.sock:
                    self.sock.close()
                self.sock = None
                continue
            break
        if not self.sock:
            raise socket.error, msg
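Then use Conn in place of HTTPConnection wherever the gateway makes its backend
calls; roughly like this (host and path are placeholders):

    conn = Conn("backend.example.com")
    try:
        conn.request("GET", "/status")
        response = conn.getresponse()
        data = response.read()
    except (socket.error, socket.timeout):
        # connect() gave up after 2 seconds, or the read timed out/failed
        data = None
    conn.close()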
> 3. Modify the Ajax browser request protocol so that a
> "pending" response can be returned quickly, with final
> information returned in response to a subsequent
> request. This presumes some kind of buffering
> in the gateway.
I think that would be useless overhead. The fact that you've sent the
request should be enough of a flag that the request is "pending". Wait
for the XmlHttpRequest to complete or timeout. If they're not completing
in time, you need to fix the server side, not patch the client.
> Any thoughts?
I think you'd be quickly frustrated writing your own server. It seems
you're equating Ajax with Twisted because they both use the word
"asynchronous" in their marketing--they have nothing to do with each
other, really. Use CP, set a timeout on your backend calls, and use
Apache to get the keep-alive and thread-control benefits.
Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org
It's interesting that you should say that; my understanding is that, at
least with the prefork version of Apache on Linux, proxying behind Apache
is a win. Basically, with mod_python (or mod_perl, where unfortunately I
have more experience at present) you end up with multiple forked processes,
each controlling one or more threads. Depending on usage patterns this
might well lead to more Python threads than with proxying. What's more,
you end up using much fatter Apache processes to serve up static content.
Once again, this was just my understanding, which may very well be
wrong.
One thing is certain: the first thing to try is to increase the number
of threads available.
Jason
Thanks for your observations and comments. I'm sure I'll learn something from
them, though maybe not what you intended, as I think I wasn't particularly lucid
in describing the environment I'm aiming for...
Hmmm. And I thought that it was on WinNT especially that threads were expensive!
> I'd recommend you put Apache in front of CP anyway, to get the benefit
> of persistent connections. Use mod_python so that Apache creates and
> controls the threads--when you get up into the thousands of threads, you
> might start running into memory issues (IIRC, NT defaults to 1M of stack
> space per thread), and you might be able to tweak that with Apache's
> ThreadStackSize setting. If you proxy CP behind Apache (on a different
> port), CP will then create its own threads and you'll have *two*
> problems. ;)
This may sound contradictory, but the 100s or 1000s of threads (or concurrent
requests) are for a "lightweight" server. That is, the processing associated
with each thread is very light; it's just that there may be a large number
outstanding at any time. If the price is 1Mb per thread, then having an active
thread for each outstanding request for information isn't really an option.
I didn't say in my original post that the CherryPy server is intended to run on
a pretty minimal system ... something like a diskless network appliance with
maybe 0.5-1Gb of flash memory and a very modest processor. (Such a system would
be Linux, not Windows, but the software aims to be portable.)
>> 2. Use Twisted instead of Python libraries for the HTTP
>> requests issued from the gateway...
>
> If you're using httplib to generate the requests now, you should be able
> to extend HTTPConnection to call settimeout(secs) on the socket.
That's a very useful suggestion, thanks. I shall study and experiment with that
approach.
>> 3. Modify the Ajax browser request protocol so that a
>> "pending" response can be returned quickly, with final
>> information returned in response to a subsequent
>> request. This presumes some kind of buffering
>> in the gateway.
>
> I think that would be useless overhead. The fact that you've sent the
> request should be enough of a flag that the request is "pending". Wait
> for the XmlHttpRequest to complete or timeout. If they're not completing
> in time, you need to fix the server side, not patch the client.
My concern is that a thread will be tied up maintaining the context of the
pending request, especially if it's tying up 1Mb of allocated stack. By
dismissing the initial request quickly while the requested information is being
obtained, I hope to release that thread resource, even if it requires another
request to actually retrieve the data. I agree it's an undesirable overhead,
and far from ideal, but at this stage I don't see any other way to release the
resources otherwise tied up by keeping the browser client requests outstanding.
>> Any thoughts?
>
> I think you'd be quickly frustrated writing your own server.
I tend to agree. Not that I'd "write my own server" -- e.g. Twisted has modules
that would provide all the protocol support I'd need. It's integrating the
result with CherryPy that seems troublesome to me.
> It seems
> you're equating Ajax with Twisted because they both use the word
> "asynchronous" in their marketing--they have nothing to do with each
> other, really.
If I am, it's because my Ajax is implemented using MochiKit, which in turn
provides a Deferred class inspired by Twisted's. The critical feature they have
in common is that an outstanding and dormant request requires only a minimum of
resources to be reserved. With CherryPy, I think I lose this economy if I need
to tie up a thread for each outstanding request (unless the threads themselves
are very cheap).
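For anyone not familiar with the pattern, the essence of a Deferred is tiny: a
pending result is represented by a small object holding a list of callbacks,
and nothing else needs to be reserved until the result arrives. A stripped-down
illustration (this is not Twisted's or MochiKit's actual implementation):

    class Deferred(object):
        """Minimal illustration of the Deferred idea: a pending result is
        just this object plus its callback list; no thread is parked on it."""

        def __init__(self):
            self.callbacks = []
            self.called = False
            self.result = None

        def addCallback(self, fn):
            if self.called:
                # Result already arrived; run the callback immediately.
                self.result = fn(self.result)
            else:
                self.callbacks.append(fn)
            return self

        def callback(self, result):
            """Invoked once by whatever I/O machinery produces the result."""
            self.called = True
            self.result = result
            for fn in self.callbacks:
                self.result = fn(self.result)
            self.callbacks = []

Whatever eventually produces the result just calls callback(result), and the
registered functions run at that point; nothing sits blocked in the meantime.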
> Use CP, set a timeout on your backend calls, and use
> Apache to get the keep-alive and thread-control benefits.
One part of your position I fail to understand is what Apache will actually do
for me here. I understand that it will provide security and performance for
high volume web serving, but neither of those are really priorities for me. My
problem is just that the number of outstanding HTTP requests may be very high.
Now, if Apache has its own super-lightweight threading system that can be
inherited by mod_python processors, then I'd see a real benefit. But I don't
understand if or how Apache can provide this.
Again, many thanks for your insights,
(answering quickly)
Apache buys you two things. The first is persistent connections; that
is, if a client makes more than one request, it can do so on a single
connection. This can save you a lot of overhead in CPU, memory, and I/O.
Second, Apache would be creating the threads instead of Python, and it
seems to have a facility for making the stack size smaller than the OS
default of 1Mb. I haven't tried it myself, but it's worth a shot IMO.
Hm. I must have mixed up my emails this morning; I could have sworn he
was on Windows, where you don't really have the option of multiple
processes (mpm_winnt uses a single process with lots of threads).
Regardless, if you proxy, you'll have more "live" threads at once:
Apache will create threads, and CP's HTTP server will have its own
threads. By using mod_python, you skip having CP create or manage any
threads, and just let Apache create and destroy them as needed. That's
not necessarily a good move on prefork (or any other forking mpm),
because you'll have one copy of CP and a separate Python interpreter per
process. That can quickly outweigh the benefits of lower thread volume.
Ok, that makes sense. Somehow I missed the "Windows" part, and that
certainly makes a difference.
Jason
Jason Earl wrote:
It would be interesting to see the difference between Linux and Windows
on the thread matter, as we have noticed that Python threads were not
cheap on Linux (although they were less expensive with a 2.6 kernel).
Has anyone else experienced that?
- Sylvain
Certainly, at least in terms of CPU. I have noticed that Linux threads
seem to have a higher "overhead" than Windows threads. This is particularly
noticeable when the CP application is idling.
I can start a CP application with 10 threads on Linux FC4 with Python
2.4 (or FreeBSD 6 for that matter) and leave it for a day, doing
nothing, and it will clock up about 10 minutes of CPU time!
On Windows XP, the same CP application on pretty much the same hardware
clocks up just a few milliseconds.
No connections are ever made in the above tests; I just start the app and
leave it.
Using the same spec CPU (P4 3.2 GHz) I get about 5 to 10% higher
throughput on Windows than on Linux.
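For anyone who wants to compare platforms without CherryPy in the picture, a
rough sketch like the following should show whether the idle cost comes from
Python threading itself: it just parks ten threads on a queue the way pool
workers typically wait for work, idles for ten minutes, and reports the process
CPU time (the numbers are arbitrary):

    import os
    import time
    import threading
    import Queue

    def idle_worker(q):
        # Mimic a pool thread waiting for work that never arrives.
        while True:
            try:
                q.get(timeout=1.0)
            except Queue.Empty:
                pass

    q = Queue.Queue()
    for i in range(10):
        t = threading.Thread(target=idle_worker, args=(q,))
        t.setDaemon(True)
        t.start()

    time.sleep(600)                      # let it "idle" for ten minutes
    user, system = os.times()[:2]
    print "CPU seconds used while idle:", user + system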
Regards,
Gary.