Cancelling a request if the client closes the HTTP socket


Matt Craighead

Feb 11, 2009, 6:06:58 PM
to mod...@googlegroups.com
Suppose the client making an HTTP request to my WSGI app closes its socket, say, because the user hit Escape in their web browser.  What happens to my Python interpreter executing the WSGI code in question?  It keeps running, right?
 
This seems pretty unfortunate.  Suppose that the implementation of my HTTP request needs to go out on the network to talk to some other server.  (Which, in my case, some of them do.)  connect(), send(), recv() can all take potentially unlimited amounts of time to complete.  They may not consume any CPU time while they're blocking, but a thread is just sitting there doing nothing; and what, if anything, will cause that thread to die short of killing the Apache process or the WSGI daemon process (if any)?  Leak enough threads and you could run out of memory, deadlock, or whatnot.
 
The behavior I think I'd want would be that a closed client socket would result in a Python KeyboardInterrupt being raised asynchronously inside my WSGI Python interpreter, exactly like Ctrl-C in a normal Python app.  Then my code would nicely release any DB locks/roll back any pending DB transactions as the stack unrolled, and blocking IOs (socket or otherwise) could be interrupted via a signal (Unix)/IO cancellation (Windows)/some other mechanism (???).
mod_wsgi daemon mode seems like a partial solution at best:
- daemon mode is not supported on Windows, right?
- killing the daemon process (potentially?) kills other requests, not just the hung request
 
And any solution that involves one process per request, well, then we might as well be back to using CGI rather than WSGI...

--
Matt Craighead
Founder/CEO, Conifer Systems LLC
http://www.conifersystems.com
512-772-1834

gert

Feb 11, 2009, 6:45:41 PM
to modwsgi
On Feb 12, 12:06 am, Matt Craighead wrote:
WSGI and CGI are both HTTP, so how would that be different?

Honestly, I do not think this kind of keep-alive cancel stuff will work
on an Apache server or any web server I know of.

Graham Dumpleton

Feb 11, 2009, 6:47:16 PM
to mod...@googlegroups.com
2009/2/12 Matt Craighead <matt.cr...@conifersystems.com>:

> Suppose the client making an HTTP request to my WSGI app closes its socket,
> say, because the user hit Escape in their web browser. What happens to my
> Python interpreter executing the WSGI code in question? It keeps running,
> right?

First off, you don't mean 'Python interpreter', you mean 'request thread'.

Python interpreter instances once created within a process survive for
the life of the process. Separate interpreter instances are NOT
created for each request.

You may well understand this, but it does seem to be a misconception
that some do have, so I am clarifying the point.

As to whether the 'request thread' keeps running, it depends on what
it is doing and how you yield data from your WSGI application.

In the simplest case, where a WSGI application forms the complete
response as a single string, or a list of strings, and returns it, the
WSGI application will complete the request regardless. This is because
the data is only written at the end of the request, so there is
potentially no earlier point at which it could be detected that the
client had closed the connection.

If that request thread was performing an operation that took some
time, whether it be computational, or whether it needs to block on the
result from some external process, then whatever it is doing is not
interrupted.
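
To make that concrete, a minimal sketch of the simple case
(do_expensive_work() is just a placeholder for whatever slow work the
request performs):

    def application(environ, start_response):
        # All of the work happens before anything is written to the
        # client, so a closed connection cannot be noticed until the
        # complete response has been returned.
        body = do_expensive_work()   # placeholder for the slow part
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]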

In the more complicated case, where the WSGI application has returned
a generator which yields data in blocks, there is an attempt to write
data back to the client as the request progresses. Provided that
blocks of data are only generated when asked for, if writing a prior
block resulted in it being detected that the client connection had
closed, then mod_wsgi will skip asking for more blocks of data and
move straight on to closing the generator and finalising the request.

Thus, use of generators, only generating data as it can be sent, does
provide an option to interrupt a long-running process. This still
doesn't help in situations where a specific request for a block of
data results in the application making some blocking call.
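
As a sketch of that generator case (produce_chunks() is again a
placeholder):

    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        def generate():
            for chunk in produce_chunks():
                # The next chunk is only requested after the previous
                # one was written; if that write revealed a closed
                # client connection, the generator is closed instead
                # of being resumed.
                yield chunk
        return generate()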

Another case is where the write() function returned by
start_response() is used. In this case writing data back to the client
is driven by the WSGI application rather than by a loop within
mod_wsgi requesting data from a generator. Thus, when closure of the
connection is detected, a Python exception will be seen by the
application and it is up to the application what to do. Most
applications seem not to handle it, and a 500 error related to the
unhandled exception results.
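
Illustratively (the exact exception type raised on a closed connection
is host-specific; IOError here is an assumption):

    def application(environ, start_response):
        write = start_response('200 OK', [('Content-Type', 'text/plain')])
        for chunk in produce_chunks():   # placeholder producer
            try:
                write(chunk)             # data is pushed to the client now
            except IOError:
                break                    # client has gone; stop writing
        return []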

The only other option for detecting that a client has closed the
connection is when the application is reading wsgi.input. This again
generates a Python exception, which the application would deal with as
appropriate.
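
For example (again assuming IOError; handle whatever your host
actually raises):

    def application(environ, start_response):
        try:
            length = int(environ.get('CONTENT_LENGTH') or 0)
            body = environ['wsgi.input'].read(length)
        except IOError:
            # Client disconnected mid-upload; give up on the request.
            start_response('400 Bad Request', [('Content-Type', 'text/plain')])
            return ['client disconnected']
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]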

> This seems pretty unfortunate. Suppose that the implementation of my HTTP
> request needs to go out on the network to talk to some other server.
> (Which, in my case, some of them do.) connect(), send(), recv() can all
> take potentially unlimited amounts of time to complete. They may not
> consume any CPU time while they're blocking, but a thread is just sitting
> there doing nothing; and what, if anything, will cause that thread to die
> short of killing the Apache process or the WSGI daemon process (if any)?
> Leak enough threads and you could run out of memory, deadlock, or whatnot.

If the backend process you are communicating with never returns, that
is a separate issue from the client closing the connection. To detect
a backend process that never returns, you should implement
non-blocking operations in conjunction with a timeout, to ensure that
any processing completes in the time you expect it to.
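
For example, with plain sockets (host, port and the timeout value are
illustrative):

    import socket

    def query_backend(host, port, payload, timeout=10.0):
        # Bound every potentially blocking step so that a dead backend
        # cannot pin the worker thread forever.
        sock = socket.create_connection((host, port), timeout=timeout)
        try:
            sock.settimeout(timeout)    # also applies to sendall()/recv()
            sock.sendall(payload)
            return sock.recv(65536)     # raises socket.timeout if stalled
        finally:
            sock.close()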

Whether loss of the client connection should cause the connection to
the backend process to be closed really depends on what the
application does. It may be that you still need the backend task to
complete regardless; closing the connection to the backend process,
depending on how the service is implemented, may be bad, as it may
cause the backend task to be interrupted and not complete. But then,
if you really need that guarantee, you should be using a persistent
message/task queuing system to ensure requests aren't lost. Overall,
though, what should be done can depend on the individual connections
to backend systems and thus should be handled at the application
level.

When using daemon mode of mod_wsgi, the only option you really have is
to set inactivity-timeout as a fail-safe for all threads in the
process getting into a locked-up state because of code which blocks
and never returns. What would happen is that even though all threads
in a process may be handling a request, if none of them actually read
any request input or generate any request output in the specified
time, then the daemon process would be forcibly shut down.
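
In the Apache configuration that fail-safe looks something like this
(the names and values are illustrative):

    WSGIDaemonProcess myapp processes=2 threads=15 inactivity-timeout=60
    WSGIProcessGroup myapp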

> The behavior I think I'd want would be that a closed client socket would
> result in a Python KeyboardInterrupt being raised asynchronously inside my
> WSGI Python interpreter, exactly like Ctrl-C in a normal Python app. Then
> my code would nicely release any DB locks/rollback any pending DB
> transactions as the stack unrolled, and blocking IOs (socket or otherwise)
> could be interrupted via a signal (Unix)/IO cancellation (Windows)/some
> other mechanism (???).

My understanding is that this wouldn't necessarily be a safe thing to
do as it would involve injecting an exception into a distinct thread.
I remember seeing some warnings about this at one point, but things
could have changed. Either way, I have looked at it before and wasn't
convinced it was a good idea.

> mod_wsgi daemon mode seems like a partial solution at best:

In what respect are you saying that?

> - daemon mode is not supported on Windows, right?

And never will be. First because fork() is not supported on Windows,
and second because I don't really regard Windows as a good deployment
platform for Apache.

> - killing the daemon process (potentially?) kills other requests, not just
> the hung request

Yes, although in the case of inactivity-timeout, all threads would
effectively need to have stalled before it kicked in and killed the
process.

> And any solution that involves one process per request, well, then we might
> as well be back to using CGI rather than WSGI...

But CGI will not help you with this either. Well, that's not
completely true: CGI will allow more and more processes to be created,
but keep doing that and it will consume all the resources on your
machine. You still need something that is going to kill off stuck
processes.

No other web hosting mechanism for WSGI applications that I have seen
really provides a solution either. Some others provide timeouts on
individual requests and will kill processes, but none that I know of
will inject some sort of signal indicating that the client connection
has closed. As partly explained above, in Apache at least you can only
know a client connection has closed when you attempt to read data from
it or write data to it. Apache is not event-driven, so there is no
select/poll on a client connection such that you could be notified
immediately anyway.

All you can really do, for any system, is try at your application
level to implement timeouts on potentially blocking operations to
backend processes, and otherwise simply ensure you have allowed enough
processes/threads to handle the expected load, with some additional
capacity to cope with requests stalling for a while until timeouts
kick in.

Graham

Matt Craighead

Feb 11, 2009, 8:43:26 PM
to mod...@googlegroups.com
Agree with your terminology corrections.
 
As for Apache on Windows, I have no love for it -- but I have to offer my users *some* sort of Windows-based server solution.  I have my own miniature WSGI web server for users who want it to "just work" without having to install anything separate, but some people want more advanced features like SSL.
 
As for CGI, it's not a full solution, but you can always kill a process manually, whereas there's no way to kill a specific hung worker thread buried inside a process.  You have to kill the whole process.  Putting on a sysadmin hat, I might like to be able to kill a single bad request without killing the whole server.  Obviously even better would be to never have to kill a request at all, but I'm not sure I see that as 100% realistic in a distributed software system with a lot of diverse components, some of which I didn't write and/or which don't have network protocol specs and which I therefore have to access through closed-source third-party API libraries.
 
Writing to the output in a loop is a valid solution for some classes of applications: a gigantic HTML table comes to mind.  I don't see it working very well for my particular application.  I could probably make it work for a subset of my queries.
 
Timeouts?  Assuming I can catch every single potentially-blocking API call and make it time out (fairly challenging when we're talking about a WSGI module that accesses a file system driver that in turn may make network requests to service certain file system calls -- I'm planning to rearchitect this slightly but I don't think this part will go away entirely)... I'd better make those timeouts fairly long to avoid false positives.  I think this kind of approach is probably good enough to prevent myself from running out of threads and/or memory... *if* I can put a timeout in every single potentially-blocking code path.
 
Last time I looked into this, I think I saw some web debates over whether it would be possible to have a language construct along the lines of:
 
with timeout(seconds):
    do_stuff()
 
...where if the contents of the "with" code block took more than "seconds" seconds to execute, some sort of TimeoutException would be asynchronously thrown that would allow for a full and orderly cleanup.  This would be great for my purposes.  I doubt something like this will find its way into the language any time soon, seeing as I've already found bugs in Python where certain simple blocking socket API calls can't be Ctrl-C'd under Windows.
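 
For what it's worth, a rough approximation can be built today on Unix with SIGALRM, but it only works in the main thread of a process and often can't interrupt a call that is blocked inside C code, which is exactly why it doesn't help inside a threaded WSGI server:
 
    import signal
    from contextlib import contextmanager

    class TimeoutException(Exception):
        pass

    @contextmanager
    def timeout(seconds):
        # Unix-only, main-thread-only sketch; not a general solution.
        def _handler(signum, frame):
            raise TimeoutException()
        old = signal.signal(signal.SIGALRM, _handler)
        signal.alarm(seconds)
        try:
            yield
        finally:
            signal.alarm(0)                    # cancel any pending alarm
            signal.signal(signal.SIGALRM, old)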
 
 
Overall, I'm getting the impression that the answer to my dilemma is: "Yes, this is hard.  Deal with it."  Would that be a correct assessment?

 

Graham Dumpleton

Feb 11, 2009, 8:57:26 PM
to mod...@googlegroups.com
2009/2/12 Matt Craighead <matt.cr...@conifersystems.com>:
Except that, like trying to inject a Python exception into a running
thread, it will not work if the code is actually inside C code and
that is where it is hanging.

> Overall, I'm getting the impression that the answer to my dilemma is: "Yes,
> this is hard. Deal with it." Would that be a correct assessment?

More or less. One has to gauge which things one needs to protect
against blocking. If it is your own internal network or systems, it is
probably safe. If going outside of your network, then protect
yourself.

As a fail-safe, use inactivity-timeout on the daemon process group.
This is in part what it was added for. So even if the process hangs
for the period of the timeout, at least it will automatically recover
rather than having to wait for a human to kill it.

Graham

Matt Craighead

Feb 11, 2009, 9:54:08 PM
to mod...@googlegroups.com
My assumption is that all the C modules shipping with Python would have to support the "with timeout" feature.
 
At least on Windows Vista and beyond, this wouldn't be so hard -- a call to CancelSynchronousIo, plus a per-thread timeout object and changing *WaitFor* calls to wait on that timeout object (change WaitForSingleObject to WaitForMultipleObjects as needed).  For OSes before Vista this is speculation, but I would guess that CancelSynchronousIo could be implemented on top of ntdll.dll APIs.  I'm not enough of a Unix expert to say what would be involved on various Unix systems.
 
Anyway, obviously this is not going to happen in the foreseeable future.
 
 
> One has to gauge what are the things that one would
> need to protect against blocking.
 
Right now I'm having a lot of difficulties coming from the fact that query X is blocked waiting for query Y to release a mutex -- and that mutex, in turn, won't be released until query Y finishes doing a bunch of slow network queries to server Z (which could be out on the public Internet in some cases).  The general solution is to never hold a mutex while waiting for a network query that could take a potentially unbounded amount of time -- release the mutex first and reacquire after the network query finishes -- but this requires a fairly substantial rearchitecture of a lot of code.  Some of which is written in Python, some in C/C++, and a good chunk of which is a file system driver.
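 
The target pattern, in skeletal form (the names here are made up; the real code spans Python, C/C++ and the file system driver):
 
    import threading

    cache_lock = threading.Lock()
    cache = {}

    def refresh(key):
        with cache_lock:                    # fast work only under the lock
            request = build_request(cache.get(key))
        response = slow_network_query(request)  # unbounded time, lock released
        with cache_lock:                    # reacquire briefly to publish
            cache[key] = response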
 
I certainly have little to no control over how fast (third-party, often proprietary and closed-source, often not on the LAN) server Z chooses to respond to my queries.  I've seen cases where server Z is bogged down responding to some other random unrelated query, holding a write lock on its DB, and even the simplest read-only query to server Z hangs for 10 minutes waiting for that huge unrelated DB query to complete.
 
Like I said, big complex distributed software system.

Graham Dumpleton

Feb 11, 2009, 10:28:38 PM
to mod...@googlegroups.com
2009/2/12 Matt Craighead <matt.cr...@conifersystems.com>:

> My assumption is that all the C modules shipping with Python would have to
> support the "with timeout" feature.
>
> At least on Windows Vista and beyond, this wouldn't be so hard -- a call to
> CancelSynchronousIo plus a per-thread timeout object and change *WaitFor*
> calls to wait on that timeout object (change WaitForSingleObject to
> WaitForMultipleObjects as needed). For OS's before Vista, this is
> speculation, but I would guess that CancelSynchronousIo could be implemented
> on top of ntdll.dll APIs. I'm not enough of a Unix expert to say what would
> be involved on various Unix systems.

In the main, UNIX does not have equivalent asynchronous APIs. There
are some for various things, but not as broad a range as on Windows. I
would suspect that various target platforms for Python wouldn't even
have them, as I think they are more likely an optional extension of
POSIX and not mandatory. I could be wrong there. Thus it may be hard
to mandate them as a requirement in Python core.

Graham
