Custom PUT / POST Body Handlers

741 views
Skip to first unread message

Josh Marshall

unread,
Mar 14, 2011, 11:03:43 PM3/14/11
to Tornado Web Server
For my PyCon sprint today, I decided to finally get around to playing
with custom body handlers in Tornado. My fork is at
https://github.com/joshmarshall/tornado , and if it meets with Ben,
etc. approval I'll issue a pull request.

Here are a few usage examples:

PUT Temporary File: https://gist.github.com/870216
JSON to Request Arguments: https://gist.github.com/870225

I tried to make it as non-invasive as possible - it defaults to the
standard behavior. However, if you pass a custom HTTPParseBody
subclass to the body_handlers argument on HTTPServer (or Application)
with the appropriate content-type regex, it will __call__() the class
at the appropriate time. This means theoretically you can do most
anything with POST / PUT bodies, even deferring reading until after
authentication, etc. (although whether that's advised is another
question. :))

Let me know if this is interesting / useful / totally a waste of my
sprint time. :)

Jeremy Kelley

unread,
Mar 14, 2011, 11:08:44 PM3/14/11
to python-...@googlegroups.com
Awesome! Great work!

How about rolling the gist examples up into the HTTPServer class and
removing the "Test" prefix so they're automatically shipped with it?

-j

--
The Christian ideal has not been tried and found wanting;
it has been found difficult and left untried – G. K. Chesterton

Ben Darnell

unread,
Mar 22, 2011, 1:32:51 AM3/22/11
to python-...@googlegroups.com, Josh Marshall
Neat!  I think it would be cleaner if you split the two different use cases out into separate features.  One to parse arguments based on content-type (this could perhaps be done in RequestHandler, but even in HTTPServer it would be nice if you didn't have to call read_bytes when what you really want is to parse json), and one to handle the request body as the data comes in instead of reading it all into one giant string.  The latter is tricky since I think you really want it to be controllable on a per-path basis, but the decision needs to be made before the path dispatching in Application.  

-Ben 

Hermann Yung

unread,
Aug 21, 2013, 6:45:10 AM8/21/13
to python-...@googlegroups.com, Josh Marshall, b...@bendarnell.com
Hi, 

It has been more than 2 years by now... Has this been implemented in Tornado 3?

MK

Ben Darnell於 2011年3月22日星期二UTC+8下午1時32分51秒寫道:

Ben Darnell

unread,
Aug 22, 2013, 9:18:48 PM8/22/13
to Hermann Yung, Tornado Mailing List, Josh Marshall
On Wed, Aug 21, 2013 at 6:45 AM, Hermann Yung <hong.kon...@gmail.com> wrote:
Hi, 

It has been more than 2 years by now... Has this been implemented in Tornado 3?

No.  I really want to get streaming uploads into 3.2 though.

-Ben

Bryan Pham

unread,
Oct 13, 2013, 10:44:46 PM10/13/13
to python-...@googlegroups.com, Hermann Yung, Josh Marshall, b...@bendarnell.com
Second request for streaming upload and getting rid of the 100mb limit.  In term of memory usage, there's no real need to buffer more than 1mb in memory.  

Bryan Pham

unread,
Jan 6, 2014, 7:40:13 PM1/6/14
to python-...@googlegroups.com, Hermann Yung, Josh Marshall, b...@bendarnell.com
Hi Ben,

Now tornado is 3.2 beta1.  Is streaming body is going to be in this release ?

Ben Darnell

unread,
Jan 6, 2014, 8:48:16 PM1/6/14
to Bryan Pham, Tornado Mailing List, Hermann Yung, Josh Marshall
No, it didn't make it in.  Maybe for 3.3.

Josh Marshall

unread,
Jan 6, 2014, 8:59:59 PM1/6/14
to Ben Darnell, Bryan Pham, Tornado Mailing List, Hermann Yung
Maybe I should take a second look at this. :) I've wanted to make it path / handler class based, but as far as I can tell there is still no method to do that since this is determined before path processing. Anything I'm missing?

---
Josh Marshall
(254) 498-0366

Ben Darnell

unread,
Jan 6, 2014, 9:16:29 PM1/6/14
to Josh Marshall, Bryan Pham, Tornado Mailing List, Hermann Yung
On Mon, Jan 6, 2014 at 8:59 PM, Josh Marshall <catc...@gmail.com> wrote:
Maybe I should take a second look at this. :) I've wanted to make it path / handler class based, but as far as I can tell there is still no method to do that since this is determined before path processing. Anything I'm missing?

I'm thinking it's time to expand the interface between HTTPServer and Application.  Instead of one callback when the body is read, there at least be an intermediate callback when the headers are read, so the Application can tell the server whether to buffer the body (and how big a buffer to allow - right now max_buffer_size is global across the whole server), stream it, or reject the request immediately (this would also allow for proper 100-continue support).  

-Ben

jacob

unread,
Jan 9, 2014, 10:33:37 AM1/9/14
to python-...@googlegroups.com, Josh Marshall, Bryan Pham, Hermann Yung, b...@bendarnell.com
There is a pull request implementing a variant of this, with 100-continue support: https://github.com/facebook/tornado/pull/685
It would be really awesome to have streaming POST/PUT request body data implemented in the next release. This has been on top of my Tornado feature wish list for years. :)
-Jacob

Ben Darnell

unread,
Jan 9, 2014, 9:06:33 PM1/9/14
to Tornado Mailing List, Josh Marshall, Bryan Pham, Hermann Yung
Yeah, I feel bad about leaving that pull request open for so long, especially now that I look back at the log and I've punted on it solely because of timing for two consecutive releases.  This time, though... :)




--
You received this message because you are subscribed to the Google Groups "Tornado Web Server" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-tornad...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

tiadobatima

unread,
Jan 13, 2014, 2:41:32 PM1/13/14
to python-...@googlegroups.com, Josh Marshall, Bryan Pham, Hermann Yung, b...@bendarnell.com
Hi Ben,

Callbacks when headers are read would be great!

To expand on this, it would be amazing to have hooks into each of the HTTP request "phases":
- Pre and post request URL (GET /abc.html HTTP/1.1).
- Pre and post all headers are read.
- When each header is read.
- Pre and post body is read.
- Before server closes the connection.

Something similar could be done for the response, though it's unlikely someone would dynamically the response after it's started being sent to the client... still possible though.

Yes, some of these hooks do overlap and can be merged into one, but you get the idea.

What do you think?
g. 

Ben Darnell

unread,
Jan 16, 2014, 9:54:20 PM1/16/14
to tiadobatima, Tornado Mailing List, Josh Marshall, Bryan Pham, Hermann Yung
On Mon, Jan 13, 2014 at 2:41 PM, tiadobatima <gbar...@gmail.com> wrote:
Hi Ben,

Callbacks when headers are read would be great!

To expand on this, it would be amazing to have hooks into each of the HTTP request "phases":
- Pre and post request URL (GET /abc.html HTTP/1.1).
- Pre and post all headers are read.
- When each header is read.
- Pre and post body is read.
- Before server closes the connection.

Most of these can be grouped into one - since all the headers arrive at nearly the same time there's not much point in splitting them up into separate callbacks.  I'm thinking something like this:
* When the (TCP) connection is established (so you can reject a connection solely based on the IP address before doing any real work)
* For HTTPS, after the SSL handshake (so you can see the client's certificate).  There might be more SSL-related callbacks for SNI or NPN/ALPN if there is a need to expose those events to the application.
* After all the headers have been read
* As each chunk of the body is read (for applications that opt into streaming request processing).
* After the body has been completely read
* After the handler has finished generating its response
* After the last byte of the response has been sent to the client
* When the connection has reached an idle timeout and is closed, or has been closed by the client.

That last bullet also implies adding timeouts to each stage of the request, as well as byte limits (separate limits for headers and body instead of the one global buffer limit we have now).  I'd like to make these limits adjustable from the callbacks as well, so you can look at the headers and e.g. use a larger body limit for handlers that are expecting large PUTs than for other requests.  The streaming body callbacks will also need some way to apply backpressure to the connection.

-Ben

Roberto Polli

unread,
Feb 28, 2014, 9:11:34 AM2/28/14
to python-...@googlegroups.com, tiadobatima, Josh Marshall, Bryan Pham, Hermann Yung, b...@bendarnell.com
Hi Ben + all,

On Friday, January 17, 2014 3:54:20 AM UTC+1, Ben Darnell wrote:
Most of these can be grouped into one - since all the headers arrive at nearly the same time there's not much point in splitting them up into separate callbacks.  I'm thinking something like this:
* When the (TCP) connection is established (so you can reject a connection solely based on the IP address before doing any real work)
* For HTTPS, after the SSL handshake (so you can see the client's certificate).  There might be more SSL-related callbacks for SNI or NPN/ALPN if there is a need to expose those events to the application.
* After all the headers have been read
* As each chunk of the body is read (for applications that opt into streaming request processing).
* After the body has been completely read
* After the handler has finished generating its response
* After the last byte of the response has been sent to the client
* When the connection has reached an idle timeout and is closed, or has been closed by the client

Are there any drawbacks for this `simpler` solution:
1- modify HTTPServer so that HTTPConnection class can be extended
2- Everybody implements his workflow extending HTTPConnection ;)

Peace,
R.

Ben Darnell

unread,
Feb 28, 2014, 9:51:47 AM2/28/14
to Tornado Mailing List, tiadobatima, Josh Marshall, Bryan Pham, Hermann Yung
Even with a subclass of HTTPConnection, there are no appropriate methods to override to implement e.g. streaming uploads.  HTTPConnection currently reads the entire body with a single call to read_bytes inside a large method.  Since we have to define new interfaces anyway, we might as well do them in a way that integrates well with the rest of the framework instead of just what's simplest to implement.

-Ben
 

Peace,
R.

Roberto Polli

unread,
Feb 28, 2014, 10:33:49 AM2/28/14
to python-...@googlegroups.com
2014-02-28 15:51 GMT+01:00 Ben Darnell <b...@bendarnell.com>:
> Even with a subclass of HTTPConnection, there are no appropriate methods to
> override to implement e.g. streaming uploads.

> HTTPConnection currently
> reads the entire body with a single call to read_bytes inside a large
> method. Since we have to define new interfaces anyway, we might as well do
> them in a way that integrates well with the rest of the framework instead of
> just what's simplest to implement.
Ok for streaming uploads, but imho the inability to evaluate:
* authentication headers;
* 100-continue content-length headers;
before data transmission is really an issue.

As of now I'm managing this with a small kludge (not a tornado expert)

self._request = HTTPRequest(
connection=self, method=method, uri=uri, version=version,
headers=headers, remote_ip=self.address[0])

+# added a get_handler method in tornado.web.Application
+# retrieving the handler attached to the request
+handler = self.request_callback.get_handler(self._request)
+for cb in getattr(handler, 'after_headers', []):
+ # executing header_callback
+ cb(self._request)

# continue with default behavior
content_length = headers.get("Content-Length")

The after_headers functions are defined into the RequestHandler class
Are there any drawbacks? How would you implement this instead?

Thx + Peace,
R

Ben Darnell

unread,
Mar 1, 2014, 11:10:49 AM3/1/14
to Tornado Mailing List
This looks reasonable as a quick fix; in the long run we'll have an officially-supported cleaner interface for this kind of callback.

-Ben
 

Thx + Peace,
R

Roberto Polli

unread,
Mar 1, 2014, 11:24:13 AM3/1/14
to python-...@googlegroups.com
2014-03-01 17:10 GMT+01:00 Ben Darnell <b...@bendarnell.com>:
> This looks reasonable as a quick fix; in the long run we'll have an
> officially-supported cleaner interface for this kind of callback.

If you have some ideas on how to implement that I could make some stubs.

We could collect this post discussion in a set of use-cases
(eventually example http request and responses) to clean the vision of
our requirements.

I don't cover http streaming so it would really help me to understand
the streaming part of the issue.

This change could impact on the @authenticated stuff, as header
authentication should happen as soon as possible.

Thx++ for your time & Peace,
R.

Ben Darnell

unread,
Mar 2, 2014, 8:37:33 PM3/2/14
to Tornado Mailing List
On Sat, Mar 1, 2014 at 11:24 AM, Roberto Polli <robi...@gmail.com> wrote:
2014-03-01 17:10 GMT+01:00 Ben Darnell <b...@bendarnell.com>:
> This looks reasonable as a quick fix; in the long run we'll have an
> officially-supported cleaner interface for this kind of callback.

If you have some ideas on how to implement that I could make some stubs.

We could collect this post discussion in a set of use-cases
(eventually example http request and responses) to clean the vision of
our requirements.

I don't cover http streaming so it would really help me to understand
the streaming part of the issue.

I've been looking at this as primarily a streaming issue - if you can afford to buffer a whole request in memory, it's not a big deal to just receive the whole request before rejecting it, so pre-body callbacks are mainly important because they enable streaming.  Is there something I'm missing here?  Why do you want to be able to reject a request early if you're not concerned about streaming?
 

This change could impact on the @authenticated stuff, as header
authentication should happen as soon as possible.

Not necessarily.  Without "Expect: 100-continue" the only way to reject a request without reading it in HTTP 1.x is to close the connection, which is relatively expensive because it requires a new handshake and resets the TCP slow-start process.  A request must be quite large for this to be economical.  

-Ben
 

Thx++ for your time & Peace,
R.

Roberto Polli

unread,
Mar 4, 2014, 6:42:17 AM3/4/14
to python-...@googlegroups.com
2014-03-03 2:37 GMT+01:00 Ben Darnell <b...@bendarnell.com>:
> I've been looking at this as primarily a streaming issue - if you can afford
> to buffer a whole request in memory, it's not a big deal to just receive the
> whole request before rejecting it, so pre-body callbacks are mainly
> important because they enable streaming. Is there something I'm missing
> here? Why do you want to be able to reject a request early if you're not
> concerned about streaming?
I use a Tornado WebDAV application. When I PUT a file I'd like to ensure:
- user is properly authenticated
- user quota (Content-Length) is respected.

my wish is that this should happen before replying OK to 100-continue
or - in the worst case - as soon as possible.

>> This change could impact on the @authenticated stuff, as header
>> authentication should happen as soon as possible.
>
> Not necessarily. Without "Expect: 100-continue" the only way to reject a
> request without reading it in HTTP 1.x is to close the connection
As of now I experience the following behavior (with or without 100-continue)

= case 100-continue =
>PUT /dest/file.out HTTP/1.1
>Authorization: bad:or:no:credentials
>Content-Length: 500kB
>Expect: 100-continue
*strace says tornado recvfrom() 500kB*
< HTTP/1.1 100 (Continue)
< HTTP/1.1 401 Unauthorized
< Content-Length: 90
...

= case without 100-continue =
>PUT /dest/file.out HTTP/1.1
>Authorization: bad:or:no:credentials
>Content-Length: 500kB
*strace says tornado recvfrom() 500kB*
< HTTP/1.1 100 (Continue)
< HTTP/1.1 401 Unauthorized
< Content-Length: 90
...

Wrapping tornado around apache httpd I can enforce authentication
*before* reading the request and tornado does not receive anything.
But obviously I'd like to manage auth via tornado ;)

> PUT /dest/file.out HTTP/1.1
> Content-Length: 5264156
> Expect: 100-continue
>
< HTTP/1.1 401 Authorization Required
< Date: Tue, 04 Mar 2014 11:31:04 GMT
< WWW-Authenticate: Basic realm="realm"
< Vary: Accept-Encoding
< Content-Length: 401
< Connection: close
< Content-Type: text/html; charset=iso-8859-1


> relatively expensive because it requires a new handshake and resets the TCP
> slow-start process. A request must be quite large for this to be
> economical.
I suppose my case falls here, but I really wish to ear from you.

Thanks again and have a good day,
R.
Reply all
Reply to author
Forward
0 new messages