Streaming request body handler (for uploading large files)

jacob

Aug 9, 2011, 3:45:46 PM
to Tornado Web Server
Hi,

I really like Tornado, but I am missing a useful feature that nodejs
has, namely the ability to stream the request body (directly to disk
or to a socket), instead of handling the whole request body in server
memory. This feature is essential if, for example, you want to receive
or proxy large files, or run the server on a system with limited resources.

After searching this discussion forum I have found several posts
discussing the lack of this feature in Tornado. The default response
is to look into the file upload features in nginx. But considering the
native support in nodejs, I really feel that Tornado should also offer
a similar feature.

I have implemented experimental support for streaming request body
handling. The code is on GitHub:
https://github.com/nephics/tornado/commit/1bd964488926aac9ef6b52170d5bec76b36df8a6

Here is an example demonstrating the use of this feature:
https://gist.github.com/1134964
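
For readers who don't open the gist, here is a rough sketch of the general idea: the request body is written to disk chunk by chunk instead of being buffered in memory. The on_body_chunk hook name below is illustrative only and may not match the actual API in the branch.

import tornado.ioloop
import tornado.web

class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Open the destination file before any body data arrives.
        self._file = open("upload.bin", "wb")

    def on_body_chunk(self, chunk):
        # Hypothetical streaming hook: write each chunk straight to disk
        # instead of keeping the whole body in memory.
        self._file.write(chunk)

    def put(self):
        # Runs once the full body has been streamed to disk.
        self._file.close()
        self.write("upload complete\n")

application = tornado.web.Application([(r"/upload", UploadHandler)])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()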

I am sure that my implementation can be improved. But maybe this
experimental branch can inspire Ben and others to get this feature
implemented in the Tornado main branch. Please fork and improve my
code!

Regards,
Jacob

Twitter: @nephics

Josh Marshall

Aug 9, 2011, 4:28:15 PM
to python-...@googlegroups.com
> After searching this discussion forum I have found several posts
> discussing the lack of this feature in Tornado. The default response
> is to look into the file upload features in nginx. But considering the
> native support in nodejs, I really feel that Tornado should also offer
> a similar feature.


I implemented support for custom request body handling (including streaming to a file, or automatically parsing JSON, etc.) back in March, although I didn't follow up and I almost certainly need to make fixes for Tornado 2.0:

http://groups.google.com/group/python-tornado/browse_thread/thread/6413ac33dd7444b0

Perhaps we should sync up and make an actual pull request from all this? :) 

jacob

Aug 10, 2011, 2:46:39 AM
to Tornado Web Server
> I implemented support for custom request body handling (including streaming
> to a file, or automatically parsing JSON, etc.) back in March, although I
> didn't follow up and I almost certainly need to make fixes for Tornado 2.0:
>
> http://groups.google.com/group/python-tornado/browse_thread/thread/64...
>
> Perhaps we should sync up and make an actual pull request from all this? :)

Great, I cannot explain how I managed to miss your contribution when
searching the discussions.
We should definitely bring this together and make a pull request to
the main branch.

Your approach, as I understand it, is based on specifying application-wide
handlers for specific mimetypes.
My implementation lets each individual request handler specify a custom
body handler for its POST/PUT methods (independent of mimetype).
Somehow these two approaches should meet; personally I prefer mine ;-)
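
Purely to illustrate the distinction (the body_handlers setting and the handler function signature below are hypothetical, not taken from either branch): the mimetype-based style registers a body handler once for the whole application, whereas the per-handler style attaches a hook to an individual request handler, as in the sketch after the first post.

import tempfile

import tornado.web

def stream_to_tempfile(request, chunk):
    # Hypothetical application-wide body handler, selected by mimetype:
    # spool incoming chunks to a temporary file attached to the request.
    if not hasattr(request, "tempfile"):
        request.tempfile = tempfile.NamedTemporaryFile(delete=False)
    request.tempfile.write(chunk)

class UploadHandler(tornado.web.RequestHandler):
    def put(self):
        # By the time the method runs, the body has been spooled to disk.
        self.write("stored at %s\n" % self.request.tempfile.name)

application = tornado.web.Application(
    [(r"/upload", UploadHandler)],
    # Hypothetical setting mapping mimetypes to body handlers.
    body_handlers={"application/octet-stream": stream_to_tempfile},
)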

Jacob Søndergaard

Aug 11, 2011, 10:57:43 AM
to Artemi Krymski, Python Tornado
There is a default max size of 100 MB on the request body content_length, see:
https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L80
https://github.com/facebook/tornado/blob/master/tornado/httpserver.py#L357

So it should be okay, as long as the server has enough memory to hold a 100 MB bytestring. Note that if nginx is used in front of Tornado, the entire request will be buffered in nginx before it is passed upstream. Hence, you may need (at least) double the amount of memory to handle the request. (Read more here: http://wiki.nginx.org/HttpProxyModule)

Implementing streaming body handling would reduce Tornado's memory requirements when uploading large files. (But there is still a memory issue if nginx is used as a proxy, so in that case it is better to use the nginx file upload module: http://www.grid.net.ru/nginx/upload.en.html)
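
Until streaming lands, one mitigation is to lower the buffer limit so a single oversized upload cannot tie up 100 MB of memory. Newer Tornado releases accept a max_buffer_size argument on HTTPServer; whether the 2011-era server exposed it directly is not confirmed here, so treat this as a sketch:

import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def post(self):
        self.write("ok\n")

app = tornado.web.Application([(r"/", MainHandler)])

# Cap request bodies at 10 MB instead of the 100 MB IOStream default;
# requests with a larger Content-Length are rejected instead of buffered.
server = tornado.httpserver.HTTPServer(app, max_buffer_size=10 * 1024 * 1024)

if __name__ == "__main__":
    server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()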

On Thu, Aug 11, 2011 at 12:46, Artemi Krymski <kry...@post.fm> wrote:
> Out of curiosity: does this mean that posting a large enough request could crash a Tornado app? Is this a potential vulnerability, and should all requests be streamed to disk first, in general?

youyou

May 25, 2012, 5:50:33 AM
to python-...@googlegroups.com, Artemi Krymski


On Thursday, August 11, 2011, at 16:57:43 UTC+2, jacob wrote:
> There is a default max size of 100 MB on the request body content_length, see:
> https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L80
> https://github.com/facebook/tornado/blob/master/tornado/httpserver.py#L357
>
> So it should be okay, as long as the server has enough memory to hold a 100 MB bytestring. Note that if nginx is used in front of Tornado, the entire request will be buffered in nginx before it is passed upstream. Hence, you may need (at least) double the amount of memory to handle the request. (Read more here: http://wiki.nginx.org/HttpProxyModule)
You can switch off proxy buffering in nginx:

proxy_buffering off;

According to the documentation (http://wiki.nginx.org/HttpProxyModule), if buffering is switched off, the response is transferred to the client synchronously, as soon as it is received. This is a pure streaming feature of nginx, which I think is better than storing the upload to a file.
The upstream application can also set the X-Accel-Buffering response header to enable or disable this behaviour per response.
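
For reference, X-Accel-Buffering is set by the upstream application on its response (it controls buffering of the response nginx sends back to the client, not of the uploaded request body); from a Tornado handler it would look roughly like this:

import tornado.web

class StreamingDownloadHandler(tornado.web.RequestHandler):
    def get(self):
        # Ask nginx not to buffer this particular response, so each flushed
        # chunk reaches the client as soon as Tornado sends it.
        self.set_header("X-Accel-Buffering", "no")
        self.write("first chunk\n")
        self.flush()
        self.write("second chunk\n")
        self.finish()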

Secondly, when nginx buffers the request, not all of the data goes to memory, only some segments (see http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_busy_buffers_size).

So I think it is inaccurate that nginx doubles the amount of memory needed to handle the request.

Regards, Youenn