Cooperatively reading/writing a request/response in Django


André Cruz

Oct 17, 2012, 11:53:52 AM
to <gevent@googlegroups.com>, uWSGI developers and users list
Hello.

I have a Django application that does uploads and downloads of files. Since these are slow requests, I don't want a thread or a process per request, so I was thinking of using greenlets.

I have set up a uWSGI server with the gevent loop, and I can do long-polling requests just fine. I wait on a gevent.event.Event and the thread switches to another greenlet, which handles other requests in the meantime. The problem is that read()s from the Django request object do not seem to be cooperative, and neither are response writes. Is there any way to make these cooperative as well?

Thanks,
André Cruz

Roberto De Ioris

Oct 17, 2012, 2:41:58 PM
to gev...@googlegroups.com
Be sure not to have the post-buffering option in your uWSGI config. Buffering
the request body to disk is not async in uWSGI.

--
Roberto De Ioris
http://unbit.it

André Cruz

Oct 17, 2012, 5:44:54 PM
to gev...@googlegroups.com, rob...@unbit.it
On Wednesday, October 17, 2012 7:42:05 PM UTC+1, Roberto De Ioris wrote:

Be sure to not have post-buffering option in your uWSGI config. Buffering
request body to disk is not async in uWSGI. 

I do not have post-buffering in my config. Is this buffering for requests or responses?

I've managed to work around the cooperative read problem by patching Django's read() from wsgi.input and calling gevent.socket.wait_read(environ['wsgi.input'].fileno()) before the actual read() call. This way, when there is no data to read, the flow will be switched to another greenlet. It seems to work, but I don't know if this is the correct approach.
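The workaround described above could be sketched roughly as follows. This is only an illustration, not the code actually used in the thread: the CooperativeInput and wrap_input names are hypothetical, and the wait function is passed in as a parameter so the sketch does not depend on gevent being installed.

```python
class CooperativeInput:
    """Wrap a file-like wsgi.input and wait for readability before each read."""

    def __init__(self, raw, wait_read):
        self._raw = raw              # the original environ['wsgi.input']
        self._wait_read = wait_read  # assumed: gevent.socket.wait_read or similar

    def read(self, size):
        # Give other greenlets a chance to run until *some* data is available.
        self._wait_read(self._raw.fileno())
        return self._raw.read(size)


def wrap_input(environ, wait_read):
    """Replace environ['wsgi.input'] with the cooperative wrapper (hypothetical helper)."""
    environ['wsgi.input'] = CooperativeInput(environ['wsgi.input'], wait_read)
    return environ
```

Under gevent this would presumably be called as wrap_input(environ, gevent.socket.wait_read) before Django touches the request body.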

But I've still got the cooperative write problem. Since my application returns an iterator to uWSGI, it seems that it's uWSGI that's writing to the client socket, blocking, and not cooperating. Is there a wsgi.output environment variable that I can wait on until I'm able to write? Or maybe I can reuse wsgi.input; is it the same socket?

On another note, can't uWSGI's gevent loop engine handle these scenarios and switch greenlets when a read() does not have anything to read and a write() would block?

Thanks,
André

Alex K

Oct 17, 2012, 9:38:29 PM
to gev...@googlegroups.com, uWSGI developers and users list
Hi Andre,

Quotation from uWSGI docs "Currently uWSGI async mode does not support the read of POST data in async mode. This is not a big problem for most of the apps, but if your webserver handler does not do buffering of the post data (as apache) you could get some performance problem with big uploads."

May I suggest Nginx as a front end, with the Django app served by gevent-fastcgi via FastCGI?

Alex K

Roberto De Ioris

Oct 18, 2012, 1:06:19 AM
to gev...@googlegroups.com, rob...@unbit.it

> On Wednesday, October 17, 2012 7:42:05 PM UTC+1, Roberto De Ioris wrote:
>>
>>
>> Be sure to not have post-buffering option in your uWSGI config.
>> Buffering
>> request body to disk is not async in uWSGI.
>
>
> I do not have post-buffering in my config. Is this buffering for requests
> or responses?
>
> I've managed to work around the cooperative read problem, by patching
> Django's read() from wsgi.input and calling
> gevent.socket.wait_read(environ['wsgi.input'].fileno()) before the actual
> read() call. This way, when there is no data to read, the flow will be
> switched to another greenlet. It seems to work, but I don't know if this
> is
> the correct approach.


Yes, this is the correct approach; gevent monkey patching cannot patch C
functions/modules.

>
> But I've still got the cooperative write problem. Since my
> application returns an iterator to uWSGI, it seems that it's uWSGI that's
> writing to the client socket, blocking, and not cooperating. Is there an
> wsgi.output environment variable that I can use to wait on, until I'm able
> to write? Or maybe I can reuse wsgi.input, is it the same socket?
>
> On another note, can't uWSGI's gevent loop engine handle these scenarios
> and switch greenlets when a read() does not have anything to read and a
> write() would block?


This is the job of your app, and from a WSGI point of view, what you are
trying to do is a violation of the standard. But honestly, I do not care
about that :) You can use the file descriptor exported by uwsgi.connection_fd()
and simply stream data to it.
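A cooperative write loop on such a descriptor might look like the sketch below. It assumes the descriptor comes from uwsgi.connection_fd() and that the wait function is something like gevent.socket.wait_write; write_all is a hypothetical helper name, and how response headers are handled when writing to the raw descriptor is not covered in this thread.

```python
import os


def write_all(fd, data, wait_write):
    """Write all of `data` to `fd`, waiting for writability before each write."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        wait_write(fd)                     # assumed: gevent.socket.wait_write
        sent += os.write(fd, view[sent:])  # may write only part of the data
    return sent
```

Inside a uWSGI worker the call would be along the lines of write_all(uwsgi.connection_fd(), block, gevent.socket.wait_write).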

Roberto De Ioris

Oct 18, 2012, 1:13:09 AM
to gev...@googlegroups.com

> Hi Andre,
>
> Quotation from uWSGI docs "Currently uWSGI async mode does not support the
> read of POST data in async mode. This is not a big problem for most of the
> apps, but if your webserver handler does not do buffering of the post data
> (as apache) you could get some performance problem with big uploads."


Ehm, that refers to plain 'async mode' (nodejs-style callback usage);
the gevent plugin is a different thing. You can hit the problem if you
stream the post data to disk, but it looks like this is not the case.

The main problem (as always) is the correct cooperation with WSGI apps,
which should not know what is happening below, but with green threads
cooperation is often not possible.

Denis Bilenko

Oct 20, 2012, 2:43:19 PM
to gev...@googlegroups.com
On Wed, Oct 17, 2012 at 11:44 PM, André Cruz <andre...@co.sapo.pt> wrote:
> I've managed to work around the cooperative read problem, by patching
> Django's read() from wsgi.input and calling
> gevent.socket.wait_read(environ['wsgi.input'].fileno()) before the actual
> read() call. This way, when there is no data to read, the flow will be
> switched to another greenlet. It seems to work, but I don't know if this is
> the correct approach.

No, because environ['wsgi.input'].read() will read the whole request
body, whereas wait_read() will exit as soon as there's some data
available to read.

So, if environ['wsgi.input'].read() was blocking, inserting
wait_read() before it won't help, it might block anyway.
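To make the point concrete, here is a small sketch in which readiness only guarantees that some bytes are available, so a full read has to wait again before every partial read. select.select is used purely as a blocking stand-in for gevent.socket.wait_read, and read_exactly is a hypothetical helper, not part of gevent or uWSGI.

```python
import select
import socket


def select_wait_read(fileno):
    """Blocking stand-in for gevent.socket.wait_read, for illustration only."""
    select.select([fileno], [], [])


def read_exactly(sock, size, wait_read):
    """Read `size` bytes (or until EOF), waiting before every recv()."""
    chunks = []
    remaining = size
    while remaining > 0:
        wait_read(sock.fileno())      # returns once *some* data is readable
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:                 # peer closed the connection
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)
```

A single wait followed by one large read() does not have this property: the wait returns after the first few bytes, and the read can then block until the rest arrives.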

André Cruz

Oct 21, 2012, 8:03:03 AM
to gev...@googlegroups.com
Yes, you are right. Although I'm not using read() but read(CHUNK_SIZE), since CHUNK_SIZE is > 1, I still have that potential problem.

However, I've run some tests, and with a chunk size of 8k I can't seem to make it block even with very slow clients…

Anyway, is there a better alternative?

Best regards,
André

Denis Bilenko

Oct 21, 2012, 8:39:56 AM
to gev...@googlegroups.com
On Sun, Oct 21, 2012 at 2:03 PM, André Cruz <andre...@co.sapo.pt> wrote:
> Yes, you are right. Although I'm not using read() but read(CHUNK_SIZE), since CHUNK_SIZE is > 1, I still have that potential problem.
>
> However, I've run some tests, and with a chunk size of 8k I can't seem to make it block even with very slow clients…
>
> Anyway, is there a better alternative?

If you use gevent-aware WSGI server (e.g. gevent.pywsgi), then
wsgi.input is already of the right type, suitable to be used in a
greenlet.
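A minimal sketch of that setup, assuming gevent is installed: the application reads the request body in chunks from wsgi.input and reports the byte count, and serve() runs it under gevent.pywsgi. The host, port, and the serve name are assumptions for illustration.

```python
CHUNK_SIZE = 8192  # the 8k chunk size mentioned earlier in the thread


def application(environ, start_response):
    """Consume the request body in chunks and report how many bytes arrived."""
    stream = environ['wsgi.input']
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        total += len(chunk)
    body = str(total).encode('ascii')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def serve(host='127.0.0.1', port=8000):
    """Run the app under gevent's own WSGI server (requires gevent)."""
    from gevent.pywsgi import WSGIServer
    WSGIServer((host, port), application).serve_forever()
```

Calling serve() starts the server; no patching of wsgi.input is needed in this case.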

André Cruz

Oct 21, 2012, 10:20:15 AM
to gev...@googlegroups.com
I'll give it a try. In order to take advantage of multiple cores, is this the recommended approach? http://stackoverflow.com/questions/7407868/gevent-pywsgi-server-multiprocessing

Best regards,
André

Denis Bilenko

Oct 21, 2012, 10:21:54 AM
to gev...@googlegroups.com
No, use something where you don't need to code it yourself, like gunicorn.
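For reference, a gunicorn configuration file along these lines would start one gevent worker process per core. The file name, bind address, and worker count are assumptions chosen for illustration.

```python
# gunicorn.conf.py (hypothetical file name and values)
import multiprocessing

bind = '127.0.0.1:8000'                # assumed listen address
workers = multiprocessing.cpu_count()  # one worker process per core
worker_class = 'gevent'                # gunicorn's gevent-based worker
```

It would be used as gunicorn -c gunicorn.conf.py myproject.wsgi:application, where myproject is a placeholder for the actual Django project.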

André Cruz

Oct 21, 2012, 10:39:13 AM
to gev...@googlegroups.com
On Oct 21, 2012, at 3:21 PM, Denis Bilenko <denis....@gmail.com> wrote:

> No, use something where you don't need to code it yourself, like gunicorn.

Gunicorn does not support SSL yet. I just checked https://github.com/benoitc/gunicorn/pull/265.

SSL is one of my requirements, I'll try it when the support is there.

Thanks for the help anyway. :)
André


Roberto De Ioris

Oct 21, 2012, 11:02:39 AM
to gev...@googlegroups.com
It will not block: if you specify a chunk size, wsgi.input read() will
behave like a standard socket. Django does not make use of read() without
a chunk size in its code, so you should be safe (from that point of view;
there are tons of other areas to work on to have a fully non-blocking
Django app, although a lot of users think combining gunicorn/uWSGI + gevent is
enough).

>
> Anyway, is there a better alternative?


Personally I always treat wsgi.input as a normal socket; there are
abstractions (including some monkey-patching techniques) that do not force you to
write/patch HTTP body handlers (like the gevent pywsgi server).

Roberto De Ioris

Oct 21, 2012, 12:23:39 PM
to gev...@googlegroups.com
If you are only transferring files, I find using a greenlet for the whole
transfer a bit 'expensive'.

uWSGI from GitHub has the concept of 'offload transfers': a second pthread
(read: no Python involved) can send thousands of files asynchronously:

import gevent
import uwsgi

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/binary')])
    # this triggers the headers and starts transferring the file
    yield uwsgi.offload_transfer('hugefile')


The python api is still very simple:

uwsgi.offload_transfer(file[,size])

but the concept can be extended to other areas, feel free to post ideas on
the github issues tracker.

André Cruz

Oct 21, 2012, 12:43:26 PM
to gev...@googlegroups.com
On Oct 21, 2012, at 5:23 PM, Roberto De Ioris <rob...@unbit.it> wrote:

> If you are only transferring files, i find using a greenlet for the whole
> transfer a bit 'expensive'.

These "files" are not locally stored. I have to fetch them in blocks from a storage system like S3. And I'm fetching the data blocks as the client read()s from his end of the connection… I don't think I can use static offloading this way, or can I?
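The pattern described here, fetching each block only when the client is ready for it, might be sketched as a generator-based WSGI response. fetch_block is a hypothetical callable standing in for the storage client, and the 64 KiB block size is an assumption.

```python
def stream_blocks(fetch_block, block_size=64 * 1024):
    """Yield blocks one at a time; the next block is fetched only after the
    server has consumed the previous one."""
    offset = 0
    while True:
        block = fetch_block(offset, block_size)  # hypothetical storage call
        if not block:
            break
        yield block
        offset += len(block)


def make_app(fetch_block):
    """Build a WSGI app that streams the object returned by fetch_block."""
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'application/octet-stream')])
        return stream_blocks(fetch_block)
    return application
```

Because the body is a generator, no block is requested from storage until the server iterates to it, which is what keeps memory use bounded for slow clients.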

Best regards,
André

Roberto De Ioris

Oct 22, 2012, 12:35:48 AM
to gev...@googlegroups.com
No, I think gevent is the best choice for that purpose.

By the way, I have just found a bug causing blocking writes on chunks
bigger than 64k, so if you are using blocks bigger than that you may
experience blocking behaviour. I am about to fix that as soon as
possible.

vitaly

Oct 22, 2012, 1:09:54 AM
to gev...@googlegroups.com
> I have to fetch them in blocks from a storage system like S3.

André, what are you using for gevent-compatible S3 downloads?  Boto with gevent-socket monkey-patching or something else?

Thank you,
Vitaly

André Cruz

Oct 22, 2012, 5:16:43 AM
to gev...@googlegroups.com
On Oct 22, 2012, at 6:09 AM, vitaly <vitaly.kru...@gmail.com> wrote:

> > I have to fetch them in blocks from a storage system like S3.
>
> André, what are you using for gevent-compatible S3 downloads? Boto with gevent-socket monkey-patching or something else?

I use a monkey-patched "requests" for the downloads, and "poster" for uploads since "requests" does not yet support streaming uploads. I don't use S3, we have a local Swift cluster which is mostly S3 API compatible and should work with Boto, but it seemed too complicated for the simple operations I needed to do. Testing Boto is on my TODO list.

André


vitaly

Oct 22, 2012, 5:32:13 PM
to gev...@googlegroups.com
> I use a monkey-patched "requests" for the downloads, and "poster" for uploads

Thank you,
Vitaly