async / await

Benj

Jan 20, 2016, 5:41:47 PM
to Django users
Hi,

In my Django project, at one point I resize an image with Pillow before associating it with a Django field and then saving the instance.

Is there any benefit in making the image resize / save to disk async?

I heard that since Django is not an async framework, there is no point in doing so. But is there?

Russell Keith-Magee

Jan 20, 2016, 6:57:29 PM
to Django Users
Hi Benj,

On Thu, Jan 21, 2016 at 6:41 AM, Benj <webk...@gmail.com> wrote:
> Hi,
>
> on my django project, at a point i resize an image with pilow before associating it to django field, then saving the instance.
>
> is there any benefit in making the image resize / saving on disk async ?

Yes - although async isn’t the way I’d describe it - I’d describe this as background task processing.

When a user hits your web server, the server generates a page, and returns the content to the user. The user’s browser then displays the page. The user doesn’t see *any* content until the entire page has been generated and sent back to them. This means that it is critical that the page can be computed quickly - if it isn’t, the user will observe this as a slow page load.

So - if you’ve got a time intensive task, like resizing an image, you’re generally advised to do that *outside* the request/response loop. You *can* do it inline (or synchronously), and if you’re dealing with a low traffic site with users that don’t mind an occasional delay, you might do it inline just to expedite the development process. But if you’re on a high traffic site, or if users will notice a delay, you should get any time-expensive processing out of the page generation process.

There’s another reason to get time-expensive processes out of the page generation process: server load. Web servers generally have a fixed capacity - they can only be processing N requests at a time. Some web servers can *accept* very large numbers of simultaneous connections (nginx, for example, can handle thousands) - but that’s only *accepting* the connection - web servers will generally *process* a handful of requests at a time (often some small multiple of the number of processors).

So - if you have something that takes a non-trivial amount of time to perform, you will be locking up that web server thread until the processing is completed. That means you’ve just reduced the number of available web server threads. That means *everybody else* visiting your website will have degraded performance, not just the person whose request caused the time expensive task. If you’ve got a lot of users who simultaneously request the same time-expensive view, *nobody* will be able to get a request processed, because the web server will be tied up doing the time-expensive tasks.
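The starvation effect described above can be sketched with nothing but the standard library. This is a toy simulation, not a real web server: `POOL_SIZE`, `slow_view`, and `fast_view` are hypothetical names, and a thread pool stands in for the server's fixed set of request handlers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2        # hypothetical: the server can process 2 requests at once
SLOW_SECONDS = 0.2   # stand-in for an inline image resize


def slow_view():
    """A request that does time-expensive work inline."""
    time.sleep(SLOW_SECONDS)
    return "resized"


def fast_view():
    """A request that would normally return almost instantly."""
    return "hello"


def time_fast_request_behind_slow_ones():
    """Return the fast view's result and how long it had to wait."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # Occupy every worker with a slow request.
        for _ in range(POOL_SIZE):
            pool.submit(slow_view)
        started = time.monotonic()
        # The fast request has to queue until a worker frees up.
        result = pool.submit(fast_view).result()
        return result, time.monotonic() - started
```

Calling `time_fast_request_behind_slow_ones()` shows the cheap request paying roughly the cost of the expensive ones, even though it does no expensive work itself.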

The usual approach for this sort of problem (and image processing is the classic use case) is a worker thread. The user submits their image, the web server receives it, puts the image onto a work queue, and immediately responds with a success message acknowledging receipt. In a completely separate worker thread, images are taken off the queue, processed, and stored. This means the user gets a fast response and doesn’t drag everyone else down with them, but the work is still done.
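As a rough illustration of that flow, here is a minimal in-process sketch using the standard library's `queue` and `threading` modules. The names `upload_view`, `resize`, and `processed` are made up for the example, and `resize` is only a placeholder for the real Pillow call. In a real deployment the worker would normally be a separate process fed by a persistent queue, so that jobs survive a server restart.

```python
import queue
import threading

work_queue = queue.Queue()   # in-process stand-in for a real task queue
processed = {}               # stand-in for "stored on disk"


def resize(image_bytes):
    """Placeholder for the real Pillow resize; here it just truncates."""
    return image_bytes[:4]


def worker():
    """Runs outside the request/response cycle and drains the queue."""
    while True:
        job = work_queue.get()
        if job is None:          # sentinel: shut the worker down
            work_queue.task_done()
            break
        image_id, image_bytes = job
        processed[image_id] = resize(image_bytes)
        work_queue.task_done()


def upload_view(image_id, image_bytes):
    """Enqueue the work and acknowledge receipt immediately."""
    work_queue.put((image_id, image_bytes))
    return {"status": "accepted", "image_id": image_id}


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The view returns as soon as the job is queued; the result only appears in `processed` once the worker gets to it.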

This obviously imposes some extra overhead on your code - in your example, you can’t assume the image exists, so you have to put in fallback mechanisms when the user requests a page where the image needs to be displayed.
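One possible shape for such a fallback, sketched under assumptions: `thumbnail_url` and `PLACEHOLDER_URL` are hypothetical names, and the check is a plain file-existence test rather than anything Django-specific.

```python
import os
import tempfile

PLACEHOLDER_URL = "/static/img/processing.png"   # hypothetical placeholder image


def thumbnail_url(media_root, media_url, relative_path):
    """Return the thumbnail URL, or a placeholder if it is not on disk yet."""
    if os.path.exists(os.path.join(media_root, relative_path)):
        return media_url + relative_path
    return PLACEHOLDER_URL


# Usage: before the worker has written the file, the placeholder is served.
with tempfile.TemporaryDirectory() as media_root:
    before = thumbnail_url(media_root, "/media/", "thumb.jpg")
    open(os.path.join(media_root, "thumb.jpg"), "wb").close()
    after = thumbnail_url(media_root, "/media/", "thumb.jpg")
```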

To implement this sort of feature, you need to have a worker queue - Celery is the heavy duty answer for this; if you just need a cheap and cheerful answer, RQ is a fairly easy-to-use option, or you can roll-your-own in the database without too much trouble.
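For the roll-your-own option, a database-backed queue can be as small as one table with a status column. The sketch below uses `sqlite3` so that it is self-contained; the table name `resize_job` and the functions `enqueue` and `process_next` are invented for illustration. It assumes a single worker: running several workers would need row locking or an atomic claim step, which is part of what Celery and RQ take care of.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a real project would use its own database
conn.execute(
    "CREATE TABLE resize_job ("
    " id INTEGER PRIMARY KEY,"
    " image_path TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)


def enqueue(image_path):
    """Called from the view: record the job and return immediately."""
    with conn:   # commits on success
        cur = conn.execute(
            "INSERT INTO resize_job (image_path) VALUES (?)", (image_path,)
        )
    return cur.lastrowid


def process_next(handler):
    """Called from a separate worker loop: handle the oldest pending job."""
    row = conn.execute(
        "SELECT id, image_path FROM resize_job"
        " WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None              # nothing to do
    job_id, image_path = row
    handler(image_path)          # e.g. the actual resize
    with conn:
        conn.execute(
            "UPDATE resize_job SET status = 'done' WHERE id = ?", (job_id,)
        )
    return job_id
```

A worker would call `process_next` in a loop, sleeping briefly whenever it returns `None`.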

As an aside - when web developers talk about “asynchronous” behaviour, they are generally referring to things like chat clients. This is a situation where the server is able to send data back to the client at will. “Classic web” is client-driven; user requests a page, server provides it. “Asynchronous web” is a different mode of operation, where the user requests a specific thing which isn’t available yet; server provides it when it is available. This approach *could* be used for something like image processing, but it isn’t something Django is well set-up to manage at present.
 
> I heard that since django is a non async framework, there is no interest in doing so. But is there ?

I’m not sure exactly what you’ve heard - but it sounds like whoever told you was misinformed.

Django doesn’t *currently* have any built-in asynchronous tools, but that doesn’t mean we don’t want to add them, or that there aren’t options for doing asynchronous work right now. There are a couple of patches, in various stages of development, including one that just received a large grant from Mozilla, that will add various flavours of asynchronous handling to Django. 

Yours,
Russ Magee %-)



Benjamin Melki

Jan 20, 2016, 7:20:04 PM
to django...@googlegroups.com


> To implement this sort of feature, you need to have a worker queue - Celery is the heavy duty answer for this; if you just need a cheap and cheerful answer, RQ is a fairly easy-to-use option, or you can roll-your-own in the database without too much trouble.


Thank you Russell, for the informative answer. It’s all pretty clear, and it is great news to learn that some async patch will make it into Django.

Just to be perfectly clear: you mention Celery, and indeed I’ve seen some tutorials about it.
But most were written before Python 3.5.

Specifically for the image processing, do I need to install Celery? Can’t I just use async / await for this task?

Russell Keith-Magee

Jan 20, 2016, 8:43:15 PM
to Django Users
On Thu, Jan 21, 2016 at 8:19 AM, Benjamin Melki <webk...@gmail.com> wrote:



> Specifically for the image processing, do I need to install celery, can’t I just use async / await for this task ?

The Python 3.5 async/await keywords are an entirely different beast, for an entirely different problem - that’s why I flagged the fact that “asynchronous” was a misleading word to use in this case.

I’m sure you *could* use async/await to implement an image processing solution, but you wouldn’t gain much. A user would still connect to your server, start a request, which would spawn (or acquire) a thread to do image handling… and then wait until the background processing was complete before “await”ing, and then returning the result to the user. The end user experience is still going to be “nothing is seen until the entire process is completed”, and absent some very sophisticated server gymnastics, the server thread is still going to be locked up waiting for the image processing to return. All you will have achieved is a lot of complexity in your code so that your image processing is handled in a separate thread. You could do exactly the same thing without async/await by spawning a subprocess or thread, and waiting/joining on the subprocess/thread response.
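To make that concrete, here is a small sketch of an async handler that pushes the resize onto a thread and awaits it. The names `upload_handler` and `resize_image` are hypothetical, and `time.sleep` stands in for the Pillow work. It uses `asyncio.run` and `asyncio.get_running_loop`, which arrived in Python versions later than the 3.5 release discussed in this thread.

```python
import asyncio
import time

RESIZE_SECONDS = 0.2   # stand-in for the real resize cost


def resize_image():
    """Blocking placeholder for the Pillow resize."""
    time.sleep(RESIZE_SECONDS)
    return "thumbnail"


async def upload_handler():
    """Offload the resize to a thread, then await it before responding."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    result = await loop.run_in_executor(None, resize_image)
    return result


def timed_request():
    """Run one request and report how long the caller had to wait."""
    started = time.monotonic()
    result = asyncio.run(upload_handler())
    return result, time.monotonic() - started
```

The time reported by `timed_request()` is still at least the full resize cost: the response cannot be sent until the awaited work has finished.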

Ultimately, this is really all async/await is - syntactic sugar around diverting program flow to an alternate thread (Insert suitable amounts of hand waving here glossing over the details and nitpicking :-). You’re still going to have a web server waiting for something, and a user waiting for a finished request.

Moving to an all singing, all dancing asynchronous web server (such as Twisted, Tornado, or a patched version of Django such as Django Channels, which should be coming in an upcoming release) would solve the server resource starvation problem, which I suppose is a good thing, but the end user will still be waiting to see responses displayed - and remember, it only takes 0.3s of lag for a user to declare that a user interface is slow.
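A toy illustration of that trade-off, using plain `asyncio` rather than any of the servers named above: `slow_upload` and `fast_page` are hypothetical handlers sharing one event loop. The fast page is served while the upload is still waiting on its resize, but the upload's own user waits just as long as before.

```python
import asyncio
import time

finished = []   # order in which the two "requests" complete


def resize_image():
    """Blocking placeholder for the Pillow resize."""
    time.sleep(0.2)


async def slow_upload():
    loop = asyncio.get_running_loop()
    # The event loop is free to run other handlers while this awaits.
    await loop.run_in_executor(None, resize_image)
    finished.append("upload")


async def fast_page():
    finished.append("page")


async def serve():
    """Handle both requests concurrently; return the total elapsed time."""
    started = time.monotonic()
    await asyncio.gather(slow_upload(), fast_page())
    return time.monotonic() - started


upload_wait = asyncio.run(serve())
```

Here `finished` ends up with the page ahead of the upload, while `upload_wait` still reflects the whole resize.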

Remember, “asynchronous” isn’t magic fairy dust that makes everything faster. You need to understand *what* is causing blocking in your algorithm in order to evaluate if synchronicity is the problem you have. In the case of image processing, you don’t want asynchronicity - you want background processing.

Yours,
Russ Magee %-)