problem with django, nginx and gunicorn


ReneMarxis

May 1, 2014, 7:11:01 AM
to django...@googlegroups.com
Hello

I have some questions about deploying a Django app.
I'm using nginx and gunicorn to serve one Django app.
For gunicorn I wanted to use gevent, because there are some calls to the app that can take up to 5 minutes to finish (generating PDFs).

The first question is more of a conceptual one... When using gevent and only one gunicorn worker, all 'blocking' requests run in a separate gevent/libevent greenlet, correct? So such a call will not block the app completely, right? If I understood the job of gevent correctly, this is the whole point of gevent, right?

Now to my problem... Doing one blocking call to my app, while having only one worker, blocks my entire application.
In the gunicorn startup logs I can see that gevent is used, but the calls seem to be synchronous.

Here is my nginx config
https://dpaste.de/Dfi5

and here is my gunicorn config
https://dpaste.de/UAqH

Can someone spot an error in these configs?

My next problem is that calls taking longer than 60 seconds lead to an error (504 Gateway Time-out), even if I set the timeout to 120 in the gunicorn config.
Where can I tweak this? It seems like nginx is cutting the connection: the gunicorn worker runs for 120 seconds and is then restarted by the gunicorn master (I could verify that using top).

Any hints are very welcome ...

Thanks

Erik Cederstrand

May 1, 2014, 7:56:56 AM
to Django Users
On 01/05/2014 at 13:11, 'ReneMarxis' via Django users <django...@googlegroups.com> wrote:

> Now to my problem... Doing one blocking call to my app, while having only one worker, blocks my entire application.
> In the gunicorn startup logs I can see that gevent is used, but the calls seem to be synchronous.

If your greenlet is not yielding somewhere, then it is in fact worse than standard threads in Python. Due to the GIL, only one thread can effectively run at a time. If your job is very CPU-intensive, then you need to use multiprocessing to make use of the other cores on your multi-core CPU.
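
To illustrate (just a sketch, not your actual code): a CPU-bound greenlet never reaches a yield point, so other greenlets starve until it finishes, while gevent-aware waits interleave nicely:

    # Sketch: CPU-bound greenlets run serially; gevent-aware waits overlap.
    import time
    import gevent

    def cpu_bound():
        # pure computation -- never hits a gevent yield point
        return sum(i * i for i in range(10 ** 7))

    def io_bound(n):
        gevent.sleep(1)  # yields to the event loop, so these interleave
        return n

    start = time.time()
    gevent.joinall([gevent.spawn(io_bound, n) for n in range(5)])
    print("5 sleeping greenlets: %.1fs" % (time.time() - start))  # ~1s total

    start = time.time()
    gevent.joinall([gevent.spawn(cpu_bound) for _ in range(2)])
    print("2 CPU greenlets: %.1fs" % (time.time() - start))  # runs serially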

> My next problem is that calls taking longer than 60 seconds lead to an error (504 Gateway Time-out), even if I set the timeout to 120 in the gunicorn config.

You really shouldn't have requests that take this long. You need to redesign this, e.g. by sending an email to the user with a download URL when the PDF is ready, or by delegating the job to a task server and displaying a status page with some ajax call that polls the progress of your job.
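
That said, the 504 itself almost certainly comes from nginx: its default proxy_read_timeout is 60 seconds, which matches your cutoff, so gunicorn's 120-second timeout never gets a chance to matter. If you must keep long requests around while you redesign, the relevant knobs live in the location block that proxies to gunicorn. A sketch (the proxy_pass target is a guess, since your real config is only on dpaste):

    location / {
        proxy_pass http://127.0.0.1:8000;  # or your gunicorn socket
        proxy_read_timeout 300s;   # default is 60s -- the source of the 504
        proxy_connect_timeout 75s;
        proxy_send_timeout 300s;
    }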

Erik

ReneMarxis

May 1, 2014, 10:06:19 AM
to django...@googlegroups.com
Hello Erik

thanks for your response!

I started reading a bit about the GIL. I think this is the root of my problem. I'm using xhtml2pdf to generate some large PDFs (up to 200 pages).
I do know I have to rewrite this code to run in the background (e.g. using Celery).
However, I want to understand why gevent is blocking. As I understood it, gevent spawns a new 'microthread' that does the long-running work. My mistake was/is that I thought a gevent greenlet would be a full-featured gunicorn thread. But of course it can't be; it wouldn't make sense to use gevent otherwise ...
However, this is not a big issue, as the function itself is only used in the admin by a few special users. I can 'train' them to create only smaller PDFs that finish in under 10 seconds ...
Nevertheless, it would be nice to know what kind of "heavy work" can be done asynchronously with gevent. Reading the documentation, I can't figure out where gevent would really help to serve more pages/sec.
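From what I can tell so far, the kind of work gevent parallelizes is waiting on I/O: after monkey-patching, every stdlib socket call yields to the other greenlets. A little test sketch I put together (the URL is just a placeholder):

    # After patch_all(), socket waits yield, so slow fetches overlap.
    from gevent import monkey; monkey.patch_all()

    import time
    import gevent
    import urllib2  # Python 2, same era as this thread

    def fetch(url):
        return urllib2.urlopen(url).read()

    urls = ["http://example.com/"] * 10
    start = time.time()
    jobs = [gevent.spawn(fetch, u) for u in urls]
    gevent.joinall(jobs)
    # Roughly one round-trip total, not ten, because all greenlets
    # sleep in the patched socket layer at the same time.
    print("%d fetches in %.1fs" % (len(urls), time.time() - start))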

Javier Guerra Giraldez

May 1, 2014, 12:27:32 PM
to django...@googlegroups.com
On Thu, May 1, 2014 at 9:06 AM, 'ReneMarxis' via Django users
<django...@googlegroups.com> wrote:
> I started reading a bit about the GIL. I think this is the root of
> my problem. I'm using xhtml2pdf to generate some large PDFs (up to
> 200 pages).


no, the problem isn't the GIL. no matter the framework, language or
architecture, if you take more than a few seconds to generate an HTTP
response, users are going to complain. the appropriate approach for
anything that takes more time is to dispatch it to another process.

that's why there are so many queue managers: they make it easy to
hand a task off to be performed by another process.

the high-level process should be something like this:

- user clicks on some "do it" button or link. it generates an HTTP
request to do some work

- the web app (a Django view in this case) gets the request, picks any
necessary parameters and initiates a task. the task manager responds
immediately with some ID.

- the web app immediately generates a response for the user with some
message like "process initiated, please wait", probably showing the ID,
or maybe a link to the "is it ready" view with this ID as a parameter.

- meanwhile, the task/queue manager delivers the task parameters to a
worker process.

- if the user (or some Javascript) hits the "is it ready" link,
another Django view gets the task ID, checks the status, and if it's
not ready, shows the "please wait" message.

- eventually, the task is finished, and the result (or some URL to a
produced file) is stored in the task, under the same ID.

- when the "is it ready" view is hit again with this ID, it sees the
task has finished, so it redirects to the "result" URL.


about the task/queue manager, there are several available, from a
simple record in a "tasks" table plus a cron job that checks for any
records still in the "unfinished" status (a so-called ghetto queue), to
the full-featured Celery.
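
for illustration, the ghetto queue variant could be as small as this
(all the names here are made up, it's just a sketch):

---- models.py

from django.db import models

class Task(models.Model):
    params = models.TextField()  # e.g. JSON-encoded arguments
    status = models.CharField(max_length=16, default='unfinished')
    result = models.CharField(max_length=255, blank=True)

---- management/commands/run_tasks.py  (cron calls this every minute)

from django.core.management.base import BaseCommand
from myapp.models import Task
from myapp.work import do_the_heavy_work  # your own code

class Command(BaseCommand):
    def handle(self, *args, **options):
        for task in Task.objects.filter(status='unfinished'):
            task.result = do_the_heavy_work(task.params)
            task.status = 'done'
            task.save()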

most of them let you define the tasks as simple Python functions with
a @task decorator, and queue a call just by invoking the function (or
its .delay() method, in Celery's case). something like this, sketched
against Celery's API:


---- tasks.py

from celery import shared_task
from django.contrib.auth.models import User
from myapp.models import MyModelA

@shared_task
def dolotsofwork(userid, modelaid, someotherarg):
    user = User.objects.get(pk=userid)
    moda = MyModelA.objects.get(pk=modelaid)
    # ...... do much more heavy processing here ....
    return "result-%d.pdf" % moda.pk

------ views.py

from celery.result import AsyncResult
from django.shortcuts import get_object_or_404, redirect, render

from myapp.models import MyModelA
from myapp.tasks import dolotsofwork

def starttask(request, arg):
    modela = get_object_or_404(MyModelA, pk=arg)
    # .delay() queues the call and returns immediately with an ID
    result = dolotsofwork.delay(request.user.id, modela.id, None)
    return render(request, 'pleasewait.html', {'taskid': result.id})

def isitready(request, taskid):
    result = AsyncResult(taskid)
    if not result.ready():
        return render(request, 'pleasewait.html', {'taskid': taskid})
    return redirect('resulturl', taskid)

def resulturl(request, taskid):
    resultfilename = AsyncResult(taskid).result  # the task's return value
    return render(request, 'processready.html',
                  {'resultfilename': resultfilename})

--
Javier

ReneMarxis

May 2, 2014, 4:17:02 AM
to django...@googlegroups.com
Hello Javier

Thank you too for your answer. However, I do know how to implement such a long-running task for a customer.
I generally use Celery and send out an email with a link on completion of such a task. For simple tasks I just use a cron job. Most of those jobs normally run on a separate machine.

The question is more related to gevent and gunicorn. I'd like to understand which tasks/actions can run in parallel when using gevent in gunicorn.
Let's say there are some calls inside the app that just return some calculated values from the DB. Those calls take about 200-300 ms to complete and do nothing other than read values from the DB, calculate some results, and present them with a Django template.
I'm running 6 to 8 workers in gunicorn. If I understood the point of using gevent correctly, the app should be able to serve more than 8 concurrent connections, right? Except if there are calls inside the app that hold the GIL (which seems to happen with xhtml2pdf), and that seems to block the gunicorn worker.

I have not changed anything in the Django app so far. I thought gunicorn would monkey-patch everything on startup. I'll probably have to invest some time here ...
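As far as I can tell, gunicorn's gevent worker does call monkey.patch_all() on startup, so plain socket I/O should cooperate. But a C database driver like psycopg2 ignores the patched sockets and only yields if it is made gevent-aware (psycogreen seems to be the usual way), and CPU-bound code like xhtml2pdf never yields at all. Something like this in the gunicorn config file is what I'll try (module names are my guesses):

    # gunicorn_conf.py -- sketch
    workers = 8
    worker_class = "gevent"
    timeout = 120

    def post_fork(server, worker):
        # psycopg2 is C code and bypasses the monkey-patched sockets;
        # psycogreen makes it yield to the gevent hub on every DB wait
        from psycogreen.gevent import patch_psycopg
        patch_psycopg()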