Timeouts from mod_wsgi, apache and nginx


Jennifer Mehl

Jan 8, 2015, 7:10:55 PM
to mod...@googlegroups.com
Hi there,

I’ve finally got our web application (Django) mostly working with mod_wsgi, apache and nginx as proxy.

This web application is basically a secure file transfer utility. I’m noticing some issues when uploading larger files - 750MB works OK - but 1GB seems to put it past some type of threshold which causes the following to occur:

Apache error log:
Jan 8 15:51:02 file-xfer apache-error: [Thu Jan 08 15:50:56.599540 2015] [wsgi:error] [pid 4011] [client 127.0.0.1:48096] Timeout when reading response headers from daemon process 'mod_wsgi': /var/www/transfergateway/myproject/apache/wsgi.py, referer: https://xxx.xxx.edu/myapp/upload/

The browser reports:
Gateway Timeout
The gateway did not receive a timely response from the upstream server or application.

However, the application does finally finish scanning and writing the file, and reports to syslog:
Jan 8 15:51:02 file-xfer mod_wsgi[1206]: jennifer_mehl814919 uploaded test_1GB.img with checksum 2a492f15396a6768bcbca016993f4b4c8b0b5307 from 128.111.12.225 at 2015-01-08 15:51:02.269835

Where should I begin in troubleshooting? The 750MB file took a total of 16 minutes and did not exhibit this behavior. The 1GB file took 21 minutes to upload, and is where I start to see these timeouts.

Are there adjustments I can make to mod_wsgi, or is this issue in the application itself taking too long to report back? I know there are also adjustments I could make in nginx and apache, but it doesn’t seem like the place to start here.

Thanks for any help!

—Jennifer

Jason Garber

Jan 8, 2015, 7:16:51 PM
to mod...@googlegroups.com
Hi Jennifer,

nginx has some settings that impact long uploads and things that take a while to process:

proxy_read_timeout 600;
client_max_body_size 2048m;

I'm not sure how that relates to your apache log messages, but gateway timeout is typically an nginx message.
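For context, a sketch of where those directives might sit in an nginx proxy configuration (server name, listen port and upstream address below are placeholders, not values from this thread):

```nginx
http {
    client_max_body_size 2048m;   # allow request bodies up to 2 GB

    server {
        listen 443 ssl;
        server_name example.edu;  # placeholder

        location / {
            proxy_pass         http://127.0.0.1:8080;  # Apache/mod_wsgi behind nginx
            proxy_read_timeout 600;   # wait up to 10 minutes for a response
        }
    }
}
```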

Thanks!
Jason






--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modwsgi+u...@googlegroups.com.
To post to this group, send email to mod...@googlegroups.com.
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.

Graham Dumpleton

Jan 8, 2015, 9:13:30 PM
to mod...@googlegroups.com
The error messages suggest a timeout in communication between the Apache child worker process that accepts the request and the mod_wsgi daemon process.

Various timeouts can apply here, and you can also see this error if a mod_wsgi daemon process crashes.

Can you verify for me what version of mod_wsgi you are currently using?

What is the WSGIDaemonProcess line you are using?

And what do you have the Timeout directive set to in Apache configuration?

Finally, was there any evidence in the main Apache error log file (not the virtual host log) to suggest a process crashed with a Segmentation Fault message?

Graham

Jennifer Mehl

Jan 9, 2015, 1:08:57 PM
to mod...@googlegroups.com
I am using mod_wsgi 4.4.1.

Here is the WSGIDaemonProcess line from my config:

  WSGIDaemonProcess mod_wsgi user=mod_wsgi group=mod_wsgi processes=8 threads=8 python-path=/var/www/transfergateway

There is no evidence that there was any segmentation fault within the Apache logs.

In Apache, Timeout is set to 300 and KeepAliveTimeout is set to 5.

In Nginx, I've set proxy_read_timeout to 18000, client_max_body_size to 2408M, and client_body_buffer_size to 256K.

I think that the application itself might be erroring out - but I'm having a hard time debugging *why*.  I can't seem to get the Django debug to show up in the browser window when it is running behind Nginx/Apache.  

My next step is to try running the application within the Django dev runserver and see what happens there with timeouts.

thanks,
Jennifer

Jennifer Mehl

Jan 9, 2015, 1:33:31 PM
to mod...@googlegroups.com
I have been able to get the Django debug to display, so I will start there on my troubleshooting.

I would still appreciate any further information on whether I am missing anything else with the timeouts.

thanks,
Jennifer

Graham Dumpleton

Jan 9, 2015, 9:39:51 PM
to mod...@googlegroups.com
There are two scenarios I can think of as being plausible to describe what you are seeing.

The first is that the request thread in the mod_wsgi daemon process which is reading the request input gets stuck for some reason and stops reading, even though not all of the input has been read.

In that case the Apache child worker process will still be trying to proxy across the request data, but eventually it will fill up the available network socket buffer. On Linux systems this buffer is generally in the MBs, but the exact size is dynamic and depends on overall system buffer usage.

Either way, the very large request data you have would still fill up the buffer and the Apache child process proxying the request would block and wouldn't be able to write any more data.

When it blocks in this way, it will then time out after the period set by the Timeout directive. In your case that is 5 minutes after it blocks, and you would then see the error you do.

If the request thread in the daemon process is blocked and can't recover, it stays busy, and if this keeps happening you would start to run out of request threads.

If this latter scenario is happening then there are special timeout values for daemon process mode that can be set to recover a process automatically, but I haven't seen anything to suggest your server as a whole is hanging due to request thread exhaustion.
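As an illustrative sketch only (the value is a placeholder, not a recommendation), such a recovery timeout can be expressed as a request-timeout option on the WSGIDaemonProcess line:

```apache
# Placeholder value: forcibly restart a daemon process whose request
# threads have all been stuck longer than this many seconds.
WSGIDaemonProcess mod_wsgi user=mod_wsgi group=mod_wsgi \
    processes=8 threads=8 \
    python-path=/var/www/transfergateway \
    request-timeout=1800
```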

The second scenario would, as far as I know, be hard to hit on Linux because of its larger socket buffer sizes, but the large request content you have is one half of what can trigger it.

In this case, if the amount of request content is larger than what can fit in the socket buffer, and the WSGI application generates a response without having read the complete request content, and the response itself is also larger than the socket buffer size, then you can get a situation where the Apache child process proxying the request is blocked because it cannot write more request content, but the daemon process is also blocked because it cannot write the response. It cannot write the response because the child process proxying the request will only start reading the response once it has written all the request content. So both sides block on each other. As before, the child process will time out after 5 minutes and things will then recover.

So is it possible that you have a situation where you are generating a response without having read all request content and the response content size itself is very large?
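One way to avoid that mutual block is for the application to drain any unread request content before its response is handed back. A minimal, hypothetical WSGI middleware sketch (the names here are mine, not from this thread or Jennifer's application):

```python
def drain_body_middleware(app):
    """Wrap a WSGI application so that any request body the
    application left unread is drained before the response
    is handed back to the server."""
    def wrapper(environ, start_response):
        result = app(environ, start_response)
        body = environ.get("wsgi.input")
        if body is not None:
            # Read and discard leftover request content in chunks, so the
            # proxying process never blocks on a full socket buffer.
            while body.read(65536):
                pass
        return result
    return wrapper
```

This drains the body eagerly after the application returns; whether that is acceptable depends on the application, so treat it as a sketch of the idea rather than a drop-in fix.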

I have never seen anyone hit this latter problem on Linux before, but I know that on Mac OS X it isn't too hard to achieve, as the default socket buffer size there is only 8KB and not in the MB range as is the case on Linux. The WSGIDaemonProcess directive has options to change the socket buffer size specifically because of the small default on Mac OS X.
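For reference, a sketch of those socket buffer options (the sizes shown are arbitrary examples, not tuned values):

```apache
# Arbitrary example sizes -- mainly relevant on Mac OS X, where the
# default socket buffer between child and daemon processes is only 8KB.
WSGIDaemonProcess mod_wsgi processes=8 threads=8 \
    send-buffer-size=1048576 \
    receive-buffer-size=1048576
```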

Graham

Jennifer Mehl

Jan 16, 2015, 3:44:49 PM
to mod...@googlegroups.com
Thanks so much for your response.

It turns out that setting the Apache Timeout to the equivalent of the Nginx proxy_connect_timeout, proxy_send_timeout and proxy_read_timeout values of 1800s (from default of 300s) made all the difference.
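For anyone reading later, the alignment described here amounts to raising the nginx proxy timeouts to match the Apache Timeout of 1800 seconds (shown as a sketch of the relevant directives):

```nginx
# nginx side -- matches "Timeout 1800" in the Apache configuration.
proxy_connect_timeout 1800s;
proxy_send_timeout    1800s;
proxy_read_timeout    1800s;
```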

I was able to upload a 1.5GB file, and at the end, instead of getting a “Gateway Timeout Error” in my browser, the Django application responded with a refresh page as it is supposed to do. (Previously, it was completing the file upload process but not responding back appropriately to the browser.)

Next questions: are there any guidelines to help me figure out the appropriate number of mod_wsgi processes and threads to set? What about Apache threads/processes? I have 8GB of RAM and two 3.1GHz processors assigned to this VM guest. We are guessing that, in production, as many as 5 users will be uploading files simultaneously, none of which will exceed 2GB at a time.
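As a rough sketch of the arithmetic involved (the per-process memory figure below is an assumption for illustration, not a measurement):

```python
# Back-of-envelope capacity check for the daemon group in this thread.
processes = 8
threads = 8

# Worker slots available across the daemon group; compare against the
# expected ~5 simultaneous uploads.
concurrent_requests = processes * threads
print(concurrent_requests)  # 64

# If each daemon process grew to ~150 MB (assumed figure), total memory:
per_process_mb = 150
total_mb = processes * per_process_mb
print(total_mb)  # 1200 -- well within an 8 GB VM
```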

thanks,
Jennifer

Graham Dumpleton

Jan 18, 2015, 1:09:14 AM
to mod...@googlegroups.com
It is not possible to give general advice about processes/threads configuration based on machine specs and memory alone, as it really depends on what your application is doing.

What you really need is to have monitoring in place that can track things like throughput, response times, web server capacity utilisation, memory and CPU usage.

My biased suggestion is to have a look at using New Relic. It has a free tier which provides a lot of what you need if you are concerned about forking out money. I also have an experimental plugin (https://pypi.python.org/pypi/mod_wsgi-metrics/1.1.0) for it which monitors some aspects of what Apache is doing at the same time.

So try that, or some other monitoring solution, and based on the information from it we can possibly direct you better.

Graham