Using write with large amounts of data causing disconnects


Greg Popp

Jan 10, 2024, 2:02:48 PM
to modwsgi
Hello!

mod_wsgi is running on a CentOS 7 system and is at version 3.4 (I know - very old) with Python 2.7.

I have been using mod_wsgi for a Python application that runs a command-line program and marshals the output of that program back to an HTTP client. The data being sent is binary and can be tens of gigabytes in size.

This app is "unconventional", in that it calls 'write' directly instead of returning an iterable. The problem I have had recently is that some clients are slow to read the data and the TCP buffer fills up. When this happens, the next call to write on a full buffer raises a "failed to write data" exception (which I trap), but if I try again to send the data I get "client connection closed".
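For reference, the pattern described above looks roughly like this (sketched in Python 3 syntax; the command and chunk size are placeholders, not the actual app):

```python
import subprocess

CHUNK_SIZE = 64 * 1024  # bytes written to the client per chunk

# Hypothetical command; the real app runs its own extractor here.
COMMAND = ["echo", "hello world"]

def application(environ, start_response):
    # Launch the command-line program and stream its stdout back to the
    # client by calling write() directly instead of returning an iterable.
    proc = subprocess.Popen(COMMAND, stdout=subprocess.PIPE)
    write = start_response(
        "200 OK", [("Content-Type", "application/octet-stream")])
    while True:
        chunk = proc.stdout.read(CHUNK_SIZE)
        if not chunk:
            break
        # This write() is what raises "failed to write data" once the
        # client stops reading and Apache's blocked socket write times out.
        write(chunk)
    proc.stdout.close()
    proc.wait()
    return []
```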

Is there some config setting or methodology I can use to alleviate this issue? In other words, some way to back off and wait for the buffer to drain sufficiently to resume sending the data? OR - is there some way to get the current size (fullness) of the TCP write buffer on the connected socket? (Something like what you see from the 'ss' command line utility "Send-Q" column). If I could tell how full it is and what the max size is, I could implement a sleep/retry cycle of some kind.

I have looked - even in the source code - but haven't been able to figure out if there is a way to achieve this. Thanks in advance for your attention.


Graham Dumpleton

Jan 10, 2024, 2:32:52 PM
to mod...@googlegroups.com
Are you using mod_wsgi embedded mode or daemon mode?

Graham


Greg Popp

Jan 10, 2024, 3:09:35 PM
to modwsgi
embedded

Graham Dumpleton

Jan 10, 2024, 4:38:51 PM
to mod...@googlegroups.com
So what you are encountering is limitations in the socket buffer size enforced by the operating system, in combination with Apache httpd applying a socket timeout.

In other words, what happens is that the HTTP client isn't reading data, so the operating-system-level socket buffer fills up. At that point the Apache httpd write of the response data blocks and eventually times out, causing the initial error you see. In that situation Apache httpd will close down the connection, which results in the second error you see when still trying to write out more data anyway.

You may be able to adjust some Apache configuration settings to try and solve this, but it would affect all requests in the context in which you apply the configuration (dependent on whether it is done in server, VirtualHost, Directory or Location contexts). So it is not something you could selectively do on a per-client basis.

The first Apache directive to look at is SendBufferSize.


If this is not set it should default to 0, which means it uses the operating system default.

So you might be able to fiddle with this by setting it larger than the operating system default (although there is still some upper bound set by the operating system).

The next Apache directive to look at is Timeout.


This would usually default to 60 seconds but some Linux distributions may override this in the Apache configuration they ship.

In very old Apache versions this actually defaulted to 300 seconds, but it was made lower at some point.

If playing with these, do be careful, since they can cause increased memory usage or other undesirable effects depending on the traffic profile your server gets.
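For illustration, the two directives would be set in the main server configuration along these lines (the values here are examples only, not recommendations):

```apache
# Request a larger kernel send buffer, in bytes (0 = operating system default).
SendBufferSize 1048576

# Give writes blocked on a slow reader longer before Apache gives up.
Timeout 300
```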

One other thing you may be able to use is mod_ratelimit.


I have never actually used this and am not exactly sure how it works, so this is a bit of a guess, but you may be able to use it to slow down how quickly your application outputs the data.

I am assuming here that this module will introduce waits into your application, by blocking your writes for a bit, to keep the flow of data being written under the rate limit. This would have the effect of not stuffing so much data into the response pipeline, so that things work better with slower clients. Obviously, using it would penalise faster clients, but you might find an acceptable balance by setting a higher rate limit for the initial burst of data and then using a lower rate after that.
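Going by the mod_ratelimit documentation (unverified, as noted above), such a configuration might look like the following; the location and numbers are illustrative, and rate-initial-burst needs Apache 2.4.34 or later:

```apache
<Location "/downloads">
    SetOutputFilter RATE_LIMIT
    # Cap the response rate, in KiB/s ...
    SetEnv rate-limit 512
    # ... after letting an initial burst (in KiB) go out at full speed.
    SetEnv rate-initial-burst 10240
</Location>
```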

Graham


Greg Popp

Jan 11, 2024, 9:38:24 AM
to modwsgi
Thank you very much! This is most helpful, though I don't think any of them will actually solve my issues, for many of the reasons you mentioned.

I was thinking that perhaps the mod_wsgi interface had access to the file descriptor for the network socket used by Apache and could call "select" to see if it had enough buffer space for the requested write. If it didn't, it could (optionally) sleep some configurable duration and try again some configurable number of times. I understand though, that for most applications this would not be necessary.

Yesterday, I tried implementing that same behavior in my WSGI app. I don't set SendBufferSize, so the system default is used. I grab the system TCP send buffer value by running the 'sysctl' command. Then I keep track of the total bytes sent. If that value exceeds the system TCP send buffer value, I run the 'ss' command from within my WSGI app to grab the Send-Q value for this connection (fortunately WSGI gives us the source IP and source port, and I can filter the 'ss' output using that). If the Send-Q value is too high to accommodate another write, I sleep a second and try again until I get enough space. It's kind of a Rube Goldberg solution, but so far it seems to be working!
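That workaround can be sketched roughly as follows (Python 3 shown; the `ss` column positions and the use of a `dst` filter on the client's address are assumptions about the local iproute2 version, and the helper names are made up):

```python
import subprocess
import time

def send_q_bytes(client_ip, client_port):
    """Return the Send-Q value for the connection to the given client
    (the peer, from the server's point of view), or None if it cannot
    be determined."""
    try:
        out = subprocess.run(
            ["ss", "-tn", "dst", "%s:%s" % (client_ip, client_port)],
            capture_output=True, text=True).stdout
    except OSError:  # ss not installed
        return None
    for line in out.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) >= 3:
            return int(fields[2])       # columns: State, Recv-Q, Send-Q, ...
    return None

def wait_for_buffer_space(client_ip, client_port, chunk_size, max_buf,
                          retries=30, delay=1.0):
    """Sleep/retry until Send-Q has room for another chunk_size write."""
    for _ in range(retries):
        queued = send_q_bytes(client_ip, client_port)
        if queued is None or queued + chunk_size <= max_buf:
            return True
        time.sleep(delay)
    return False
```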

Thank you for taking the time to answer my questions! I very much appreciate the assistance!

Graham Dumpleton

Jan 11, 2024, 5:15:18 PM
to mod...@googlegroups.com
Also not sure whether it will help or not, but if the data you are sending is stored in a file and not generated on demand, then you might consider using the WSGI file_wrapper extension instead.


I don't know how this will behave when the buffer fills up, since when working properly it is all handled in the OS kernel and not in Apache.

Along similar lines, if the data is stored as a file, you might try mod_sendfile. It also uses the kernel sendfile mechanism, but the way it interacts may result in different behaviour in your situation.

Graham

Graham Dumpleton

Jan 11, 2024, 5:22:38 PM
to mod...@googlegroups.com
If using the file_wrapper feature, make sure you also add:

    WSGIEnableSendfile On

to the mod_wsgi configuration, as it is not on by default:


The file_wrapper mechanism would still have worked, but to use the kernel sendfile feature you also have to have the directive enabled.

I can't remember if you also need to add:

    EnableSendfile On

to enable it in Apache itself. I don't think so.

Graham

Greg Popp

Jan 12, 2024, 9:35:00 AM
to modwsgi
Thank you again! The data IS in a file, but it requires an application to extract the requested salient pieces. I will look at the file wrapper extension.

After more testing, I now think that I can fix my problem just by using your second suggestion of increasing the Timeout configuration variable in Apache. That is an easy fix and so far seems to be working well.

Greg Popp

Feb 7, 2024, 3:54:47 PM
to modwsgi
I'm still struggling with disconnects with my slow readers. Here is all that I have experimented with:

- I downloaded the latest version of the mod_wsgi source (5.0.0) and built it on my CentOS 7 system. This all seemed to work well and I am now running that version.
- I modified my app to return an iterator and stopped calling "write" directly. No real change in behavior.
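For comparison with the direct-write version, the iterable approach looks roughly like this (again a sketch, with the command a placeholder for the real extractor):

```python
import subprocess

CHUNK_SIZE = 64 * 1024
COMMAND = ["echo", "hello world"]  # placeholder for the real program

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/octet-stream")])

    def stream():
        # mod_wsgi drives the writes itself: each yielded block is
        # written to the client before the next one is produced.
        proc = subprocess.Popen(COMMAND, stdout=subprocess.PIPE)
        try:
            while True:
                chunk = proc.stdout.read(CHUNK_SIZE)
                if not chunk:
                    break
                yield chunk
        finally:
            proc.stdout.close()
            proc.wait()

    return stream()
```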

To rule out network devices causing problems, I started experimenting with localhost. Paradoxically, the localhost connections seem to time out sooner than remote ones - and 100% of the time!

The timeout seems to occur at around 17-ish minutes. I have the Apache config param "TimeOut" set to 43200 (12 hours). I know that is insane, but it should make a send that is blocked by a slow reader sit there for 12 hours.
Alas, it does not. My slow readers are still timing out.

The error log has this:
[Wed Feb 07 19:17:26.807222 2024] [wsgi:info] [pid 19013] [client 127.0.0.1:38570] mod_wsgi (pid=19013, process='', application='xcrutils.exegy-appliance.net|/xcr'): Reloading WSGI script '/var/web/sites/request_handler_wsgi.py'.
[Wed Feb 07 19:34:03.493988 2024] [wsgi:debug] [pid 19013] src/server/mod_wsgi.c(2443): [client 127.0.0.1:38570] mod_wsgi (pid=19013): Failed to write response data: Connection timed out.


My Python WSGI application outputs this:
Apache/mod_wsgi failed to write response data: Connection timed out

That error string is what is in the trapped exception itself.

Looking at mod_wsgi, this call:

    rv = ap_pass_brigade(r->output_filters, self->bb);

is resulting in rv being not equal to APR_SUCCESS, and exception_when_aborted is false.

Could there be some kind of timeout implemented in the bucket brigade code?

Greg Popp

Feb 7, 2024, 4:00:25 PM
to modwsgi
I forgot - one other thing I tried was the file wrapper. The file wrapper does NOT experience the disconnects.

Greg Popp

Feb 8, 2024, 4:32:04 PM
to modwsgi
I have found that the TCP stack is causing these disconnects. I'm getting into a state where the sending socket's buffer is full, as is the receiving socket's, and TCP begins retransmission attempts. Once it hits the system maximum, the connection is terminated.
In the immortal words of the SNL character, "Emily Litella": "Never mind!"