After some investigation I found that the problem is a known interaction issue between the Nagle algorithm and delayed ACKs:
http://en.wikipedia.org/wiki/Nagle's_algorithm
http://www.stuartcheshire.org/papers/NagleDelayedAck/
What happens is that for small responses (those smaller than a single TCP segment), the system sends the first write (the headers) and then buffers the remaining data until one of the following occurs: more data is written, an ACK arrives from the far end, the socket is closed, or 200ms elapses. Since there is no more data and the far end's ACK is delayed, the response data is not sent for 200ms. String a few small responses together and clients end up blocked for seconds.
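The send decision described above can be modelled in a few lines. This is only a simplified sketch of Nagle's rule (it ignores the receive window and the 200ms timer); the function name and the byte counts are made up for illustration.

```python
def nagle_can_send(pending: int, unacked: int, mss: int) -> bool:
    """Simplified Nagle rule: may the sender transmit `pending` bytes now?

    A full segment can always go out. A partial segment goes out only
    when nothing previously sent is still waiting for an ACK.
    """
    if pending >= mss:
        return True
    return unacked == 0


MSS = 1460  # assumed value, for illustration only

# First write: 200 bytes of headers, nothing in flight, so it is sent at once.
headers_sent = nagle_can_send(pending=200, unacked=0, mss=MSS)

# Second write: 100 bytes of body while the headers are still unacknowledged,
# so it is held back until the (delayed) ACK arrives.
body_sent = nagle_can_send(pending=100, unacked=200, mss=MSS)
```

With a delayed ACK on the far end, the second case is where the 200ms goes.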
The problem can be reproduced by changing the delayed ACK behavior on another Mac. By default OS X's behavior is set to "streaming detection", which will not delay the ACK in this case. To switch to standard delayed ACKs run:
sudo sysctl -w net.inet.tcp.delayed_ack=1
Now run an ApacheBench test:
ab -k -n 10 -c 1 [url to a small response]
The -k flag (keep-alive) is important. Without it, ab uses plain HTTP 1.0 and the issue will not show, because the socket is closed after each response, which flushes the buffered data to the network.
ApacheBench should show an average response time of about 200ms. Now switch back to the default behavior by setting net.inet.tcp.delayed_ack to 3 and repeat the test. Average response time should go back to normal.
There are a couple of solutions. The brute force method is to disable the Nagle algorithm by setting the TCP_NODELAY option on the accepted sockets. This will result in a packet being sent for every write, including tiny ones like the chunked header/footer. A better solution is to group the headers and response data into a single write. This is simple enough for basic responses, but it looks like HTTPConnection depends on the tags associated with the separate writes for chunked transfers.
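As a rough illustration of the two options, here is a minimal Python sketch. The thread itself concerns Objective-C and AsyncSocket, so the helper names below are hypothetical and this is not HTTPConnection's code. `disable_nagle` is the brute-force route; `build_response` joins headers and body so they can be handed to the socket in one write.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Brute force: turn Nagle off so every write is sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def build_response(body: bytes, status: str = "200 OK") -> bytes:
    """Join headers and body so they can go out in a single write."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def send_small_response(conn: socket.socket, body: bytes) -> None:
    # One sendall() call instead of separate header and body writes.
    conn.sendall(build_response(body))
```

The single-write version only covers fixed-length responses; it does not address the chunked-transfer case mentioned above.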
Any thoughts on this one? When I get a chance I'm going to look into how other servers handle this, but at the moment I don't see an easy fix.
Matt
-Robbie Hanson
I follow your reasoning (and this seems to be the recommended solution as per Nagle himself), but I'm not convinced this would totally solve the problem.
Let us assume for a moment that we try a similar technique by either:
- Grouping the separate writes within HTTPConnection
- Auto-grouping writes within AsyncSocket
If the total size of the write(s) exceeds the MSS, then don't we still end up with exactly the same problem?
For example, say the MSS is 536 bytes, the header is 500 bytes, and the data is 100 bytes. The first packet sent is the full header, plus a bit of the data. And the second packet, which gets delayed, is the final part of the data.
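Working those numbers through, as a sketch that assumes the header and data are handed to the socket as one combined write (the function name is made up for illustration):

```python
def split_into_segments(header_len: int, data_len: int, mss: int) -> list:
    """Return the payload size of each TCP segment for one combined write."""
    total = header_len + data_len
    segments = []
    while total > 0:
        size = min(total, mss)
        segments.append(size)
        total -= size
    return segments


segments = split_into_segments(header_len=500, data_len=100, mss=536)
# segments == [536, 64]: the first segment carries the 500-byte header plus
# 36 bytes of data; the remaining 64 bytes form a trailing partial segment.
```

The trailing 64-byte segment is the one that waits for the delayed ACK.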
-Robbie Hanson
> If the total size of the write(s) exceeds the MSS, then don't we still end up with exactly the same problem?
You're right. If I take a file slightly larger than the MSS and change the read chunk size to be slightly less, the same problem occurs even with the headers grouped into the first write.
It seems like what we need is a way to flush the socket when all of the response's data has been written. Toggling the nodelay option on before the last write then back off again seems to do the trick, but feels hackish.
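For reference, the toggle looks roughly like this in Python. It is a sketch of the idea only, not the AsyncSocket change; `send_with_final_flush` is a hypothetical name, and how a given TCP stack reacts to the option being toggled mid-stream is an assumption worth testing per platform.

```python
import socket
from typing import Iterable


def send_with_final_flush(sock, chunks: Iterable[bytes]) -> None:
    """Send chunks with Nagle on, but enable TCP_NODELAY for the last one."""
    chunks = list(chunks)
    if not chunks:
        return
    for chunk in chunks[:-1]:
        sock.sendall(chunk)
    # Turn Nagle off so the final, possibly partial, segment is not held back.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        sock.sendall(chunks[-1])
    finally:
        # Restore the default so later responses on this connection still
        # get their small writes coalesced.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
```

It costs two extra setsockopt calls per response, which is part of why it feels hackish.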
Matt
In addition to this, I think GCDAsyncSocket should automatically coalesce writes when possible, which may help in several cases even without the flush stuff.
-Robbie Hanson