very poor HTTP/1.1 paste.httpserver keep-alive performance


Andrew Dalke

Nov 23, 2009, 3:28:40 AM
to Paste Users
I have a WSGI server which uses paste.httpserver to handle HTTP/1.1
requests. In my environment, the documented limitations of
paste.httpserver's HTTP/1.1 support are acceptable.

I ran a benchmark and found that making multiple requests over a
keep-alive connection is about two orders of magnitude slower than
making the same number of requests over new connections, one per
request (roughly 10 requests per second vs. 1000 requests per second).


Here's my test code:

from paste import httpserver

def echo_app(environ, start_response):
    n = 10000
    start_response("200 Ok", [("Content-Type", "text/plain"),
                              ("Content-Length", str(n))])
    return ["*" * n]

httpserver.serve(echo_app, protocol_version="HTTP/1.1")


To verify that it returns 10000 characters,

% curl http://localhost:8080/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10000  100 10000    0     0  92838      0 --:--:-- --:--:-- --:--:-- 1953k
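
The keep-alive slowdown can also be reproduced without httperf. Here's
a minimal client sketch (assuming Python 2's stdlib httplib and the
echo server above running on localhost:8080) that reuses one
connection for many requests:

import time
import httplib  # Python 2 stdlib; http.client in Python 3

# One HTTPConnection is reused for every request, so all of the
# requests travel over a single keep-alive socket.
conn = httplib.HTTPConnection("localhost", 8080)
t1 = time.time()
for i in range(100):
    conn.request("GET", "/")
    response = conn.getresponse()
    response.read()   # drain the body before reusing the connection
conn.close()
t2 = time.time()
print "100 keep-alive requests in %.2f seconds" % (t2 - t1)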


I'll use httperf to benchmark with 100 different connections:

% ./httperf --server localhost --port 8080 --num-conns 100
httperf --client=0/1 --server=localhost --port=8080 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=100 --num-calls=1
...
Total: connections 100 requests 100 replies 100 test-duration 0.076 s

Connection rate: 1320.4 conn/s (0.8 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 0.6 avg 0.8 max 1.9 median 0.5 stddev 0.2
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 1.000

Request rate: 1320.4 req/s (0.8 ms/req)
Request size [B]: 62.0
...

Now I'll do the same test with 100 calls on the same connection

% ./httperf --server localhost --port 8080 --num-calls 100
httperf --client=0/1 --server=localhost --port=8080 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=1 --num-calls=100
...
Total: connections 1 requests 100 replies 100 test-duration 8.939 s

Connection rate: 0.1 conn/s (8939.5 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 8939.5 avg 8939.5 max 8939.5 median 8939.5 stddev 0.0
Connection time [ms]: connect 0.1
Connection length [replies/conn]: 100.000

Request rate: 11.2 req/s (89.4 ms/req)
...


That is, I get about 1320 requests per second if I create a new
connection each time, but only about 11.2 requests per second if I
reuse the connection.

Those numbers were with Python 2.5. Using Python 2.6 they are 1080
and 11.4 requests/sec, which is effectively the same. (These numbers
are repeatable to within about 15%.)

I'm using paste.httpserver behind flup's Apache-style preforking
server, which is different from the multi-threaded version used here.
To get clean profiling numbers without threads, and to show that this
isn't a threading problem, I've rewritten the code as:

from paste import httpserver

class MyHandler(httpserver.WSGIHandler):
    sys_version = None
    server_version = "MyServer/0.0"
    protocol_version = "HTTP/1.1"

    def log_request(self, *args, **kwargs):
        pass


def echo_app(environ, start_response):
    n = 10000
    start_response("200 Ok", [("Content-Type", "text/plain"),
                              ("Content-Length", str(n))])
    return ["*" * n]

# WSGIServerBase is single-threaded
server = httpserver.WSGIServerBase(echo_app, ("localhost", 8080),
                                   MyHandler)
server.handle_request()


Running this as

% python2.6 -m cProfile -o paste.prof paste_slowdown.py

and hitting it with

httperf --client=0/1 --server=localhost --port=8080 --uri=/ --send-buffer=4096 --recv-buffer=16384 --num-conns=1 --num-calls=500

(that run reports: Request rate: 11.2 req/s (89.4 ms/req))

gives me profile numbers of

>>> p=pstats.Stats("paste.prof")
>>> p.strip_dirs().sort_stats("cumulative").print_stats()
Sun Nov 22 21:31:57 2009    paste.prof

         109749 function calls in 46.570 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        1    0.000    0.000   46.571   46.571  {execfile}
        1    0.001    0.001   46.570   46.570  paste_slowdown.py:2(<module>)
        1    0.000    0.000   46.115   46.115  SocketServer.py:250(handle_request)
        1    0.000    0.000   44.675   44.675  SocketServer.py:268(_handle_request_noblock)
        1    0.000    0.000   44.675   44.675  SocketServer.py:301(process_request)
        1    0.000    0.000   44.675   44.675  SocketServer.py:318(finish_request)
        1    0.000    0.000   44.675   44.675  SocketServer.py:609(__init__)
        1    0.000    0.000   44.675   44.675  httpserver.py:456(handle)
        1    0.001    0.001   44.675   44.675  BaseHTTPServer.py:325(handle)
      501    0.006    0.000   44.674    0.089  httpserver.py:440(handle_one_request)
     2001    0.020    0.000   44.383    0.022  socket.py:373(readline)
      501   44.354    0.089   44.354    0.089  {method 'recv' of '_socket.socket' objects}
        1    1.440    1.440    1.440    1.440  {select.select}

It looks like most of the time is spent in recv, but I can't figure
out why that happens with HTTP/1.1 keep-alive requests and not with
one request per connection.

I then composed a series of 100 HTTP/1.1 requests by hand, which looks like:

GET / HTTP/1.1
Location: localhost
Connection: Keep-Alive
Content-Length: 0

GET / HTTP/1.1
Location: localhost
Connection: Keep-Alive
Content-Length: 0

... repeat 97 more times, to have 99 keep-alives in total ...

GET / HTTP/1.1
Location: localhost
Connection: Close
Content-Length: 0


and sent it with netcat, like this:

nc localhost 8080 < ~/src/send_to_paste.txt

That took 0.03 seconds to run; my server instrumentation showed that
echo_app was called 100 times, and netcat returned the expected
number of bytes for 100 responses.
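
The same pipelined test can also be done from Python instead of
netcat. A rough sketch, assuming the request text above is saved in
send_to_paste.txt:

import socket

# Send all of the pipelined requests in a single write, then read
# until the server closes the connection (the final request says
# Connection: Close).
payload = open("send_to_paste.txt", "rb").read()
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 8080))
s.sendall(payload)
total = 0
while True:
    data = s.recv(65536)
    if not data:
        break
    total += len(data)
s.close()
print "received", total, "bytes"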

This made me think that httperf was the problem, but then I tried
using ApacheBench and got even worse results - it timed out!

This test does 100 requests, each with its own connection

% ab -n 100 localhost:8080/
This is ApacheBench, Version 1.3d <$Revision: 1.73 $> apache-1.3
...
Requests per second: 1111.11 [#/sec] (mean)
...


This test does 100 requests, reusing the same connection

% ab -n 100 -k localhost:8080/
This is ApacheBench, Version 1.3d <$Revision: 1.73 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)...
Server timed out: Operation now in progress


That makes me think something is going on inside Paste, but I can't
figure out what.

Any thoughts?

Andrew
da...@dalkescientific.com

Andrew Dalke

Nov 24, 2009, 2:33:48 AM
to Paste Users
On Nov 23, 12:28 am, Andrew Dalke <andrewda...@gmail.com> wrote:
> I ran a benchmark and found that making multiple requests over a
> keep-alive connection is about two orders of magnitude slower than
> making the same number of requests over new connections, one per
> request (roughly 10 requests per second vs. 1000 requests per second).

Trying to track this down, I instrumented socket.py in the Python 2.6
stdlib with

import time

def _time(label, recv, size):
    print "Start of", label
    t1 = time.time()
    data = recv(size)
    t2 = time.time()
    print size, "%.6f" % (t2-t1), repr(data)
    return data

... then later, in the relevant recv code ...

        self._rbuf = StringIO()  # reset _rbuf.  we consume it via buf.
        while True:
            data = _time("readline2", self._sock.recv, self._rbufsize)
            if not data:
                break

When I run httperf against the server with --num-calls, the output
shows that recv sometimes waits around 0.00001 seconds and sometimes
around 0.2 seconds.

Start of readline2
8192 0.000024 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000011 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000011 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000011 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000012 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000011 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000011 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000625 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.191227 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.199420 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.199432 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000009 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'
Start of readline2
8192 0.000009 'GET / HTTP/1.1\r\nUser-Agent: httperf/0.9.0\r\nHost: localhost\r\n\r\n'


I still don't know why this happens. Perhaps the next step is to see
what's on the wire; maybe something is sitting in a buffer somewhere.

Andrew Dalke
da...@dalkescientific.com

Andrew Dalke

Nov 25, 2009, 2:49:47 AM
to Paste Users
I figured out the problem: it's Nagle's algorithm. The server's
outgoing response is buffered in the TCP layer in hopes of assembling
a larger packet, and the client holds back its ACK (delayed ACK), so
after about 0.2s the TCP layer gives up and sends the buffered data
anyway.

In my previous post you can see the 200ms delay, but I didn't realize
where it came from and thought it was in the receive. The delay did
show up in recv, but only as a side effect; I couldn't instrument the
full TCP stack.

What I ended up doing was watching the packets with Wireshark, and I
noticed the delay was in the server's outgoing response, not in the
recv as I had thought. That, plus a friend's suggestion to include
"200ms" in my search, led me to the advice of setting TCP_NODELAY to
disable Nagle.

Doing that made my performance problems disappear.
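
For anyone who wants the workaround without patching Paste, the same
thing can be done from application code. An untested sketch against
the single-threaded test server from earlier; it assumes, as the
patch below does, that sockets accepted from a listening socket
inherit TCP_NODELAY, which is the common behavior but worth verifying
on your platform:

import socket
from paste import httpserver

server = httpserver.WSGIServerBase(echo_app, ("localhost", 8080),
                                   MyHandler)
# Disable Nagle on the listening socket; accepted connections
# inherit the option (a platform-dependent assumption).
server.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
server.serve_forever()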

Here's a patch to httpserver.py which fixes it for my purposes.

% hg diff httpserver.py
diff -r 2fccd121e818 lib/thirdparty/httpserver.py
--- a/lib/thirdparty/httpserver.py      Tue Nov 17 01:25:09 2009 +0100
+++ b/lib/thirdparty/httpserver.py      Tue Nov 24 23:39:59 2009 -0800
@@ -324,6 +324,21 @@
             self.wsgi_write_chunk("Internal Server Error\n")
             raise
 
+
+class _HTTPServer(HTTPServer):
+    def server_bind(self):
+        if self.RequestHandlerClass.protocol_version == "HTTP/1.1":
+            # Disable Nagle's Algorithm, which causes performance
+            # problems with Keep-Alive. Sometimes the server has sent
+            # a response to the client but the TCP stack buffers the
+            # response in hopes of reducing the number of packets to
+            # send. After about 200ms it gives up and sends the rest
+            # of the packet, but 0.2s is a long time to wait when
+            # there are many small, fast requests on the same
+            # connection.
+            self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
+        HTTPServer.server_bind(self)
+
 #
 # SSL Functionality
 #
@@ -338,7 +353,7 @@
     # functionality in that case.
     SSL = None
     SocketErrors = (socket.error,)
-    class SecureHTTPServer(HTTPServer):
+    class SecureHTTPServer(_HTTPServer):
         def __init__(self, server_address, RequestHandlerClass,
                      ssl_context=None, request_queue_size=None):
             assert not ssl_context, "pyOpenSSL not installed"
@@ -356,7 +371,7 @@
         def __getattr__(self, attrib):
             return getattr(self.__conn, attrib)
 
-    class SecureHTTPServer(HTTPServer):
+    class SecureHTTPServer(_HTTPServer):
         """
         Provides SSL server functionality on top of the BaseHTTPServer
         by overriding _private_ members of Python's standard

I'm putting this into trac as a bug report.

Andrew
da...@dalkescientific.com

Jon Nelson

Nov 28, 2009, 12:14:05 PM
to Paste Users
Strictly speaking, you only want to disable Nagle after you've written the last thing you are going to write.
Ideally (untested), you'd do something like this:

if about_to_go_back_to_waiting:
  disable_nagle()
  reenable_nagle()

I'm not sure whether the sockopt takes effect immediately. If it doesn't, then you'd need to issue the setsockopt before the last write, or follow it with an empty write (socket.send('') or something).
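
In Python that toggle might look something like this (untested
sketch; flush_nagle is a hypothetical helper, and whether the kernel
pushes buffered data out immediately on the toggle is
platform-dependent):

import socket

def flush_nagle(sock):
    # Turning TCP_NODELAY on forces any Nagle-buffered data out
    # immediately (at least on Linux); turning it back off
    # re-enables Nagle for the next request on the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)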

Disabling Nagle for the entire transaction can have undesirable consequences.


--
Jon

Andrew Dalke

Nov 28, 2009, 1:40:44 PM
to Paste Users
On Nov 28, 9:14 am, Jon Nelson <jnel...@jamponi.net> wrote:
> Strictly speaking, you only want to disable Nagle after you've written
> the last thing you are going to write.

Sure. I make no claims to understanding how sockets work at the
implementation level. I didn't even know that Nagle could be toggled
on the fly, as opposed to being set once before the socket is first
used.

Can you suggest a fix?

> Ideally (untested), you'd do something like this:
>
> if about_to_go_back_to_waiting:
>   disable_nagle()
>   reenable_nagle()

It's a shame that calling flush() on the socket doesn't do something
like that.

> Disabling Nagle for the entire transaction can have undesirable
> consequences.

Such as? The examples I found were increased congestion for large
payloads, and a long wait for a retransmit if the last packet has a
problem.

On the other hand, if HTTP/1.1 is used for something like Comet-style
connection polling, then sending small, unfilled TCP packets is well
worth the extra overhead. Of course that's moot with the current
Paste code, since each header line is sent via its own socket write,
rather than buffered up and sent at once.
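
As a sketch of what that buffering could look like (a hypothetical
helper, not the actual Paste code):

def send_headers_buffered(sock, status, headers):
    # Assemble the status line and all of the header lines in
    # memory, then hand the whole block to the TCP stack in one
    # sendall() instead of one write per line. Fewer, fuller
    # packets interact much better with Nagle's algorithm.
    parts = ["HTTP/1.1 %s\r\n" % status]
    for name, value in headers:
        parts.append("%s: %s\r\n" % (name, value))
    parts.append("\r\n")
    sock.sendall("".join(parts))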

Andrew
da...@dalkescientific.com