New issue 203 by g.rodola: Add support for kqueue() and epoll() to event
loop
http://code.google.com/p/pyftpdlib/issues/detail?id=203
Right now the internal poller depends on the asyncore module; as such it can
only use the select() and poll() system calls, which don't scale or perform
well with thousands of concurrent clients.
This is a benchmark using poll():
pyftpdlib 0.7.0:
2000 concurrent clients (connect, login) 36.63 secs
2000 concurrent clients (RETR 10M file) 128.07 secs
2000 concurrent clients (STOR 10M file) 189.73 secs
2000 concurrent clients (quit) 0.39 secs
proftpd 1.3.4rc2:
2000 concurrent clients (connect, login) 44.59 secs
2000 concurrent clients (RETR 10M file) 33.90 secs
2000 concurrent clients (STOR 10M file) 138.94 secs
2000 concurrent clients (quit) 2.28 secs
2000 clients here actually means 4000 concurrent connections (control +
data).
As the numbers show, poll() suffers a serious performance degradation at this
scale. select(), on the other hand, would not have worked at all, as it has a
limit of 1024 fds.
epoll() (Linux) and kqueue() (BSD / OSX) are supposed to fix these problems
altogether.
What I have in mind (for the 1.0.0 version) is to add a "lib" package
containing a modified version of asyncore.dispatcher and an asyncore.loop
supporting kqueue()/epoll().
A partial patch I wrote some time ago is here:
http://bugs.python.org/issue6692
Also, tornado (http://www.tornadoweb.org/) can be used as an example for
the epoll() implementation.
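For reference, a minimal sketch of the epoll() calls involved, using Python's
stdlib select module. This is not the code from the patch; the helper name
wait_readable_epoll is made up for illustration, and it only works where
select.epoll exists (Linux).

```python
import select
import socket


def wait_readable_epoll(sock, timeout=1.0):
    """Return True if sock becomes readable within timeout seconds.

    Hypothetical helper for illustration; requires select.epoll (Linux).
    """
    ep = select.epoll()
    try:
        # Register interest in read readiness only.
        ep.register(sock.fileno(), select.EPOLLIN)
        # poll() returns a list of (fd, event_mask) pairs for ready fds.
        events = ep.poll(timeout)
        return any(fd == sock.fileno() and (mask & select.EPOLLIN)
                   for fd, mask in events)
    finally:
        ep.close()


if hasattr(select, "epoll"):
    a, b = socket.socketpair()
    a.sendall(b"x")                    # make b readable
    print(wait_readable_epoll(b))      # True
    a.close()
    b.close()
```

The point is that the kernel keeps the set of registered fds, so each poll()
call returns only the ready ones instead of the whole set being passed in
and scanned every time.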
A preliminary patch is attached.
=== before patch (poll()) ===
giampaolo@ubuntu:~/svn/pyftpdlib$ python test/bench.py -u giampaolo -p XXX
-b concurrence -s 1K -n 2000
2000 concurrent clients (connect, login) 34.98 secs
2000 concurrent clients (RETR 1K file) 61.02 secs
2000 concurrent clients (STOR 1K file) 169.42 secs
2000 concurrent clients (quit) 0.11 secs
=== after patch (epoll()) ===
giampaolo@ubuntu:~/svn/pyftpdlib$ python test/bench.py -u giampaolo -p XXX
-b concurrence -s 1K -n 2000
2000 concurrent clients (connect, login) 19.46 secs
2000 concurrent clients (RETR 1K file) 24.29 secs
2000 concurrent clients (STOR 1K file) 122.09 secs
2000 concurrent clients (quit) 0.10 secs
Attachments:
ioloop.patch 20.4 KB
The attached patch adds kqueue() support (BSD and OSX systems).
Attachments:
kqueue.patch 15.4 KB
Attachments:
kqueue.patch 25.1 KB
Updated patch.
Attachments:
ioloop.patch 25.0 KB
Updated patch attached.
CHANGES:
- got rid of serve_forever()'s "use_poll" and "count" arguments; replaced
with a new "blocking" argument defaulting to True
TODO:
- kqueue() uses a hack for accepting sockets
- epoll()/poll() currently check for error fds in order to detect closed
connections but this might not be necessary (twisted doesn't do that)
- on the other hand, select() on Windows might need to do that
Attachments:
ioloop.patch 37.9 KB
Ok, I think this is done.
Here's a summary to clarify what I've done.
Before the patch
================
- The IO loop was based on the asyncore stdlib module, which only supports
select() and poll().
- These are known to scale/perform reasonably well under a thousand
concurrent connections, then they start to show performance degradation
(poll()) or don't work at all (select()).
- asyncore's IO poller is also particularly naive in that every registered
file descriptor is checked for both read and write operations, even for
idle connections.
- That means that with 200 connected clients we iterate over a list of 400
(200 * 2) elements on every loop.
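To make the arithmetic above concrete, here is a small sketch. It is only a
model of the per-loop bookkeeping, not asyncore's actual code; the Channel
class and both function names are made up for illustration.

```python
class Channel:
    """Hypothetical stand-in for a connected client (not an asyncore class)."""

    def __init__(self, wants_read=False, wants_write=False):
        self.wants_read = wants_read
        self.wants_write = wants_write


def naive_checks(channels):
    """Count the checks done when every channel is asked about both
    reading and writing on every loop, idle or not."""
    checks = 0
    for _ in channels:
        checks += 1  # asked whether it wants to read
        checks += 1  # asked whether it wants to write
    return checks


def interested_only(channels):
    """Count only the channels that actually asked for an event."""
    return sum(1 for ch in channels if ch.wants_read or ch.wants_write)


# 200 clients, all idle except one which wants to read.
channels = [Channel() for _ in range(199)] + [Channel(wants_read=True)]
print(naive_checks(channels))     # 400
print(interested_only(channels))  # 1
```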
After the patch
===============
- The IO loop has been rewritten from scratch and now supports epoll() and
kqueue() on Linux and OSX/BSD.
- epoll() and kqueue() scale/perform better with thousands of connections.
- asyncore's original select() and poll() implementations were rewritten.
- The poller is smarter in that it only iterates on fds which are actually
interested in either reading or writing.
- That means that with 200 clients, all idle except one, we will iterate over
a list of 1 element instead of 400.
- This is valid for all pollers, including select().
- By default we use the best poller available for the platform:
  - Linux: epoll()
  - OSX/BSD: kqueue()
  - all other POSIX: poll()
  - Windows: select()
- FTPServer.serve_forever() signature has changed.
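The two ideas in the list above (picking a default poller per platform and
only polling fds that registered interest) can be sketched roughly as
follows. This is an illustration under my own assumptions, not the patch:
pick_poller_name and InterestPoller are hypothetical names, and the sketch
only covers the select() case.

```python
import select
import socket


def pick_poller_name():
    """Pick a poller by probing what the stdlib select module exposes."""
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    return "select"


class InterestPoller:
    """Hypothetical select()-based poller tracking only interested fds."""

    def __init__(self):
        self.readers = {}  # fd -> callback to run when readable
        self.writers = {}  # fd -> callback to run when writable

    def want_read(self, sock, callback):
        self.readers[sock.fileno()] = callback

    def want_write(self, sock, callback):
        self.writers[sock.fileno()] = callback

    def unregister(self, sock):
        self.readers.pop(sock.fileno(), None)
        self.writers.pop(sock.fileno(), None)

    def poll(self, timeout):
        """Run callbacks for ready fds; return how many events fired."""
        if not self.readers and not self.writers:
            return 0  # nothing is interested, so nothing to scan
        r, w, _ = select.select(list(self.readers), list(self.writers),
                                [], timeout)
        fired = 0
        for fd in r:
            callback = self.readers.get(fd)  # may have been unregistered
            if callback is not None:
                callback()
                fired += 1
        for fd in w:
            callback = self.writers.get(fd)
            if callback is not None:
                callback()
                fired += 1
        return fired


poller = InterestPoller()
a, b = socket.socketpair()
received = []
poller.want_read(b, lambda: received.append(b.recv(16)))
a.sendall(b"ping")
print(pick_poller_name(), poller.poll(1.0), received)
a.close()
b.close()
```

Idle connections never appear in the lists handed to select(), which is why
the cost per loop follows the number of active fds rather than the number of
connected clients.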
Final benchmark
===============
=== old select() implementation ===
200 concurrent clients (connect, login) 0.96 secs
STOR (1 file with 200 idle clients) 81.94 MB/sec
RETR (1 file with 200 idle clients) 89.01 MB/sec
200 concurrent clients (RETR 10M file) 2.80 secs
200 concurrent clients (STOR 10M file) 6.65 secs
200 concurrent clients (QUIT) 0.02 secs
=== new select() implementation ===
200 concurrent clients (connect, login) 0.78 secs
STOR (1 file with 200 idle clients) 399.46 MB/sec
RETR (1 file with 200 idle clients) 761.53 MB/sec
200 concurrent clients (RETR 10M file) 2.22 secs
200 concurrent clients (STOR 10M file) 5.79 secs
200 concurrent clients (QUIT) 0.01 secs
=== epoll() implementation ===
200 concurrent clients (connect, login) 0.77 secs
STOR (1 file with 200 idle clients) 535.83 MB/sec
RETR (1 file with 200 idle clients) 1632.50 MB/sec
200 concurrent clients (RETR 10M file) 2.24 secs
200 concurrent clients (STOR 10M file) 5.82 secs
200 concurrent clients (QUIT) 0.02 secs
Further note
============
A patch which can be applied to the current 0.7.0 version is attached.
Attachments:
ioloop.patch 31.4 KB