os.sendfile and SelectorSocketTransport


Tomasz Elendt

Nov 28, 2013, 4:55:12 AM
to python-tulip
Hi,

I’ve recently been playing with Tulip/asyncio and aiohttp (I’m writing a simple static HTTP server), trying to make them work together with os.sendfile, and I feel I’ve hit a wall.

I use aiohttp.Response.send_headers() to send the headers and then, instead of reading the file’s contents into user space and writing them to the response body, I’d like to let the kernel send the file out to response.transport._sock_fd.

If I understand it correctly, I should add my own writer for that _sock_fd that would call os.sendfile until the whole file is sent. But first I should make sure that the headers are flushed (and transport._buffer is empty), because SelectorEventLoop.add_writer() simply replaces the old writer.

I've spent some time reading Tulip’s source code and I really don’t know how to do this in a nice, clean way.
To my understanding there’s functionality missing from SelectorSocketTransport, which implements the WriteTransport interface but doesn’t “emit an event” when the write buffer is flushed (unless I’m missing something). One possible solution to that issue would be queuing writers (the next writer starts when the previous one has finished).

Best regards,
Tomasz

Guido van Rossum

Nov 28, 2013, 11:39:23 AM
to Tomasz Elendt, python-tulip
Hi Tomasz,

What kind of crazy app are you trying to write that would need os.sendfile()? I expect that if you just read the file into memory (e.g. one 4 MB block at a time) and use transport.write() your app will perform just fine. It will be a little better if you use the latest Tulip from the repo and Python 3.4.

If you really want an API that tells you when the write buffer is drained, you can call set_write_buffer_limits(0) before you start writing. Your transport will then get a pause_writing() callback whenever any data enters the buffer, and resume_writing() when the buffer has drained. You can use this to keep track of whether the buffer is empty or not. If it's empty after the last write(), you can be assured there is no write handler and you can set your own. If it's not empty after the last write(), wait until resume_writing() is called and then set your own write handler.  (If you get resume_writing() while you're not done writing the headers, you should ignore it.) All of this still falls outside the guarantees of the protocol/transport interface, so use at your own risk.
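Guido's bookkeeping could be sketched roughly like this (a sketch only; the class and attribute names are invented, and as he says it sits outside the documented protocol/transport guarantees):

```python
import asyncio

class HeaderFlushProtocol(asyncio.Protocol):
    """Sketch: track write-buffer emptiness via flow-control callbacks.

    With set_write_buffer_limits(0), pause_writing() fires as soon as
    any data enters the transport's buffer, and resume_writing() fires
    when it has fully drained, so buffer_empty mirrors the buffer state.
    """

    def connection_made(self, transport):
        self.transport = transport
        self.buffer_empty = True
        self.headers_done = False   # set True after the last header write()
        transport.set_write_buffer_limits(0)

    def pause_writing(self):
        self.buffer_empty = False

    def resume_writing(self):
        self.buffer_empty = True
        if self.headers_done:
            self.start_sendfile()   # buffer drained: no write handler left

    def start_sendfile(self):
        # Here one could call loop.add_writer(fd, ...) with an
        # os.sendfile() callback -- at one's own risk, as noted above.
        pass
```

After the final write() of the headers, if buffer_empty is still True nothing was ever buffered and start_sendfile() can be invoked immediately; otherwise resume_writing() takes care of it.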

--
--Guido van Rossum (python.org/~guido)

Thomas Hervé

Nov 28, 2013, 12:07:37 PM
to python...@googlegroups.com, Tomasz Elendt, gu...@python.org


Le jeudi 28 novembre 2013 17:39:23 UTC+1, Guido van Rossum a écrit :
Hi Tomasz,

What kind of crazy app are you trying to write that would need os.sendfile()? I expect that if you just read the file into memory (e.g. one 4 MB block at a time) and use transport.write() your app will perform just fine. It will be a little better if you use the latest Tulip from the repo and Python 3.4.

It'd be interesting to have benchmarks. I've been trying to add sendfile support to Twisted for a while, and I've seen at least a 300% improvement compared to regular read/write. It also reduces the memory usage noticeably. Maybe Tulip does a little bit less string copying than Twisted does, but I'd be surprised if using sendfile doesn't give a significant boost.

--
Thomas

 

Guido van Rossum

Nov 28, 2013, 12:32:23 PM
to Thomas Hervé, python-tulip, Tomasz Elendt
300% improvement on a micro-benchmark doesn't mean much. (I recently made a change to write buffering that caused 1700% improvement on a micro-benchmark. :-)

Think about a benchmark setup that proves your point in a real-world app, make the benchmark repeatable, publish both the full benchmark code and your results (and a detailed description of the setup you used).

Benchmarking is experimental science. Treat it as such! (This also means that without a good idea of what you're setting out to prove or disprove you won't get useful results.)

Antoine Pitrou

Nov 28, 2013, 12:42:36 PM
to python...@googlegroups.com
On Thu, 28 Nov 2013 09:32:23 -0800
Guido van Rossum <gu...@python.org> wrote:
>
> Think about a benchmark setup that proves your point in a real-world app,
> make the benchmark repeatable, publish both the full benchmark code and
> your results (and a detailed description of the setup you used).

A while ago, Giampaolo Rodola benchmarked the benefits of sendfile() on
his pure Python FTP server (pyftpdlib), the results are here:

http://code.google.com/p/pyftpdlib/issues/detail?id=152#c5

(AFAIK, pyftpdlib uses asyncore)

Regards

Antoine.


Thomas Hervé

Nov 28, 2013, 12:55:22 PM
to python...@googlegroups.com


Le jeudi 28 novembre 2013 18:32:23 UTC+1, Guido van Rossum a écrit :
300% improvement on a micro-benchmark doesn't mean much. (I recently made a change to write buffering that caused 1700% improvement on a micro-benchmark. :-)

Think about a benchmark setup that proves your point in a real-world app, make the benchmark repeatable, publish both the full benchmark code and your results (and a detailed description of the setup you used).

Benchmarking is experimental science. Treat it as such! (This also means that without a good idea of what you're setting out to prove or disprove you won't get useful results.)


Sure, it's just a data point, but it's still better than no data :). sendfile has been around for a long time, and everybody seems to agree that it's a significant improvement (which makes sense: you reduce the amount of copying as much as possible). Antoine gave another data point, and the sendfile extension module reports similar numbers: http://code.google.com/p/pysendfile/. I think it's fair to say that sendfile is useful.

--
Thomas

Guido van Rossum

Nov 28, 2013, 1:32:17 PM
to Thomas Hervé, python-tulip
On Thu, Nov 28, 2013 at 9:55 AM, Thomas Hervé <the...@gmail.com> wrote:


Le jeudi 28 novembre 2013 18:32:23 UTC+1, Guido van Rossum a écrit :
300% improvement on a micro-benchmark doesn't mean much. (I recently made a change to write buffering that caused 1700% improvement on a micro-benchmark. :-)

Think about a benchmark setup that proves your point in a real-world app, make the benchmark repeatable, publish both the full benchmark code and your results (and a detailed description of the setup you used).

Benchmarking is experimental science. Treat it as such! (This also means that without a good idea of what you're setting out to prove or disprove you won't get useful results.)


Sure, it's just a data point, but it's still better than no data :).

But saying "it made my app on Twisted 3x faster" doesn't mean much. I have no idea what your app does or how you measured the speedup, or how to translate this from Twisted to Tulip.
 
sendfile has been around for a long time, and everybody seems to agree that it's a significant improvement (which makes sense: you reduce the amount of copying as much as possible).

It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.
 
Antoine gave another data point, the sendfile extension module talks about some similar numbers: http://code.google.com/p/pysendfile/. I think it's fair to say that sendfile is useful.

Antoine actually pointed to a page with a benchmark program, a patch, and results. That's the way to do science.

As to actually using os.sendfile in Tulip, assuming it's useful, if you leave it to me to come up with an API it will probably be pushed onto the huge stack of TODOs. (Some listed in the PEP, others in the Tulip tracker.) So perhaps you can try your hand at proposing an API and sketching an implementation? Oh, and please also sketch an implementation that works in cases where os.sendfile() cannot be used. (Remember that 70% of Python use is on Windows.)

Charles-François Natali

Nov 28, 2013, 2:27:33 PM
to Thomas Hervé, python-tulip
Since it might not be obvious to everyone, the problem with sendfile
in an event loop is that it can block: not while writing to the
(non-blocking) socket, but while reading from the source file
descriptor.
That's something to keep in mind.

Cheers,

cf



Guido van Rossum

Nov 28, 2013, 2:51:32 PM
to Charles-François Natali, Thomas Hervé, python-tulip
But so can reading data from a disk file any other way.

Giampaolo Rodola'

Nov 28, 2013, 3:00:12 PM
to Guido van Rossum, Thomas Hervé, python-tulip
On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.

Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).

The way I see it is that it's not crucial for Tulip to expose this functionality right now but it would be wise to leave a door open so that sendfile() can be easily integrated later at some point.

AFAIK all decent FTP servers and many HTTP servers take advantage of sendfile(), amongst them proftpd, vsftpd, Apache and nginx:
...therefore I personally have no doubt that the feature would be useful and the gain in transfer speed substantial.

The way sendfile() behaves is similar to send() in that it:
- returns the number of bytes sent
- returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc on retry,  ESHUTDOWN, ECONNABORTED, etc on disconnect)
- (extra) returns 0 when EOF is reached

...that is why I think that this should be reasonably easy to implement by temporarily replacing the original underlying method using plain send() or something (at least this is what I did in pyftpdlib).
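Because of that send()-like behavior, the writer callback can be a thin wrapper around os.sendfile(); a minimal POSIX-only sketch (the helper name is made up):

```python
import os

def sendfile_writer_step(sock, fileobj, offset, blocksize=65536):
    """One step of a sendfile()-based writer callback.

    Mirrors the send() semantics listed above: returns the new offset
    on progress, the same offset on EAGAIN/EWOULDBLOCK (retry on the
    next writable event), and None once EOF (a return of 0) is hit.
    Disconnect errors (EPIPE, ECONNRESET, ...) propagate to the caller.
    """
    try:
        sent = os.sendfile(sock.fileno(), fileobj.fileno(), offset, blocksize)
    except BlockingIOError:        # EAGAIN / EWOULDBLOCK
        return offset
    if sent == 0:                  # EOF reached
        return None
    return offset + sent
```

A selector-based loop would register this with add_writer() and unregister it once None comes back.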

Regardless of sendfile(), it might make sense to provide a high-level, generic send_file() method accepting a file object, which promises to send it until EOF is reached; under the hood it can decide whether to use sendfile() or not.

Something like:

def send_file(file, use_sendfile=False):
    """Send a file to the other peer and return when EOF is reached.
    On POSIX, if use_sendfile is True, use os.sendfile() if available,
    else fall back on plain send().
    The 'file' object should be a regular file providing a fileno() method.

    >>> file = open(path, 'rb')
    >>> yield from transport.send_file(file)
    """

Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.

--- Giampaolo

Guido van Rossum

Nov 28, 2013, 7:10:57 PM
to Giampaolo Rodola', Thomas Hervé, python-tulip
On Thu, Nov 28, 2013 at 12:00 PM, Giampaolo Rodola' <g.ro...@gmail.com> wrote:
On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.

Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).

Well, then why did someone wrap those parameters?
 
The way I see it is that it's not crucial for Tulip to expose this functionality right now but it would be wise to leave a door open so that sendfile() can be easily integrated later at some point.

Yeah, that door is called the PEP's provisional status.
 
AFAIK all decent FTP servers and many HTTP servers take advantage of sendfile() amongst which proftpd, vsftpd, apache and nginx:
...therefore I personally have no doubt that the feature would be useful and the gain in terms of transfer speed substantial.

It depends. I would expect that anyone running a high volume web server already runs nginx as a front-end and they just let nginx serve static files directly.
 
The way sendfile() behaves is similar to send() in that it:
- returns the number of bytes sent
- returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc on retry,  ESHUTDOWN, ECONNABORTED, etc on disconnect)
- (extra) returns 0 when EOF is reached

I suppose the last point is actually a direct corollary of the first (i.e. it returns 0 when it didn't write any bytes)?
 
...that is why I think that this should be reasonably easy to implement by temporarily replacing the original underlying method using plain send() or something (at least this is what I did in pyftpdlib).

Well, at this point if you want to use it with asyncio in Python 3.4 (which won't add new features now that 3.4 beta 1 was released), you're going to have to write some 3rd party code. If you want to do it properly you're probably going to have to write a new Transport class.
 
Regardless of sendfile(), it might make sense to provide a high-level, generic send_file() method accepting a file object, which promises to send it until EOF is reached; under the hood it can decide whether to use sendfile() or not.

That might be a useful thing to add for Python 3.5. If someone actually volunteers to write the code.

Something like:

def send_file(file, use_sendfile=False):
    """Send a file to the other peer and return when EOF is reached.
    On POSIX, if use_sendfile is True, use os.sendfile() if available,
    else fall back on plain send().
    The 'file' object should be a regular file providing a fileno() method.

    >>> file = open(path, 'rb')
    >>> yield from transport.send_file(file)
    """

Why do you need the use_sendfile argument? Why shouldn't it always try to use os.sendfile() when it exists?

There's also the issue that transport methods don't return coroutines or future (we went over the reason for this before). You'll have to design an API that can work with callbacks.
 
Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.

I suppose you mean that the *in* file descriptor must be a disk file? That can be checked with an os.fstat() call and an S_ISREG() call.
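That check is short; a sketch (the helper name is invented):

```python
import os
import stat

def is_regular_file(fileobj):
    """True if fileobj wraps a real disk file, i.e. something
    os.sendfile() can plausibly read from.

    Objects without a usable fileno() (e.g. io.BytesIO, whose fileno()
    raises io.UnsupportedOperation, an OSError subclass) are rejected.
    """
    try:
        mode = os.fstat(fileobj.fileno()).st_mode
    except (AttributeError, OSError):
        return False
    return stat.S_ISREG(mode)
```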

I keep thinking that sendfile() is so implementation-specific that it really doesn't fit well in asyncio, which is all about *portable* APIs for asynchronous I/O. It seems sendfile() won't be useful for TLS transports, nor for pipe transports. It also seems tricky if the file object has an internal buffer.

Giampaolo Rodola'

Nov 29, 2013, 11:53:03 AM
to Guido van Rossum, Thomas Hervé, python-tulip
On Fri, Nov 29, 2013 at 1:10 AM, Guido van Rossum <gu...@python.org> wrote:
>
> On Thu, Nov 28, 2013 at 12:00 PM, Giampaolo Rodola' <g.ro...@gmail.com> wrote:
>>
>> On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
>>>
>>> It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.
>>
>>
>> Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).
>
>
> Well, then why did someone wrap those parameters?

My idea was to provide a 1 to 1 interface 'just in case' and
discourage their use in the doc (which I did).
Maybe not a good idea after all.


>> The way sendfile() behaves is similar to send() in that it:
>> - returns the number of bytes sent
>> - returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc on retry, ESHUTDOWN, ECONNABORTED, etc on disconnect)
>> - (extra) returns 0 when EOF is reached
>
>
> I suppose the last point is actually a direct corollary of the first (i.e. it returns 0 when it didn't write any bytes)?

It returns 0 only at EOF. If no bytes were sent it returns EAGAIN or
EWOULDBLOCK.


> Why do you need the use_sendfile argument? Why shouldn't it always try to use os.sendfile() when it exists?

Because the file might not be a regular file (e.g. an io.BytesIO instance).
Also, there are documented problems with network filesystems such as
NFS, SMBFS/Samba and CIFS, where you explicitly want to avoid
sendfile(); see http://www.proftpd.org/docs/howto/Sendfile.html.


> There's also the issue that transport methods don't return coroutines or future (we went over the reason for this before). You'll have to design an API that can work with callbacks.
>
>>
>> Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.
>
>
> I suppose you mean that the *in* file descriptor must be a disk file? That can be checked with an os.fstat() call and an S_ISREG() call.


Yes.
Looking for EINVAL on sendfile() also works, and is probably more
reliable than using S_ISREG().


>> Regardless of sendfile(), it might make sense to provide a high-level, generic send_file()
>> method accepting a file object, which promises to send it until EOF is reached; under the hood
>> it can decide whether to use sendfile() or not.
>
> That might be a useful thing to add for Python 3.5. If someone actually volunteers to write the code.

Where would this belong? BaseSelectorEventLoop?


> It also seems tricky if the file object has an internal buffer.

What do you mean?

Saúl Ibarra Corretgé

Nov 29, 2013, 12:04:34 PM
to Giampaolo Rodola', Guido van Rossum, Thomas Hervé, python-tulip

> Yes.
> Looking for EINVAL on sendfile() also works, and is probably more
> reliable than using S_ISREG().
>

FWIW, this is what libuv does:

https://github.com/joyent/libuv/blob/master/src/unix/fs.c#L431

In case of EINVAL, EIO, ENOTSOCK or EXDEV it falls back to an emulation.


--
Saúl Ibarra Corretgé
http://bettercallsaghul.com

Guido van Rossum

Nov 29, 2013, 12:22:09 PM
to Giampaolo Rodola', Thomas Hervé, python-tulip
On Fri, Nov 29, 2013 at 8:53 AM, Giampaolo Rodola' <g.ro...@gmail.com> wrote:
On Fri, Nov 29, 2013 at 1:10 AM, Guido van Rossum <gu...@python.org> wrote:
>
> On Thu, Nov 28, 2013 at 12:00 PM, Giampaolo Rodola' <g.ro...@gmail.com> wrote:
>>
>> On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
>>>
>>> It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.
>>
>>
>> Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).
>
>
> Well, then why did someone wrap those parameters?

My idea was to provide a 1 to 1 interface 'just in case' and
discourage their use in the doc (which I did).
Maybe not a good idea after all.

Ah, I didn't realize the code was yours. :-) OTOH not everybody wants to write portable code and the os module exists to support them.


>> The way sendfile() behaves is similar to send() in that it:
>> - returns the number of bytes sent
>> - returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc on retry,  ESHUTDOWN, ECONNABORTED, etc on disconnect)
>> - (extra) returns 0 when EOF is reached
>
>
> I suppose the last point is actually a direct corollary of the first (i.e. it returns 0 when it didn't write any bytes)?

It returns 0 only at EOF. If no bytes were sent it returns EAGAIN or
EWOULDBLOCK.


> Why do you need the use_sendfile argument? Why shouldn't it always try to use os.sendfile() when it exists?

Because the file might not be a regular file (e.g. an io.BytesIO instance).
Also, there are documented problems with network filesystems such as
NFS, SMBFS/Samba and CIFS, where you explicitly want to avoid
sendfile(); see http://www.proftpd.org/docs/howto/Sendfile.html.

This seems a weird API: if you know you don't want to use os.sendfile(), why call it? A more reasonable API would be one that uses os.sendfile() if it can, and an emulation if it can't, where "can't" includes platforms that don't have os.sendfile() as well as source files that os.sendfile() on this platform doesn't support.
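Such a "use it if it can" API might look roughly like this (a sketch with invented names; a real implementation would only fall back before the first byte is sent, and would not swallow genuine socket errors the way this broad except does):

```python
import os

def send_chunk(sock, fileobj, offset, count=65536):
    """Send up to count bytes of fileobj, starting at offset.

    Prefers os.sendfile() and falls back to read()+send() when the
    platform lacks it, the object has no real fd, or the filesystem
    rejects it (EINVAL and friends). Returns bytes sent, 0 at EOF.
    """
    if hasattr(os, "sendfile"):
        try:
            return os.sendfile(sock.fileno(), fileobj.fileno(), offset, count)
        except OSError:
            pass   # EINVAL/EIO/ENOTSOCK/EXDEV or no fileno(): emulate below
    fileobj.seek(offset)
    data = fileobj.read(count)
    return sock.send(data) if data else 0
```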
 


> There's also the issue that transport methods don't return coroutines or future (we went over the reason for this before). You'll have to design an API that can work with callbacks.
>
>>
>> Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.
>
>
> I suppose you mean that the *in* file descriptor must be a disk file? That can be checked with an os.fstat() call and an S_ISREG() call.


Yes.
Looking for EINVAL on sendfile() also works, and is probably more
reliable than using S_ISREG().


>> Regardless of sendfile(), it might make sense to provide a high-level, generic send_file()
>> method accepting a file object, which promises to send it until EOF is reached; under the hood
>> it can decide whether to use sendfile() or not.
>
> That might be a useful thing to add for Python 3.5. If someone actually volunteers to write the code.

Where would this belong? BaseSelectorEventLoop?

It should be a Transport method, parallel to write(), I believe. The implementations are in selector_events.py and proactor_events.py.


> It also seems tricky if the file object has an internal buffer.

What do you mean?

I think your proposal has the caller open a Python (not Tulip) stream, and then sendfile() would be equivalent to reading bytes from that stream and writing them to the transport. If the app opens a buffered stream and reads one byte from it, there are likely more bytes in the buffer. Then the file descriptor's seek position does not correspond to the stream's position -- the FD's position would be at the end of the buffer, while the stream's position is 1 byte from the start. It's possible to define the semantics here, but you have to make sure the emulation code behaves the same way in all cases as the os.sendfile() version, since the caller may not be aware which is going to be used.
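The mismatch is easy to demonstrate: a buffered read of one byte pulls a whole block into Python's buffer, so the OS-level offset (what os.sendfile() would start from) runs ahead of the stream position:

```python
import os
import tempfile

f = tempfile.TemporaryFile()      # buffered binary stream
f.write(b"x" * 100)
f.flush()
f.seek(0)

f.read(1)                         # reads 1 byte from the *stream*...
stream_pos = f.tell()             # ...so the stream position is 1
fd_pos = os.lseek(f.fileno(), 0, os.SEEK_CUR)  # but the fd sits at 100
# os.sendfile() starting at fd_pos would silently skip the 99 bytes
# still sitting in Python's read buffer.
```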