Hi Tomasz,
What kind of crazy app are you trying to write that would need os.sendfile()? I expect that if you just read the file into memory (e.g. one 4 MB block at a time) and use transport.write() your app will perform just fine. It will be a little better if you use the latest Tulip from the repo and Python 3.4.
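For illustration, here is a minimal sketch of the "read one block at a time and call transport.write()" approach described above. The function name write_file_in_blocks and its parameters are hypothetical, not part of Tulip; the only thing assumed about the transport is that it has a write(bytes) method. Flow control is deliberately left out, so with a slow peer the transport may end up buffering a large part of the file.

```python
def write_file_in_blocks(transport, path, block_size=4 * 1024 * 1024):
    """Read `path` one block at a time and pass each block to transport.write().

    `transport` is any object with a write(bytes) method (hypothetical
    helper, not a Tulip API). Returns the number of bytes handed over.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # b"" signals EOF
                break
            transport.write(block)
            total += len(block)
    return total
```

A real application would probably also want to pause reading while the transport's write buffer is full.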
300% improvement on a micro-benchmark doesn't mean much. (I recently made a change to write buffering that caused 1700% improvement on a micro-benchmark. :-)
Benchmarking is experimental science. Treat it as such! (This also means that without a good idea of what you're setting out to prove or disprove you won't get useful results.)
Think about a benchmark setup that proves your point in a real-world app, make the benchmark repeatable, publish both the full benchmark code and your results (and a detailed description of the setup you used).
On Thursday, 28 November 2013 18:32:23 UTC+1, Guido van Rossum wrote:
> 300% improvement on a micro-benchmark doesn't mean much. (I recently made a change to write buffering that caused 1700% improvement on a micro-benchmark. :-)
> Benchmarking is experimental science. Treat it as such! (This also means that without a good idea of what you're setting out to prove or disprove you won't get useful results.)
> Think about a benchmark setup that proves your point in a real-world app, make the benchmark repeatable, publish both the full benchmark code and your results (and a detailed description of the setup you used).
Sure, it's just a data point, it's still better than no data though :).
sendfile has been around for a long time though, and everybody seems to agree that it's a significant improvement (which makes sense, you reduce the amount of copying as much as possible).
Antoine gave another data point, and the sendfile extension module reports similar numbers: http://code.google.com/p/pysendfile/. I think it's fair to say that sendfile is useful.
It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.
On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
> It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.

Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).
The way I see it, it's not crucial for Tulip to expose this functionality right now, but it would be wise to leave a door open so that sendfile() can be easily integrated at some later point.
AFAIK all decent FTP servers and many HTTP servers take advantage of sendfile(), amongst which proftpd, vsftpd, apache and nginx:
...therefore I personally have no doubt that the feature would be useful and the gain in terms of transfer speed substantial.
The way sendfile() behaves is similar to send() in that it:
- returns the number of bytes sent
- returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc. on retry; ESHUTDOWN, ECONNABORTED, etc. on disconnect)
- (extra) returns 0 when EOF is reached
...that is why I think that this should be reasonably easy to implement by temporarily replacing the original underlying method using plain send() or something (at least this is what I did in pyftpdlib).
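To make the send()-like behaviour listed above concrete, here is a rough sketch of a loop around os.sendfile(). The helper name sendfile_all is made up for illustration, and waiting with select() is just one simple way to handle the retry case; an event loop would register a writer callback instead. It assumes a platform where os.sendfile() exists and accepts (out_fd, in_fd, offset, count).

```python
import os
import select
import socket


def sendfile_all(sock, file, offset=0, blocksize=65536):
    """Send `file` over `sock` with os.sendfile() until EOF.

    Hypothetical helper. Returns the total number of bytes sent.
    """
    total = 0
    while True:
        try:
            sent = os.sendfile(sock.fileno(), file.fileno(), offset, blocksize)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK on a non-blocking socket:
            # wait until the socket is writable, then retry.
            select.select([], [sock], [])
            continue
        if sent == 0:
            # 0 means the end of the input file was reached.
            break
        offset += sent
        total += sent
    return total
```

Disconnect errors (ECONNABORTED and friends) are not caught here; they propagate as OSError subclasses, just as they would from send().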
Regardless of sendfile(), it might make sense to provide a high-level and generic send_file() method accepting a file, which promises to send it until EOF is reached and under the hood can decide whether to use sendfile() or not.
Something like:
def send_file(file, use_sendfile=False):
    """Send a file to the other peer and return when EOF is reached.
    On POSIX, if use_sendfile is True, use os.sendfile() if available,
    else fall back on using plain send().
    The 'file' object should be a regular file providing a fileno() method.

    >>> file = open(path, 'rb')
    >>> yield from transport.send_file(file)
    """
Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.
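As a rough illustration of the "decide under the hood" idea, the sketch below works on a plain blocking socket rather than a Tulip transport, so it says nothing about how the real API should look. The name send_file_blocking and the blocksize parameter are hypothetical, and it assumes sending starts at offset 0 of the file.

```python
import os


def send_file_blocking(sock, file, use_sendfile=False, blocksize=65536):
    """Send `file` over the blocking socket `sock` until EOF.

    Hypothetical sketch: uses os.sendfile() only when asked to and when
    the platform provides it, otherwise falls back on read() + sendall().
    Returns the number of bytes sent.
    """
    total = 0
    if use_sendfile and hasattr(os, "sendfile"):
        offset = 0  # assumption: always send from the start of the file
        while True:
            sent = os.sendfile(sock.fileno(), file.fileno(), offset, blocksize)
            if sent == 0:  # EOF
                return total
            offset += sent
            total += sent
    # Fallback path: works for any object with a read() method.
    while True:
        data = file.read(blocksize)
        if not data:
            return total
        sock.sendall(data)
        total += len(data)
```

With use_sendfile=True the caller is responsible for passing a regular file; anything else will raise from fileno() or from os.sendfile() itself.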
On Fri, Nov 29, 2013 at 1:10 AM, Guido van Rossum <gu...@python.org> wrote:
>
> On Thu, Nov 28, 2013 at 12:00 PM, Giampaolo Rodola' <g.ro...@gmail.com> wrote:
>>
>> On Thu, Nov 28, 2013 at 7:32 PM, Guido van Rossum <gu...@python.org> wrote:
>>>
>>> It's also extremely non-portable, having a different signature on BSD vs. Linux, and absent on Windows.
>>
>>
>> Please note that the signature is portable across all POSIX variants as long as you don't use headers/trailers/flag arguments, which in Python are basically useless (if you need to append or prepend some extra data you can just use send() from the plain socket object).
>
>
> Well, then why did someone wrap those parameters?

My idea was to provide a 1 to 1 interface 'just in case' and discourage their use in the doc (which I did).
Maybe not a good idea after all.
>> The way sendfile() behaves is similar to send() in that it:
>> - returns the number of bytes sent
>> - returns the same error codes on failure (EAGAIN, EWOULDBLOCK, etc on retry, ESHUTDOWN, ECONNABORTED, etc on disconnect)
>> - (extra) returns 0 when EOF is reached
>
>
> I suppose the last point is actually a direct corollary of the first (i.e. it returns 0 when it didn't write any bytes)?

It returns 0 only at EOF. If no bytes were sent it returns EAGAIN or EWOULDBLOCK.
> Why do you need the use_sendfile argument? Why shouldn't it always try to use os.sendfile() when it exists?

Because the file might not be a regular file (e.g. an io.BytesIO instance).
Also, there are some documented problems with non-regular filesystems
such as NFS, SMBFS/Samba and CIFS, where you explicitly want to avoid
sendfile(), see http://www.proftpd.org/docs/howto/Sendfile.html.
> There's also the issue that transport methods don't return coroutines or futures (we went over the reason for this before). You'll have to design an API that can work with callbacks.

Yes.
>> Final note: on most POSIX platforms sendfile() works with regular fds only, so that should be taken into account.
>
>
> I suppose you mean that the *in* file descriptor must be a disk file? That can be checked with an os.fstat() call and an S_ISREG() call.

Looking for EINVAL on sendfile() also works, and is probably more
reliable than using S_ISREG().
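For reference, the os.fstat() / S_ISREG() check mentioned in the quoted text could look roughly like this. The helper name is_regular_file is hypothetical. Note that this only shows the up-front check, not the EINVAL approach, and that a regular file on a network filesystem such as NFS would still pass it.

```python
import os
import stat


def is_regular_file(file):
    """Return True if `file` is backed by a regular file descriptor.

    Hypothetical helper illustrating the fstat() + S_ISREG() check.
    """
    try:
        fd = file.fileno()
    except (AttributeError, OSError):
        # No fileno() method at all, or an in-memory object such as
        # io.BytesIO, whose fileno() raises io.UnsupportedOperation
        # (a subclass of OSError).
        return False
    return stat.S_ISREG(os.fstat(fd).st_mode)
```

Pipes and sockets have a fileno() but are not regular files, so they are rejected by the S_ISREG() test rather than by the except branch.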
>> Regardless of sendfile() it might make sense to provide a high-level and generic send_file()
>> method accepting a file which promises to send it until EOF is reached, and under the hood it can
>> decide whether to use sendfile() or not.
>
> That might be a useful thing to add for Python 3.5. If someone actually volunteers to write the code.

Where would this belong? BaseSelectorEventLoop?
> It also seems tricky if the file object has an internal buffer.

What do you mean?