I'm looking into implementing a small performance optimization in pyuv, which consists of creating the Python string object early (in alloc_cb) and then passing its internal buffer to libuv, so that later I don't need to copy any buffer. This works because the Python object is not visible anywhere until it is given to the user in the on_read_cb, so we can mangle the internals ;-)
However, the suggested size (64k) is quite big, and shrinking the Python string down so much actually has the opposite effect to the one I'm trying to achieve :-S If I disregard the suggested_size and use a smaller value, say 4096, results are as expected: there is a performance gain in preallocating the Python string object (this pattern is used by CPython itself).
It would be nice to be able to set the suggested size when calling uv_read_start, so that the user can control it. While I can do it in pyuv, I'd rather mirror libuv as much as possible.
That said, I noticed #495 and #505, so I'm wondering if there is any timeline for it. Is it planned for version 0.10.0? If not, would a patch adding suggested_size to uv_read_start be welcome? The idea would be to store the suggested_size in the uv_stream_t struct and use it as we do now (if zero is passed, the default 64k would be used).
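For readers following along, here is a rough illustration of the idea in plain Python rather than in pyuv's C internals: allocate the buffer first, let the read fill it in place, then shrink it to the number of bytes actually received. The 4096 figure comes from the message above; `read_preallocated` and the `socketpair` setup are assumptions made only for this demo and are not pyuv or libuv APIs.

```python
import socket

READ_SIZE = 4096  # smaller than libuv's 64k suggestion, as discussed above


def read_preallocated(sock, size=READ_SIZE):
    """Read once into a buffer allocated up front, then shrink it in place."""
    buf = bytearray(size)        # allocated before any data arrives
    nread = sock.recv_into(buf)  # data lands directly in buf, no second copy
    del buf[nread:]              # drop the unused tail
    return buf


# Demo over a local socket pair (hypothetical setup, just to have data to read).
a, b = socket.socketpair()
a.sendall(b"hello")
data = read_preallocated(b)
a.close()
b.close()
```

The larger the preallocated buffer is compared to what a read typically returns, the more the final shrink step costs, which is the trade-off described above.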
There's no need for that. suggested_size is exactly that, a
suggestion. If you don't want to allocate 64K, then don't.
libuv asks for 64K because that lets us (nearly always) drain the
kernel receive buffer with a single syscall.
On a tangential note, I've played around with querying the number of
pending bytes but the extra syscalls degrade performance by a
substantial margin, 25-40% on most benchmarks.
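As an illustration of what "querying the number of pending bytes" can look like, here is a minimal Unix-only sketch using the FIONREAD ioctl. The message does not say which mechanism was actually tried, so treat FIONREAD as an assumption; `pending_bytes` is a hypothetical helper name.

```python
import fcntl
import socket
import struct
import termios


def pending_bytes(sock):
    """Ask the kernel how many bytes are waiting to be read (one extra syscall)."""
    raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


# Demo over a local socket pair (hypothetical setup).
a, b = socket.socketpair()
a.sendall(b"x" * 100)
waiting = pending_bytes(b)
buf = bytearray(waiting)  # exact-size allocation, paid for with the ioctl above
nread = b.recv_into(buf)
a.close()
b.close()
```

Every read now costs two syscalls instead of one, which is the kind of overhead the benchmark numbers above refer to.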
Alternative approaches I haven't tried yet are:
a) Query the size of the receive buffer.
Pro: Needs to be done only once.
Con: Returned value is not always accurate. On Linux, for example,
the returned value is about twice the size of the actual receive
buffer.
b) Query the MTU.
Pro: Needs to be done only once, returned value should be
accurate. Also works for UDP sockets.
Con: I guess it's technically possible for the MTU to change on the fly.
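Option (a) can be sketched with the standard SO_RCVBUF socket option, queried once and then reused as the allocation size. This is only an illustration: the 64k cap and the `read_size` name are assumptions, and a local socket pair is used so the snippet runs without a network (a TCP socket is queried the same way). Option (b) is not shown because there is no equally portable call for the MTU.

```python
import socket

# Hypothetical setup: any connected socket would do.
a, b = socket.socketpair()

# Query the receive buffer size once, up front.
rcvbuf = b.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# Use it as the read size, capped at libuv's usual 64k suggestion.
# As noted above, the reported value may overstate the real buffer size.
read_size = min(rcvbuf, 64 * 1024)

a.close()
b.close()
```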
Ben Noordhuis wrote:
> On Tue, Aug 28, 2012 at 1:55 PM, Saúl Ibarra Corretgé <sag...@gmail.com> wrote:
> There's no need for that. suggested_size is exactly that, a
> suggestion. If you don't want to allocate 64K, then don't.
Fair enough :-)
> libuv asks for 64K because that lets us (nearly always) drain the
> kernel receive buffer with a single syscall.
I see. I don't have really accurate benchmarks, but when testing (with pyuv and ab) making smaller allocations yielded more requests per second.
Of course, I guess this depends on how much you usually read + how the VM works...
> On a tangential note, I've played around with querying the number of
> pending bytes but the extra syscalls degrade performance by a
> substantial margin, 25-40% on most benchmarks.
> Alternative approaches I haven't tried yet are:
> a) Query the size of the receive buffer.
> Pro: Needs to be done only once.
> Con: Returned value is not always accurate. On Linux, for example,
> the returned value is about twice the size of the actual receive
> buffer.
> b) Query the MTU.
> Pro: Needs to be done only once, returned value should be
> accurate. Also works for UDP sockets.
> Con: I guess it's technically possible for the MTU to change on the fly.
Thanks for sharing! I'm still undecided, mainly because I don't want to add extra stuff to pyuv which is not in libuv. I already did it once and I kind of regret it.
On Tue, Aug 28, 2012 at 4:44 PM, Saúl Ibarra Corretgé <sag...@gmail.com> wrote:
>> libuv asks for 64K because that lets us (nearly always) drain the
>> kernel receive buffer with a single syscall.
> I see. I don't have really accurate benchmarks, but when testing (with pyuv
> and ab) making smaller allocations yielded more requests per second.
> Of course, I guess this depends on how much you usually read + how the VM
> works...
malloc(lots_of_bytes) can be expensive. Details vary across
implementations but it's usually the case that small allocations come
from a pool of memory arenas while larger allocations are done with
mmap() or brk() syscalls.
In node.js, we alleviate that by allocating big slabs of memory (10 MB
or more) and parceling that out in our alloc_cb[1][2][3].
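To make the slab idea concrete, here is a minimal sketch in Python. It is not node.js's actual implementation; `SlabAllocator` and its methods are hypothetical names, and the 10 MB default is taken from the figure mentioned above. An alloc_cb-style callback would simply call `alloc(suggested_size)`.

```python
class SlabAllocator:
    """Hand out slices of one big preallocated block instead of many allocations."""

    def __init__(self, slab_size=10 * 1024 * 1024):
        self.slab_size = slab_size
        self._new_slab()

    def _new_slab(self):
        # One large allocation; slices handed out earlier keep the old slab alive.
        self._slab = memoryview(bytearray(self.slab_size))
        self._offset = 0

    def alloc(self, size):
        """Return a writable view of `size` bytes carved out of the current slab."""
        if size > self.slab_size:
            raise ValueError("request larger than a whole slab")
        if self._offset + size > self.slab_size:
            self._new_slab()  # current slab exhausted, start a fresh one
        start = self._offset
        self._offset += size
        return self._slab[start:self._offset]


# Small slab so the rollover is easy to see.
allocator = SlabAllocator(slab_size=1024)
first = allocator.alloc(256)
second = allocator.alloc(256)
first[:5] = b"hello"          # writes go straight into the shared slab
third = allocator.alloc(1024)  # does not fit in what is left, so a new slab is made
```

This sketch never hands the unused tail of a read back to the slab; a real allocator would probably want to reclaim it.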
Ben Noordhuis wrote:
> On Tue, Aug 28, 2012 at 4:44 PM, Saúl Ibarra Corretgé <sag...@gmail.com> wrote:
>>> libuv asks for 64K because that lets us (nearly always) drain the
>>> kernel receive buffer with a single syscall.
>> I see. I don't have really accurate benchmarks, but when testing (with pyuv
>> and ab) making smaller allocations yielded more requests per second.
>> Of course, I guess this depends on how much you usually read + how the VM
>> works...
> malloc(lots_of_bytes) can be expensive. Details vary across
> implementations but it's usually the case that small allocations come
> from a pool of memory arenas while larger allocations are done with
> mmap() or brk() syscalls.
> In node.js, we alleviate that by allocating big slabs of memory (10 MB
> or more) and parceling that out in our alloc_cb[1][2][3].
Thanks for the links. It does indeed look related and would explain what is going on. I guess it's not such a bad idea to make it configurable in pyuv then :-)
One thing that annoys me is that you don't know the exact size in the
callback.
In .NET it would be great to just create a byte array of that size
and fill it up.
Now I have to come up with some ByteBuffer voodoo, a class which
basically just points at the start and at the end of a byte array.
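The "points at the start and at the end" wrapper can be sketched as follows. This is in Python for consistency with the pyuv discussion rather than in .NET, and `ByteSegment` is a hypothetical name, not an API from any of the libraries mentioned.

```python
from dataclasses import dataclass


@dataclass
class ByteSegment:
    """A window (start, end) over a larger backing buffer, without copying."""
    data: bytearray
    start: int
    end: int

    def __len__(self):
        return self.end - self.start

    def view(self):
        # memoryview slicing shares memory with the backing buffer.
        return memoryview(self.data)[self.start:self.end]


# The backing buffer is allocated at the suggested size before the read...
backing = bytearray(64 * 1024)
backing[:4] = b"ping"  # stand-in for data written by a read
# ...and only the filled part is exposed once the byte count is known.
segment = ByteSegment(backing, 0, 4)
payload = bytes(segment.view())
```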
On Saturday, September 22, 2012 1:43:03 AM UTC+2, Andrius Bentkus wrote:
> One thing that annoys me is that you don't know the exact size on the
> callback.
> On .net it would be incredible just to create a bytearray of that size
> and just fill it up.
> Now I have to come up with some ByteBuffer voodoo, a class which
> basically just points at the start and at the end of a byte array.