Streams, reading and the suggested size

Saúl Ibarra Corretgé

Aug 28, 2012, 7:55:35 AM

to li...@googlegroups.com

Hi all,

I'm looking into implementing a small performance optimization in pyuv,
which consists of creating the Python string object early (in alloc_cb)
and then passing its internal buffer to libuv, so later I don't need to
copy any buffer. This works because the Python object is not visible
anywhere until it is given to the user in the on_read_cb, so we can mangle
the internals ;-)

However, the suggested size (64k) is quite big, and shrinking the Python
string down so much actually has the opposite of the effect I'm trying to
achieve :-S If I disregard the suggested_size and use a smaller value,
say 4096, the results are as expected, that is, there is a performance gain
in preallocating the Python string object. (This pattern is used by
CPython itself.)
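
A rough Python-level sketch of the pattern described above, not pyuv's actual C code. It assumes a plain socket in place of a libuv stream, with `bytearray` and `socket.recv_into` standing in for the preallocated Python string and the libuv read. The names `alloc` and `read_once` are made up for illustration.

```python
import socket

SUGGESTED_SIZE = 64 * 1024  # the size libuv suggests, per this thread
PREFERRED_SIZE = 4096       # the smaller value tried above


def alloc(suggested_size, preferred_size=PREFERRED_SIZE):
    """Hypothetical allocation hook: treat suggested_size as an upper bound only."""
    return bytearray(min(suggested_size, preferred_size))


def read_once(sock, suggested_size=SUGGESTED_SIZE):
    """Read directly into the preallocated buffer, then trim it in place."""
    buf = alloc(suggested_size)
    nread = sock.recv_into(buf)  # data lands in buf, no intermediate copy
    del buf[nread:]              # shrink to the bytes actually received
    return buf
```

The smaller the gap between the allocated size and what a read typically returns, the less there is to trim afterwards, which is the trade-off being discussed here.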

It would be nice to be able to set the suggested size when calling
uv_read_start, so that the user can control it. While I can do it in
pyuv, I'd rather mirror libuv as much as possible.

That said, I noticed #495 and #505, so I'm wondering if there is
any timeline for it. Is it planned for version 0.10.0? If not, would a
patch adding suggested_size to uv_read_start be welcome? The idea would
be to store the suggested_size in the uv_stream_t struct and use it as
we do now. (If zero is passed, the default 64k would be set.)

Kind regards,

--
Saúl Ibarra Corretgé
http://saghul.net/blog | http://about.me/saghul

Ben Noordhuis

Aug 28, 2012, 8:48:04 AM

to li...@googlegroups.com

There's no need for that. suggested_size is exactly that, a
suggestion. If you don't want to allocate 64K, then don't.

libuv asks for 64K because that lets us (nearly always) drain the
kernel receive buffer with a single syscall.

On a tangential note, I've played around with querying the number of
pending bytes but the extra syscalls degrade performance by a
substantial margin, 25-40% on most benchmarks.
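
To make the extra-syscall cost concrete, here is a minimal Unix-only sketch in Python. The thread does not say which call was used for the experiment; `ioctl(FIONREAD)` is assumed here as one common way to ask for the pending byte count, and the function names are hypothetical.

```python
import fcntl
import socket
import struct
import termios


def pending_bytes(sock):
    """Return how many bytes are queued for reading on sock (one extra syscall)."""
    raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


def read_exact_pending(sock):
    """Size the read from FIONREAD first: two syscalls per read instead of one."""
    size = pending_bytes(sock)
    return sock.recv(size) if size else b""
```

Every read now pays for the query as well as the read itself, which is where the overhead described above would come from.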

Alternative approaches I haven't tried yet are:

a) Query the size of the receive buffer.
Pro: Needs to be done only once.
Con: Returned value is not always accurate. On Linux, for example,
the returned value is about twice the size of the actual receive
buffer.

b) Query the MTU.
Pro: Needs to be done only once, returned value should be
accurate. Also works for UDP sockets.
Con: I guess it's technically possible for the MTU to change on the fly.
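
Option (a) could look roughly like this in Python. It is only a sketch: `pick_read_size` and its `cap` parameter are hypothetical, and the idea of caching the result per socket is an assumption rather than anything libuv does.

```python
import socket


def receive_buffer_size(sock):
    """Return the receive buffer size the kernel reports for sock."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def pick_read_size(sock, cap=64 * 1024):
    """Hypothetical policy: query once per socket and never exceed cap."""
    return min(receive_buffer_size(sock), cap)
```

On Linux, socket(7) documents that the kernel doubles the value set for SO_RCVBUF and that getsockopt returns the doubled value, which matches the inaccuracy noted under (a).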

Saúl Ibarra Corretgé

Aug 28, 2012, 10:44:56 AM

to li...@googlegroups.com

Hi Ben,

Ben Noordhuis wrote:

> There's no need for that. suggested_size is exactly that, a
> suggestion. If you don't want to allocate 64K, then don't.
>

Fair enough :-)

> libuv asks for 64K because that lets us (nearly always) drain the
> kernel receive buffer with a single syscall.
>

I see. I don't have really accurate benchmarks, but when testing (with
pyuv and ab) making smaller allocations yielded more requests per second.

Of course, I guess this depends on how much you usually read + how the
VM works...

> On a tangential note, I've played around with querying the number of
> pending bytes but the extra syscalls degrade performance by a
> substantial margin, 25-40% on most benchmarks.
>
> Alternative approaches I haven't tried yet are:
>
> a) Query the size of the receive buffer.
> Pro: Needs to be done only once.
> Con: Returned value is not always accurate. On Linux, for example,
> the returned value is about twice the size of the actual receive
> buffer.
>
> b) Query the MTU.
> Pro: Needs to be done only once, returned value should be
> accurate. Also works for UDP sockets.
> Con: I guess it's technically possible for the MTU to change on the fly.
>

Thanks for sharing! I'm still undecided, mainly because I don't want to
add extra stuff to pyuv which is not in libuv. I already did it once and
I kind of regret it.

Regards,

Ben Noordhuis

Aug 28, 2012, 10:59:07 AM

to li...@googlegroups.com

On Tue, Aug 28, 2012 at 4:44 PM, Saúl Ibarra Corretgé <sag...@gmail.com> wrote:
>> libuv asks for 64K because that lets us (nearly always) drain the
>> kernel receive buffer with a single syscall.
>
> I see. I don't have really accurate benchmarks, but when testing (with pyuv
> and ab) making smaller allocations yielded more requests per second.
>
> Of course, I guess this depends on how much you usually read + how the VM
> works...

malloc(lots_of_bytes) can be expensive. Details vary across
implementations but it's usually the case that small allocations come
from a pool of memory arenas while larger allocations are done with
mmap() or brk() syscalls.

In node.js, we alleviate that by allocating big slabs of memory (10 MB
or more) and parceling that out in our alloc_cb[1][2][3].

[1] https://github.com/joyent/node/blob/3b17f3b/src/slab_allocator.cc
[2] https://github.com/joyent/node/blob/3b17f3b/src/stream_wrap.cc#L165-170
[3] https://github.com/joyent/node/blob/3b17f3b/src/stream_wrap.cc#L196-198
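
The slab idea can be sketched in a few lines of Python. This is not node.js's implementation, only an illustration of the technique: `SlabAllocator`, `alloc` and `shrink` are made-up names, and a `bytearray` stands in for the big slab of memory.

```python
class SlabAllocator:
    """Hand out chunks of one large buffer instead of allocating per read."""

    def __init__(self, slab_size=10 * 1024 * 1024):
        self.slab_size = slab_size
        self._new_slab()

    def _new_slab(self):
        # One big allocation; older slabs stay alive while chunks reference them.
        self._slab = memoryview(bytearray(self.slab_size))
        self._offset = 0
        self._last = None

    def alloc(self, size):
        """Return a writable chunk of `size` bytes."""
        if size > self.slab_size:
            return memoryview(bytearray(size))  # too big for a slab
        if self._offset + size > self.slab_size:
            self._new_slab()
        start = self._offset
        self._offset += size
        self._last = self._slab[start:self._offset]
        return self._last

    def shrink(self, chunk, used):
        """Trim chunk to `used` bytes; reclaim the tail if it was the latest chunk."""
        if chunk is self._last:
            self._offset -= len(chunk) - used
            self._last = None
        return chunk[:used]
```

An allocation callback would call `alloc(suggested_size)` and the read callback would call `shrink(chunk, nread)`, so a short read costs only the bytes it used and the rest of the slab serves the next read.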

Saúl Ibarra Corretgé

Aug 28, 2012, 11:37:31 AM

to li...@googlegroups.com

Thanks for the links. It does indeed look related and would explain what is going on. I guess it's not such a bad idea to make it configurable in pyuv then :-)

Thanks,

Andrius Bentkus

Sep 21, 2012, 7:43:02 PM

to libuv

One thing that annoys me is that you don't know the exact size on the
callback.
On .net it would be incredible just to create a bytearray of that size
and just fill it up.
Now I have to come up with some ByteBuffer voodoo, a class which
basically just points at the start and at the end of a byte array.

Bert Belder

Sep 21, 2012, 7:44:24 PM

to li...@googlegroups.com

Huh, the uv_buf_t remains untouched right?
