bufio: ReadN method

416 views
Skip to first unread message

Vladimir Mihailenco

unread,
Aug 12, 2012, 3:22:59 PM8/12/12
to golan...@googlegroups.com
Package bufio has method Peek, that returns the next n bytes without advancing the reader. But there is no simple/efficient way to peek n bytes and advance the reader like this:

// ReadN uses Peek to read next n bytes and advances the reader.
func (b *Reader) ReadN(n int) ([]byte, error) {
    buf, err := b.Peek(n)
    if err == nil {
        b.r += n
    } else {
        b.r = b.w
    }
    return buf, err
}

I know that standard answer is to fork package you need, but I think such method will be widely useful for parsing protocols that contain data length (e.g. Redis protocol):

$5\r\n // 5 is len("hello")
hello\r\n

which can be parsed like this:

rd.ReadLine() // reads header "$5\r\n"
// parse data length
rd.ReadN(5+2) // reads data "hello\r\n"

As I understand currently I have to use following code that requires additional memory allocation without great need:

buf := make([]byte, 5+2)
rd.Read(buf)

So my proposal is to add ReadN method to the bufio package. What do you think?

Rob Pike

unread,
Aug 12, 2012, 3:35:11 PM8/12/12
to Vladimir Mihailenco, golan...@googlegroups.com
Just call Read(buf[0:N])

-rob

Vladimir Mihailenco

unread,
Aug 12, 2012, 3:50:23 PM8/12/12
to Rob Pike, golan...@googlegroups.com
That means that I have to allocate buf separately. But in my case I don't need it:

buf, err := rd.ReadN(10)
if err != nil {}
return string(buf), nil

That may sound like micro optimisation, but ReadLine behaves in a similar way.

Rob Pike

unread,
Aug 12, 2012, 3:51:50 PM8/12/12
to Vladimir Mihailenco, golan...@googlegroups.com
On Sun, Aug 12, 2012 at 12:50 PM, Vladimir Mihailenco
<vladimi...@gmail.com> wrote:
> That means that I have to allocate buf separately.

No it doesn't. If you already have a buffer of length >= N, buf[0:N]
will do the job without allocation.

-rob

Vladimir Mihailenco

unread,
Aug 12, 2012, 4:02:15 PM8/12/12
to Rob Pike, golan...@googlegroups.com
What I am proposing is following:

// init
rd := bufio.NewReader(r)

// usage

buf, err := rd.ReadN(10)
if err != nil {}
return string(buf), nil

Instead of:

// init
rd := bufio.NewReader(r, 1024)
buf := make([]byte, 1024)

// usage
_, err := rd.Read(buf[:10])
if err != nil {}
return string(buf[:10]), nil

No it doesn't. If you already have a buffer of length >= N, buf[0:N]
will do the job without allocation.
Thanks for explanation. I understand how arrays and slices work.

Message has been deleted

Rob Pike

unread,
Aug 12, 2012, 9:22:42 PM8/12/12
to Vladimir Mihailenco, golan...@googlegroups.com
On Sun, Aug 12, 2012 at 1:02 PM, Vladimir Mihailenco
<vladimi...@gmail.com> wrote:
> What I am proposing is following:
>
> // init
> rd := bufio.NewReader(r)
>
> // usage
> buf, err := rd.ReadN(10)
> if err != nil {}
> return string(buf), nil

This design requires allocation on every call to ReadN. The existing
design allows reuse of the buffer. We should not encourage designs
that force allocation.

Since this function is so easy to implement and has such an
undesirable property, it should not be part of the bufio package.

-rob
Message has been deleted

Jesse McNelis

unread,
Aug 12, 2012, 10:33:29 PM8/12/12
to Peter Russell, golan...@googlegroups.com, Vladimir Mihailenco
On Mon, Aug 13, 2012 at 11:57 AM, Peter Russell <retep....@ymail.com> wrote:
> func ReadN(r *bufio.Reader, n int) ([]byte, error) {
> p, err := r.Peek(n)
> if err == nil {
> r.Skip(n)
> }
> return p, err
> }
>

I would assume that r.Skip() would invalidate the slice returned from
r.Peak() as Skipping forward may require additional reads from the
underlying io.Reader.
So you'd have to copy the data in p before calling Skip() and thus
you're basically doing a Read() anyway.




--
=====================
http://jessta.id.au
Message has been deleted

Vladimir Mihailenco

unread,
Aug 13, 2012, 5:08:38 AM8/13/12
to Peter Russell, golan...@googlegroups.com, jes...@jessta.id.au, Rob Pike
This is implementation I currently use: https://github.com/vmihailenco/bufio/blob/master/bufio.go#L287 . It behaves like ReadSlice and other Read* methods - returns slice of existing bufio.Reader.buf array.

>As far as using this feature in a Redis client is concerned, you will need to ensure that the bufio buffer size is larger than any value that you retrieve. This does not seem desirable in a general purpose Redis client.
This is how I use ReadN method to read value that is greater than buffer size: https://github.com/vmihailenco/redis/blob/master/parser.go#L121 . Basically there are 2 cases:

1. if value fits in bufio.Reader - I use bufio.Reader slice. It is efficient and very simple.
2. if value is larger than bufio.Reader buffer size - I allocate new larger buffer and read into it.

Test case: https://github.com/vmihailenco/redis/blob/master/redis_test.go#L100

I assume that 99% reads will use case #1, because usually Redis is not used to store large amount of data.

There is no big perfomance penalty of allocating new buffer on every read (~230 ns for new buffer vs ~80 ns for reusing existing slice at my PC), but ReadN looks very convenient (for me).

On Mon, Aug 13, 2012 at 5:45 AM, Peter Russell <retep....@ymail.com> wrote:
I would assume that r.Skip() would invalidate the slice returned from
r.Peak() as Skipping forward may require additional reads from the
underlying io.Reader.

Skip does not require additional reads because there are at least n bytes in the buffer.

bryanturley

unread,
Aug 13, 2012, 2:32:17 PM8/13/12
to golan...@googlegroups.com
There is no big perfomance penalty of allocating new buffer on every read (~230 ns for new buffer vs ~80 ns for reusing existing slice at my PC), but ReadN looks very convenient (for me).


By your numbers it is nearly 3 times slower... Also did you take into account the time the GC is gonna spend chewing up all those random allocs vs the single reused buffer?

When I started learning go I wanted something like ReadN, but then I learned how to use slices 8)


Lunix Watt

unread,
Aug 15, 2012, 4:54:57 AM8/15/12
to golan...@googlegroups.com, Vladimir Mihailenco
So why hash.Hash.Sum appends the hash at the end of an []byte ?!

Jesse McNelis

unread,
Aug 15, 2012, 5:03:21 AM8/15/12
to Lunix Watt, golan...@googlegroups.com, Vladimir Mihailenco
On Wed, Aug 15, 2012 at 6:54 PM, Lunix Watt <con...@gmail.com> wrote:
> So why hash.Hash.Sum appends the hash at the end of an []byte ?!
>

Hash.Hash.Sum() appends to the given slice because often the caller
won't actually know how big to make the slice to fit the hashsum.
A caller to Read() knows how much they want to read.

--
=====================
http://jessta.id.au

Lunix Watt

unread,
Aug 15, 2012, 5:10:27 AM8/15/12
to golan...@googlegroups.com, Lunix Watt, Vladimir Mihailenco, jes...@jessta.id.au
hash.Hash.Size() exists for this purpose.

Vladimir Mihailenco

unread,
Aug 15, 2012, 7:20:12 AM8/15/12
to Lunix Watt, golan...@googlegroups.com, jes...@jessta.id.au
- ReadLine reads data into existing (bufio.Reader.buf) slice until it founds '\r\n'.
- ReadN reads data into existing slice until it reads N bytes.
- Both ReadLine and ReadN may return ErrBufferFull if buffer is too small.
- Read reads data into existing buffer and than copies data into another buffer (provided by user). It uses 2 buffers and involves 1 copy call. There is case when it reads directly to user buffer.

I don't understand why there is so much confusion, but obviously I can't explain things properly.

Jan Mercl

unread,
Aug 15, 2012, 7:28:32 AM8/15/12
to Vladimir Mihailenco, Lunix Watt, golan...@googlegroups.com, jes...@jessta.id.au
On Wed, Aug 15, 2012 at 1:20 PM, Vladimir Mihailenco <vladimi...@gmail.com> wrote:
- ReadLine reads data into existing (bufio.Reader.buf) slice until it founds '\r\n'.
- ReadN reads data into existing slice until it reads N bytes.
- Both ReadLine and ReadN may return ErrBufferFull if buffer is too small.
- Read reads data into existing buffer and than copies data into another buffer (provided by user). It uses 2 buffers and involves 1 copy call. There is case when it reads directly to user buffer.

I don't understand why there is so much confusion, but obviously I can't explain things properly.

IMHO Rob's posts says it all: One can have ReadN easily via Read(buf[:N]), so there's no need for another API function. Additionally, 'Read' is a method of the well known/widely used io.Reader interface, so it should be preferred wherever possible - then the code works with _any_ io.Reader, which is really neat.

-j

Daniel Morsing

unread,
Aug 15, 2012, 8:16:24 AM8/15/12
to Jesse McNelis, Lunix Watt, golan...@googlegroups.com, Vladimir Mihailenco
On Wed, Aug 15, 2012 at 11:03 AM, Jesse McNelis <jes...@jessta.id.au> wrote:
> Hash.Hash.Sum() appends to the given slice because often the caller
> won't actually know how big to make the slice to fit the hashsum.
> A caller to Read() knows how much they want to read.
>

Sum() taking an argument is actually a bit of a hack.

Originally, the Sum() function only returned a byte slice, but this
caused way too much memory allocation.

The intended use for Sum() is:

var x [hashsize]byte
for _, h := range collection {
result := h.Sum(x[:0])
// use result for something
}

This way, you don't need to allocate a new slice for every call to Sum().

Maybe it would make sense to make Sum() into a Read() when go 2 comes around.

Regards,
Daniel Morsing
Message has been deleted

Chen Yufei

unread,
Feb 20, 2013, 1:07:32 AM2/20/13
to golan...@googlegroups.com, Lunix Watt

On Wednesday, August 15, 2012 7:20:12 PM UTC+8, Vladimir Mihailenco wrote:
- ReadLine reads data into existing (bufio.Reader.buf) slice until it founds '\r\n'.
- ReadN reads data into existing slice until it reads N bytes.
- Both ReadLine and ReadN may return ErrBufferFull if buffer is too small.
- Read reads data into existing buffer and than copies data into another buffer (provided by user). It uses 2 buffers and involves 1 copy call. There is case when it reads directly to user buffer.

I don't understand why there is so much confusion, but obviously I can't explain things properly.
 
I'm also requiring more API from the bufio package, with the goal to avoid unnecessary copy from the internal buffer of bufio.Reader to a user provided buffer.

I guess your key point for this API is to avoid the copy, which is not stated very clear.

I'll just fork the bufio package and see how the new API is going to be used. If it goes well, maybe I'll raise this request again later.
Reply all
Reply to author
Forward
0 new messages