bufio.Reader.Peek vs. bytes.Buffer.Next?

Jens-Uwe Mager

Nov 12, 2011, 3:23:42 PM

to golan...@googlegroups.com

For parsing large binary files, I would like to write code that works well whether the file is mmap'ed and wrapped in a bytes.Buffer or read normally through a buffered reader. The two types have very similar methods, with a few exceptions, for example Next vs. Peek: both hand back the next n bytes, but they are named differently. Would it also be reasonable to propose a Skip(n int) method to efficiently skip over big blobs of data?

peterGo

Nov 12, 2011, 4:37:12 PM

to golang-nuts

Jens-Uwe,

No, because you are asking for file operations on a buffer. A buffer
is not a file.

Peter

Jens-Uwe Mager

Nov 12, 2011, 5:16:02 PM

to golan...@googlegroups.com

Well, to me it looks pretty much like a file. It has most of the methods that make it work like a file in most cases, with some exceptions.

peterGo

Nov 12, 2011, 6:12:22 PM

to golang-nuts

Jens-Uwe,

On Nov 12, 5:16 pm, Jens-Uwe Mager <juma...@gmail.com> wrote:
> Well, to me it looks pretty much like a file. It has most of the interfaces
> that makes it work like a file in most of the cases, with some exceptions.

So, you agree that, while a buffer and a file are similar, they are
not the same.

This issue has come up several times. For example,

"The design of bytes.Buffer is to implement a stream; changing to a
random access object would not only make the code significantly more
complex and slower - and speed is very important to this code - but
would change the meaning of "read" from "consume" to "scan"."

"It would be much wiser to make a different type, probably one each
for bytes and strings, that implemented seeking and scanning rather
than consuming as the model for reading."

-rob [pike]

http://groups.google.com/group/golang-nuts/msg/a0c49db1db343ce0

Peter

Jens-Uwe Mager

Nov 13, 2011, 8:24:30 AM

to golan...@googlegroups.com

I did some more benchmarks with the kind of files I have to work with, and it boils down to using mmap and byte slices: this is so much faster than any other kind of I/O that I will use that model exclusively.

Jim Robinson

Nov 17, 2011, 1:15:00 AM

to golan...@googlegroups.com

Would you have numbers you can share with us?

I'd be curious to know the numbers!

Jim

Jens-Uwe Mager

Nov 17, 2011, 9:48:53 AM

to golan...@googlegroups.com

I work with gridded binary files (GRIB, meteorological data tightly packed in binary). For scanning and indexing 522 MB of test data in 106 files, each between a few KB and 16 MB in size, I get the following timings:

with conventional file I/O:

real    0m39.588s
user    0m1.056s
sys    0m1.239s

with mmap and byte slices:

real    0m0.096s
user    0m0.031s
sys    0m0.049s

This is on an older 15-inch MacBook Pro with GOARCH=amd64. I ran each command multiple times, so the buffer cache was primed.

In a previous job I worked with very large image files (mostly TIFF) for a long time, and switching over to mmap was a real boon there as well.

Michael Jones

Nov 17, 2011, 12:44:26 PM

to golan...@googlegroups.com

Presumably you "seek" a great deal.

--

Michael T. Jones | Chief Technology Advocate | m...@google.com | +1 650-335-5765

Jens-Uwe Mager

Nov 17, 2011, 1:15:45 PM

to golan...@googlegroups.com

Indeed, that is the point of the operation. Indexing means finding the headers that describe the variable-length data in the binary file; each header gives the type and length of the data that follows. The data itself consists of big arrays of floating-point values. Byte slices pointing into the mmap'ed file are simply the ideal representation for this kind of file.
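The header-walking described above maps naturally onto slice arithmetic. The sketch below uses a made-up header layout (1 type byte followed by a 4-byte big-endian payload length), not the real GRIB format, and the names record and index are hypothetical. It shows that "seeking" past a payload is just an integer addition, and each payload is a subslice of the underlying buffer (such as an mmap'ed file) rather than a copy:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// record describes one entry found while indexing. Payload aliases the
// underlying buffer, so indexing copies no data.
type record struct {
	Type    byte
	Offset  int    // offset of the header within the buffer
	Payload []byte // slice into the buffer, not a copy
}

const headerLen = 5 // hypothetical layout: 1-byte type + 4-byte big-endian length

var errTruncated = errors.New("truncated record")

// index walks buf header by header. Skipping a payload only advances an
// integer offset; the payload bytes themselves are never read.
func index(buf []byte) ([]record, error) {
	var recs []record
	for off := 0; off < len(buf); {
		if len(buf)-off < headerLen {
			return recs, errTruncated
		}
		typ := buf[off]
		n := int(binary.BigEndian.Uint32(buf[off+1 : off+headerLen]))
		start := off + headerLen
		if n < 0 || n > len(buf)-start { // n < 0 guards against overflow on 32-bit int
			return recs, errTruncated
		}
		// Full slice expression caps capacity so appends cannot clobber the next header.
		recs = append(recs, record{Type: typ, Offset: off, Payload: buf[start : start+n : start+n]})
		off = start + n
	}
	return recs, nil
}

func main() {
	// Two records: type 1 with a 3-byte payload, type 2 with a 2-byte payload.
	buf := []byte{1, 0, 0, 0, 3, 'a', 'b', 'c', 2, 0, 0, 0, 2, 'x', 'y'}
	recs, err := index(buf)
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Printf("type=%d offset=%d len=%d\n", r.Type, r.Offset, len(r.Payload))
	}
	// type=1 offset=0 len=3
	// type=2 offset=8 len=2
}
```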
