byte reader to rune reader wrapper?


krolaw

Apr 18, 2013, 7:07:21 PM
to golan...@googlegroups.com
Hi,

I've written an http proxy server which checks the first 16k of each page for certain attributes using regular expression matching.

if checkRegexp.MatchReader(bufio.NewReader(io.LimitReader(reader, 16384))) {
    // Found - do something
}

reader is wrapped with gzip or zlib reader as necessary.

It works well, but I'm using bufio to provide rune reading functionality for the match reader.  As I'm trying to minimise memory allocations, bufio seems overkill.  Although there are byte slice to rune readers provided, there doesn't seem to be a straight byte reader to rune reader.  Is a buffer required to convert bytes to runes?  Have I completely missed what I am looking for?

Thanks.

Tamás Gulácsi

Apr 19, 2013, 12:17:01 AM
to golan...@googlegroups.com
If memory is valuable, you can try bufio.Scanner, but if you allocate that 16k only once and reuse it (i.e. fill the buffer with io.ReadFull rather than going through bufio.Reader), then I think you are OK.

If you need Peek, then implement your own bufio.Reader backed by a buffer pool built on a leaky channel.
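
The buffer pool part of that suggestion might look roughly like this. It is only a sketch: bufPool, newBufPool, Get and Put are hypothetical names, and the custom reader that would sit on top of the pool is not shown.

```go
package main

import "fmt"

// bufPool is a hypothetical fixed-capacity free list of byte slices.
type bufPool struct {
	free chan []byte
	size int
}

func newBufPool(capacity, size int) *bufPool {
	return &bufPool{free: make(chan []byte, capacity), size: size}
}

// Get returns a pooled buffer if one is available, otherwise a new one.
func (p *bufPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put hands a buffer back to the pool, dropping it if the pool is full.
func (p *bufPool) Put(b []byte) {
	select {
	case p.free <- b:
	default: // pool full: let the garbage collector reclaim b
	}
}

func main() {
	pool := newBufPool(4, 16384)
	b := pool.Get()
	// ... fill b from the response body and run the match ...
	pool.Put(b)
	fmt.Println(len(pool.Get()))
}
```

The pool is "leaky" because neither Get nor Put ever blocks: under load extra buffers are allocated, and surplus ones are simply dropped. Buffers come back with their old contents, so callers should only look at the bytes they have just written.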

Jesse McNelis

Apr 19, 2013, 12:35:10 AM
to krolaw, golang-nuts
On Fri, Apr 19, 2013 at 9:07 AM, krolaw <kro...@gmail.com> wrote:
It works well, but I'm using bufio to provide rune reading functionality for the match reader.  As I'm trying to minimise memory allocations, bufio seems overkill.  Although there are byte slice to rune readers provided, there doesn't seem to be a straight byte reader to rune reader.  Is a buffer required to convert bytes to runes?  Have I completely missed what I am looking for?


There are 1 to 4 bytes per rune in UTF-8.
So you either need to make up to four 1-byte reads per rune, or make one read of at least 4 bytes and buffer it per rune.
If you're worried about the size of the default bufio buffer,
you can set it yourself with http://golang.org/pkg/bufio/#NewReaderSize; the minimum size is 16 bytes.

The other option is to buffer the reads yourself into a slice and use the utf8 package to find runes in it. But that seems like overkill just to avoid the allocation.
--
=====================
http://jessta.id.au
