reading with scanner.Bytes()

chri...@uber.com

Aug 29, 2016, 11:30:48 PM
to golang-nuts
I am reading a file line by line, sending each line to a chan []byte, and consuming it in another goroutine.

However, I found I must create a new []byte, because scanner.Bytes() returns a []byte slice that is shared, and the scanner may still write to it.

How can I efficiently create a new []byte slice that's not shared?

The simple way I can think of is []byte(string(bytes)).
Or am I approaching this correctly at all?

Chris

        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            // this conversion to string and then to []byte is needed.
            // calling scanner.Bytes() will cause malformed lines.
            outChannel <- []byte(scanner.Text())
        }

Caleb Spare

Aug 29, 2016, 11:42:28 PM
to chri...@uber.com, golang-nuts
You can do

b0 := scanner.Bytes()
b1 := make([]byte, len(b0))
copy(b1, b0) // copy(dst, src): the new slice is the destination
outChannel <- b1

Chris Lu

Aug 30, 2016, 12:15:49 AM
to Caleb Spare, golang-nuts
Nice. Thanks!

            b0 := scanner.Bytes()
            b1 := make([]byte, len(b0))
            copy(b1, b0)
            outChannel <- b1

Chris

Tamás Gulácsi

Aug 30, 2016, 12:32:17 AM
to golang-nuts
If you can, you should reuse those slices, e.g. with sync.Pool.

adon...@google.com

Aug 30, 2016, 5:35:16 PM
to golang-nuts, chri...@uber.com
On Monday, 29 August 2016 23:30:48 UTC-4, chri...@uber.com wrote:
> I am reading a file line by line, sending each line to a chan []byte, and consuming it in another goroutine.
>
> However, I found I must create a new []byte, because scanner.Bytes() returns a []byte slice that is shared, and the scanner may still write to it.
>
> How can I efficiently create a new []byte slice that's not shared?
 
If you're concerned about efficiency, don't send a copy of every token over a channel; consume it in the same goroutine. A typical scanner is a stateful function (or a method of an object) that returns a pair (token int, text []byte), where the int describes the kind of token you are looking at. You only need to look at the text for literal strings and numbers, and you usually don't need to retain a copy of it.

Rob's talk about scanning with channels is a lot of fun, but I wouldn't use that approach in production.

Mateusz Czapliński

Aug 30, 2016, 7:11:29 PM
to golang-nuts, chri...@uber.com
On Tuesday, 30 August 2016 05:30:48 UTC+2, chri...@uber.com wrote:
> How to efficiently create a new []byte slice that's not shared?
> The simple way I can think of is to []byte(string(bytes)).

Alternatively, I'd expect the following to work:

  outChannel <- append([]byte(nil), scanner.Bytes()...)
  // or:
  outChannel <- append([]byte{}, scanner.Bytes()...)

though as to efficiency, I can't say much.

/M.
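
A small sketch to check that the append form really yields an independent copy. The helper name cloneBytes is hypothetical, and mutating src by hand stands in for the scanner reusing its buffer.

```go
package main

import "fmt"

// cloneBytes (hypothetical) returns a copy of b. For non-empty b, appending
// to a nil slice allocates a fresh backing array, so nothing is shared.
func cloneBytes(b []byte) []byte {
	return append([]byte(nil), b...)
}

func main() {
	src := []byte("hello")
	dst := cloneBytes(src)
	src[0] = 'j' // stand-in for the scanner overwriting its buffer
	fmt.Printf("%s %s\n", src, dst) // prints "jello hello"
}
```

One difference between the two variants: for an empty line, append([]byte(nil), ...) returns nil while append([]byte{}, ...) returns an empty non-nil slice. Both have length 0.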

Rob Pike

Aug 30, 2016, 8:29:49 PM
to Mateusz Czapliński, golang-nuts, chri...@uber.com
There's a 50% chance the receiver wants a string, in which case it would be simpler to use scanner.Text().

-rob

