Order not preserved when bufio scan.Bytes() sends over a channel


Samuel Lampa

Aug 12, 2013, 7:11:45 AM
to golan...@googlegroups.com
Hi,

When I loop over a file line by line with the bufio scan.Bytes() function, the order is not preserved when sending the output over a channel!

You can find my complete test setup, with the run_bytearray.sh script to run the test, here: 
https://github.com/samuell/bufchan_test

So basically:

1. I have an input file, input.txt, of 100 lines, with the lines numbered to the left (https://github.com/samuell/bufchan_test/blob/master/input.txt).
2. I run it with the code in https://github.com/samuell/bufchan_test/blob/master/test_bytearray.go ... which for the sake of the test uses unbuffered channels (BUFSIZE = 0), and 1 thread.
3. Then, when I run this program and redirect its output to another file, output_bytearray.txt, the two files differ (one of the lines, 49, is overwritten with line 98), more or less every time:

[samuel gotest]$ go run test_bytearray.go > output_bytearray.txt
[samuel gotest]$ diff input.txt output_bytearray.txt
49c49
< 49 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
---
> 98 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

This does not seem like expected behaviour, as channels should be FIFO, AFAIK? Or, what am I missing?

Just for reference, it works correctly if I create a string channel and read with scan.Text() instead, as in https://github.com/samuell/bufchan_test/blob/master/test_string.go. Then I get the correct result, and the output from diff is empty:

[samuel gotest]$ go run test_string.go > output_string.txt
[samuel gotest]$ diff input.txt output_string.txt
[samuel gotest]$ 

The same goes if I write out the results from scan.Bytes() directly, without a channel; then it also returns the lines in the correct order:

[samuel gotest]$ go run test_bytearray_wochan.go > output_bytearray_wochan.txt
[samuel gotest]$ diff input.txt output_bytearray_wochan.txt
[samuel gotest]$ 
 
Best Regards
// Samuel

Dave Cheney

Aug 12, 2013, 7:16:17 AM
to Samuel Lampa, golang-nuts
Reading the documentation for scanner.Bytes(),

func (s *Scanner) Bytes() []byte
Bytes returns the most recent token generated by a call to Scan. The
underlying array may point to data that will be overwritten by a
subsequent call to Scan. It does no allocation.

suggests to me that it is not safe to pass the []byte slice over a
channel as the receiver may receive a slice whose contents are being
mutated by another goroutine. This would also explain why when you ask
for the line as a string, everything works fine, due to the implicit
conversion.
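
A minimal sketch of the effect described above, not taken from the original test repository. The collect helper, the input string, and the deliberately tiny 16-byte buffer (set with Scanner.Buffer) are assumptions chosen so that the Scanner has to reuse its storage after only a few lines. It keeps the slices returned by Bytes() once as-is and once copied, and only prints them after scanning has finished:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// collect is a hypothetical helper for this demo. It scans input line by
// line and keeps every token until scanning is done. The 16-byte buffer is
// an assumption that forces the Scanner to reuse its storage quickly.
// If copyToken is true, each token is copied before it is kept.
func collect(input string, copyToken bool) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 16), 16)

	var kept [][]byte
	for sc.Scan() {
		b := sc.Bytes() // may alias the Scanner's internal buffer
		if copyToken {
			b = append([]byte(nil), b...) // private copy, safe to keep
		}
		kept = append(kept, b)
	}

	// Convert only after all scanning is done, so that any overwriting
	// of the aliased slices has already happened.
	out := make([]string, len(kept))
	for i, b := range kept {
		out[i] = string(b)
	}
	return out
}

func main() {
	const input = "aaaa\nbbbb\ncccc\ndddd\n"
	fmt.Println("aliased:", collect(input, false))
	fmt.Println("copied: ", collect(input, true))
}
```

The copied run should list the four lines unchanged. In the aliased run, earlier entries may show the contents of later lines, because those slices still point into the buffer that Scan has since refilled.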

Jan Mercl

Aug 12, 2013, 7:16:37 AM
to Samuel Lampa, golang-nuts
On Mon, Aug 12, 2013 at 1:11 PM, Samuel Lampa <samuel...@gmail.com> wrote:
> Hi,
>
> When I loop over a file line by line with the bufio scan.Bytes() function,
> the order is not preserved when sending the output over a channel!

The order is preserved, channels are FIFOs. You have to find a bug in your code.

OR

If you can create a minimal self-contained repro case then file a bug.

-j

Volker Dobler

Aug 12, 2013, 7:18:52 AM
to golan...@googlegroups.com
This has nothing to do with channels or FIFO or whatever.

and you'll see your bug.

V.

Dmitry Vyukov

Aug 12, 2013, 7:18:38 AM
to Jan Mercl, golang-nuts, Samuel Lampa

Or run it under the race detector on tip.

Samuel Lampa

Aug 12, 2013, 7:22:35 AM
to Dave Cheney, golang-nuts
Hmm, good point. But the fact that I run all the Scans in the same
goroutine makes me think that I should not be stepping on my own toes
here, at least ...

// Samuel
--
Developer at SNIC-UPPMAX www.uppmax.uu.se
Developer at Dept of Pharm Biosciences www.farmbio.uu.se

André Moraes

Aug 12, 2013, 7:33:35 AM
to Samuel Lampa, Dave Cheney, golang-nuts
On Mon, Aug 12, 2013 at 8:22 AM, Samuel Lampa <samuel...@gmail.com> wrote:
> Hmm, good point. But the fact that I run all the Scans in the same
> goroutine makes me think that I should not be stepping on my own toes
> here, at least ...

The problem isn't with Scan... it's where you use the data.

Change your code to do something like

buf := reader.Bytes()
tmp := make([]byte, len(buf))
copy(tmp, buf)
outputChan <- tmp

And see if the error is gone


--
André Moraes
http://amoraes.info

Samuel Lampa

Aug 12, 2013, 7:35:34 AM
to André Moraes, Dave Cheney, golang-nuts
Indeed, with this change, it produces correct output:

for scan.Scan() {
    // Copy the bytes into a new slice before sending
    newBytesArray := append([]byte(nil), scan.Bytes()...)
    ch <- newBytesArray
}


Samuel Lampa

Aug 12, 2013, 7:36:48 AM
to André Moraes, Dave Cheney, golang-nuts
On 08/12/2013 01:33 PM, André Moraes wrote:
> The problem isn't with Scan... it's where you use the data.
>
> Change your code to do something like
>
> buf := reader.Bytes()
> tmp := make([]byte, len(buf))
> copy(tmp, buf)
> outputChan <- tmp
>
> And see if the error is gone

Indeed, with this change, it produces the correct output:

for scan.Scan() {
    // Copy the bytes into a new slice before sending
    newBytesArray := append([]byte(nil), scan.Bytes()...)
    ch <- newBytesArray
}

Ok, many thanks for the quick help!

Best
// Samuel

Rob Pike

Aug 12, 2013, 9:16:59 AM
to Samuel Lampa, André Moraes, Dave Cheney, golang-nuts
This is a standard problem of I/O design. Unless told otherwise one
should assume that slices of bytes returned by an I/O routine occupy
shared storage and should be copied if they are to be overwritten or
shared further.

Of course, copying is expensive, which is why read routines take a
buffer from the user rather than return one, so the user can decide
whether to pay the price.

-rob

Samuel Lampa

Aug 12, 2013, 9:27:42 AM
to Rob Pike, André Moraes, Dave Cheney, golang-nuts
Thanks for the clarification, I appreciate it!

(I guess it is not an uncommon pitfall for script kiddies entering the
world of compiled languages, like me :") )

Best
// Samuel