bufio.Scanner: token too long

Gobin Sougrakpam
Jul 13, 2020, 1:42:13 AM
to golang-nuts
Hi Folks,


I encountered the error below, but was able to fix it by setting the scanner.Buffer size to a rather large number (819200 bytes).

bufio.Scanner: token too long

Here is my fixed function:

func scanFile(f *os.File) error {
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 819200), 819200)
	splitFunc := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		for i := 0; i < len(data); i++ {
			if data[i] == ',' {
				return i + 1, data[:i], nil
			}
			if !atEOF {
				return 0, nil, nil
			}
		}
		return 0, data, bufio.ErrFinalToken
	}
	scanner.Split(splitFunc)
	newfile, err := os.Create("test/new.txt")
	if err != nil {
		return err
	}
	writer := bufio.NewWriter(newfile)
	for scanner.Scan() {
		n, err := writer.Write(scanner.Bytes())
		if err != nil {
			return err
		}
		fmt.Printf("%d bytes written\n", n)
	}

	if err := scanner.Err(); err != nil {
		return err
	}
	return nil
}


Now, my question: when I run this function successfully, the first token generated by the split function is only 637 bytes.
What is taking up the buffer, such that I get the error when I set the buffer to smaller values?

Thanks.


Ian Lance Taylor
Jul 13, 2020, 1:49:55 AM
to Gobin Sougrakpam, golang-nuts
On Sun, Jul 12, 2020 at 10:42 PM Gobin Sougrakpam
<gobinso...@gmail.com> wrote:
>
> I encountered this error but was able to fix it after setting the scanner.Buffer size to a rather large number.
>
> bufio.Scanner: token too long

...

> Now, the question is when I run this function successfully, the first token that is generated from the split function is 637 bytes.
> What is taking up the buffer that I am getting the error when I set the buffer to smaller values?

Please include code as plain text or as a link to the Go playground,
not in reverse video. Plain text is easier to read in general. Thanks.

I don't understand the split function that you showed:

splitFunc := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
	for i := 0; i < len(data); i++ {
		if data[i] == ',' {
			return i + 1, data[:i], nil
		}
		if !atEOF {
			return 0, nil, nil
		}
	}
	return 0, data, bufio.ErrFinalToken
}

Unless the very first character is a comma, that function is going to
keep returning 0, nil, nil until all the data is pulled into the
buffer. Perhaps you meant to put the atEOF check outside of the loop
over data.

Ian

Tamás Gulácsi
Jul 13, 2020, 1:52:20 AM
to golang-nuts
Your splitFunc always returns "0, nil, nil" if data[0] != ',' (unless atEOF is true).

Use "i := bytes.IndexByte(data, ',')" instead of this for loop.