bufio.Scanner - possible bug or doc err?

Mark

Oct 12, 2023, 4:39:39 AM
to golang-nuts
I'm reading Debian *Packages files, some of which are over 1M lines long.
I used bufio.Scanner and found that it won't read past 1M lines (I'm using Go 1.21.1 linux/amd64).
Is this a limitation of bufio.Scanner? If so then it ought to be in the docs.
Or is it a bug?
Or maybe I made a mistake (although using bufio.Scanner seems easy)?
```
scanner := bufio.NewScanner(file)
lino := 1
for scanner.Scan() {
	line := scanner.Text()
	lino++
	... // etc
}
```
Anyway, I've switched to using bufio.Reader and that works great.

Rob Pike

unread,
Oct 12, 2023, 4:45:10 AM10/12/23
to Mark, golang-nuts
I just did a simple test with a 2M line file and it worked fine, so I suspect it's a bug in your code. But if not, please provide a complete working executable example, with data, to help identify the problem.

-rob



Mark

Oct 12, 2023, 1:20:45 PM
to golang-nuts
I have written and attached an example that compares bufio.Reader and bufio.Scanner.
Here's the output from `go run .` (a line count followed by the first error encountered):
```
Reader 1333665 <nil>
Scanner 777758 bufio.Scanner: token too long
```
This probably _won't_ fail on your 2M line file; it looks like the problem is with the line length of a Debian Packages file. If you have a Debian-derived distro you could try replacing the filename in the file with one from `/var/lib/apt/lists/`.

The docs for bufio.Scanner do say
"Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead"
Perhaps it would be more helpful to mention what the token length limit is?
scan.go

Dan Kortschak

Oct 12, 2023, 2:39:07 PM
to golan...@googlegroups.com
On Thu, 2023-10-12 at 10:20 -0700, 'Mark' via golang-nuts wrote:
> I have written and attached an example that compares bufio.Reader and
> bufio.Scanner.
> Here's the output from `go run .` (a line count followed by the first
> error encountered):
> ```
> Reader 1333665 <nil>
> Scanner 777758 bufio.Scanner: token too long
> ```
> This probably _won't_ fail on your 2M line file; it looks like the
> problem is with the line length of a Debian Packages file. If you
> have a Debian-derived distro you could try replacing the filename in
> the file with one from `/var/lib/apt/lists/`.
>
> The docs for bufio.Scanner do say
> "Programs that need more control over error handling or large tokens,
> or must run sequential scans on a reader, should use bufio.Reader
> instead"
> Perhaps it would be more helpful to mention what the token length
> limit is?
>

It may not be immediately obvious, but it is noted in the constants
section, https://pkg.go.dev/bufio#pkg-constants and in the docs for the
Buffer method, https://pkg.go.dev/bufio#Scanner.Buffer.

Ian Lance Taylor

Oct 12, 2023, 3:56:05 PM
to Mark, golang-nuts
On Thu, Oct 12, 2023 at 10:21 AM 'Mark' via golang-nuts
<golan...@googlegroups.com> wrote:
>
> The docs for bufio.Scanner do say
> "Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead"
> Perhaps it would be more helpful to mention what the token length limit is?

It's MaxScanTokenSize: https://pkg.go.dev/bufio#pkg-constants . See
also https://pkg.go.dev/bufio#Scanner.Buffer .

Ian

Mark

Oct 13, 2023, 2:42:39 AM
to golang-nuts
Yes, I can see now.

Perhaps consider changing:

Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

to:

Programs that need more control over error handling or large tokens (such as lines longer than MaxScanTokenSize), or must run sequential scans on a reader, should use bufio.Reader instead.

Just a thought.

Thanks.

Ian Lance Taylor

Oct 13, 2023, 3:40:45 PM
to Mark, golang-nuts
On Thu, Oct 12, 2023 at 11:42 PM 'Mark' via golang-nuts
<golan...@googlegroups.com> wrote:
>
> Yes, I can see now.
>
> Perhaps consider changing:
>
> Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.
>
> to:
>
> Programs that need more control over error handling or large tokens (such as lines longer than MaxScanTokenSize), or must run sequential scans on a reader, should use bufio.Reader instead.

Thanks, instead of that I added a link to Scanner.Buffer
(https://go.dev/cl/535216). I hope that will help guide people in the
right direction.

Ian



Mark

Oct 14, 2023, 3:31:28 AM
to golang-nuts
Yes, that's a much better solution.