bufio.Scanner: token too long


Peter Kleiweg

Nov 17, 2013, 11:29:57 AM
to golan...@googlegroups.com
When I process a text with long lines using bufio.Scanner, scanner.Err() returns:

    bufio.Scanner: token too long

How do I fix this?

Jan Mercl

Nov 17, 2013, 11:47:32 AM
to Peter Kleiweg, golang-nuts

IIRC it's a defensive measure, so "fix it" is not applicable. You'll have to use some other mechanism to process such long data lines, for example a custom-written or generated lexer.

-j

Dan Kortschak

Nov 17, 2013, 3:49:55 PM
to Peter Kleiweg, golan...@googlegroups.com
If you know the maximum length of the tokens you will be reading, copy the bufio.Scanner code into your project and change the const MaxScanTokenSize value. If you don't know the max token size, you will need to do as Jan says, which might involve just removing that test from your copy of the code - be prepared for bad things when the training wheels are not there and the road gets bumpy.

Dave Cheney

Nov 17, 2013, 3:52:45 PM
to Dan Kortschak, Peter Kleiweg, golan...@googlegroups.com
Out of interest, what is the makeup of the text that blows up
bufio.Scanner? A while back some bioinformatics guys tried to blow up
bufio.Scanner with their data sets and it survived.
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Dan Kortschak

Nov 17, 2013, 4:02:57 PM
to Dave Cheney, Peter Kleiweg, golan...@googlegroups.com
I could see some abusive use of FASTA/FASTQ format doing this, but this would be very rare.

brendan...@gmail.com

May 26, 2014, 12:43:46 PM
to golan...@googlegroups.com, Dave Cheney, Peter Kleiweg
I blew up bufio.Scanner this morning attempting to parse bioinformatics data (specifically, 1000 Genomes population frequency data - not even big stuff!). It took me a while to track down the error, since no warning or anything else is emitted. Is there a way to tell whether a scanner has reached the end of the file or aborted because a line could not be read? It's a little confusing, since both cases seem to be handled the same way.

Dan Kortschak

May 26, 2014, 7:14:08 PM
to brendan...@gmail.com, golan...@googlegroups.com
What was the cause? What do the 1000G lines look like that you are
parsing?

Rui Ueyama

May 26, 2014, 7:20:28 PM
to brendan...@gmail.com, golang-nuts, Dave Cheney, Peter Kleiweg
I think you can check the return value of (*Scanner).Err() to tell whether it aborted due to the length limit. If it returns ErrTooLong, the scan stopped because a token exceeded the limit. If it returns nil, the scanner reached the end of the input without an error.



Dave Cheney

May 26, 2014, 7:23:44 PM
to golan...@googlegroups.com, Dave Cheney, Peter Kleiweg, brendan...@gmail.com
Can you show the code you used and the output you saw?

brendan...@gmail.com

May 26, 2014, 7:31:49 PM
to golan...@googlegroups.com, Dave Cheney, Peter Kleiweg, brendan...@gmail.com
The line was tab-separated data, and at least one of the tokens was around 200,000 characters long. It was in a loop like this:

    for scanner.Scan() {
        line := scanner.Text()
        // process the line...
    }


Rui, thanks for the tip on checking for ErrTooLong! Sounds like that's what I'm looking for.

Kevin Gillette

May 26, 2014, 10:50:36 PM
to golan...@googlegroups.com
bufio.Scanner deals with tokens of reasonable size.  If you need to deal with arbitrarily long lines, then http://golang.org/pkg/bufio/#Reader.ReadLine was specifically designed for that.

Dan Kortschak

May 27, 2014, 9:22:47 AM
to brendan...@gmail.com, golan...@googlegroups.com, Dave Cheney, Peter Kleiweg
1000G people have inflicted some of the most appalling formats on us.