How to return an empty final token from a bufio.SplitFunc

Scott Pakin

Jul 22, 2015, 12:43:29 AM
to golang-nuts
I'm stumped. I'm trying to write a custom bufio.SplitFunc that can return an empty final token. Alas, my attempts oscillate between not returning the token at all and blowing up with a "bufio.Scan: 100 empty tokens without progressing" panic. Consider the following basic code structure:


The MyScanLines function is copied almost verbatim from bufio.ScanLines.  In this version, the string "foo\nbar\nbaz" (no trailing newline) correctly scans into three tokens.  However, the string "foo\nbar\nbaz\n" (with a trailing newline) scans into those same three tokens when I in fact want the four tokens "foo", "bar", "baz", and "".
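Scott's exact code isn't shown. As a hypothetical reconstruction, assuming MyScanLines is bufio.ScanLines with only the carriage-return handling removed, the sketch below shows why both inputs give three tokens: the first if statement reports "no more tokens" as soon as the scanner reaches EOF with an empty buffer, so nothing is ever emitted for the empty text after the trailing newline. The helper name tokens is invented for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// MyScanLines is a hypothetical reconstruction: bufio.ScanLines without
// the carriage-return handling. Lines are treated as newline-terminated.
func MyScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// At EOF with nothing buffered: report "no more tokens".
	// This is the check that swallows the empty final token.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// A full newline-terminated line.
		return i + 1, data[:i], nil
	}
	// At EOF with a final, non-terminated line: return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}

// tokens (hypothetical helper) scans input with split and collects the tokens.
func tokens(input string, split bufio.SplitFunc) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(split)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", tokens("foo\nbar\nbaz", MyScanLines))   // ["foo" "bar" "baz"]
	fmt.Printf("%q\n", tokens("foo\nbar\nbaz\n", MyScanLines)) // ["foo" "bar" "baz"]
}
```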

Removing the first if statement lets control fall through to the return len(data), data, nil, which does return an empty token, but because it consumes no data (advance is 0), Scan calls the function again and again and again with the same arguments.
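A sketch of that failure mode, using the hypothetical names scanLinesNoEOFCheck and countUntilPanic: with the first if statement deleted, the function keeps returning advance 0 and an empty, non-nil token at EOF, so Scan keeps reporting empty tokens until its guard against split functions that never make progress panics. The sketch recovers from the panic so it can report it; the exact panic text differs between Go releases, so nothing here depends on it.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanLinesNoEOFCheck is the reconstructed MyScanLines with the first if removed.
func scanLinesNoEOFCheck(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, data[:i], nil
	}
	if atEOF {
		// With empty data this is advance 0 plus a non-nil, empty token:
		// Scan reports a token, but the input never moves forward.
		return len(data), data, nil
	}
	return 0, nil, nil
}

// countUntilPanic counts the tokens Scan produced and reports whether it panicked.
func countUntilPanic(input string) (n int, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true // n keeps the count reached before the panic
		}
	}()
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanLinesNoEOFCheck)
	for sc.Scan() {
		n++
	}
	return n, false
}

func main() {
	n, panicked := countUntilPanic("foo\nbar\nbaz\n")
	fmt.Println("tokens before stopping:", n, "panicked:", panicked)
}
```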

I can think of some gross kludges to make MyScanLines work, for example using external state to hold the last token or returning a custom error type that embeds the final token.  Nevertheless, I'd like to believe there's a clean, elegant way to return an empty final token from a bufio.SplitFunc.  Is there?

Thanks,
— Scott

Rob Pike

Jul 22, 2015, 1:11:35 AM
to Scott Pakin, golang-nuts
Your split function needs to return only a single final token. As
written it's just returning an infinite number of empty strings at
EOF.

https://play.golang.org/p/Ivm2C02cb4

-rob
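The playground code behind that link isn't reproduced here. As a hypothetical sketch of the idea "return only a single final token", a helper type (lineSplitter, an invented name) can remember that it has already emitted the last token and afterwards return (0, nil, nil) at EOF, which makes Scan stop. The state lives in the helper, so the split function becomes a method value rather than a plain function.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// lineSplitter (hypothetical) treats '\n' as a separator and records
// whether the final token has been delivered.
type lineSplitter struct {
	finalDone bool
}

func (ls *lineSplitter) Split(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, data[:i], nil
	}
	if !atEOF {
		return 0, nil, nil // request more data
	}
	if ls.finalDone {
		return 0, nil, nil // final token already delivered: no more tokens
	}
	ls.finalDone = true
	if data == nil {
		data = []byte{} // a nil token means "no token", so keep it non-nil
	}
	return len(data), data, nil
}

// splitTokens (hypothetical helper) uses a fresh lineSplitter per Scanner.
func splitTokens(input string) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	ls := &lineSplitter{}
	sc.Split(ls.Split)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", splitTokens("foo\nbar\nbaz"))   // ["foo" "bar" "baz"]
	fmt.Printf("%q\n", splitTokens("foo\nbar\nbaz\n")) // ["foo" "bar" "baz" ""]
}
```

Because the flag is per-helper state, a lineSplitter must not be shared between Scanners.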

Scott Pakin

Jul 22, 2015, 4:00:35 PM
to golang-nuts
Thanks.  That's better than what I was considering doing.

I admit I do find it a little annoying that bufio.Scan{Bytes,Lines,Runes,Words} can work fine as ordinary functions, but dealing with my minor variation requires a method on a helper object. Would you be amenable to a patch to bufio.Scanner's Scan method to treat an io.EOF (or a bufio.EndOfScan or whatever) returned from a bufio.SplitFunc as a sentinel meaning that the returned token is valid but that the function doesn't want to be called again? This would cover not only my use case of separators (as opposed to terminators) but also the case of scanning a stream that might contain an end marker (à la Perl's __END__ or TeX's \endinput) followed by arbitrary bogus text. I don't believe this change would affect any existing code.

— Scott
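For readers on Go 1.6 or later: the standard library has a sentinel of the kind proposed here, bufio.ErrFinalToken. A split function returns it alongside a token to mean "deliver this token, then stop", and Scan does not surface it through Err. The sketch below (hypothetical names scanSeparatedLines and scanAll) uses it so that a plain function, with no helper object, treats newline as a separator. It always returns a non-nil slice for the final token, since a nil token means "no token".

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanSeparatedLines treats '\n' as a separator rather than a terminator,
// so "a\n" yields two tokens, "a" and "". Needs Go 1.6+ for bufio.ErrFinalToken.
func scanSeparatedLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, data[:i], nil // a separator-delimited token
	}
	if !atEOF {
		return 0, nil, nil // request more data
	}
	// Whatever remains (possibly nothing) is the last token. ErrFinalToken
	// asks Scan to deliver it and then stop without reporting an error.
	if data == nil {
		data = []byte{} // keep the empty final token non-nil
	}
	return len(data), data, bufio.ErrFinalToken
}

// scanAll collects every token from input along with the Scanner's error.
func scanAll(input string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanSeparatedLines)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

func main() {
	for _, in := range []string{"foo\nbar\nbaz", "foo\nbar\nbaz\n"} {
		toks, err := scanAll(in)
		fmt.Printf("%q %v\n", toks, err)
	}
	// Prints:
	// ["foo" "bar" "baz"] <nil>
	// ["foo" "bar" "baz" ""] <nil>
}
```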

Rob Pike

Jul 22, 2015, 6:26:22 PM
to Scott Pakin, golang-nuts
Sounds reasonable, but please just file an issue at golang.org/issue
for Go 1.6. You can assign it to me. The tree is frozen except for
release-blocking fixes at the moment.

-rob

Scott Pakin

Jul 23, 2015, 1:04:46 AM
to golang-nuts, r...@golang.org
Done.  It's issue #11836.  I don't know how to assign it to you or label it as Go 1.6, though.

— Scott