[bufio.Scanner] skip empty lines


gwenn...@gmail.com

Feb 28, 2015, 3:27:19 AM
to golan...@googlegroups.com
Hello,
I would like to use the bufio.Scanner to read lines but skip empty lines:
This example does not work because bytes.TrimSpace may return nil.

Would it be possible to enhance bufio.Scanner so that
a nil token with a positive advance returned by the split function means "ignore/skip this data"?
diff --git a/src/bufio/scan.go b/src/bufio/scan.go
index 364d159..4123ce3 100644
--- a/src/bufio/scan.go
+++ b/src/bufio/scan.go
@@ -138,6 +138,8 @@ func (s *Scanner) Scan() bool {
                                        }
                                }
                                return true
+                       } else if advance > 0 {
+                               continue
                        }
                }
                // We cannot generate a token with what we are holding.

Regards.

Dan Kortschak

Feb 28, 2015, 3:57:24 AM
to gwenn...@gmail.com, golan...@googlegroups.com
Why not do that work in the scan loop? http://play.golang.org/p/xOvyvCHXLt

That way you keep the possibility of tracking the line number, the code works, and the std lib doesn't need to change.
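For readers without access to the playground link, a minimal sketch of the scan-loop approach might look like the following. It is not Dan's code; the names `numberedLine` and `nonEmptyLines` are made up for illustration, and the input is a string only to keep the example self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// numberedLine pairs a non-empty line with its 1-based position in the input.
// The type and function names are illustrative only.
type numberedLine struct {
	N    int
	Text string
}

// nonEmptyLines scans input line by line and skips empty lines in the
// scan loop itself, leaving the default split function untouched.
func nonEmptyLines(input string) ([]numberedLine, error) {
	var out []numberedLine
	sc := bufio.NewScanner(strings.NewReader(input))
	for n := 1; sc.Scan(); n++ {
		if len(sc.Bytes()) == 0 {
			continue // empty line: skipped, but still counted
		}
		out = append(out, numberedLine{N: n, Text: sc.Text()})
	}
	return out, sc.Err()
}

func main() {
	lines, err := nonEmptyLines("a\n\nb\n\n\nc\n")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, l := range lines {
		fmt.Printf("%d: %s\n", l.N, l.Text)
	}
}
```

Because the counter advances on every call to Scan, skipped lines still contribute to the line number reported for the lines that follow.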

gwenn

Feb 28, 2015, 5:29:02 AM
to Dan Kortschak, golan...@googlegroups.com
Because the scan loop is written by clients.
Each client needs to repeat the skip test.
Whereas the logic can be shared in one split function implementation
(which may be stateful to keep track of the line number):
http://play.golang.org/p/RGJ-qe4hYh
The real problem I want to solve is skipping comment lines (starting
with '#') and empty lines in TDF files...
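One way to share the skipping logic in a split function is to loop inside it until a line worth returning is found. The sketch below is an assumption about how that could be done, not the code behind either playground link; `scanDataLines` and `dataLines` are hypothetical names. It only returns a nil token with a positive advance when everything it has seen so far was skippable, so stopping at end of input in that state loses nothing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanDataLines is a bufio.SplitFunc that returns only lines that are
// neither empty nor comments starting with '#'. It wraps bufio.ScanLines
// and keeps consuming lines from data until it finds one to return.
func scanDataLines(data []byte, atEOF bool) (int, []byte, error) {
	total := 0 // bytes consumed so far, including skipped lines
	for {
		adv, line, err := bufio.ScanLines(data[total:], atEOF)
		if err != nil {
			return 0, nil, err
		}
		if adv == 0 {
			// No complete line is left in data. Discard what was skipped
			// and let the Scanner read more, or finish if at EOF.
			return total, nil, nil
		}
		total += adv
		if len(line) > 0 && line[0] != '#' {
			return total, line, nil
		}
	}
}

// dataLines collects the tokens produced by scanDataLines for input.
func dataLines(input string) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanDataLines)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", dataLines("# header\n\na\tb\n\n# note\nc\td\n\n"))
}
```

Lines are not trimmed here, so a line containing only spaces or tabs is still returned; whether that is desirable for TDF input is a separate decision.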

roger peppe

Feb 28, 2015, 9:01:13 AM
to gwenn, Dan Kortschak, golan...@googlegroups.com
Something like this perhaps? I assumed you didn't really want
to remove all leading/trailing white space from *all* lines.

http://play.golang.org/p/ItwhUpdj7r

Axel Wagner

Feb 28, 2015, 10:32:12 AM
to roger peppe, gwenn, Dan Kortschak, golan...@googlegroups.com
I think that's all far too complex. I don't see what the problem is with:
http://play.golang.org/p/xqqoHl8AH7

roger peppe

Feb 28, 2015, 12:28:42 PM
to Axel Wagner, gwenn, Dan Kortschak, golan...@googlegroups.com
Yeah, that's nice.

gwenn

Feb 28, 2015, 1:19:48 PM
to roger peppe, Axel Wagner, Dan Kortschak, golan...@googlegroups.com
When you implement a split function,
you can't return a nil token except:
a) to request more data:
"SplitFunc can return (0, nil, nil) to signal the Scanner to read more
data into the slice and try again".
b) to notify the end of scan:
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

If you don't follow this rule (bytes.TrimSpace may return nil),
http://play.golang.org/p/RGJ-qe4hYh
your implementation partially works (rule a) (it skips data)
until you are near the end of file where it stops prematurely (rule b).

I am certainly biased, but why not extend this behaviour (skipping
data) even at end of file?
Something like:
http://golang.org/src/text/template/parse/lex.go#L135
But for Scanner.

Regards.
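The failure mode described in this message can be reproduced with a sketch along these lines. The split function is a guess at the shape of the non-working example (bufio.ScanLines followed by bytes.TrimSpace), not a copy of the playground code; `scanTrimmedLines` and `trimmedLines` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanTrimmedLines is a deliberately naive split function: it delegates to
// bufio.ScanLines and trims the result. bytes.TrimSpace returns nil for an
// all-whitespace line, so blank lines yield a positive advance with a nil
// token.
func scanTrimmedLines(data []byte, atEOF bool) (int, []byte, error) {
	advance, token, err := bufio.ScanLines(data, atEOF)
	return advance, bytes.TrimSpace(token), err
}

// trimmedLines collects every token the Scanner reports for input.
func trimmedLines(input string) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanTrimmedLines)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", trimmedLines("a\n\nb\n\nc\n"))
}
```

As far as I can tell from the Scan loop, this reports "a" and "b" but never "c". The first blank line is skipped because the Scanner goes back to the reader for more data and only then sees EOF; the second blank line arrives after EOF has been recorded, so the nil token ends the scan with "c" still in the buffer and no error reported.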

Axel Wagner

Feb 28, 2015, 1:36:56 PM
to gwenn, roger peppe, Dan Kortschak, golan...@googlegroups.com
Hi,

gwenn <gwenn...@gmail.com> writes:
> When you implement a split function,

Why do you need to implement a split-function? It just sounds like total
overkill to me, to do that, if you get a *lot* cleaner and clearer code
without it.

> your implementation partially works (rule a) (it skips data)
> until you are near the end of file where it stops prematurely (rule
> b).

My implementation doesn't use SplitFunc at all. Have you even looked at
it? I get the impression that you didn't. I wrap a
bufio.Scanner and just skip all returned empty lines (note that
len([]byte(nil)) == 0).

Best,

Axel
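A wrapper of the kind described in this message could be sketched as follows. This is an illustration under assumptions, not the code from the playground link; `lineScanner`, `newLineScanner` and `nonEmpty` are made-up names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineScanner wraps a bufio.Scanner and hides empty lines from the caller.
// Bytes, Text and Err are promoted from the embedded Scanner.
type lineScanner struct {
	*bufio.Scanner
}

// newLineScanner returns a lineScanner reading lines from r.
func newLineScanner(r io.Reader) *lineScanner {
	return &lineScanner{bufio.NewScanner(r)}
}

// Scan advances to the next non-empty line. It shadows the embedded Scan
// and works whether the empty token is nil or an empty slice, because
// len is 0 in both cases.
func (s *lineScanner) Scan() bool {
	for s.Scanner.Scan() {
		if len(s.Bytes()) > 0 {
			return true
		}
	}
	return false
}

// nonEmpty collects the lines reported by a lineScanner over input.
func nonEmpty(input string) []string {
	sc := newLineScanner(strings.NewReader(input))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", nonEmpty("a\n\nb\n\n\nc\n\n"))
}
```

Clients keep the usual `for sc.Scan()` loop, so the skip test lives in one place without a custom split function.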

gwenn...@gmail.com

Feb 28, 2015, 1:57:12 PM
to golan...@googlegroups.com, gwenn...@gmail.com, rogp...@gmail.com, dan.ko...@adelaide.edu.au
Because the encoding/csv package cannot parse TDF correctly (https://github.com/golang/go/issues/3150).
And because I already have a split function implementation:
https://github.com/gwenn/yacr/blob/master/reader.go#L220
with a workaround.
I just hope I will be able to remove it...

Axel Wagner

Feb 28, 2015, 3:14:21 PM
to gwenn...@gmail.com, golan...@googlegroups.com, gwenn...@gmail.com, rogp...@gmail.com, dan.ko...@adelaide.edu.au
Okay, if you really can't just rewrite the code to use a
scanner-wrapper, then I can't help you, I'm afraid; I have never bothered
rolling my own split functions. I still believe you would fare better
calling into bufio.ScanLines and then testing the result,
though. But I don't know exactly how that would look, so I'll shut up
now :)

gwenn...@gmail.com writes:

> Because the encoding/csv package cannot parse TDF correctly
> (https://github.com/golang/go/issues/3150).
> And because I already have a split function implementation:
> https://github.com/gwenn/yacr/blob/master/reader.go#L220
> with a workaround.
> I just hope I will be able to remove it...