How to detect newline when reading words


HarrydB

Oct 19, 2011, 2:50:59 PM
to golang-nuts
Can somebody tell me how to detect a line ending when scanning space
separated words?

I have input from an io.Reader which contains lines like
word1 word2 word3 \n
word4 word5 word6 \n

As a result I want for each line a []string like
sentence[0] = [word1, word2, word3]
sentence[1] = [word4, word5, word6]

I can scan each character manually and create the strings myself, but
I feel there should be an easier way with fmt.Fscan for example.
I do not see how to get that to work properly though.

Does anyone know how to do this nicely?

--
Harry

Paul Borman

Oct 19, 2011, 6:53:21 PM
to HarrydB, golang-nuts
If you don't want to parse it yourself you can use Split on newlines and then Split the resulting strings on whitespace.

Gustavo Niemeyer

Oct 19, 2011, 7:07:03 PM
to HarrydB, golang-nuts
> Can somebody tell me how to detect a line ending when scanning space
> separated words?
>
> I have input from an io.reader which contains lines like

b := bufio.NewReader(r)
line, err := b.ReadSlice('\n')
if err != nil {
	// handle it!
}

If there's potential for malicious input, use ReadLine instead and
take isPrefix into consideration.

http://golang.org/pkg/bufio/#Reader.ReadSlice
http://golang.org/pkg/bufio/#Reader.ReadLine

To split the words have a look at strings.Split:

http://golang.org/pkg/strings#Split

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog

-- I never filed a patent.

Thorolf Jahnsen

Oct 19, 2011, 6:57:23 PM
to golang-nuts
I'd use
http://golang.org/pkg/bufio/#Reader.ReadLine
to read the lines from the io.Reader, and
http://golang.org/pkg/strings/#Split
to split them into words.

HarrydB

Oct 19, 2011, 8:18:04 PM
to golang-nuts
Hmm, yes, but that would mean going over each character twice: once to
find the newline and once to find the spaces. That seems a bit wasteful,
especially since I have several hundred megabytes of plain text to
process.

Kyle Lemons

Oct 19, 2011, 9:16:27 PM
to HarrydB, golang-nuts
> word1 word2 word3 \n
> word4 word5 word6 \n

Is that actually a space before the newline?

> As a result I want for each line a []string like
> sentence[0] = [word1, word2, word3]
> sentence[1] = [word4, word5, word6]
>
> I can scan each character manually and create the strings myself, but
> I feel there should be an easier way with fmt.Fscan for example.
> I do not see how to get that to work properly though.

None of the standard functions will help you with this in particular, except for strings.Split. If you split by spaces and then go through and find words that start with \n, you can do your division with minimal recomputation.

> Does anyone know how to do this nicely?

You could also use strings.IndexAny to find the next whitespace, and then either append the word or move on to the next sentence, depending on what kind of whitespace it is.

HarrydB

Oct 20, 2011, 12:51:58 PM
to golang-nuts


On Oct 20, 3:16 am, Kyle Lemons <kev...@google.com> wrote:
> Is that actually a space before the newline?

Yes it is.

Thanks for all the suggestions. I was asking because it seemed like
something that might be done trivially with the standard library.
I have now written a function that does exactly what I want (although
it may still be a bit slow):

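func scanUntil(r *bufio.Reader, delim byte) (tokens []string, err error) {
	t := make([]byte, 0, 32) // bytes of the word currently being read
	c, err := r.ReadByte()

	for err == nil && c != delim {
		if isSpace(c) { // isSpace is a helper not shown in this post
			if len(t) > 0 {
				tokens = append(tokens, string(t))
				t = t[0:0]
			}
		} else {
			t = append(t, c)
		}
		c, err = r.ReadByte()
	}
	// Keep a final word that is not followed by whitespace.
	if len(t) > 0 {
		tokens = append(tokens, string(t))
	}
	return
}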

Kyle Lemons

Oct 20, 2011, 5:01:50 PM
to HarrydB, golang-nuts
scanUntil is almost exactly r.ReadLine