Loop through lines in a file


robfig

Jan 25, 2012, 9:37:37 PM
to golan...@googlegroups.com
What is the "best practice" on looping through lines in a file?  

Right now, the simple code that I end up writing is this: 

fileBytes, err := ioutil.ReadFile(filename)
if err != nil {
  .. 
}
for _, line := range strings.Split(string(fileBytes), "\n") {
  ..
}

but that has some downsides:
- it reads the whole file at once
- it does not handle non-linux line endings

Other, more complete solutions look like a huge pain. This one is the only relevant snippet on Google page 1 for "golang loop through lines in a file"

Is there a simple way that I'm missing?  Is it worth adding one to core?  

e.g. ioutil.ReadLines(file *os.File) (<-chan string)  

file, _ := os.Open(filename)
for line := range ioutil.ReadLines(file) {
  ..
}
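
A minimal sketch of what the channel-based helper proposed above might look like. Nothing like this exists in io/ioutil: the names `readLines` and `collect` are hypothetical, and the sketch takes an io.Reader where the proposal takes an *os.File. It silently drops read errors, which a real version would have to report somehow.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines is a hypothetical helper (not part of the standard library).
// It sends each line of r, without its line ending, on the returned
// channel and closes the channel at end of input or on the first error.
func readLines(r io.Reader) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		br := bufio.NewReader(r)
		for {
			line, err := br.ReadString('\n')
			if len(line) > 0 {
				// Drop the trailing "\n", then any "\r" left at the end.
				line = strings.TrimSuffix(line, "\n")
				line = strings.TrimSuffix(line, "\r")
				ch <- line
			}
			if err != nil {
				return // io.EOF and real errors both end the stream here
			}
		}
	}()
	return ch
}

// collect drains readLines over an in-memory string (illustration only).
func collect(s string) []string {
	var out []string
	for line := range readLines(strings.NewReader(s)) {
		out = append(out, line)
	}
	return out
}

func main() {
	for _, line := range collect("one\r\ntwo\nthree") {
		fmt.Printf("%q\n", line)
	}
}
```

One design cost worth noting: if the consumer stops ranging before the channel is closed, the goroutine stays blocked on its send forever.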


John Asmuth

Jan 25, 2012, 9:59:18 PM
to golan...@googlegroups.com
There's this: https://github.com/skelterjohn/exp/blob/master/iochan/iochan.go

Just something I was fooling around with a while back. Lets you do this: https://github.com/skelterjohn/exp/blob/master/iochan/iochan_test.go

Krzysztof Kowalik

Jan 25, 2012, 10:09:19 PM
to golan...@googlegroups.com
Hi,

you can use bufio (http://weekly.golang.org/pkg/bufio/#Reader.ReadLine):

r := bufio.NewReader(file)
for line, _, err := r.ReadLine(); err != io.EOF {
    // ...
}

Cheers!  nu7

John Asmuth

Jan 25, 2012, 10:18:05 PM
to golan...@googlegroups.com


On Wednesday, January 25, 2012 9:59:18 PM UTC-5, John Asmuth wrote:
> There's this: https://github.com/skelterjohn/exp/blob/master/iochan/iochan.go
>
> Just something I was fooling around with a while back. Lets you do this: https://github.com/skelterjohn/exp/blob/master/iochan/iochan_test.go

Jessta

Jan 25, 2012, 10:34:34 PM
to Krzysztof Kowalik, golan...@googlegroups.com
On Thu, Jan 26, 2012 at 2:09 PM, Krzysztof Kowalik <ch...@nu7hat.ch> wrote:
> Hi,
>
> you can use bufio (http://weekly.golang.org/pkg/bufio/#Reader.ReadLine):
>
> r := bufio.NewReader(file)
> for line, _, err := r.ReadLine(); err != io.EOF {
>     // ...
> }
>

That will work fine as long as you're certain no line will ever be
more than 4096 bytes (the default bufio buffer size).
Ignoring the prefix return value isn't a good idea. Returning an error
value stating that the file was in an incorrect format is a good idea,
or at least panic if your assumptions are incorrect.
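
One way to act on this advice, as a rough sketch: treat a set isPrefix as a format error instead of discarding it. The names `errLineTooLong` and `countLines` are hypothetical, and the 16-byte buffer in `main` is only there to make the overflow easy to trigger.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLineTooLong is a hypothetical error for lines that overflow the buffer.
var errLineTooLong = errors.New("line too long for buffer")

// countLines counts the lines in s, reading through a bufio.Reader with a
// bufSize-byte buffer. It returns errLineTooLong as soon as ReadLine
// reports isPrefix, instead of treating the fragment as a whole line.
func countLines(s string, bufSize int) (int, error) {
	r := bufio.NewReaderSize(strings.NewReader(s), bufSize)
	n := 0
	for {
		_, isPrefix, err := r.ReadLine()
		if err == io.EOF {
			return n, nil // clean end of input
		}
		if err != nil {
			return n, err
		}
		if isPrefix {
			return n, errLineTooLong
		}
		n++
	}
}

func main() {
	n, err := countLines("short\nlines\n", 16)
	fmt.Println(n, err)

	n, err = countLines("short\n"+strings.Repeat("x", 64)+"\n", 16)
	fmt.Println(n, err)
}
```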


--
=====================
http://jessta.id.au

Krzysztof Kowalik

Jan 25, 2012, 10:40:58 PM
to Jessta, golan...@googlegroups.com
Jessta, it's just an example - everything you've said is described in docs i linked to :)


Martin Kanarr

Jan 25, 2012, 11:49:11 PM
to Krzysztof Kowalik, golan...@googlegroups.com
I've been looking for the "best" way, or the "Go way" for doing this as well. This example does not work though, you'll get syntax errors. The syntax errors will go away if you add a semicolon after EOF, but then it ends up looping forever with line == the first line of the file.


Martin
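
The endless loop described above happens because the init statement of a `for` runs only once. A sketch of the same loop with a post statement that re-reads on every iteration follows; the helper name `readAll` is hypothetical. Like the original example it ignores isPrefix, and it stops on any error, so it is an illustration of the loop shape and not a robust reader.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readAll returns the lines of s. The post statement calls ReadLine again
// after each iteration; without it, line and err would never change and
// the loop would spin forever on the first line.
func readAll(s string) []string {
	r := bufio.NewReader(strings.NewReader(s))
	var lines []string
	for line, _, err := r.ReadLine(); err == nil; line, _, err = r.ReadLine() {
		// Convert to string to copy: line aliases the reader's buffer.
		lines = append(lines, string(line))
	}
	return lines
}

func main() {
	for _, l := range readAll("first\nsecond\r\nthird") {
		fmt.Printf("%q\n", l)
	}
}
```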

Kyle Lemons

Jan 26, 2012, 4:51:50 AM
to Martin Kanarr, Krzysztof Kowalik, golan...@googlegroups.com
I almost always do them in for{} loops.

for {
  line, prefix, err := lines.ReadLine()
  if prefix { ... }
  if err == io.EOF { break }
  if err != nil { ... }
  ...
}

Not particularly concise, but it's explicit, and it's pretty easy to recognize and then you don't have to think about it too much whenever you see it.

roger peppe

Jan 26, 2012, 5:51:43 AM
to golan...@googlegroups.com
> What is the "best practice" on looping through lines in a file?

i usually use bufio.Reader.ReadString('\n'),
and strip off any trailing \r\n or \n as necessary.
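
a rough sketch of that approach, assuming an io.Reader as input. the names `stripEOL` and `eachLine` are hypothetical, not standard library functions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// stripEOL removes one trailing "\n" or "\r\n" from line, if present.
func stripEOL(line string) string {
	if strings.HasSuffix(line, "\n") {
		line = line[:len(line)-1]
		if strings.HasSuffix(line, "\r") {
			line = line[:len(line)-1]
		}
	}
	return line
}

// eachLine calls fn for every line read from r, with the line ending
// stripped. A final line without a newline is still delivered.
func eachLine(r io.Reader, fn func(string)) error {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadString('\n')
		if line != "" {
			fn(stripEOL(line))
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	err := eachLine(strings.NewReader("unix\nwindows\r\nlast"), func(l string) {
		fmt.Printf("%q\n", l)
	})
	if err != nil {
		fmt.Println("read error:", err)
	}
}
```

ReadString allocates a new string per line and has no length limit, which is the trade-off discussed in the rest of the thread.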

i do think the current design of ReadLine is unfortunate - the
most common case (reading a line regardless of how
long it is, with \r\n or \n stripped) is satisfied by none
of the existing calls.

as ever, the design is complicated by the fact that the
final line may or may not be newline-terminated, and
we'd like to be able to preserve that information.
but in my experience most callers don't care about this.

so i'd be happy if the existing ReadLine was renamed to ReadLineSlice
and there were two additional calls:

// ReadLine reads a line and returns it as a string,
// not including the end-of-line bytes.
// If you need to know whether the final line is
// terminated, use ReadLineSlice instead.
func (r *Reader) ReadLine() (string, error)

// ReadLineBytes reads a line and returns it as a []byte,
// not including the end-of-line bytes.
// If you need to know whether the final line is
// terminated, use ReadLineSlice instead.
func (r *Reader) ReadLineBytes() ([]byte, error)

so the canonical "read lines" loop can look like this:

for {
	line, err := r.ReadLine()
	if err != nil {
		break
	}
	doSomething(line)
}

which is a lot closer to ideal IMHO.

it's not going to happen for Go-1 though.

⚛

Jan 26, 2012, 6:39:38 AM
to golang-nuts
On Jan 26, 11:51 am, roger peppe <rogpe...@gmail.com> wrote:
> > What is the "best practice" on looping through lines in a file?
>
> i usually use bufio.Reader.ReadString('\n'),
> and strip off any trailing \r\n or \n as necessary.

A problem is that bufio.Reader has no method which would

- return bufio.Reader's internal buffer (if the delimiter was found in
the internal buffer)

- automatically create and return a new []byte if it is impossible to
return bufio.Reader's internal buffer (that is: if the size is greater
than 4096 bytes)

roger peppe

Jan 26, 2012, 7:17:05 AM
to ⚛, golang-nuts
On 26 January 2012 11:39, ⚛ <0xe2.0x...@gmail.com> wrote:
> On Jan 26, 11:51 am, roger peppe <rogpe...@gmail.com> wrote:
>> > What is the "best practice" on looping through lines in a file?
>>
>> i usually use bufio.Reader.ReadString('\n'),
>> and strip off any trailing \r\n or \n as necessary.
>
> A problem is that bufio.Reader has no method which would
>
> - return bufio.Reader's internal buffer (if the delimiter was found in
> the internal buffer)
>
> - automatically create and return a new []byte if it is impossible to
> return bufio.Reader's internal buffer (that is: if the size is greater
> than 4096 bytes)

i think that's an optimisation that's unnecessary in most cases.
if you find that you're allocating too much in a particular
program, it's simple to write a function that does this.

func readLine(r *bufio.Reader) ([]byte, error) {
	line, isPrefix, err := r.ReadLine()
	if !isPrefix {
		// common case: the whole line fit in bufio's buffer, no allocation
		return line, err
	}
	// long line: copy the first fragment, then append the rest
	buf := append([]byte(nil), line...)
	for isPrefix && err == nil {
		line, isPrefix, err = r.ReadLine()
		buf = append(buf, line...)
	}
	return buf, err
}

although to be fair it's not possible to give this the same
semantics as my proposed ReadLine as there's no
way AFAICS to avoid the possibility of returning a line
and an error at the same time.

Sonia Keys

Jan 26, 2012, 2:09:52 PM
to golan...@googlegroups.com
For quick and dirty stuff, I use ReadFile, Split, range, and TrimSpace.
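
A sketch of that quick and dirty combination, working on a string so it runs without a file; in real use the data would come from ioutil.ReadFile. The helper name `quickLines` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// quickLines splits data on "\n" and trims surrounding whitespace from
// every line, which also removes the "\r" of a Windows line ending.
// Input ending in a newline yields a final empty string.
func quickLines(data string) []string {
	var out []string
	for _, line := range strings.Split(data, "\n") {
		out = append(out, strings.TrimSpace(line))
	}
	return out
}

func main() {
	for _, line := range quickLines("alpha\r\n  beta \ngamma\n") {
		fmt.Printf("%q\n", line)
	}
}
```

TrimSpace also strips leading and trailing spaces and tabs, so this only suits input where that whitespace carries no meaning.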

If by best practice, though, you mean writing code that can handle unusual cases and is robust in production work, you want bufio.Reader.ReadLine in a for loop as Kyle showed, and you want to handle each return value however is appropriate for your application.

Check the error value, and for isPrefix, stop and think, what would it mean if you encountered a line longer than 4K?  Usually it means the user has opened the wrong file and the appropriate action is to stop immediately.  Do you happen to be reading a file where you expect lines up to 64K in length?  Then create your bufio.Reader of that size.  Do you really want unlimited line length?  Unlimited, really?  That's usually a bad idea and I think it's great that it is not the default behavior.  If you want to build up long lines from 4K pieces, append makes that easy, but really you should be making the conscious decision to do that by writing a couple of lines of code.  I also like that ReadLine doesn't allocate.  You decide to allocate just what you want to allocate and there is less waste and churn.

There is no one simple way.  The simple way that is right for one application will not be right for another.
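
As one illustration of making that decision explicit, here is a sketch that joins ReadLine fragments with append but refuses to grow past a caller-chosen cap. The names `readLineMax`, `errTooLong`, and `firstLine` are hypothetical, and the buffer size and cap are arbitrary values for the example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errTooLong is a hypothetical error returned when a line exceeds the cap.
var errTooLong = errors.New("line exceeds maximum length")

// readLineMax reads one line from r, joining ReadLine fragments with
// append, but gives up with errTooLong once more than max bytes would
// accumulate. The returned slice is a copy, so it stays valid after the
// next read.
func readLineMax(r *bufio.Reader, max int) ([]byte, error) {
	var buf []byte
	for {
		frag, isPrefix, err := r.ReadLine()
		if err != nil {
			if err == io.EOF && len(buf) > 0 {
				return buf, nil // input ended right after a full fragment
			}
			return nil, err
		}
		if len(buf)+len(frag) > max {
			return nil, errTooLong
		}
		buf = append(buf, frag...)
		if !isPrefix {
			return buf, nil
		}
	}
}

// firstLine reads the first line of s through a bufSize-byte reader
// (illustration only).
func firstLine(s string, bufSize, max int) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(s), bufSize)
	line, err := readLineMax(r, max)
	return string(line), err
}

func main() {
	long := strings.Repeat("x", 40)
	fmt.Println(firstLine(long+"\nrest\n", 16, 64))
	fmt.Println(firstLine(long+"\n", 16, 32))
}
```

After errTooLong the remainder of the oversized line is still unread, so the caller would normally stop processing the input at that point.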

Johann Höchtl

Jan 26, 2012, 3:23:38 PM
to golan...@googlegroups.com
A ReadLine that returns a line regardless of how long it is can eat all your memory, unless you are certain that a line will never exceed 1GB ;)

Johann

Michael Jones

Jan 26, 2012, 2:20:08 PM
to golan...@googlegroups.com
There is also the issue, worth stating directly, that a malicious user could arrange to send you a gigabyte "line." Likely best to know before the malloc. ;-)
--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

rob...@yext.com

Jan 27, 2012, 12:00:01 PM
to golan...@googlegroups.com
That looks like a great proposal.  Too bad it can't make the Go-1 cut. 