Why is there no proper ReadLine function?

Peter Kleiweg

Oct 31, 2012, 7:23:36 PM
to golang-nuts
There is bufio.ReadLine(), but that will only give you a partial line
if the line is very long. And it doesn't work with Mac line-endings,
because it only checks for Unix and DOS line-endings.

The doc for bufio.ReadLine() says most people should just use
bufio.ReadString('\n') instead, but that only works with Unix line-
endings.

What I would want is a function that returns a single line, all line-
endings removed, either \n, \r, or \r\n (even \n\r ?), and complete
lines, not that stuff with prefixes. And it should return the last
line without error if it has non-zero length, and is missing a line-
ending.

Sure I can write this function myself, but I think it is a basic
enough functionality to belong in the standard library.

Rory McGuire

Nov 1, 2012, 4:51:38 AM
to golan...@googlegroups.com
Which limits should this new ReadLine function have?

The current built in solutions are the simplest and cleanest, and they make people that
want to write their own version have to think about it.

Andrew Gerrand

Nov 1, 2012, 5:03:54 AM
to Rory McGuire, golang-nuts
ReadString and ReadBytes don't have a limit. IIUC, Peter is asking for a ReadLine that doesn't have a limit.

Andrew



Rory McGuire

Nov 1, 2012, 5:09:29 AM
to Andrew Gerrand, golang-nuts
Exactly, so anyone that can supply this fictional reader with data can cause the OS to run out of memory.

Andrew Gerrand

Nov 1, 2012, 5:20:18 AM
to Rory McGuire, golang-nuts
Well, yes. But I don't see how that's different to ReadBytes or ReadString, which already exist.

It's a shame that ReadLine is named ReadLine, because ReadLine would be a great name for the suggested function.

Andrew

Rory McGuire

Nov 1, 2012, 5:54:40 AM
to Andrew Gerrand, golang-nuts
:D good point, you win.

I think there should at least be some limit in those functions too, though.

I doubt a valid line will ever be longer than 10000 runes. I use 1.0.3 and there is no limit in ReadBytes or ReadSlice.

How would one use ReadBytes safely esp. in a server that relies on it (such as telnet)?

Thanks,

Kevin Gillette

Nov 1, 2012, 6:01:00 AM
to golan...@googlegroups.com, Rory McGuire
ReadLine is the only one of the four (including ReadBytes, ReadSlice, and ReadString) that is actually purposed for reading lines, and without application-specific information, it's impossible to know how long a line is expected to be, and in some applications, the line length is unknowable. ReadLine retains the flexibility to be useful in the worst input case (when you don't know the line length and it may be very long) or the best algorithmic case (when you can process the line in one pass, without needing to store it fully).

If an application has clear constraints on the input, such as a limited length line, or the need to buffer the entire line, then ReadBytes, ReadSlice, or ReadString might as well be used.

Kevin Gillette

Nov 1, 2012, 6:05:24 AM
to golan...@googlegroups.com, Andrew Gerrand
You would use it safely by constraining the input or using ReadLine. ReadString and friends are convenience methods, and aren't for heavy lifting. If you're not scanning for a line, you might as well use Read() or ReadByte() or ReadRune(), depending.

Rory McGuire

Nov 1, 2012, 6:17:57 AM
to Kevin Gillette, golan...@googlegroups.com, Andrew Gerrand
I'm saying that if the standard library is meant to be used for servers, surely it should do the right thing
by default for something as simple as reading a line; even bradfitz on GitHub used bufio.ReadSlice in his SMTP server.
Anyone writing a server (HTTP, SMTP, etc.) that uses lines for part of its input would likely use the standard
library functions, and in so doing make their server vulnerable.

I think the simplest solution that gets everyone thinking about it is to make the max line length an argument.
No confusion then. The best solution is for ReadBytes to take a slice as an argument, e.g. ReadBytes(make([]byte, 0, 1024), "\r\n"),
because that would have no hidden allocation.

Apologies for hijacking the thread.

:D Cheers,


Rory McGuire
ClearFormat - Research and Development
UK : 44 870 224 0424
USA : 1 877 842 6286
RSA : 27 21 466 9400
Email: rmcg...@clearformat.com
Website: www.clearformat.com


Peter Kleiweg

Nov 1, 2012, 7:02:55 AM
to golang-nuts
On 1 nov, 00:23, Peter Kleiweg <pklei...@xs4all.nl> wrote:

> Sure I can write this function myself, but I think it is a basic
> enough functionality to belong in the standard library.

I wrote it:

https://github.com/pebbe/util/blob/master/readline.go

It may need some more testing.

yy

Nov 1, 2012, 7:04:39 AM
to Andrew Gerrand, Rory McGuire, golang-nuts



On 1 November 2012 10:20, Andrew Gerrand <a...@golang.org> wrote:

> It's a shame that ReadLine is named ReadLine, because ReadLine would be a great name for the suggested function.
>

Actually, I'd prefer:

func (b *Reader) ReadLineBytes() (line []byte, err error)


--
- yiyus || JGL .

Peter Kleiweg

Nov 1, 2012, 7:05:26 AM
to golang-nuts
On 1 nov, 10:21, Andrew Gerrand <a...@golang.org> wrote:

> It's a shame that ReadLine is named ReadLine, because ReadLine would be a
> great name for the suggested function.

ReadLine should be called ReadUnixOrDOSLine, because it doesn't work
with Mac lines.

Andrew Gerrand

Nov 1, 2012, 7:06:36 AM
to Rory McGuire, golang-nuts
On 1 November 2012 20:54, Rory McGuire <rjmc...@gmail.com> wrote:
> :D good point, you win.
>
> I think there should at least be some limit in those functions too, though.
>
> I doubt a valid line will ever be longer than 10000 runes. I use 1.0.3 and there is no limit in ReadBytes or ReadSlice.
>
> How would one use ReadBytes safely esp. in a server that relies on it (such as telnet)?

One possible approach is to wrap the reader with an io.LimitedReader before passing it to bufio.NewReader.


And then you can re-set the N field after each read.

Or just use ReadLine.

Andrew 

Andrew Gerrand

Nov 1, 2012, 7:09:38 AM
to Peter Kleiweg, golang-nuts
MacOS 10 is Unix. Do you mean earlier versions? Does Go run on those systems at all?

Peter Kleiweg

Nov 1, 2012, 7:30:53 AM
to golang-nuts
On 1 nov, 12:10, Andrew Gerrand <a...@golang.org> wrote:
> On 1 November 2012 22:05, Peter Kleiweg <pklei...@xs4all.nl> wrote:
>
> > On 1 nov, 10:21, Andrew Gerrand <a...@golang.org> wrote:
>
> > > It's a shame that ReadLine is named ReadLine, because ReadLine would be a
> > > great name for the suggested function.
>
> > ReadLine should be called ReadUnixOrDOSLine, because it doesn't work
> > with Mac lines.
>
> MacOS 10 is Unix. Do you mean earlier versions? Does Go run on those
> systems at all?

I am talking about file formats. I am working on Linux, with text
files that are generated on Linux, Windows, Mac. They all have
different line-endings. Mac line-ending is a single \r , without the
\n.

bufio.ReadLine() only handles \n and \r\n.

Andrew Gerrand

Nov 1, 2012, 7:46:56 AM
to Peter Kleiweg, golang-nuts
What Mac file formats use \r as line endings? Which programs generate them? 

I just created a file in TextEdit on my Mac OS X system and it appears to use \n for line breaks:

% hexdump -C test.txt 
00000000  74 65 73 74 0a 74 65 73  74 0a 74 65 73 74 0a     |test.test.test.|
0000000f

Andrew




Andrew Gerrand

Nov 1, 2012, 7:51:28 AM
to Peter Weinberger (温博格), Peter Kleiweg, golang-nuts
I know that OS9 uses \r, but I am wondering what file formats Peter K is interested in reading.

I think it's okay that Go's ReadLine doesn't support file formats generated by an operating system that has been obsolete for 10 years.


On 1 November 2012 22:48, Peter Weinberger (温博格) <p...@google.com> wrote:
> OS9

Nate Finch

Nov 1, 2012, 7:54:50 AM
to golan...@googlegroups.com, Peter Kleiweg
Pre-OSX Macs did that. That's the "Mac File Format". OSX uses the "Unix file format". Yes it's confusing. If you have to support it, it's an extra headache, because otherwise, ending a line with \n hits both windows \r\n and unix/linux \n, and you're just done.

Thomas Kappler

Nov 1, 2012, 8:32:05 AM
to golan...@googlegroups.com, Peter Kleiweg

On Thursday, November 1, 2012 12:54:50 PM UTC+1, Nate Finch wrote:
> Pre-OSX Macs did that. That's the "Mac File Format". OSX uses the "Unix file format". Yes it's confusing. If you have to support it, it's an extra headache, because otherwise, ending a line with \n hits both windows \r\n and unix/linux \n, and you're just done.

For further processing you also need to strip the \r in case of \r\n. It's not rocket-science, but a bit more boilerplate code than I'd want to paste into every program dealing with lines.
 

Matt Kane's Brain

Nov 1, 2012, 8:48:37 AM
to Andrew Gerrand, Peter Kleiweg, golang-nuts
On Thu, Nov 1, 2012 at 7:46 AM, Andrew Gerrand <a...@golang.org> wrote:
> What Mac file formats use \r as line endings? Which programs generate them?

If you export a CSV or other text file from Microsoft Excel, it has \r
instead of \n for line endings. Unfortunate!

--
matt kane's brain
http://hydrogenproject.com

Patrick Mylund Nielsen

Nov 1, 2012, 8:57:48 AM
to Andrew Gerrand, Peter Weinberger (温博格), Peter Kleiweg, golang-nuts
I don't use a Mac, so I can't say exactly, but I still deal with this problem regularly. IIRC the default text editor(s) (TextEdit?) let you save "Mac" or "UNIX" text, or with "Mac" or "UNIX" linebreaks, and the linebreak character becomes \r. The frequency at which I face this problem would suggest that it's the default in at least some of those editors, and Office--it's 50/50 when I'm dealing with user data from Macs. I'll try to find out exactly which programs the next time I see it.

So far my fix has been something quick and ugly like perl -p -i -e "s/\r/\n/g" foo.csv



Andrew Gerrand

Nov 1, 2012, 8:57:29 AM
to Peter Kleiweg, golang-nuts
You could always write a converting reader that swaps lone \r bytes with \n, and then continue to use ReadLine:

http://play.golang.org/p/-aYmMfYKnZ



Patrick Mylund Nielsen

Nov 1, 2012, 9:00:58 AM
to Andrew Gerrand, Peter Kleiweg, golang-nuts
or throw your arms in the air and refuse to process anything but lines with \n in your programs. It's 2013 soon -- it's ridiculous and sad that we're still dealing with this problem.



Larry Clapp

Nov 1, 2012, 9:17:48 AM
to golan...@googlegroups.com, Peter Kleiweg
Just as an aside, this is an awesome feature of Go that you can chain Readers this way.  The more I use Go, the more I see how it embraces (what I think of as) the Unix philosophy: write small things and chain them together (e.g. pipes and such in the shell).  You can do this via Readers & Writers, or channels, or probably lots of other things.

I guess this is really the CSP philosophy, not so much Unix, as such.

Regardless, it's cool.  Kudos to the stdlib authors for enabling it.

-- Larry

Ryan Tarpine

Nov 1, 2012, 9:47:51 AM
to golan...@googlegroups.com, Peter Kleiweg
This is an example of the decorator pattern (http://en.wikipedia.org/wiki/Decorator_pattern) and this exact feature has been present in Java since 1.0 back in '96 :-)

-Ryan

Paulo Pinto

Nov 1, 2012, 10:08:38 AM
to golang-nuts
Excel showing its Mac roots! :)

On Nov 1, 1:49 pm, "Matt Kane's Brain" <mkb-pr...@hydrogenproject.com>
wrote:

Larry Clapp

Nov 1, 2012, 10:21:59 AM
to golan...@googlegroups.com, Peter Kleiweg
Good point.  It's nifty there, too.  :)

Peter Kleiweg

Nov 1, 2012, 10:51:42 AM
to golang-nuts
On 1 nov, 13:58, Andrew Gerrand <a...@golang.org> wrote:
> You could always write a converting reader that swaps lone \r bytes with
> \n, and then continue to use ReadLine:
>
> http://play.golang.org/p/-aYmMfYKnZ

That is not correct. When r.r.Read(b) gives an error, you should still
process the bytes from that call.

Peter Weinberger (温博格)

Nov 1, 2012, 7:48:36 AM
to Andrew Gerrand, Peter Kleiweg, golang-nuts
OS9

Rory McGuire

Nov 1, 2012, 9:11:31 AM
to Patrick Mylund Nielsen, Andrew Gerrand, Peter Kleiweg, golang-nuts
+1



Rob Pike

Nov 1, 2012, 12:39:50 PM
to golan...@googlegroups.com
The right answer here requires more thought. For instance, it's
possible that a better design is a more general approach with a
specialization to deal with lines. The problem space is large and so
is the design space. The proposal here addresses the immediate problem
but not the full one.

I've been thinking about this area for a while and it's harder than it
looks to get right.

-rob

Peter Kleiweg

Nov 1, 2012, 1:00:04 PM
to golang-nuts
Can you tell more about the full problem?

The thing I can think of is handling text encoded in UTF-16. My
implementation ignores that.

darkgray

Nov 1, 2012, 1:26:50 PM
to golan...@googlegroups.com
A simple way of doing it (which sometimes improves performance as well) is to just make a buffer that's big enough to fit any (reasonably) sized line:

    r := bufio.NewReaderSize(file, 1<<20)
    for {
        line, _, err := r.ReadLine()

Admittedly I don't know how it keeps filling the buffer, but I made a test run through an Apache log file of 27MB and despite the bizarrely long spam/hack requests, it never ran across a prefix, so it seems safe enough to ignore the damned thing.

I'd be happy if you proceed to tell me how wrong this approach is, since it helps me learn. :)



Erwin

Nov 1, 2012, 2:54:56 PM
to Andrew Gerrand, Peter Kleiweg, golang-nuts
For some fun/misery with EOL, try reading the lines of a couple of .pdf files. Chances are you'll encounter some that  use \n, \r, and \r\n in the same file!

Rory McGuire

Nov 1, 2012, 3:29:05 PM
to golang-nuts

Read the code in bufio.go. To me it looks like ReadLine uses ReadSlice, which appears (from a quick look) to use defaultBufSize as the buffer size, so I think you would get a prefix if your line were longer than 4096 bytes (1<<20 bytes in your example). To test, you could set your buffer size to 16 (minReadBufferSize).

ReadLine and ReadSlice should be safe then, and ReadSlice does what I wanted.

ReadBytes allocates its own memory and repeatedly calls ReadSlice, so it's not safe; the same goes for ReadString, because it uses ReadBytes. Neither of them has a memory limit, so you can't use them with unchecked user input. As Andrew said, you would have to use a LimitedReader, though not the way the standard http implementation uses it, because it has a LimitedReader set to int64(1<<63).

Peter Kleiweg

Nov 1, 2012, 3:36:43 PM
to golang-nuts
The Go compiler can't handle source code with Mac line-endings:

    go build test.go
    can't load package: package :
    test.go:1:15: expected ';', found 'import'

Python can.

****

Here is some info on different types of line-endings:

http://en.wikipedia.org/wiki/Newline#

Dan Cross

Nov 1, 2012, 3:38:41 PM
to Peter Kleiweg, golang-nuts
To my knowledge it won't work with EBCDIC, Baudot or the 6-bit
character set of the UNISYS 1100 series machines without re-encoding,
either.

It strikes me that it is reasonable for the Go standard library to not
support something that is a decade out of date. Go makes it fairly
easy to solve this problem with a wrapper, and that seems sufficient
for applications seeking to process obsolete file formats.

- Dan C.

Dustin

Nov 1, 2012, 3:40:10 PM
to golan...@googlegroups.com

On Thursday, November 1, 2012 12:36:48 PM UTC-7, Peter Kleiweg wrote:
> The Go compiler can't handle source code with Mac line-endings:
>
>     go build test.go
>     can't load package: package :
>     test.go:1:15: expected ';', found 'import'
>
> Python can.

Software that produced Mac line endings had no need to exist after the creation of Go. Considering the number of years it's been since I've seen a file with a Mac line ending on my Mac, I can't imagine it'd be a priority for the compiler to support such a thing.

Dan Cross

Nov 1, 2012, 3:40:18 PM
to Peter Kleiweg, golang-nuts
This is a little silly.

What application are you using on a platform that supports Go to write
Go programs that saves text using Mac OS 9-style line endings?

- Dan C.

spc

Nov 1, 2012, 3:48:53 PM
to golan...@googlegroups.com

May I suggest you try the syscall.Mmap function to map the entire file into memory,
and scan for "\n", "\r", "\r\n". 

The only real limit is the 2^32 bytes that can be mapped into memory.
I have done the same thing recently in a Go program that reads in lots of text files
and splits them up into lines (with DOS, Unix, and Mac formats).

I generally use mmap (in C/C++) for reading in files, unless I know the file is small
(< 8k for example). It's faster and easier and you get the whole file into a byte array.

If you are on Windows, use the function syscall.MapViewOfFile instead.

Regards,
spc.

Peter Kleiweg

Nov 1, 2012, 3:49:38 PM
to golang-nuts
On 1 nov, 20:39, Dan Cross <cro...@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 7:05 AM, Peter Kleiweg <pklei...@xs4all.nl> wrote:

> > ReadLine should be called ReadUnixOrDOSLine, because it doesn't work
> > with Mac lines.
>
> To my knowledge it won't work with EBCDIC, Baudot or the 6-bit
> character set of the UNISYS 1100 series machines without re-encoding,
> either.

I've never come across anything like those. Mac files: regularly.

Peter Kleiweg

Nov 1, 2012, 3:51:23 PM
to golang-nuts
On 1 nov, 20:40, Dan Cross <cro...@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 3:36 PM, Peter Kleiweg <pklei...@xs4all.nl> wrote:
> > The Go compiler can't handle source code with Mac line-endings:
>
> >     go build test.go
> >     can't load package: package :
> >     test.go:1:15: expected ';', found 'import'
>
> > Python can.
>
> > ****
>
> > Here is some info on different types of line-endings:
>
> >    http://en.wikipedia.org/wiki/Newline#
>
> This is a little silly.
>
> What application are you using on a platform that supports Go to write
> Go programs that saves text using MacOs 9-style line endings?

I know plenty of Unix people who still use 'ed'. I don't know what Mac
people use. I don't know about Mac people.

Dan Cross

Nov 1, 2012, 3:59:45 PM
to Peter Kleiweg, golang-nuts
Really? In what context? I don't think I've seen any since Mac OS 9
stopped being a going concern. Where I did, I trivially converted
them to the Unix convention.

Put another way: what percentage of text files likely to be read by
something that calls ReadLine use the Mac OS 9 convention? Is it
really worth it to support an obsolete line convention? What other
obsolete data formats should the Go standard library support? This
doesn't make a strong case for inclusion in a standard library.

The fact is that these files are either legacy data or produced by
legacy tools; as such, they are more than likely a tiny percentage of
data likely to processed by tools written in Go. More reasonable
alternatives exist: convert to a supported line convention, or put a
wrapper in your code.

- Dan C.

Dan Cross

Nov 1, 2012, 4:00:06 PM
to Peter Kleiweg, golang-nuts
That's irrelevant.

> I don't know about Mac people.

Current Mac software produces text files using the Unix line-ending convention.

Mac OS 9 produced what you are calling "mac files," but that was
deprecated ten years ago (read: development of Mac OS 9 stopped in
2002). The Go suite does not support Mac OS 9, to my knowledge, so
whether the Go compiler can read files that use the Mac OS <X line
ending convention doesn't seem relevant to me at all. That Python can
is also irrelevant: it's an older language that did, I believe, have a
port to Mac OS prior to OS X.

- Dan C.

Job van der Zwan

Nov 1, 2012, 4:56:14 PM
to golan...@googlegroups.com, Peter Kleiweg
I'm sure it's a legacy thing most people never have to deal with in a serious way, but considering Peter Kleiweg works at the department of Humanities Computing in Groningen, I'd expect very old text files would be quite a common thing for him to work with.

Dan Cross

Nov 1, 2012, 7:19:16 PM
to Job van der Zwan, golan...@googlegroups.com, Peter Kleiweg
On Thu, Nov 1, 2012 at 4:56 PM, Job van der Zwan
<j.l.van...@gmail.com> wrote:
> On Thursday, 1 November 2012 21:00:38 UTC+1, Dan Cross wrote:
>> Current Mac software produces text files using the Unix line-ending
>> convention.
>>
>> Mac OS 9 produced what you are calling "mac files," but that was
>> deprecated ten years ago (read: development of Mac OS 9 stopped in
>> 2002). The Go suite does not support Mac OS 9, to my knowledge, so
>> whether the Go compiler can read files that use the Mac OS <X line
>> ending convention doesn't seem relevant to me at all. That Python can
>> is also irrelevant: it's an older language that did, I believe, have a
>> port to Mac OS prior to OS X.
>
> I'm sure it's a legacy thing most people never have to deal with in a
> serious way, but considering Peter Kleiweg works at the department of
> Humanties Computing in Groningen, I'd expect very old textfiles would be
> quite a common thing for him to work with.

I'm sure he has a need for it; I'm not doubting his veracity. But an
individual having a need for such a thing does not imply that it
should be added to the standard library. Similarly, the standard
library not supporting a convention not habitually used by any current
systems does not imply that it is somehow broken or lacking proper
functionality.

Particularly when simple workarounds exist: chaining readers, tr
'\015' '\012', etc.

- Dan C.

Thomas Bushnell, BSG

Nov 1, 2012, 7:24:30 PM
to darkgray, golan...@googlegroups.com

I think that's quite foolish. If you want a function that reads an arbitrary amount, then have no limit. If you want a limit, have the caller specify it. What you think is obviously big enough will be insufficient for some case years from now that nobody has thought of.

Thomas


Jens Alfke

Nov 1, 2012, 7:58:19 PM
to golan...@googlegroups.com, Peter Kleiweg
I agree that it would be silly for Go to add special support for the highly-obsolete Classic-Mac-OS line break. (Even calling it "Mac line breaks" or "Mac format" is misleading. This format was pretty well stamped out years ago. Maybe people dealing with very old files, or very old Mac apps, still see these sometimes, but that's a special case and it's not hard to preprocess the file.)

There are some very real problems trying to support all 3 flavors. I used to work in the Java group at Apple in the late '90s, and saw several developers run into a nasty deadlock problem reading lines from network streams. The Java ReadLine method would support all styles of line breaks. This meant that when it received a CR it would read the next byte to check whether it was a LF, and if so skip it. Unfortunately if it was reading from a socket and the last thing the peer sent was a CR, because the peer was sending Old-Mac-format lines, then it would block waiting for another byte; whereas the peer had finished sending its line and was waiting for a response. The result was deadlock. (This was fixed in the JDK by being smarter about the LF, not trying to read it immediately. I'm just pointing this out as a pitfall one can run into trying to parse line breaks.)

--Jens

Peter Kleiweg

Nov 2, 2012, 6:38:23 AM
to golang-nuts
On 2 nov, 00:58, Jens Alfke <j...@mooseyard.com> wrote:
> I agree that it would be silly for Go to add special support for the
> highly-obsolete Classic-Mac-OS line break.

However highly-obsolete, it is still around. Reading files means
reading something that already exists. It may be a document created
moments ago, or years ago. It may be a document created moments ago by
a program that pastes in document fragments created years ago. (I see
this in PostScript documents: parts in the same file with different
line-endings.) In many real applications, you need to be able to deal
with the past.

You need general support, not special support.

Kevin Gillette

Nov 2, 2012, 9:59:52 AM
to golan...@googlegroups.com
On Thursday, November 1, 2012 1:36:48 PM UTC-6, Peter Kleiweg wrote:
> The Go compiler can't handle source code with Mac line-endings
> Python can.

Python was around well before Mac OS X existed, and Python certainly did (if not still does) support OS 9. Also, Python on OS 9 supports Unix line endings for source files.

I think it's rather ridiculous that the entire ecosystem hasn't reduced to using \n for source code files. Any _decent_ Windows editor (even if it's merely Notepad without the arbitrary patheticisms) supports Unix line endings. Apple made its choice. Windows of course has a legacy burden, but sane "text" files should not be using \n for some non-EOL purpose when the line endings are \r\n or \r anyway, so MS could certainly change their "rt" file mode handling to consider \r\n and \n both line endings, and phase out \r\n in "wt" file modes a few years later.

Jens Alfke

Nov 2, 2012, 12:02:51 PM
to Peter Kleiweg, golang-nuts

On Nov 2, 2012, at 3:38 AM, Peter Kleiweg <pkle...@xs4all.nl> wrote:

> However highly-obsolete, it is still around. Reading files means
> reading something that already exists.

To use an analogy: There are thousands of old graphics file formats floating around too, and it’s not hard to find images saved in them, but that doesn’t mean that every platform’s built-in image class/library knows how to read them all. Generally you just expect support for the ones in wide use like JPEG/PNG/GIF. If you’re writing code that needs to be able to open Targa or PICT or Xerox Alto bitmaps, you go find an external library that supports them and link with that.

The same is true IMHO of reading text files. The bog-standard built-in ReadLine function should be expected to handle formats in current use, i.e. Unix and Windows line breaks, and 8-bit and UTF-16 encodings. Anything more exotic, like Mac-classic line breaks or Shift-JIS encoding, can be left to 3rd party packages.

If you still really, really disagree, then I think it would be more productive for you to write an industrial-strength new ReadLine function (plus lots of unit tests, including ones for the deadlock I wrote about!) and submit a patch to the Go dev team.

—Jens

agl

Nov 2, 2012, 2:44:17 PM
to golan...@googlegroups.com
Crumbs, this is developing into quite a thread :)

As the fool responsible for ReadLine, I'd like to explain why I wrote it the way that I did.

I didn't support Mac line endings because, honestly, I've not seen anything use them in a very long time. I have a Mac and I still haven't seen them used.

It only supports limited line lengths because I don't want programs that subtly explode and thrash the page file when fed invalid input. I figured that it was fairly easy to build an allocating version of ReadLine from the limited one, and the isPrefix flag also allows code that doesn't want to handle huge lines to resync. (In hindsight, perhaps I should have left the name ReadLine for the allocating version.) But it's tough to build the robust version from the allocating version so I wrote the primitive first.

I've used ReadLine in a bunch of programs now and it's always worked just fine. I don't think it was a terrible design, even if it's not perfect.

But if people want to handle UTF-16 line breaks, Mac line breaks, etc I suspect that's a little much for bufio. I'd suggest that might be a separate package, building on bufio.


Cheers

AGL

Peter Kleiweg

Nov 2, 2012, 3:01:46 PM
to golang-nuts
On 2 nov, 17:02, Jens Alfke <j...@mooseyard.com> wrote:

> To use an analogy: There are thousands of old graphics file formats floating around too, and it’s not hard to find images saved in them, but that doesn’t mean that every platform’s built-in image class/library knows how to read them all. Generally you just expect support for the ones in wide use like JPEG/PNG/GIF. If you’re writing code that needs to be able to open Targa or PICT or Xerox Alto bitmaps, you go find an external library that supports them and link with that.
>
> The same is true IMHO of reading text files. The bog-standard built-in ReadLine function should be expected to handle formats in current use, i.e. Unix and Windows line breaks, and 8-bit and UTF-16 encodings. Anything more exotic, like Mac-classic line breaks or Shift-JIS encoding, can be left to 3rd party packages.

What is classic is a matter of perspective. I can say, we don't need
to bother with inches and feet and yards and miles, we abandoned those
in the dark ages, everything should be metric by now. Well, it isn't.

An important point is that line-end coding is not something that
separates one document from another like EBCDIC from ASCII from Shift-
JIS. The latter are very different beasts. But Unix, DOS or Mac ASCII
text files, they can and do live in the same environment, usually
without the user noticing the difference.

But this is just part of the issue of why I started this thread.

I have come to expect modern programming languages to have line
processing built in, or at least part of the standard library. See
Perl, Python. Go only offers a partial solution. It's like that age-old
C function, fgets(): you can get by by using a buffer that you
think is big enough. Go is a modern language where you don't have to
bother about those C things like allocating and freeing memory, where
you can just concatenate two strings together without having to worry
about available buffer size. Why then is there no function that simply
gives me a single line of text when I want to?

There are other things I am missing in the standard library. Like
handling different text encodings. Yes, EBCDIC and Shift-JIS, all
those strange beasts, to me, that's standard toolkit stuff. But here
we are at a point where there is no clear answer to what should be in
the standard library. It's a matter of perspective, from a certain
point the choices become arbitrary. Why are there all those crypto
packages, but no OAuth functionality? Why no library for building a
GUI (even something primitive, like Tk)? Why yacc, but not (f)lex?

agl

Nov 2, 2012, 3:17:47 PM
to golan...@googlegroups.com
On Friday, November 2, 2012 3:01:53 PM UTC-4, Peter Kleiweg wrote:
> An important point is that line-end coding is not something that
> separates one document from another like EBCDIC from ASCII from
> Shift-JIS. The latter are very different beasts. But Unix, DOS or Mac
> ASCII text files, they can and do live in the same environment, usually
> without the user noticing the difference.

My reply was not to suggest that we shouldn't handle Mac line endings, just to explain why we don't. Had I known that Mac line endings weren't dead, I would have supported them. (And sighed, but there you go.)

It's not clear to me whether adding support for Mac line endings in the stable releases is too great a semantic change.

But since people seem to have numerous desiderata concerning line endings, I suspect it would be a chat with Rob and a new package.

As for "why do we have $x, not $y?": because we needed $x, and not yet $y. I'm sure we write different sorts of programs, and the contents of the Go standard library reflect the people who wrote it and their backgrounds. I think that's inevitable, and it doesn't suggest that $x is universally important but $y isn't. Where we have a hole, the subrepos can fill in.


Cheers

AGL

Peter Kleiweg

Nov 2, 2012, 3:34:22 PM11/2/12
to golang-nuts
On 2 nov, 19:44, agl <a...@golang.org> wrote:

> But if people want to handle UTF-16 line breaks, Mac line breaks, etc I
> suspect that's a little much for bufio. I'd suggest that might be a
> separate package, building on bufio.

There are some functions in bufio that go part of the way.
ReadBytes(), ReadString(). They are limited in allowing only a single
byte to be used as a delimiter. A more general function would allow
sets of bytes or strings, for instance ["a", "b", "cd"]. Or even a
regular expression as a delimiter.

Or perhaps a more generalized stream library. What about bytes.Buffer:
can you keep writing to it indefinitely if you also read from it?

Thomas Bushnell, BSG

Nov 2, 2012, 5:37:56 PM11/2/12
to Peter Kleiweg, golang-nuts

I think the point is that "line" is a vague and complex thing, capable of too much dangerous oversimplification. Better in this area to specify what line terminator you want and use that, better to think about what kind of allocation you want and use that.

Peter Kleiweg

Nov 2, 2012, 6:16:01 PM11/2/12
to golang-nuts
On 2 nov, 20:17, agl <a...@golang.org> wrote:

> As for "why do we have $x, not $y?". Because we needed $x, and not yet $y.
> I'm sure we write different sorts of programs and the contents of the Go
> standard library reflect the people who wrote it and their backgrounds. I
> think that's inevitable and it doesn't suggest that $x is universally
> important, but $y isn't. Where we have a hole, the subrepos can fill in.

I completely understand. There is no such thing as the perfect
standard library.

Maxim Khitrov

Feb 19, 2013, 8:24:05 AM2/19/13
to ayng...@gmail.com, golan...@googlegroups.com
On Tue, Feb 19, 2013 at 5:46 AM, <ayng...@gmail.com> wrote:
> It's been said before in this thread... the current, MODERN, Microsoft Excel
> uses the old mac line endings if you ask it to export a CSV. It is still not
> a thing of the past, even if I wish it was.

Not exactly. Excel 2010 gives you three "Save as type" options:

1. CSV (Comma delimited) (*.csv)
2. CSV (MS-DOS) (*.csv)
3. CSV (Macintosh) (*.csv)

Only the third uses mac line endings. The first two use CRLF (no idea
what the difference between 1 & 2 is). Unfortunately, some people pick
the first "CSV" option that catches their eye, so yes, you might end
up with CR line endings from Excel.

- Max

Nate Finch

Feb 19, 2013, 10:36:04 AM2/19/13
to golan...@googlegroups.com, ayng...@gmail.com
Note this thread: https://groups.google.com/d/msg/golang-nuts/QiR85Z1W6fc/870wtfwP2q0J

Sounds like something new may be coming down the pipeline... though it sounds like the default will still not include \r as a standalone line delimiter. Also, I wasn't clear on when that code would make it into Go, since 1.1 is past feature freeze at this point.

minux

Feb 19, 2013, 11:55:46 AM2/19/13
to Nate Finch, golan...@googlegroups.com, ayng...@gmail.com
On Tue, Feb 19, 2013 at 11:36 PM, Nate Finch <nate....@gmail.com> wrote:
> Note this thread: https://groups.google.com/d/msg/golang-nuts/QiR85Z1W6fc/870wtfwP2q0J
>
> Sounds like something new may be coming down the pipeline... though it sounds like the default will still not include \r as a standalone line delimiter. Also, I wasn't clear on when that code would make it into Go, since 1.1 is past feature freeze at this point.

Thankfully, it will, as the accompanying issue is marked Go1.1.

Dan Kortschak

Apr 12, 2014, 9:32:26 PM4/12/14
to Peter Kleiweg, golang-nuts
The only place old-style mac line endings still exist is in MS Excel
documents AFAIK.