There is bufio.ReadLine(), but that will only give you a partial line
if the line is very long. And it doesn't work with Mac line-endings,
because it only checks for Unix and DOS line-endings.
The doc for bufio.ReadLine() says most people should just use
bufio.ReadString('\n') instead, but that only works with Unix line-
endings.
What I would want is a function that returns a single line, all line-
endings removed, either \n, \r, or \r\n (even \n\r ?), and complete
lines, not that stuff with prefixes. And it should return the last
line without error if it has non-zero length, and is missing a line-
ending.
Sure I can write this function myself, but I think it is a basic
enough functionality to belong in the standard library.
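For anyone reading this thread later, here is a minimal sketch of roughly what is being asked for above. It assumes bufio.Scanner is available (it was added to the standard library in Go 1.1, after this thread was written). The names scanAnyLine and readLines are made up for illustration and are not part of bufio. The sketch treats \n, \r\n, and a lone \r as terminators, strips them, and returns a final unterminated line without error. It does not special-case \n\r, which would come out as a line followed by an empty line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanAnyLine is a bufio.SplitFunc (hypothetical name) that ends a line at
// "\n", "\r\n", or a lone "\r" and returns the line without its terminator.
func scanAnyLine(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			return i + 1, data[:i], nil
		}
		// data[i] is '\r': one more byte is needed to tell "\r\n" from a lone "\r".
		if i+1 < len(data) {
			if data[i+1] == '\n' {
				return i + 2, data[:i], nil
			}
			return i + 1, data[:i], nil
		}
		if atEOF {
			return i + 1, data[:i], nil
		}
		return 0, nil, nil // ask the Scanner for more data
	}
	if atEOF {
		// Final line with no terminator.
		return len(data), data, nil
	}
	return 0, nil, nil // ask the Scanner for more data
}

// readLines collects every line of s using scanAnyLine.
func readLines(s string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(scanAnyLine)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

func main() {
	lines, err := readLines("unix\ndos\r\nmac\rlast")
	fmt.Printf("%q %v\n", lines, err) // ["unix" "dos" "mac" "last"] <nil>
}
```

Note that Scanner caps the token size (bufio.MaxScanTokenSize by default), so an over-long line makes Scan stop and Err report bufio.ErrTooLong instead of growing without bound.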
On Thursday, 1 November 2012 01:23:43 UTC+2, Peter Kleiweg wrote:
> There is bufio.ReadLine(), but that will only give you a partial line
> if the line is very long. And it doesn't work with Mac line-endings,
> because it only checks for Unix and DOS line-endings.
> The doc for bufio.ReadLine() says most people should just use
> bufio.ReadString('\n') instead, but that only works with Unix line-
> endings.
> What I would want is a function that returns a single line, all line-
> endings removed, either \n, \r, or \r\n (even \n\r ?), and complete
> lines, not that stuff with prefixes. And it should return the last
> line without error if it has non-zero length, and is missing a line-
> ending.
> Sure I can write this function myself, but I think it is a basic
> enough functionality to belong in the standard library.
On Thu, Nov 1, 2012 at 11:03 AM, Andrew Gerrand <a...@golang.org> wrote:
> ReadString and ReadBytes don't have a limit. IIUC, Peter is asking for a
> ReadLine that doesn't have a limit.
> Andrew
> On 1 November 2012 19:51, Rory McGuire <rjmcgu...@gmail.com> wrote:
>> Which limits should this new ReadLine function have?
>> The current built-in solutions are the simplest and cleanest, and they
>> make people who want to write their own version think about it.
On Thu, Nov 1, 2012 at 11:20 AM, Andrew Gerrand <a...@golang.org> wrote:
> Well, yes. But I don't see how that's different to ReadBytes or
> ReadString, which already exist.
> It's a shame that ReadLine is named ReadLine, because ReadLine would be a
> great name for the suggested function.
> Andrew
> On 1 November 2012 20:09, Rory McGuire <rjmcgu...@gmail.com> wrote:
>> Exactly, so anyone that can supply this fictional reader with data can
>> cause the OS to run out of memory.
ReadLine is the only one of the four (the others being ReadBytes, ReadSlice, and ReadString) that is actually meant for reading lines. Without application-specific information it's impossible to know how long a line is expected to be, and in some applications the line length is unknowable. ReadLine retains the flexibility to be useful in the worst input case (when you don't know the line length and it may be very long) and in the best algorithmic case (when you can process the line in one pass, without needing to store it fully).
If an application has clear constraints on the input, such as a limited line length, or needs to buffer the entire line anyway, then ReadBytes, ReadSlice, or ReadString might as well be used.
You would use it safely by constraining the input or by using ReadLine. ReadString and friends are convenience methods and aren't meant for heavy lifting. If you're not scanning for a line, you might as well use Read(), ReadByte(), or ReadRune(), depending on what you need.
I'm saying that if the standard library wants to be used for servers, surely
it should do the right thing by default for something as simple as reading a
line. Even bradfitz on GitHub used bufio.ReadSlice in his SMTP server.
Anyone writing a server (HTTP, SMTP, etc.) that uses lines for part of its
input would likely use the standard library functions, and in so doing make
their server vulnerable.
I think the simplest solution that gets everyone thinking about it is to
make the max line length an argument.
No confusion then. The best solution would be for ReadBytes to take a slice
as an argument, e.g. ReadBytes(make([]byte, 0, 1024), "\r\n"),
because that would have no hidden allocation.
Apologies for hijacking the thread.
:D Cheers,
On Thu, Nov 1, 2012 at 12:05 PM, Kevin Gillette <extemporalgen...@gmail.com> wrote:
> You would use it safely by constraining the input or using ReadLine.
> ReadString and friends are convenience methods, and aren't for heavy
> lifting. If you're not scanning for a line, you might as well use Read()
> or ReadByte() or ReadRune(), depending.
> On Thursday, November 1, 2012 3:54:52 AM UTC-6, Rory McGuire wrote:
>> :D good point, you win.
>> I think there should at least be some limit in those functions too, though.
>> I doubt a valid line will ever be longer than 10000 runes. I use 1.0.3
>> and there is no limit in ReadBytes or ReadSlice.
>> How would one use ReadBytes safely esp. in a server that relies on it
>> (such as telnet)?
On 1 nov, 12:10, Andrew Gerrand <a...@golang.org> wrote:
> On 1 November 2012 22:05, Peter Kleiweg <pklei...@xs4all.nl> wrote:
> > On 1 nov, 10:21, Andrew Gerrand <a...@golang.org> wrote:
> > > It's a shame that ReadLine is named ReadLine, because ReadLine would be a
> > > great name for the suggested function.
> > ReadLine should be called ReadUnixOrDOSLine, because it doesn't work
> > with Mac lines.
> MacOS 10 is Unix. Do you mean earlier versions? Does Go run on those
> systems at all?
I am talking about file formats. I am working on Linux, with text
files that are generated on Linux, Windows, Mac. They all have
different line-endings. Mac line-ending is a single \r , without the
\n.
> On Thu, Nov 1, 2012 at 7:46 AM, Andrew Gerrand <a...@golang.org> wrote:
> > What Mac file formats use \r as line endings? Which programs generate them?
> > I just created a file in TextEdit on my Mac OS X system and it appears to
> > use \n for line breaks:
Pre-OSX Macs did that. That's the "Mac File Format". OSX uses the "Unix file format". Yes it's confusing. If you have to support it, it's an extra headache, because otherwise, ending a line with \n hits both windows \r\n and unix/linux \n, and you're just done.
On Thursday, November 1, 2012 12:54:50 PM UTC+1, Nate Finch wrote:
> Pre-OSX Macs did that. That's the "Mac File Format". OSX uses the "Unix
> file format". Yes it's confusing. If you have to support it, it's an extra
> headache, because otherwise, ending a line with \n hits both windows \r\n
> and unix/linux \n, and you're just done.
For further processing you also need to strip the \r in case of \r\n. It's not rocket-science, but a bit more boilerplate code than I'd want to paste into every program dealing with lines.
I don't use a Mac, so I can't say exactly, but I still deal with this
problem regularly. IIRC the default text editor(s) (TextEdit?) let you save
"Mac" or "UNIX" text, or with "Mac" or "UNIX" linebreaks, and the linebreak
character becomes \r. The frequency at which I face this problem would
suggest that it's the default in at least some of those editors, and
Office--it's 50/50 when I'm dealing with user data from Macs. I'll try to
find out exactly which programs the next time I see it.
So far my fix has been something quick and ugly like
perl -p -i -e "s/\r/\n/g" foo.csv
On Thu, Nov 1, 2012 at 12:51 PM, Andrew Gerrand <a...@golang.org> wrote:
> I know that OS9 uses \r, but I am wondering what file formats Peter K is
> interested in reading.
> I think it's okay that Go's ReadLine doesn't support file formats
> generated by an operating system that has been obsolete for 10 years.
> On 1 November 2012 22:48, Peter Weinberger (温博格) <p...@google.com> wrote:
>> OS9
or throw your arms in the air and refuse to process anything but lines with
\n in your programs. It's 2013 soon -- it's ridiculous and sad that we're
still dealing with this problem.
On Thu, Nov 1, 2012 at 1:57 PM, Andrew Gerrand <a...@golang.org> wrote:
> You could always write a converting reader that swaps lone \r bytes with
> \n, and then continue to use ReadLine:
Just as an aside, this is an awesome feature of Go that you can chain Readers this way. The more I use Go, the more I see how it embraces (what I think of as) the Unix philosophy: write small things and chain them together (e.g. pipes and such in the shell). You can do this via Readers & Writers, or channels, or probably lots of other things.
I guess this is really the CSP philosophy, not so much Unix, as such.
Regardless, it's cool. Kudos to the stdlib authors for enabling it.
On Thursday, November 1, 2012 9:17:48 AM UTC-4, Larry Clapp wrote:
> Just as an aside, this is an awesome feature of Go that you can chain
> Readers this way. The more I use Go, the more I see how it embraces (what
> I think of as) the Unix philosophy: write small things and chain them
> together (e.g. pipes and such in the shell). You can do this via Readers &
> Writers, or channels, or probably lots of other things.
> I guess this is really the CSP philosophy, not so much Unix, as such.
> Regardless, it's cool. Kudos to the stdlib authors for enabling it.
> -- Larry