
Universal Newline in C++


jehuga...@gmail.com

Apr 10, 2007, 10:32:12 PM
Hello:

I have been programming in C++ for a while. I have always wondered
whether \n is a portable newline.

Is this true? or should I define a macro to help me out? For instance:

#ifdef UNIX
#define NEW_LINE "\n"
#elif defined(WINDOWS)
#define NEW_LINE "\r\n"
#elif defined(MAC)
#define NEW_LINE "\r"
#elif defined(WEB)
#define NEW_LINE "<br />"
#endif

Alternate suggestions very welcome.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Thomas Richter

Apr 11, 2007, 10:05:48 AM
jehuga...@gmail.com wrote:
> Hello:
>
> I have been programming in C++ for a while. I have always wondered
> whether \n is a portable newline.

What do you mean by that? \n is portable in the sense that if you
open a file for textual input/output, then the newline sequence of
the operating system is automatically converted to \n on input, and
\n is automatically converted to the newline sequence of the operating
system on output.
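
A minimal sketch of that translation, assuming a scratch file name
such as "newline_demo.txt"; the helper names writeTextFile and readFile
are invented here for illustration. The idea is to write through a
text-mode stream and then look at the raw bytes through a binary-mode
stream:

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Write `text` through a text-mode stream (the default). The library
// translates each '\n' into the platform's own line-ending convention.
void writeTextFile(const std::string& path, const std::string& text) {
    std::ofstream out(path.c_str());
    assert(out.is_open());
    out << text;
}

// Read the whole file back. With binary == true no translation is done,
// so the result holds the bytes exactly as they are stored on disk.
std::string readFile(const std::string& path, bool binary) {
    std::ios::openmode mode = std::ios::in;
    if (binary) mode |= std::ios::binary;
    std::ifstream in(path.c_str(), mode);
    assert(in.is_open());
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}
```

After writeTextFile("newline_demo.txt", "a\nb\n"), reading the file in
text mode should give back "a\nb\n" on any platform. Reading it in
binary mode should give 4 bytes on a Unix-like system, while a Windows
build would typically give 6, one extra carriage return per line.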

> Is this true? or should I define a macro to help me out? For instance:
>
> #ifdef UNIX
> #define NEW_LINE "\n"
> #elif defined(WINDOWS)
> #define NEW_LINE "\r\n"
> #elif defined(MAC)
> #define NEW_LINE "\r"

Almost always not required; the I/O calls of the C and C++ standard
libraries do that for you.

> #elif defined(WEB)
> #define NEW_LINE "<br />"
> #endif

This is not a newline sequence that is used by any operating system. It
is the markup for a line break in HTML, which is a different thing.
C/C++ text I/O does not try to write "formatted" output. You need to do
this yourself.

So long,
Thomas

Pete Becker

Apr 11, 2007, 10:24:58 AM
jehuga...@gmail.com wrote:
> Hello:
>
> I have been programming in C++ for a while. I have always wondered
> whether \n is a portable newline.
>

The character '\n' represents the end of a line. If you're doing
non-portable stuff like hacking around with the bytes in a file you've
got to worry about the details of how the end of a line is represented,
but that's a quagmire: some OS's don't have a character sequence that
marks the end of a line.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

Anand Hariharan

Apr 12, 2007, 4:06:07 PM
On Apr 11, 9:24 am, Pete Becker <p...@versatilecoding.com> wrote:

> jehugalea...@gmail.com wrote:
> > Hello:
>
> > I have been programming in C++ for a while. I have always wondered
> > whether \n is a portable newline.
>
> The character '\n' represents the end of a line.

.. for the target platform.

> If you're doing
> non-portable stuff like hacking around with the bytes in a file you've
> got to worry about the details of how the end of a line is represented,
> but that's a quagmire: some OS's don't have a character sequence that
> marks the end of a line.
>

Take this very ordinary case:

I write a simple student assignment that needs to read a text file
line-by-line. I take the advice "Use escape sequences. It does the
right thing for you." and use \n and std::getline (which uses \n as
default argument IIRC). I compile it to a Windows binary.

Only, the grader gives a text file that was saved in a UNIX
environment. The result? Even if the input text file is only a
couple of hundred lines long, the executable causes the system to choke
(it doubles memory allocation every time it sees that the EOL is not
encountered, and it tries to read the entire file as one line).

Grade: F.

- Anand


--

Pete Becker

Apr 13, 2007, 4:44:34 AM
Anand Hariharan wrote:
> On Apr 11, 9:24 am, Pete Becker <p...@versatilecoding.com> wrote:
>> jehugalea...@gmail.com wrote:
>>> Hello:
>>> I have been programming in C++ for a while. I have always wondered
>>> whether \n is a portable newline.
>> The character '\n' represents the end of a line.
>
> .. for the target platform.
>

No, it represents the end of a line in C and C++ strings.

> If you're doing
>> non-portable stuff like hacking around with the bytes in a file you've
>> got to worry about the details of how the end of a line is represented,
>> but that's a quagmire: some OS's don't have a character sequence that
>> marks the end of a line.
>>
>
> Take this very ordinary case:
>
> I write a simple student assignment that needs to read a text file
> line-by-line. I take the advice "Use escape sequences. It does the
> right thing for you." and use \n and std::getline (which uses \n as
> default argument IIRC). I compile it to a Windows binary.
>
> Only, the grader gives a text file that was saved in a UNIX
> environment. The result? Even if the input text file is only a
> couple of hundred lines long, the executable causes the system to choke
> (it doubles memory allocation every time it sees that the EOL is not
> encountered, and it tries to read the entire file as one line).

That's because the input file did not respect the system's line-ending
conventions. The program is correct. The data isn't a valid text file
for Windows.

>
> Grade: F.
>

Yup. F for the instructor, who didn't provide valid input data.

When you transfer text files between systems you have to follow the
target system's conventions. If you screw up the line endings, there are
a bunch of programs that won't like your input file. Try running GNU
make under Unix with a file that uses Windows line endings.

That's why ftp has ASCII mode: it converts line endings as needed. Or if
you use zip archives, unzip's -a option converts line endings in text
files.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)


Timo Geusch

Apr 13, 2007, 4:44:29 AM
jehuga...@gmail.com wrote:

> Hello:
>
> I have been programming in C++ for a while. I have always wondered
> whether \n is a portable newline.

IME, the Windows runtime libraries do tend to translate \n correctly; I
haven't worked on a Mac so I can't comment on this.

> Is this true? or should I define a macro to help me out? For instance:

You should definitely *not* define a macro for this. If you have to
accommodate the different line endings, at least do it properly and use
a const char array.
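
For illustration only, a constant along those lines might look like
the sketch below. The names kNewLine and appendLine are made up for
this example and do not come from any library:

```cpp
#include <string>

// A typed, scoped constant instead of a preprocessor macro.
// '\n' is the in-program line terminator on every platform; text-mode
// streams translate it when the data is actually written.
const char kNewLine[] = "\n";

// Append one line, followed by the terminator, to an existing buffer.
std::string appendLine(const std::string& buffer, const std::string& line) {
    return buffer + line + kNewLine;
}
```

Unlike a macro, the array obeys scope and has a type, so it can live in
a namespace and shows up by name in a debugger.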

The only reason I could see for doing anything like this would be that
you had to accommodate the HTML-style <br/> tag. Nevertheless I would
argue that this is bad style as you're mixing different idioms in this
case.


--
The lone C++ coder's blog: http://www.bsdninjas.co.uk/codeblog/

Sebastian Redl

Apr 13, 2007, 7:48:29 AM

On Thu, 12 Apr 2007, Anand Hariharan wrote:

> On Apr 11, 9:24 am, Pete Becker <p...@versatilecoding.com> wrote:
> > The character '\n' represents the end of a line.
>
> .. for the target platform.

No, it represents the end of a line within the C++ program. The platform
is a different matter, and handled by text-mode I/O upon reading and
writing.

> Take this very ordinary case:
>
> I write a simple student assignment that needs to read a text file
> line-by-line. I take the advice "Use escape sequences. It does the
> right thing for you." and use \n and std::getline (which uses \n as
> default argument IIRC). I compile it to a Windows binary.

Provided the input stream is in text mode, it would then convert, upon
reading, "\r\n" sequences to "\n".

> Only, the grader gives a text file that was saved in a UNIX
> environment.

Meaning that within the file there are lonely "\n" characters.

> The result? Even if the input text file is only a
> couple of hundred lines long, the executable causes the system to choke
> (it doubles memory allocation every time it sees that the EOL is not
> encountered, and it tries to read the entire file as one line).

Not really, no. Upon reading, "\r\n" is converted to "\n" and "\n" is left
alone. Thus, the in-memory representation of the Unix file should have the
line breaks in the same place. Your scenario is actually harmless.
(Because a Unix file is already in C++'s required internal format, it's
pretty safe to read them anywhere.)

Here's one: the executable was compiled for a Unix system, thus no
translation is done on the input file. However, the input file comes from
a Windows system. Line endings are represented as "\r\n". As there is no
translation done, the "\n" is interpreted as the line ending, and the "\r"
remains as part of the stream. Chances are that the code that parses each
line chokes on this. (Similar: Opening a Windows file on a Mac, only that
you have "\n" at the start of lines instead of "\r" at the end.)
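
The stray "\r" is easy to reproduce without touching the file system.
In the sketch below any std::istream can be passed in, so a
std::istringstream holding "one\r\ntwo\r\n" can stand in for a Windows
file read by a Unix build. The helper names stripTrailingCR and
readLines are hypothetical:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>
#include <vector>

// Remove one trailing carriage return, if present.
void stripTrailingCR(std::string& line) {
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
}

// Read all lines with std::getline. With stripCR == false the lines are
// returned exactly as getline produced them; with stripCR == true a
// trailing '\r' left over from a "\r\n" pair is removed.
std::vector<std::string> readLines(std::istream& in, bool stripCR) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (stripCR) stripTrailingCR(line);
        lines.push_back(line);
    }
    return lines;
}
```

Without the repair, the first line comes back as "one\r", which is
what typically trips up the per-line parsing code. This workaround
covers the Windows-file-on-Unix case only; it does nothing for files
that use a lone "\r".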

Here's another: the input file comes from a Mac system. Line endings are
"\r". No conversion is done, no "\n" are found, everything is interpreted
as a single line.

Now, that doesn't change with your proposal, though. As far as C++ is
concerned, things are pretty simple: in memory, there is only '\n'. Upon
reading and writing, a bit of translation might need to be done.


What you proposed doesn't help with your actual problem. Your problem is
that the input file you get is unsuitable for the system you're working
on. That's a system-level problem, not C++'s problem. Apparently, though,
the grading system you're using is too stupid to supply Windows
applications with Windows input files (a grave shortcoming, IMO; I
would certainly go to the lab people to complain if I got a fail grade,
or even a bad grade, because of it). And because it's automated, you have
no chance to intercede (e.g. by calling unix2dos on the file). So you can
work around the problem in your code.
Basically, you would open the file in binary mode, detect the line ending
(simply look for the first \n and \r: if it's a lonely "\n", the file is
Unix, if it's a lonely "\r", the file is Mac, if it's a "\r\n" sequence,
the file is Windows), and do all conversions yourself.
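
The detection step can be sketched as follows, assuming the file
contents have already been read into a std::string through a stream
opened with std::ios::binary. The enum and the function name
detectLineEnding are invented for this example:

```cpp
#include <cstddef>
#include <string>

enum LineEnding { kUnknown, kUnix, kWindows, kOldMac };

// Classify a buffer by the first line terminator found in it.
LineEnding detectLineEnding(const std::string& bytes) {
    const std::size_t pos = bytes.find_first_of("\r\n");
    if (pos == std::string::npos) return kUnknown;  // no terminator at all
    if (bytes[pos] == '\n') return kUnix;           // lonely "\n"
    // The first terminator byte is '\r': Windows if '\n' follows directly,
    // otherwise a lonely "\r" in the old Mac style.
    if (pos + 1 < bytes.size() && bytes[pos + 1] == '\n') return kWindows;
    return kOldMac;
}
```

This only looks at the first terminator, so a file with mixed line
endings is classified by whichever style appears first.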

Note, however, that since this is a runtime decision, #ifdef won't help
you.

Sebastian Redl

jehuga...@gmail.com

Apr 13, 2007, 10:00:30 AM
Thank you all for your replies. I will just let you know that I
mentioned macros only to stir some blood. :-) I think anyone who has
read a C++ book in the last 10 years will know of the community-wide
witch hunt. inlines, typedefs, templates, etc. all played their part
to reduce the need for them.

I added the "<br />" for flavor (rather I decided to keep it). I
assumed someone would get high-strung about it. :-)

From what I have read in this message, it would seem that the newline
character is interpreted differently across different operating
systems.

In other words, so long as you write the code and work with the output
on the same OS, you are okay.

Windows will expect a "\r\n" and Unix will expect a "\n", even though
it appears as "\n" in source. You just can't transfer files between
OSs and expect it to work.

If you send files between Windows and Unix, you will often see ^M (or
some other symbol) at the end of each line in a Unix text editor (VIM).

The only solution to this is to write a text utility for converting
between systems. sed would do the job.

The main reason for my post was my confusion between compile time and
runtime. Unless I write code to detect the OS at runtime and I
distinguish between the various libraries I am using, I cannot
guarantee a consistent newline. I was curious whether Windows handled
the "\n" properly in its headers. It would appear the C++ standard
libraries treat \n as a system-dependent newline. The Windows
libraries do not. The fact of the matter is that neither library
gives me the option, in some cases, which character(s) it uses as
newline. My conclusion is to pass C++ SL "\n" and MS "\r\n".

Perhaps this clears up my question a bit. I would love to hear what
other people have concluded about this. I am sure someone has run into
the same problem.

Thanks,
Travis

Pete Becker

Apr 13, 2007, 3:59:11 PM
jehuga...@gmail.com wrote:
>
> Windows will expect a "\r\n" and Unix will expect a "\n", even though
> it appears as "\n" in source.

That is, Windows will expect the bytes 0x0A 0x0D, and Unix will expect
0x0D. Either way, it appears in source as '\n'. "\r\n" is a carriage
return followed by a newline.

I know that's not what you meant, but being sloppy about the difference
between source code and raw data is exactly what leads to this confusion.

> You just can't transfer files between
> OSs and expect it to work.
>

You can if you transfer them correctly.

> If you send files between Windows and Unix, you will often see ^M (or
> some other symbol) at the end of each line in a Unix text editor (VIM).
>
> The only solution to this is to write a text utility for converting
> between systems. sed would do the job.
>

ftp was designed for this job. If you don't use a file transfer utility
that understands the systems it's transferring between, yes, you can
write a sed script to do the same job. Provided you understand what
needs to be done. Personally, I'd rather let ftp handle the details.

> The main reason for my post was my confusion between compile time and
> runtime. Unless I write code to detect the OS at runtime and I
> distinguish between the various libraries I am using, I cannot
> guarantee a consistent newline. I was curious whether Windows handled
> the "\n" properly in its headers. It would appear the C++ standard
> libraries treat \n as a system-dependent newline. The Windows
> libraries do not.

They most certainly do. All the standard libraries I'm aware
of for Windows translate the Windows two-byte newline representation
into '\n' on input, and vice versa on output.

> The fact of the matter is that neither library
> gives me the option, in some cases, which character(s) it uses as
> newline.

That's right. The conventions for representing text depend on the
operating system. When you compile a program for Windows you get code
that follows the Windows convention for representing text files. When
you compile a program for Unix you get code that follows the Unix
convention for representing text files.

> My conclusion is to pass C++ SL "\n" and MS "\r\n".
>

Nope. "\r\n" means carriage return followed by newline. If you do that
with code compiled for Windows you'll get 0x0A 0x0A 0x0D in your output
file. That's not a valid newline under Windows.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)


jehuga...@gmail.com

Apr 13, 2007, 11:46:19 PM
{ please don't quote signatures or moderation server banners. -mod }

On 13 Apr, 13:59, Pete Becker <p...@versatilecoding.com> wrote:

Interesting. When I try to just send in "\n" I only get a "\n". Now,
if I use MFC or even newer libraries, it works fine (in some cases).
But I am talking WIN32 (non-GUI). The best example is when you send a
"\n" into a multiline text box (GUI). No matter how many times you
pass "\n", it will never break. I am forced to type "\r\n". And like I
said in my previous post, I have no other choice but to consider the
library I am using and OS I am running on. This is just a depressing
reality I must accept.
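
For the text-box case, one workaround is to convert the in-program
"\n" to a carriage return / line feed pair just before the string is
handed to the control. A minimal sketch, with a made-up helper name
toCrLf; it only manipulates strings and makes no Win32 calls itself:

```cpp
#include <cstddef>
#include <string>

// Convert every lone '\n' to "\r\n". Pairs that are already "\r\n"
// are left alone, so calling this twice does not double the '\r'.
std::string toCrLf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r')) {
            out += '\r';
        }
        out += text[i];
    }
    return out;
}
```

Keeping the conversion at the boundary to the GUI means the rest of the
program can go on using plain '\n' everywhere.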

FTP is not an option when the text files are part of a project's
configuration files. The hope would be to use a single file for any
location of the application. In such a case I have to write code that
treats the newline uniformly. Since the library I am using (SL)
changes the newline across some OSs, I can't decide that beforehand.
So I have to have a new file for each OS. Weep.

Perhaps this seems to undo my original question a little. Perhaps I
should have asked whether there was a way to make a program think a
newline was something other than what was default on that system. But
at the same time, I was wondering about whether std::copy(begin, past,
(ostream_iterator<string>(cout, "\n"))) would do what I wanted it to.
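
A small sketch of that std::copy idiom, written against a generic
std::ostream so it can be tried with std::cout, a file stream, or a
string stream. The function names writeLines and joinLines are
hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write each string followed by "\n". The delimiter passes through the
// stream like any other character, so a stream opened in text mode
// applies its usual newline translation to it.
void writeLines(const std::vector<std::string>& lines, std::ostream& out) {
    std::copy(lines.begin(), lines.end(),
              std::ostream_iterator<std::string>(out, "\n"));
}

// Same thing into an in-memory buffer, where no translation happens.
std::string joinLines(const std::vector<std::string>& lines) {
    std::ostringstream out;
    writeLines(lines, out);
    return out.str();
}
```

The ostream_iterator itself does nothing special with the delimiter;
whether the bytes on disk end up as "\n" or "\r\n" depends on the
stream it writes to.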

Thank you for your replies. Your corrections and insights are greatly
appreciated. Are you going to write a new book for TR2, too? :-)


--

Chris Vine

Apr 13, 2007, 11:45:18 PM
Pete Becker wrote:

> jehuga...@gmail.com wrote:
>>
>> Windows will expect a "\r\n" and Unix will expect a "\n", even though
>> it appears as "\n" in source.
>
> That is, Windows will expect the bytes 0x0A 0x0D, and Unix will expect
> 0x0D. Either way, it appears in source as '\n'. "\r\n" is a carriage
> return followed by a newline.

I don't disagree with your overall point, but if you are dealing with ASCII
then Windows will expect 0x0D (CR) followed by 0x0A (LF), and Unix will
expect 0x0A.
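
The byte values are easy to check from within a program. The sketch
below assumes an ASCII-based execution character set; toHex is a
made-up helper name:

```cpp
#include <cstddef>
#include <string>

// Render each byte as two uppercase hex digits, separated by spaces.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

With ASCII, toHex("\r\n") gives "0D 0A" and toHex("\n") gives "0A".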

Chris

--
To reply by e-mail, remove the "--nospam--"

Pete Becker

Apr 15, 2007, 11:56:03 AM
jehuga...@gmail.com wrote:
> The best example is when you send a
> "\n" into a multiline text box (GUI).

You'll have to ask Microsoft about that. Multiline text boxes are not
part of standard C++.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)


Pete Becker

Apr 15, 2007, 11:56:07 AM
Chris Vine wrote:
> Pete Becker wrote:
>
>> jehuga...@gmail.com wrote:
>>> Windows will expect a "\r\n" and Unix will expect a "\n", even though
>>> it appears as "\n" in source.
>> That is, Windows will expect the bytes 0x0A 0x0D, and Unix will expect
>> 0x0D. Either way, it appears in source as '\n'. "\r\n" is a carriage
>> return followed by a newline.
>
> I don't disagree with your overall point, but if you are dealing with ASCII
> then Windows will expect 0x0D (CR) followed by 0x0A (LF), and Unix will
> expect 0x0A.
>

Oops, looks like I got them backwards. It's been a while since I did
low-level stuff like that, and I knew I should have looked it up. :-(

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)


Ralf Fassel

Apr 16, 2007, 9:48:55 AM
* "jehuga...@gmail.com" <jehuga...@gmail.com>

| Windows will expect a "\r\n" and Unix will expect a "\n", even
| though it appears as "\n" in source.

In my experience the application itself makes a greater difference
than the OS. E.g., opening a UNIX-style text file on Windows with
'notepad' shows one long line, but opening it with 'wordpad' shows the
'correct' line breaks.

So it depends on the application whether it relies on the runtime
library's text mode to remove/convert the \r at the end of line or
whether it opens all files as binary and removes/hides the occasional
\r manually.

| If you send files between Windows and Unix, you will often see ^M (or
| some other symbol) at the end of each line in a Unix text editor (VIM).

Same on Unix: recent emacs versions can hide the ^M in DOS-style-files
and only indicate them in the mode line. I guess you could set up
other editors to do the same.

| The only solution to this is to write a text utility for converting
| between systems. sed would do the job.

dos2unix/unix2dos are exactly for this purpose.

| Unless I write code to detect the OS at runtime and I distinguish
| between the various libraries I am using, I cannot guarantee a
| consistent newline.

For input, if you want to handle text files from different systems,
open all files in binary mode, and look for \n or \r yourself,
discarding the occasional \r right in front of an \n.

For output, the best way IMHO is to open files in text mode and use
plain "\n" as line ending. The runtime library will translate that for
you as required.

| I was curious whether Windows handled the "\n" properly in its
| headers.

At least the compilers cope with mixed input styles (\r\n in the
headers and \n only in the sources).

R'

Eugene Gershnik

Apr 16, 2007, 9:47:30 AM
On Apr 13, 8:46 pm, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:

> Perhaps this seems to undo my original question a little. Perhaps I
> should have asked whether there was a way to make a program think a
> newline was something other than what was default on that system.

Read files in binary mode and parse end-of-line markers yourself. On
modern systems you can expect any of '\r', '\n' or '\r\n' to serve as
a terminator. It is a pretty trivial exercise that most Windows
programmers went through one time or another. As an added bonus your
code will work as well and accept all possible formats on any other
modern OS.

In general if you expect standard text mode formatted I/O facilities
to work well for robust and user-friendly input you are going to have
more problems than this one.

--
Eugene
