"If a source file that is not empty does not end in a new-line
character, [...] the behaviour is undefined."
Can anyone enlighten me as to the rationale of this rule? I have not
seen it enforced by any other compiler so far, and considering that line
breaks are normally ignored (except in string literals), I find it
rather strange.
Gerhard Menzl
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
One of the commonest causes of errors with #included files is exactly
that, the last line of a file runs straight into the first line of the
next. Usually you get an error message, but sometimes several lines
later. I would be happy if all compilers diagnosed this error, but that
can only be done at inclusion time.
--
Francis Glassborow
Check out the ACCU Spring Conference 2002
4 Days, 4 tracks, 4+ languages, World class speakers
For details see: http://www.accu.org/events/public/accu0204.htm
Because the behavior of the first few phases of translation
are defined in terms of sequences of characters that end
with a newline.
A compiler is allowed to accept source code that does not end
with a newline, but is not required to accept it.
--
Steve Clamage, stephen...@sun.com
> One of the commonest causes of errors with #included files is exactly
> that, the last line of a file runs straight into the first line of the
> next. Usually you get an error message, but sometimes several lines
> later. I would be happy if all compilers diagnosed this error, but
> that can only be done at inclusion time.
I don't understand. Can you give a practical example of a file which
contains well-formed C++ code, and which causes an error on inclusion if
not terminated by a new-line character but doesn't if the new-line is
added?
Gerhard Menzl
I am aware that this is the case, but the question is: why? From a
certain point in the translation process, new-line characters are
treated like spaces. Otherwise code like
int<new-line>
f()<new-line>
{<new-line>
}<new-line>
would not compile. Why then is it not possible to treat end-of-file
similarly? To put it differently: in which phase do new-lines really
matter? An example where an otherwise legal translation unit would
become ill-formed because of a missing new-line would be helpful.
Gerhard Menzl
x.cc
8<----
#include "screwup.hh"
int main(){}
---->8
screwup.hh
8<----
#ifndef WW_screwup_hh_WWGUARD
#define WW_screwup_hh_WWGUARD
// I have just screwed up your code! :-)))
#endif---->8 This shows: no newline here!!!
Tried to compile:
8<----
~/work/!>make x
CC -o x x.cc
"x.cc", line 1: Warning: Last line in file "screwup.hh" is not
terminated with a newline.
1 Warning(s) detected.
Undefined                       first referenced
 symbol                             in file
main                                /prog/SWS6.1/WS6U1/lib/crt1.o
ld: fatal: Symbol referencing errors. No output written to x
*** Error code 1
make: Fatal error: Command failed for target `x'
---->8
Preprocessed output (-P):
8<----
---->8
YES, empty, since _everything_ after the endif is ignored by this
compiler.
Attila
[...]
>
> One of the commonest causes of errors with #included files is exactly
> that, the last line of a file runs straight into the first line of the
> next. Usually you get an error message, but sometimes several lines
> later. I would be happy if all compilers diagnosed this error, but that
> can only be done at inclusion time.
It can be detected when the compiler "sees" the file to be included.
Of course you can run it on that single file separately. Did you
allude to some other problem?
file.h
int flag() ; // simple example
file.cpp
#include "file.h"
int flag() {
return 0;
}
Is that simple enough for you?
In fact you have a problem with any final line that does not finish with
either a semicolon or a newline.
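A toy model may make this failure mode concrete. The sketch below is not how any real preprocessor is implemented; naive_include and strip_line_comments are hypothetical helpers, and the model assumes an implementation that pastes the header text directly in front of the following line and only then discards // comments up to the next new-line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model: the header text is pasted directly in front of the
// text that follows the #include line, with nothing inserted in between.
std::string naive_include(const std::string& header, const std::string& rest) {
    return header + rest;
}

// Toy comment stripper: drops everything from "//" to the end of each line.
// It ignores string literals and block comments, which is enough for this demo.
std::string strip_line_comments(const std::string& text) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string::npos) eol = text.size();
        std::string line = text.substr(pos, eol - pos);
        const std::size_t comment = line.find("//");
        if (comment != std::string::npos) line.erase(comment);
        out += line;
        if (eol < text.size()) out += '\n';  // keep the line structure
        pos = eol + 1;
    }
    return out;
}
```

Under this model, a header consisting of "int flag() ; // simple example" without a final new-line drags the line "int flag() {" into the comment, so the opening of the definition disappears. With the new-line present, the same line survives.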
--
Francis Glassborow
> An example where an otherwise legal translation unit would become
> ill-formed because of a missing new-line would be helpful.
int f(int * a, int * b)
{
return *a /
*b; /* random comment */;
}
If you eliminate the newline just after the divide sign, you turn a valid
expression into the start of a comment and the code still compiles:
int f(int * a, int * b)
{
return *a /*b; /* random comment */;
}
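To see the difference at run time, the two variants can be put side by side. This is only a sketch: the function names divide and not_a_division are made up for the comparison, and some compilers will warn about the comment opener inside the comment in the second function.

```cpp
// With the new-line after '/', this is an ordinary division.
int divide(int* a, int* b) {
    return *a /
           *b; /* random comment */;
}

// Without it, the slash and star form a comment opener, and the comment runs
// to the first comment terminator. What remains is just "return *a;".
int not_a_division(int* a, int* b) {
    (void)b;  // b is no longer used; this silences the unused-parameter warning
    return *a /*b; /* random comment */;
}
```

Called with pointers to 8 and 2, the first function divides and the second simply returns the value that a points to.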
Another example where a newline is significant is
vector<vector<int>
> vec;
Remove the newline between the two > signs and it won't compile.
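Written out with the header and std:: qualification it needs, the example looks like the sketch below. The variable names are made up for illustration. Under the C++98/C++03 rules the two adjacent > characters would be lexed as a single right-shift token; from C++11 on, the closing >> of nested template argument lists is accepted as well.

```cpp
#include <vector>

// The new-line is ordinary whitespace here: it keeps the two '>' characters
// apart, so they are lexed as two separate tokens.
std::vector<std::vector<int>
> vec_with_newline;

// Equivalent spelling with a space instead of a new-line.
std::vector<std::vector<int> > vec_with_space;
```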
Finally,
#define foo "\
\t"
and
#define foo "\\t"
are definitely not the same either.
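The difference can be checked by expanding both definitions into strings. A minimal sketch, with the macro and function names (SPLICED, ESCAPED, spliced, escaped) chosen only for this illustration:

```cpp
#include <string>

// Phase 2 deletes the backslash/new-line pair, so the two source lines below
// join into the literal "\t": one character, a horizontal tab.
#define SPLICED "\
\t"

// No splicing here: an escaped backslash followed by the letter t.
#define ESCAPED "\\t"

std::string spliced() { return SPLICED; }
std::string escaped() { return ESCAPED; }
```

The first string has length 1 (a tab); the second has length 2 (a backslash and a t).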
--
Hubert Matthews http://www.oxyware.com/
Software Consultant hub...@oxyware.com
>Gerhard Menzl wrote:
[snip...]
>x.cc
>8<----
>#include "screwup.hh"
>int main(){}
>---->8
>
>
>screwup.hh
>8<----
>#ifndef WW_screwup_hh_WWGUARD
>#define WW_screwup_hh_WWGUARD
>// I have just screwed up your code! :-)))
>#endif---->8 This shows: no newline here!!!
>
[snip...]
>Preprocessed output (-P):
>8<----
>
>---->8
>
>YES, empty, since _everything_ after the endif is ignored by this
>compiler.
From the error messages you get it doesn't seem you end up with an
empty "output". But strictly speaking the problem in your code is
different: due to the lack of new-line the #endif line in screwup.hh
is not a preprocessing directive at all (even if most compilers will
consider it so).
Genny.
I do get an empty output. :-))) Please believe me: I am good enough to
find this out. On SUN CC with the -P flag, which emits the
preprocessed file (x.i), it will be a file, with a newline in it. Size:
1 octet. Nothing else. :-)
I had not tested whether the } ends up in the .i if I put it on a new
line in x.cc. Now I have: it does, whatever that means.
The point is: it _can_ happen :-)
Attila
> file.h
> int flag() ; // simple example
>
> file.cpp
>
> #include "file.h"
> int flag() {
> return 0;
> }
>
> Is that simple enough for you?
Yes, thank you. The question that remains is: would it really screw up
the translation process completely and break teralines of existing code
if 2.1 would be extended by something like "On inclusion of a source
file, end-of-file is replaced by a new-line"? A compiler that is smart
enough to detect a missing new-line and can perform much more
complicated textual replacements should have no difficulties there.
Up until now, I thought that
template<class T = X<int> >
was the most deplorable syntactic peculiarity in C++, but the new-line
rule beats it. What could be more counterintuitive than a character
which is not even visible in most text editing tools yet changes the
semantics of a program completely, especially considering that most
visible new-lines (i.e. those that do not terminate single-line comments
or preprocessor directives) are syntactically insignificant?
Is there any introductory C++ textbook that warns about this? Given the
possibly insidious consequences of the existing rule, every such text
should contain a big warning in bold print.
> In fact you have a problem with any final line that does not finish
> with either a semicolon or a newline.
Or }.
Gerhard Menzl
And are you sure that there is no code out there that relies on the
preprocessor to splice files together? I'm not. I just tell delegates on
my courses to make sure they hit CR at the end of their files. Hardly
difficult.
--
Francis Glassborow
> Steve Clamage wrote in answer to the original question:
> > Because the behavior of the first few phases of translation are
> > defined in terms of sequences of characters that end with a
> > newline.
> > A compiler is allowed to accept source code that does not end with
> > a newline, but is not required to accept it.
> I am aware that this is the case, but the question is: why? From a
> certain point in the translation process, new-line characters are
> treated like spaces.
Only from a certain point on. Until you are finished preprocessing,
new lines are significant (unless escaped). Others have pointed out
examples, but I suspect that the main reason it is undefined behavior
is to avoid having to require the compiler to do something sensible
when a single token spans the end of file. (Collecting characters
into a token is typically one of the tightest innermost loops in a
compiler, and the last thing you want to do is complicate it.)
One reasonable alternative might have been to require an end of file
to be treated as a new line.
--
James Kanze mailto:ka...@gabi-soft.de
Beratung in objektorientierer Datenverarbeitung --
-- Conseils en informatique orientée objet
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany, Tél.: +49 (0)69 19 86 27
Suppose the rule was this:
"If there is no newline at the end of the header file, one will be added
automatically".
Since this rule would only affect headers that are currently illegal, it can
have no impact whatsoever on legal legacy code.
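As a sketch of what such a rule would amount to, assuming the source text is already held in a string (with_final_newline is a hypothetical name, not part of any compiler):

```cpp
#include <string>

// Appends a new-line only when the text is non-empty and does not already
// end in one. Empty files and well-formed files are returned unchanged.
std::string with_final_newline(std::string source) {
    if (!source.empty() && source.back() != '\n') {
        source.push_back('\n');
    }
    return source;
}
```

A file that already ends in a new-line comes back unchanged, which matches the observation above that such a rule could only touch files that are currently not well-formed.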
[snip...]
>I do get an empty output. :-))) Please believe me: I am good enough to
>find this out. On SUN CC with the -P flag, which emits the
>preprocessed file (x.i), it will be a file, with a newline in it. Size:
>1 octet. Nothing else. :-)
Yes, I'm sure the "preprocessed file" is empty :) Anyhow, you should
agree we are in compiler-specific land here. Emitting (what
represents) the result of translation phases up to the 4th is a
compiler-specific feature itself.
It remains that the source file (identified by) "screwup.hh" does not
end in a new-line character. Now, if your compiler behavior is to act
as if the translation unit was:
#ifndef WW_screwup_hh_WWGUARD
#define WW_screwup_hh_WWGUARD
// I have just screwed up your code! :-)))
#endifint main(){}
(this I think is what Gerhard is alluding to) the code is of course
ill-formed. Moreover, since (again) the behavior is in general
undefined, I think nothing prevents the compiler from emitting an empty
output when it's given a -P switch while merging x.cc and screwup.hh
when compiling normally.
P.S.: as to the compiler messages in your first post, it seems (I say
"seems" because there's a strange formatting, so it's not clear how to
read them...:) ) there's a warning about an undefined symbol.
Genny.
I would be surprised if such code didn't exist, but as it relies on
undefined behaviour, I don't see a problem here.
> I just tell delegates on my courses to make sure they hit CR at the
> end of their files. Hardly difficult.
Hitting the <Enter> key is not difficult. Having to remember to do it is
an unnecessary annoyance.
Gerhard Menzl
> One reasonable alternative might have been to require an end of file
> to be treated as a new line.
This is what several compilers obviously do today. I think it should be
standard.
Gerhard Menzl
I think we have to be careful to distinguish between ill-formed and
undefined behaviour. We can do anything we like with ill-formed because
the compiler is required to diagnose it. UB is a little different
because one OK thing for a compiler to do is to provide an
implementation definition of any specific UB. Now if we convert UB to a
universally defined action any implementor who has used the licence to
make UB implementation defined has just been sabotaged. If you are going
to change this particular case, make it implementation defined.
--
Francis Glassborow
> Francis Glassborow wrote:
>
> > file.h
> > int flag() ; // simple example
> >
> > file.cpp
> >
> > #include "file.h"
> > int flag() {
> > return 0;
> > }
> >
> > Is that simple enough for you?
>
> Yes, thank you. The question that remains is: would it really screw up
> the translation process completely and break teralines of existing code
> if 2.1 would be extended by something like "On inclusion of a source
> file, end-of-file is replaced by a new-line"? A compiler that is smart
> enough to detect a missing new-line and can perform much more
> complicated textual replacements should have no difficulties there.
Turn the question around. What is the benefit added by making this
change? To allow sloppy editing by programmers? To allow sloppy
programming by the writers of code generation software?
To be sure, it would not add much complexity to a compiler. There are
thousands of other little changes that would not add much complexity
to a compiler, either. But add them all up, and thousands of "not
much complexities" add up to much complexity indeed.
New features should provide a great, and probably obvious, benefit.
Who benefits from this? Is that benefit important enough to take
compiler developers' time away from such things as implementing
export?
Language standards quickly grow into Godzilla sized monsters if every
little feature that anyone happens to think of gets thrown in. If you
actually think this feature will provide sufficient benefit to the C++
community at large, prepare a logical case and test it out in
comp.std.c++.
--
Jack Klein
Home: http://JK-Technology.Com
>In article <3C7F429E...@sea.ericsson.se>, Gerhard Menzl
><gerhar...@sea.ericsson.se> writes
>>Yes, thank you. The question that remains is: would it really screw up
>>the translation process completely and break teralines of existing code
>>if 2.1 would be extended by something like "On inclusion of a source
>>file, end-of-file is replaced by a new-line"?
>
>And are you sure that there is no code out there that relies on the
>preprocessor to splice files together? I'm not. I just tell delegates on
>my courses to make sure they hit CR at the end of their files. Hardly
>difficult.
Well, I can't speak for Gerhard of course, but if the "replace eof
with new-line" rule had been introduced from the beginning (i.e. in the
C90 standard) there would be no such (odd) code today. I think Gerhard's
question is not whether we can change it now, but "why was this rule
*originally* introduced?"
Genny.
There was (1) warning, about missing EOL at the end of screwup.hh - that
was the last line the compiler emitted. And then there was an
_error_, from the linker, about unresolved external(s):
CC -o x x.cc
"x.cc", line 1: Warning: Last line in file "screwup.hh" is not
terminated with a newline.
1 Warning(s) detected.
<<<<<<<<<================ Up to here compiler
Undefined                       first referenced
 symbol                             in file
main                                /prog/SWS6.1/WS6U1/lib/crt1.o
ld: fatal: Symbol referencing errors. No output written to x
*** Error code 1
make: Fatal error: Command failed for target `x'
<<<<<<<<<================ Above this, linker.
Attila
> > One reasonable alternative might have been to require an end of
> > file to be treated as a new line.
> This is what several compilers obviously do today. I think it should
> be standard.
I would consider the most natural thing would be to simply ignore it.
If I want a new line, I enter one. If I don't, I don't. (In fact, I
configure my editor so that it will automatically append the new line
when writing the file, if one is missing.)
And you still have to address the question of an escaped new line.
Since I explicitly didn't want that one.
--
James Kanze mailto:ka...@gabi-soft.de
> Suppose the rule was this:
> "If there is no newline at the end of the header file, one will be
> added automatically".
That's one possible rule. The other is to treat exactly the
characters seen, ignoring the end of file. Depending on how you write
your compiler, one or the other (or treating it as a non end of line
white space) falls out naturally, supposing you don't do anything
special.
In the end, I suppose that trying to get consensus about which rule to
adopt wasn't considered worth the effort. After all, I've never used
an editor which didn't at least have an option to force all files to
end with a new line.
--
James Kanze mailto:ka...@gabi-soft.de
[...]
>There was (1) warning, about missing EOL at the end of screwup.hh - that
>was the last line the compiler emitted. And then there was an
>_error_, from the linker, about unresolved external(s):
Oh yes, it doesn't really matter. I'm sorry, though, that our
discussion shifted somewhat away from the OP's question :( What I was
stressing is the fact that a preprocessing directive *must* end with a
new-line (clause 16). So, strictly speaking, your code contains a
syntax error. This is *more* than generically lacking a new-line at
the end of a source file. Note in fact that your example is different
from the one provided by F. Glassborow. :)
Now I think we should bring the thread back to its origin: why is the
rule that a non-empty source file not ending in a new-line
character causes undefined behavior? Does anyone know?
P.S.: maybe the answer will come from a C guru, since this is a legacy
of that language.
Genny.
> > Yes, thank you. The question that remains is: would it really screw
> > up the translation process completely and break teralines of
> > existing code if 2.1 would be extended by something like "On
> > inclusion of a source file, end-of-file is replaced by a new-line"?
> > A compiler that is smart enough to detect a missing new-line and can
> > perform much more complicated textual replacements should have no
> > difficulties there.
>
> Turn the question around. What is the benefit added by making this
> change? To allow sloppy editing by programmers? To allow sloppy
> programming by the writers of code generation software?
The benefit would be to remove the need to manually care about an
invisible detail of the physical source code representation that does
not contribute anything to readability and can easily be handled by a
machine.
I do not consider forgetting about a rule that is so counterintuitive
and so little documented as sloppy.
> To be sure, it would not add much complexity to a compiler. There are
> thousands of other little changes that would not add much complexity
> to a compiler, either. But add them all up, and thousands of "not
> much complexities" add up to much complexity indeed.
>
> New features should provide a great, and probably obvious, benefit.
> Who benefits from this? Is that benefit important enough to take
> compiler developers' time away from such things as implementing
> export?
I am not a compiler writer, but I imagine that the change would be
trivial. Compared to implementing export, it certainly is minuscule.
Several compilers already do this. I think standardizing something which
is already practiced and costs little to implement would be worthwhile.
> Language standards quickly grow into Godzilla sized monsters if every
> little feature that anyone happens to think of gets thrown in. If you
> actually think this feature will provide sufficient benefit to the C++
> community at large, prepare a logical case and test it out in
> comp.std.c++.
Given the predominant reaction here, I don't think I will go to the
trouble.
Gerhard Menzl
Right. I want a new line mainly to break code into readable pieces. In
some cases, I want a new line because I need it for termination of a
preprocessor directive. At the end of a file, I don't care. Intuitively,
an end-of-file is a stronger separator than a new-line.
> (In fact, I configure my editor so that it will automatically append
> the new line when writing the file, if one is missing.)
Nice feature, but not all text processing tools provide it. I think the
compiler would be the right tool to do it.
> And you still have to address the question of an escaped new line.
> Since I explicitly didn't want that one.
Do you mean code that uses the escape character to make a line span
across files? Can you give a practical example?
Gerhard Menzl
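The thread does not show which case James had in mind. One possible reading, sketched here as a toy model only: a header whose last line ends in a backslash, so that line splicing joins it with the first line of whatever text follows. The helper splice_lines is hypothetical; it mimics phase-2 splicing on text that has already been concatenated, which a real compiler need not do.

```cpp
#include <cstddef>
#include <string>

// Toy version of phase-2 line splicing: every backslash that is immediately
// followed by a new-line is removed together with that new-line.
std::string splice_lines(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == '\n') {
            ++i;  // skip the new-line as well
            continue;
        }
        out += text[i];
    }
    return out;
}
```

In this model, a header ending in "#define X" followed by a space, a backslash and a new-line absorbs the next line of the including file into the macro definition. The same happens if the header ends in the bare backslash and an end-of-file is treated as a new-line.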