In order to keep track of this number, the lexer has to identify the end
of the current token (this was the subject of another discussion some
while ago). Given that the line number of the first source line is 1,
what should be the value of i in the following?
int i = __LINE__\
;
The backslash-newline pair being ``non-existent'', could it be
``attached'' to __LINE__, thus producing the value `2'? Obviously, the
value `1' is reasonable as well. Does the Standard make the effect of
such a statement implementation-defined?
Graduate student in Operations Research
École Polytechnique de Montréal
The standard is quite clear: its value is 1. Read about "phases of
translation", and note that the line number is generated after phase 1,
but backslash newline is not processed until phase 2.
I would argue (weakly) that it is to the start of the current token, and
thus 1. However, if you want to play safe you should assume it is to an
unspecified point between the start and end of the current token, so in
that case it would be 2.
[This was addressed in a UK Defect Report ages ago, but I forget what
the answer was.]
Clive D.W. Feather | Director of | Work: <cl...@demon.net>
Tel: +44 181 371 1138 | Software Development | Home: <cl...@davros.org>
Fax: +44 181 371 1037 | Demon Internet Ltd. | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
I'm not sure, but I think the standard doesn't have much to say about it.
I personally believe it's just as reasonable for a compiler to keep
track of the *beginning* of each token instead. (The lexers I
write do just that, and I haven't had any complaints.) It's
about the same amount of work for the lexer in either case.
With that in mind, what is the result of this code:
1: int i = __LI\
2: NE__;
The standard would probably classify this as implementation-defined,
if it says anything about it at all. (But clearly there are only
two valid answers: 'i' is either 1 or 2.)
-- David R. Tribble, dtri...@technologist.com --
>int i = __LINE__\
>The backslash-newline pair being ``non-existent'', could it be
>``attached'' to __LINE__, thus producing the value `2'?
I don't see why not. The standard doesn't specify exactly what happens
here, so the implementation has some wiggle room. Similarly,
int i = __LINE__\
\
;
might cause i to have the value 1, 2, or 3.
This has the merit of being uniquely defined. However, my question
arose because, in a previous discussion which implied the use of
__LINE__ as an argument of a function-like macro, ...
Clive D.W. Feather wrote:
> In this case, I wasn't proposing a change to make it be standardised. I
> was pointing out that that's how I read the definition of __LINE__. Look
> at 6.8.4 paragraph 2 - the line number is based on the number of
> newlines up to the "current token". When processing a macro expansion,
> this surely is the ) at the end of the invocation.
So sometimes, it is expected that it is the _end_ of a construct that
defines the associated value of __LINE__.
Maybe it's simpler to agree on the start of the current token, and that
for function-like macros, this means the start of the identifier token
that introduces the macro.
>> int i = __LINE__\
>The standard is quite clear: its value is 1. Read about "phases of
>translation", and note that the line number is generated after phase 1,
>but backslash newline is not processed until phase 2.
That's irrelevant. Labelling each character with its source line number
doesn't settle which character's number the token is given. Take the
original example again:
int i = __LINE__\
;
The question is: is the line number of a token the line number of the
first character, the last character, the first character not part of the
token, or an unspecified character within the token ? Or what ?
Since every character in the token occurs in line 1, how can its line
number possibly be anything other than 1?
>Since every character in the token occurs in line 1, how can its line
>number possibly be anything other than 1?
blah blah blah
... \n ...
end of token
val = line;
It's a real question. No one has ever adequately explained how __LINE__
works, and I believe the committee has decided it's not worth deciding.
Copyright 1999, All rights reserved. Peter Seebach / se...@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter. Boycott Spamazon!
Send me money - get cool programs and hardware! No commuting, please.
Visit my new ISP <URL:http://www.plethora.net/> --- More Net, Less Spam!
>This has the merit of being uniquely defined. However, my question
>arose because, in a previous discussion which implied the use of
>__LINE__ as an argument of a function-like macro, ...
>Clive D.W. Feather wrote:
>Maybe it's simpler to agree on the start of the current token, and that
>for function-like macros, this means the start of the identifier token
>that introduces the macro.
I suspect that the best answer is to say that it is the line number of
some unspecified point between the first character of the token and the
character immediately following the token.
I don't understand the point of this example. Yes, if a token begins on
one line and ends on another, there's an ambiguity about which line it's
on. That's not the case in the example that I responded to: every
character in __LINE__ is on the same line. Is this supposed to
illustrate that a naive implementation can get this wrong? If so, that's
not particularly relevant. What is the issue here?
Because the implementation doesn't know it's at the end of the token
until it's on line 2.
-Larry Jones
I suppose if I had two X chromosomes, I'd feel hostile too. -- Calvin
>It's not even that: the DR that asked the question never got answered.
Yes. The committee decided that it was not worth answering. In a vote.
I think it may even have been a formal vote, but it may have just been
a show of hands.
My point is that there's nothing unlikely about counting the lines first,
and *then* noticing that the next character (which was on the next line)
was really no longer part of this token.
you can't tell, until you've seen the newline, that you're done with __LINE__,
so you're on line 2 when you definitely see that you're done with the token,
so I'd consider that a reasonable answer to the question "what line were you
on when you saw this token".
This sounds reasonable, and it probably covers most existing
implementations.
Thus this fragment:
1: int i =\
2: \
3: __L\
4: IN\
5: E_\
6: _\
7: ;
which results in these characters and source lines:
int i =__LINE__;
would result in one of the following acceptable values of 'i':
3, 4, 5, 6, or 7.
I consider any other values as broken.
You might be right, though I don't remember it.
On the other hand, I note that none of DRs 173 to 178 ever got answered
officially, though it looks like the other 5 have all been solved in C9X.
issue 1: does the standard really specify that this naive implementation
is wrong? I'm agnostic on that issue.
issue 2: should the standard be so specific that it does prohibit this
naive implementation? I can't come up with a good argument why __LINE__
needs to be that well defined. AFAIK, its main use is as a debugging
aid, intended to point the programmer at the right piece of code. Anyone
who writes code such that __LINE__'s value leaves them unclear about the
place at which it was evaluated, deserves what they get.
When the __LINE__ token occurs entirely *on* one line,
that is obviously the appropriate line number.
When __LINE__ spans multiple lines (via \new-line etc.),
as I recall we didn't care to specify which of the
possibilities it had to be, since it doesn't matter in practice.
It is resolute in considering the line number of
\-pasted things as belonging to the first line with
the \. (In trying the example, I trimmed the initial
[0-9]: and white-space, of course.)
In which way is this:
int i = __LINE__\
;
different from this:
int i =\
(GCC emits `int i = 2' for the first and `int i = 7;' for the second).
Although in the first example __LINE__ occurs entirely on one line the
compiler still has to look ahead beyond \ to find the end of the token.
>int i = __LINE__\
>... GCC emits `int i = 2'
int i = \
__LINE__;
Ritchie's preprocessor presumably emits `int i = 1;'.
So we have two practical counterexamples to Gwyn's suggestion.
I think both implementations conform to the standard
in this amusingly trivial matter.
That's what the preprocessor I wrote does, too. It's become clear that
I'm going to have to change it, and the change will involve keeping
additional line number information around just to satisfy this detail
for `__LINE__'. I hope it's going to be worth the trouble.
- John Hauser
> I think both implementations conform to the standard
> in this amusingly trivial matter.
I don't think so. Section 6.8.8 states that `__LINE__' expands to
``the line number of the current source line'', and Section 6.8.4
defines the line number of the current line as ``one greater than the
number of new-line characters read or introduced in translation phase 1
while processing the source file to the current token''. There is
clearly 1 new-line before the `__LINE__' token in the second example
above, so `__LINE__' cannot validly expand to `1'; it has to be
at least `2', and in this case exactly `2', surely. So Ritchie's
preprocessor (and my own, too, as I've noted) is not conforming in this
respect.
- John Hauser
That demonstrates that lookahead needs to track line number
*and so does pushback of an unused lookahead*. GCC has it wrong.
That's a legitimate rationalization, but I don't see any words in the
standard that support it.
> Basically, given
> you can't tell, until you've seen the newline, that you're done with __LINE__,
> so you're on line 2 when you definitely see that you're done with the token,
> so I'd consider that a reasonable answer to the question "what line were you
> on when you saw this token".
My copy of the standard says that __LINE__ is "the line number of the
current source line", not "the line number of the current source line,
or maybe one more, depending on whether it's followed by an escaped
newline".
I agree that this isn't an earth-shattering issue from the perspective
of standards conformance, but it is distressing to read what look like
clear words in the standard, and to be told that they don't mean what
they say, and get such a vague explanation.
I'm not convinced that your decision is wrong... But on the other hand,
it seems to me that in
/* line 1 */
int \
j; /* line 3 */
int i = __LINE__; /* line 4 */
i has to be 4. So, you have to count those backslash-newlines *somewhere*.
On the other hand, if you said
/* line 1 */
i\
nt i = __LINE__\
; int j = __LINE__;
I would think anything from 2-4 would be okay for i, and j would be 4.
>That's a legitimate rationalization, but I don't see any words in the
>standard that support it.
I also don't see anything contradicting it.
I just realized,
>one greater than the
>number of new-line characters read or introduced in translation phase 1
>while processing the source file to the current token
But since you're allowed to do all of TP1, then all of TP2, etcetera...
You could make the case that either *every* newline in the file was "read or
introduced in translation phase 1" while processing the source, or *none*
were. More likely "all". Because you processed them all in TP1, then you
started TP2, and thus, to process to the current token, you read 'em all.
>My copy of the standard says that __LINE__ is "the line number of the
>current source line", not "the line number of the current source line,
>or maybe one more, depending on whether it's followed by an escaped
>newline".
The problem is "what's current"? Is current where you were "inside" the
token, or where you were when you got confirmation that you had a token?
>I agree that this isn't an earth-shattering issue from the perspective
>of standards conformance, but it is distressing to read what look like
>clear words in the standard, and to be told that they don't mean what
>they say, and get such a vague explanation.
Here's my thinking:
int i = __LINE_\
_;
it's fairly clear that either of these lines could be considered "current".
My claim is that, when the 'cursor' is just *past* a token, that's also
"current". Or could be, legitimately.
Well, I think that's a stretch, influenced by knowing how compilers are
often written. I don't think a programmer who hasn't worked with the
innards of compilers would read it that way.
FWIW, the preprocessors I write keep track of the beginning of each
token. They do this by calling a low-level "getchar" function that
returns the character code along with its (physical) line number
(and column position and include-file index as well). I stick the
position info of the first character of a new token into the token
info before I collect characters for the rest of the token.
Obviously, this is only one approach; it's perfectly reasonable
for a token's line number to be the position of its last character.
I would argue, though, that it's misguided to associate a line
number with a token in which none of the characters of the token
actually appear on that line. But then there's the special case
of splicing lines together separated by \-newlines in a separate
pass, which would seem to make it okay to treat such meta-lines
as a single source line (just as long as the next line resumes
with the correct line number, such as by generating extra newlines
in the preprocessed output).
But I wouldn't sweat it; it's a minor issue anyway. It's something
that's best designed in from the beginning, before you write your
lexer, rather than added later.
And don't forget the other issues that our lexers must deal with
if they are going to properly handle C9X-compliant (and C++
compliant) source. Things like trigraphs, digraphs, UCNs,
alternate punctuation keywords, wide characters, and hex float
literals, to name a few.
I agree. The arguments about lookahead etc. could just as well
be applied to
__LINE__
versus
__LINE__\
We sure didn't want the first case to expand to a line number
other than that of the line that the token is embedded within.
I disagree. In the first case, the token clearly ends before the
newline. In the second case, the backslash-newline does *not* end
the token, you have to look at the first character on the next line
before you know whether you've got an entire token or not, and thus
the token ends on the next line even if no characters from that line
are actually part of it.
> We sure didn't want the first case to expand to a line number
> other than that of the line that the token is embedded within.
That, I'll certainly agree with.
I don't want to be THIS good! -- Calvin
You (the LR parser) don't know that the first token has ended *until
after you have read the newline*.
My copy of the standard defines the meaning of __LINE__ in terms of the
source line on which it occurs, not when an LR parser might recognize
that the end of a token occurs. Does yours say something different?
Yes, but it's easy enough to delay processing that newline until after
you've associated the current line number with the token. In the other
case, you have to read the backslash, the newline, and at least one
following character (more if that character is also a backslash), and
it *isn't* easy to delay processing an arbitrarily long sequence of
characters.
OK, what's the NEXT amendment say? I know it's in here someplace. --
But there is no "arbitrarily long sequence of characters" here. There's
only backslash followed by an escaped character. Once it's determined
that this can't be part of __LINE__ you know you're at the end of the
token.
were in C89
just new tokens for the list in the lexer's source code
>alternate punctuation keywords,
were in C89 (though "xxx" L"yyy" is new, I'll admit)
>and hex float
New, I agree.
>> > > __LINE__
>> > > __LINE__\
>> You (the LR parser) don't know that the first token has ended *until
>> after you have read the newline*.
>Yes, but it's easy enough to delay processing that newline until after
>you've associated the current line number with the token. In the other
>case, you have to read the backslash, the newline, and at least one
>following character (more if that character is also a backslash), and
>it *isn't* easy to delay processing an arbitrarily long sequence of
>characters.
Furthermore, in the first case the newline has survived to the relevant
phase of translation. In the second it's already disappeared.
[How long is your lexer input buffer ?]
These are the alternate spellings found in <iso646.h> (such as 'and'
and 'or'). They aren't a problem in C (C89 or C9X) since they're
just macros, but they are reserved/predefined keywords in C++
(which are meaningful in the preprocessor phase and beyond).
Which means that they must be dealt with (at some level) if your
lexer is used for both C and C++.
But is there a difficulty in saving the source line number for the
last non-white (and non-backslash-newline) source character that was
read? Once you're past the backslash-newline(s), you know the line
number of the last (non-white) character of the token, right?
You're right, of course. But it's still a false issue: it's simple to
keep track of the line number where the backslash-newline sequence
began. The length of the sequence doesn't matter.
>just new tokens for the list in the lexer's source code
That depends on the implementation. For a character-based
preprocessor, digraphs can take quite a bit more work than that.
For example, when I added digraph support to GCC's preprocessor, I used
the fact that `%:' is not a digraph if preceded by an odd number of
`<'s, because the code scans backwards at that point! The preprocessor
normally needn't do anything special about `<' when expanding macros,
but it must do so in the presence of digraphs, and it's more efficient
for it to worry about this only when a potential digraph is discovered
than to worry about it whenever `<' is discovered. Therefore GCC's
preprocessor scans backwards through the input when %: is discovered,
counting `<'s as it goes, to see whether the %: is really a #.
This is not the only bit of hairy digraph code that appears in the GCC
preprocessor. I admit that my thoughts at the time were less than kind
about the people on the standardization committee who foisted digraphs
on the rest of us. I wouldn't mind so much if digraphs were actually
used in practice, but they aren't.
Yes, but the newline character is on the same line as the token. You have not
yet read any characters from the _next_ line. In the second case you have
actually read a character from the next line before you know you are done.
I could go on :-)
There's not a particular difficulty (I think), but the question is what
behaviours should and should not be allowed.
Please don't jump into a conversation that you have not been tracking.
Ow. I forgot that particular C++ gratuitous change.
Yuk. What's wrong with maximal munch in the forward direction ?
> The preprocessor
>normally needn't do anything special about `<' when expanding macros,
>but it must do so in the presence of digraphs, and it's more efficient
>for it to worry about this only when a potential digraph is discovered
>than to worry about it whenever `<' is discovered.
I don't even understand this comment. Can you please give an example ?
>This is not the only bit of hairy digraph code that appears in the GCC
>preprocessor. I admit that my thoughts at the time were less than kind
>about the people on the standardization committee who foisted digraphs
>on the rest of us.
You aren't the only one. [My personal take on their history is: "Denmark
forced them on us, USA refused to fight".]
In effect, that's what the lexer has to do to get it right.
    if ( alphanumeric( c = getnext() ) )
        token_line_number = line_number;
instead of just
    c = getnext();
somewhere deep inside the parser.
How do you know that they aren't?
Both trigraphs and digraphs were forced on us to obtain the support
of the Danish delegation to WG14, who insisted that they had a lot of
keyboards that didn't provide any convenient way to enter certain C
characters.
>Yuk. What's wrong with maximal munch in the forward direction ?
Nothing's _incorrect_ about maximal munch. It's just harder to write,
and slower, that's all.
>In article <78rnqq$bor$1...@shade.twinsun.com>, Paul Eggert
>> The preprocessor
>>normally needn't do anything special about `<' when expanding macros,
>>but it must do so in the presence of digraphs, and it's more efficient
>>for it to worry about this only when a potential digraph is discovered
>>than to worry about it whenever `<' is discovered.
>I don't even understand this comment. Can you please give an example ?
Normally, when the preprocessor is processing a macro, it just copies
the definiens, looking for identifiers and `#'; the other characters
can just be copied through as-is, without tokenization. I'm omitting
details about whitespace, backslash-newline, strings, trigraphs,
multibyte characters, and so on; but the basic idea is that a
character-oriented preprocessor needn't worry about tokenizing when
it's analyzing the definiens of, say, `#define f(a,b) (a--<<--b)';
it can just copy the `--<<--' through without worrying about token
boundaries. This simplifies the writing of the preprocessor, and
makes it a tad faster.
The same basic idea can be used even in the presence of digraphs like
`%:', but it gets trickier. When you're analyzing
`#define f(a,b) (a<<<<<<<<<<<<%:b)', you must count the number of
`<'s before the `%:' to see whether the `%:' is really a `#'. Ugh.
I have the distinct impression that the people who proposed digraphs
never implemented a character-oriented preprocessor for them, and I
have the sneaking suspicion that they didn't build a token-based
preprocessor either. I'm afraid that digraphs were an example of
specify now, implement later -- which is backwards from what the C
standard ought to be.
>Paul Eggert wrote:
>> I wouldn't mind so much if digraphs were actually
>> used in practice, but they aren't.
>How do you know that they aren't?
I know because I get the GCC bug reports, and nobody ever complains
about digraphs not working. :-)
We've had discussions before like this.
I recall your claiming that trigraphs don't break existing programs,
except for perhaps a ``handful'' of chess programs.
I showed several instances of breakage in widely used programs,
including GDB, f2c, and the JPEG library.
You said that I didn't provide enough examples,
and anyway the breakages weren't all that big a deal.
I also recall claiming that `long long' system types could break a lot
of existing code. You expressed skepticism, and said ``show me''.
So I showed you, with several examples of widely used code, including Apache.
I distinctly recall your pooh-poohing the evidence,
and saying that it was no big deal.
And this was for discussions where I was proving a positive,
and could show hard evidence. Now you're asking me to prove a negative!
I doubt whether you would be convinced by any evidence that I can supply.
I could inspect all the source code at my site for digraphs and
come up empty (as I'm sure that I would, except for the GCC test cases),
and you'd still say that perhaps some Dane somewhere
might be using digraphs in the corner of his garage.
[ ... ]
> Yes, but the newline character is on the same line as the token. You have not
> yet read any characters from the _next_ line. In the second case you have
> actually read a character from the next line before you know you are done.
I hate to say it, but I'm starting to wonder if there's any real point
to this discussion at all. The reality is that the compiler is free
to define what constitutes the end of a line, and it doesn't have to
bear any particularly close relationship with anything else on earth.
About the only limitation (that I can think of) is that preprocessor
lines really have to be treated as LINES, not just text. e.g. a
``#define'' that isn't at the beginning of a line (excluding white
space) isn't treated as a preprocessor directive. In addition, I
suppose the compiler is obliged to follow #line directives.
Other than that, the compiler is free to say that _nothing_
constitutes a new line, and simply not insert (or delete, at its
option) any new-line characters in the input, so all normal code is
treated as one long line.
At the opposite extreme, the compiler would be free to define every
semicolon outside of a string/character constant as being the end of a
"line" and count its lines that way.
As such, it's perfectly legal to have a __LINE__ on what most of us
would think of as the 250th line of a program, and have the compiler
tell us that it's, say, line 5 or line 360. In short, the compiler
can choose nearly ANY value it feels like for nearly any __LINE__, and
there's really no way to say it's legal or illegal. About the only
thing you can say about the value of __LINE__ is that they have to be
assigned in a non-decreasing order in the absence of #line directives.
There are undoubtedly a FEW other restrictions if a __LINE__ is at or
_VERY_ close to the beginning of a translation unit, but under most
circumstances, nearly ANY value can be assigned legally. In short,
it's an arbitrary number, and nearly the ONLY control is quality of
implementation.
Sorry, you should first think of correctness, and only then of speed.
> >In article <78rnqq$bor$1...@shade.twinsun.com>, Paul Eggert
> >> The preprocessor
> >>normally needn't do anything special about `<' when expanding macros,
> >>but it must do so in the presence of digraphs, and it's more efficient
> >>for it to worry about this only when a potential digraph is discovered
> >>than to worry about it whenever `<' is discovered.
> >I don't even understand this comment. Can you please give an example ?
> Normally, when the preprocessor is processing a macro, it just copies
> the definiens, looking for identifiers and `#'; the other characters
> can just be copied through as-is, without tokenization. I'm omitting
> details about whitespace, backslash-newline, strings, trigraphs,
> multibyte characters, and so on; but the basic idea is that a
> character-oriented preprocessor needn't worry about tokenizing when
> it's analyzing the definiens of, say, `#define f(a,b) (a--<<--b)';
> it can just copy the `--<<--' through without worrying about token
> boundaries. This simplifies the writing of the preprocessor, and
> makes it a tad faster.
I think that a preprocessor *should* tokenize for correct parsing. Of course,
it can delay tokenizing until needed. That means you tokenize up to '#define
f(' to recognize that it is a function macro, but can keep the rest of the
line as a string. I also think that a compiler should be integrated with the
preprocessor for speed (instead of reading the output of the preprocessor). Then
it can convert a pp-token to token without tokenizing again and first time
tokenization is not wasted. I know that there is a place for a stand-alone
preprocessor, but then it must first *correctly* parse.
> The same basic idea can be used even in the presence of digraphs like
> `%:', but it gets trickier. When you're analyzing
> `#define f(a,b) (a<<<<<<<<<<<<%:b)', you must count the number of
> `<'s before the `%:' to see whether the `%:' is really a `#'. Ugh.
Here the problem is that you are searching for a token (%:) without
tokenizing!! These kinds of short-cuts (so-called optimizations) make the
maintenance of a lexer/parser impossible. If a language evolves, then you
have to visit the entire list of such short-cuts to see if they are still
valid. For example, C may add a new operator <<< (like Java) or alternate
spelling for a punctuator character (did anybody say digraph)? I know you are
a seasoned programmer and I do not want to sound condescending, but I
feel strongly about this from my experience.
> I have the distinct impression that the people who proposed digraphs
> never implemented a character-oriented preprocessor for them, and I
> have the sneaking suspicion that they didn't build a token-based
> preprocessor either.
I do not agree with you! Whether digraphs are useful to anybody is a
separate topic, but I do think that a well-written lexer should be
capable of handling a digraph (just another token).
> I'm afraid that digraphs were an example of
> specify now, implement later -- which is backwards from what the C
> standard ought to be.
-- Saroj Mahapatra
Thank you for the vacuous advice. Since I have actively participated in
this thread since its beginning, I don't see that it applies here.
>I also think that a compiler should be integrated with the
>preprocessor for speed (instead of reading the output of preprocessor).
GCC currently has a build-time option that will let you substitute an
integrated preprocessor that does tokenization. Unfortunately, if you
select that option, GCC becomes buggier and slower. (So much for theory. :-)
If you'd like to help rectify this situation, I can put you in touch
with the maintainer for the integrated-preprocessor option. He's
gradually making it faster and more reliable. I'm sure that he
would appreciate some help.
>Here the problem is that you are searching for a token (%:) without
I would say that the problem is that the people who added digraphs
didn't understand how a character-based preprocessor works.
Clearly you're in the ``all preprocessors should tokenize'' camp,
so you don't care whether someone changes the standard to render
character-based preprocessors infeasible. However, the standard
should be more catholic -- it should cater to existing practice,
and this includes both kinds of preprocessors.
But such a compiler clearly does not conform to the requirements of
the C standard, which *does* define what constitutes an input line.
Well, then, please pay attention.
Sure we did. In fact we debated the very point, and the proponents
of digraphs-as-tokens prevailed. I'm sorry you weren't participating,
as the outcome might then have been more to your liking.
Personally I don't think there was *ever* a need for trigraphs,
digraphs, or \u-escapes in the C standard. How input characters
are coded should never have been a C language issue.
Thank you for your input. I'll give it the consideration it deserves.
>Jerry Coffin wrote:
Where? I can't find this definition. What I can find is (5.2.1)
In source files, there shall be some way of indicating the
end of each line of text; this International Standard treats
such an end-of-line indicator as if it were a single new-line
character.
My implementation for the Kludge 9000 super-duper computer is that the
end of line character is a line-feed character if
it occurs on a line that begins with any number of spaces and
tabs followed by a #
or it is followed by any number of spaces and tabs followed by a #.
In other contexts line-feed in a source file is translated to space.
Thus the program
Michael M Rubenstein
Please re-read section 5.1.1.2 of the standard, with an emphasis on
phase 1 of translation. Pay close attention to the fact that the
standard says new-line characters will be introduced as a substitution
for "end-of-line indicators", but carefully does NOT define what
constitutes an end-of-line indicator. After doing so, attempt to find
a single part of the standard that is violated by a phase-one mapping
such as the following:
\\$ -> <nothing>
\n[ \t\r\n]*#\(.*\)$ -> \n#\1
\n -> <nothing>
I've looked several times for a limitation on the mapping done in
phase one of translation, and I can't find one. Absent such a
limitation, I believe the mapping given above is legal. It results in
each preprocessor line being a line by itself, and virtually
everything else appearing as one really LONG line.
I think that third line had better be
\n -> <space>