
Adjacent string literals


James Kuyper

Jan 25, 2021, 10:15:28 AM
I learned a couple of decades ago that adjacent string literals get
concatenated into a single longer literal, even if separated by
arbitrarily large amounts of white-space.

Yesterday I happened to notice that translation phase 6 says only that
"Adjacent string literal tokens are concatenated.", without saying
anything about white-space. White-space doesn't lose its significance
until translation phase 7. Therefore, string literals that are separated
by white-space do not qualify as adjacent. There's also no mention of
white-space in the fuller discussion that occurs in 6.4.5p5.

Am I missing something obvious here? I can imagine someone telling me
that "adjacent" should be understood as "adjacent, ignoring white-space"
- but that doesn't seem obvious to me. It also sounds vaguely familiar,
like I've had this discussion with someone before, but I can't locate
the discussion. Every example of adjacent string literals that appears
in the standard has at least one white-space character separating them,
so the intent is crystal-clear, but the wording doesn't clearly say so.

If the phrase "White-space characters separating tokens are no longer
significant." were moved from the beginning of the description of phase
7 to the beginning of the description of phase 6, it would make the
insignificance of white space separating string literals perfectly
clear, and as far as I can see, would have no other effect.
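
For anyone who wants to check the behaviour being described, here is a
minimal sketch, assuming a C11 or later compiler (for _Static_assert).
The array and function names are made up for illustration. It spells the
same two literals with no white space, with one space, and with a
new-line between them:

```c
#include <string.h>

/* No white space at all between the two string literal tokens. */
static const char touching[] = "a""b";

/* A single space between the tokens. */
static const char spaced[] = "a" "b";

/* A new-line plus indentation between the tokens. */
static const char split[] = "a"
                            "b";

/* Each array holds 'a', 'b' and the terminating null: three bytes. */
_Static_assert(sizeof touching == 3, "touching literals are concatenated");
_Static_assert(sizeof spaced == 3, "space-separated literals are concatenated");
_Static_assert(sizeof split == 3, "new-line-separated literals are concatenated");

/* Returns 1 if all three spellings produced the same string. */
int all_spellings_agree(void)
{
    return strcmp(touching, spaced) == 0 && strcmp(spaced, split) == 0;
}
```

If the three spellings were treated differently, the static assertions
would fail at compile time rather than at run time.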

Ben Bacarisse

Jan 26, 2021, 7:22:54 AM
James Kuyper <james...@alumni.caltech.edu> writes:

> I learned a couple of decades ago that adjacent string literals get
> concatenated into a single longer literal, even if separated by
> arbitrarily large amounts of white-space.
>
> Yesterday I happened to notice that translation phase 6 says only that
> "Adjacent string literal tokens are concatenated.", without saying
> anything about white-space. White-space doesn't lose its significance
> until translation phase 7. Therefore, string literals that are separated
> by white-space do not qualify as adjacent. There's also no mention of
> white-space in the fuller discussion that occurs in 6.4.5p5.
>
> Am I missing something obvious here? I can imagine someone telling me
> that "adjacent" should be understood as "adjacent, ignoring white-space"
> - but that doesn't seem obvious to me.

Surely it just means "next to", and in the sequence of tokens "a" "b"
the two are next to each other. It happens that string literal tokens
are such that they can be adjacent without having any white-space
between them, but I suspect that's making you over-think the meaning.
Would you say that 'long int x' has no tokens adjacent to any others?

--
Ben.

Jakob Bohm

Jan 26, 2021, 7:48:28 AM
The interesting situation is cases like these:

"a" /* Long comment explaining why b is the next byte */ "b"

And

#define LEAD_BYTE "a"
#define TRAIL_BYTE "b"

LEAD_BYTE TRAIL_BYTE

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Richard Damon

Jan 26, 2021, 7:52:45 AM
I'm not sure, but 6.4p3 says

As described in 6.10, in certain circumstances during translation phase
4, white space (or the absence thereof) serves as more than
preprocessing token separation.

which seems to imply that for most purposes (unless expressly stated)
white-space between tokens is generally insignificant. There are cases
where it matters, like the difference between

#define macro(x) (x)
and
#define macro (x) (x)

but these cases explicitly talk about the white-space affecting the
meaning. This would seem to at least imply that it is to be ignored
elsewhere, and thus the white-space between literals doesn't mean they
aren't adjacent.
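
A small sketch of the phase 4 case where white space does matter. This
is a variation on the two definitions above, not the same macros: the
names WITH_PARAM and NO_PARAM, the variable x, and the "+ 1" are
assumptions added so that the two forms give visibly different results.

```c
/* A file-scope variable that happens to be called x. */
static const int x = 10;

/* Function-like macro: '(' immediately follows the macro name,
   so x is a parameter and the replacement list is "(x) + 1". */
#define WITH_PARAM(x) (x) + 1

/* Object-like macro: the space after the name ends the name,
   so the replacement list is "(x) + 1" and x is the variable above. */
#define NO_PARAM (x) + 1

/* Expands to (5) + 1. */
int function_like_result(void)
{
    return WITH_PARAM(5);
}

/* Expands to (x) + 1, using the file-scope x. */
int object_like_result(void)
{
    return NO_PARAM;
}
```

The only textual difference between the two definitions is whether the
opening parenthesis touches the macro name, which is the explicit
white-space sensitivity that 6.10 spells out.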

It would seem that the removal of the possible significance could have
been moved up earlier (but has to be after phase 4, since that has an
explicit use of white-space). As far as I can see, phases 5 and 6 don't
need the white-space significance, but maybe the fact that phase 7 also
converts preprocessing tokens into tokens says that we want to handle all
the string literal stuff before doing that.

James Kuyper

Jan 26, 2021, 9:29:35 AM
On 1/26/21 7:22 AM, Ben Bacarisse wrote:
No, I would not - and that's precisely because "long int x" is not
parsed as a declaration until translation phase 7, and the very first
sentence of the description of that phase says "White-space characters
separating tokens are no longer significant.". Phase 6 occurs before
that sentence applies, which is precisely my point.

Keith Thompson

Jan 26, 2021, 4:05:47 PM
Sorry, but those cases aren't particularly interesting. Comments are
replaced by spaces in translation phase 3, and macros are expanded in
phase 4. Adjacent string literals are concatenated in phase 6.
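
The comment and macro cases can be checked directly. A minimal sketch,
assuming a C11 or later compiler; LEAD_BYTE and TRAIL_BYTE come from the
earlier post, while the array and function names are hypothetical:

```c
#include <string.h>

#define LEAD_BYTE "a"
#define TRAIL_BYTE "b"

/* Phase 3 replaces the comment with one space, leaving "a" "b". */
static const char via_comment[] =
    "a" /* Long comment explaining why b is the next byte */ "b";

/* Phase 4 expands both macros, again leaving "a" "b". */
static const char via_macros[] = LEAD_BYTE TRAIL_BYTE;

/* By phase 6 both are two string literal tokens separated only by
   white space, so each becomes one three-byte array. */
_Static_assert(sizeof via_comment == 3, "comment case is concatenated");
_Static_assert(sizeof via_macros == 3, "macro case is concatenated");

/* Returns 1 if both spellings produced the string "ab". */
int both_are_ab(void)
{
    return strcmp(via_comment, "ab") == 0 && strcmp(via_macros, "ab") == 0;
}
```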

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Ben Bacarisse

Jan 26, 2021, 4:40:22 PM
Jakob Bohm <jb-u...@wisemo.com.invalid> writes:

> On 2021-01-26 13:22, Ben Bacarisse wrote:
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other. It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>>
>
> The interesting situation is cases like these:
>
> "a" /* Long comment explaining why b is the next byte */ "b"

By translation phase 6 (when adjacent string literals are concatenated)
this has become

"a" "b"

> And
>
> #define LEAD_BYTE "a"
> #define TRAIL_BYTE "b"
>
> LEAD_BYTE TRAIL_BYTE

And this has become

"a" "b"

Am I missing some ambiguity?

--
Ben.

Ben Bacarisse

Jan 26, 2021, 4:46:13 PM
I meant at the stage you were asking about: phase 6. The example was an
attempt to find out if your reluctance to see "a" "b" as being adjacent
was in part due to the fact that they could have been written
with no spaces.

I think your answer makes it clear that, at phase 6, you think that
there are no two tokens adjacent to one another. I find that a rather
artificial reading.

--
Ben.

James Kuyper

Jan 26, 2021, 6:28:24 PM
On 1/26/21 4:46 PM, Ben Bacarisse wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
>
>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
...
>> No, I would not - and that's precisely because "long int x" is not
>> parsed as a declaration until translation phase 7, and the very first
>> sentence of the description of that phase says "White-space characters
>> separating tokens are no longer significant.". Phase 6 occurs before
>> that sentence applies, which is precisely my point.
>
> I meant at the stage you were asking about: phase 6. The example was an
> attempt to find out if your reluctance to see "a" "b" as being adjacent
> was in part due to the fact that they could have been written
> with no spaces.

Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
not, because both are adjacent to some white-space instead. I'm not
suggesting that the committee intended to prohibit white space between
the tokens, merely that the wording chosen doesn't clearly allow it.

> I think your answer makes it clear that, at phase 6, you think that
> there are no two tokens adjacent to one another. I find that a rather
> artificial reading.

If they had used the term "consecutive", I could have seen that as a
reasonable interpretation. "a" is one token, and "b" is the next token,
even though they are separated by something, because that something
isn't a token.

Ben Bacarisse

Jan 26, 2021, 8:16:16 PM
James Kuyper <james...@alumni.caltech.edu> writes:

> On 1/26/21 4:46 PM, Ben Bacarisse wrote:
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> ...
>>> No, I would not - and that's precisely because "long int x" is not
>>> parsed as a declaration until translation phase 7, and the very first
>>> sentence of the description of that phase says "White-space characters
>>> separating tokens are no longer significant.". Phase 6 occurs before
>>> that sentence applies, which is precisely my point.
>>
>> I meant at the stage you were asking about: phase 6. The example was an
>> attempt to find out if your reluctance to see "a" "b" as being adjacent
>> was in part due to the fact that they could have been written
>> with no spaces.
>
> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
> not, because both are adjacent to some white-space instead.

Adjacent does not mean with nothing in between (though it can, of
course). What's more, things can be adjacent to each other, and also
adjacent to something in between. I can say that there was a fire in
the house adjacent to mine. The two houses are adjacent. But both are
adjacent to the lane separating them.

<cut>
--
Ben.

James Kuyper

Jan 26, 2021, 10:48:38 PM
On 1/26/21 8:16 PM, Ben Bacarisse wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
...
>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>> not, because both are adjacent to some white-space instead.
>
> Adjacent does not mean with nothing in between (though it can, of
> course). What's more, things can be adjacent to each other, and also
> adjacent to something in between. I can say that there was a fire in
> the house adjacent to mine. The two houses are adjacent. But both are
> adjacent to the lane separating them.

It takes at least two dimensions for the issue you raise to come up. As
far as the C standard is concerned, source code is a one-dimensional
sequence of characters. It's possible to think of the text
two-dimensionally, but the standard doesn't make use of that fact in any
way that I'm aware of. I don't think anyone would suggest that two
string literals that are vertically adjacent to each other:

const char *first = "James";
const char *second = "Kuyper";

should be merged.
Even if you acknowledge only that this is one possible way of
interpreting "adjacent", that would mean the meaning is ambiguous.
Moving the first sentence of translation phase 7 to be the first
sentence of translation phase 6 would remove all ambiguity, and have, as
far as I can see, no other consequence.

Ben Bacarisse

Jan 27, 2021, 10:46:51 AM
James Kuyper <james...@alumni.caltech.edu> writes:

> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>> James Kuyper <james...@alumni.caltech.edu> writes:
> ...
>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>> not, because both are adjacent to some white-space instead.
>>
>> Adjacent does not mean with nothing in between (though it can, of
>> course). What's more, things can be adjacent to each other, and also
>> adjacent to something in between. I can say that there was a fire in
>> the house adjacent to mine. The two houses are adjacent. But both are
>> adjacent to the lane separating them.
>
> It takes at least two dimensions for the issue you raise to come up.

I don't follow. 1 and 2 are adjacent integers on the real line
(i.e. despite having other kinds of number between them). In addition,
they are both integers adjacent to 1/2.

> As
> far as the C standard is concerned, source code is a one-dimensional
> sequence of characters. It's possible to think of the text
> two-dimensionally, but the standard doesn't make use of that fact in any
> way that I'm aware of. I don't think anyone would suggest that two
> string literals that are vertically adjacent to each other:
>
> const char *first = "James";
> const char *second = "Kuyper";
>
> should be merged.
> Even if you acknowledge only that this is one possible way of
> interpreting "adjacent", that would mean the meaning is ambiguous.

Lots of words in the standard could, at a pinch, be taken to mean
something other than what is obviously intended. But if you think
someone might read about phase 6 and think that "a""b" will be
concatenated but not "a" "b", then you should file a defect report.

> Moving the first sentence of translation phase 7 to be the first
> sentence of translation phase 6 would remove all ambiguity, and have, as
> far as I can see, no other consequence.

I think the strongest case for the possibility of misunderstanding comes
from this sentence being where it is. I don't see any problem with the
word "adjacent", but I can imagine someone wondering why this sentence
is where it is if not to do what you are suggesting.

--
Ben.

James Kuyper

Jan 27, 2021, 11:20:47 AM
On 1/27/21 10:46 AM, Ben Bacarisse wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
>
>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>> James Kuyper <james...@alumni.caltech.edu> writes:
>> ...
>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>> not, because both are adjacent to some white-space instead.
>>>
>>> Adjacent does not mean with nothing in between (though it can, of
>>> course). What's more, things can be adjacent to each other, and also
>>> adjacent to something in between. I can say that there was a fire in
>>> the house adjacent to mine. The two houses are adjacent. But both are
>>> adjacent to the lane separating them.
>>
>> It takes at least two dimensions for the issue you raise to come up.
>
> I don't follow. 1 and 2 are adjacent integers on the real line
> (i.e. despite having other kinds of number between them). In addition,
> they are both integers adjacent to 1/2.

I'm not familiar with any meaning that could reasonably be attached to
"adjacent" which would make either of those statements true. In the
future, I will try to remember that there's at least one person who does
attach such a meaning to that word - but it would make it easier for me
to understand how you could say such a thing if you would specify that
definition.

When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
while also being adjacent to each other, how do you interpret "adjacent
string literal" so that it doesn't apply to

ptrdiff_t d = "Ben"-"Bacarisse";

It seems to me that, despite having no idea how you could possibly mean
what you seem to have said, I can make a direct analogy, matching 1 with
"Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
break down? Or are you claiming that they should be concatenated?

...
>> Moving the first sentence of translation phase 7 to be the first
>> sentence of translation phase 6 would remove all ambiguity, and have, as
>> far as I can see, no other consequence.
>
> I think the strongest case for the possibility of misunderstanding comes
> from this sentence being where it is. I don't see any problem with the
> word "adjacent", but I can imagine someone wondering why this sentence
> is where it is if not to do what you are suggesting.

I think you just agreed with me, but you didn't quite say so directly.

Ben Bacarisse

Jan 27, 2021, 10:05:28 PM
James Kuyper <james...@alumni.caltech.edu> writes:

> On 1/27/21 10:46 AM, Ben Bacarisse wrote:
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>>> James Kuyper <james...@alumni.caltech.edu> writes:
>>> ...
>>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>>> not, because both are adjacent to some white-space instead.
>>>>
>>>> Adjacent does not mean with nothing in between (though it can, of
>>>> course). What's more, things can be adjacent to each other, and also
>>>> adjacent to something in between. I can say that there was a fire in
>>>> the house adjacent to mine. The two houses are adjacent. But both are
>>>> adjacent to the lane separating them.
>>>
>>> It takes at least two dimensions for the issue you raise to come up.
>>
>> I don't follow. 1 and 2 are adjacent integers on the real line
>> (i.e. despite having other kinds of number between them). In addition,
>> they are both integers adjacent to 1/2.
>
> I'm not familiar with any meaning that could reasonably be attached to
> "adjacent" which would make either of those statements true.

That's an interesting view, but probably so off-topic that it would not be
reasonable to investigate it here.

> In the future, I will try to remember that there's at least one person
> who does attach such a meaning to that word - but it would make it
> easier for me to understand how you could say such a thing if you
> would specify that definition.

I am not a lexicographer, and not skilled at writing definitions. So I
looked in the two dictionaries on the shelf here. The OED says:

"Lying near to; adjoining; bordering. (Not necessarily touching.)"

and Collins says

"being near or close, esp. having a common boundary; adjoining;
contiguous."

These are pretty close to what I feel the word means.

For comparison, what is your understanding of the word?

> When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
> while also being adjacent to each other, how do you interpret "adjacent
> string literal" so that it doesn't apply to
>
> ptrdiff_t d = "Ben"-"Bacarisse";
>
> It seems to me that, despite having no idea how you could possibly mean
> what you seem to have said, I can make a direct analogy, matching 1 with
> "Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
> break down? Or are you claiming that they should be concatenated?

It depends on what is considered significant and what is merely a
separator or common boundary.

On the number line, we can stress what we want to focus on. "Adjacent
/integers/" relegates everything else to being a mere separating
boundary.

So, to push the point to the edge of reason, if I choose to read the key
sentence as "Adjacent /string literal/ tokens are concatenated", I
could, at a pinch, make the case that "Ben" and "Bacarisse" are, in your
example, adjacent. The context would have to be such that considering
another token as a mere boundary or separator would be reasonable. The
C standard is not such a context.

But if I read it as "Adjacent string literal /tokens/ are concatenated",
then the intervening token stops them being adjacent. When tokenising a
character stream, all the tokens matter, so I believe there is only one
reasonable way to read that sentence.
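
The effect of an intervening token is easy to see with an ordinary
initializer list. A minimal sketch with hypothetical names; the only
difference between the two arrays is a comma, which is a token, versus
white space, which is not:

```c
#include <stddef.h>
#include <string.h>

/* The comma token sits between the literals: two separate elements. */
static const char *const separated[] = { "Ben", "Bacarisse" };

/* Only white space between the literals: they are concatenated
   into a single element. */
static const char *const merged[] = { "Ben" "Bacarisse" };

/* Number of elements in the comma-separated array. */
size_t separated_count(void)
{
    return sizeof separated / sizeof separated[0];
}

/* Number of elements in the array with the comma left out. */
size_t merged_count(void)
{
    return sizeof merged / sizeof merged[0];
}

/* Returns 1 if the single merged element is "BenBacarisse". */
int merged_is_joined(void)
{
    return strcmp(merged[0], "BenBacarisse") == 0;
}
```

This is also the familiar missing-comma pitfall in tables of strings:
dropping one comma silently shortens the array by one element.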

> ...
>>> Moving the first sentence of translation phase 7 to be the first
>>> sentence of translation phase 6 would remove all ambiguity, and have, as
>>> far as I can see, no other consequence.
>>
>> I think the strongest case for the possibility of misunderstanding comes
>> from this sentence being where it is. I don't see any problem with the
>> word "adjacent", but I can imagine someone wondering why this sentence
>> is where it is if not to do what you are suggesting.
>
> I think you just agreed with me, but you didn't quite say so directly.

Agreement is not binary. I don't find your argument based on what
adjacent means to be compelling, but I agree that the presence of that
sentence one phase too late muddies the waters a bit.

I've tried to express the extent and the nature of my agreement (and
disagreement) as directly as I can. I'm sorry if you think I have been
oblique.

TL;DR: The fact that adjacent means something in the cluster of ideas
around "being near to" and "having a common boundary, but not
necessarily touching" means that I don't think there is any problem with
"a" "b" being described as adjacent string literal tokens.

--
Ben.

Jakob Bohm

Jan 28, 2021, 3:53:36 AM
Sorry, but I couldn't easily find the definition of the translation
phases, only scattered mentions of "phase 6" and "phase 7", so I had to
guess which practically related language features were buried in that
distinction.

James Kuyper

Jan 28, 2021, 5:45:50 AM
"5.1.1.2 Translation Phases
The precedence among the syntax rules of translation is specified by the
following phases. 6)
1. Physical source file multibyte characters are mapped, in an
implementation-defined manner, to the source character set (introducing
new-line characters for end-of-line indicators) if necessary. Trigraph
sequences are replaced by corresponding single-character internal
representations.
2. Each instance of a backslash character (\) immediately followed by a
new-line character is deleted, splicing physical source lines to form
logical source lines. Only the last backslash on any physical source
line shall be eligible for being part of such a splice. A source file
that is not empty shall end in a new-line character, which shall not be
immediately preceded by a backslash character before any such splicing
takes place.
3. The source file is decomposed into preprocessing tokens 7) and
sequences of white-space characters (including comments). A source file
shall not end in a partial preprocessing token or in a partial comment.
Each comment is replaced by one space character. New-line characters are
retained. Whether each nonempty sequence of white-space characters other
than new-line is retained or replaced by one space character is
implementation-defined.
4. Preprocessing directives are executed, macro invocations are
expanded, and _Pragma unary operator expressions are executed. If a
character sequence that matches the syntax of a universal character name
is produced by token concatenation (6.10.3.3), the behavior is
undefined. A #include preprocessing directive causes the named header or
source file to be processed from phase 1 through phase 4, recursively.
All preprocessing directives are then deleted.
5. Each source character set member and escape sequence in character
constants and string literals is converted to the corresponding member
of the execution character set; if there is no corresponding member, it
is converted to an implementation-defined member other than the null
(wide) character. 8)
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant.
Each preprocessing token is converted into a token. The resulting tokens
are syntactically and semantically analyzed and translated as a
translation unit.
8. All external object and function references are resolved. Library
components are linked to satisfy external references to functions and
objects not defined in the current translation. All such translator
output is collected into a program image which contains information
needed for execution in its execution environment."

The referenced footnotes are:
"6) Implementations shall behave as if these separate phases occur, even
though many are typically folded together in practice. Source files,
translation units, and translated translation units need not necessarily
be stored as files, nor need there be any one-to-one correspondence
between these entities and any external representation. The description
is conceptual only, and does not specify any particular implementation.
7) As described in 6.4, the process of dividing a source file’s
characters into preprocessing tokens is context-dependent. For example,
see the handling of < within a #include preprocessing directive.
8) An implementation need not convert all non-corresponding source
characters to the same execution character."
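
Two consequences of the phase ordering quoted above can be sketched in
a few lines. This assumes a C11 or later compiler, and the array names
are made up for illustration. The second case mirrors the "\x12" "3"
example given in the standard's discussion of string literals.

```c
#include <string.h>

/* Phase 2: the backslash-newline pair is deleted before tokenizing,
   so the two physical lines form one string literal token. */
static const char spliced[] = "ab\
cd";

/* Phase 5 runs before phase 6: the escape sequence \x12 is converted
   to a single character first, and only then are the two literals
   concatenated. The result is the characters 0x12 and '3', not the
   single escape sequence \x123. */
static const char escaped[] = "\x12" "3";

_Static_assert(sizeof spliced == 5, "a, b, c, d and the terminating null");
_Static_assert(sizeof escaped == 3, "0x12, '3' and the terminating null");

/* Returns 1 if the spliced literal reads "abcd". */
int spliced_is_abcd(void)
{
    return strcmp(spliced, "abcd") == 0;
}
```

Note that the continuation line of the spliced literal starts in column
one; any indentation there would become part of the string.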

Tim Rentsch

Jul 10, 2021, 11:49:09 AM
The word "adjacent" doesn't always mean touching. There is another
word for that, the word "adjoining". Booking a hotel reservation
for adjacent rooms is not the same as a reservation for adjoining
rooms.

Keith Thompson

Jul 10, 2021, 5:59:03 PM
Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
> James Kuyper <james...@alumni.caltech.edu> writes:
[...]
>> If the phrase "White-space characters separating tokens are no longer
>> significant." were moved from the beginning of the description of phase
>> 7 to the beginning of the description of phase 6, it would make the
>> insignificance of white space separating string literals perfectly
>> clear, and as far as I can see, would have no other effect.
>
> The word "adjacent" doesn't always mean touching. There is another
> word for that, the word "adjoining". Booking a hotel reservation
> for adjacent rooms is not the same as a reservation for adjoining
> rooms.

That's not entirely clear. dictionary.com (not a definitive reference
but a convenient one) shows "adjoining" as one of the definitions of
"adjacent".

If I understand you correctly, if rooms 110 and 112 share a common wall,
perhaps with a door going between them, they're both adjacent and
adjoining, but if instead they're on opposite sides of the elevator
they're adjacent but not adjoining. Is that what you meant? I'm not
sure I'd call them "adjacent" in that case.

A footnote on "Adjacent string literals are concatenated" saying that
two string literals are adjacent if they're adjoining or separated only
by white-space characters would clear this up. Moving "White-space
characters separating tokens are no longer significant." from the
beginning of phase 7 to the beginning of phase 6 would also be a good
solution.

But given the clear examples, I wouldn't object to leaving it as it is.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips

James Kuyper

Jul 11, 2021, 2:41:50 PM
But, if it doesn't mean "touching", what does it mean? If a blank space
doesn't prevent them from being adjacent, what does? How do you
draw the line between things that do prevent two string literals from
being adjacent, and things that don't? And - most importantly, where
in the actual text of the standard does it clearly make that distinction?
I contend that it doesn't clearly make that distinction anywhere, but
that moving the sentence "White-space characters separating
tokens are no longer significant." from the beginning of phase 7 to
the beginning of phase 6 would remove all ambiguity, making the text
match the way all real world implementations actually handle this
issue, and would have no other effect. Do you disagree? If so, with
which part of what I just said, and for what reason?

Tim Rentsch

Jul 22, 2021, 1:29:37 PM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
>
>> James Kuyper <james...@alumni.caltech.edu> writes:
>
> [...]
>
>>> If the phrase "White-space characters separating tokens are no longer
>>> significant." were moved from the beginning of the description of phase
>>> 7 to the beginning of the description of phase 6, it would make the
>>> insignificance of white space separating string literals perfectly
>>> clear, and as far as I can see, would have no other effect.
>>
>> The word "adjacent" doesn't always mean touching. There is another
>> word for that, the word "adjoining". Booking a hotel reservation
>> for adjacent rooms is not the same as a reservation for adjoining
>> rooms.
>
> That's not entirely clear. dictionary.com (not a definitive reference
> but a convenient one) shows "adjoining" as one of the definitions of
> "adjacent".

That's consistent with what I said: "adjoining" being only one
of the definitions is consistent with saying "adjacent" doesn't
_always_ mean touching. Words in English can be ambiguous in
their meanings.

> If I understand you correctly, if rooms 110 and 112 share a common wall,
> perhaps with a door going between them, they're both adjacent and
> adjoining,

In the case of hotels I think "adjoining" always means connected,
either with or perhaps without a door, but yes.

> but if instead they're on opposite sides of the elevator
> they're adjacent but not adjoining. Is that what you meant? I'm not
> sure I'd call them "adjacent" in that case.

A better example is a small utility closet rather than an
elevator. "Adjacent" usually implies "closeness" even if
it doesn't always mean touching, and two rooms with a bank
of four elevators between them would for most people not
be considered adjacent, I think. In the case of hotel
rooms at least it's a matter of degree.

Another example is two rooms having the same latitude and
longitude, but on different (consecutive) floors. I think most
people wouldn't call those rooms "adjacent". However, if there
is a connecting stairway between them, a hotel might very well
offer them as "adjoining rooms".

> A footnote on "Adjacent string literals are concatenated" saying that
> two string literals are adjacent if they're adjoining or separated only
> by white-space characters would clear this up. Moving "White-space
> characters separating tokens are no longer significant." from the
> beginning of phase 7 to the beginning of phase 6 would also be a good
> solution.
>
> But given the clear examples, I wouldn't object to leaving it as it is.

Given that the wording lasted more than 30 years without anyone
even noticing a problem, I think the case for leaving it alone
is decidedly stronger than the case for making a change.

Tim Rentsch

Jul 22, 2021, 6:26:22 PM
In hotels, normally it means on the same floor and with no
intervening rooms or other major building structures (but small
things like utility closets don't count). In a country inn where
there are standalone cottages rather than rooms, two cottages
would normally be called adjacent if there were no other cottages
in between, and the cottages in question were not inordinately far
apart.

In the C standard it means having no intervening tokens.

> If a blank space
> doesn't prevent them from being adjacent, what does?

Another token (not a string literal token, presumably, but only
because we might consider a sequence of string literal tokens
to be "adjacent tokens").

> How do you
> draw the line between things that do prevent two string literals from
> being adjacent, and things that don't?

In the text of the C standard, the word "adjacent" is an adjective
modifying the noun "tokens", and hence tokens are what matters.
The line is drawn by normal English usage.

> And - most importantly, where in the actual text of the standard
> does it clearly make that distinction?

That depends in part on one's notion of what it means "to clearly
make" a distinction. Speaking for myself, the combination of
"adjacent" modifying "tokens" and the examples given in 6.4.5 make
the distinction quite clearly enough.

> I contend that it doesn't clearly make that distinction anywhere,

If I may make a suggestion, how you read the C standard doesn't
match the reading mode expected by its authors. The C standard
wasn't written for a target audience of lawyers or mathematicians;
it was written by practical software developers who expected it to be
read by other practical software developers. The issue suggested here is
way below their radar, and indeed way below the radar of most
people who read the C standard. If no one else has noticed it in
more than 30 years, what does that say about how clear or unclear
the distinction is?

> but
> that moving the sentence "White-space characters separating
> tokens are no longer significant." from the beginning of phase 7 to
> the beginning of phase 6 would remove all ambiguity, making the text
> match the way all real world implementations actually handle this
> issue, and would have no other effect. Do you disagree?

I don't either agree or disagree, because I think the extremely
low probability of anyone being confused makes it not worth the
effort of investigating the question.

> If so, with which part of what I just said, and for what reason?

If there is something I disagree with, I think it's the idea that
attempting to "clarify" the language here will necessarily result
in a net benefit. Consider for example the C++ standard: its
authors apparently strive for exact and precise (and presumably
ambiguity free) phrasing, but the result is an unreadable mess.
To me it seems obvious that the writing in the C standard is much
closer to a good balance point between being formally exact and
being understandable. From my point of view, if writing in the C
standard (or other similar standards) isn't understandable, it's
useless, no matter how precise or exact it is. In this particular
case I would say the current wording is definitely on the right
side of the line.

James Kuyper

Jul 22, 2021, 8:29:20 PM
On Thursday, July 22, 2021 at 6:26:22 PM UTC-4, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
> > On Saturday, July 10, 2021 at 11:49:09 AM UTC-4, Tim Rentsch wrote:
...
> >> The word "adjacent" doesn't always mean touching. There is another
> >> word for that, the word "adjoining". Booking a hotel reservation
> >> for adjacent rooms is not the same as a reservation for adjoining
> >> rooms.
> >
> > But, if it doesn't mean "touching", what does it mean?
> In hotels, normally it means on the same floor and with no
> intervening rooms or other major building structures (but small
> things like utility closets don't count). In a country inn where
> there are standalone cottages rather than rooms, two cottages
> would normally be called adjacent if there were no other cottages
> in between, and the cottages in question were not inordinately far
> apart.
>
> In the C standard it means having no intervening tokens.
> > If a blank space
> > doesn't prevent them from being adjacent, what does?
> Another token (not a string literal token, presumably, ...

I think your wording got a little confused there. In "A""B""C", the "B"
string literal token definitely does prevent the "A" and "C" string literal
tokens from being considered adjacent. An implementation would
certainly be non-conforming if it concatenated "A" directly to "C" without
first concatenating one or the other with "B".
The following wording may be intended to address that issue:

> ... but only
> because we might consider a sequence of string literal tokens
> to be "adjacent tokens").

but it's not very clear that it does. The simpler approach is to say that
the one thing that unambiguously DOES prevent two string literal tokens
from being considered adjacent is another string literal token. The only
real question is whether there's anything else that does so.
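
For reference, a short sketch of the uncontroversial part: the three
literals are combined in order, whether or not anything separates them.
The object and function names are made up for illustration, and the
second definition relies on the usual behaviour of implementations with
respect to white-space, which is exactly the point under discussion.

```c
#include <stddef.h>

/* Three literals with nothing at all between them. */
static const char touching[] = "A""B""C";

/* The same three literals separated by spaces and a new-line. */
static const char spaced[] = "A"  "B"
                             "C";

/* Both arrays are expected to hold 'A', 'B', 'C' and a terminating
   null character, so each size is 4. */
size_t touching_size(void) { return sizeof touching; }
size_t spaced_size(void)   { return sizeof spaced; }

const char *touching_text(void) { return touching; }
const char *spaced_text(void)   { return spaced; }
```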

It would make much more sense for pre-processing tokens to serve as
separators, rather than tokens, since tokens don't exist yet during
translation phase 6 - they don't come into existence until they are
created by conversion from pre-processing tokens during translation
phase 7. String literals are members of both categories. header-names
are removed during translation phase 4, but all of the other differences
between pre-processing tokens and tokens remain valid during phase 6.
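
The phase ordering can be seen in a small sketch: the # operator
produces a string literal during macro expansion in phase 4, and that
literal is then concatenated with its neighbour in phase 6. STR, XSTR
and BUILD are hypothetical names chosen for illustration only.

```c
#define STR(x)  #x       /* stringify the argument exactly as written */
#define XSTR(x) STR(x)   /* expand the argument first, then stringify */
#define BUILD   42       /* hypothetical value, for illustration only */

/* After phase 4 this reads "build-" "42"; phase 6 then concatenates
   the two string literals into "build-42". */
static const char label[] = "build-" XSTR(BUILD);

const char *label_text(void) { return label; }
```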

However, since white-space characters separating tokens supposedly
remain significant until translation phase 7, the same logic that favors
pre-processing tokens over tokens also favors including white-space
characters as separators. If they are still significant in phase 6, how are
they significant, if not as separators of string literal tokens? I don't claim
that this was the committee's intent (which is irrelevant to my mode of
reading the standard), only that it's an unintentional side effect of putting
the wording about white-space characters in the wrong translation
phase, which should be corrected.

...
> > I contend that it doesn't clearly make that distinction anywhere,
> If I may make a suggestion, how you read the C standard doesn't
> match the reading mode expected by its authors. ...

Your reading mode puts too much emphasis on guessing the intent of
the authors, and not enough on trying to write the text clearly enough to
avoid the need for such guesswork. You might be right that it is the
intended reading mode, but if so, I consider it a seriously flawed one.

...
> ... If no one else has noticed it in
> more than 30 years, what does that say about how clear or unclear
> the distinction is?

You can't be sure that no one else has noticed it, only that no one has
mentioned the issue in any forum that you monitor, during the time that
you have monitored it. Unless you're super-human, you could not have
come close to monitoring all forums where such an issue might have
been raised, for the entire 30 years that you refer to.

Tim Rentsch

Jan 17, 2022, 8:30:04 AM
> [...]

Apparently you have missed the point of what I was saying. That
surprises me, because I didn't think it was difficult to
understand.


>>> I contend that it doesn't clearly make that distinction anywhere,
>>
>> If I may make a suggestion, how you read the C standard doesn't
>> match the reading mode expected by its authors. ...
>
> Your reading mode puts too much emphasis on guessing the intent of
> the authors,

It's not surprising that you think so, because that view doesn't
fit with your agenda. However, judging what meaning is intended
isn't what I'm talking about when I say "reading mode".

> and not enough on trying to write the text clearly
> enough to avoid the need for such guesswork.

That's a non-sequitur. The two views are not in opposition;
they are about different kinds of discussion regarding the C
standard. They are not mutually exclusive.

> You might be right that it is the intended reading mode, but if
> so, I consider it a seriously flawed one.

If "it" refers to "judging what meaning is intended", then "it"
is independent of "reading mode" as I am using the term. (Note
also that the word I used is "expected", and not "intended", but
that distinction is not the primary point of focus.)

Let me give an example. The C standard is not a math textbook.
Most people don't read the C standard as though it were a math
textbook. Trying to read the C standard in much the same way as
one reads a math text would be a different "reading mode" than
how most people read it. Does this example help explain what I
mean by "reading mode"?


>> ... If no one else has noticed it in
>> more than 30 years, what does that say about how clear or unclear
>> the distinction is?
>
> You can't be sure that no one else has noticed it, [...]

I never said I was. The question is not what I know but what you
know. If, as far as /you/ know, no one else has noticed the
point you brought up, then it would appear that no one else is
bothered by it. Do you know of any previous instance of someone
else bringing up this question? Or is it, to the best of your
knowledge, the case that your posting here is the first such
occurrence?