Order of addresses of string-literals inside a translation unit

Bonita Montero

Feb 22, 2023, 8:30:53 AM

Is there a mandated order of addresses of string literals inside a
translation unit? Or is there only the known requirement on the order
of initialization, inside a translation unit, of objects with a
constructor? For that requirement it wouldn't make a difference if the
order of addresses were different from the order of initialization.

Richard Damon

Feb 22, 2023, 8:35:15 PM

As far as I know, there are NO promises about the order in memory of
objects declared in a program. The order of construction within a
translation unit is specified, but not their physical order.

Sub-objects within a larger object (like array elements in an array, or
members within a class/struct) have some specification.

The compiler is allowed to optimize string literals and thus put them in
a different order.
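
A minimal sketch of the distinction above, assuming a hypothetical
aggregate named Record (not from this thread). It checks the ordering
that is specified (members with the same access control, and array
elements) and deliberately checks nothing about the two separate objects
at the end, because their relative placement is not specified.

```cpp
#include <cassert>
#include <functional>

// Hypothetical aggregate, used only for illustration.
struct Record {
    int    id;
    char   tag;
    double weight;
};

// True if the members of r sit at increasing addresses in declaration
// order. The standard guarantees this for members with the same access
// control, so it should hold on any conforming implementation.
bool members_follow_declaration_order(const Record& r) {
    std::less<const void*> before;
    return before(&r.id, &r.tag) && before(&r.tag, &r.weight);
}

// True if later array elements have higher addresses than earlier ones.
bool elements_follow_index_order(const int (&values)[3]) {
    return &values[0] < &values[1] && &values[1] < &values[2];
}

// Two separate objects: which one comes first in memory is not specified,
// so no code here depends on it.
Record first_record  = {1, 'a', 1.0};
Record second_record = {2, 'b', 2.0};
```

Calling members_follow_declaration_order(first_record) is expected to
return true; comparing &first_record with &second_record would tell you
only what one particular build happened to do.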

James Kuyper

Feb 23, 2023, 12:00:56 AM

Note, in particular, that because writing to the memory allocated to
store the string corresponding to a string literal has undefined
behavior, it's entirely permissible for an implementation to make all
occurrences of "Hello, world!" in a given program point at the same
location in memory, and to make "Hello, world!" + 7 == "world!". Both
optimizations have actually been implemented by many implementations.
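
A small sketch of how one might observe these two optimizations. The
function names are made up for illustration. The first two functions
may return either true or false depending on the compiler and its
options; only the content comparison in the third is guaranteed.

```cpp
#include <cassert>
#include <cstring>

// Reports whether two occurrences of the same literal share one address.
// Unspecified: both true and false are conforming results.
bool identical_literals_share_storage() {
    const char* first  = "Hello, world!";
    const char* second = "Hello, world!";
    return first == second;
}

// Reports whether "world!" is stored as the tail of "Hello, world!".
// Also unspecified: it depends on whether the implementation overlaps them.
bool suffix_literal_overlaps() {
    const char* whole = "Hello, world!";
    const char* tail  = "world!";
    return whole + 7 == tail;
}

// Comparing contents rather than addresses is always well defined:
// the characters from index 7 onward spell "world!".
bool suffix_contents_match() {
    const char* whole = "Hello, world!";
    return std::strcmp(whole + 7, "world!") == 0;
}
```

Code that needs to know whether two strings are "the same" should use
the third form; the first two are only useful for inspecting what a
given toolchain does.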

Öö Tiib

Feb 23, 2023, 3:00:06 AM

Per the standard, two string literals are two unrelated constant
character arrays. The standard does not guarantee that the addresses of
two string literals compare greater, less or equal with each other in
any deterministic or consistent manner (regardless of contents, or of
whether the literals are from the same or different translation units).

Actual compilers do not use that much freedom. What they do is
practical: optimise unused string literals out, treat equal string
literals of different translation units as one, use the end part of the
string literal "hello word" of one translation unit as the string
literal "word" of another translation unit, and the like. That makes
the location of the literals impossible to predict.

The only strong guarantee we have is that the specialisations of
std::greater, std::less, std::less_equal and std::greater_equal for
pointer types give a consistent strict total ordering to pointers (so
also to pointers to string literals). C++14 added the transparent
std::less<> forms; the pointer specialisations themselves predate it.
Therefore good code should use those to compare pointers to unrelated
objects.
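
A short sketch of that advice, with illustrative function names. It
uses std::less<const char*> instead of the built-in < and checks the one
thing a strict total order promises: for any two pointers, exactly one
of "a before b", "b before a" and "a == b" holds. Which of the three
holds for two literals is still not predictable.

```cpp
#include <cassert>
#include <functional>

// True if exactly one of the three relations holds for a and b,
// which is what a strict total order requires.
bool exactly_one_relation_holds(const char* a, const char* b) {
    std::less<const char*> before;
    int holds = (before(a, b) ? 1 : 0)
              + (before(b, a) ? 1 : 0)
              + (a == b ? 1 : 0);
    return holds == 1;
}

// Applies the check to pointers into two unrelated string literals.
bool literals_are_consistently_ordered() {
    const char* x = "alpha";
    const char* y = "beta";
    return exactly_one_relation_holds(x, y);
}
```

Ordered containers keyed on pointers, such as std::set<const char*>, use
std::less by default, so they already get this ordering.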

Scott Lurndal

Feb 23, 2023, 11:18:16 AM

Öö Tiib <oot...@hot.ee> writes:
>On Wednesday, 22 February 2023 at 15:30:53 UTC+2, Bonita Montero wrote:
>> Is there a mandated order of addresses of string literals inside a
>> translation unit? Or is there only the known requirement on the order
>> of initialization, inside a translation unit, of objects with a
>> constructor? For that requirement it wouldn't make a difference if the
>> order of addresses were different from the order of initialization.
>
>Per the standard, two string literals are two unrelated constant
>character arrays. The standard does not guarantee that the addresses of
>two string literals compare greater, less or equal with each other in
>any deterministic or consistent manner (regardless of contents, or of
>whether the literals are from the same or different translation units).
>
>Actual compilers do not use that much freedom. What they do is
>practical: optimise unused string literals out, treat equal string
>literals of different translation units as one, use the end part of the
>string literal "hello word" of one translation unit as the string
>literal "word" of another translation unit, and the like. That makes
>the location of the literals impossible to predict.

If you do need to be able to predict them, most compilers and linkers
offer implementation-defined facilities to support that need (e.g. gcc
allows the 'section' attribute to be attached to a definition, and the
linker can be told to position that section at a particular point in
the program's address space).
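
A rough sketch of the gcc side of this, assuming GCC or Clang on an ELF
target. The section name my_strings, the macro and the array names are
all made up for illustration. The attribute applies to named objects, so
the sketch uses named arrays rather than bare string literals. Pinning
the section to a specific address still needs a linker script or linker
option, which is not shown, and the relative order of the two arrays
inside the section is still up to the toolchain.

```cpp
#include <cassert>
#include <cstring>

// Assumption: GCC-compatible compiler producing ELF. On other targets the
// macro expands to nothing and the arrays get ordinary placement.
#if defined(__GNUC__) && defined(__ELF__)
#  define IN_STRINGS_SECTION __attribute__((section("my_strings"), used))
#else
#  define IN_STRINGS_SECTION
#endif

// Named arrays initialized from literals; both are requested to be
// emitted into the same (hypothetical) section.
IN_STRINGS_SECTION const char first_msg[]  = "first";
IN_STRINGS_SECTION const char second_msg[] = "second";

// Accessors, so the rest of the program does not care where the data lives.
const char* first_message()  { return first_msg; }
const char* second_message() { return second_msg; }
```

On an ELF system, a tool such as objdump or readelf should then show a
my_strings section containing both arrays.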

Keith Thompson

Feb 23, 2023, 4:52:34 PM

That's true, but not just because string literals are read-only (and
const, unlike in C).

The standard *could* have required that two occurrences of "hello" in
the same program must refer to distinct objects with unequal addresses
(and code could have taken advantage of that guarantee). But in fact it
explicitly says that it's unspecified.

C++17 5.13.5p16 [lex.string]:

Evaluating a string-literal results in a string literal object
with static storage duration, initialized from the given
characters as specified above. Whether all string literals are
distinct (that is, are stored in nonoverlapping objects) and
whether successive evaluations of a string-literal yield the
same or a different object is unspecified. [ Note: The effect
of attempting to modify a string literal is undefined. — end
note ]

I hadn't realized until now that the standard allows two evaluations of
the same string literal to refer to distinct objects (and it's difficult
to imagine an implementation choice that would make them distinct).
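
To make the "successive evaluations" wording concrete, here is a small
sketch with a hypothetical function greeting(). Each call evaluates the
same string-literal; whether the two results are the same pointer is
unspecified by the quoted paragraph, while the characters they point to
are the same either way.

```cpp
#include <cassert>
#include <cstring>

// Each call evaluates the same string-literal expression.
const char* greeting() { return "hello"; }

// Unspecified: the standard allows successive evaluations to yield the
// same object or different objects, so this may be true or false.
bool successive_evaluations_share_object() {
    return greeting() == greeting();
}

// Guaranteed: whichever object is returned holds the same characters.
bool successive_evaluations_share_contents() {
    return std::strcmp(greeting(), greeting()) == 0;
}
```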

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Tim Rentsch

Feb 24, 2023, 11:55:33 AM

James Kuyper <james...@alumni.caltech.edu> writes:

> Note, in particular, that because writing to the memory allocated to
> store the string corresponding to a string literal has undefined
> behavior, it's entirely permissible for an implementation to make all
> occurrences of "Hello, world!" in a given program point at the same
> location in memory, and to make "Hello, world!" + 7 == "world!". Both
> optimizations have actually been implemented by many implementations.

It's true that two or more string literals may have some bytes in
common, but that's not because writing to a byte in a string
literal has undefined behavior; it's because there is an explicit
statement in the C++ standard (and also the C standard) that allows
it. If anything the implication goes the other direction: because
pre-standard C implementations stored multiple string literals in
the same memory, when C was standardized it was pretty much a
necessity that storing into the bytes of a string literal had to
be undefined behavior.

Nitpick: a string literal need not be a string (as the C standard
points out, in a footnote IIRC).

Tim Rentsch

Feb 24, 2023, 12:04:05 PM

Keith Thompson <Keith.S.T...@gmail.com> writes:

[string literals may overlap]

> The standard *could* have required that two occurrences of "hello" in
> the same program must refer to distinct objects with unequal addresses
> (and code could have taken advantage of that guarantee). But in fact it
> explicitly says that it's unspecified.
>
> C++17 5.13.5p16 [lex.string]:
>
> Evaluating a string-literal results in a string literal object
> with static storage duration, initialized from the given
> characters as specified above. Whether all string literals are
> distinct (that is, are stored in nonoverlapping objects) and
> whether successive evaluations of a string-literal yield the
> same or a different object is unspecified. [ Note: The effect
> of attempting to modify a string literal is undefined. — end
> note ]
>
> I hadn't realized until now that the standard allows two evaluations of
> the same string literal to refer to distinct objects (and it's difficult
> to imagine an implementation choice that would make them distinct).

IIANM the statement about two evaluations is present in C++
but not in C.

james...@alumni.caltech.edu

Feb 24, 2023, 12:43:32 PM

True. But the fact that storing into the bytes of a string literal has undefined
behavior is what makes storing multiple string literals in overlapping memory
workable. We're really saying the same thing, from two different points of view.

> Nitpick: a string literal need not be a string (as the C standard
> points out, in a footnote IIRC).

I agree, but not, I suspect, in the sense that you mean it. A string literal is a
source-code feature. The corresponding string that I was referring to is
created at run-time, so they can't be the same thing.
That array is guaranteed to be null-terminated, and therefore always
contains at least one string, which might be empty (as in ""). The value of a
string literal not used to initialize an array is a pointer to the first element of
that array, which is also necessarily the start of the first (and possibly only,
possibly empty) string contained in that array. It is that string which I was
referring to when I mentioned the "corresponding string".
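
A small sketch of the array-versus-string distinction, using a
hypothetical helper array_size. A literal with an embedded \0 denotes an
array that is longer than the first string it contains, which is the
sense in which a string literal "need not be a string".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of elements in the array a literal denotes, including the
// terminating null character.
template <std::size_t N>
constexpr std::size_t array_size(const char (&)[N]) { return N; }

static_assert(array_size("abc") == 4, "three characters plus terminator");
static_assert(array_size("") == 1, "only the terminator");
static_assert(array_size("ab\0cd") == 6, "embedded null counts as an element");

// Length of the first string in the array, i.e. up to the first null.
std::size_t first_string_length(const char* p) { return std::strlen(p); }
```

For "ab\0cd" the array has 6 elements, but first_string_length returns
2, because the first string ends at the embedded null.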

Tim Rentsch

Mar 14, 2023, 11:48:29 AM

"james...@alumni.caltech.edu" <james...@alumni.caltech.edu> writes:

> On Friday, February 24, 2023 at 11:55:33 AM UTC-5, Tim Rentsch wrote:
>
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> Note, in particular, that because writing to the memory allocated to
>>> store the string corresponding to a string literal has undefined
>>> behavior, it's entirely permissible for an implementation to make all
>>> occurrences of "Hello, world!" in a given program point at the same
>>> location in memory, and to make "Hello, world!" + 7 == "world!". Both
>>> optimizations have actually been implemented by many implementations.
>>
>> It's true that two or more string literals may have some bytes in
>> common, but that's not because writing to a byte in a string
>> literal has undefined behavior; it's because there is an explicit
>> statement in the C++ standard (and also the C standard) that allows
>> it. If anything the implication goes the other direction: because
>> pre-standard C implementations stored multiple string literals in
>> the same memory, when C was standardized it was pretty much a
>> necessity that storing into the bytes of a string literal had to
>> be undefined behavior.
>
> True. But the fact that storing into the bytes of a string literal
> has undefined behavior is what makes storing multiple string
> literals in overlapping memory workable. We're really saying the
> same thing, from two different points of view.

Your earlier statement has causality going in one direction, and
mine has it going in the opposite direction. I'm hard pressed to
see how those two views can be thought of as the same thing.