On the type of string literal

Isaac Chen

Jul 16, 1999, 3:00:00 AM

Hi,

The type of a string literal is 'array of n char', but its intended
use is as if it's an 'array of n const char' because the result of
modifying it is undefined.
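
A minimal sketch of the point above, in C. The function names
literal_size and first_char_via_plain_pointer are made up for
illustration; the only claims relied on are that "string" has type
'array of 7 char' and that it converts to plain 'char *' without a cast.

```c
#include <stddef.h>

/* sizeof applied to a string literal yields the size of the whole
   array, terminating null included: "string" has type char[7]. */
size_t literal_size(void)
{
    return sizeof "string";
}

/* A literal converts to plain 'char *' with no cast, because its
   element type is char, not const char. */
char first_char_via_plain_pointer(void)
{
    char *pc = "string";   /* accepted as-is by a C compiler */
    return pc[0];          /* reading through pc is fine */
    /* pc[0] = 'S'; would also compile, but its behavior is undefined */
}
```

Some compilers offer an option to warn about the 'char *' initialization,
but the language itself does not require a diagnostic for it.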

The Rationale says:
... allows implementations to share copies of strings with
identical text, to place string literals in read-only memory,
and perform certain optimizations.

Why didn't the Standard treat it as 'char *' and let those who
need such optimization indicate so like this:

const char *pcc = "string"; /* in ROM */
char *pc = "string"; /* in R/W memory */

int read_it(const char *), update_it(char *);
read_it("string"); /* in ROM */
update_it("string"); /* in R/W memory */

If one treats every string literal as 'const char *' and needs such
optimization but doesn't want to add all those 'const ...', a proper
compiler switch will do. Even if some compilers have no such switch,
the program would still work.

The reverse is not true in C89. If one uses string literals
as modifiable 'char *', the result is undefined!

Isaac Chen

Dennis Ritchie

Jul 16, 1999, 3:00:00 AM

Isaac Chen asked (with some reordering and redaction by me):

> The type of a string literal is 'array of n char', but its intended
> use is as if it's an 'array of n const char' because the result of
> modifying it is undefined.

> ...

> Why didn't the Standard treat it as 'char *' and let those who
> need such optimization indicate so like this:
>
> const char *pcc = "string"; /* in ROM */
> char *pc = "string"; /* in R/W memory */

> If one treats every string literal as 'const char *' and needs such
> optimization but doesn't want to add all those 'const ...', a proper
> compiler switch will do. Even if some compilers have no such switch,
> the program would still work.

The early history is that C as described in K&R I, following its
predecessor languages, was unambiguous and explicit in describing
string literals as anonymous, static arrays of characters that
were initialized with the characters; some early routines like
mktemp() explicitly invited overwriting of the characters in
a literal string passed as an argument.
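
To make the mktemp() remark concrete, here is a rough sketch of a routine
in that style. fill_template and example_name are hypothetical names, and
the digit-filling scheme is an assumption chosen for simplicity; it is not
how any real mktemp() picks names. The point is only that the routine
writes into the buffer it is handed, so the caller must pass a modifiable
array rather than a literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mktemp()-style routine: overwrites the
   trailing run of 'X' characters in tmpl with the decimal digits of
   id (padded with zeros on the left) and returns tmpl. */
char *fill_template(char *tmpl, unsigned id)
{
    size_t i = strlen(tmpl);
    while (i > 0 && tmpl[i - 1] == 'X') {
        tmpl[i - 1] = (char)('0' + id % 10);
        id /= 10;
        i--;
    }
    return tmpl;
}

/* Safe usage under C89: name is an array initialized from the literal,
   so it is an ordinary modifiable object. */
const char *example_name(void)
{
    static char name[] = "/tmp/fooXXXXXX";
    return fill_template(name, 1234);   /* yields "/tmp/foo001234" */
}
```

Calling fill_template("/tmp/fooXXXXXX", 1234) directly would write into
the literal itself, which is what the early routines invited and what
C89 leaves undefined.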

Later, it was realized that this was not necessarily a good idea
for a variety of reasons, even though it is utterly simple to
say and to describe:

- As a general matter, it just seems pretty unclean. In some
sense, the appearance of "abcd" in a program looks sort of
like a genuine constant. Maybe one should think of
    char *p = "abcd";
    p[2] = 'X';
as just like
    i = 1;
    ...
    i = 2;
but somehow it feels different.

- As a practical matter, particularly in memory-constricted
environments, people wanted to collect string literals and put them
in ROM or shared memory-protected storage. If the language rules
permit overwriting, this can't be done except by agreed-upon
convention (which would have to be outside the language definition).
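
For contrast with the overwriting example in the first point, a small
sketch of the form that is well defined: initializing an array from the
literal gives the program its own copy, which can sit in ordinary
read/write storage while the literal itself goes to ROM. The function
name overwrite_copy is invented for this illustration.

```c
#include <string.h>

/* Returns 1 if overwriting a private copy of "abcd" gives "abXd". */
int overwrite_copy(void)
{
    char a[] = "abcd";   /* array initialized from the literal: a copy */
    a[2] = 'X';          /* well defined: a is an ordinary char array */
    return strcmp(a, "abXd") == 0;
}
```

Had a been declared as a pointer to the literal instead, the same
assignment would be the undefined case discussed in this thread.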

The ANSI committee that did C89 wanted (for a variety of reasons) to add
the notion of "const" as a type qualifier, basically to announce
that some objects could (if desired) be put in ROM, which would aid
a variety of optimizations and possibilities for verification. The most
natural idea was to say that string literals, instead of having type
    static char[]
had type
    static const char[]

The problem was that the rules about conversion during assignment
(including passing as function arguments) of pointers to
const-qualified things into not-const-qualified things meant that
practically no program in existence could avoid a mandatory diagnostic
about a constraint violation if string literals suddenly became 'const';
upwards compatibility was needed. It was not good if
yesterday you declared a function argument as 'char *' but today
you are required to say 'const char *' as you hand the function
a string literal.
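
The compatibility problem can be sketched in C as follows. All names
here (old_length, new_length, as_if_const, demo) are invented for the
illustration, and the array as_if_const only models what a literal
would look like if it had been given a const-qualified type.

```c
#include <string.h>

/* Older style: the parameter is declared without const. */
int old_length(char *s)
{
    return (int)strlen(s);
}

/* const-correct variant of the same function. */
int new_length(const char *s)
{
    return (int)strlen(s);
}

/* Models a string literal of type 'array of const char'. */
static const char as_if_const[] = "string";

int demo(void)
{
    int n = old_length("string");   /* fine: the literal is char[7] */
    /* old_length(as_if_const);        would need a diagnostic:
                                       the call discards const */
    n += new_length(as_if_const);   /* fine: const is preserved */
    n += new_length("string");      /* fine: char * to const char * */
    return n;                       /* 6 + 6 + 6 */
}
```

Under the rule C89 actually adopted, the first call keeps compiling
unchanged; under the const-qualified alternative, every such call in
existing code would have looked like the commented-out one.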

The not-completely-happy result is the current rule set, which says that
even though string literals can't (under the standard) be written
into, they don't have the const qualification attached. Things
would be easier to describe if they did (or, for that matter, if there
were no 'const'). But that's the way things work out.

Dennis

Nick Maclaren

Jul 16, 1999, 3:00:00 AM

In article <378ED5...@bell-labs.com>, Dennis Ritchie <d...@bell-labs.com> writes:
|>
|> The early history is that C as described in K&R I, following its
|> predecessor languages, was unambiguous and explicit in describing
|> string literals as anonymous, static arrays of characters that
|> were initialized with the characters; some early routines like
|> mktemp() explicitly invited overwriting of the characters in
|> a literal string passed as an argument.

This is not strictly relevant, but we used BCPL heavily for writing
utilities under MVT, and it had those semantics. Now, the program
and constant area was read/write for all user code under MVT (don't
ask), but could be read-only for some ways of installing system
programs. This meant that programs passed all tests, only to fail
when installed :-)

At one stage, the compiler was updated to allow string sharing, and
some checks were done on how many programs were affected. The
answer was very few, and most of the string updates were actually
bugs! Only one program was seriously affected.
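
In C terms, whether a compiler shares identical strings can be probed
with a sketch like the one below. The function name literals_shared is
invented, and the result is deliberately not predicted: an
implementation may or may not merge the two literals, and either answer
is acceptable.

```c
/* Returns 1 if this implementation stores the two identical literals
   at the same address, 0 if it keeps separate copies. */
int literals_shared(void)
{
    const char *a = "shared text";
    const char *b = "shared text";
    return a == b;
}
```

A program that wrote through one of the pointers would, on an
implementation that shares, silently change what the other one sees,
which is the kind of string update that mostly turned out to be a bug.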

I don't know if the early C programs had the same properties but,
from the above posting and the way C89 developed, I suspect that
they did.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

Dennis Ritchie

Jul 16, 1999, 3:00:00 AM

Nick Maclaren wrote:

> I don't know if the early C programs had the same properties
> [of not in fact writing into strings though it was permissible] but,
> from the above posting and the way C89 developed, I suspect that
> they did.

This is true (mktemp excepted), and it's why C89 was able
to decide as it did. My mild discomfort with the situation
was (is) just that they weren't able to use the notion of
const in the formulation of the rule, not a desire to
keep writing over strings.

Dennis

Douglas A. Gwyn

Jul 17, 1999, 3:00:00 AM

I recall some early versions of FORTRAN where
      CALL SUB(1)
      I = 1
      PRINT 100, I
  100 FORMAT (I5)
      END
      SUBROUTINE SUB(J)
      J = 2
      RETURN
      END
would print
    2
instead of
    1
Very exciting source of bugs.

Nick Maclaren

Jul 17, 1999, 3:00:00 AM

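In article <378FFFD5...@null.net>,
Douglas A. Gwyn <DAG...@null.net> wrote:
>I recall some early versions of FORTRAN where
>      CALL SUB(1)
>      I = 1
>      PRINT 100, I
>  100 FORMAT (I5)
>      END
>      SUBROUTINE SUB(J)
>      J = 2
>      RETURN
>      END
>would print
>    2
>instead of
>    1
>Very exciting source of bugs.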

And some current ones. But that was not, and never has been, permitted
by the language.

Kai Henningsen

Aug 1, 1999, 3:00:00 AM

nm...@cus.cam.ac.uk (Nick Maclaren) wrote on 17.07.99 in <7mplmg$jcm$1...@pegasus.csx.cam.ac.uk>:

> In article <378FFFD5...@null.net>,
> Douglas A. Gwyn <DAG...@null.net> wrote:
> >I recall some early versions of FORTRAN where
> >      CALL SUB(1)
> >      I = 1
> >      PRINT 100, I
> >  100 FORMAT (I5)
> >      END
> >      SUBROUTINE SUB(J)
> >      J = 2
> >      RETURN
> >      END
> >would print
> >    2
> >instead of
> >    1
> >Very exciting source of bugs.
>
> And some current ones. But that was not, and never has been, permitted
> by the language.

And I recall an incident when, after changing to an OS with memory
protection (Unix), code like this broke, and the user complained.

The problem was explained, and he was encouraged to at least check some of
the results given by the buggy program. He refused, on the grounds that he
wouldn't have time to redo the calculations, so he'd rather not know if
his data (analysing astronomical plates) was worthless.

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)