int * vs char *

yugandhar

Jun 21, 2011, 5:49:42 PM

Hi Forum,

I have a quick question.

The following works:

char *s="hello";
*s="world";

but the following gives a segmentation fault:

int *p = 0;
*p = 17;

Could anyone please clarify this for me?

yugandhar

John Gordon

Jun 21, 2011, 5:58:50 PM

In <itr3lm$i8j$1...@speranza.aioe.org> yugandhar <nos...@nospam.com> writes:

> Hi Forum,

> I have a quick question.

> The following works:

> char *s="hello";
> *s="world";

It shouldn't work. A character pointer declared in that way should be
non-writable. It's an accident that it "worked" (did the wrong thing)
for you.

> but the following gives a segmentation fault:

> int *p = 0;
> *p = 17;

> Could anyone please clarify this for me?

Setting a pointer equal to zero is a special case.

Zero is another way of saying NULL. So when you declare a pointer that
is equal to zero, it's really a NULL pointer. It doesn't point anywhere.

So, when you say "Insert 17 at the location pointed to by this pointer",
it has no place to put the value 17, since you told it to point nowhere.

--
John Gordon A is for Amy, who fell down the stairs
gor...@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"

Ian Collins

Jun 21, 2011, 5:59:43 PM

On 06/22/11 09:49 AM, yugandhar wrote:
> Hi Forum,
>
> I have a quick question.
>
> The following works:
>
> char *s="hello";
> *s="world";

It is certainly not guaranteed to work. It would barf on any of the
compilers I use. These all place string literals in a read-only segment.

> but the following gives a segmentation fault:
>
> int *p = 0;
> *p = 17;
>
> Could anyone please clarify this for me?

You are invoking the demons of undefined behaviour.

--
Ian Collins

Ike Naar

Jun 21, 2011, 6:18:50 PM

On 2011-06-21, yugandhar <nos...@nospam.com> wrote:
> The following works:

In what way do you think it works?

> char *s="hello";
> *s="world";

You're assigning a pointer-to-char to a char.
That doesn't work. Your compiler should have issued a diagnostic.

> but the following gives a segmentation fault:
>
> int *p = 0;
> *p = 17;

p is a null pointer; you're trying to dereference it; don't do that.

> Could anyone please clarify this for me?

Sorry.

Shao Miller

Jun 21, 2011, 7:34:25 PM

On 6/21/2011 4:49 PM, yugandhar wrote:
> Hi Forum,
>
> I have a quick question.
>
> The following works:
>
> char *s="hello";
> *s="world";
>

The type of 's' is 'char *'.

The type of '*s' is 'char'.

The type of '"world"' is not 'char'.

So the last line up above is erroneous.

> but the following gives a segementation fault:
>
> int *p = 0;
> *p = 17;
>

The type of 'p' is 'int *'.

The type of '*p' is 'int'.

Since 'p' is a null pointer, use of '*p' is undefined behaviour. That
explains your segmentation fault.

> Could anyone please clarify this for me?

Maybe.

int my_int = 13;
^^^--type

int my_int = 13;
    ^^^^^^--object identifier

int * foo, * bar;
^^^-^--type

int * foo, * bar;
      ^^^--object identifier

int * foo, * bar;
^^^--------^--type

int * foo, * bar;
             ^^^--object identifier

Also:

int my_int = 13;

is similar to:

int my_int;
my_int = 13;

A different example:

int * ip = &my_int;

is similar to:

int * ip;
ip = &my_int;

and not:

int * ip;
*ip = &my_int;

Keith Thompson

Jun 21, 2011, 6:42:15 PM

It's already been explained why "*p = 17;" blows up, but I don't believe
that the first block of code is what you actually compiled.

s is of type char*, so *s is of type char. You're attempting to assign
a string literal (which will decay to char*) to a char object. Early
pre-ANSI C compilers might have accepted that without complaint,
implicitly treating the pointer value as an integer and then narrowing
it to one byte, but in modern C it's a constraint violation, and any
compiler that doesn't at least warn you about it is broken.

Even if you corrected the type mismatch by writing:

*s = 'w';

your program's behavior would be undefined. s points to a string
literal (more precisely, it points to the first element of the static
array associated with the string literal), and any attempt to modify a
string literal has undefined behavior.

I can't be sure what your actual code looks like, but the most plausible
variant I can think of is:

char *s = "hello";
s = "world";

That's perfectly legal; it stores the address of the string "world" in
the variable s.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Morris Keesan

Jun 21, 2011, 8:25:40 PM

On Tue, 21 Jun 2011 17:58:50 -0400, John Gordon <gor...@panix.com> wrote:

> In <itr3lm$i8j$1...@speranza.aioe.org> yugandhar <nos...@nospam.com> writes:
>
>> Hi Forum,
>
>> I have a quick question.
>
>> The following works:
>
>> char *s="hello";
>> *s="world";
>
> It shouldn't work. A character pointer declared in that way should be
> non-writable. It's an accident that it "worked" (did the wrong thing)
> for you.

Not the declaration, but the initialization (making s point to a string
literal). But whether the pointed-to char is writable is undefined:
"should be non-writable" is a bit of an overstatement. But it should be
treated as unwritable by the programmer, even if not by an implementation.

--
Morris Keesan -- mke...@post.harvard.edu

Eric Sosman

Jun 21, 2011, 9:46:55 PM

On 6/21/2011 5:49 PM, yugandhar wrote:
> Hi Forum,
>
> I have a quick question.
>
> The following works:
>
> char *s="hello";
> *s="world";

Get a new compiler. Or, possibly, post your question to a forum
devoted to the language you're using: It's not C. (Or, possibly,
post the actual code you're asking about rather than a paraphrase.)

> but the following gives a segmentation fault:
>
> int *p = 0;
> *p = 17;

In C, if you're using C, the first line declares a pointer-to-int
and initializes it with the value usually known as NULL, the "pointer
to nowhere." The second line attempts to store 17 at the location
"nowhere," and things go haywire. (Technically, "the behavior is
undefined" -- "haywire" is a reasonable shorthand.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Stephen Sprunk

Jun 21, 2011, 10:29:22 PM

On 21-Jun-11 16:49, yugandhar wrote:
> The following works:
>
> char *s="hello";
> *s="world";

It does?

% cat foo.c
#include <stdio.h>
int main(void) {
    char *s="hello";
    *s="world";
    puts(s);
}
% gcc foo.c
foo.c: In function ‘main’:
foo.c:4: warning: assignment makes integer from pointer without a cast
% ./a.out
Segmentation fault

What is the shortest _compilable_ program that "works" on your
implementation? Does your compiler generate any warning messages that
indicate undefined behavior, as mine does?

> but the following gives a segmentation fault:
>
> int *p = 0;
> *p = 17;

You're dereferencing a null pointer here, which is also undefined
behavior. However, my compiler isn't smart enough to notice that
problem; it just crashes when executed--for a somewhat different reason
than the former example.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Noob

Jun 22, 2011, 4:28:36 AM

Keith Thompson wrote:

> Even if you corrected the type mismatch by writing:
>

> char *s = "hello";

> *s = 'w';
>
> your program's behavior would be undefined. s points to a string
> literal (more precisely, it points to the first element of the static
> array associated with the string literal), and any attempt to modify a
> string literal has undefined behavior.

Since modifying a character string literal already has UB in
the current standard, why doesn't the next standard
specify that string literals have type const char[] instead
of just char[]?

Regards.

Shao Miller

Jun 22, 2011, 6:23:39 AM

Undefined behaviour implies "non-portable," in my view. But why remove
an implementation's freedom to define string literals' storage as writable?

James Kuyper

Jun 22, 2011, 6:25:40 AM

On 06/22/2011 04:28 AM, Noob wrote:
...

> Since modifying a character string literal already has UB in
> the current standard, why doesn't the next standard
> specify that string literals have type const char[] instead
> of just char[]?

It would break a large amount of existing code. The most common problem
would be code which assigns the pointer value of a string literal to a
char* object. You could argue that this is bad practice - if a pointer
might be pointing at a string literal, it should be declared 'const
char*', rather than 'char*'. However, there's no actual problem with
such code so long as no attempt is made to write through that pointer
value, and there's an awful lot of existing code which relies upon that
fact.
--
James Kuyper

Stephen Sprunk

Jun 22, 2011, 12:31:37 PM

That has been proposed numerous times over the years, but it is always
shot down due to widespread fears of "const poisoning". Frankly, I
think more use of const in C would be a good thing and eliminate many
lurking bugs, but to date ISO is unwilling to "break" code that assumes
string literals are not const but doesn't actually write to them.

Kleuskes & Moos

Jun 22, 2011, 1:06:36 PM

As an addendum: if you write through a pointer, you promise the
computer (so to speak) that there exists a valid object at that
address. If it does not, all bets are off.

That is to say:

int *p = 42;
*p = 17;

Would have the same problem.

Ian Collins

Jun 22, 2011, 3:56:03 PM

The existing code problem was acknowledged when C++ changed the type
of string literals. Compilers may choose not to issue a diagnostic for
this case. Now we have had over a decade to fix the smelly code, I
believe a diagnostic is now required by the new C++ standard.

C could and should have done the same, but as usual those worried about
breaking already broken code appear to have won the day.

--
Ian Collins

Keith Thompson

Jun 22, 2011, 4:06:32 PM

Not exactly. "int *p = 42;" has the problem that 42 is not implicitly
convertible to int*, so it's a constraint violation. "int *p = 0;" is a
special case, because 0 is a null pointer constant.

Kleuskes & Moos

Jun 22, 2011, 4:29:12 PM

On Jun 22, 10:06 pm, Keith Thompson <ks...@mib.org> wrote:

You're right. There should have been a cast in there.

Ian Collins

Jun 22, 2011, 6:23:15 PM

On 06/23/11 11:12 AM, Shao Miller wrote:

> Again, why should a C implementation be rendered non-conforming [to some
> future Standard] thusly?

Because in most cases such code is an accident waiting to happen. It
was amusing to see how much cruft was removed from the OpenSolaris code
base when the default action of the native C compiler changed to use
read only literals! Although it's a specific example, it does
illustrate a more general point - assuming writeable literals is
non-portable. All those code changes were also required to get the code
to compile with gcc.

> In a "bare metal" environment, one might very well wish to overwrite
> their string literals' storage, no? The "bare metal" implementation
> might need to define such action as being appropriate.

Most "bare metal" environments I have used place literals in a read-only
segment; RAM is usually a more precious (and expensive) resource than
ROM/FLASH on embedded systems.

> By using "proper" static arrays, we lose out on the "shared storage"
> benefit. Writing for bare metal, hopefully one knows what one is doing.

Embedded tools usually have a rich set of pragmas and link options to
specify where various types of object live. It's nigh on impossible to
write a pure standard C embedded application.

> Is this example silly?

No, but it is easily worked around; tool sets with a C++ compiler already
have to.

--
Ian Collins

Shao Miller

Jun 22, 2011, 7:12:09 PM

On 6/22/2011 2:56 PM, Ian Collins wrote:

Again, why should a C implementation be rendered non-conforming [to some
future Standard] thusly?

In a "bare metal" environment, one might very well wish to overwrite
their string literals' storage, no? The "bare metal" implementation
might need to define such action as being appropriate.

By using "proper" static arrays, we lose out on the "shared storage"
benefit. Writing for bare metal, hopefully one knows what one is doing.

Is this example silly?

Keith Thompson

Jun 22, 2011, 8:23:51 PM

Shao Miller <sha0....@gmail.com> writes:
> On 6/22/2011 2:56 PM, Ian Collins wrote:
>> On 06/22/11 10:25 PM, James Kuyper wrote:

[...]

>> The existing code problem was acknowledged when C++ changed the type
>> of string literals. Compilers may choose not to issue a diagnostic for
>> this case. Now we have had over a decade to fix the smelly code, I
>> believe a diagnostic is now required by the new C++ standard.
>>
>> C could and should have done the same, but as usual those worried about
>> breaking already broken code appear to have won the day.
>>
>
> Again, why should a C implementation be rendered non-conforming [to some
> future Standard] thusly?
>
> In a "bare metal" environment, one might very well wish to overwrite
> their string literals' storage, no? The "bare metal" implementation
> might need to define such action as being appropriate.
>
> By using "proper" static arrays, we lose out on the "shared storage"
> benefit. Writing for bare metal, hopefully one knows what one is doing.
>
> Is this example silly?

If there's a need for a "bare metal" environment to be able to modify
string literals, that can be provided as an extension. Any code that
currently takes advantage of that ability already has undefined
behavior.

If such a feature were desirable, we could have an optional 'M' (for
modifiable) prefix for string literals, similar to the existing 'L'
prefix for wide string literals. For example, "hello" could be of type
const char[6], and M"hello" could be of type char[6].

The fact that I've never heard of anyone implementing something like
this suggests (though not strongly) that there's no demand for it.

I suspect that the vast majority of code that attempts to modify
string literals does so as the result of bugs. A lot more code
uses string literals in contexts that don't treat them as const,
but doesn't actually try to modify them; for example:

void func(char *s) {
printf("In func(), s = \"%s\"\n", s);
}

...

func("hello");

In short, I think the issue is not that anyone wants to modify
string literals; it's that making them const would break existing
code that *doesn't* actually modify string literals.

(Stroustrup was able to do this in C++ because there was no existing
C++ code before he invented the language.)

Ian Collins

Jun 22, 2011, 8:54:39 PM

On 06/23/11 12:23 PM, Keith Thompson wrote:
>
> I suspect that the vast majority of code that attempts to modify
> string literals does so as the result of bugs. A lot more code
> uses string literals in contexts that don't treat them as const,
> but doesn't actually try to modify them; for example:
>
> void func(char *s) {
> printf("In func(), s = \"%s\"\n", s);
> }
>
> ...
>
> func("hello");
>
> In short, I think the issue is not that anyone wants to modify
> string literals; it's that making them const would break existing
> code that *doesn't* actually modify string literals.
>
> (Stroustrup was able to do this in C++ because there was no existing
> C++ code before he invented the language.)

Not quite; the change came in with the 1998 C++ standard, so there was
plenty of existing code. That's why there was a "special case" which
allows your example to pass without requiring a diagnostic.

--
Ian Collins

Keith Thompson

Jun 22, 2011, 9:18:07 PM

The change I was referring to was making string literals const.

I'm not familiar with the "special case"; I'll have to look into it
(elsewhere).

Ian Collins

Jun 22, 2011, 9:23:06 PM

On 06/23/11 01:18 PM, Keith Thompson wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 06/23/11 12:23 PM, Keith Thompson wrote:
>>> I suspect that the vast majority of code that attempts to modify
>>> string literals does so as the result of bugs. A lot more code
>>> uses string literals in contexts that don't treat them as const,
>>> but doesn't actually try to modify them; for example:
>>>
>>> void func(char *s) {
>>> printf("In func(), s = \"%s\"\n");
>>> }
>>>
>>> ...
>>>
>>> func("hello");
>>>
>>> In short, I think the issue is not that anyone wants to modify
>>> string literals; it's that making them const would break existing
>>> code that *doesn't* actually modify string literals.
>>>
>>> (Stroustrup was able to do this in C++ because there was no existing
>>> C++ code before he invented the language.)
>>
>> Not quite; the change came in with the 1998 C++ standard, so there was
>> plenty of existing code. That's why there was a "special case" which
>> allows your example to pass without requiring a diagnostic.
>
> The change I was referring to was making string literals const.

That was the change introduced by the standard. Prior to that, C++ had
the same rule as C. The change was a hot topic of discussion at the
ACCU conference held just after the standard was ratified.

--
Ian Collins

Gordon Burditt

Jun 23, 2011, 12:49:31 AM

> In a "bare metal" environment, one might very well wish to overwrite
> their string literals' storage, no?

Usually just the opposite.

If I wish to "force" writability of specific string literals, I can
instead put them in character arrays. Those have to go in writable
memory. A compiler that uses writeable strings also isn't supposed
to consolidate those strings. For example, "birthday" + 5 == "day"
(no, I do not mean to call strcmp() here: I am comparing pointers,
not strings) might evaluate as true with non-writable strings because
the compiler shares memory between the two, but it shouldn't be doing
that with writable strings. (GCC offers a choice here, and it seems to
get this right both ways: no consolidation with writable strings.)
Generally, on "bare metal" I'd want maximum
consolidation of string literals possible, and put them in ROM/code
flash, if possible. You don't get much in the way of strings
in a processor with 128 or 256 bytes of RAM, total, including
the stack (yes, most of these processors have one, even if C does
not require it).

> The "bare metal" implementation
> might need to define such action as being appropriate.

"bare metal" has a lot more problems of wishing to specify where
something is placed. Also, RAM tends to be much more expensive
than ROM or flash, and "initialized data" usually takes up space
in both. Starting the program involves copying the "initialized data"
from flash to RAM, and zeroing the "uninitialized data" in
RAM. The OS might do this on a PC but on bare metal it can be done
by compiler start-up routines or done explicitly by the program
(GCC and binutils for the TI MSP430 and Atmel AVR processors do
this, using linker magic to find the boundaries of these areas.
The code, fortunately, is very short. You can replace or remove
it if you really need to).

Some processors have much faster access to RAM memory on "page zero"
than elsewhere, so you want to carefully control what goes there and
what doesn't.

Interrupt vectors for a processor generally have to be placed at a
fixed location. If no interrupt vector handler is defined for a
particular vector, you usually want it handled by a default (dummy)
handler rather than crashing. Defining an interrupt handler to save
and restore registers properly is another common extension.

Some processors put code flash, RAM, and data eeprom in the same
address space (e.g. TI MSP430). For this, you want to put everything
read-only into code flash if at all possible. You might get 8k of
code flash, 256 bytes of RAM, and 256 bytes of data eeprom, plus
some I/O ports to control digital I/O, timers, analog/digital
conversion, system clock speed, programming eeprom, onboard temperature
sensor, etc.

Some processors have boundaries for code memory. On many, if you
go over 64k (these are large processors: many are limited to 4k
or less), you have to start worrying about what code goes in the
first 64k, and doing far calls between routines in different sections.

> By using "proper" static arrays, we lose out on the "shared storage"
> benefit. Writing for bare metal, hopefully one knows what one is doing.
>
> Is this example silly?

Some "bare metal" processors have separate address spaces for code
flash, RAM, and data eeprom (e.g. Atmel ATMega328P, ATTiny85).
Placement of elements of the program in the proper space is critical.
You *CAN'T HAVE* a pointer that points into "either code or data,
depending". You can't tell which it points to by examining bits.
Those of you familiar with PDP-11 "separate I and D" mode know what
I am referring to. Both the code and data spaces start at 0x0000,
and pointers do not contain a bit to indicate whether code or data
is being addressed. You have to know which one it is ahead of time
to know which instruction to use to fetch the data. The only way
to read data from code flash is using a special machine instruction,
usually made accessible in C as a macro or inline function as an
extension. So you have to *KNOW* whether a particular string
literal got put in code flash and adjust your code accordingly.

A compiler is likely to have an extension to put a string literal
in code flash. GCC lets you attach attributes to typedefs. However,
if you want to pass it to strcpy(), you have to have two different
versions of strcpy(): one that copies from code space to RAM, and
one that copies from RAM to RAM. The same goes for puts().

Due to interests in protecting proprietary code programmed into the
chip, a fuse can be "blown" after programming which prohibits reading
from code flash except to execute code (or erase the chip entirely
- which means the proprietary code is now gone). This may rule out
coding techniques such as "branch tables" or data constants (including
string literals) in the code section. You don't *have* to use that
fuse, although it may be company policy if these chips are going
in your company product.

Many of the processors have a small amount (e.g. 256 bytes) of
non-volatile data memory. It is particularly useful for
individual-chip-specific data such as MAC addresses and serial
numbers, calibration constants for temperature sensors or how far
your robot can reach, and storing stuff that has to survive power
cycling. It typically can stand many (e.g. 100,000) re-programming
cycles, but trying to program it once a second will limit lifetime
to a little over a day. It may or may not need a special function
(using a special machine instruction) to read from it (as opposed
to dereferencing pointers as though it were normal RAM or ROM). It
may require you to "erase" a "page" of memory (e.g. 64 bytes)
before you can re-write it, and you might only get a few "pages",
so it's important to organize related things in memory together.

A "bare metal" chip may actually have "standard input" and "standard
output", at least in a debugging environment. It might be a serial
port or USB interface acting like a serial port. The chip may end
up using 25% of its code space generating a carefully-timed signal
that looks like the waveform of an actual hardware serial port.
Or it might send error codes in Morse code over an LED.

Gordon Burditt

Jun 23, 2011, 1:45:18 AM

> That has been proposed numerous times over the years, but it is always
> shot down due to widespread fears of "const poisoning". Frankly, I
> think more use of const in C would be a good thing and eliminate many
> lurking bugs, but to date ISO is unwilling to "break" code that assumes
> string literals are not const but doesn't actually write to them.

What is the correct return type of these C functions (and I've
probably missed a few) which return a pointer into somewhere into
the string passed as a first argument, given that the first argument
might be a C string literal (which we're now changing to const) or
it might be a writable character array? If it's a writable character
array, you might want to use the returned pointer to write into the
string.

strchr()
strrchr()
strstr()
strpbrk()

It seems you are stuck with one of several bad choices:
(1) dragging in C++ function overloading (does that even solve
the problem? Can you overload on const/non-const arguments
of otherwise the same type?),
(2) allowing these functions to return a non-const pointer to
non-writable data, thereby begging for bugs,
(3) giving an excellent excuse for "casting away constness" to
make use of a const pointer to writable data, thereby
inviting ignoring of similar "casting away constness" without
such a good excuse, which is begging for bugs,
(4) defining functions:
char *strchr(char *s, int c);
and const char * const_poisoned_strchr(const char *s, int c);
which do the same thing, other than constness?
This breaks existing code, at least with warnings.
The C standard does not currently reserve symbols starting
with const_poisoned_str* .

I can think of plenty of my own parsing functions that at least
potentially have the same issue: given some input (which might be
a string literal of a default or a line out of some user's config
file) retrieve some part of the command (e.g. variable name or value
of a particular type) or NULL if the command is ill-formatted.

Tim Rentsch

Jun 23, 2011, 4:02:34 AM

Noob <ro...@127.0.0.1> writes:

Because the cost is large and the benefit is small.

I daresay it's pretty easy to get diagnostics for
non-const uses of string literals if one wants
them. Given that, there is no compelling reason
to force everyone to change, especially since it
can be useful for an implementation to define
string literals so that they are usefully writeable.

Ian Collins

Jun 23, 2011, 4:11:55 AM

On 06/23/11 08:02 PM, Tim Rentsch wrote:
> Noob<ro...@127.0.0.1> writes:
>
>> Keith Thompson wrote:
>>
>>> Even if you corrected the type mismatch by writing:
>>>
>>> char *s = "hello";
>>> *s = 'w';
>>>
>>> your program's behavior would be undefined. s points to a string
>>> literal (more precisely, it points to the first element of the static
>>> array associated with the string literal), and any attempt to modify a
>>> string literal has undefined behavior.
>>
>> Since modifying a character string literal already has UB in
>> the current standard, why doesn't the next standard
>> specify that string literals have type const char[] instead
>> of just char[]?
>
> Because the cost is large and the benefit is small.

The benefit can be the difference between something failing to compile
or failing horribly at run time.

> I daresay it's pretty easy to get diagnostics for
> non-const uses of string literals if one wants
> them. Given that, there is no compelling reason
> to force everyone to change, especially since it
> can be useful for an implementation to define
> string literals so that they are usefully writeable.

Under what circumstances?

--
Ian Collins

Alan Curry

Jun 23, 2011, 4:08:38 PM

In article <yp6dnQAIK81zT5_T...@posted.internetamerica>,

Gordon Burditt <gordon...@burditt.org> wrote:
>
>What is the correct return type of these C functions (and I've
>probably missed a few) which return a pointer into somewhere into
>the string passed as a first argument, given that the first argument
>might be a C string literal (which we're now changing to const) or
>it might be a writable character array? If it's a writable character
>array, you might want to use the returned pointer to write into the
>string.
>
> strchr()
> strrchr()
> strstr()
> strpbrk()

All should return an integer offset from the beginning of the input string.
In their present form they are dangerous.

--
Alan Curry

Stephen Sprunk

Jun 23, 2011, 11:40:43 PM

On 23-Jun-11 00:45, Gordon Burditt wrote:
>> That has been proposed numerous times over the years, but it is always
>> shot down due to widespread fears of "const poisoning". Frankly, I
>> think more use of const in C would be a good thing and eliminate many
>> lurking bugs, but to date ISO is unwilling to "break" code that assumes
>> string literals are not const but doesn't actually write to them.
>
> What is the correct return type of these C functions (and I've
> probably missed a few) which return a pointer into somewhere into
> the string passed as a first argument, given that the first argument
> might be a C string literal (which we're now changing to const) or
> it might be a writable character array? If it's a writable character
> array, you might want to use the returned pointer to write into the
> string.
>
> strchr()
> strrchr()
> strstr()
> strpbrk()
>
> It seems you are stuck with one of several bad choices:
> (1) dragging in C++ function overloading (does that even solve
> the problem? Can you overload on const/non-const arguments
> of otherwise the same type?),

AIUI, you can. I see the problem, though; in fact, while researching
that question, one page I found actually listed the various overloaded
forms of strchr() offered by one compiler.

If one weren't to add function overloading (which has its own appeal),
the only solution would be to deprecate those functions and design
replacements with a const-friendly interface. However, that would break
so much code that it's simply not feasible.

Ian Collins

Jun 23, 2011, 11:46:39 PM

The C++ standard replaces the C strchr() with two overloads:

const char* strchr(const char* s, int c);
      char* strchr(      char* s, int c);

--
Ian Collins

Stephen Sprunk

Jun 24, 2011, 10:24:49 AM

Compare to the only option C provides:

char *strchr(const char *s, int c);

This takes a const or non-const argument but always returns a non-const
pointer; the potential loss of const-ness invites bugs. Thanks to
overloading, the C++ version is able to return a pointer that matches
the const-ness of its argument, preventing bugs.

Shao Miller

Jun 24, 2011, 3:39:39 PM

On 6/22/2011 20:23, Keith Thompson wrote:
> Shao Miller<sha0....@gmail.com> writes:
>> On 6/22/2011 2:56 PM, Ian Collins wrote:
>>> On 06/22/11 10:25 PM, James Kuyper wrote:
> [...]
>>> The existing code problem was acknowledged when C++ changed the type
>>> of string literals. Compilers may choose not to issue a diagnostic for
>>> this case. Now we have had over a decade to fix the smelly code, I
>>> believe a diagnostic is now required by the new C++ standard.
>>>
>>> C could and should have done the same, but as usual those worried about
>>> breaking already broken code appear to have won the day.
>>>
>>
>> Again, why should a C implementation be rendered non-conforming [to some
>> future Standard] thusly?
>>
>> In a "bare metal" environment, one might very well wish to overwrite
>> their string literals' storage, no? The "bare metal" implementation
>> might need to define such action as being appropriate.
>>
>> By using "proper" static arrays, we lose out on the "shared storage"
>> benefit. Writing for bare metal, hopefully one knows what one is doing.
>>
>> Is this example silly?
>
> If there's a need for a "bare metal" environment to be able to modify
> string literals, that can be provided as an extension.

As in, a documented extension?

> Any code that
> currently takes advantage of that ability already has undefined
> behavior.
>

Exactly the nature of my question. If there is at least one real
program that depends on this "undefined behaviour" (use the Standard
definition, please :) ), then that program's source code might have to
be adjusted for future versions of an implementation, depending on how
that future implementation implements the extension.

> If such a feature were desirable, we could have an optional 'M' (for
> modifiable) prefix for string literals, similar to the existing 'L'
> prefix for wide string literals. For example, "hello" could be of type
> const char[6], and M"hello" could be of type char[6].
>

An interesting idea! 'M()' could get close, perhaps.

#define M(string) ((char[]){string})

I guess we'd need 'ML', too?

> The fact that I've never heard of anyone implementing something like
> this suggests (though not strongly) that there's no demand for it.
>

Suppose you've a program loaded into memory via a serial line. Suppose
the program is loaded into writable memory (that seems pretty likely).
Suppose the program offers a CLI. Suppose a user can rename commands or
variables, or redefine preset scripts. Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'

> I suspect that the vast majority of code that attempts to modify
> string literals does so as the result of bugs.

That seems probable to me, too.

> A lot more code
> uses string literals in contexts that don't treat them as const,
> but doesn't actually try to modify them; for example:
>
> void func(char *s) {
> printf("In func(), s = \"%s\"\n", s);
> }
>
> ...
>
> func("hello");
>
> In short, I think the issue is not that anyone wants to modify
> string literals;

For the right use case, I would.

> it's that making them const would break existing
> code that *doesn't* actually modify string literals.
>
> (Stroustrup was able to do this in C++ because there was no existing
> C++ code before he invented the language.)
>

Well doesn't 'const' "come from" C++, anyway?

And why isn't there a write-only counterpart, for symmetry, such as a
memory-mapped port that mustn't be read from?

And during very early development (and Usenet code examples), can it be
pleasant to avoid 'const' concerns altogether and then to analyze and
refine, gradually sprinkling 'const' in where appropriate? I don't
advocate this, but imagine that some folks might get "stuck" in
"analysis paralysis" if they had to think 'const'ness through at every
corner. I could be mistaken.

Ian Collins

Jun 24, 2011, 5:28:41 PM

On 06/25/11 07:39 AM, Shao Miller wrote:
> On 6/22/2011 20:23, Keith Thompson wrote:
>
>> The fact that I've never heard of anyone implementing something like
>> this suggests (though not strongly) that there's no demand for it.
>>
>
> Suppose you've a program loaded into memory via a serial line. Suppose
> the program is loaded into writable memory (that seems pretty likely).
> Suppose the program offers a CLI. Suppose a user can rename commands or
> variables, or redefine preset scripts.

A recipe for disaster! What happens if a new name is longer than the old?

> Yes, all of these could be done
> cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
> convenient for some people to simply type the string literal right into
> some spot in the source code where it's used and forget about coming up
> with a meaningful (and possibly redundant) identifier, i.e.
> 'csz_Hello__world___And_how_are_you__today_'

Yuck, what a contrived example! As you say, there is a method that does
not use undefined behaviour.

>> I suspect that the vast majority of code that attempts to modify
>> string literals does so as the result of bugs.
>
> That seems probable to me, too.
>
>> A lot more code
>> uses string literals in contexts that don't treat them as const,
>> but doesn't actually try to modify them; for example:
>>
>> void func(char *s) {
>> printf("In func(), s = \"%s\"\n", s);
>> }
>>
>> ...
>>
>> func("hello");
>>
>> In short, I think the issue is not that anyone wants to modify
>> string literals;
>
> For the right use case, I would.
>
>> it's that making them const would break existing
>> code that *doesn't* actually modify string literals.
>>
>> (Stroustrup was able to do this in C++ because there was no existing
>> C++ code before he invented the language.)
>>
>
> Well doesn't 'const' "come from" C++, anyway?

No, it's just used properly there!

> And why isn't there a write-only counterpart, for symmetry, such as a
> memory-mapped port that mustn't be read from?

That case isn't uncommon in embedded systems (watchdog reset being a
common example). However, reading is generally harmless.

> And during very early development (and Usenet code examples), can it be
> pleasant to avoid 'const' concerns altogether and then to analyze and
> refine, gradually sprinkling 'const' in where appropriate? I don't
> advocate this, but imagine that some folks might get "stuck" in
> "analysis paralysis" if they had to think 'const'ness through at every
> corner. I could be mistaken.

Not really; if you don't know whether the function you are writing will
modify its arguments, you are in big trouble!

--
Ian Collins

Keith Thompson

Jun 24, 2011, 5:29:03 PM

Shao Miller <sha0....@gmail.com> writes:
> On 6/22/2011 20:23, Keith Thompson wrote:
>> Shao Miller<sha0....@gmail.com> writes:
>>> On 6/22/2011 2:56 PM, Ian Collins wrote:
>>>> On 06/22/11 10:25 PM, James Kuyper wrote:
>> [...]
>>>> The existing code problem was acknowledged when C++ changed the type
>>>> of string literals. Compilers may choose not to issue a diagnostic for
>>>> this case. Now we have had over a decade to fix the smelly code, I
>>>> believe a diagnostic is now required by the new C++ standard.
>>>>
>>>> C could and should have done the same, but as usual those worried about
>>>> breaking already broken code appear to have won the day.
>>>>
>>>
>>> Again, why should a C implementation be rendered non-conforming [to some
>>> future Standard] thusly?
>>>
>>> In a "bare metal" environment, one might very well wish to overwrite
>>> their string literals' storage, no? The "bare metal" implementation
>>> might need to define such action as being appropriate.
>>>
>>> By using "proper" static arrays, we lose out on the "shared storage"
>>> benefit. Writing for bare metal, hopefully one knows what one is doing.
>>>
>>> Is this example silly?
>>
>> If there's a need for a "bare metal" environment to be able to modify
>> string literals, that can be provided as an extension.
>
> As in, a documented extension?

Well, yes, documenting it would certainly be a nice touch.

>> Any code that
>> currently takes advantage of that ability already has undefined
>> behavior.
>>
>
> Exactly the nature of my question. If there is at least one real
> program that depends on this "undefined behaviour" (use the Standard
> definition, please :) ),

If you were correcting my spelling from "behavior" to "behaviour", the
Standard uses the US-style "behavior" spelling. If not, what do you
mean?

(Hmm, does the UK standard body, its equivalent of the US ANSI,
"translate" ISO standards into UK spellings?)

> then that program's source code might have to
> be adjusted for future versions of an implementation, depending on how
> that future implementation implements the extension.
>
>> If such a feature were desirable, we could have an optional 'M' (for
>> modifiable) prefix for string literals, similar to the existing 'L'
>> prefix for wide string literals. For example, "hello" could be of type
>> const char[6], and M"hello" could be of type char[6].
>>
>
> An interesting idea! 'M()' could get close, perhaps.
>
> #define M(string) ((char[]){string})
>
> I guess we'd need 'ML', too?
>
>> The fact that I've never heard of anyone implementing something like
>> this suggests (though not strongly) that there's no demand for it.
>
> Suppose you've a program loaded into memory via a serial line. Suppose
> the program is loaded into writable memory (that seems pretty likely).
> Suppose the program offers a CLI. Suppose a user can rename commands or
> variables, or redefine preset scripts. Yes, all of these could be done
> cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
> convenient for some people to simply type the string literal right into
> some spot in the source code where it's used and forget about coming up
> with a meaningful (and possibly redundant) identifier, i.e.
> 'csz_Hello__world___And_how_are_you__today_'

And suppose the user wants to replace the content with a string that's
longer than the original literal.

[snip]

>> In short, I think the issue is not that anyone wants to modify
>> string literals;
>
> For the right use case, I would.

Perhaps.

There are benefits and drawbacks both to allowing modifications of
string literals, and to disallowing them. IMHO the benefits of
disallowing modifications (catching potential errors) far outweigh the
benefit for a few obscure use cases.

My M"..." proposal *could* give us the best of both, and I wouldn't
object if it showed up in a future standard. I just don't think
it's sufficiently useful.

>> it's that making them const would break existing
>> code that *doesn't* actually modify string literals.
>>
>> (Stroustrup was able to do this in C++ because there was no existing
>> C++ code before he invented the language.)
>
> Well doesn't 'const' "come from" C++, anyway?

I think so.

> And why isn't there a write-only counterpart, for symmetry, such as a
> memory-mapped port that mustn't be read from?

Lack of usefulness, I suppose. "const" (which perhaps should have been
called "readonly") is massively useful. "writeonly" would probably be
useful only in some very low-level code. And an implementation could
provide a #pragma that does the same thing (have any done so?).

> And during very early development (and Usenet code examples), can it be
> pleasant to avoid 'const' concerns altogether and then to analyze and
> refine, gradually sprinkling 'const' in where appropriate? I don't
> advocate this, but imagine that some folks might get "stuck" in
> "analysis paralysis" if they had to think 'const'ness through at every
> corner. I could be mistaken.

I think designing const into your code from the start is a lot
easier.

My personal preference is to declare everything "const" *unless*
I specifically need to modify it. In fact, if I were designing
a new language (without concern for backward compatibility),
declared objects would be read-only by default, with some special
syntax ("var"?) to make them writable. With sufficiently flexible
initialization, I suspect most objects don't need to be modified
after their creation. (I am not for one moment suggesting making
such a change to C.)

Ian Collins

Jun 24, 2011, 5:36:56 PM

On 06/25/11 09:29 AM, Keith Thompson wrote:
>
> If you were correcting my spelling from "behavior" to "behaviour", the
> Standard uses the US-style "behavior" spelling. If not, what do you
> mean?
>
> (Hmm, does the UK standard body, its equivalent of the US ANSI,
> "translate" ISO standards into UK spellings?)

Alas, no. We have to suffer the cultural imperialism!

--
Ian Collins

Keith Thompson

Jun 24, 2011, 5:58:57 PM

Ian Collins <ian-...@hotmail.com> writes:
> On 06/25/11 07:39 AM, Shao Miller wrote:

[...]

>> Well doesn't 'const' "come from" C++, anyway?
>
> No, it's just used properly there!

See the ANSI C Rationale, at
<http://www.lysator.liu.se/c/rat/c5.html#3-5-3>:

The Committee has added to C two type qualifiers: const and
volatile. Individually and in combination they specify the
assumptions a compiler can and must make when accessing an
object through an lvalue.

The syntax and semantics of const were adapted from C++; the
concept itself has appeared in other languages. volatile is
an invention of the Committee; it follows the syntactic model
of const.

Ian Collins

Jun 24, 2011, 6:06:35 PM

On 06/25/11 09:58 AM, Keith Thompson wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 06/25/11 07:39 AM, Shao Miller wrote:
> [...]
>>> Well doesn't 'const' "come from" C++, anyway?
>>
>> No, it's just used properly there!
>
> See the ANSI C Rationale, at
> <http://www.lysator.liu.se/c/rat/c5.html#3-5-3>:
>
> The Committee has added to C two type qualifiers: const and
> volatile. Individually and in combination they specify the
> assumptions a compiler can and must make when accessing an
> object through an lvalue.
>
> The syntax and semantics of const were adapted from C++; the
> concept itself has appeared in other languages. volatile is
> an invention of the Committee; it follows the syntactic model
> of const.

OK, but it was standardised in C long before C++.

--
Ian Collins

Tim Rentsch

Jun 25, 2011, 11:39:09 AM

Shao Miller <sha0....@gmail.com> writes:

> [discussing modifiable string literals]

>
> An interesting idea! 'M()' could get close, perhaps.
>
> #define M(string) ((char[]){string})

Different storage duration.

Tim Rentsch

Jun 25, 2011, 12:23:29 PM

Ian Collins <ian-...@hotmail.com> writes:

> On 06/23/11 08:02 PM, Tim Rentsch wrote:
>> Noob<ro...@127.0.0.1> writes:
>>
>>> Keith Thompson wrote:
>>>
>>>> Even if you corrected the type mismatch by writing:
>>>>
>>>> char *s = "hello";
>>>> *s = 'w';
>>>>
>>>> your program's behavior would be undefined. s points to a string
>>>> literal (more precisely, it points to the first element of the static
>>>> array associated with the string literal), and any attempt to modify a
>>>> string literal has undefined behavior.
>>>
>>> Since modifying a character string literal already has UB in
>>> the current standard, then why doesn't the next standard
>>> specify that string literal have type const char[] instead
>>> of just char[] ?
>>
>> Because the cost is large and the benefit is small.
>
> The benefit can be the difference between something failing to compile
> or failing horribly at run time.

But no language change is needed to obtain that benefit; for
those who want it, it's available today through compiler options.

>> I daresay it's pretty easy to get diagnostics for
>> non-const uses of string literals if one wants
>> them. Given that, there is no compelling reason
>> to force everyone to change, especially since it
>> can be useful for an implementation to define
>> string literals so that they are usefully writeable.
>
> Under what circumstances?

I can easily imagine an implementation providing a compiler
option to make string literals modifiable - not turned on
all the time, but having the option. When might that option
be useful? Some examples:

1. Compiling legacy code that assumes literals are writable.

2. To force string literals to unique locations to help track
where various strings appear in the program (working under the
assumption that a writable-literals option would force different
literals to distinct locations, which it should).

3. During debugging, it might be handy to be able to change
a particular string literal, eg a printf() format, to help
explore program behavior.

I admit these examples may not occur very often. Still, why give
up the flexibility to preserve them, since the language as it
exists today also allows the option of checking string literals
being used "const"-inappropriately -- what benefit would we get
that we don't already have?

Keith Thompson

Jun 25, 2011, 3:37:11 PM

Tim Rentsch <t...@alumni.caltech.edu> writes:
[...]

> I can easily imagine an implementation providing a compiler
> option to make string literals modifiable - not turned on
> all the time, but having the option. When might that option
> be useful? Some examples:
>
> 1. Compiling legacy code that assumes literals are writable.

Valid, but I think a lot of such code, perhaps most of it, has been
fixed by necessity, after it blew up when it was recompiled by a
compiler that makes literals non-writable.

> 2. To force string literals to unique locations to help track
> where various strings appear in the program (working under the
> assumption that a writable-literals option would force different
> literals to distinct locations, which it should).

That could be done by an option that just forces literals to unique
locations without making them writable.

> 3. During debugging, it might be handy to be able to change
> a particular string literal, eg a printf() format, to help
> explore program behavior.

Ok, but I don't think I've ever felt the need to do that.

> I admit these examples may not occur very often. Still, why give
> up the flexibility to preserve them, since the language as it
> exists today also allows the option of checking string literals
> being used "const"-inappropriately -- what benefit would we get
> that we don't already have?

We'd get the benefit of a guarantee that programs that accidentally
attempt to write to string literals will be caught and fixed more
easily, regardless of which conforming compiler we're using.
A hypothetical compiler option doesn't do me much good if the
compiler I'm using doesn't provide it (recompiling with a different
compiler isn't always an option).

Similar arguments could be made in favor of making modifying a
const-qualified object undefined behavior rather than a constraint
violation:

const int x = 42;
x = 43;

In my opinion, the only good reason to consider allowing string
literals to be modifiable is for compatibility with very old
implementations. I suspect that if string literals had been const
from the beginning (which would have required inventing "const"
many years sooner), we wouldn't be having this discussion.

Shao Miller

Jun 26, 2011, 4:44:33 PM

On 6/24/2011 4:28 PM, Ian Collins wrote:
> On 06/25/11 07:39 AM, Shao Miller wrote:
>> On 6/22/2011 20:23, Keith Thompson wrote:
>>
>>> The fact that I've never heard of anyone implementing something like
>>> this suggests (though not strongly) that there's no demand for it.
>>>
>>
>> Suppose you've a program loaded into memory via a serial line. Suppose
>> the program is loaded into writable memory (that seems pretty likely).
>> Suppose the program offers a CLI. Suppose a user can rename commands or
>> variables, or redefine preset scripts.
>
> A recipe for disaster! What happens if a new name is longer than the old?
>

When it's time to overwrite, count how many characters before the null
character and limit accordingly. This'd mean that redefining these
could narrow the strings and one couldn't redefine with a larger string
afterwards, but a script mightn't need to redefine more than once; i.e.
defaults.

But I wouldn't worry about that as much as I'd worry about the potential
for shared storage. ;)

>> Yes, all of these could be done
>> cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
>> convenient for some people to simply type the string literal right into
>> some spot in the source code where it's used and forget about coming up
>> with a meaningful (and possibly redundant) identifier, i.e.
>> 'csz_Hello__world___And_how_are_you__today_'
>
> Yuck, what a contrived example! As you say, there is a method that does
> not use undefined behaviour.
>

Yes, it is contrived. Yes, it's cleaner the other way. But ought our
opinions to be enforced globally via such a change in the Standard?
Some programmers might just like it. Since I'm not one of them, maybe I
shouldn't be attempting to defend their position. Heh.

>>> I suspect that the vast majority of code that attempts to modify
>>> string literals does so as the result of bugs.
>>
>> That seems probable to me, too.
>>
>>> A lot more code
>>> uses string literals in contexts that don't treat them as const,
>>> but doesn't actually try to modify them; for example:
>>>
>>> void func(char *s) {
>>> printf("In func(), s = \"%s\"\n", s);
>>> }
>>>
>>> ...
>>>
>>> func("hello");
>>>
>>> In short, I think the issue is not that anyone wants to modify
>>> string literals;
>>
>> For the right use case, I would.
>>
>>> it's that making them const would break existing
>>> code that *doesn't* actually modify string literals.
>>>
>>> (Stroustrup was able to do this in C++ because there was no existing
>>> C++ code before he invented the language.)
>>>
>>
>> Well doesn't 'const' "come from" C++, anyway?
>
> No, it's just used properly there!
>

I really thought it did come from C++. Oops.

>> And why isn't there a write-only counterpart, for symmetry, such as a
>> memory-mapped port that mustn't be read from?
>
> That case isn't uncommon in embedded systems (watchdog reset being a
> common example). However, reading is generally harmless.
>

But a "write-only" attribute still adds useful information. A
programmer coming along to work on someone else's code mightn't realize
that there is no expectation whatsoever for the value of an object, once
read. Such an attribute could allow them to find that out at
translation time. It's a digression, anyway. :)

>> And during very early development (and Usenet code examples), can it be
>> pleasant to avoid 'const' concerns altogether and then to analyze and
>> refine, gradually sprinkling 'const' in where appropriate? I don't
>> advocate this, but imagine that some folks might get "stuck" in
>> "analysis paralysis" if they had to think 'const'ness through at every
>> corner. I could be mistaken.
>
> Not really; if you don't know whether the function you are writing will
> modify its arguments, you are in big trouble!
>

That seems just a bit beginner-unfriendly, to me. I've seen beginners
struggle with 'const'-ness quite a bit, especially with one or more
levels of indirection.

My point just there is that the behaviour of a program can be the same
with 'const' completely removed. So it seems more like something that
"ought to" be used rather than "must" be used. But hey, that's just an
opinion.

Shao Miller

Jun 26, 2011, 5:24:37 PM

Oops; no. Just an attempt to direct the interpretation of "that depends
on this \"undefined behaviour\"". It's previously been demonstrated
that it might be interpreted as plain English rather than the precise
Standard definition.

> (Hmm, does the UK standard body, its equivalent of the US ANSI,
> "translate" ISO standards into UK spellings?)
>

Heh. I've no idea.

>>
>> Suppose you've a program loaded into memory via a serial line. Suppose
>> the program is loaded into writable memory (that seems pretty likely).
>> Suppose the program offers a CLI. Suppose a user can rename commands or
>> variables, or redefine preset scripts. Yes, all of these could be done
>> cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
>> convenient for some people to simply type the string literal right into
>> some spot in the source code where it's used and forget about coming up
>> with a meaningful (and possibly redundant) identifier, i.e.
>> 'csz_Hello__world___And_how_are_you__today_'
>
> And suppose the user wants to replace the content with a string that's
> longer than the original literal.
>

So let them. That's a discussion about user expectations. The user
might even be the programmer, wishing to override some defaults for a
particular client's needs.

Above is just an example for why a programmer might not want string
literals to be 'const'-qualified, especially in an environment where
'const' mightn't have any physical meaning; where all program memory is
writable, and usefully so.

> [snip]
>
>>> In short, I think the issue is not that anyone wants to modify
>>> string literals;
>>
>> For the right use case, I would.
>
> Perhaps.
>
> There are benefits and drawbacks both to allowing modifications of
> string literals, and to disallowing them. IMHO the benefits of
> disallowing modifications (catching potential errors) far outweigh the
> benefit for a few obscure use cases.
>

They probably do outweigh in terms of use cases, in my opinion.

> My M"..." proposal *could* give us the best of both, and I wouldn't
> object if it showed up in a future standard. I just don't think
> it's sufficiently useful.
>

Agreed.

>>> it's that making them const would break existing
>>> code that *doesn't* actually modify string literals.
>>>
>>> (Stroustrup was able to do this in C++ because there was no existing
>>> C++ code before he invented the language.)
>>
>> Well doesn't 'const' "come from" C++, anyway?
>
> I think so.
>
>> And why isn't there a write-only counterpart, for symmetry, such as a
>> memory-mapped port that mustn't be read from?
>
> Lack of usefulness, I suppose. "const" (which perhaps should have been
> called "readonly") is massively useful.

I find it useful as a quality measure, potential optimization
opportunity, and for interface specification, but would find
"write-only" useful for the same reasons.

> "writeonly" would probably be
> useful only in some very low-level code. And an implementation could
> provide a #pragma that does the same thing (have any done so?).
>

Well, MS has '__in', '__out', and '__inout'[1].

>> And during very early development (and Usenet code examples), can it be
>> pleasant to avoid 'const' concerns altogether and then to analyze and
>> refine, gradually sprinkling 'const' in where appropriate? I don't
>> advocate this, but imagine that some folks might get "stuck" in
>> "analysis paralysis" if they had to think 'const'ness through at every
>> corner. I could be mistaken.
>
> I think designing const into your code from the start is a lot
> easier.
>

That strikes me as a useful habit for:
- A programmer that understands 'const'
- A programmer that perceives benefits from using 'const'

I'm just not sure that's the set of all programmers.

> My personal preference is to declare everything "const" *unless*
> I specifically need to modify it. In fact, if I were designing
> a new language (without concern for backward compatibility),
> declared objects would be read-only by default, with some special
> syntax ("var"?) to make them writable. With sufficiently flexible
> initialization, I suspect most objects don't need to be modified
> after their creation. (I am not for one moment suggesting making
> such a change to C.)
>

What are the major benefits of having such a read-only attribute as a
default? I think that there are lots of concepts translating to code
constructs and patterns that make for high-quality code (such as
'const'), but do they have meaning in every environment? As an example,
if "storage" can be "readable" and "writable," isn't it kind of pleasant
that subsets can be defined by explicit inclusion of a predicate such as
'const'? As another example, if "storage" can be "addressable" and
"non-addressable," isn't it kind of pleasant that subsets can be defined
by explicit inclusion of a predicate such as 'register'?

But maybe you're right and maybe 'const'-default string literals could
raise the average quality of C code. :)

[1] http://msdn.microsoft.com/en-us/library/aa383701%28v=VS.85%29.aspx

Tim Rentsch

Jun 27, 2011, 2:40:22 PM

Keith Thompson <ks...@mib.org> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
> [...]
>> I can easily imagine an implementation providing a compiler
>> option to make string literals modifiable - not turned on
>> all the time, but having the option. When might that option
>> be useful? Some examples:
>>
>> 1. Compiling legacy code that assumes literals are writable.
>
> Valid, but I think a lot of such code, perhaps most of it, has been
> fixed by necessity, after it blew up when it was recompiled by a
> compiler that makes literals non-writable.

Certainly a lot of it has. But it would be nice to have
a choice, especially since the problems may manifest more
subtly than having code blow up (eg, overlapping different
string literals in writable memory).

>> 2. To force string literals to unique locations to help track
>> where various strings appear in the program (working under the
>> assumption that a writable-literals option would force different
>> literals to distinct locations, which it should).
>
> That could be done by an option that just forces literals to unique
> locations without making them writable.

Sure, but an option to have literals be writable gets both.
If I were writing a compiler I'd rather have to provide
only one option rather than two.

>> 3. During debugging, it might be handy to be able to change
>> a particular string literal, eg a printf() format, to help
>> explore program behavior.
>
> Ok, but I don't think I've ever felt the need to do that.

Me either, but then I tend to use debuggers in rather
simple ways, and I know there are developers for whom
the exploratory capabilities of their debuggers are
quite valuable.

>> I admit these examples may not occur very often. Still, why give
>> up the flexibility to preserve them, since the language as it
>> exists today also allows the option of checking string literals
>> being used "const"-inappropriately -- what benefit would we get
>> that we don't already have?
>
> We'd get the benefit of a guarantee that programs that accidentally
> attempt to write to string literals will be caught and fixed more
> easily, regardless of which conforming compiler we're using.

Some developers would benefit from such a guarantee. Most
developers already have the benefit available to them.

> A hypothetical compiler option doesn't do me much good if the
> compiler I'm using doesn't provide it (recompiling with a different
> compiler isn't always an option).

Oh, the other compiler doesn't have to be used to produce the
object files; it can be used _in addition to_ the particular
target compiler, just to provide extra error checking. Granted,
there can be discrepancies between the two compilations if there
is platform-specific conditional compilation, but programs that
make extensive use of that have much deeper problems than string
literals.

To say this another way - can you name one real-world example
of a program where it can't easily be checked for string literal
non-const-ness?

> Similar arguments could be made in favor of making modifying a
> const-qualified object undefined behavior rather than a constraint
> violation:
>
> const int x = 42;
> x = 43;

The two cases don't seem sufficiently analogous to draw
any useful conclusions.

> In my opinion, the only good reason to consider allowing string
> literals to be modifiable is for compatibility with very old
> implementations. I suspect that if string literals had been const
> from the beginning (which would have required inventing "const"
> many years sooner), we wouldn't be having this discussion.

A better option would be for the Standard to mandate a compiler
option to give a diagnostic if a string literal is used as a
'char *' rather than a 'const char *'. Some groups like 'const'
and use it a lot; others use it only when necessary. I think
it's better not to change the language itself but simply make
sure better tools are there for people who want or need them, and
let different groups decide independently. This approach also is
better suited for acceptance in the development community, not
to mention allowing easier transitioning of old programs.

Shao Miller

Jul 4, 2011, 6:00:41 PM

On 6/24/2011 10:24, Stephen Sprunk wrote:
>>
>> The C++ standard replaces the C strchr() with two overloads:
>>
>> const char* strchr(const char* s, int c);
>> char* strchr( char* s, int c);
>
> Compare to the only option C provides:
>
> char *strchr(const char *s, int c);
>
> This takes a const or non-const argument but always returns a non-const
> pointer; the potential loss of const-ness invites bugs. Thanks to
> overloading, the C++ version is able to return a pointer that matches
> the const-ness of its argument, preventing bugs.

A fellow yesterday advised of another strategy, using a GCC extension:

#define strchr(s, c) ((__typeof__(s)) (strchr((s), (c)))

That is, casting the result to the original type of 's'. Heheheh.
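The loss of const-ness Stephen describes is easy to demonstrate. The sketch below shows the silent case in standard C, plus one possible workaround: a const-returning wrapper with a distinct name, standing in for the C++ overload. The names `leaky_find` and `strchr_const` are hypothetical.

```c
#include <string.h>

/* Standard C: strchr takes const char * but returns char *, so this
   compiles without a diagnostic and hands back a writable pointer into
   a string literal. Writing through the result would be undefined. */
char *leaky_find(void)
{
    const char *s = "hello";
    return strchr(s, 'l');
}

/* Hypothetical const-preserving counterpart, mimicking C++'s
   const char *strchr(const char *, int) overload under a distinct name. */
const char *strchr_const(const char *s, int c)
{
    return strchr(s, c);
}
```

Assigning the result of `strchr_const` to a plain `char *` without a cast violates a constraint in C, so the compiler must diagnose it. That is the check C++ gets automatically from the overload pair.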

Dr Nick

Jul 5, 2011, 2:45:22 AM

Shao Miller <sha0....@gmail.com> writes:

That's either very neat or unutterably ghastly. I'm still thinking
about which it is.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

Harald van Dĳk

Jul 5, 2011, 1:39:43 PM

On Jul 5, 7:45 am, Dr Nick <3-nos...@temporary-address.org.uk> wrote:

> Shao Miller <sha0.mil...@gmail.com> writes:
> > A fellow yesterday advised of another strategy, using a GCC extension:
>
> > #define strchr(s, c) ((__typeof__(s)) (strchr((s), (c)))
>
> > That is, casting the result to the original type of 's'. Heheheh.
>
> That's either very neat or unutterably ghastly. I'm still thinking
> about which it is.

I'm going for ghastly, because it gives an error when s is an array,
and it gives the wrong result type when s is a pointer to (const) void.
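For reference, here is a compilable form of the macro under discussion. The missing closing parenthesis is restored, and the macro is renamed `TYPED_STRCHR` (a hypothetical name) so that it cannot collide with any `strchr` macro that `<string.h>` might define. It depends on the GCC/Clang `__typeof__` extension. As Harald says, it only works with pointer arguments: for an array argument, the cast target would be an array type, and a cast to an array type is not allowed.

```c
#include <string.h>

/* GCC/Clang extension: cast strchr's char * result to the declared type of
   s, so a const char * argument gives a const char * result.
   Pointer arguments only: with an array, __typeof__(s) is an array type
   and the cast is rejected. */
#define TYPED_STRCHR(s, c) ((__typeof__(s)) strchr((s), (c)))
```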

Shao Miller

Jul 5, 2011, 2:09:05 PM

(Note I had missed a right parenthesis at the end.)

That rings quite true. :)

Hmmm... I was actually just guessing at the macro based on what the
fellow had said, but considering your array note, perhaps it was more like:

#define strchr(s, c) (*((__typeof__(s) *) (strchr((s), (c)))))

It might be interesting if there was a '__constof__(type-name)' and/or
'__constof__(object)' that could expand to 'const' or nothing, as
appropriate.

const char * foo;
/* The next type is 'const char' */
typedef __constof__(*foo) char foo_base_t;
/* The next type is silly */
typedef int * __constof__(foo_base_t) bar;

Uhhh... :S

#define strchr(s, c) ((__constof__(*(s)) char *) (strchr((s), (c))))

Joel C. Salomon

Jul 5, 2011, 3:32:07 PM

I proposed another version on comp.std.c a few months back, using the
C1x proposed _Generic keyword. As fixed by Ben Bacarisse:

#define strchr(str, chr) \
_Generic((str), \
const void *: (const char *)strchr((str), (chr)), \
const char *: (const char *)strchr((str), (chr)), \
default: strchr((str), (chr)) \
)

(See the thread "_Generic and defining NELEM(arr)".)

--Joel
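Here is a self-contained rendering of that macro for a C11 compiler. It is renamed `STRCHR_Q` (a hypothetical name) so that it does not redefine a standard library identifier. Only the selected association is evaluated, but every branch must still be a valid expression for whatever argument is passed.

```c
#include <string.h>

/* C11 type-generic wrapper: choose the result type from the argument type.
   const-qualified arguments get a const char * back; anything else falls
   through to plain strchr and its char * result. */
#define STRCHR_Q(str, chr)                                \
    _Generic((str),                                       \
        const void *: (const char *)strchr((str), (chr)), \
        const char *: (const char *)strchr((str), (chr)), \
        default: strchr((str), (chr)))
```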

Shao Miller

Jul 6, 2011, 9:58:26 AM

Very nice. :) I wonder how that plays with arrays... If the
controlling expression is not evaluated [6.5.1.1p3], then the type could
be an array type... Perhaps another line?

const char[]: ((const char *) strchr((str), (chr))), \

'_Generic' isn't mentioned in 6.3.2.1p3, but this still gives me pause.
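On the array question: C17 clarified, in response to defect report 481, that the controlling expression is typed as if it had undergone lvalue conversion and array-to-pointer conversion. Current GCC and Clang behave that way, so a `const char` array already selects a `const char *:` association. An explicit `const char[]:` line would actually be rejected, because each association must name a complete object type. The hypothetical `CHAR_PTR_KIND` macro below just reports which branch an argument selects.

```c
/* Hypothetical helper: 1 if the argument selects a const char * association,
   2 if it selects char *, 0 otherwise. Arrays decay before selection. */
#define CHAR_PTR_KIND(x) _Generic((x), const char *: 1, char *: 2, default: 0)
```

The string literal selects `char *` because in C its type is `char[N]`, not `const char[N]`, which is the point debated earlier in the thread.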

Phil Carmody

Jul 7, 2011, 2:56:20 PM

Shao Miller <sha0....@gmail.com> writes:

> On 7/5/2011 13:39, Harald van Dĳk wrote:
> > On Jul 5, 7:45 am, Dr Nick<3-nos...@temporary-address.org.uk> wrote:
> >> Shao Miller<sha0.mil...@gmail.com> writes:
> >>> A fellow yesterday advised of another strategy, using a GCC extension:
> >>
> >>> #define strchr(s, c) ((__typeof__(s)) (strchr((s), (c)))
> >>
> >>> That is, casting the result to the original type of 's'. Heheheh.
> >>
> >> That's either very neat or unutterably ghastly. I'm still thinking
> >> about which it is.
> >
> > I'm going for ghastly, because it gives an error when s is an array,
> > and it gives the wrong result type when s is a pointer to (const) void.
>
> (Note I had missed a right parenthesis at the end.)
>
> That rings quite true. :)
>
> Hmmm... I was actually just guessing at the macro based on what the
> fellow had said, but considering your array note, perhaps it was more
> like:
>
> #define strchr(s, c) (*((__typeof__(s) *) (strchr((s), (c)))))

How about just:

#define strchr(s, c) ((__typeof__(&*s)) (strchr((s), (c))))

?

> It might be interesting if there was a '__constof__(type-name)' and/or
> '__constof__(object)' that could expand to 'const' or nothing, as
> appropriate.

That I like. A lot. Get it in GCC, and I'll be an early adopter.

Phil
--
"At least you know where you are with Microsoft."
"True. I just wish I'd brought a paddle." -- Matthew Vernon

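Phil's `&*s` variant can be checked the same way, assuming GCC or Clang for `__typeof__`. Because `&*(s)` is always a pointer expression, an array argument produces `char *` or `const char *` rather than an array type that cannot be a cast target. The sketch renames the macro `PTR_STRCHR` (a hypothetical name) and parenthesizes the argument as `&*(s)`, which is the usual macro hygiene.

```c
#include <string.h>

/* GCC/Clang __typeof__: &*(s) has pointer type even when s is an array,
   so the cast target is char * or const char *, never an array type. */
#define PTR_STRCHR(s, c) ((__typeof__(&*(s))) strchr((s), (c)))
```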
Shao Miller

Jul 7, 2011, 7:43:09 PM

On Thursday, 7 July 2011 14:56:20 UTC-4, Phil Carmody wrote:
> [...some other language?...]

Or was that your point? :)

Phil Carmody

Jul 7, 2011, 7:46:08 PM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> [...garbled quote of the earlier post: labeled charset=utf-16be but actually UTF-8...]

No idea what happened there, that's all "?"s to me. I certainly never consciously chose
Content-Type: text/plain; charset=utf-16be
Google groups, no matter how much it sucks, shows that there's a post hidden behind that noise:
http://groups.google.com/group/comp.lang.c/msg/4bbd1eab45333db5

Shao Miller

Jul 7, 2011, 10:26:43 PM

On 7/7/2011 19:46, Phil Carmody wrote:
> Phil Carmody<thefatphi...@yahoo.co.uk> writes:
>> [...uhh...]

>
> No idea what happened there, that's all "?"s to me. I certainly never consciously chose
> Content-Type: text/plain; charset=utf-16be
> Google groups, no matter how much it sucks, shows that there's a post hidden behind that noise:
> http://groups.google.com/group/comp.lang.c/msg/4bbd1eab45333db5
>

Doubly odd. It showed up weird in Thunderbird, so I went to Google
Groups to confirm. That's how I saw it there, too. I actually replied
from Google Groups as Thunderbird wouldn't send my response (due to the
quoting?). :S Anyway...

Stephen Sprunk

Jul 7, 2011, 11:41:41 PM

Thunderbird displays the message properly (and allows responses) when
forced to interpret the message as UTF-8 rather than as the UTF-16BE
indicated in the header noted above. I suspect other modern readers
would as well, but one has to think to try it.
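The effect of the mislabeling can be shown in a few lines. A reader that believes the body is UTF-16BE takes the bytes two at a time as big-endian 16-bit code units, so two ASCII characters collapse into one unrelated character. For example, the bytes of "On" (0x4F 0x6E) become U+4F6E, a CJK ideograph. The helper `bytes_as_utf16be` is hypothetical and skips surrogate handling entirely; it only shows the pairing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a byte string as UTF-16BE code units,
   as a reader does when the charset header says utf-16be.
   Writes at most max units to out and returns how many were written;
   a trailing odd byte is ignored. */
size_t bytes_as_utf16be(const unsigned char *bytes, size_t len,
                        uint16_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len && n < max; i += 2)
        out[n++] = (uint16_t)((bytes[i] << 8) | bytes[i + 1]);
    return n;
}
```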