Pointer and string literal question

Tagore

Dec 10, 2009, 5:51:40 PM
hi,

#include <stdio.h>
int main(void){
    char *s="LET";
    char *t="LET";
    if(s==t)
        printf("same");
    else
        printf("different");
    return 0;
}

In the above code, the output is "same",
but I expected the output to be "different". I thought that s and t point
to string literals stored at different addresses.
Can anyone please help me understand this output?

Regards,

bartc

Dec 10, 2009, 6:09:22 PM

"Tagore" <c.lang...@gmail.com> wrote in message
news:6cf7d23b-8a63-4992...@h14g2000pri.googlegroups.com...

Because the literals are identical, perhaps only a single copy is used.

--
Bartc

Jens Thoms Toerring

Dec 10, 2009, 6:20:42 PM
Tagore <c.lang...@gmail.com> wrote:
> #include <stdio.h>
> int main(void){
> char *s="LET";
> char *t="LET";
> if(s==t)
> printf("same");
> else
> printf("different");
> return 0;
> }

> In above code, output is "same".
> but I expected output to be "different". I think that s and t points
> to string literals present at different addresses.

Why do you think so? It's correct that both 's' and 't' point to
string literals - but since the strings they point to are identical,
one of the simplest (memory-related) optimizations for the
compiler is to make them point to the same location. Actually, that's
the very reason why you aren't allowed to change string literals -
i.e. if you were to do e.g.

s[1] = 'x'; /* not allowed by the C standard! */

then this would also change the content of what 't' is pointing
to. The guys writing the C standard had two alternatives:
allow changes to string literals - in which case 's' couldn't
point to the same place as 't', thus making a certain kind of
optimization impossible - or allow for optimization like the
one you are seeing here and thus forbid changing string literals.
They went with the second one, which to me seems to be
in the spirit of C, i.e. go for compact, fast and least
resource-hungry compiled programs.

But if you don't like it your compiler may have a flag to make
it less standard-compliant and force it to produce code where
's' is pointing to a different location than 't' (and where you
thus may change string literals).
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Keith Thompson

Dec 10, 2009, 6:20:37 PM
"bartc" <ba...@freeuk.com> writes:
> "Tagore" <c.lang...@gmail.com> wrote in message
> news:6cf7d23b-8a63-4992...@h14g2000pri.googlegroups.com...
>> #include <stdio.h>
>> int main(void){
>> char *s="LET";
>> char *t="LET";
>> if(s==t)
>> printf("same");
>> else
>> printf("different");
>> return 0;
>> }
>>
>> In above code, output is "same".
>> but I expected output to be "different". I think that s and t points
>> to string literals present at different addresses.
>> Can any one please help me in understanding its output.
>
> Because the literals are identical, perhaps only a single copy is used.

Right. Compilers are explicitly permitted, but not required, to do
this. C99 6.4.5p6:

It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

Your program shouldn't assume either that they're the same, or that
they aren't.

For example, for this program:

#include <stdio.h>
int main(void)
{
    char *s0 = "abcde";
    char *s1 = "abcde";
    char *s2 = "Xabcde";
    if (s0 == s1) {
        puts("s0 == s1");
    }
    else {
        puts("s0 != s1");
    }
    if (s0 == s2+1) {
        puts("s0 == s2+1");
    }
    else {
        puts("s0 != s2+1");
    }
    return 0;
}

all 4 possible results are valid. (The compiler I'm using prints
s0 == s1, s0 != s2+1
without optimization,
s0 == s1, s0 == s2+1
with optimization.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Kaz Kylheku

Dec 10, 2009, 6:38:26 PM
On 2009-12-10, Jens Thoms Toerring <j...@toerring.de> wrote:
> Tagore <c.lang...@gmail.com> wrote:
>> #include <stdio.h>
>> int main(void){
>> char *s="LET";
>> char *t="LET";
>> if(s==t)
>> printf("same");
>> else
>> printf("different");
>> return 0;
>> }
>
>> In above code, output is "same".
>> but I expected output to be "different". I think that s and t points
>> to string literals present at different addresses.
>
> Why do you think so? It's correct that both 's' and 't' point to
> string literals - but since the strings they point to are identical
> it's one of the most simple (memory-related) optimizations for the
> compiler to make them point to the same location. Actually, that's
> the very reason why you aren't allowed to change string literals -
> i.e. if you would do e.g.

It's not the only reason.

Literals are effectively pieces of the program text made available to itself as
data, so that modifying a literal de facto constitutes self-modifying code.
Self-modifying code can't be placed into read-only storage, such as a ROM, or
write-protected virtual pages.

> s[1] = 'x'; /* not allowed by the C standard! */

This undefinedness also means that once you perform s[1] = 'x', a subsequent
statement of the form

if (s[1] == 'x') ...

could go either way (if it ever gets to execute at all). It's not just about
other copies of the literal being affected by the change.

The translated program is also simply not required to be aware of
self-modifications like this.

Not only can another instance of the literal share the same space as s, but the
expression s[1] can be optimized to a constant which does not respond to
changes to s.

Keith Thompson

Dec 10, 2009, 6:41:17 PM
Kaz Kylheku <kkyl...@gmail.com> writes:
> On 2009-12-10, Jens Thoms Toerring <j...@toerring.de> wrote:
>> Tagore <c.lang...@gmail.com> wrote:
>>> char *s="LET";
>>> char *t="LET";
[...]

>> Why do you think so? It's correct that both 's' and 't' point to
>> string literals - but since the strings they point to are identical
>> it's one of the most simple (memory-related) optimizations for the
>> compiler to make them point to the same location. Actually, that's
>> the very reason why you aren't allowed to change string literals -
>> i.e. if you would do e.g.
>
> It's not the only reason.
>
> Literals are effectively pieces of the program text made available
> to itself as data, so that modifying a literal de facto constitutes
> self-modifying code. Self-modifying code can't be placed into
> read-only storage, such as a ROM, or write-protected virtual pages.
>
>> s[1] = 'x'; /* not allowed by the C standard! */
>
> This undefinedness also means that once you perform s[1] = 'x', a subsequent
> statement of the form
>
> if (s[1] == 'x') ...
>
> could go either way (if it ever gets to execute at all). It's not just about
> other copies of the i literal being affected by the change.
>
> The translated program is also simply not required to be aware of
> self-modifications like this.
>
> Not only can another instance of the literal share the same space as
> s, but the expression s[1] can be optimized to a constant which does
> not respond to changes to s.

Agreed.

In addition, it's also likely (but not required) that attempting:

s[1] = 'x';

will cause your program to crash. (In fact, this is the *best*
outcome, since it shows you where the error is.)

Ike Naar

Dec 10, 2009, 6:45:45 PM
In article <6cf7d23b-8a63-4992...@h14g2000pri.googlegroups.com>,

Tagore <c.lang...@gmail.com> wrote:
>#include <stdio.h>
>int main(void){
> char *s="LET";
> char *t="LET";
> if(s==t)
> printf("same");
> else
> printf("different");
> return 0;
>}
>In above code, output is "same".
>but I expected output to be "different".
> I think that s and t points
>to string literals present at different addresses.

Not necessarily; the compiler is free to choose whether it places
the string literals at the same address or not. Sometimes you can
use compiler flags to tell the compiler where to place the strings.

For example, if you would compile your program with the Sun C compiler
operating in default mode, the output would be "different", but if
you use the same compiler with the -xstrconst flag, the output is "same".

For GNU C with default settings, the output is "same", but
with the -fwritable-strings flag the output is "different".

Of course, you could write your program in such a way that it does
not matter where the strings are.

Eric Sosman

Dec 10, 2009, 9:00:59 PM

As others have explained, the compiler might choose
to create only one nameless "LET" string, and aim both
pointers at that single instance.

A compiler might go further still:

    char *s = "LET";
    char *t = "NUMBER ONE WITH A BULLET";
    if (s == t)
        ... obviously false ...
    if (s == t+21)
        ... ??? ...

--
Eric Sosman
eso...@ieee-dot-org.invalid

Peter Nilsson

Dec 10, 2009, 9:23:50 PM
Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
<snip>

>      As others have explained, the compiler might choose
> to create only one nameless "LET" string, and aim both
> pointers at that single instance.
>
>      A compiler might go further still:
>
>         char *s = "LET";
>         char *t = "NUMBER ONE WITH A BULLET";
>         if (s == t)
>             ... obviously false ...
>         if (s == t+21)
>             ... ??? ...

Are you sure that's allowed in C89/90? I thought the
string literals had to be 'identical' before they could
share the same address.

--
Peter

Eric Sosman

Dec 10, 2009, 9:50:51 PM

The word "identical" doesn't seem to appear in any part
of the C99 Standard that's relevant. But perhaps I've missed
something; can you cite which "identical" you're thinking of?

In C99, 6.4.5p6 says "It is unspecified whether these arrays
are distinct provided their elements have the appropriate values."

The word "appropriate" does not seem to me to imply "identical."

C89/ANSI 3.1.4 says "Identical string literals of either form
need not be distinct," but doesn't seem to say anything at all
about non-identical literals. (It doesn't even say that "X"
and "FOOBAR" are distinct.)

I don't have a copy of C90 to consult, but others have said
it's the same as C89 except for section and paragraph numbers.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Keith Thompson

Dec 10, 2009, 9:50:32 PM

The wording did change between C90 and C99.

C90 6.1.4:

Identical string literals of either form need not be distinct. If
the program attempts to modify a string literal of either form,
the behavior is undefined.

where "either form" refers to character string literals and wide
string literals.

C99 6.4.5p6:

It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

But the C90 standard didn't say that string literals that aren't
identical *can't* overlap (and I can't think of any good reason to
assume that they can't). I think C99 mostly just improved the
wording.

Nick

Dec 11, 2009, 1:40:43 AM
Keith Thompson <ks...@mib.org> writes:

There is, presumably, nothing to stop the compiler pointing s at four
bytes of machine code that happen to make up part of the body of your
program and which constitute codes for L,E and T followed by a 0 byte.
If they should so happen to appear, of course.
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

James Dow Allen

Dec 11, 2009, 2:52:21 AM
On Dec 11, 6:38 am, Kaz Kylheku <kkylh...@gmail.com> wrote:
> Literals are effectively pieces of the program text made available to itself as
> data, so that modifying a literal de facto constitutes self-modifying code.
> Self-modifying code can't be placed into read-only storage, such as a ROM, or
> write-protected virtual pages.

A related reason for "read-only when possible" concerns text-sharing.

One might have dozens of copies of the same program (e.g. interpreter)
running on one machine; the interpreter's data might include hundreds
of messages; there's a very big savings if the messages can be moved
to a read-only, sharable memory section. (There used to be a
complicated pre-processor that accomplished this, also looking for
string matches; it became obsolete when compilers started treating
string literals as read-only by default.)

James Dow Allen

Nick Keighley

Dec 11, 2009, 3:00:35 AM
On 10 Dec, 23:20, j...@toerring.de (Jens Thoms Toerring) wrote:
> Tagore <c.lang.mys...@gmail.com> wrote:

> > #include <stdio.h>
> > int main(void){
> >         char *s="LET";
> >         char *t="LET";
> >         if(s==t)
> >               printf("same");
> >         else
> >               printf("different");
> >         return 0;
> > }
>
> > In above code, output is "same".
> > but I expected output to be "different". I think that s and t points
> > to string literals present at different addresses.
>
> Why do you think so? It's correct that both 's' and 't' point to
> string literals - but since the strings they point to are identical
> it's one of the most simple (memory-related) optimizations for the
> compiler to make them point to the same location.

<snip>

> But if you don't like it your compiler may have a flag to make
> it less standard-compliant and force it to produce code where
> 's' is pointing to a different location than 't' (and where you
> thus may change string literals).

Why is this non-compliant?

Richard Bos

Dec 12, 2009, 1:33:57 PM
Keith Thompson <ks...@mib.org> wrote:

It's even possible that a later

if (ch == 'L')

is compiled to compare to the first character of your string literal,
instead of to a literal 'L', on systems where this is faster. It is even
allowed that, if you do try to change the string, that comparison fails
when ch is 'L', at a point which _appears_ to have nothing whatsoever to
do with the original string literal.
I have never seen an implementation which goes that far in its
optimisations (in fact, I've never seen one where it would make sense),
but I would not be very surprised to find one. It would certainly be
perfectly legal.

Richard

Jens Thoms Toerring

Dec 12, 2009, 5:49:10 PM

> <snip>

> why is this not-compliant?

Sorry, that was badly expressed. What I meant was that there might
be a flag that gets the compiler to emit a working program (in the
sense of "as maybe expected by the user") for non-compliant code
(i.e. that allows for changing of string literals, which otherwise
results in undefined behaviour). But on thinking about it a bit
more even that doesn't guarantee that 's' and 't' will point to
different locations, what one would need for that is a flag that
suppresses the kind of optimization that merges identical (parts of)
string literals.

Michael Tsang

Dec 20, 2009, 6:42:02 AM

Tagore wrote:

> hi,
>
> #include <stdio.h>
> int main(void){
> char *s="LET";
> char *t="LET";
> if(s==t)
> printf("same");
> else
> printf("different");
> return 0;
> }
>

As string literals are really "const" char *, they are read-only and the
compiler is free to place them at the same or at different addresses.

Ben Bacarisse

Dec 20, 2009, 8:17:53 AM
Michael Tsang <mik...@gmail.com> writes:

> Tagore wrote:
<snip>


>> #include <stdio.h>
>> int main(void){
>> char *s="LET";
>> char *t="LET";
>> if(s==t)
>> printf("same");
>> else
>> printf("different");
>> return 0;
>> }
>
> As string literals are really "const" char *, there are read-only and the
> compiler is free to place them at the same or at different
> addresses.

It's worth pointing out (as I think you know from the quotes you used
round "const") that string literals are not actually const objects in
C. They are not modifiable (in that the effect of doing so is
undefined) but if they were really const, you'd get a compiler
diagnostic from the initialisations in the program above.

Also (and this is very much a small point) a literal like "same" is
really of type char[5] since sizeof will report the array object's
size not the size of a char *.

--
Ben.
