
Is this a correct implementation of strstr ?


candide

Apr 13, 2010, 5:01:33 PM
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

char *mystrchr(const char *s, int c)
{
    while ((unsigned char)*s != (unsigned char)c) {
        if (*s == '\0')
            return NULL;
        ++s;
    }
    return (char*)s;
}

Ian Collins

Apr 13, 2010, 5:08:23 PM
On 04/14/10 09:01 AM, candide wrote:
> I request your opinion about the following attempt to implement the
> standard function strstr. Here is the code :

No, it looks like you are writing strchr.


>
>
> char *mystrchr(const char *s, int c)
> {
> while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

> if (*s == '\0')
> return NULL;
> ++s;
> }
> return (char*)s;
> }


--
Ian Collins

Seebs

Apr 13, 2010, 5:08:56 PM

I do not think this is a correct implementation of the standard
function str*str*. Apart from that, it looks basically correct
to me, although that could just be heatstroke or something, I went
out in the Big Blue Room, and was exposed for nearly an hour to
an essentially uncontrolled fusion reaction. :(

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Phil Carmody

Apr 13, 2010, 5:19:10 PM
Ian Collins <ian-...@hotmail.com> writes:
> On 04/14/10 09:01 AM, candide wrote:
>> I request your opinion about the following attempt to implement the
>> standard function strstr. Here is the code :
>
> No, it looks like you are writing strchr.
>>
>>
>> char *mystrchr(const char *s, int c)
>> {
>> while ((unsigned char)*s != (unsigned char)c) {
>
> Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
    /* detect mangled indents from googoogroups */
    char *mangledchars=buff;
    while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
        *mangledchars=' ';
}

may fail to fix any non-breaking spaces.
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1

candide

Apr 13, 2010, 5:25:37 PM
Ian Collins wrote:

>
> No, it looks like you are writing strchr.
>>

Yes, sorry for the misprint.

>>
>> char *mystrchr(const char *s, int c)
>> {
>> while ((unsigned char)*s != (unsigned char)c) {
>
> Why the casts?

Why not ;) ?

In fact, the code is not mine so I can't tell you why. I suppose the
casts are needed for the function to have a correct behaviour in order
to locate some character in the extended character set (accent for
instance).

Ian Collins

Apr 13, 2010, 5:31:38 PM
On 04/14/10 09:19 AM, Phil Carmody wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/14/10 09:01 AM, candide wrote:
>>> I request your opinion about the following attempt to implement the
>>> standard function strstr. Here is the code :
>>
>> No, it looks like you are writing strchr.
>>>
>>>
>>> char *mystrchr(const char *s, int c)
>>> {
>>> while ((unsigned char)*s != (unsigned char)c) {
>>
>> Why the casts?
>
> Because without them
>
> char buff[LINELEN];
> while(get_line_from_post(buff, sizeof(buff))) {
> /* detect mangled indents from googoogroups */
> char *mangledchars=buff;
> while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
> *mangledchars=' ';
> }
>
> may fail to fix any non-breaking spaces.

Eh?

--
Ian Collins

Dann Corbit

Apr 13, 2010, 5:35:31 PM
In article <4bc4dba9$0$24101$426a...@news.free.fr>,
can...@free.invalid says...

The casts add nothing.

This seems to be a user implementation of strchr() and not strstr().

With strstr() you are looking for a sequence of characters that match
and not a single character.

As long as you are writing a strstr() implementation, I suggest the BMH
algorithm.
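
To illustrate the BMH (Boyer-Moore-Horspool) suggestion, here is a minimal sketch of a strstr-like search. The name bmh_strstr is hypothetical and this is only one way to write it, not a tuned implementation. The shift table is indexed through unsigned char, which is the same signed-char concern the casts in mystrchr deal with.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: Horspool search for
   needle in haystack, with the same return convention as strstr. */
char *bmh_strstr(const char *haystack, const char *needle)
{
    size_t n = strlen(haystack);
    size_t m = strlen(needle);
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (m == 0)
        return (char *)haystack;    /* strstr returns haystack for "" */
    if (m > n)
        return NULL;

    /* Default shift: the whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    /* For every needle byte except the last: distance to the end. */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char)needle[i]] = m - 1 - i;

    /* Slide the window, skipping by the shift of its last byte. */
    for (i = 0; i + m <= n; i += shift[(unsigned char)haystack[i + m - 1]]) {
        if (memcmp(haystack + i, needle, m) == 0)
            return (char *)(haystack + i);
    }
    return NULL;
}
```

Calling bmh_strstr("PARIS IN THE THE SPRING", "THE") returns a pointer to the first "THE", at offset 9.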

P.S.
Is this one of those:

PARIS IN THE
THE SPRING

jokes?

Phil Carmody

Apr 13, 2010, 5:45:35 PM

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.

Phil

Keith Thompson

Apr 13, 2010, 6:17:20 PM
Dann Corbit <dco...@connx.com> writes:
> In article <4bc4dba9$0$24101$426a...@news.free.fr>,
> can...@free.invalid says...
>>
>> I request your opinion about the following attempt to implement the
>> standard function strstr. Here is the code :
>>
>> char *mystrchr(const char *s, int c)
>> {
>> while ((unsigned char)*s != (unsigned char)c) {
>> if (*s == '\0')
>> return NULL;
>> ++s;
>> }
>> return (char*)s;
>> }
>
> The casts add nothing.

I don't think that's true, though the second cast is probably
unnecessary.

Assume plain char is signed, with CHAR_BIT==8. Consider
*s == -96
c == 160
Without the casts, *s is promoted from plain char to int, with the
value -96, which is not equal to the int value 160. Converting the
char value -96 to unsigned char yields 160, which correctly promotes
to the value 160 of type int.

Interestingly the Standard's description of strchr() requires c, the
int argument to be converted to a char. Typically, given the above
assumptions, the int value 160 will be converted to the char value
-96, but this isn't actually guaranteed.
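
A minimal sketch of the two comparisons described above, assuming CHAR_BIT==8 and two's complement. The function names are hypothetical. char_from_byte copies the byte pattern directly, so the example does not itself depend on an out-of-range int-to-char conversion.

```c
#include <limits.h>
#include <string.h>

/* Build a char holding the given byte pattern without relying on an
   out-of-range int-to-char conversion. */
char char_from_byte(unsigned char b)
{
    char ch;
    memcpy(&ch, &b, 1);
    return ch;
}

/* Comparison without casts: ch is promoted to int unchanged, so a
   signed char holding -96 never equals the int 160. */
int equal_plain(char ch, int c)
{
    return ch == c;
}

/* Comparison as written in mystrchr: both sides are reduced to
   unsigned char before being promoted to int. */
int equal_as_uchar(char ch, int c)
{
    return (unsigned char)ch == (unsigned char)c;
}
```

With nbsp = char_from_byte(0xA0), equal_as_uchar(nbsp, 160) is true under those assumptions, whereas equal_plain(nbsp, 160) is true only where plain char is unsigned.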

> This seems to be a user implementation of strchr() and not strstr().

That was already acknowledged.

[snip]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ian Collins

Apr 13, 2010, 6:17:32 PM
On 04/14/10 09:45 AM, Phil Carmody wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/14/10 09:19 AM, Phil Carmody wrote:
>>> Ian Collins<ian-...@hotmail.com> writes:
>>>> On 04/14/10 09:01 AM, candide wrote:
>>>>> I request your opinion about the following attempt to implement the
>>>>> standard function strstr. Here is the code :
>>>>
>>>> No, it looks like you are writing strchr.
>>>>>
>>>>>
>>>>> char *mystrchr(const char *s, int c)
>>>>> {
>>>>> while ((unsigned char)*s != (unsigned char)c) {
>>>>
>>>> Why the casts?
>>>
>>> Because without them
>>>
>>> char buff[LINELEN];
>>> while(get_line_from_post(buff, sizeof(buff))) {
>>> /* detect mangled indents from googoogroups */
>>> char *mangledchars=buff;
>>> while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
>>> *mangledchars=' ';
>>> }
>>>
>>> may fail to fix any non-breaking spaces.
>>
>> Eh?
>
> If chars are signed, then you'll be comparing char -96 with int 160,
> and not find any matches.
>
So? a simple cast to char would be fine:

while (*s != (char)c)

As the standard says (my emphasis):

"The strchr function locates the first occurrence of c *(converted to a
char)* in string s".

--
Ian Collins

Peter Nilsson

Apr 13, 2010, 7:00:34 PM
Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:

> Ian Collins <ian-n...@hotmail.com> writes:
> > On 04/14/10 09:19 AM, Phil Carmody wrote:
> > > Ian Collins<ian-n...@hotmail.com>  writes:

> > > > On 04/14/10 09:01 AM, candide wrote:
> > > > > I request your opinion about the following attempt
> > > > > to implement the standard function strstr. Here is
> > > > > the code:
> > > >
> > > > No, it looks like you are writing strchr.
> > > >
> > > > > char *mystrchr(const char *s, int c)
> > > > > {
> > > > > while ((unsigned char)*s != (unsigned char)c) {
> > > >
> > > > Why the casts?
>
> If chars are signed, then you'll be comparing char -96 with
> int 160, and not find any matches.

The standard requires c to be converted to a char for strchr.
In which case, strchr effectively invokes undefined behaviour
(in C99) if char is signed and c is not in the range of signed
char.

--
Peter

Eric Sosman

Apr 13, 2010, 6:09:04 PM
On 4/13/2010 5:45 PM, Phil Carmody wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/14/10 09:19 AM, Phil Carmody wrote:
>>> Ian Collins<ian-...@hotmail.com> writes:
>>>> On 04/14/10 09:01 AM, candide wrote:
>>>>> I request your opinion about the following attempt to implement the
>>>>> standard function strstr. Here is the code :
>>>>
>>>> No, it looks like you are writing strchr.
>>>>>
>>>>>
>>>>> char *mystrchr(const char *s, int c)
>>>>> {
>>>>> while ((unsigned char)*s != (unsigned char)c) {
>>>>
>>>> Why the casts?
>>>
>>> Because without them
>>>
>>> char buff[LINELEN];
>>> while(get_line_from_post(buff, sizeof(buff))) {
>>> /* detect mangled indents from googoogroups */
>>> char *mangledchars=buff;
>>> while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
>>> *mangledchars=' ';
>>> }
>>>
>>> may fail to fix any non-breaking spaces.
>>
>> Eh?
>
> If chars are signed, then you'll be comparing char -96 with int 160,
> and not find any matches.

Yabbut, it's still wrong. The second argument to strchr()
is an int which is "converted to a char" (7.21.5.2p2), not to
an unsigned char. So the given code behaves differently from
the Standard's description for any char value C and int value I
such that

(C == (char)I) != ((unsigned char)C == (unsigned char)I)

Can such a C,I pair exist? Yes, certainly, if I is less than
CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
defined and can produce a peculiar result (6.3.1.3p2). That
peculiar result need not be closely related to the (well-defined)
result of converting to unsigned char, so the two comparisons
could come out differently -- in which case, the fix is to remove
the `unsigned' from both casts and allow the peculiar behavior
to occur. It's what the Standard requires.
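
Whether such a C,I pair exists on a given implementation can be checked by brute force. The sketch below is only an illustration and count_disagreements is a hypothetical name. It counts the pairs for which the two comparisons differ. The (char)i conversion is implementation-defined when i is out of range, so the count depends on the compiler. Where that conversion wraps modulo 2^CHAR_BIT (GCC documents this behaviour), the count should be zero.

```c
#include <limits.h>

/* Count (C, I) pairs for which the Standard's wording (compare
   against (char)I) and the posted code (compare as unsigned char)
   disagree. I ranges over [lo, hi]. */
long count_disagreements(int lo, int hi)
{
    long count = 0;
    int ci, i;

    for (ci = CHAR_MIN; ci <= CHAR_MAX; ci++) {
        char C = (char)ci;      /* in range, so well-defined */
        for (i = lo; i <= hi; i++) {
            /* implementation-defined when i is out of range for char */
            int by_standard = (C == (char)i);
            int by_posted = ((unsigned char)C == (unsigned char)i);
            if (by_standard != by_posted)
                count++;
        }
    }
    return count;
}
```

A non-zero result from count_disagreements(-1024, 1024) would indicate an implementation on which the posted code and the Standard's description can diverge.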

--
Eric Sosman
eso...@ieee-dot-org.invalid

Seebs

Apr 13, 2010, 6:21:08 PM
On 2010-04-13, candide <can...@free.invalid> wrote:
> In fact, the code is not mine so I can't tell you why. I suppose the
> casts are needed for the function to have a correct behaviour in order
> to locate some character in the extended character set (accent for
> instance).

I don't think they are, actually. The "unsigned char" thing is needed
when using is*() functions, and for processing things like the result of
getchar(), etcetera, but in the case of strchr, it seems as though plain
char is right.
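
For contrast, this is the kind of place where the unsigned char cast is needed: the is*() functions take an int that must be representable as unsigned char (or be EOF). A minimal sketch, with count_upper as a hypothetical helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Count upper-case characters in s. The cast keeps a negative plain
   char value from being passed to isupper(), which would be
   undefined behaviour. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```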

candide

Apr 13, 2010, 5:55:56 PM
Dann Corbit wrote:

> The casts add nothing.
>

Does it mean the casts don't hurt ? In other words : may the casts lead
to a wrong behaviour ?


> This seems to be a user implementation of strchr() and not strstr().
>

I meant strchr and not strstr. Sorry for the misprint.

Phil Carmody

Apr 14, 2010, 2:17:45 AM

I was addressing just mystrchr as posted, not strchr. However, as
you continue, the comparison is a most enlightening one.

> So the given code behaves differently from
> the Standard's description for any char value C and int value I
> such that
>
> (C == (char)I) != ((unsigned char)C == (unsigned char)I)
>
> Can such a C,I pair exist? Yes, certainly, if I is less than
> CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
> defined and can produce a peculiar result (6.3.1.3p2). That
> peculiar result need not be closely related to the (well-defined)
> result of converting to unsigned char, so the two comparisons
> could come out differently -- in which case, the fix is to remove
> the `unsigned' from both casts and allow the peculiar behavior
> to occur. It's what the Standard requires.

I don't know quite how much 'the standard is occasionally a bit dumb'
was supposed to be in that, but with my 'the standard is occasionally
a bit dumb'-tinted spectacles, I certainly detected some.

So, got any portable C code for turning latin-1 NBSPs from googoogroups
into ' ' using strchr to find the errant chars?

Phil Carmody

Apr 14, 2010, 2:30:12 AM
Ian Collins <ian-...@hotmail.com> writes:
> On 04/14/10 09:45 AM, Phil Carmody wrote:
>> Ian Collins<ian-...@hotmail.com> writes:
>>> On 04/14/10 09:19 AM, Phil Carmody wrote:
>>>> Ian Collins<ian-...@hotmail.com> writes:
>>>>> On 04/14/10 09:01 AM, candide wrote:
>>>>>> I request your opinion about the following attempt to implement the
>>>>>> standard function strstr. Here is the code :
>>>>>
>>>>> No, it looks like you are writing strchr.
>>>>>>
>>>>>>
>>>>>> char *mystrchr(const char *s, int c)
>>>>>> {
>>>>>> while ((unsigned char)*s != (unsigned char)c) {
>>>>>
>>>>> Why the casts?
>>>>
>>>> Because without them
>>>>
>>>> char buff[LINELEN];
>>>> while(get_line_from_post(buff, sizeof(buff))) {
>>>> /* detect mangled indents from googoogroups */
>>>> char *mangledchars=buff;
>>>> while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
>>>> *mangledchars=' ';
>>>> }
>>>>
>>>> may fail to fix any non-breaking spaces.
>>>
>>> Eh?
>>
>> If chars are signed, then you'll be comparing char -96 with int 160,
>> and not find any matches.
>
> So?

So it won't do what was required of it.

> a simple cast to char would be fine:

Now that's weasely wording. The cast you like is 'simple', but the
two casts from the OP are so horrible they must be avoided no matter
what?

> while (*s != (char)c)
>
> As the standard says (my emphasis):
>
> "The strchr function locates the first occurrence of c *(converted to
> a char)* in string s".

But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
the DEL characters instead!

Phil Carmody

Apr 14, 2010, 2:37:27 AM
Seebs <usenet...@seebs.net> writes:
> On 2010-04-13, candide <can...@free.invalid> wrote:
>> In fact, the code is not mine so I can't tell you why. I suppose the
>> casts are needed for the function to have a correct behaviour in order
>> to locate some character in the extended character set (accent for
>> instance).
>
> I don't think they are, actually. The "unsigned char" thing is needed
> when using is*() functions, and for processing things like the result of
> getchar(), etcetera, but in the case of strchr, it seems as though plain
> char is right.

The values of the hex and octal escapes in character constants are
unsigned char too. (That are mapped onto ints as they become the
actual character constants.)

Ian Collins

Apr 14, 2010, 3:32:29 AM
On 04/14/10 06:30 PM, Phil Carmody wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/14/10 09:45 AM, Phil Carmody wrote:
>>>
>>> If chars are signed, then you'll be comparing char -96 with int 160,
>>> and not find any matches.
>>
>> So?
>
> So it won't do what was required of it.
>
>> a simple cast to char would be fine:
>
> Now that's weasely wording. The cast you like is 'simple', but the
> two casts from the OP are so horrible they must be avoided no matter
> what?

Yes, one is bad enough thank you!

>> while (*s != (char)c)
>>
>> As the standard says (my emphasis):
>>
>> "The strchr function locates the first occurrence of c *(converted to
>> a char)* in string s".
>
> But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
> the DEL characters instead!

So it doesn't have a conforming C implementation...

--
Ian Collins

Eric Sosman

Apr 14, 2010, 8:21:12 AM
On 4/14/2010 2:17 AM, Phil Carmody wrote:
> [...]

> So, got any portable C code for turning latin-1 NBSPs from googoogroups
> into ' ' using strchr to find the errant chars?

Assuming it's "portable" to know the numeric code of the
character in question,

    #define NBSP ...whatever...
    for (char *p; (p = strchr(text, NBSP)) != NULL; )
        *p = ' ';

... should do it. Knowing your system's own value for NBSP is
the non-portable part, and I can't think of a way to make it
100% portable. (Keep in mind that the character encoding on
the originating system may not be the same as on yours, and that
translations may have occurred en route. Indeed, if the origin
used the value 160 for NBSP and your system has CHAR_MAX<160,
translation *must* have occurred.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

Apr 14, 2010, 9:47:33 AM
On 4/14/2010 8:21 AM, Eric Sosman wrote:
> On 4/14/2010 2:17 AM, Phil Carmody wrote:
>> [...]
>> So, got any portable C code for turning latin-1 NBSPs from googoogroups
>> into ' ' using strchr to find the errant chars?
>
> Assuming it's "portable" to know the numeric code of the
> character in question,
>
> #define NBSP ...whatever...
> for (char *p; (p = strchr(text, NBSP)) != NULL; )
> *p = ' ';

Um, er, make that

    #define NBSP ...
    for (char *p = text; (p = strchr(p, NBSP)) != NULL; )
        *p++ = ' ';

Sorry for the thinko.
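
A self-contained sketch of the corrected loop, assuming the byte to replace is known as a raw code (0xA0 for a Latin-1 NBSP, as discussed in this thread). The names char_from_byte and replace_byte are hypothetical. The target is built by copying the byte pattern into a char, so the value handed to strchr() is already in the range of char and no out-of-range conversion is involved.

```c
#include <stddef.h>
#include <string.h>

/* Copy a raw byte pattern into a char without a value conversion. */
char char_from_byte(unsigned char b)
{
    char ch;
    memcpy(&ch, &b, 1);
    return ch;
}

/* Replace every occurrence of the byte `from` in text with `to`,
   using strchr to find each one. Returns the number of replacements. */
size_t replace_byte(char *text, unsigned char from, char to)
{
    const char target = char_from_byte(from);
    size_t count = 0;
    char *p = text;

    if (target == '\0')
        return 0;               /* never overwrite the terminator */
    while ((p = strchr(p, target)) != NULL) {
        *p++ = to;              /* advance past the replaced char */
        count++;
    }
    return count;
}
```

Calling replace_byte(line, 0xA0, ' ') on a buffer read from outside (for example with fgets) turns each 0xA0 byte into a space. Knowing that 0xA0 is the right code for the incoming text remains the non-portable part.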

--
Eric Sosman
eso...@ieee-dot-org.invalid

Kenneth Brody

Apr 14, 2010, 12:27:28 PM
On 4/13/2010 5:31 PM, Ian Collins wrote:
> On 04/14/10 09:19 AM, Phil Carmody wrote:
>> Ian Collins<ian-...@hotmail.com> writes:
>>> On 04/14/10 09:01 AM, candide wrote:
>>>> I request your opinion about the following attempt to implement the
[...]

>>>> char *mystrchr(const char *s, int c)
>>>> {
>>>> while ((unsigned char)*s != (unsigned char)c) {
>>>
>>> Why the casts?
>>
>> Because without them
[...]

>> while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
[...]

>> may fail to fix any non-breaking spaces.
>
> Eh?

Some platforms use signed chars, and (char)160 != (int)160.

This prints "no" on my system:

#include <stdio.h>

main()
{
    char c = 160;
    int i = 160;

    if ( c == i )
        printf("yes\n");
    else
        printf("no\n");
}


--
Kenneth Brody

Kenneth Brody

Apr 14, 2010, 12:34:56 PM
On 4/13/2010 6:17 PM, Keith Thompson wrote:
> Dann Corbit<dco...@connx.com> writes:
>> In article<4bc4dba9$0$24101$426a...@news.free.fr>,
>> can...@free.invalid says...
>>>
>>> I request your opinion about the following attempt to implement the
>>> standard function strstr. Here is the code :
[...]

>>> while ((unsigned char)*s != (unsigned char)c) {
[...]

>> The casts add nothing.
>
> I don't think that's true, though the second cast is probably
> unnecessary.

Actually, I think it is necessary in the above statement. What happens if,
rather than calling mystrchr(foo,160) you use mystrchr(foo,'\xa0')?

With only the first cast, this will compare (unsigned char)160 to (int)-96.

[...]


> Interestingly the Standard's description of strchr() requires c, the
> int argument to be converted to a char. Typically, given the above
> assumptions, the int value 160 will be converted to the char value
> -96, but this isn't actually guaranteed.

Wouldn't this be sufficient then, given that s is of type "char *"?

while ( *s != (char)c )

[...]

--
Kenneth Brody

Phil Carmody

Apr 14, 2010, 4:33:30 PM

Care to C&V that claim? In case you do, here's my counter in
advance:

The value of the hex escape \xA0 is 160 as an unsigned char.
The int value unsigned char 160 maps onto is 160, so '\xA0' is an
int with value 160.
char has the same range as signed char on the DS-2010, and so the
maximum value of a char is 127. The conversion of out-of-range
values onto signed char is that of clipping the value at whichever
bound is exceeded. So int 160 is converted to 127.

Which bit(s) of that violated which paragraph(s) of the standard?

Phil Carmody

Apr 14, 2010, 4:34:36 PM
Kenneth Brody <kenb...@spamcop.net> writes:
> On 4/13/2010 6:17 PM, Keith Thompson wrote:
>> Dann Corbit<dco...@connx.com> writes:
>>> In article<4bc4dba9$0$24101$426a...@news.free.fr>,
>>> can...@free.invalid says...
>>>>
>>>> I request your opinion about the following attempt to implement the
>>>> standard function strstr. Here is the code :
> [...]
>>>> while ((unsigned char)*s != (unsigned char)c) {
> [...]
>>> The casts add nothing.
>>
>> I don't think that's true, though the second cast is probably
>> unnecessary.
>
> Actually, I think it is necessary in the above statement. What
> happens if, rather than calling mystrchr(foo,160) you use
> mystrchr(foo,'\xa0')?

'\xa0' has value 160.

Ian Collins

Apr 14, 2010, 4:46:22 PM
On 04/15/10 08:33 AM, Phil Carmody wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/14/10 06:30 PM, Phil Carmody wrote:
>>> But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
>>> the DEL characters instead!
>>
>> So it doesn't have a conforming C implementation...
>
> Care to C&V that claim? In case you do, here's my counter in
> advance:
>
> The value of the hex escape \xA0 is 160 as an unsigned char.
> The int value unsigned char 160 maps onto is 160, so '\xA0' is an
> int with value 160.
> char has the same range as signed char on the DS-2010, and so the
> maximum value of a char is 127. The conversion of out-of-range
> values onto signed char is that of clipping the value at whichever
> bound is exceeded. So int 160 is converted to 127.
>
> Which bit(s) of that violated which paragraph(s) of the standard?

Irrelevant, if CHAR_MAX=127, you can't have a character value of 160. So
"hex escape \xA0" can never appear in a string.

--
Ian Collins

Ian Collins

Apr 14, 2010, 4:53:14 PM
On 04/15/10 04:27 AM, Kenneth Brody wrote:
>
> Some platforms use signed chars, and (char)160 != (int)160.
>
> This prints "no" on my system:
>
> #include <stdio.h>
>
> main()
> {
> char c = 160;
> int i = 160;
>
> if ( c == i )
> printf("yes\n");
> else
> printf("no\n");
> }

It upsets the compiler on mine...

Anyway, that isn't the issue. If the value 160 (-96 in signed char)
appeared in a character string, it would compare equal to (char)160. To
reuse your example:

int main()
{
    int i = 160;
    char c = 160;

    if ( c == (char)i )
        printf("yes\n");
    else
        printf("no\n");
}

--
Ian Collins

Keith Thompson

Apr 14, 2010, 4:59:23 PM

Incorrect.

Assume CHAR_MAX==127 and UCHAR_MAX==255 (this is very common).

C99 6.4.4.4p9 says:

Constraints

9 The value of an octal or hexadecimal escape sequence shall be in the
range of representable values for the type *unsigned char* for an
integer character constant, or the unsigned type corresponding to
wchar_t for a wide character constant.

(emphasis added)

Note that UCHAR_MAX must be at least 255, so '\xA0' is always legal
(and is always of type int with the value 160).

Ian Collins

Apr 14, 2010, 5:06:51 PM
On 04/15/10 08:59 AM, Keith Thompson wrote:
> Ian Collins<ian-...@hotmail.com> writes:
>> On 04/15/10 08:33 AM, Phil Carmody wrote:
>>> Ian Collins<ian-...@hotmail.com> writes:
>>>> On 04/14/10 06:30 PM, Phil Carmody wrote:
>>>>> But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
>>>>> the DEL characters instead!
>>>>
>>>> So it doesn't have a conforming C implementation...
>>>
>>> Care to C&V that claim? In case you do, here's my counter in
>>> advance:
>>>
>>> The value of the hex escape \xA0 is 160 as an unsigned char.
>>> The int value unsigned char 160 maps onto is 160, so '\xA0' is an
>>> int with value 160.
>>> char has the same range as signed char on the DS-2010, and so the
>>> maximum value of a char is 127. The conversion of out-of-range
>>> values onto signed char is that of clipping the value at whichever
>>> bound is exceeded. So int 160 is converted to 127.
>>>
>>> Which bit(s) of that violated which paragraph(s) of the standard?
>>
>> Irrelevant, if CHAR_MAX=127, you can't have a character value of 160. So
>> "hex escape \xA0" can never appear in a string.
>
> Incorrect.
>
> Assume CHAR_MAX==127 and UCHAR_MAX==255 (this is very common).

Oops, I misinterpreted Phil's post. No, I didn't; he said

"The conversion of out-of-range values onto signed char is that of
clipping the value at whichever bound is exceeded. So int 160 is
converted to 127."

Which isn't what happens in the common case you quoted. My analysis stands.

--
Ian Collins

Seebs

Apr 14, 2010, 5:14:36 PM
On 2010-04-14, Ian Collins <ian-...@hotmail.com> wrote:
> Irrelevant, if CHAR_MAX=127, you can't have a character value of 160. So
> "hex escape \xA0" can never appear in a string.

So how, on this hypothetical system, would you indicate a character with
a negative value?

On many systems with signed char, you can represent -1 as \xff, because
it wraps around in the "expected" way.

Keith Thompson

Apr 14, 2010, 5:23:32 PM

The standard requires the second argument to strchr() to be converted
from int to char (C99 7.21.5.2p2).

If plain char is signed, and the int value is outside the range
CHAR_MIN..CHAR_MAX, the result of the conversion is at best
implementation-defined (C99 6.3.1.3p3); the clipping Phil described
is legal.

The typical output of this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "\xff--\xa0";
    const char *result = strchr(s, '\xa0');

    if (result == NULL) {
        puts("result == NULL");
    }
    else {
        printf("result - s = %d\n", (int)(result - s));
    }
    return 0;
}

is "result - s = 3", but I think a conforming implementation on which
signed conversion saturates rather than wrapping around could print
"result - s = 0"; since both '\xff' and '\xa0' yield the same value
when converted to char, we get a false positive match. For the
implementation in question, strchr() doesn't do what we expect it to
*because* the implementation conforms to the standard.

If the standard said that each character of the string *and* the int
value are converted to unsigned char before the comparison, we
wouldn't have this potential problem.

I know of no real-world implementations that have this problem, though
non-2's-complement systems might introduce some interesting corner
cases.
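
Since no such implementation is at hand, the effect of a saturating conversion can only be simulated. The sketch below is an illustration under stated assumptions: an 8-bit signed char, and a search that is handed the int value 160. All names (conv_wrap, conv_saturate, find_index) are hypothetical. It models a strchr-like search with a pluggable int-to-char conversion.

```c
#include <stddef.h>

typedef signed char (*conv_fn)(int);

/* Wrap-around conversion to an 8-bit two's-complement signed char,
   the behaviour most implementations choose. */
signed char conv_wrap(int v)
{
    int m = ((v % 256) + 256) % 256;    /* reduce to 0..255 */
    return (signed char)(m >= 128 ? m - 256 : m);
}

/* Saturating conversion: clip at whichever bound is exceeded. */
signed char conv_saturate(int v)
{
    if (v > 127)
        return 127;
    if (v < -128)
        return -128;
    return (signed char)v;
}

/* strchr-like search over signed 8-bit characters using the chosen
   conversion for c. Returns the index of the match, or -1. */
long find_index(const signed char *s, int c, conv_fn conv)
{
    const signed char target = conv(c);
    size_t i;

    for (i = 0; ; i++) {
        if (s[i] == target)
            return (long)i;
        if (s[i] == 0)
            return -1;
    }
}
```

With the characters { -1, '-', '-', -96, 0 } and c == 160, the wrapping conversion finds index 3. The saturating one searches for 127 instead, so it misses the -96 and would match a DEL character if the string contained one.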

Phil Carmody

Apr 14, 2010, 5:55:30 PM

Who's putting hex escapes in strings? I'm certainly not.
I'm putting hex escapes in character constants, and getting
my strings from an outside source such as fgets.

Peter Nilsson

Apr 14, 2010, 6:54:38 PM
Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:
> ... on the DS-2010, (char)'\xA0' is 127.

I don't see how it can be. "The value of an integer character
constant containing a single character that maps to a single-
byte execution character is the numerical value of the
representation of the mapped character interpreted as an
integer."

Hence, the value of '\xA0' is as if...

({ unsigned char tmp = 0xA0; *(char*)&tmp; })

The representation 10100000 doesn't yield 127 on any
of the three number systems that might apply to signed
char.

--
Peter

Keith Thompson

Apr 14, 2010, 7:17:17 PM

It's not the character constant that gives you 127, it's the
conversion specified by the cast.

'\xA0' is of type int with value 160. This is true regardless of the
range or signedness of plain char, or any other system-specific
consideration. The constant '\xA0' is the same as the constant 160
(unless you stringize it).

The cast causes the value 160 to be converted from int to char. If
plain char is signed and CHAR_MAX < 160, then the result of the
conversion is governed by C99 6.3.1.3p3:

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

Most implementations behave "reasonably" by converting 160 to -96 (and
unfortunately the standard seems to implicitly assume this behavior, or
something very much like it, in the descriptions of some of the string
functions).

Ian Collins

Apr 14, 2010, 9:25:06 PM

That's the point I've been making all along. The representation of char
is irrelevant. So back to the original point, the two casts to unsigned
char are both superfluous and wrong. The condition should be

while (*s != (char)c)
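
Written out in full, that variant might look like the sketch below. The name mystrchr_char is hypothetical, chosen to keep it distinct from the posted mystrchr. As discussed elsewhere in the thread, the (char)c conversion is implementation-defined when plain char is signed and c is out of its range.

```c
#include <stddef.h>

/* strchr-like search that converts c to char once, as the Standard's
   description of strchr puts it. */
char *mystrchr_char(const char *s, int c)
{
    /* implementation-defined if c is out of range for a signed char */
    const char ch = (char)c;

    while (*s != ch) {
        if (*s == '\0')
            return NULL;
        ++s;
    }
    return (char *)s;
}
```

Like strchr, it finds the terminating null character when c is 0.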

--
Ian Collins

Kenneth Brody

Apr 14, 2010, 10:35:12 PM
On 4/14/2010 4:34 PM, Phil Carmody wrote:
> Kenneth Brody<kenb...@spamcop.net> writes:
[...]

>> What happens if, rather than calling mystrchr(foo,160) you use
>> mystrchr(foo,'\xa0')?
>
> '\xa0' has value 160.

Even if chars are signed?

On my system, this prints "-96":

#include <stdio.h>

main()
{
    int i = '\xa0';

    printf("%d\n",i);
}

--
Kenneth Brody

Keith Thompson

Apr 14, 2010, 11:37:10 PM

Yeah, mine too.

I didn't read far enough in the standard. C99 6.4.4.4p9 says:

Constraints

    9 The value of an octal or hexadecimal escape sequence shall
    be in the range of representable values for the type unsigned
    char for an integer character constant, or the unsigned type
    corresponding to wchar_t for a wide character constant.

I mistakenly took that to be a specification of the value of
a character constant containing an octal or hexadecimal escape
sequence, but it isn't. It's just the value of the escape sequence.
The value of the character constant is defined in paragraph 10,
under Semantics:

If an integer character constant contains a single character
or escape sequence, its value is the one that results when
an object with type char whose value is that of the single
character or escape sequence is converted to type int.

I still find the wording a bit shaky. In '\xa0', the value of the
escape sequence, as defined by paragraph 9, is 160. Given that plain
char is signed and CHAR_BIT==8, an object with type char *cannot*
have the value 160.

The standard seems to be assuming that the values 160 and -96 are
interchangeable when stored in a char object. Either I've missed
something else obvious (which is quite possible), or the standard
is playing fast and loose with signed and unsigned values.

I think the standard's accuracy would be improved by changing a
lot of references to character values so they refer to the result
of converting those values to unsigned char. There seem to be a
lot of places that would be improved by this change, including the
description of strchr().

(Mandating that plain char is unsigned would also simplify things,
but that's probably not feasible.)

I've been posting on this thread asserting that '\xa0' is 160.
I apologize for the unintentional misinformation.

Peter Nilsson

Apr 15, 2010, 12:39:41 AM
Keith Thompson <ks...@mib.org> wrote:
> Peter Nilsson <ai...@acay.com.au> writes:
> > Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:
> > > ... on the DS-2010, (char)'\xA0' is 127.
> >
> > I don't see how it can be. "The value of an integer
> > character constant containing a single character that
> > maps to a single-byte execution character is the

> > numerical value of the representation of the mapped
> > character interpreted as an integer."
>
> It's not the character constant that gives you 127,
> it's the conversion specified by the cast.

Fair enough, missed the (char).

> '\xA0' is of type int with value 160. This is true
> regardless of the range or signedness of plain char,
> or any other system-specific consideration. The
> constant '\xA0' is the same as the constant 160
> (unless you stringize it).

Not necessarily, for reasons cited above.

% type schar.c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT is %d\n", CHAR_BIT);
    printf("char is %ssigned\n", (char) -1 < 0 ? "" : "un");
    printf("'\\xA0' is %d\n", '\xA0');
    return 0;
}

% acc schar.c -o schar.exe

% schar.exe
CHAR_BIT is 8
char is signed
'\xA0' is -96

%

--
Peter

Peter Nilsson

Apr 15, 2010, 12:54:13 AM
On Apr 15, 2:39 pm, Peter Nilsson <ai...@acay.com.au> wrote:
> Keith Thompson <ks...@mib.org> wrote:
> > Peter Nilsson <ai...@acay.com.au> writes:
> > > Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:
> > > > ... on the DS-2010, (char)'\xA0' is 127.
> > >
> > > I don't see how it can be. "The value of an integer
> > > character constant containing a single character that
> > > maps to a single-byte execution character is the
> > > numerical value of the representation of the mapped
> > > character interpreted as an integer."
> >
> > It's not the character constant that gives you 127,
> > it's the conversion specified by the cast.
>
> Fair enough, missed the (char).

No, I take that back, I was right the first time. The
cast to char is redundant as far as converting the value
of '\xA0' because it is necessarily already in the range
of char.

The only possible values for '\xA0' are:

160 - char is unsigned
160 - char is signed, CHAR_BIT > 8
-96 - char is signed, CHAR_BIT == 8, two's complement
-95 - char is signed, CHAR_BIT == 8, ones' complement
-32 - char is signed, CHAR_BIT == 8, sign magnitude
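
A minimal sketch of how these values arise, assuming the constant's value is what you get by storing the byte 0xA0 in a plain char object and converting that object to int. The helper name char_value_of_byte is hypothetical; it is not from this thread or from any library.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Hypothetical helper: the int value seen when the bit pattern `byte`
 * is stored in a plain char object and that object is converted to int.
 */
int char_value_of_byte(unsigned int byte)
{
    unsigned char u;
    char c;

    assert(byte <= UCHAR_MAX);  /* must fit in a single byte */
    u = (unsigned char)byte;
    memcpy(&c, &u, 1);          /* copy the representation, not the value */
    return c;                   /* char -> int conversion preserves the value */
}
```

With CHAR_BIT == 8, char_value_of_byte(0xA0) should give 160 where plain char is unsigned and -96 where it is signed two's complement, in line with the list above.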

--
Peter

Michael Foukarakis

Apr 15, 2010, 3:23:26 AM
On Apr 15, 6:37 am, Keith Thompson <ks...@mib.org> wrote:

> Kenneth Brody <kenbr...@spamcop.net> writes:
> > On 4/14/2010 4:34 PM, Phil Carmody wrote:
> > > Kenneth Brody<kenbr...@spamcop.net>  writes:

If I'm not mistaken, the standard (since ANSI) mandates that char,
signed char and unsigned char are three different types. The whole
confusion in this thread stems from the fact that char has the same
values and representation as one of the other two types; which one is
implementation-defined. Personally, I wonder what the rationale for
such a decision was, although I can guess.
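
A small sketch of the "three types" point, assuming a C11 compiler (newer than the standard discussed in this thread). A _Generic association list may not name two compatible types, so the macro below compiles only because char, signed char and unsigned char are distinct. The names CHAR_KIND, char_kind and check_char_kinds are hypothetical.

```c
#include <assert.h>

enum char_kind { KIND_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR, KIND_OTHER };

/* Select an enumerator according to the type of the expression x. */
#define CHAR_KIND(x) _Generic((x),      \
    char:          KIND_CHAR,           \
    signed char:   KIND_SIGNED_CHAR,    \
    unsigned char: KIND_UNSIGNED_CHAR,  \
    default:       KIND_OTHER)

/* Hypothetical self-check: each character type selects its own branch. */
void check_char_kinds(void)
{
    char c = 'x';
    signed char sc = 'x';
    unsigned char uc = 'x';

    assert(CHAR_KIND(c) == KIND_CHAR);
    assert(CHAR_KIND(sc) == KIND_SIGNED_CHAR);
    assert(CHAR_KIND(uc) == KIND_UNSIGNED_CHAR);
    assert(CHAR_KIND('x') == KIND_OTHER);   /* a character constant has type int in C */
}
```

Whichever of the other two types plain char shares its range with, it still selects its own branch here.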

Keith Thompson

Apr 15, 2010, 3:40:04 AM
Keith Thompson <ks...@mib.org> writes:
[...]

> '\xA0' is of type int with value 160. This is true regardless of the
> range or signedness of plain char, or any other system-specific
> consideration. The constant '\xA0' is the same as the constant 160
> (unless you stringize it).
[...]

For the record, I was wrong about this, as I explained in more
detail elsethread.

Keith Thompson

Apr 15, 2010, 3:45:43 AM
Peter Nilsson <ai...@acay.com.au> writes:
> On Apr 15, 2:39 pm, Peter Nilsson <ai...@acay.com.au> wrote:
> > Keith Thompson <ks...@mib.org> wrote:
> > > Peter Nilsson <ai...@acay.com.au> writes:
> > > > Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote:
> > > > > ... on the DS-2010, (char)'\xA0' is 127.
> > > >
> > > > I don't see how it can be. "The value of an integer
> > > > character constant containing a single character that
> > > > maps to a single-byte execution character is the
> > > > numerical value of the representation of the mapped
> > > > character interpreted as an integer."
> > >
> > > It's not the character constant that gives you 127,
> > > it's the conversion specified by the cast.
> >
> > Fair enough, missed the (char).
>
> No, I take that back, I was right the first time. The
> cast to char is redundant as far as converting the value
> of '\xA0' because it is necessarily already in the range
> of char.

Right.

> The only possible values for '\xA0' are:
>
> 160 - char is unsigned
> 160 - char is signed, CHAR_BIT > 8
> -96 - char is signed, CHAR_BIT == 8, two's complement
> -95 - char is signed, CHAR_BIT == 8, ones' complement
> -32 - char is signed, CHAR_BIT == 8, sign magnitude

You're probably right as far as the intent is concerned, but I think
the wording of the standard is internally inconsistent.

Ian Collins

Apr 15, 2010, 5:00:50 AM
On 04/15/10 07:23 PM, Michael Foukarakis wrote:
>
> If I'm not mistaken, the standard (since ANSI) mandates that char,
> signed char and unsigned char are three different types.

Distinct types rather than different; the wording difference is significant.

See section 6.2.5 para 15.

--
Ian Collins

candide

Apr 15, 2010, 8:48:05 AM
Ian Collins wrote:

>
> That's the point I've been making all along. The representation of char
> is irrelevant. So back to the original point, the two casts to unsigned
> char are both superfluous and wrong.

I don't see how something can be both "superfluous" and "wrong":
superfluous usually implies harmless, which is not usually the case for
something _wrong_.

Do you mean the casts are always superfluous and sometimes wrong?

Phil Carmody

Apr 15, 2010, 5:37:22 PM
Kenneth Brody <kenb...@spamcop.net> writes:
> On 4/14/2010 4:34 PM, Phil Carmody wrote:
>> Kenneth Brody<kenb...@spamcop.net> writes:
> [...]
>>> What happens if, rather than calling mystrchr(foo,160) you use
>>> mystrchr(foo,'\xa0')?
>>
>> '\xa0' has value 160.
>
> Even if chars are signed?

The standard doesn't put a condition on the signedness of chars
when it specifies what the value should be, so presumably yes.

> On my system, this prints "-96":
>
> #include <stdio.h>
>
> main()
> {
> int i = '\xa0';
>
> printf("%d\n",i);
> }

Mine too. I guess -96 is the value of a char with value 160 in
this instance.

Phil Carmody

Apr 15, 2010, 5:44:19 PM
Keith Thompson <ks...@mib.org> writes:

> Peter Nilsson <ai...@acay.com.au> writes:
>> The only possible values for '\xA0' are:
>>
>> 160 - char is unsigned
>> 160 - char is signed, CHAR_BIT > 8
>> -96 - char is signed, CHAR_BIT == 8, two's complement
>> -95 - char is signed, CHAR_BIT == 8, ones' complement
>> -32 - char is signed, CHAR_BIT == 8, sign magnitude
>
> You're probably right as far as the intent is concerned, but I think
> the wording of the standard is internally inconsistent.

That's one way of putting it. I'd almost go as far as to say it's
broken, as it requires something to have the same value as something
that doesn't exist in this case.

Ian Collins

Apr 15, 2010, 5:51:02 PM

What I mean is given the requirements for strchr,

while ((unsigned char)*s != (unsigned char)c)

casts to the wrong type. Using the cast to the required type, the cast
of s is not required.
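
One way to read this, assuming the "required type" is char (the standard describes strchr as locating the first occurrence of c "converted to a char"), is sketched below. The name mystrchr_char is hypothetical, and this is only an illustration of that reading, not a definitive implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical variant of the thread's mystrchr: c is converted to char
 * once, so *s needs no cast because it already has type char.
 */
char *mystrchr_char(const char *s, int c)
{
    /* If char is signed, an out-of-range c converts in an
       implementation-defined way. */
    const char ch = (char)c;

    assert(s != NULL);
    for (;; ++s) {
        if (*s == ch)
            return (char *)s;   /* also matches the terminator when ch is '\0' */
        if (*s == '\0')
            return NULL;        /* reached the end without a match */
    }
}
```

Testing for a match before testing for the terminator keeps the rule that the terminating null character counts as part of the string, so searching for '\0' returns a pointer to the terminator.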

Keith Thompson

Apr 15, 2010, 6:06:12 PM
Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> Kenneth Brody <kenb...@spamcop.net> writes:
> > On 4/14/2010 4:34 PM, Phil Carmody wrote:
> >> Kenneth Brody<kenb...@spamcop.net> writes:
> > [...]
> >>> What happens if, rather than calling mystrchr(foo,160) you use
> >>> mystrchr(foo,'\xa0')?
> >>
> >> '\xa0' has value 160.
> >
> > Even if chars are signed?
>
> The standard doesn't put a condition on the signedness of chars
> when it specifies what the value should be, so presumably yes.

It doesn't do so explicitly, but it does define the value of a
character constant in terms of the value of an object of type char.
C99 6.4.4.4p10:

If an integer character constant contains a single character
or escape sequence, its value is the one that results when
an object with type char whose value is that of the single
character or escape sequence is converted to type int.

The value of an object of type char must be in the range
CHAR_MIN..CHAR_MAX (which, if plain char is signed, is the same as
the range SCHAR_MIN..SCHAR_MAX, typically -128..+127).
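
A short sketch of that range argument, assuming a conforming implementation; fits_in_char and check_char_constants are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: nonzero if v is representable as a plain char. */
int fits_in_char(int v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}

/* Hypothetical self-check for the constants discussed in this thread. */
void check_char_constants(void)
{
    assert(fits_in_char('A'));
    /* Under the paragraph quoted above, the value comes from a char
       object, so it is expected to lie in CHAR_MIN..CHAR_MAX. */
    assert(fits_in_char('\xa0'));
    /* The plain integer 160 fits only where CHAR_MAX is at least 160. */
    assert(fits_in_char(160) == (CHAR_MAX >= 160));
}
```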

> > On my system, this prints "-96":
> >
> > #include <stdio.h>
> >
> > main()
> > {
> > int i = '\xa0';
> >
> > printf("%d\n",i);
> > }
>
> Mine too. I guess -96 is the value of a char with value 160 in
> this instance.

Yeah, that must be it. 8-)}
