char *mystrchr(const char *s, int c)
{
    while ((unsigned char)*s != (unsigned char)c) {
        if (*s == '\0')
            return NULL;
        ++s;
    }
    return (char*)s;
}
No, it looks like you are writing strchr.
>
>
> char *mystrchr(const char *s, int c)
> {
>     while ((unsigned char)*s != (unsigned char)c) {
Why the casts?
>         if (*s == '\0')
>             return NULL;
>         ++s;
>     }
>     return (char*)s;
> }
--
Ian Collins
I do not think this is a correct implementation of the standard
function str*str*. Apart from that, it looks basically correct
to me, although that could just be heatstroke or something; I went
out in the Big Blue Room and was exposed for nearly an hour to
an essentially uncontrolled fusion reaction. :(
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
Because without them
    char buff[LINELEN];

    while (get_line_from_post(buff, sizeof(buff))) {
        /* detect mangled indents from googoogroups */
        char *mangledchars = buff;

        while ((mangledchars = mystrchr(mangledchars, 160)) != NULL)
            *mangledchars = ' ';
    }
may fail to fix any non-breaking spaces.
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1
>
> No, it looks like you are writing strchr.
>>
Yes, sorry for the misprint.
>>
>> char *mystrchr(const char *s, int c)
>> {
>>     while ((unsigned char)*s != (unsigned char)c) {
>
> Why the casts?
Why not ;) ?
In fact, the code is not mine, so I can't tell you why. I suppose
the casts are needed for the function to behave correctly when
locating a character from the extended character set (an accented
character, for instance).
Eh?
--
Ian Collins
The casts add nothing.
This seems to be a user implementation of strchr() and not strstr().
With strstr() you are looking for a sequence of characters that match,
not a single character.
If you are writing a strstr() implementation, I suggest the BMH
(Boyer-Moore-Horspool) algorithm, sketched below.
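Something along these lines (untested, off the top of my head; the
name bmh_strstr and the details are mine, not from any reference):

    #include <limits.h>
    #include <string.h>

    char *bmh_strstr(const char *haystack, const char *needle)
    {
        size_t nlen = strlen(needle);
        size_t hlen = strlen(haystack);
        size_t skip[UCHAR_MAX + 1];
        size_t i;

        if (nlen == 0)
            return (char *)haystack;   /* strstr convention */
        if (hlen < nlen)
            return NULL;

        /* default shift: the whole window */
        for (i = 0; i <= UCHAR_MAX; i++)
            skip[i] = nlen;
        /* characters in the needle (except the last) shift less */
        for (i = 0; i < nlen - 1; i++)
            skip[(unsigned char)needle[i]] = nlen - 1 - i;

        i = 0;
        while (i <= hlen - nlen) {
            if (memcmp(haystack + i, needle, nlen) == 0)
                return (char *)(haystack + i);
            i += skip[(unsigned char)haystack[i + nlen - 1]];
        }
        return NULL;
    }

The payoff is the skip table: on most mismatches you jump ahead by
the full needle length instead of one character at a time.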
P.S.
Is this one of those:
PARIS IN THE
THE SPRING
jokes?
If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.
Phil
I don't think that's true, though the second cast is probably
unnecessary.
Assume plain char is signed, with CHAR_BIT==8. Consider
*s == -96
c == 160
Without the casts, *s is promoted from plain char to int, with the
value -96, which is not equal to the int value 160. Converting the
char value -96 to unsigned char yields 160, which correctly promotes
to the value 160 of type int.
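For example, this (untested; the variable names are mine) should
print "not equal" and then "equal" under the stated assumptions:

    #include <stdio.h>

    int main(void)
    {
        char ch = -96;  /* the byte 0xA0, given signed 8-bit char */
        int c = 160;

        /* plain promotion: -96 != 160 */
        printf("%s\n", ch == c ? "equal" : "not equal");
        /* via unsigned char: 160 == 160 */
        printf("%s\n",
               (unsigned char)ch == (unsigned char)c
               ? "equal" : "not equal");
        return 0;
    }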
Interestingly, the Standard's description of strchr() requires c,
the int argument, to be converted to a char. Typically, given the
above assumptions, the int value 160 will be converted to the char
value -96, but this isn't actually guaranteed.
> This seems to be a user implementation of strchr() and not strstr().
That was already acknowledged.
[snip]
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
while (*s != (char)c)
As the standard says (my emphasis):
"The strchr function locates the first occurrence of c *(converted to a
char)* in string s".
--
Ian Collins
The standard requires c to be converted to a char for strchr.
In which case, strchr effectively invokes undefined behaviour
(in C99) if char is signed and c is not in the range of signed
char.
--
Peter
Yabbut, it's still wrong. The second argument to strchr()
is an int which is "converted to a char" (7.21.5.2p2), not to
an unsigned char. So the given code behaves differently from
the Standard's description for any char value C and int value I
such that
(C == (char)I) != ((unsigned char)C == (unsigned char)I)
Can such a C,I pair exist? Yes, certainly, if I is less than
CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
defined and can produce a peculiar result (6.3.1.3p3). That
peculiar result need not be closely related to the (well-defined)
result of converting to unsigned char, so the two comparisons
could come out differently -- in which case, the fix is to remove
the `unsigned' from both casts and allow the peculiar behavior
to occur. It's what the Standard requires.
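A quick (untested) probe for such a pair on any given
implementation, using the NBSP value 160 from upthread:

    #include <stdio.h>

    int main(void)
    {
        int i = 160;
        char c = (char)i;  /* implementation-defined if 160 > CHAR_MAX */

        printf("C == (char)I:                         %d\n",
               c == (char)i);
        printf("(unsigned char)C == (unsigned char)I: %d\n",
               (unsigned char)c == (unsigned char)i);
        return 0;
    }

On the usual wraparound implementations both lines print 1; on a
clipping implementation they differ, and the two comparisons part
company.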
--
Eric Sosman
eso...@ieee-dot-org.invalid
I don't think they are, actually. The "unsigned char" thing is needed
when using is*() functions, and for processing things like the result of
getchar(), etcetera, but in the case of strchr, it seems as though plain
char is right.
> The casts add nothing.
>
Does that mean the casts don't hurt? In other words: can the casts
lead to wrong behaviour?
> This seems to be a user implementation of strchr() and not strstr().
>
I meant strchr and not strstr. Sorry for the misprint.
I was addressing just mystrchr as posted, not strchr. However, as
you continue, the comparison is a most enlightening one.
> So the given code behaves differently from
> the Standard's description for any char value C and int value I
> such that
>
> (C == (char)I) != ((unsigned char)C == (unsigned char)I)
>
> Can such a C,I pair exist? Yes, certainly, if I is less than
> CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
> defined and can produce a peculiar result (6.3.1.3p3). That
> peculiar result need not be closely related to the (well-defined)
> result of converting to unsigned char, so the two comparisons
> could come out differently -- in which case, the fix is to remove
> the `unsigned' from both casts and allow the peculiar behavior
> to occur. It's what the Standard requires.
I don't know quite how much 'the standard is occasionally a bit dumb'
was supposed to be in that, but with my 'the standard is occasionally
a bit dumb'-tinted spectacles, I certainly detected some.
So, got any portable C code for turning latin-1 NBSPs from googoogroups
into ' ' using strchr to find the errant chars?
So it won't do what was required of it.
> a simple cast to char would be fine:
Now that's weasely wording. The cast you like is 'simple', but the
two casts from the OP are so horrible they must be avoided no matter
what?
> while (*s != (char)c)
>
> As the standard says (my emphasis):
>
> "The strchr function locates the first occurrence of c *(converted to
> a char)* in string s".
But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
the DEL characters instead!
The values of hex and octal escapes in character constants are
unsigned char too. (They are mapped onto ints as they become the
actual character constants.)
Yes, one is bad enough thank you!
>> while (*s != (char)c)
>>
>> As the standard says (my emphasis):
>>
>> "The strchr function locates the first occurrence of c *(converted to
>> a char)* in string s".
>
> But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
> the DEL characters instead!
So it doesn't have a conforming C implementation...
--
Ian Collins
Assuming it's "portable" to know the numeric code of the
character in question,
#define NBSP ...whatever...
for (char *p; (p = strchr(text, NBSP)) != NULL; )
    *p = ' ';
... should do it. Knowing your system's own value for NBSP is
the non-portable part, and I can't think of a way to make it
100% portable. (Keep in mind that the character encoding on
the originating system may not be the same as on yours, and that
translations may have occurred en route. Indeed, if the origin
used the value 160 for NBSP and your system has CHAR_MAX<160,
translation *must* have occurred.)
--
Eric Sosman
eso...@ieee-dot-org.invalid
Um, er, make that
#define NBSP ...
for (char *p = text; (p = strchr(p, NBSP)) != NULL; )
    *p++ = ' ';
Sorry for the thinko.
--
Eric Sosman
eso...@ieee-dot-org.invalid
Some platforms use signed chars, and (char)160 != (int)160.
This prints "no" on my system:
#include <stdio.h>

int main(void)
{
    char c = 160;
    int i = 160;

    if ( c == i )
        printf("yes\n");
    else
        printf("no\n");
}
--
Kenneth Brody
Actually, I think it is necessary in the above statement. What happens
if, rather than calling mystrchr(foo, 160), you call
mystrchr(foo, '\xa0')?
With only the first cast, this will compare (unsigned char)160 to
(int)-96.
[...]
> Interestingly the Standard's description of strchr() requires c, the
> int argument to be converted to a char. Typically, given the above
> assumptions, the int value 160 will be converted to the char value
> -96, but this isn't actually guaranteed.
Wouldn't this be sufficient then, given that s is of type "char *"?
while ( *s != (char)c )
[...]
--
Kenneth Brody
Care to C&V that claim? In case you do, here's my counter in
advance:
The value of the hex escape \xA0 is 160 as an unsigned char.
The int value that unsigned char 160 maps onto is 160, so '\xA0' is
an int with value 160.
char has the same range as signed char on the DS-2010, so the
maximum value of a char is 127. The conversion of out-of-range
values to signed char clips the value at whichever bound is
exceeded. So int 160 is converted to 127.
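In code, the conversion I'm describing would amount to this
(hypothetical, of course, like the DS-2010 itself):

    #include <limits.h>

    /* clip out-of-range ints at the exceeded bound; 6.3.1.3p3
       permits any implementation-defined result here */
    char ds2010_int_to_char(int v)
    {
        if (v > CHAR_MAX)
            return CHAR_MAX;   /* 160 -> 127 */
        if (v < CHAR_MIN)
            return CHAR_MIN;
        return (char)v;
    }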
Which bit(s) of that violated which paragraph(s) of the standard?
'\xa0' has value 160.
Irrelevant; if CHAR_MAX==127, you can't have a character value of
160, so the hex escape '\xA0' can never appear in a string.
--
Ian Collins
It upsets the compiler on mine...
Anyway, that isn't the issue. If the value 160 (-96 in signed char)
appeared in a character string, it would compare equal to (char)160. To
reuse your example:
#include <stdio.h>

int main(void)
{
    int i = 160;
    char c = 160;

    if ( c == (char)i )
        printf("yes\n");
    else
        printf("no\n");
}
--
Ian Collins
Incorrect.
Assume CHAR_MAX==127 and UCHAR_MAX==255 (this is very common).
C99 6.4.4.4p9 says:
Constraints
9 The value of an octal or hexadecimal escape sequence shall be in the
range of representable values for the type *unsigned char* for an
integer character constant, or the unsigned type corresponding to
wchar_t for a wide character constant.
(emphasis added)
Note that UCHAR_MAX must be at least 255, so '\xA0' is always legal
(and is always of type int with the value 160).
Oops, I misinterpreted Phil's post. No, I didn't; he said
"The conversion of out-of-range values onto signed char is that of
clipping the value at whichever bound is exceeded. So int 160 is
converted to 127."
Which isn't what happens in the common case you quoted. My analysis stands.
--
Ian Collins
So how, on this hypothetical system, would you indicate a character with
a negative value?
On many systems with signed char, you can represent -1 as \xff, because
it wraps around in the "expected" way.
The standard requires the second argument to strchr() to be converted
from int to char (C99 7.21.5.2p2).
If plain char is signed, and the int value is outside the range
CHAR_MIN..CHAR_MAX, the result of the conversion is at best
implementation-defined (C99 6.3.1.3p3); the clipping Phil described
is legal.
The typical output of this program:
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "\xff--\xa0";
    const char *result = strchr(s, '\xa0');

    if (result == NULL) {
        puts("result == NULL");
    }
    else {
        printf("result - s = %d\n", (int)(result - s));
    }
    return 0;
}
is "result - s = 3", but I think a conforming implementation on which
signed conversion saturates rather than wrapping around could print
"result - s = 0"; since both '\xff' and '\xa0' yield the same value
when converted to char, we get a false positive match. For the
implementation in question, strchr() doesn't do what we expect it to
*because* the implementation conforms to the standard.
If the standard said that each character of the string *and* the int
value are converted to unsigned char before the comparison, we
wouldn't have this potential problem.
I know of no real-world implementations that have this problem, though
non-2's-complement systems might introduce some interesting corner
cases.
Who's putting hex escapes in strings? I'm certainly not.
I'm putting hex escapes in character constants, and getting
my strings from an outside source such as fgets.
I don't see how it can be. "The value of an integer character
constant containing a single character that maps to a single-
byte execution character is the numerical value of the
representation of the mapped character interpreted as an
integer."
Hence, the value of '\xA0' is as if...
({ unsigned char tmp = 0xA0; *(char*)&tmp; })
The representation 10100000 doesn't yield 127 on any
of the three number systems that might apply to signed
char.
--
Peter
It's not the character constant that gives you 127, it's the
conversion specified by the cast.
'\xA0' is of type int with value 160. This is true regardless of the
range or signedness of plain char, or any other system-specific
consideration. The constant '\xA0' is the same as the constant 160
(unless you stringize it).
The cast causes the value 160 to be converted from int to char. If
plain char is signed and CHAR_MAX < 160, then the result of the
conversion is governed by C99 6.3.1.3p3:
Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.
Most implementations behave "reasonably" by converting 160 to -96 (and
unfortunately the standard seems to implicitly assume this behavior, or
something very much like it, in the descriptions of some of the string
functions).
That's the point I've been making all along. The representation of char
is irrelevant. So back to the original point, the two casts to unsigned
char are both superfluous and wrong. The condition should be
while (*s != (char)c)
--
Ian Collins
Even if chars are signed?
On my system, this prints "-96":
#include <stdio.h>

int main(void)
{
    int i = '\xa0';

    printf("%d\n", i);
}
--
Kenneth Brody
Yeah, mine too.
I didn't read far enough in the standard. C99 6.4.4.4p9 says:
Constraints
9 The value of an octal or hexadecimal escape sequence shall
be in the range of representable values for the type unsigned
char for an integer character constant, or the unsigned type
corresponding to wchar_t for a wide character constant.
I mistakenly took that to be a specification of the value of
a character constant containing an octal or hexadecimal escape
sequence, but it isn't. It's just the value of the escape sequence.
The value of the character constant is defined in paragraph 10,
under Semantics:
If an integer character constant contains a single character
or escape sequence, its value is the one that results when
an object with type char whose value is that of the single
character or escape sequence is converted to type int.
I still find the wording a bit shaky. In '\xa0', the value of the
escape sequence, as defined by paragraph 9, is 160. Given that plain
char is signed and CHAR_BIT==8, an object with type char *cannot*
have the value 160.
The standard seems to be assuming that the values 160 and -96 are
interchangeable when stored in a char object. Either I've missed
something else obvious (which is quite possible), or the standard
is playing fast and loose with signed and unsigned values.
I think the standard's accuracy would be improved by changing a
lot of references to character values so they refer to the result
of converting those values to unsigned char. There seem to be a
lot of places that would be improved by this change, including the
description of strchr().
(Mandating that plain char is unsigned would also simplify things,
but that's probably not feasible.)
I've been posting on this thread asserting that '\xa0' is 160.
I apologize for the unintentional misinformation.
Fair enough, missed the (char).
> '\xA0' is of type int with value 160. This is true
> regardless of the range or signedness of plain char,
> or any other system-specific consideration. The
> constant '\xA0' is the same as the constant 160
> (unless you stringize it).
Not necessarily, for reasons cited above.
% type schar.c
#include <limits.h>
#include <stdio.h>
int main(void)
{
    printf("CHAR_BIT is %d\n", CHAR_BIT);
    printf("char is %ssigned\n", (char) -1 < 0 ? "" : "un");
    printf("'\\xA0' is %d\n", '\xA0');
    return 0;
}
% acc schar.c -o schar.exe
% schar.exe
CHAR_BIT is 8
char is signed
'\xA0' is -96
%
--
Peter
No, I take that back; I was right the first time. The cast to char
is redundant as far as converting the value of '\xA0' goes, because
that value is necessarily already in the range of char.
The only possible values for '\xA0' are:
160 - char is unsigned
160 - char is signed, CHAR_BIT > 8
-96 - char is signed, CHAR_BIT == 8, two's complement
-95 - char is signed, CHAR_BIT == 8, ones' complement
-32 - char is signed, CHAR_BIT == 8, sign magnitude
--
Peter
If I'm not mistaken, the standard (since ANSI) has mandated that char,
signed char and unsigned char are three different types. The whole
confusion in this thread stems from the fact that char has the same
values and representation as one of the other two types; which one is
implementation-defined. Personally, I wonder what the rationale for
such a decision was, although I can guess.
For the record, I was wrong about this, as I explained in more
detail elsethread.
Right.
> The only possible values for '\xA0' are:
>
> 160 - char is unsigned
> 160 - char is signed, CHAR_BIT > 8
> -96 - char is signed, CHAR_BIT == 8, two's complement
> -95 - char is signed, CHAR_BIT == 8, ones' complement
> -32 - char is signed, CHAR_BIT == 8, sign magnitude
You're probably right as far as the intent is concerned, but I think
the wording of the standard is internally inconsistent.
Distinct types rather than different; the wording difference is significant.
See section 6.2.5 para 15.
--
Ian Collins
>
> That's the point I've been making all along. The representation of char
> is irrelevant. So back to the original point, the two casts to unsigned
> char are both superfluous and wrong.
I can't work out how you can call the same things both "superfluous"
and "wrong": usually, superfluous implies something doesn't hurt,
which is not the case for something _wrong_.
Do you mean the casts are always superfluous and sometimes wrong?
The standard doesn't put a condition on the signedness of chars
when it specifies what the value should be, so presumably yes.
> On my system, this prints "-96":
>
> #include <stdio.h>
>
> int main(void)
> {
>     int i = '\xa0';
>
>     printf("%d\n", i);
> }
Mine too. I guess -96 is the value of a char with value 160 in
this instance.
That's one way of putting it. I'd almost go as far as to say it's
broken, as it requires something to have the same value as something
that doesn't exist in this case.
What I mean is: given the requirements for strchr,

    while ((unsigned char)*s != (unsigned char)c)

casts to the wrong type. With a cast to the required type (char),
the cast on *s is not required.
It doesn't do so explicitly, but it does define the value of a
character constant in terms of the value of an object of type char.
C99 6.4.4.4p10:
If an integer character constant contains a single character
or escape sequence, its value is the one that results when
an object with type char whose value is that of the single
character or escape sequence is converted to type int.
The value of an object of type char must be in the range
CHAR_MIN..CHAR_MAX (which, if plain char is signed, is the same as
the range SCHAR_MIN..SCHAR_MAX, typically -128..+127).
> > On my system, this prints "-96":
> >
> > #include <stdio.h>
> >
> > int main(void)
> > {
> >     int i = '\xa0';
> >
> >     printf("%d\n", i);
> > }
>
> Mine too. I guess -96 is the value of a char with value 160 in
> this instance.
Yeah, that must be it. 8-)}