
Q: strchr() behavior on unterminated strings


Jim Hill

Feb 6, 1997

Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?

For a concrete example, is this a legal implementation of strchr()?

char *strchr2(const char *p, int c)
{
    int c1 = 0;
    char c2 = *p;

    while ( c2 ) {
        c1 = c2;
        c2 = *++p;
        if ( c1==c )
            return (char *)p-1;
    }

    return c ? 0 : (char *)p;
}

Jim
--
Jim Hill <jth...@netcom.com>

Ken Pizzini

Feb 7, 1997

In article <jthill-0602...@jthill.slip.netcom.com>,
Jim Hill <jth...@netcom.com> wrote:
>Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?

If buf is not a '\0' terminated string then you'd be better off
using memchr().

>For a concrete example, is this a legal implementation of strchr()?
>
>char *strchr2(const char *p, int c)
>{
> int c1 = 0;
> char c2 = *p;
>
> while ( c2 ) {
> c1 = c2;
> c2 = *++p;
> if ( c1==c )
> return (char *)p-1;
> }
>
> return c ? 0 : (char *)p;
>}

It looks odd, but correct. I'd go for something more like:
#include <stddef.h>
char *
my_strchr(const char *p, int c)
{
    const char ch = (char)c;    /* strchr() compares c converted to char */

    while (*p && *p != ch)
        ++p;
    return *p == ch ? (char *)p : NULL;
}

In any case (system strchr(), your strchr2(), my my_strchr()) the
scan stops at the first NUL character. If the buffer being passed
in is not known to be NUL terminated then undefined behavior will
result if "c" does not occur before the end of the buffer's allocated
space, unless a fortuitous NUL saves you from that fate.
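
To illustrate the memchr() suggestion, here is a minimal sketch. It assumes
the caller knows the buffer's length; the helper name find_in_buffer and the
sample buffer are made up for the example and come from no library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search exactly len bytes of buf for c.
   Unlike strchr(), memchr() never reads past buf[len - 1], so the
   buffer does not need a terminating NUL. */
static char *find_in_buffer(const char *buf, size_t len, int c)
{
    return (char *)memchr(buf, c, len);
}
```

With "char raw[4] = { 'a', 'b', 'X', 'd' };" (no NUL anywhere), a call such
as find_in_buffer(raw, sizeof raw, 'X') stays inside the array whether or
not the character is found.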

--Ken Pizzini

Clive D.W. Feather

Feb 7, 1997

In article <5de38o$q80$1...@brokaw.wa.com>, Ken Pizzini
<k...@coho.halcyon.com> writes

>>Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?
>If buf is not a '\0' terminated string then you'd be better off
>using memchr().

The string searched is *defined* to be from buf to the next zero byte.
If this takes you outside the defined or allocated object, undefined
behaviour occurs.

--
Clive D.W. Feather | Associate Director | Director
Tel: +44 181 371 1138 | Demon Internet Ltd. | CityScape Internet Services Ltd.
Fax: +44 181 371 1150 | <cl...@demon.net> | <cd...@cityscape.co.uk>
Written on my laptop - please reply to the Reply-To address <cl...@demon.net>

J. Kanze

Feb 7, 1997

jth...@netcom.com (Jim Hill) writes:

> Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?

I don't think so. According to 7.11.1: "The strchr function locates the
first occurrence of c in the string pointed to by s." And in 7.1.1: "A
string is a contiguous sequence of characters terminated by and
including the first null character." Presumably, if buf doesn't point
to a null terminated string, undefined behavior results.

As for real life, at least one implementation I've used implemented
strchr as "memchr( p , c , strlen( p ) )". (This really pissed me off
when I encountered it, as my `string' was 32 KBytes long, and the
character I was looking for was typically in the first 40 or 50, and
perhaps half of the time, the very first character.)

--
James Kanze +33 (0)1 39 55 85 62 email: ka...@gabi-soft.fr
GABI Software, Sarl., 22 rue Jacques-Lemercier, 78000 Versailles, France
-- Conseils en informatique industrielle --

Ken Pizzini

Feb 7, 1997

In article <hhLKqCA5...@on-the-train.demon.co.uk>,
Clive D.W. Feather <cl...@demon.net> wrote:
>In article <5de38o$q80$1...@brokaw.wa.com>, Ken Pizzini
><k...@coho.halcyon.com> writes

>>>Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?
>>If buf is not a '\0' terminated string then you'd be better off
>>using memchr().
>
>The string searched is *defined* to be from buf to the next zero byte.
>If this takes you outside the defined or allocated object, undefined
>behaviour occurs.

Precisely why you'd be better off using memchr(), although I admit
that I was less than clear about this.

--Ken Pizzini

Ken Pizzini

Feb 8, 1997

In article <m3afpgg...@gabi-soft.fr>, J. Kanze <ka...@gabi-soft.fr> wrote:
>As for real life, at least one implementation I've used implemented
>strchr as "memchr( p , c , strlen( p ) )". (This really pissed me off
>when I encountered it, as my `string' was 32 KBytes long, and the
>character I was looking for was typically in the first 40 or 50, and
>perhaps half of the time, the very first character.)

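If that expression is free of typos then the implementation is even wrong!
Consider the result of "strchr(p, '\0')": the terminator lies just outside
the strlen(p) bytes that memchr() searches, so the call returns a null
pointer instead of a pointer to the terminator.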
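
A small sketch of the difference, for anyone who wants to try it. The two
function names are invented for the illustration; the first is the
expression exactly as quoted, the second searches one byte further so that
the terminator is included.

```c
#include <stddef.h>
#include <string.h>

/* The expression as quoted: the terminator lies just outside the
   strlen(p) bytes searched, so a search for '\0' finds nothing. */
static char *strchr_as_quoted(const char *p, int c)
{
    return (char *)memchr(p, c, strlen(p));
}

/* Searching strlen(p) + 1 bytes includes the terminator, which the
   standard counts as part of the string. */
static char *strchr_two_pass(const char *p, int c)
{
    return (char *)memchr(p, c, strlen(p) + 1);
}
```

Both versions still call strlen() first, so both need a real string: on an
unterminated buffer the strlen() call itself is already undefined.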

--Ken Pizzini

Norman Diamond

Feb 8, 1997

In article <jthill-0602...@jthill.slip.netcom.com>, jth...@netcom.com (Jim Hill) writes:
>Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?

Didn't we just answer that last month? (Me and one other helpful poster.)

ANSI Classic section 4.1.1, page 97 lines 5 to 6, the first sentence in the
library chapter: "A string is a contiguous sequence of characters terminated
by and including the first null character." And strchr needs a string.

--
<< If this were the company's opinion, I would not be allowed to post it. >>
"I paid money for this car, I pay taxes for vehicle registration and a driver's
license, so I can drive in any lane I want, and no innocent victim gets to call
the cops just 'cause the lane's not goin' the same direction as me" - J Spammer

J. Kanze

Feb 9, 1997

k...@chinook.halcyon.com (Ken Pizzini) writes:

Norm Diamond also pointed this out to me in email.

I don't really remember the details. At any rate, I *SHOULD* have
written that they used the equivalent of this; the actual routine
(judging from the disassembly) was in assembler. Their whole point of
doing it this way was that the 8086 (no 'x' between the 0 and the second
8 back then) had hardware support for strlen and for memchr, but
couldn't do strchr without a loop.

I might add that it was this implementation that sensitized me to the
meaning of "string". Elsewhere in the program, I was using a printf
format "%*s", with an array of (non-'\0' terminated) characters, and the
length. The implementation basically did the equivalent of "memcpy(
buffer , userPtr , min( strlen( userPtr ) , widthArg ) )".

Actually, the real problem was that I was passing 32768 for the '*'. Which,
of course, on a 16 bit machine, the implementation interpreted as -1.
Which, in turn, was its internal indication for "don't check width".
The resulting output was, shall we say, interesting enough to make me
investigate what was really going on. I'd actually prepared a very
virulent error report; luckily, I double checked the standard before
sending it.

Jim Hill

Feb 12, 1997

In article <1997Feb11.1...@leeds.ac.uk>, ecl...@sun.leeds.ac.uk
(R S Haigh) wrote:
[re "no"]
> How annoying.

I actually think it's as it should be: (a) as several people pointed out,
if you're not passing a string you should be using memchr() -- and if your
buffer isn't terminated and you don't know how long it is, well, what
kinda buffer is it, anyway? I'm hard up for an example of that; (b) this
way, string functions are allowed to fetch ahead if that helps the
pipeline any -- see my original example (not enough faster to be
worthwhile, but it can be done *lots* better) for instance.

Since that was the answer I was expecting -- I was checking my reading --
I'll venture that strnxyz() and %.*s are defined so long as the buffer is
at least as long as promised ("not more than n characters" and "the
maximum number of characters"), and that strtok() is not (no such
language).

J. Kanze

Feb 12, 1997

ecl...@sun.leeds.ac.uk (R S Haigh) writes:

> In article <m3afpgg...@gabi-soft.fr>, ka...@gabi-soft.fr (J. Kanze) writes:
> > jth...@netcom.com (Jim Hill) writes:
> >
> > > Is the behavior of strchr(buf,X) when buf contains X but no nulls defined?
> >
> > I don't think so. According to 7.11.1: "The strchr function locates the
> > first occurrence of c in the string pointed to by s." And in 7.1.1: "A
> > string is a contiguous sequence of characters terminated by and
> > including the first null character." Presumably, if buf doesn't point
> > to a null terminated string, undefined behavior results.
>
> How annoying. This seems to mean that to guarantee the behaviour
> most people would expect (i.e. if there's an X within the object,
> as defined or allocated, then the lack of a null doesn't matter)
> you have to write your own function, which in reality almost certainly
> behaves the same as your compiler's strchr and everybody else's.

First, what do you mean "your compiler's strchr"? In the last purely C
compiler I used, strchr was implemented using the machine instruction
equivalent of strlen and memchr. In an extreme case, the
strlen wouldn't find a '\0', the addresses would wrap (the machine
didn't have protected memory), and the resulting length for the memchr
could be 0.

The compiler in question was from Microsoft, so I'm pretty sure that
others have used similar implementations. (The implementation isn't as
stupid as it sounds. Both strlen and memchr can be implemented with a
single instruction on the 8086, whereas strchr would otherwise require a
loop.)

> Let's hear the worst. Is it safe to use strtok when the indicated
> separators are present but the terminating '\0' isn't?
>
> What about the strn functions?
>
> Is it safe to write sprintf(buf1, "%.8s", buf2) when buf2 is at least
> that length but possibly not null-terminated?

If the standard says "string", no. And I think that it does say
"string" in all of these cases.

Clive D.W. Feather

Feb 12, 1997

In article <jthill-1202...@jthill.slip.netcom.com>, Jim Hill
<jth...@netcom.com> writes

>Since that was the answer I was expecting -- I was checking my reading --
>I'll venture that strnxyz() and %.*s are defined so long as the buffer is
>at least as long as promised

strnxyz: these all appear to be written in terms of an "array" and not a
"string", with wording that doesn't require a null character.

%.*s: "If the precision is not specified or is greater than the size of
the array, the array shall contain a null character."

>and that strtok() is not (no such
>language).

The Standard uses the word "string".

So I agree with you.
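
A minimal sketch of the precision case, assuming the array really is at
least as long as the precision passed in. The helper name field_to_string
is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a fixed-width field that may not be
   NUL-terminated.  The precision caps how many bytes %s may read,
   so no terminator is needed as long as field really holds at least
   width bytes.  out must have room for width + 1 bytes. */
static void field_to_string(char *out, const char *field, int width)
{
    sprintf(out, "%.*s", width, field);
}
```

If a null character does occur within the first width bytes, the output
stops there, just as it would for an ordinary string.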

John R MacMillan

Feb 12, 1997

|How annoying. This seems to mean that to guarantee the behaviour
|most people would expect (i.e. if there's an X within the object,
|as defined or allocated, then the lack of a null doesn't matter)
|you have to write your own function, ...

Most of the time, I'd expect you'd know the length of buf, and could
then just use memchr().
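
If the "stop at a NUL if there is one, but never leave the buffer"
behaviour is really what is wanted, it can be built from memchr() alone.
The following is only a sketch; the name bounded_strchr and its argument
order are invented here and belong to no library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded_strchr(): like strchr(), but never examines
   more than n bytes.  A NUL inside the first n bytes still ends the
   search, so terminated strings behave as with strchr(). */
static char *bounded_strchr(const char *buf, size_t n, int c)
{
    const char *end = memchr(buf, '\0', n);   /* terminator, if any */
    size_t len = end ? (size_t)(end - buf) + 1 : n;

    return (char *)memchr(buf, c, len);
}
```

The price is two passes over the data, which is the same trade the
strlen-then-memchr implementations make.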

|... which in reality almost certainly
|behaves the same as your compiler's strchr and everybody else's.

I have to disagree here; implementations of library functions may not
use the ``obvious'' method. For example, consider implementing strchr()
on a hypothetical chip, let's call it the Intel i486... :-)

Suppose this chip has a string scan opcode, say SCAS, that can be used
to scan for a particular character, up to a particular number of
characters. A library may choose to implement strchr() by using SCAS to
look for the trailing '\0' (with no limit on how far it will search) to
get the length of the string, and then do another SCAS looking for the
desired character up to the length of the string.

Obviously, this ``hypothetical'' chip really exists, and real
implementations use this mechanism, and your desired use of strchr()
could fail on them.

In short, if it's not a string, don't try to pretend it is. As Henry
Spencer's .sig used to sometimes say, ``If you lie to the compiler, it
will get its revenge.''

Clive D.W. Feather

Feb 13, 1997

In article <5dtg5c$8...@nr1.toronto.istar.net>, John R MacMillan
<jo...@interlog.com> writes

>In short, if it's not a string, don't try to pretend it is. As Henry
>Spencer's .sig used to sometimes say, ``If you lie to the compiler, it
>will get its revenge.''

Nope, my old .sig, quoting Henry.

Julian Pardoe

Feb 28, 1997

J. Kanze wrote:
> I might add that it was this implementation that sensitized me to the
> meaning of "string". Elsewhere in the program, I was using a printf
> format "%*s", with an array of (non-'\0' terminated) characters, and the
> length. The implementation basically did the equivalent of "memcpy(
> buffer , userPtr , min( strlen( userPtr ) , widthArg ) )".

This suggests that printf needs a new specifier, with the same relationship
to memchr as %.*s has to strncpy. Supposing it were %S, then you could
safely write
printf ("%.*S", widthArg, userPtr);
because while %s needs a '\0' %S doesn't.

Whilst we are at it I'd also like a specifier along the lines
of scanf's %[] -- I want to be able to format and parse messages
using the same format string -- and a version that escaped strings
containing separator characters would be good too. I'd like to
be able to say something like
sprintf (buffer, "MESSAGE=(a=%{^,)},b=%{^,)},c=%{^,)})",
"a", "b(\"2\")", "c1,c2")
resulting in a string like
MESSAGE=(a=a,b="b(\"2\")",c="c1,c2")
(with sprintf providing '"'s around string containing problem
characters (i.e. delimiters or '"'s)). I'd then want to
parse the message using sscanf with the same format string.

-- jP --

J. Kanze

Mar 3, 1997

Julian Pardoe <par...@lonnds.ml.com> writes:

|> J. Kanze wrote:
|> > I might add that it was this implementation that sensitized me to the
|> > meaning of "string". Elsewhere in the program, I was using a printf
|> > format "%*s", with an array of (non-'\0' terminated) characters, and the
|> > length. The implementation basically did the equivalent of "memcpy(
|> > buffer , userPtr , min( strlen( userPtr ) , widthArg ) )".
|>
|> This suggests that printf needs a new specifier, with the same relationship
|> to memchr as %.*s has to strncpy. Supposing it were %S, then you could
|> safely write
|> printf ("%.*S", widthArg, userPtr);
|> because while %s needs a '\0' %S doesn't.

Except that I've just verified, and "%s" in printf DOESN'T require a
string, that is, the standard says it requires an array of character
type, and not string, and in fact, specifically puts an "if" in front of
the requirement for a null character to be present.

Which means that the implementation which sensitized me to the problem
was wrong. (To be truthful, I ran into the problem in '87 or '88, so I
wasn't able to confront the implementation with the standard at the
time.)
