strcat() considered harmful

Howard Chu

May 24, 1997, 3:00:00 AM

A very wise person once told me, "just because you have insomnia is no
excuse to go wasting bandwidth on the net." So much for that advice...

This has been a peeve of mine for quite some time, but very recently I
was shocked to see the tremendous impact of the problem.

In C code that deals with strings, it's fairly common to see sequences like
strcpy(dst, src1);
strcat(dst, src2);
l = strlen(dst);

Take a couple strings of unknown length, slap 'em together, and figure out
how big the result is. No biggie, but inefficient since you have to process
each string a byte at a time, and iterate repeatedly over the same memory.

This is all because functions like strcpy and strcat all munge a source
pointer with a destination pointer and return the destination pointer. But
now I ask you - I just passed in the destination pointer, why the hell do
I need it returned to me? I know very well what I just passed in. Don't
waste my time. Sure, it allows you to do things like
strcat(strcat(strcat(strcpy(dst,a),b),c),d);
but there's a better way.

Let's take this plain-jane strcpy routine:

char * strcpy(char *dst, register char *src)
{
register char *d2 = dst;

while((*d2++ = *src++))
;
return dst;
}

and make it something *really* useful:

char *strcopy(register char *dst, register char *src)
{
while((*dst++ = *src++))
;
return dst-1;
}

Now, instead of returning a pointer to the beginning of the destination
string, we return a pointer to the *end* of the string. In fact, a pointer
to the NULL terminator. What does this buy you? It practically eliminates
the need for separate strcpy and strcat functions, and drastically cuts
down on the need for strlen as well. To expand on the previous example:

strcat(strcat(strcat(strcpy(dst,a),b),c),d);
l = strlen(dst);

this becomes:

char *ptr = strcopy(strcopy(strcopy(strcopy(dst,a),b),c),d);
l = ptr - dst;
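Here is a compilable sketch of the idea for anyone who wants to try it. The names str_copy and join4 are illustrative assumptions: str_copy sidesteps the reserved str-prefix namespace raised later in the thread, and the source parameter gains a const. POSIX.1-2008 later standardized essentially this function as stpcpy().

```c
#include <stddef.h>

/* Copy src into dst and return a pointer to the '\0' that now ends dst,
   so the next copy can start right there instead of rescanning. */
char *str_copy(char *dst, const char *src)
{
    while ((*dst++ = *src++))
        ;
    return dst - 1;
}

/* Join four strings into dst (which must be large enough) and return
   the length of the result, computed by pointer subtraction. */
size_t join4(char *dst, const char *a, const char *b,
             const char *c, const char *d)
{
    char *end = str_copy(str_copy(str_copy(str_copy(dst, a), b), c), d);
    return (size_t)(end - dst);
}
```

With this sketch, join4(buf, "usr", "/", "local", "/bin") should leave "usr/local/bin" in buf and return 13, without a separate strlen() pass.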

The original example using strcat will execute in O(n^2) time, because it
always has to start over from the beginning of the string, and there is an
extra string traversal to compute the length of the result. My solution
will execute in linear time, and you get the string length for free (well,
constant time, O(1)).
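The difference shows up most clearly when pieces are appended in a loop, as in the path-rebuilding code described below. A sketch, assuming a caller-supplied buffer big enough for the result (there is no bounds checking here); the function names are hypothetical. The first version rescans dst from the start on every strcat(), so joining k components of length m touches on the order of k*k*m bytes; the second keeps a pointer to the current end and touches each byte once.

```c
#include <string.h>

/* Quadratic: every strcat() first walks dst to find its end. */
size_t join_path_strcat(char *dst, const char *parts[], size_t n)
{
    size_t i;

    dst[0] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            strcat(dst, "/");
        strcat(dst, parts[i]);
    }
    return strlen(dst);              /* and one more full pass */
}

/* Linear: remember where the string currently ends. */
size_t join_path_end(char *dst, const char *parts[], size_t n)
{
    char *end = dst;
    const char *s;
    size_t i;

    *end = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            *end++ = '/';
        /* copy parts[i]; stop with end pointing at the copied '\0' */
        for (s = parts[i]; (*end = *s) != '\0'; end++, s++)
            ;
    }
    return (size_t)(end - dst);      /* length for free */
}
```

Both functions produce the same string; only the amount of rescanning differs.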

How much does this really matter? Think about it.

I've had this in my bag of tricks for years, and I figured it was too
insignificant and self-evident to talk about, but then something interesting
happened... I was working on a port of Pacer's PacerShare server, and decided
to profile the thing to see where it was spending all its CPU time. (Pacer is
no longer in business, and this was a couple years ago, so I don't believe
there's any risk in bringing this up.) PacerShare was an AppleShare server
that ran on Unix boxes, letting Macs mount Unix filesystems as volumes
on the Mac desktop. You'd figure that a program like this would be seriously
I/O bound, being a fileserver and all. In fact, 60% of its time was spent in
strcat. !!!!!

Why? Because the server constantly had to translate Macintosh-format filenames
into their Unix equivalent, and vice versa, and was constantly pulling paths
apart, replacing separator characters, and cat'ing them back together again,
with whatever incidental modifications along the way. (And just in case you
were curious, there was another 30% spent in strlen. This so-called fileserver
only spent 10% of its CPU time actually doing file operations.)

After I recovered from my shock at this discovery, I set to work tracking
down all the instances of strcpy, strcat, and strlen in the server. There
weren't very many of them. Fixing them with my strcopy usage took very little
time. When I was done, the server behaved like a whole new animal. String
operations were nearly unmeasurable, and the server actually was I/O bound,
as it ought to have been all along. (Unfortunately, after that overhaul it
was still spending 40% of the time doing stat calls, but that's another
story...)

So anyway... If this string-library shortcoming was already blindingly
obvious to you, my apologies for wasting the phosphors on your screen.
In the meantime, I'm gonna go delete strcpy and strcat from my libc.a and
shoot anyone who complains...
--
Howard Chu Principal Member of Technical Staff
h...@locus.com PLATINUM technology, Los Angeles Lab
Advertisements proof-read for US$100 per word. Submission of your ad to my
email address constitutes your acceptance of these terms.

Kurt Watzka

May 24, 1997, 3:00:00 AM

h...@frisky.la.platsol.com (Howard Chu) writes:

>A very wise person once told me, "just because you have insomnia is no
>excuse to go wasting bandwidth on the net." So much for that advice...

>This has been a peeve of mine for quite some time, but very recently I
>was shocked to see the tremendous impact of the problem.

>In C code that deals with strings, it's fairly common to see sequences like
> strcpy(dst, src1);
> strcat(dst, src2);
> l = strlen(dst);

>Take a couple strings of unknown length, slap 'em together, and figure out
>how big the result is. No biggie, but inefficient since you have to process
>each string a byte at a time, and iterate repeatedly over the same memory.

Nobody forces you to use general purpose library routines for a special
purpose. If time is critical, I'd avoid the overhead of
function calls and just write it as something like

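char *base = dst, *p;      /* base: start of the destination buffer */

p = base;
while ((*p++ = *src1++))   /* copy src1 and its '\0' */
    ;
--p;                       /* back up onto that '\0' */
while ((*p++ = *src2++))   /* append src2 over it */
    ;
l = (int)(p - base) - 1;   /* p is one past the final '\0' */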

>This is all because functions like strcpy and strcat all munge a source
>pointer with a destination pointer and return the destination pointer. But
>now I ask you - I just passed in the destination pointer, why the hell do
>I need it returned to me? I know very well what I just passed in. Don't
>waste my time. Sure, it allows you to do things like
> strcat(strcat(strcat(strcpy(dst,a),b),c),d);
>but there's a better way.

>Let's take this plain-jane strcpy routine:

[pretty standard version of strcpy edited]

>and make it something *really* useful:

> char *strcopy(register char *dst, register char *src)
> {
> while((*dst++ = *src++))
> ;
> return dst-1;
> }

I hope this is just an example, and is not meant to violate the
implementation's namespace in a real application.

>Now, instead of returning a pointer to the beginning of the destination
>string, we return a pointer to the *end* of the string. In fact, a pointer
>to the NULL terminator.

Minor nit: strings in C are not terminated by NULL, they are terminated
by '\0', called NUL in the ASCII character set. NULL is a null pointer
constant.

>What does this buy you? It practically eliminates
>the need for separate strcpy and strcat functions, and drastically cuts
>down on the need for strlen as well. To expand on the previous example:

> strcat(strcat(strcat(strcpy(dst,a),b),c),d);
> l = strlen(dst);

>this becomes:

> char *ptr = strcopy(strcopy(strcopy(strcopy(dst,a),b),c),d);
> l = ptr - dst;

>The original example using strcat will execute in O(n^2) time, because it
>always has to start over from the beginning of the string, and there is an
>extra string traversal to compute the length of the result. My solution
>will execute in linear time, and you get the string length for free (well,
>constant time, O(1)).

>How much does this really matter? Think about it.

In most implementations it does not matter too much, given the
large overhead of a function call. However, the functions from
the standard C library are indeed not well suited for patching
together large strings from small building blocks. If such an
operation is time critical, avoid using _any_ function calls.
If it is not, your proposal (minus the obvious namespace
problem) saves some time in a non-critical piece of code.

["horror story" about a file server spending 40% of its CPU time
with string operations edited].

Just one point to ponder: In some implementations an I/O bound
program does not look I/O bound because the time _waiting_ for
I/O does not count against your program's CPU time.
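A quick way to see this is to compare processor time, as reported by the standard clock(), with elapsed wall-clock time from time(). This sketch uses the POSIX sleep() as a stand-in for blocking on I/O (an assumption; any blocking read would show the same effect), and the function name is hypothetical. The process waits for about a second, yet almost no CPU time is charged to it, so a CPU profile of a heavily I/O-bound program can be dominated by whatever little computation it does.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>
#include <unistd.h>   /* POSIX sleep(), standing in for waiting on I/O */

/* Block for `seconds`, reporting the CPU seconds charged to the process
   and the wall-clock seconds (1-second resolution) that passed. */
void measure_blocking(unsigned seconds, double *cpu, double *wall)
{
    clock_t c0 = clock();
    time_t t0 = time(NULL);

    sleep(seconds);

    *cpu = (double)(clock() - c0) / CLOCKS_PER_SEC;
    *wall = difftime(time(NULL), t0);
}
```

On a typical system the CPU figure comes out close to zero while the wall figure is at least 1; exact values will vary.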

Kurt

--
| Kurt Watzka Phone : +49-89-2180-6254
| wat...@stat.uni-muenchen.de

Lawrence Kirby

May 24, 1997, 3:00:00 AM

In article <5m6atd$10...@frisky.la.platsol.com>
h...@frisky.la.platsol.com "Howard Chu" writes:

>A very wise person once told me, "just because you have insomnia is no
>excuse to go wasting bandwidth on the net." So much for that advice...
>
>This has been a peeve of mine for quite some time, but very recently I
>was shocked to see the tremendous impact of the problem.

It is sometimes a mild irritation, but I've never seen it as a "tremendous
impact".

>In C code that deals with strings, it's fairly common to see sequences like
> strcpy(dst, src1);
> strcat(dst, src2);
> l = strlen(dst);

Not around here it isn't! :-)

>Take a couple strings of unknown length, slap 'em together, and figure out
>how big the result is. No biggie, but inefficient since you have to process
>each string a byte at a time, and iterate repeatedly over the same memory.
>

>This is all because functions like strcpy and strcat all munge a source
>pointer with a destination pointer and return the destination pointer. But
>now I ask you - I just passed in the destination pointer, why the hell do
>I need it returned to me? I know very well what I just passed in. Don't
>waste my time. Sure, it allows you to do things like
> strcat(strcat(strcat(strcpy(dst,a),b),c),d);
>but there's a better way.

Whether this makes a tremendous impact depends on whether the difference
in time it takes is measurable. It may be in some cases where this is
performed a very large number of times in a loop, but generally it isn't.

>Let's take this plain-jane strcpy routine:
>

> char * strcpy(char *dst, register char *src)
> {
> register char *d2 = dst;
>
> while((*d2++ = *src++))
> ;
> return dst;
> }
>

>and make it something *really* useful:
>
> char *strcopy(register char *dst, register char *src)
> {
> while((*dst++ = *src++))
> ;
> return dst-1;
> }

OK, you aren't the first to suggest this; however, if you are going to
define your own function, don't call it strcopy(), since that is a reserved
identifier. You might call it something like str_copy().

>Now, instead of returning a pointer to the beginning of the destination
>string, we return a pointer to the *end* of the string. In fact, a pointer
>to the NULL terminator. What does this buy you? It practically eliminates
>the need for separate strcpy and strcat functions, and drastically cuts
>down on the need for strlen as well. To expand on the previous example:
>
> strcat(strcat(strcat(strcpy(dst,a),b),c),d);
> l = strlen(dst);
>
>this becomes:
>
> char *ptr = strcopy(strcopy(strcopy(strcopy(dst,a),b),c),d);
> l = ptr - dst;

However even given a strcopy() function I wouldn't write that. It is
much simpler and clearer to write:

l = sprintf(dst, "%s%s%s%s", a, b, c, d);

At least on the implementations I use, there's no great overhead for using
sprintf to do this; it may even be faster. You also have all the other printf
formatting options available to you.
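A small sketch of the sprintf() approach. The return value is the number of characters written, not counting the terminating '\0', so it serves directly as the length. Plain sprintf() does no bounds checking; the bounded variant uses C99's snprintf(), which postdates this thread. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Concatenate four strings with one sprintf() call and return the
   resulting length; dst must have room for all four plus the '\0'. */
int join4_sprintf(char *dst, const char *a, const char *b,
                  const char *c, const char *d)
{
    return sprintf(dst, "%s%s%s%s", a, b, c, d);
}

/* Bounded variant: writes at most size - 1 characters plus '\0' and
   returns the length the full result would have had, so a return
   value >= size means the output was truncated. */
int join4_snprintf(char *dst, size_t size, const char *a, const char *b,
                   const char *c, const char *d)
{
    return snprintf(dst, size, "%s%s%s%s", a, b, c, d);
}
```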

>The original example using strcat will execute in O(n^2) time, because it
>always has to start over from the beginning of the string, and there is an
>extra string traversal to compute the length of the result. My solution
>will execute in linear time, and you get the string length for free (well,
>constant time, O(1)).
>
>How much does this really matter? Think about it.

Mostly very little; however, there are of course cases where it is significant.

>I've had this in my bag of tricks for years, and I figured it was too
>insignificant and self-evident to talk about, but then something interesting
>happened... I was working on a port of Pacer's PacerShare server, and decided
>to profile the thing to see where it was spending all its CPU time. (Pacer is
>no longer in business, and this was a couple years ago, so I don't believe
>there's any risk in bringing this up.) PacerShare was an AppleShare server
>that ran on Unix boxes, letting Macs mount Unix filesystems as volumes
>on the Mac desktop. You'd figure that a program like this would be seriously
>I/O bound, being a fileserver and all. In fact, 60% of its time was spent in
>strcat. !!!!!
>
>Why? Because the server constantly had to translate Macintosh-format filenames
>into their Unix equivalent, and vice versa, and was constantly pulling paths
>apart, replacing separator characters, and cat'ing them back together again,
>with whatever incidental modifications along the way. (And just in case you
>were curious, there was another 30% spent in strlen. This so-called fileserver
>only spent 10% of its CPU time actually doing file operations.)

Presumably it is doing a lot of these calls in loops. It is true that some
people seem blithely unaware of the overheads of the code they write;
you do have to use the library functions intelligently.

>After I recovered from my shock at this discovery, I set to work tracking
>down all the instances of strcpy, strcat, and strlen in the server. There
>weren't very many of them. Fixing them with my strcopy usage took very little
>time. When I was done, the server behaved like a whole new animal. String
>operations were nearly unmeasurable, and the server actually was I/O bound,
>as it ought to have been all along. (Unfortunately, after that overhaul it
>was still spending 40% of the time doing stat calls, but that's another
>story...)

You might try the sprintf solution; it tends to simplify string handling code
significantly, especially where you are using strcat/strncat a lot.

>So anyway... If this string-library shortcoming was already blindingly
>obvious to you, my apologies for wasting the phosphors on your screen.
>In the meantime, I'm gonna go delete strcpy and strcat from my libc.a and
>shoot anyone who complains...

I wouldn't do that; there are plenty of legitimate uses for them. Also,
if your compiler inlines them, they can be used whether they are in libc.a
or not.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------

Fritz W Feuerbacher

May 24, 1997, 3:00:00 AM

Howard Chu (h...@frisky.la.platsol.com) wrote:

: now I ask you - I just passed in the destination pointer, why the hell do
: I need it returned to me? I know very well what I just passed in. Don't
: waste my time. Sure, it allows you to do things like

I believe it's because it has to return something to tell you if something
went wrong. So, if there was a problem it will return a NUL; if it was
successful it will return you the pointer to the dest string, which is the
most intuitive thing to return.

Lawrence Kirby

May 26, 1997, 3:00:00 AM

In article <5m6sg8$59l$2...@news.cc.ucf.edu>

No, strcpy()/strcat() never return a failure code, they are defined to
return simply the value of their first argument. If this isn't a valid
pointer to a string then you get undefined behaviour, anything can happen.
The implementation isn't required to test the validity of its arguments.
strcpy() and strcat() are never required to return a null pointer. If you
do any testing it must be on the arguments *before* you make the call.
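A minimal sketch of what that means in practice. copy_checked is a hypothetical wrapper, not a library function, and it assumes the caller passes the destination's size. strcpy() itself just hands back dst; all of the checking happens before it is called.

```c
#include <string.h>

/* Validate the arguments first, since strcpy() has no error return.
   Returns 0 on success, -1 if the copy cannot be done safely. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || src == NULL)
        return -1;
    if (strlen(src) >= dstsize)       /* no room for src plus its '\0' */
        return -1;
    strcpy(dst, src);                 /* returns dst; nothing to test */
    return 0;
}
```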

Fritz W Feuerbacher

May 31, 1997, 3:00:00 AM

Howard Chu (h...@frisky.la.platsol.com) wrote:
: now I ask you - I just passed in the destination pointer, why the hell do
: I need it returned to me? I know very well what I just passed in. Don't

I thought about this some more, and the only reason I could think of would
be: what if you wanted to assign that destination pointer to another
variable, or pass it to a function? strcat would save you a line of code....

Fritz W Feuerbacher

May 31, 1997, 3:00:00 AM

Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:

: Minor nit: strings in C are not terminated by NULL, they are terminated
: by '\0', called NUL in the ASCII character set. NULL is a null pointer
: constant.

It depends on if you have included the header file called "string.h".
It has the NULL defined properly for strings.

Kurt Watzka

Jun 1, 1997, 3:00:00 AM

>Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:

It is a question of terminology, and style, and not of working or not
working. Defining NULL as 0 (plain and unadorned) and using it as
a string terminator will work, but it still sends the wrong message
to a potential reader of the code.

Pete Becker

Jun 1, 1997, 3:00:00 AM

Fritz W Feuerbacher wrote:
>
> Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:
>
> : Minor nit: strings in C are not terminated by NULL, they are terminated
> : by '\0', called NUL in the ASCII character set. NULL is a null pointer
> : constant.
>
> It depends on if you have included the header file called "string.h".
> It has the NULL defined properly for strings.

No. NULL is the null pointer constant. End of sentence. As the earlier
posting said, it is not the null terminator for a string. In some
implementations it will work in that context, but that is not required
by the language definition. Use '\0'. That's what it's there for.
-- Pete

Lawrence Kirby

Jun 1, 1997, 3:00:00 AM

In article <5mp7em$q3$2...@news.cc.ucf.edu>
fwf2...@pegasus.cc.ucf.edu "Fritz W Feuerbacher" writes:

>Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:
>
>: Minor nit: strings in C are not terminated by NULL, they are terminated
>: by '\0', called NUL in the ASCII character set. NULL is a null pointer
>: constant.
>
>It depends on if you have included the header file called "string.h".
>It has the NULL defined properly for strings.

Various header files including <string.h> are documented by the standard
as defining a NULL macro. It is defined as something "which expands to an
implementation-defined null pointer constant". That allows it to be many
things; however, the two most common forms are 0 and ((void *)0). The only
things you can do with NULL are:

1. convert it to a pointer, which often happens implicitly. One example
where that doesn't happen implicitly is in a variable argument list, so

printf("%p\n", NULL);

is incorrect, you should write:

printf("%p\n", (void *)NULL);

2. compare it against a pointer.

3. compare it against another null pointer constant. However this case
is rarely very interesting.

Something like

char ch = NULL;

is invalid because it could expand to, say:

char ch = ((void *)0);

which would be a constraint violation.
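To set the permitted uses of NULL beside the correct way to terminate a string, a short sketch (the function names are hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* End s after its first n characters by storing the null character
   '\0' (NUL in ASCII). NULL would be wrong here: it is a null pointer
   constant, not a character. */
void truncate_at(char *s, size_t n)
{
    if (n < strlen(s))
        s[n] = '\0';
}

/* NULL used as a pointer and compared against a pointer: both fine.
   A null s is treated as an empty string. */
size_t len_or_zero(const char *s)
{
    if (s == NULL)
        return 0;
    return strlen(s);
}

/* In a variable argument list NULL is not converted automatically,
   so it is cast to the void * that %p expects. */
void print_null(void)
{
    printf("%p\n", (void *)NULL);
}
```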

Mark Clevenger

Jun 3, 1997, 3:00:00 AM

Kurt Watzka wrote:

>
> fwf2...@pegasus.cc.ucf.edu (Fritz W Feuerbacher) writes:
>
> >Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:
>
> >: Minor nit: strings in C are not terminated by NULL, they are terminated
> >: by '\0', called NUL in the ASCII character set. NULL is a null pointer
> >: constant.
>
> >It depends on if you have included the header file called "string.h".
> >It has the NULL defined properly for strings.
>

> It is a question of terminology, and style, and not of working or not
> working. Defining NULL as 0 (plain and unadorned) and using it as
> a string terminator will work, but it still sends the wrong message
> to a potential reader of the code.
>

There is a potential problem using NULL when a null string is intended.
Say a program has the following line:

strcat(mystr, NULL);

This will concatenate a string found at memory location 0 onto mystr.
If there are a thousand characters at 0 before a null character, there is
a good chance a memory error will occur [Segmentation Fault usually].

What makes this terribly onerous is that some OS (HP-UX) must have a '\0'
at memory location 0, and the above code would not have a problem. But on
other systems (SunOS) this will blow up. Therefore your so-called portable
ANSI-C code is not portable. Technically it is not proper ANSI-C code - NULL
is not a string.

I learned this the hard way when porting legacy code from HP-UX to SunOS.
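A portable way to say "append nothing" is to pass the empty string "" rather than NULL. If a source pointer might legitimately be null, test it before the call. A sketch with a hypothetical helper, append_opt:

```c
#include <string.h>

/* Append src to dst, treating a null src as "append nothing".
   strcat(dst, NULL) itself is undefined behaviour: it may trap, or it
   may silently read whatever the null pointer happens to address. */
char *append_opt(char *dst, const char *src)
{
    if (src != NULL)
        strcat(dst, src);
    return dst;
}
```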

Lawrence Kirby

Jun 4, 1997, 3:00:00 AM

In article <339425...@and.ifg.gmeds.com>
fga...@and.ifg.gmeds.com "Mark Clevenger" writes:

>Kurt Watzka wrote:
>>
>> fwf2...@pegasus.cc.ucf.edu (Fritz W Feuerbacher) writes:
>>
>> >Kurt Watzka (wat...@stat.uni-muenchen.de) wrote:
>>
>> >: Minor nit: strings in C are not terminated by NULL, they are terminated
>> >: by '\0', called NUL in the ASCII character set. NULL is a null pointer
>> >: constant.
>>
>> >It depends on if you have included the header file called "string.h".
>> >It has the NULL defined properly for strings.
>>
>> It is a question of terminology, and style, and not of working or not
>> working. Defining NULL as 0 (plain and unadorned) and using it as
>> a string terminator will work, but it still sends the wrong message
>> to a potential reader of the code.
>>
>
>There is a potential problem using NULL when a null string is intended.
>Say a program has the following line:
>
>strcat(mystr, NULL);

The problem with this is that C says it results in undefined behaviour.
Whatever your particular implementation happens to do with it, it is an
error in the code.

>This will concatenate a string found at memory location 0 onto mystr.

assuming a null pointer corresponds to an address of zero. This isn't
required by C.

>If there are a thousand characters at 0 before a null character there is
>a good chance a memory error will occur [Segmentation Fault usually].

More usually, good systems will map out page zero (or whatever page the
null pointer refers to) so that it traps immediately, before it causes
damage.

>What makes this terribly onerous is that some OS (HP-UX) must have a '\0'
>at memory location 0, and the above code would not have a problem. But on
>other systems (SunOS) this will blow up. Therefore your so-called portable
>ANSI-C code is not portable. Technically it is not proper ANSI-C code - NULL
>is not a string.

The code isn't portable according to the language definition.

>I learned this the hard way when porting legacy code from HP-UX to
>SunOS.

Are you sure HP-UX doesn't have an option to map out page zero?