ANSI strcpy question

Edward J. Sabol

Mar 16, 1995, 4:28:23 PM

K&R2 (and most Unix man pages) explicitly state that usage of strcpy on
overlapping strings is to be considered unreliable. And that's basically all
K&R2 says on the subject. I guess my question is what qualifies as overlapping
strings. For example, is the following snippet strictly ANSI OK?

char *buffer; /* assume buffer points to valid null-terminated string of
* length greater than one */
buffer = strcpy(buffer,buffer+1);

In this case, buffer and buffer+1 are pointers to what I would call
overlapping strings, but it works on every system/compiler combination I've
tried. I'd like to be able to do this, but if it isn't portable, I guess I'll
have to write my own (albeit trivial) routine. However, I'd rather not do
that if I don't have to.

The other possibility of overlapping strings is the following:

buffer = strcpy(buffer+1,buffer);

I can understand why this one doesn't work, but the first case, IMHO, should!
And if ANSI doesn't strictly dictate that it should work, someone please
explain to me why not.

Thanks,
Ed

Dan Pop

Mar 16, 1995, 5:43:36 PM

In <SABOL.95M...@dbsrv.gsfc.nasa.gov> sa...@dbsrv.gsfc.nasa.gov (Edward J. Sabol) writes:

>K&R2 (and most Unix man pages) explicitly state that usage of strcpy on
>overlapping strings is to be considered unreliable. And that's basically all
>K&R2 says on the subject. I guess my question is what qualifies as overlapping
>strings. For example, is the following snippet strictly ANSI OK?
>
>char *buffer; /* assume buffer points to valid null-terminated string of
> * length greater than one */
>buffer = strcpy(buffer,buffer+1);

It's strictly forbidden. ISO 7.11.2.3 says: "If copying takes place
between objects that overlap, the behavior is undefined."

>
>In this case, buffer and buffer+1 are pointers to what I would call
>overlapping strings, but it works on every system/compiler combination I've
>tried. I'd like to be able to do this, but if it isn't portable, I guess I'll
>have to write my own (albeit trivial) routine. However, I'd rather not do
>that if I don't have to.

Or use the already existing routines from the standard library:

memmove(buffer, buffer + 1, strlen(buffer));

K&R2 (and most Unix man pages) explicitly state that usage of memmove on
overlapping strings is to be considered reliable.
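A memmove-based version of that shift might be wrapped roughly like this. It is only a sketch: the name drop_first_char is made up for illustration and is not a standard function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the standard library): removes the
 * first character of s in place by shifting the rest down one position. */
char *drop_first_char(char *s)
{
    assert(s != NULL);
    if (s[0] == '\0')
        return s;                 /* empty string: nothing to remove */
    /* strlen(s) bytes cover the remaining strlen(s) - 1 characters plus
     * the terminating '\0'. memmove is defined for overlapping regions. */
    memmove(s, s + 1, strlen(s));
    return s;
}
```

The byte count is strlen(s), which equals strlen(s + 1) + 1, so the terminator is moved along with the characters and nothing past it is read.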

>
>The other possibility of overlapping strings is the following:
>
>buffer = strcpy(buffer+1,buffer);
>
>I can understand why this one doesn't work, but the first case, IMHO, should!
>And if ANSI doesn't strictly dictate that it should work, someone please
>explain to me why not.

Ask the guys who wrote the standard. Nobody else knows.

One possible answer: an implementation might decide to do the copying
by locating the terminating null and copying everything backwards. Such
an implementation will work fine in the second case, but will break the
first one. The standard doesn't impose the order in which the characters
are to be copied.
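To make that scenario concrete, here is a sketch of such a copy routine. backward_strcpy is a hypothetical name chosen for this example; it is not how any particular library implements strcpy. Because it is an ordinary user-written loop, its behaviour on overlapping arguments is well defined, so the effect can be observed safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strcpy look-alike: locates the terminator first, then
 * copies from the last byte down to the first. */
char *backward_strcpy(char *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src) + 1;          /* bytes to copy, including '\0' */
    while (n-- > 0)
        dst[n] = src[n];          /* highest index first */
    return dst;
}
```

With char a[] = "abcdef", the call backward_strcpy(a, a + 1) leaves a as an empty string, because each byte is overwritten with the '\0' that was just copied down. With char b[8] = "abcdef", the call backward_strcpy(b + 1, b) correctly leaves "abcdef" at b + 1.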

Dan
--
Dan Pop
CERN, CN Division
Email: dan...@cernapo.cern.ch
Mail: CERN - PPE, Bat. 31 R-004, CH-1211 Geneve 23, Switzerland

Mark Brader

Mar 18, 1995, 2:42:14 AM

Edward J. Sabol (sa...@dbsrv.gsfc.nasa.gov) writes:
> K&R2 (and most Unix man pages) explicitly state that usage of strcpy on
> overlapping strings is to be considered unreliable. And that's basically all
> K&R2 says on the subject. ...

> For example, is the following snippet strictly ANSI OK?
>
> char *buffer; /* assume buffer points to valid null-terminated string of
> * length greater than one */
> buffer = strcpy(buffer,buffer+1);

No, it isn't. From section 7.11.2.3/4.11.2.3 about strcpy:

# If copying takes place between objects that overlap, the behavior is
# undefined.

Edward asked for a definition of terms. From section 3.14/1.6, an "object" is

# a region of data storage in the execution environment, the contents of
# which can represent values.

If strlen(buffer) is 17, then the overlapping regions are buffer[0] through
buffer[16] (the bytes written), on the one hand, and buffer[1] through
buffer[17] (the bytes read, including the null), on the other.
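The same test can be written down as code. This is a rough sketch with made-up names (ranges_overlap, strcpy_would_overlap), and it assumes both pointers point into the same array, since comparing pointers into unrelated objects with < is itself undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if [a, a+na) and [b, b+nb) share at least one byte.
 * Assumption: a and b point into the same array object. */
int ranges_overlap(const char *a, size_t na, const char *b, size_t nb)
{
    return a < b + nb && b < a + na;
}

/* Would strcpy(dst, src) copy between overlapping objects?
 * strcpy reads and writes strlen(src) + 1 bytes, terminator included. */
int strcpy_would_overlap(const char *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src) + 1;
    return ranges_overlap(dst, n, src, n);
}
```

For a 17-character string in buf, strcpy_would_overlap(buf, buf + 1) reports an overlap, while a destination far enough along the same array does not.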

> ... it works on every system/compiler combination I've tried.

I'm not surprised.

> And if ANSI doesn't strictly dictate that it should work, someone please
> explain to me why not.

There's nothing explicit about this in the ANSI Rationale, but there is
something suggestive: a discussion about memmove() and memcpy(). From
section 4.11.2 of the Rationale (corresponding to 7.11.2/4.11.2 of the
standard):

@ A block copy routine should be "right": it should work correctly even
@ if the blocks being copied overlap. Otherwise it is more difficult to
@ correctly code such overlapping copy operations, and portability
@ suffers because the optimal C-coded algorithm on one machine may be
@ horribly slow on another.
@
@ A block copy routine should be "fast": it should be implementable as a
@ few inline instructions which take maximum advantage of any block copy
@ provisions of the hardware. Checking for overlapping copies produces
@ too much code for convenient inlining in many implementations. The
@ programmer knows in a great many cases that the two blocks cannot
@ possibly overlap, so the space and time overhead are for naught.
@
@ These arguments are contradictory but each is compelling. Therefore
@ the Standard mandates two block copy functions: memmove is required
@ to work correctly even if the source and destination overlap, while
@ memcpy can presume nonoverlapping operands and be optimized accordingly.

The writers of the standard found the same conflict with all functions
that involve copying strings, but it would not have been reasonable
to provide twin functions for each of them. Instead, they simply
banned all copies involving overlapping objects by library functions
other than memmove(). This is consistent and easy to remember, and
it means that implementers can choose the fast option for all other
functions. For example, on a machine where strlen() is a built-in
operation and very fast, strcpy() can be implemented:

char *strcpy (char *s1, const char *s2)
{ return memcpy (s1, s2, strlen (s2)+1); }

even if memcpy() operates right to left.

I don't know if there are any machines like that now, and it doesn't really
matter; the point is that it's not particularly implausible, and the
standard tries to allow for implementations other than the obvious ones.
--
Mark Brader "I wonder why. I wonder why.
m...@sq.com I wonder why I wonder.
SoftQuad Inc. I wonder *why* I wonder why
Toronto I wonder why I wonder!" -- Richard Feynman

This article is in the public domain.

Edward J. Sabol

Mar 20, 1995, 10:31:45 AM

I want to thank everyone for all the many helpful responses I got both
privately and posted to this list.

Many of you suggested the following:

memmove(buffer, buffer + 1, strlen(buffer));

(or some variant thereof) instead of the

strcpy(buffer,buffer + 1);

that I wanted to use. I've taken this recommendation to heart and am now
using memmove instead. It's probably slower than strcpy, but then performance
isn't that big of a deal in the scenario I'm working with.

I'd like to thank Mark Brader <m...@sq.sq.com> for excerpting portions of the
ISO/ANSI documents concerning the rationale behind memcpy and memmove. (Are
these documents available on-line anywhere?) I found that very helpful and
informative, as well as the many responders who described various processor
scenarios where different implementations of strcpy would make sense.

Thanks,
Ed

Chris Trueman

Mar 20, 1995, 11:50:31 AM

In article <SABOL.95M...@dbsrv.gsfc.nasa.gov>

sa...@dbsrv.gsfc.nasa.gov "Edward J. Sabol" writes:

> memmove(buffer, buffer + 1, strlen(buffer));
>
> (or some variant thereof) instead of the
>
> strcpy(buffer,buffer + 1);
>
> that I wanted to use. I've taken this recommendation to heart and am now
> using memmove instead. It's probably slower than strcpy, but then performance
> isn't that big of a deal in the scenario I'm working with.

Though I have not tested this, I always believed the memxxxxx
functions to be wrappers for hand-crafted functions utilising
each machine's 'special speed' attributes.

- Chris

-============================================================================-
E-mail : true...@intellic.demon.co.uk : Listen ... do you smell something?
Tel : +44 (0)1344 305305 :
Fax : +44 (0)1344 305100 : I'll be back...
-============================================================================-

Dan Pop

Mar 20, 1995, 1:32:07 PM

In <795718...@intellic.demon.co.uk> true...@intellic.demon.co.uk (Chris Trueman) writes:

>In article <SABOL.95M...@dbsrv.gsfc.nasa.gov>
> sa...@dbsrv.gsfc.nasa.gov "Edward J. Sabol" writes:
>
>> memmove(buffer, buffer + 1, strlen(buffer));
>>
>> (or some variant thereof) instead of the
>>
>> strcpy(buffer,buffer + 1);
>>
>> that I wanted to use. I've taken this recommendation to heart and am now
>> using memmove instead. It's probably slower than strcpy, but then performance
>> isn't that big of a deal in the scenario I'm working with.
>
>Though I have not tested this, I always believed the memxxxxx
>functions to be wrappers for hand-crafted functions utilising
>each machine's 'special speed' attributes.

This is only a 'quality of implementation' issue. Some vendor of
crappy compilers could implement them in plain C. Better implementations
generate inline code for these functions, whenever possible, because the
code is compact and the overhead of a function call is avoided.

The original poster's point was probably that, instead of calling only
strcpy, you now have to call two functions, strlen and memmove, and
this is likely to be less efficient, since the source string will be
scanned twice.
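If the double scan matters, the "trivial routine" mentioned at the start of the thread avoids it. A possible sketch follows; the name shift_left_one is invented for this example, and whether it beats strlen plus memmove on a given implementation is not something this code can promise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-pass routine: removes the first character of s
 * in place, copying low to high so each byte is read before it is
 * overwritten. */
char *shift_left_one(char *s)
{
    char *p = s;

    assert(s != NULL);
    if (*p == '\0')
        return s;                 /* empty string: nothing to remove */
    while ((p[0] = p[1]) != '\0') /* copy next byte down, stop after '\0' */
        p++;
    return s;
}
```

The copy order is fixed by the loop itself, so the overlap is harmless here, unlike with strcpy, where the order is left to the implementation.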