I have been a C programmer and advanced to C++. In C, to copy arrays
the memcpy function is used. In C++, the STL can be used (std::copy).
So the question is: Which is more efficient?
Example:
#include <mem.h> // or <cmem>
#include <algorithm>
const unsigned int HUGE_ARRAY_SIZE = 30000;
unsigned char first[HUGE_ARRAY_SIZE];
unsigned char second[HUGE_ARRAY_SIZE];
int main(void)
{
// Case 1: using memcpy
//----------------------
memcpy(second, first, sizeof(first));
// Case 2: using std::copy
//-------------------------
std::copy(first, first + sizeof(first), second);
// Case 3: good old fashioned "for" loop
//---------------------------------------
for (unsigned int index = 0; index < HUGE_ARRAY_SIZE; ++index)
second[index] = first[index];
return 0;
}
When I looked at std::copy, it evaluated into a "for" loop that copies
each element using pointers (similar to Case 3).
My understanding was that memcpy could be coded to use "special"
processor instructions to transfer memory around. Would this be faster
than Case 2?
Also, are compilers smart enough nowadays to simplify Case 3 into
special processor instructions (provided they exist)?
-- Thomas Matthews
email: thomas....@tecmar.com
John
<Thomas_...@tecmar.com> wrote in message
news:7ungub$aa8$1...@nnrp1.deja.com...
> Hi,
>
> I have been a C programmer and advanced to C++. In C, to copy arrays
> the memcpy function is used. In C++, the STL can be used (std::copy).
> So the question is: Which is more efficient?
>
> [snip example]
The trouble with memcpy is that it doesn't know about
objects/constructors/destructors/assignment operators/etc.
For plain old data, memcpy is fine. However if you have an array of
objects:
std::string array[100];
memcpy will more than likely not do what you want.
HTH
--
Ian Collins
> Hi,
>
> I have been a C programmer and advanced to C++. In C, to copy arrays
> the memcpy function is used. In C++, the STL can be used (std::copy).
> So the question is: Which is more efficient?
>
> Example:
> #include <mem.h> // or <cmem>
Cough, cough. Surely you know that there is no standard C header
<mem.h>? memcpy() is prototyped in <string.h>, now also available as
<cstring>.
> [snip example]
>
> When I looked at std::copy, it evaluated into a "for" loop that copies
> each element using pointers (similar to Case 3).
>
> My understanding was that memcpy could be coded to use "special"
> processor instructions to transfer memory around. Would this be faster
> than Case 2?
>
> Also, are compilers smart enough nowadays to simplify Case 3 into
> special processor instructions (provided they exist)?
>
> -- Thomas Matthews
> email: thomas....@tecmar.com
There is no generic answer to this, always provided you are talking
about arrays of POD items. It would be a very bad idea to use
memcpy() on anything other than scalars or POD aggregates.
When it was possible to use memcpy() it would probably be faster on
most compilers, but you could probably find at least one where it
would be slower.
Questions about what is faster are always best answered by:
If it matters on your particular implementation, test it on your
particular implementation. The results might or might not be the same for
the next implementation.
Jack Klein
--
Home: http://jackklein.home.att.net
Very true. For this reason, memcpy should never be used without 1)
thoroughly determining if it's safe to use and 2) deciding if the
performance boost (if it exists) will be great enough to warrant the
possible risk that future code changes will cause the use to no longer
be safe.
> When it was possible to use memcpy() it would probably be faster on
> most compilers, but you could probably find at least one where it
> would be slower.
I'd hope not. If a compiler/platform didn't have primitives to make
bulk memory transfers faster then you'd assume memcpy would have been
implemented as a simple for loop. This shouldn't be any slower than
hand implementations.
BTW, it's possible to specialize std::copy to ensure it always performs
at the best possible speed. For POD types it can be specialized to do
a memcpy instead of a for loop with assignments. Unless profiling
indicated a bottleneck I'd stick with std::copy, and even if a
bottleneck was found I'd consider specializing std::copy instead of
changing it directly to memcpy anyway.
> Questions about what is faster are always best answered by:
>
> If it matters on your particular implementation, test it on your
> particular implementation. The results might or might be the same for
> the next implementation.
Don't forget to take the platform/compiler into consideration as well.
Thanks to all that replied. I forgot about complex (more than simple)
classes. My concern was primarily with simple Plain Old Data (POD)
classes. This came up as I was writing the assignment operator for many
classes.
I was asking primarily to find out if anybody has done similar testing.
Always better to research first. ;-)
I was reading an article in either Dr. Dobb's or the C/C++ Users Journal
that stated constructors, copy constructors and assignment operators
should not be written for PODs, since the compiler provides these and
they are usually more efficient.
> I have been a C programmer and advanced to C++. In C, to copy arrays
> the memcpy function is used. In C++, the STL can be used (std::copy).
> So the question is: Which is more efficient?
[snip]
Many standard library implementations of copy() revert to memcpy()
whenever that is possible. So, on a good compiler with a good standard
library implementation, both should be exactly as efficient.
---
Vesa Karvonen
memcpy will *always* be the fastest way to copy bytes because it is
"magic" (i.e. implemented by the compiler's code generator, not as a
function). Whether it is correct or not is another matter. If the object
you are copying has a non-default copy-ctor then you should be using
looped assignment. And yes, optimising compilers these days can be very
smart* indeed (based on my experience of gcc, M$ VC++ and some embedded
DSP compilers).
alan
*examples of smartness:
- a struct of mine with sizeof == 7 was "padded" to 8 because this made
pointer arithmetic on it faster.
- division by constants can be approximated by multiplication by a
constant followed by a right shift. The compiler would ensure that this
was safe for the domain of the operation by dataflow analysis.
- impressive examples of unrolling (and parallelisation in DSP
compilers).
------------------------------------------------------------------------
Alan Donovan adon...@imerge.co.uk http://www.imerge.co.uk
Imerge Ltd. +44 1223 875265
Incorrect. Read other posts in this thread as to why std::copy can be
(read: should be) just as fast, yet will never suffer from the problems
you allude to below.
> Whether it is correct or not is another matter. If the object
> you are copying has a non-default copy-ctor then you should be using
> looped assigment. And yes, optimising compilers these days can be very
> smart* indeed (based on my experience of gcc, M$ VC++ and some embedded
> DSP compilers).
> I'd hope not. If a compiler/platform didn't have primitives to make
> bulk memory transfers faster then you'd assume memcpy would have been
> implemented as a simple for loop. This shouldn't be any slower than
> hand implementations.
There are so many ways to speed up memory copying, even when the
processor does not have special support for such operations, that I do
not have the time to even start explaining all of the techniques. Unless
the compiler is extraordinarily clever, i.e. something that hasn't been
done yet, then there are always ways to beat a simple for loop.
> BTW, it's possible to specialize std::copy to insure it always performs
> at the best possible speed. For POD types it can be specialized to do
> a memcpy instead of a for loop with assignements.
[snip]
This requires special support from the compiler. Using standard C++, it
is not possible to detect whether a type is a POD struct or not in
compile-time. However, it is possible to detect if a type is a built-in
type and many standard library implementations do detect such things.
---
Vesa Karvonen
Not to mention that if you write them, it is no longer a POD.
Copying arrays of PODs, i.e. structs that have only public data and no
constructors or destructors, is still usually faster using memcpy() -
and probably will be for some time. Copying arrays of built-in types can
easily be made exactly as efficient using both copy() and memcpy().
---
Vesa Karvonen
Sorry to confuse you, but if you read what I wrote strictly, it is true.
It is true that memcpy will always be the fastest. Whether other methods
are equally fast (and perhaps safer) does not change that fact.
alan
> Sorry to confuse you, but if you read what I wrote strictly, it is true.
> It is true that memcpy will always be the fastest. Whether other methods
> are equally fast (and perhaps safer) does not change that fact.
>
> alan
>
> ------------------------------------------------------------------------
> Alan Donovan adon...@imerge.co.uk http://www.imerge.co.uk
> Imerge Ltd. +44 1223 875265
>
I'd have to agree with the other guy. What you are saying is that two
guys run a race and both tie, so both are the fastest in the race.
> I'd have to agree with the other guy. What you are saying is that two
> guys run a race and both tie, so both are the fastest in the race.
Yes! If you define the fastest as "there is nothing faster", which (I
guess) most people do.
Why is this so controversial???
Maybe because there are two points of view equally logical. In fact,
not only I think there is another logical point of view, but I find
this other point of view better in this case.
As I see it, the word *fastest* is "a superset" of the word *faster*
(so to speak). Thus, I would rather define "fastest" as "faster than
anything else," and not as "there is nothing faster". (I prefer
forward/positive thinking rather than thinking by contradiction of the
negative -- boy, I'll give you a medal if you really understood what
I meant! ;-))
But that's my point of view in this linguistic, maybe off-topic
discussion... :-(
Carlos
--
OK, well let me qualify my original statement about the fastest method
by saying that I meant that nothing will go faster, although there may
be other equally fast methods (that, too, are "fastest").
> (I prefer
> forward/positive thinking, than thinking by contradiction of the
> negative -- boy I'll give you a medal if you really understood what
> I meant! ;-))
I know what you are trying to say, but (for me) your original analogy of
two runners says it well: if two runners tie, they are indeed both the
fastest.
And there's nothing particularly negative about this formulation:
mathematicians frequently describe the extreme of a set as the
element(s) for which no other element will satisfy some ordering
relation (such as "is faster than").
> > BTW, it's possible to specialize std::copy to insure it always performs
> > at the best possible speed. For POD types it can be specialized to do
> > a memcpy instead of a for loop with assignements.
> [snip]
>
> This requires special support from the compiler. Using standard C++, it
> is not possible to detect whether a type is a POD struct or not in
> compile-time.
Standard library implementors, however, don't have to restrict
themselves to standard C++. They can (and should) use whatever tricks
are necessary to get good performance from the platform they're
targeting.
> Hi,
>
> I have been a C programmer and advanced to C++. In C, to copy arrays
> the memcpy function is used. In C++, the STL can be used (std::copy).
> So the question is: Which is more efficient?
In most cases I would expect memcpy to be no slower than std::copy.
There are three important caveats, though.
First, std::copy is more general. If you use memcpy then you're
limited to POD types, to pointers (rather than arbitrary iterators)
and to non-overlapping ranges.
Second, there are a few pathological cases where std::copy might be
more efficient than memcpy. If the number of elements being copied is
a small compile-time constant (e.g. 0), and if your compiler doesn't
generate inline assembly code for memcpy, then the function call
overhead could be noticeable compared to the cost of the loop.
Third, note that I said "no slower than," not "faster than". It's
possible to implement std::copy so that it delegates to memmove
whenever it's copying a range of pointers to PODs, and some C++
library implementations do that.
You've piqued my interest. I'm not a systems level programmer so I
can't imagine any techniques that would be faster than a loop beyond
CPU supplied memory manipulation, which one would expect your compiler
to take advantage of. I'm not asserting that it's not possible for you
to outdo memcpy with a hand-coded implementation... what I'm saying is
that I'd hope the likelihood of this possibility would be remote enough
to make one wonder about someone who attempts it.
> > BTW, it's possible to specialize std::copy to ensure it always
> > performs at the best possible speed. For POD types it can be
> > specialized to do a memcpy instead of a for loop with assignments.
> [snip]
>
> This requires special support from the compiler.
Not really... you're just wanting more than I (or the standard)
promised.
> Using standard C++, it
> is not possible to detect whether a type is a POD struct or not in
> compile-time.
Quite true, but the compiler doesn't need to. All I meant was that you
can specialize std::copy for any (or all) POD types. If you write your
code in terms of std::copy instead of memcpy and profiling shows the
copy to be slow then simple analysis will tell you if the type is POD
or not. If it is, you specialize std::copy for this type and no
further code changes are required.
> However, it is possible to detect if a type is a built-in
> type and many standard library implementations do detect such things.
It's not a matter of detection. It's a matter of specialization. You
can specialize on any type, including those that aren't built in
types. Most implementations will have already specialized std::copy
for the built in types, but you are free to specialize for other types
as well. This is important, because it may be that a type has
characteristics that make neither the loop nor the memcpy alternatives
the fastest possible way to copy.
The most effective techniques obviously require some platform knowledge,
but here is a start:
- manual unrolling
- manual source and/or destination alignment
- manual block prefetching or cache warm-up
Sometimes the compiler can do an optimization or two (unrolling is
easy, alignment is a bit harder, prefetching is easy, but sometimes the
best results are achieved using "macro block" prefetch, which isn't
easy), but I haven't seen any compiler yet that would turn a generic
char (or even any other type) copy loop into an optimal memory copying
routine.
Furthermore, it is often possible to choose the best copy algorithm for
the situation. For instance, if you know that the source or destination
is already likely to be near or far from the CPU, then it changes the
whole game.
The above, and many many more techniques, are possible in rather
portable C++. If you have detailed platform knowledge, you can play all
kinds of dirty tricks such as using the FPU to copy/set memory.
---
Vesa Karvonen
Carlos Moreno wrote:
>
> I know that this is possibly a very stupid question, but what does
> POD stand for?
Plain Old Data [type] = i.e., the built-in types (char, int, float,
etc.) as opposed to user-defined types (classes, structs, enums).
The PO<T> template is used elsewhere, as in
POTS = Plain Old Telephone Service, i.e.,
standard analog phone line
Right acronym, wrong definition. POD types include classes, structs
and enums. However, to qualify as a POD the type must follow certain
rules: no copy constructor (I believe other constructor types are
allowed, but I don't have a copy of the standard to verify this), no
assignment operator, no destructor, no virtual functions... basically
the type must behave identically to a C type.
One final point: it may all be very hardware dependent. For example, the
relative speed of the string instruction set (the "special" instructions I
think you were alluding to) versus the integer instruction set has changed
with every generation of 80x86 architecture, with the trend toward
RISC-iness increasingly favoring the latter.
And the realities of pipelining and caching in modern CPUs and memory
controllers make predicting performance _extremely_ complex.
IMHO most of the time the difference won't matter much; where it does,
forget theory and make a MEASUREMENT. It'll be quicker and more accurate
than any amount of theorizing.
Jeff
Thomas_...@tecmar.com wrote:
> Hi,
>
> I have been a C programmer and advanced to C++. In C, to copy arrays
> the memcpy function is used. In C++, the STL can be used (std::copy).
> So the question is: Which is more efficient?
>
> etc.