
a = b or memset/cpy?


nroberts

Feb 7, 2012, 12:02:06 PM
memset and memcpy are turning up in profiles a lot. I'd like to speed
things up a bit.

Sometimes it is clear that using = to initialize a local would be
better than memset. I might not gain anything, but at least there's a
chance.

However, can I gain performance improvements when zeroing out say some
global element in an array like so:

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

void f(int slot)
{
    X init = {0};

    gX[slot] = init;

    ...
}

vs.

void f(int slot)
{
    memset(&gX[slot], 0, sizeof(X));

    ...
}

Normally I wouldn't look for a micro-optimization like this but I'm
kind of stuck with the parameters I'm given.

Jens Gustedt

Feb 7, 2012, 12:41:07 PM
Am 02/07/2012 06:02 PM, schrieb nroberts:
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }

make it

X const init = { 0 };

or even better use a compound literal

gX[slot] = (X const){ 0 };

> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.

On any decent compiler the assignment version should not be worse than
the memset version, because the compiler must be able to see that it
is an object only filled with 0.

On the other hand the assignment version *may* be better, when the
compiler can do a data flow analysis that shows, e.g., that part of what
you initialize is overwritten before being read.

So I'd always prefer the assignment version.

Jens
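
For reference, here is a minimal sketch of the two forms being compared, written out so that it compiles. The helper names (clear_by_assignment, clear_by_memset, slot_is_zero) and SLOT_COUNT are made up for illustration and appear nowhere in the thread. The compound literal needs C99 or later.

```c
#include <assert.h>
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

enum { SLOT_COUNT = 30 };          /* hypothetical name for the array size */
X gX[SLOT_COUNT];

/* Zero one slot by assigning a const compound literal (C99 and later). */
void clear_by_assignment(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    gX[slot] = (X const){ 0 };
}

/* Zero one slot with memset: every byte is set, padding included. */
void clear_by_memset(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    memset(&gX[slot], 0, sizeof gX[slot]);
}

/* Returns 1 if every member of the slot holds zero, 0 otherwise. */
int slot_is_zero(int slot)
{
    static const char zeros[sizeof gX[0].var1];  /* static storage: all zero */

    assert(slot >= 0 && slot < SLOT_COUNT);
    return gX[slot].var0 == 0
        && memcmp(gX[slot].var1, zeros, sizeof zeros) == 0;
}
```

For this particular struct (an int and a char array) both functions leave the members with the same values. Which one is faster is a property of the compiler and target, so comparing the generated assembly is probably the quickest way to find out.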

James Kuyper

Feb 7, 2012, 1:40:59 PM
On 02/07/2012 12:41 PM, Jens Gustedt wrote:
> Am 02/07/2012 06:02 PM, schrieb nroberts:
>> X gX[30];
>>
>> void f(int slot)
>> {
>> X init = {0};
>>
>> gX[slot] = init;
>>
>> ...
>> }
>
> make it
>
> X const init = { 0 };
>
> or even better use a compound literal
>
> gX[slot] = (X const){ 0 };
>
>> Normally I wouldn't look for a micro-optimization like this but I'm
>> kind of stuck with the parameters I'm given.
>
> On any decent compiler the assignment version should not be worse than

This is initialization, not assignment.

> the memset version, because the compiler must be able to see that it
> is an object only filled with 0.

I've used a compiler which, given the following code:

double array[10][1354][3] = {0};

generated the equivalent of the following:

array[0][0][0] = 0;
array[0][0][1] = 0;
etc.
The resulting executable was noticeably larger than I had expected it to
be. I was a little annoyed when I figured out what was going on. I
changed it to use memset(), and it got a lot smaller, and executed somewhat
faster, too. The support person I talked with said that my use of {0}
was unreasonable, not their compiler's code generation.

Jens Gustedt

Feb 7, 2012, 2:09:33 PM
Am 02/07/2012 07:40 PM, schrieb James Kuyper:
> On 02/07/2012 12:41 PM, Jens Gustedt wrote:
>> On any decent compiler the assignment version should not be worse than
>
> This is initialization, not assignment.

No, you are mistaken. The relevant part is assignment to gX[slot]. The
other part is just initialization of a const. In particular the
initialization of the const qualified compound literal can be done at
compile time if the compiler decides that it is beneficial (as if it
were declared as a static variable).

>> the memset version, because the compiler must be able to see that it
>> is an object only filled with 0.
>
> I've used a compiler which, given the following code:
>
> double array[10][1354][3] = {0};
>
> generated the equivalent of the following:
>
> array[0][0][0] = 0;
> array[0][0][1] = 0;
> etc.
> The resulting executable was noticeably larger than I had expected it to
> be. I was a little annoyed when I figured out what was going on. I
> changed it to use memset(), and it got a lot smaller, and executed somewhat
> faster, too. The support person I talked with said that my use of {0}
> was unreasonable, not their compiler's code generation.

How long ago and what compiler was that? My observation over the last
few years is that a compiler like gcc is capable of optimizing assignments
to struct fields or different array members as if all of these were
different variables.

(and double may be special: setting all bytes to 0 and initializing
with 0 need not be the same thing.)

Jens
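
The remark about double can be checked on a given platform with a small sketch like the following. The function name and the array length N are hypothetical. It compares object representations with memcmp, so it never reads a possibly invalid double value.

```c
#include <string.h>

enum { N = 8 };   /* hypothetical array length, any small value will do */

/*
 * Returns 1 if doubles zeroed with memset have the same object
 * representation as doubles initialized with = { 0 }, 0 otherwise.
 */
int memset_zero_matches_init(void)
{
    double by_init[N] = { 0 };      /* every element is the value 0.0 */
    double by_memset[N];

    memset(by_memset, 0, sizeof by_memset);   /* every byte is 0 */
    return memcmp(by_init, by_memset, sizeof by_init) == 0;
}
```

On implementations that use IEEE 754 doubles, +0.0 is all-bits-zero, so this is expected to return 1 there. The C standard itself does not promise that for floating-point types.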

nroberts

Feb 7, 2012, 2:07:30 PM
LOL!

Nevermind. I'm not allowed to use this language feature. It's too
"complex". People won't know what it does.

Not the '=' operator... Initializing a structure to all 0 with = {0}.

:/

I keep running into bosses like this. Is this normal in the
programming field or am I just incredibly unlucky?

Scott Fluhrer

Feb 7, 2012, 2:14:35 PM

"nroberts" <robert...@gmail.com> wrote in message
news:abb85a15-f68c-4e2f...@t8g2000yqg.googlegroups.com...
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.

Without any measurements, no one can say for certain. However, my guess is
that it wouldn't matter, because the overhead isn't particularly how you're
writing the data, but the fact that you are writing data (and causing cache
misses, which are what is expensive).

What might be a fruitful avenue to explore would be to ask yourself "why do
I need to initialize them so often, and is there a way I can reduce the
number of times I need to do it?"

--
poncho


Ben Pfaff

Feb 7, 2012, 2:14:45 PM
nroberts <robert...@gmail.com> writes:

> Nevermind. I'm not allowed to use this language feature. It's too
> "complex". People won't know what it does.
>
> Not the '=' operator... Initializing a structure to all 0 with = {0}.

Look on the bright side: on that basis, you should have no
trouble avoiding C++ entirely at that workplace.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Malcolm McLean

Feb 7, 2012, 2:37:44 PM
On Feb 7, 6:40 pm, James Kuyper <jameskuy...@verizon.net> wrote:
> The support person I talked with said that my use of {0}
> was unreasonable, not their compiler's code generation.
>
Well, what can he say? He can't patch the compiler to replace a long
initialisation with a call to memset().

nroberts

Feb 7, 2012, 2:41:38 PM
On Feb 7, 11:14 am, b...@cs.stanford.edu (Ben Pfaff) wrote:
> nroberts <roberts.n...@gmail.com> writes:
> > Nevermind.  I'm not allowed to use this language feature.  It's too
> > "complex".  People won't know what it does.
>
> > Not the '=' operator... Initializing a structure to all 0 with = {0}.
>
> Look on the bright side: on that basis, you should have no
> trouble avoiding C++ entirely at that workplace.

I don't consider that a good thing.

Shao Miller

Feb 7, 2012, 6:35:36 PM
Call == LOL. Good one. :)

And the support person cannot patch the compiler to replace a 'struct'
object assignment with a call to 'memcpy' either, presumably.

I've used a Microsoft C implementation which actually will give you a
linker error if you do:

void func(void) {
    int array[42] = { 0 };
    return;
}

and choose not to link with the standard library... It complains about
a missing 'memset' symbol...

Shao Miller

Feb 7, 2012, 6:39:16 PM
On 2/7/2012 14:07, nroberts wrote:
> LOL!
>
> Nevermind. I'm not allowed to use this language feature. It's too
> "complex". People won't know what it does.
>
> Not the '=' operator... Initializing a structure to all 0 with = {0}.
>
> :/
>
> I keep running into bosses like this. Is this normal in the
> programming field or am I just incredibly unlucky?

That feature has been around since C89/C90. Perhaps you can find a
clever way for your boss to find that out without losing face or without
regretting disallowing its use.

Shao Miller

Feb 7, 2012, 6:58:38 PM
On 2/7/2012 12:02, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>

You might find that the implementation actually translates a '= { 0
};'-style initializer into a call to 'memset'. An experiment might
reveal whether or not that's the case.

> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.
>

I'm not sure how you could gain anything unless the call to 'memset'
actually translates differently than a '= { 0 };'-style initializer.

Did you know that after all subobjects that are explicitly initialized
(by the initializer-list) have been so, the rest are initialized to what
they would have been had the object been declared with 'static' storage
duration? The whole containing object is thus "touched."
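
A small sketch of that rule, using the struct from the original post: only var0 is given a value, yet var1 ends up zeroed as well. The function name rest_is_zeroed is invented for this example.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/*
 * Returns 1 if the members that the initializer list does not mention
 * come out zeroed, exactly as in an object of static storage duration.
 */
int rest_is_zeroed(void)
{
    X partial = { 42 };   /* only var0 is initialized explicitly */
    static X reference;   /* static storage duration: implicitly zeroed */

    return partial.var0 == 42
        && memcmp(partial.var1, reference.var1, sizeof partial.var1) == 0;
}
```

So "= { 0 }" is just the shortest initializer list that triggers this: the first member gets an explicit 0 and everything else follows from the rule.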

> However, can I gain performance improvements when zeroing out say some
> global element in an array like so:
>
> typedef struct x { int var0; char var1[20]; } X;
>
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }
>
> vs.
> void f(int slot)
> {
> memset(&gX[slot], 0, sizeof(X));
>
> ...
> }
>

Well these aren't the same. The former initializes all sub-objects to
the "zeroey" values that would initialize a 'static'-storage-duration
object having the same type as the sub-object and having no explicit
initializer.

The latter fills the object with bytes with the 'unsigned char' value
'0', which is all-bits-zero.

In your example, the 'struct' type 'X' has an 'int' member. The object
representation of an 'int' can have padding bits that can be used any
way the implementation pleases.

If filling the padding bits with zeroes results in a trap representation
for an 'int', then you might be in for a surprise.

There are similar concerns for other types, including pointers, where a
null pointer value might not be all-bits-zero.

That is why I believe some people consider a '= { 0 };'-style
initializer to be more portable than 'memset'. If portability isn't a
concern, oh well.

> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.

Optimizing and making code portable might not always be compatible. If you
have a particular set of implementations as your target(s), there might
be "compiler intrinsics" that you can use which are
implementation-defined extensions to C that could offer you speed
advantages.

For example, some Microsoft compilers offer '__movsd':

http://msdn.microsoft.com/en-us/library/9d196b9h.aspx

Eric Sosman

Feb 7, 2012, 8:34:00 PM
On 2/7/2012 12:02 PM, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.
>
> However, can I gain performance improvements when zeroing out say some
> global element in an array like so:
>
> typedef struct x { int var0; char var1[20]; } X;
>
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }
>
> vs.
> void f(int slot)
> {
> memset(&gX[slot], 0, sizeof(X));
>
> ...
> }

The official answer is: The definition of the C language says
nothing about which constructs are faster or slower than others.

That said, I would expect memset() to be faster, usually, if
the wind is not unfavorable and the Moon is in the right quarter.
Argument: In the assignment version, the code must allocate the auto
variable `init', zero it, and then copy all those zeroes to `gX[slot]';
on the face of it, this sounds like more work than just zeroing
`gX[slot]' to begin with.

It is just possible that a very smart compiler could (1) realize
that the `init' variable is not actually necessary, (2) decide to
clear `gX[slot]' directly instead of clearing `init' and copying,
and (3) clear `gX[slot]' more efficiently than memset() can, perhaps
with in-line code. My suspicion, though, is that a compiler smart
enough for (1,2,3) would not at the same time be so dumb as to
implement memset() with an actual call to an actual external function;
you'd need a strange combination of brilliance and stupidity to get
an advantage for initialize-and-copy.

... and, of course, measurement is the only way to be sure.

> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.

My prejudice (and I admit it's something of a prejudice) would be
to take a hard look at those memset() and memcpy() calls, with a view
toward eliminating at least some of them -- if you can eliminate a
call you get an infinite speedup, as opposed to a mere hundredfold!
Making copies of bits you've already computed usually doesn't advance
the state of the computation very much; making many duplicates of a
single byte is also not usually a great addition to the program's
"knowledge." There are, of course, exceptions: qsort() just rearranges
bits you already own, for example, but can be useful nonetheless.
Still, if memset() and memcpy() are dominating the run time, it seems
likely that there may be a lot of needless setting and copying going
on. See what you can jettison.

--
Eric Sosman
eso...@ieee-dot-org.invalid
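
As one possible illustration of "see what you can jettison": if, and this is purely an assumption about the original program, var1 is only ever read as a NUL-terminated string, then resetting a slot does not need to touch all of its bytes. The helper name reset_slot_minimal is hypothetical.

```c
#include <assert.h>

typedef struct x { int var0; char var1[20]; } X;

enum { SLOT_COUNT = 30 };          /* hypothetical name for the array size */
X gX[SLOT_COUNT];

/*
 * Assumption: var1 is only ever read as a NUL-terminated string.
 * Under that assumption two stores reset the slot. The bytes after
 * var1[0] keep their old contents but are never looked at.
 */
void reset_slot_minimal(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    gX[slot].var0 = 0;
    gX[slot].var1[0] = '\0';
}
```

This is not equivalent to zeroing the whole struct. If any code compares slots with memcmp or writes them out raw, the stale bytes become visible and the full clear is needed after all.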

Shao Miller

Feb 7, 2012, 9:58:55 PM
On 2/7/2012 18:58, Shao Miller wrote:
>>
>> typedef struct x { int var0; char var1[20]; } X;
>>
>> X gX[30];
>>
>> void f(int slot)
>> {
>> X init = {0};
>>
>> gX[slot] = init;
>>
>> ...
>> }
>>
>> vs.
>> void f(int slot)
>> {
>> memset(&gX[slot], 0, sizeof(X));
>>
>> ...
>> }
>>
>
> Well these aren't the same. The former initializes all sub-objects to
> the "zeroey" values that would initialize a 'static'-storage-duration
> object having the same type as the sub-object and having no explicit
> initializer.
>
> The latter fills the object with bytes with the 'unsigned char' value
> '0', which is all-bits-zero.
>
> In your example, the 'struct' type 'X' has an 'int' member. The object
> representation of an 'int' can have padding bits that can be used any
> way the implementation pleases.
>
> If filling the padding bits with zeroes results in a trap representation
> for an 'int', then you might be in for a surprise.
>

Ben Bacarisse proved in another thread that my claim for a potential
surprise is false; there is no potential for all-zero-bits in an
integer's object representation to be a trap representation. Sorry
about that!

> There are similar concerns for other types, including pointers, where a
> null pointer value might not be all-bits-zero.
>

Still applies for other things, like pointers. :)
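
A minimal sketch of the pointer case. The Node type and all three function names are hypothetical. Assigning a "{ 0 }" compound literal gives a null pointer on every conforming implementation, while memset only gives one where null happens to be all-bits-zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct with a pointer member, not from the thread. */
typedef struct node { struct node *next; int value; } Node;

/* Portable: next becomes a null pointer whatever its representation. */
void node_reset(Node *n)
{
    *n = (Node){ 0 };
}

/* All bytes zero: next is null only if null is all-bits-zero here. */
void node_reset_bytes(Node *n)
{
    memset(n, 0, sizeof *n);
}

/* Returns 1 if this platform represents a null pointer as all-bits-zero. */
int null_is_all_bits_zero(void)
{
    void *null_ptr = NULL;
    unsigned char zeros[sizeof null_ptr] = { 0 };

    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

On mainstream platforms null_is_all_bits_zero() is expected to return 1, which is why the memset version works in practice. The difference only matters if the code has to run on an implementation where it returns 0.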

Jorgen Grahn

Feb 8, 2012, 1:43:50 PM
On Tue, 2012-02-07, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.

For copying with memcpy(), I much prefer assignment since it doesn't
bypass the type system, and is more readable.

I won't comment on the memset() part.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Joe keane

Feb 8, 2012, 3:05:03 PM
In article <abb85a15-f68c-4e2f...@t8g2000yqg.googlegroups.com>,
nroberts <robert...@gmail.com> wrote:
>memset and memcpy are turning up in profiles a lot.

Indeed.

>Sometimes it is clear that using = to initialize a local would be
>better than memset.

It's a shame if you call a function with a size parameter, when in fact
the size is a compile-time constant. You also probably know a bit about
alignment, whereas those guys have to assume the worst.

>I might not gain anything, but at least there's a chance.

Please use real data! 'gprof' is very good at this. It works [so
far as I have seen] on stdlib calls as well as your functions.

It can tell you where you're getting killed by function call overhead,
and where the copy is taking a long time, such that you may go to greater
lengths to avoid it. It can also (by switching back to a function) tell
you where your 'optimization' does nothing except increase code size.
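
Besides a profiler, a crude timing harness can give a first impression. The sketch below uses only clock() from <time.h>. The names clear_assign, clear_memset and time_clear are invented here, and the numbers it produces depend entirely on compiler, flags and hardware.

```c
#include <string.h>
#include <time.h>

typedef struct x { int var0; char var1[20]; } X;

enum { SLOT_COUNT = 30 };          /* hypothetical name for the array size */
X gX[SLOT_COUNT];

void clear_assign(int slot) { gX[slot] = (X const){ 0 }; }
void clear_memset(int slot) { memset(&gX[slot], 0, sizeof gX[slot]); }

/*
 * Run clear() over every slot `rounds` times and return the CPU time
 * used, in seconds, as reported by clock().
 */
double time_clear(void (*clear)(int), long rounds)
{
    clock_t start = clock();

    for (long r = 0; r < rounds; r++) {
        for (int slot = 0; slot < SLOT_COUNT; slot++) {
            gX[slot].var0 = (int)r;  /* dirty the slot before clearing it */
            clear(slot);
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_clear(clear_assign, rounds) and time_clear(clear_memset, rounds) with the same large rounds value gives two numbers to compare. Going through a function pointer adds call overhead that the real code may not have, and an optimizer may still see through it, so treat the result as a hint and confirm with the profiler on the real program.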

Ian Collins

Feb 8, 2012, 3:18:58 PM
On 02/ 9/12 09:05 AM, Joe keane wrote:
> In article<abb85a15-f68c-4e2f...@t8g2000yqg.googlegroups.com>,
> nroberts<robert...@gmail.com> wrote:
>> memset and memcpy are turning up in profiles a lot.
>
> Indeed.
>
>> Sometimes it is clear that using = to initialize a local would be
>> better than memset.
>
> It's a shame if you call a function with a size parameter, when in fact
> the size is a compile-time constant. You also probably know a bit about
> alignment, whereas those guys have to assume the worst.

A decent compiler will inline the call to memset() in this case, so
there is no call overhead. Whether the inline memset() is faster or
slower than an assignment to a const initialiser is something the OP
would have to measure.

>> I might not gain anything, but at least there's a chance.
>
> Please use real data! 'gprof' is very good at this. It works [so
> far as I have seen] on stdlib calls as well as your functions.

Assuming the OP uses GNU tools...

> It can tell you where you're getting killed by function call overhead,
> and where the copy is taking a long time, such that you may go to greater
> lengths to avoid it. It can also (by switching back to a function) tell
> you where your 'optimization' does nothing except increase code size.

Assuming there is a function call...

--
Ian Collins

Jens Gustedt

Feb 9, 2012, 3:34:06 PM
Am 02/08/2012 12:58 AM, schrieb Shao Miller:
> On 2/7/2012 12:02, nroberts wrote:

> I'm not sure how you could gain anything unless the call to 'memset'
> actually translates differently than a '= { 0 };'-style initializer.

The gain is in the knowledge of the optimizer. If you have a memset
initialization it is difficult (but not impossible) for the optimizer
to keep track of initializations. If it knows of initializations and
it encounters an assignment to a field of the struct before that field
is ever read, the optimizer is allowed to omit the initialization of
that field. Modern optimizers can be quite good at tracking individual
struct or array members.

Jens
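
A sketch of the situation described here, with hypothetical function names. In reset_then_set the zero written to var0 is never read before it is overwritten, so an optimizer that tracks members individually is free to drop that store. Whether a given compiler actually does so is something to check in the generated code.

```c
#include <assert.h>

typedef struct x { int var0; char var1[20]; } X;

enum { SLOT_COUNT = 30 };          /* hypothetical name for the array size */
X gX[SLOT_COUNT];

/* Clear the slot, then give var0 its real value. */
void reset_then_set(int slot, int value)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    gX[slot] = (X const){ 0 };     /* the zero stored to var0 is dead ... */
    gX[slot].var0 = value;         /* ... because it is overwritten here */
}

/* Same end state, stated in one assignment with a C99 designator. */
void reset_with_value(int slot, int value)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    gX[slot] = (X const){ .var0 = value };   /* var1 is zeroed implicitly */
}
```

The second form states the final value directly, so it does not rely on the optimizer spotting the dead store at all.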

Tim Prince

Feb 14, 2012, 8:30:34 AM
Certain compilers make such transformations automatically. For only 30
elements, presumably with reasonable alignment (which the compiler can
see via in-lining), in-line code may be best, but compilers may
prefer memset() to reduce code size. It may make a difference when one
or the other applies a cache bypass (IA nontemporal stores) once the move
is seen as large enough to need it, which 30 elements clearly is not.