memcmp() - compare buffer for 0

Mark

unread,

Oct 10, 2011, 5:38:59 PM10/10/11

to

Hello,

#define MAX_LEN 6
unsigned char buf[MAX_LEN];
unsigned char zerobuf[MAX_LEN] = { 0 };

if (memcmp(buf, zerobuf, MAX_LEN) == 0)
puts("buf is empty");
else
puts("buf is not empty");

Is there another way to compare the input buffer for being equal to some
pattern (all zeros in this case), i.e. ti avoid having additional array
zerobuf? I think memcmp(buf, "0", MAX_LEN) isn't valid.

Thanks.

Mark

pete

unread,

Oct 10, 2011, 6:57:54 PM10/10/11

to

It's not right either.

memcmp(buf, "\0\0\0\0\0", 6)

--
pete

Ike Naar

unread,

Oct 10, 2011, 7:18:22 PM10/10/11

to

That's correct, memcmp(buf, "0", MAX_LEN) doesn't work.
One reason it won't work is that the character '0' is not the same
as a zero byte ('\0'). Another problem is that the array "0" has
less than MAX_LEN elements so there's the danger of reading beyond
the end of that array.

One other way to do it, that does not use an additional array,
is a simple linear search for the first nonzero element:

size_t i = 0;
while (i != MAX_LEN && buf[i] == 0) ++i;
if (i == MAX_LEN) puts("all zeroes"); else puts("not all zeroes");

Yet another way is to sort the buf array in descending order and
check whether buf[0] equals zero. If it does, the array contains
only zeroes.

There must be at least fifty other ways to do it.

Keith Thompson

unread,

Oct 10, 2011, 8:02:31 PM10/10/11

to

If your compiler supports compound literals (a "new" feature in C99),
you can do something like this:

if (memcmp(buf, (char[MAX_LEN]){ 0 }, MAX_LEN) == 0)

puts("buf is empty");
else
puts("buf is not empty");

In the more general case -- well, it depends on just what the more
general case turns out to be. This may be a specific example of a
more general problem, but it's hard to tell what that more general
problem is.

(Note that an array is never really "empty". Your array is full; it
just happens to be full of zeros.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman

unread,

Oct 10, 2011, 8:37:30 PM10/10/11

to

You think rightly, as others have explained.

To use memcmp(), you'll need to compare to a region of memory as
long as the test region, pre-initialized to the desired value. That's
inescapable: it's how memcmp() operates.

Alternatively, you could write a simple isAllZero() function that
just looped through the test region comparing each byte to the desired
value. That would surely use less data memory, but it's impossible to
tell without measurement whether it would be faster or slower.

The strspn() function might *almost* be made to work, if only you
weren't looking for zeroes. As things stand, I see no way to employ it.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Mark Adler

unread,

Oct 11, 2011, 12:35:51 AM10/11/11

to

On 2011-10-10 14:38:59 -0700, Mark said:
> Is there another way to compare the input buffer for being equal to
> some pattern (all zeros in this case), i.e. ti avoid having additional
> array zerobuf?

I'm surprised that no one offered the obvious answer that uses memcmp()
without another array:

if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
puts("buf is all zeros");
else
puts("buf has at least one non-zero value");

Though it's probably faster to compare each entry to zero, or even
faster to recast to an array of large integers and compare those to
zero.

Mark

Ian Collins

unread,

Oct 11, 2011, 12:37:57 AM10/11/11

to

On 10/11/11 05:35 PM, Mark Adler wrote:
> On 2011-10-10 14:38:59 -0700, Mark said:
>> Is there another way to compare the input buffer for being equal to
>> some pattern (all zeros in this case), i.e. ti avoid having additional
>> array zerobuf?
>
> I'm surprised that no one offered the obvious answer that uses memcmp()
> without another array:
>

> if (MAX_LEN> 0&& buf[0] == 0&& memcmp(buf, buf + 1, MAX_LEN - 1) == 0)

> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");
>
> Though it's probably faster to compare each entry to zero, or even
> faster to recast to an array of large integers and compare those to
> zero.

Usual alignment caveats apply!

--
Ian Collins

Ben Bacarisse

unread,

Oct 11, 2011, 5:00:06 AM10/11/11

to

Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 10/10/2011 5:38 PM, Mark wrote:
>> Hello,
>>
>> #define MAX_LEN 6
>> unsigned char buf[MAX_LEN];
>> unsigned char zerobuf[MAX_LEN] = { 0 };
>>
>> if (memcmp(buf, zerobuf, MAX_LEN) == 0)
>> puts("buf is empty");
>> else
>> puts("buf is not empty");
>>
>> Is there another way to compare the input buffer for being equal to some
>> pattern (all zeros in this case), i.e. ti avoid having additional array
>> zerobuf? I think memcmp(buf, "0", MAX_LEN) isn't valid.
>
> You think rightly, as others have explained.
>
> To use memcmp(), you'll need to compare to a region of memory as
> long as the test region, pre-initialized to the desired value. That's
> inescapable: it's how memcmp() operates.

There's a memcmp method that does not quite fit that description, though
it is not likely to useful.

In the same way that an object can be copied by successively doubling
the copy region, so a region can be compared for all bytes the same by
doubling the compare region (and it can be altered to handle longer
patterns than all-bytes the same)

assert(BUF_SZ != 0);
size_t clen = 1;
while (2*clen < BUF_SZ)
if (memcmp(buf, buf + clen, clen) == 0)
clen *= 2;
else return false;
return memcmp(buf, buf + clen, BUF_SZ - clen) == 0 && buf[0] == target;

Rather like the previous suggestion to change a O(n) algorithm (a
simple loop) into a O(n log(n)) one (sort and test buf[0]), this is no
more than an exercise in possibilities.

--
Ben.

Ben Bacarisse

unread,

Oct 11, 2011, 5:11:56 AM10/11/11

to

Are there any in this case?

--
Ben.

Ian Collins

unread,

Oct 11, 2011, 5:15:49 AM10/11/11

to

What if the buffer is at an odd address?

--
Ian Collins

Ben Bacarisse

unread,

Oct 11, 2011, 5:22:13 AM10/11/11

to

Ian Collins <ian-...@hotmail.com> writes:

Yes, I missed the last sentence right up to the point I hit send! Then
I what what was being proposed (and you were commenting on).

--
Ben.

Ben Bacarisse

unread,

Oct 11, 2011, 5:23:45 AM10/11/11

to

Mark Adler <mad...@alumni.caltech.edu> writes:

> On 2011-10-10 14:38:59 -0700, Mark said:
>> Is there another way to compare the input buffer for being equal to
>> some pattern (all zeros in this case), i.e. ti avoid having
>> additional array zerobuf?
>
> I'm surprised that no one offered the obvious answer that uses
> memcmp() without another array:

I can't speak for others, but I didn't because I didn't think it was
assured to work. I assumed (wrongly, of course) that memcmp would be
undefined when applied to overlapping regions. I isn't, so it's a fine
solution.

> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");

<snip>
--
Ben.

bartek szurgot

unread,

Oct 11, 2011, 7:21:09 AM10/11/11

to

On 10/11/2011 06:35 AM, Mark Adler wrote:
> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");
>
> Though it's probably faster to compare each entry to zero, or even
> faster to recast to an array of large integers and compare those to zero.

i've checked 512MB array of zeros and on my PC simple "for(...)
if(!zero) return false" was over twice as fast (~0.41[s] vs. ~0.11[s])
as memcmp (used as above).

i strongly suggest for() since it is both faster and easier for the
reader to understand.

--
pozdrawiam serdecznie / best regards,
Bartek 'BaSz' Szurgot

http://www.baszerr.org

James Kuyper

unread,

Oct 11, 2011, 7:27:39 AM10/11/11

to

On 10/11/2011 12:35 AM, Mark Adler wrote:
> On 2011-10-10 14:38:59 -0700, Mark said:
>> Is there another way to compare the input buffer for being equal to
>> some pattern (all zeros in this case), i.e. ti avoid having additional
>> array zerobuf?
>
> I'm surprised that no one offered the obvious answer that uses memcmp()
> without another array:

For me, the reason I didn't suggest that was because it wasn't obvious
to me.

> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");

While I admire your ingenuity in coming up with that approach, I'd still
favor the simpler, more direct approach:
> Though it's probably faster to compare each entry to zero, ...
--
James Kuyper

Ben Bacarisse

unread,

Oct 11, 2011, 8:47:08 AM10/11/11

to

bartek szurgot <no...@m.plz> writes:

> On 10/11/2011 06:35 AM, Mark Adler wrote:
>> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
>> puts("buf is all zeros");
>> else
>> puts("buf has at least one non-zero value");
>>
>> Though it's probably faster to compare each entry to zero, or even
>> faster to recast to an array of large integers and compare those to zero.
>
> i've checked 512MB array of zeros and on my PC simple "for(...)
> if(!zero) return false" was over twice as fast (~0.41[s] vs. ~0.11[s])
> as memcmp (used as above).
>
> i strongly suggest for() since it is both faster and easier for the
> reader to understand.

Just another few data points: I see the plain loop version as being 14
times *slower* than the memcmp version for 512MB, provided I don't
optimise! At gcc -O1 and above the memcmp takes twice the time the
plain loops does. The reason seems to be that gcc replaces memcmp with
a repz cmpsb instruction (plus housekeeping gubbins) and these string
instructions have a reputation for being slow.

The memcmp method is still four times faster than the loop (on this
data, on this machine, with the library, with... etc) provided gcc can
be told not to inline it to cmpsb instruction.

To summarise:
for loop memcmp
gcc unoptimised 13.333s 0.932s
gcc optimising 4.160s 8.250s

I'd post the CPU, gcc version and so on but I think the real messages is
*measure*. You can still be surprised after <mumble> years in this
field.

--
Ben.

rufus zhu

unread,

Oct 11, 2011, 10:13:18 AM10/11/11

to

How does it make sense that "the buffer is at odd address"?

"Ben Bacarisse" <ben.u...@bsb.me.uk>

--
Rufus

--- Posted via news://freenews.netfront.net/ - Complaints to ne...@netfront.net ---

Mark Adler

unread,

Oct 11, 2011, 10:27:32 AM10/11/11

to

On 2011-10-11 02:15:49 -0700, Ian Collins said:
> What if the buffer is at an odd address?

You do a few single-byte compares at the beginning and end as
necessary, and do the rest in the middle with larger word compares.

Mark

James Kuyper

unread,

Oct 11, 2011, 10:30:01 AM10/11/11

to

On 10/11/2011 10:13 AM, rufus zhu wrote:
> How does it make sense that "the buffer is at odd address"?

C allows implementations to have restrictions, called "alignment
requirements" on where in memory objects of a certain type can be
stored. While there is no portable C meaning to the concept of an "odd
address", it is in fact usually the case that associated with every
address is a number, and on some platforms the addresses associated with
odd numbers could fail to meet the alignment requirements of the "large
integers" referred to below. Such restrictions are actually commonplace.

> "Ben Bacarisse" <ben.u...@bsb.me.uk>
>> Ian Collins <ian-...@hotmail.com> writes:
>>
>>> On 10/11/11 10:11 PM, Ben Bacarisse wrote:
>>>> Ian Collins<ian-...@hotmail.com> writes:
>>>>
>>>>> On 10/11/11 05:35 PM, Mark Adler wrote:
>>>>>> On 2011-10-10 14:38:59 -0700, Mark said:
>>>>>>> Is there another way to compare the input buffer for being equal to
>>>>>>> some pattern (all zeros in this case), i.e. ti avoid having
>>>>>>> additional
>>>>>>> array zerobuf?
>>>>>>
>>>>>> I'm surprised that no one offered the obvious answer that uses
>>>>>> memcmp()
>>>>>> without another array:
>>>>>>
>>>>>> if (MAX_LEN> 0&& buf[0] == 0&& memcmp(buf, buf + 1, MAX_LEN - 1)
>>>>>> == 0)
>>>>>> puts("buf is all zeros");
>>>>>> else
>>>>>> puts("buf has at least one non-zero value");
>>>>>>
>>>>>> Though it's probably faster to compare each entry to zero, or even
>>>>>> faster to recast to an array of large integers and compare those to
>>>>>> zero.

If the "large integers" he's referring to have alignment requirements,
there's no guarantee that buf satisfies those requirements. If it does
not, the "cast" he's referring to in "recast" will have undefined behavior.
--
James Kuyper

Ben Bacarisse

unread,

Oct 11, 2011, 10:39:25 AM10/11/11

to

"rufus zhu" <hait...@mail.ustc.edu.cn> writes:

Top posting messes up the discussion. I've moved your remark to where I
would have put it and I think you'll see that it makes things
simpler to follow.

> "Ben Bacarisse" <ben.u...@bsb.me.uk>
>> Ian Collins <ian-...@hotmail.com> writes:
>>
>>> On 10/11/11 10:11 PM, Ben Bacarisse wrote:
>>>> Ian Collins<ian-...@hotmail.com> writes:
>>>>
>>>>> On 10/11/11 05:35 PM, Mark Adler wrote:
>>>>>> On 2011-10-10 14:38:59 -0700, Mark said:

<snip>

>>>>>> if (MAX_LEN> 0&& buf[0] == 0&& memcmp(buf, buf + 1, MAX_LEN - 1)
>>>>>> == 0)
>>>>>> puts("buf is all zeros");
>>>>>> else
>>>>>> puts("buf has at least one non-zero value");
>>>>>>
>>>>>> Though it's probably faster to compare each entry to zero, or even
>>>>>> faster to recast to an array of large integers and compare those to
>>>>>> zero.
>>>>>
>>>>> Usual alignment caveats apply!
>>>>
>>>> Are there any in this case?
>>>
>>> What if the buffer is at an odd address?
>>
>> Yes, I missed the last sentence right up to the point I hit send! Then

>> I [saw] what was being proposed (and you were commenting on).

>
> How does it make sense that "the buffer is at odd address"?

Mark was suggesting testing for zero in bigger chunk by doing something
like this:

long long int *ip = (void *)buf;
for (size_t i; i < something && *ip == 0; i++);

but that does not work reliably on machines that need long long ints to
be aligned more strictly than char arrays. The classic example is when
buf is placed at an odd address rather than an even one.

[I've also cut out material that I don't think is needed and marked the
edits. These are things that keep Usenet posts readable in their own.]

--
Ben.

Mark Adler

unread,

Oct 11, 2011, 10:57:19 AM10/11/11

to

On 2011-10-11 07:27:32 -0700, Mark Adler said:
> You do a few single-byte compares at the beginning and end as
> necessary, and do the rest in the middle with larger word compares.

For example:

int isallz(char *buf, size_t len)
{
long *wide;

while ((size_t)buf & (sizeof(long) - 1))
if (len--, *buf++)
return 0;
wide = (long *)buf;
while (len >= sizeof(long))
if (len -= sizeof(long), *wide++)
return 0;
buf = (char *)wide;
while (len)
if (len--, *buf++)
return 0;
return 1;
}

This could be made faster by avoiding the len increments, but this is
just to illustrate the idea.

Mark

bartek szurgot

unread,

Oct 11, 2011, 11:30:26 AM10/11/11

to

On 10/11/2011 02:47 PM, Ben Bacarisse wrote:
> Just another few data points: I see the plain loop version as being 14
> times *slower* than the memcmp version for 512MB, provided I don't
> optimise! At gcc -O1 and above the memcmp takes twice the time the
> plain loops does. The reason seems to be that gcc replaces memcmp with
> a repz cmpsb instruction (plus housekeeping gubbins) and these string
> instructions have a reputation for being slow.
>
> The memcmp method is still four times faster than the loop (on this
> data, on this machine, with the library, with... etc) provided gcc can
> be told not to inline it to cmpsb instruction.
>
> To summarise:
> for loop memcmp
> gcc unoptimised 13.333s 0.932s
> gcc optimising 4.160s 8.250s
>
> I'd post the CPU, gcc version and so on but I think the real messages is
> *measure*. You can still be surprised after <mumble> years in this
> field.

i was surprised with your results even more than with my own. :) i
already removed test code i've wrote before posting my last message and
so i had to rewrite it once more... and bang! it looks like i had a bug
(probably #ifndef instead of #ifdef for switching implementations with
-D), because now i'm getting similar computation times i got before, but
both versions are faster for memcmp(). namely:

for() memcmp()
-g3 ~1.2[s] ~0.11[s]
-O3 ~0.4[s] ~0.11[s]

i think this can be ok, since it would be reasonable that call to ext.
library lasts the same amount of time, regardless of release/debug
builds and hand-made code to be faster, when optimized.

source code for this is:
http://pastebin.com/sP9hexnW#
(note: this code is only a PoC - it does not check all errors)

can you check your code as well, please?

any way the conclusion is 100% right. it is always worth checking, even
if you think you know the results. hardware is changing and it may
affect results a lot.

BartC

unread,

Oct 11, 2011, 11:52:31 AM10/11/11

to

"bartek szurgot" <no...@m.plz> wrote in message
news:j718r5$eaa$1...@z-news.wcss.wroc.pl...

> On 10/11/2011 06:35 AM, Mark Adler wrote:
>> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
>> puts("buf is all zeros");
>> else
>> puts("buf has at least one non-zero value");
>>
>> Though it's probably faster to compare each entry to zero, or even
>> faster to recast to an array of large integers and compare those to zero.
>
> i've checked 512MB array of zeros and on my PC simple "for(...)
> if(!zero) return false" was over twice as fast (~0.41[s] vs. ~0.11[s])
> as memcmp (used as above).
>
> i strongly suggest for() since it is both faster and easier for the
> reader to understand.

The reader doesn't need to understand it. Just wrap the whole thing in a
function, for example:

int checkallzeros(void *buffer, int nchars); /* or some such name */

then the reader just sees:

if (checkallzeros(buf, MAX_LEN)) ...

Inside checkallzeros(), just put whatever code is best.

This does make it more difficult to have a fixed-length zero array to
compare against, if that is the method, but not impossible. You can use a
static zero array up to certain limit, then some other method, or check a
section at a time, whatever. But the method is hidden, and can be maintained
more easily.

(And if you do have a 512MB array, then you might find it wasteful (in space
and bandwidth) to reserve another 512MB reference array consisting of zeros
to compare against.)

--
Bartc

Mark Adler

unread,

Oct 11, 2011, 11:54:46 AM10/11/11

to

On 2011-10-11 07:57:19 -0700, Mark Adler said:
> For example:
>
> int isallz(char *buf, size_t len)
> {
> long *wide;
>
> while ((size_t)buf & (sizeof(long) - 1))
> if (len--, *buf++)
> return 0;
> wide = (long *)buf;
> while (len >= sizeof(long))
> if (len -= sizeof(long), *wide++)
> return 0;
> buf = (char *)wide;
> while (len)
> if (len--, *buf++)
> return 0;
> return 1;
> }

And of course, there's an error in there as an exercise for the reader
to find. Anyway, it illustrates the approach.

Mark

Ben Bacarisse

unread,

Oct 11, 2011, 1:31:45 PM10/11/11

to

bartek szurgot <no...@m.plz> writes:

It would seem that your compiler is not replacing memcmp when
optimising. Taking this into account, the two sets of results agree
pretty well -- there's a nearly constant factor between your results and
mine.

I could leave it at that and pretend that my machine is old and ten
times slower than yours, but I'll come clean -- to get accurate times I
execute the test 10 times and I did not divide by the 10! I usually
don't bother to divide because I normally only care about ratios, not
actual MB/s.

> i think this can be ok, since it would be reasonable that call to ext.
> library lasts the same amount of time, regardless of release/debug
> builds and hand-made code to be faster, when optimized.
>
> source code for this is:
> http://pastebin.com/sP9hexnW#
> (note: this code is only a PoC - it does not check all errors)

It's C++. Probably makes no difference. I still get slower times when
optimising memcmp:

for loop memcmp
g++ 1.4 0.10
g++ -O2 0.43 0.83

> can you check your code as well, please?

I think the code is alright -- the units were wrong.
http://pastebin.com/cGZxpSia

> any way the conclusion is 100% right. it is always worth checking, even
> if you think you know the results. hardware is changing and it may
> affect results a lot.

--
Ben.

MaciekL

unread,

Oct 12, 2011, 4:03:07 AM10/12/11

to

int is_all_bytes_set(void const * data, int value, unsigned int size)
{
memset(&value, value, sizeof(value));
if (size > sizeof(value))
{
char const * ptr = (char const *)data;
return !(memcmp(ptr, &value, sizeof(value)) ||
memcmp(ptr, ptr + sizeof(value), size - sizeof(value)));
}
else
{
return !memcmp(&value, data, size);
}
}

bartek szurgot

unread,

Oct 12, 2011, 4:45:38 AM10/12/11

to

On 10/11/2011 07:31 PM, Ben Bacarisse wrote:

> bartek szurgot <no...@m.plz> writes:
> It would seem that your compiler is not replacing memcmp when
> optimising. Taking this into account, the two sets of results agree
> pretty well -- there's a nearly constant factor between your results and
> mine.
>
> I could leave it at that and pretend that my machine is old and ten
> times slower than yours, but I'll come clean -- to get accurate times I
> execute the test 10 times and I did not divide by the 10! I usually
> don't bother to divide because I normally only care about ratios, not
> actual MB/s.

ok - this explains a lot. :)

> It's C++. Probably makes no difference. I still get slower times when
> optimising memcmp:
>
> for loop memcmp
> g++ 1.4 0.10
> g++ -O2 0.43 0.83
>
>> can you check your code as well, please?
>
> I think the code is alright -- the units were wrong.
> http://pastebin.com/cGZxpSia

after compiling & comparing execution times of both codes i still
couldn't understand the results. why, in (logically) identical programs
for()-version gives similar times (buf[i] vs. buf++ was the only
difference), while memcmp() differs about 6x! :/

after doing some digging and exercising i found what looks like a gcc
performance problem.
http://pastebin.com/nMcD7f6z#
when compiling with "inline", time is far different then when no inline
is present. it does not affect debug version though. in short:

inline non-inline
-g3 ~0.8[s] ~0.8[s]
-O3 ~0.14[s] ~0.8[s]

i got this results for gcc-4.6. gcc-4.5 compiled code runs always
(deb/rel) in ~0.14[s] (i.e. fine). you probably observe similar effect
on your machine.

BartC

unread,

Oct 12, 2011, 5:30:01 AM10/12/11

to

"bartek szurgot" <no...@m.plz> wrote in message

news:j73k3j$hpb$1...@z-news.wcss.wroc.pl...

> On 10/11/2011 07:31 PM, Ben Bacarisse wrote:

> after doing some digging and exercising i found what looks like a gcc
> performance problem.
> http://pastebin.com/nMcD7f6z#
> when compiling with "inline", time is far different then when no inline
> is present. it does not affect debug version though. in short:

Benchmarking with gcc always gave me problems, especially when comparing
with other languages.

Anything over -O1 optimisation tended to remove essential bits of code whose
execution time I was trying to compare! So I only used -O1.

Although it's difficult with your essentially one-line program to see
exactly what it could leave out...

--
Bartc

bartek szurgot

unread,

Oct 12, 2011, 6:12:16 AM10/12/11

to

On 10/12/2011 11:30 AM, BartC wrote:
> Anything over -O1 optimisation tended to remove essential bits of code
> whose execution time I was trying to compare! So I only used -O1.

it is common for the compilers to remove dead code. it is actually a
nice feature - why computing something not used at all? there is a
simple trick to overcome this during measurements - output some
(randomly chosen) part of the result. then compiler cannot optimize-out
code, in such a situation, since it is used (printed on the screen). on
the other hand such a simple operation (usually) does not influence
measurements significantly.

io_x

unread,

Oct 12, 2011, 7:33:09 AM10/12/11

to

"bartek szurgot" <no...@m.plz> ha scritto nel messaggio
news:j73p60$k5g$1...@z-news.wcss.wroc.pl...

> On 10/12/2011 11:30 AM, BartC wrote:
>> Anything over -O1 optimisation tended to remove essential bits of code
>> whose execution time I was trying to compare! So I only used -O1.
>
> it is common for the compilers to remove dead code. it is actually a
> nice feature - why computing something not used at all? there is a
> simple trick to overcome this during measurements - output some
> (randomly chosen) part of the result. then compiler cannot optimize-out
> code, in such a situation, since it is used (printed on the screen). on
> the other hand such a simple operation (usually) does not influence
> measurements significantly.

i not like what you say

Ben Bacarisse

unread,

Oct 12, 2011, 8:31:55 AM10/12/11

to

bartek szurgot <no...@m.plz> writes:
<snip>

> after compiling & comparing execution times of both codes i still
> couldn't understand the results. why, in (logically) identical programs
> for()-version gives similar times (buf[i] vs. buf++ was the only
> difference), while memcmp() differs about 6x! :/

I thought I explained that when I posted the first times. I certainly
had an explanation that for what was happening on my machine with my
version of gcc.

> after doing some digging and exercising i found what looks like a gcc
> performance problem.
> http://pastebin.com/nMcD7f6z#
> when compiling with "inline", time is far different then when no inline
> is present. it does not affect debug version though. in short:
>
> inline non-inline
> -g3 ~0.8[s] ~0.8[s]
> -O3 ~0.14[s] ~0.8[s]
>
> i got this results for gcc-4.6. gcc-4.5 compiled code runs always
> (deb/rel) in ~0.14[s] (i.e. fine). you probably observe similar effect
> on your machine.

No, I don't, but I don't think it matters. For one thing, you are using
C++ and I am using C. We might have different versions of gcc and/or
the C library. Who knows? The key points, for me, are:

(a) Often, there is no useful notion of "faster" code. The same code
will perform differently in different environments or even with
different compiler flags.

(b) Even in one environment, what's fast will depend on the data. We'd
probably be having a different discussion if you'd chosen a 30 byte
buffer.

(c) It's useful to have a feel for the relative costs of certain basic
operations, but clarity of expression will often trump speed, unless
measurement shows that readability must be sacrificed to get a program
performing as required.[1]

(d) In most cases, it's algorithmic changes that will get you really
valuable performance gains.

[1] I've just spent some time writing a slow Reed-Solomon decoder. All
the existing code I looked at was super fast but, for me, almost
unintelligible. That was probably the right thing for those authors to
do, since there are many situations where fast decoding is essential.
But I wanted to be able to see what the code was really doing, so I have
de-optimised like mad. The result is probably 10 (maybe more!) times
slower than the fastest software decoders, but I now largely understand
it, and it is fast enough because the program spends a tiny faction of
its run time decoding a small amount of data

--
Ben.

Jorgen Grahn

unread,

Oct 12, 2011, 11:13:30 AM10/12/11

to

On Wed, 2011-10-12, BartC wrote:
> "bartek szurgot" <no...@m.plz> wrote in message
> news:j73k3j$hpb$1...@z-news.wcss.wroc.pl...
>> On 10/11/2011 07:31 PM, Ben Bacarisse wrote:
>
>> after doing some digging and exercising i found what looks like a gcc
>> performance problem.
>> http://pastebin.com/nMcD7f6z#
>> when compiling with "inline", time is far different then when no inline
>> is present. it does not affect debug version though. in short:
>
> Benchmarking with gcc always gave me problems, especially when comparing
> with other languages.
>
> Anything over -O1 optimisation tended to remove essential bits of code whose
> execution time I was trying to compare! So I only used -O1.

So you'd rather have meaningless results? If your benchmark doesn't
survive the optimizer, it's not relevant to real code.

You *have* to make sure the things you are calculating actually get
used, as far as the compiler can tell. One way is feeding it into some
void touch(void*) function in another translation unit -- there's no
way around that as far as I can see.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Noob

unread,

Oct 12, 2011, 11:34:58 AM10/12/11

to

Jorgen Grahn wrote:

> You *have* to make sure the things you are calculating actually get
> used, as far as the compiler can tell. One way is feeding it into some
> void touch(void*) function in another translation unit -- there's no
> way around that as far as I can see.

Some optimizations can be deferred until link-time ;-)

e.g.
http://gcc.gnu.org/wiki/LinkTimeOptimization
http://llvm.org/docs/LinkTimeOptimization.html

Regards.

Gordon Burditt

unread,

Oct 13, 2011, 1:10:33 AM10/13/11

to

> Benchmarking with gcc always gave me problems, especially when comparing
> with other languages.
>
> Anything over -O1 optimisation tended to remove essential bits of code whose
> execution time I was trying to compare! So I only used -O1.

Check the results that you are computing. If they are not what you
expected, add at least 72 hours (estimated clock time to fix the
code, especially if discovered at the beginning of a weekend) to
the execution time. Remember, if you don't need a correct answer,
any calculation can be done in zero time. And actually using the
results of the calculation should prevent gcc from deleting major
parts of the calculation.

A quick way of doing this that shouldn't disturb the timing much
is to exit(EXIT_SUCCESS) if the result matched what was expected
and exit(EXIT_FAILURE) if it didn't. (It's OK if the correct answer
is a range for floating-point calculations).

Note that if you measure by using the entire runtime of an executable,
there is a good chance that loading the program (including disk
i/o) and, if applicable, running the dynamic linker (including more
disk i/o of libraries), dominates anything your program is doing.

Another alternative for timing is to take the value of clock()
before and after executing the computation you are trying to time,
and after the timing interval is over, print the results and the
timing. Be sure that you expect the execution time of the code to
be at least 100 times the timing resolution of clock() in order to
get a decent measurement. Also be sure that the time won't exceed
the point where clock_t overflows. (If clock_t is returned in
microseconds and is an unsigned 32-bit integer quantity, it overflows
in a bit over 71 minutes). If you loop through the calculation in
Order to make it take more time, average the results.

clock_t start;
clock_t stop;

start = clock();
result = ... code you're trying to time ...;
stop = clock();

Compute stop - start (allowing for wraparound) and convert
to more reasonable units, like seconds and fractions of
seconds.

Print result and timing.

To evaluate the results properly, you will need to know the
(system-dependent) timing resolution of clock(). Just because the
value returned is in, say, microseconds doesn't mean that the return
value of clock() doesn't increment by 16667 microseconds every
1/60th of a second, or even that the timing base isn't a sundial
calibrated in 15 minute intervals. There's a good chance then that
your times will (legitimately!) come out 0 most of the time. printf()
statements, etc. won't affect the timing measurement because at
this point it's already finished.

Jorgen Grahn

unread,

Oct 13, 2011, 4:53:15 AM10/13/11

to

True, but that is not widely used yet. The gcc version seemed (last
time I looked) as an experimental feature.