#define MAX_LEN 6
unsigned char buf[MAX_LEN];
unsigned char zerobuf[MAX_LEN] = { 0 };
if (memcmp(buf, zerobuf, MAX_LEN) == 0)
puts("buf is empty");
else
puts("buf is not empty");
Is there another way to compare the input buffer for being equal to some
pattern (all zeros in this case), i.e. ti avoid having additional array
zerobuf? I think memcmp(buf, "0", MAX_LEN) isn't valid.
Thanks.
Mark
That's correct, memcmp(buf, "0", MAX_LEN) doesn't work.
One reason it won't work is that the character '0' is not the same
as a zero byte ('\0'). Another problem is that the array "0" has
less than MAX_LEN elements so there's the danger of reading beyond
the end of that array.
One other way to do it, that does not use an additional array,
is a simple linear search for the first nonzero element:
size_t i = 0;
while (i != MAX_LEN && buf[i] == 0) ++i;
if (i == MAX_LEN) puts("all zeroes"); else puts("not all zeroes");
Yet another way is to sort the buf array in descending order and
check whether buf[0] equals zero. If it does, the array contains
only zeroes.
There must be at least fifty other ways to do it.
If your compiler supports compound literals (a "new" feature in C99),
you can do something like this:
if (memcmp(buf, (char[MAX_LEN]){ 0 }, MAX_LEN) == 0)
puts("buf is empty");
else
puts("buf is not empty");
In the more general case -- well, it depends on just what the more
general case turns out to be. This may be a specific example of a
more general problem, but it's hard to tell what that more general
problem is.
(Note that an array is never really "empty". Your array is full; it
just happens to be full of zeros.)
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
I'm surprised that no one offered the obvious answer that uses memcmp()
without another array:
if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
puts("buf is all zeros");
else
puts("buf has at least one non-zero value");
Though it's probably faster to compare each entry to zero, or even
faster to recast to an array of large integers and compare those to
zero.
Mark
Usual alignment caveats apply!
--
Ian Collins
> On 10/10/2011 5:38 PM, Mark wrote:
>> Hello,
>>
>> #define MAX_LEN 6
>> unsigned char buf[MAX_LEN];
>> unsigned char zerobuf[MAX_LEN] = { 0 };
>>
>> if (memcmp(buf, zerobuf, MAX_LEN) == 0)
>> puts("buf is empty");
>> else
>> puts("buf is not empty");
>>
>> Is there another way to compare the input buffer for being equal to some
>> pattern (all zeros in this case), i.e. ti avoid having additional array
>> zerobuf? I think memcmp(buf, "0", MAX_LEN) isn't valid.
>
> You think rightly, as others have explained.
>
> To use memcmp(), you'll need to compare to a region of memory as
> long as the test region, pre-initialized to the desired value. That's
> inescapable: it's how memcmp() operates.
There's a memcmp method that does not quite fit that description, though
it is not likely to useful.
In the same way that an object can be copied by successively doubling
the copy region, so a region can be compared for all bytes the same by
doubling the compare region (and it can be altered to handle longer
patterns than all-bytes the same)
assert(BUF_SZ != 0);
size_t clen = 1;
while (2*clen < BUF_SZ)
if (memcmp(buf, buf + clen, clen) == 0)
clen *= 2;
else return false;
return memcmp(buf, buf + clen, BUF_SZ - clen) == 0 && buf[0] == target;
Rather like the previous suggestion to change a O(n) algorithm (a
simple loop) into a O(n log(n)) one (sort and test buf[0]), this is no
more than an exercise in possibilities.
--
Ben.
What if the buffer is at an odd address?
--
Ian Collins
Yes, I missed the last sentence right up to the point I hit send! Then
I what what was being proposed (and you were commenting on).
--
Ben.
> On 2011-10-10 14:38:59 -0700, Mark said:
>> Is there another way to compare the input buffer for being equal to
>> some pattern (all zeros in this case), i.e. ti avoid having
>> additional array zerobuf?
>
> I'm surprised that no one offered the obvious answer that uses
> memcmp() without another array:
I can't speak for others, but I didn't because I didn't think it was
assured to work. I assumed (wrongly, of course) that memcmp would be
undefined when applied to overlapping regions. I isn't, so it's a fine
solution.
> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");
<snip>
--
Ben.
i've checked 512MB array of zeros and on my PC simple "for(...)
if(!zero) return false" was over twice as fast (~0.41[s] vs. ~0.11[s])
as memcmp (used as above).
i strongly suggest for() since it is both faster and easier for the
reader to understand.
--
pozdrawiam serdecznie / best regards,
Bartek 'BaSz' Szurgot
For me, the reason I didn't suggest that was because it wasn't obvious
to me.
> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
> puts("buf is all zeros");
> else
> puts("buf has at least one non-zero value");
While I admire your ingenuity in coming up with that approach, I'd still
favor the simpler, more direct approach:
> Though it's probably faster to compare each entry to zero, ...
--
James Kuyper
> On 10/11/2011 06:35 AM, Mark Adler wrote:
>> if (MAX_LEN > 0 && buf[0] == 0 && memcmp(buf, buf + 1, MAX_LEN - 1) == 0)
>> puts("buf is all zeros");
>> else
>> puts("buf has at least one non-zero value");
>>
>> Though it's probably faster to compare each entry to zero, or even
>> faster to recast to an array of large integers and compare those to zero.
>
> i've checked 512MB array of zeros and on my PC simple "for(...)
> if(!zero) return false" was over twice as fast (~0.41[s] vs. ~0.11[s])
> as memcmp (used as above).
>
> i strongly suggest for() since it is both faster and easier for the
> reader to understand.
Just another few data points: I see the plain loop version as being 14
times *slower* than the memcmp version for 512MB, provided I don't
optimise! At gcc -O1 and above the memcmp takes twice the time the
plain loops does. The reason seems to be that gcc replaces memcmp with
a repz cmpsb instruction (plus housekeeping gubbins) and these string
instructions have a reputation for being slow.
The memcmp method is still four times faster than the loop (on this
data, on this machine, with the library, with... etc) provided gcc can
be told not to inline it to cmpsb instruction.
To summarise:
for loop memcmp
gcc unoptimised 13.333s 0.932s
gcc optimising 4.160s 8.250s
I'd post the CPU, gcc version and so on but I think the real messages is
*measure*. You can still be surprised after <mumble> years in this
field.
--
Ben.
"Ben Bacarisse" <ben.u...@bsb.me.uk>
--
Rufus
--- Posted via news://freenews.netfront.net/ - Complaints to ne...@netfront.net ---
C allows implementations to have restrictions, called "alignment
requirements" on where in memory objects of a certain type can be
stored. While there is no portable C meaning to the concept of an "odd
address", it is in fact usually the case that associated with every
address is a number, and on some platforms the addresses associated with
odd numbers could fail to meet the alignment requirements of the "large
integers" referred to below. Such restrictions are actually commonplace.
> "Ben Bacarisse" <ben.u...@bsb.me.uk>
>> Ian Collins <ian-...@hotmail.com> writes:
>>
>>> On 10/11/11 10:11 PM, Ben Bacarisse wrote:
>>>> Ian Collins<ian-...@hotmail.com> writes:
>>>>
>>>>> On 10/11/11 05:35 PM, Mark Adler wrote:
>>>>>> On 2011-10-10 14:38:59 -0700, Mark said:
>>>>>>> Is there another way to compare the input buffer for being equal to
>>>>>>> some pattern (all zeros in this case), i.e. ti avoid having
>>>>>>> additional
>>>>>>> array zerobuf?
>>>>>>
>>>>>> I'm surprised that no one offered the obvious answer that uses
>>>>>> memcmp()
>>>>>> without another array:
>>>>>>
>>>>>> if (MAX_LEN> 0&& buf[0] == 0&& memcmp(buf, buf + 1, MAX_LEN - 1)
>>>>>> == 0)
>>>>>> puts("buf is all zeros");
>>>>>> else
>>>>>> puts("buf has at least one non-zero value");
>>>>>>
>>>>>> Though it's probably faster to compare each entry to zero, or even
>>>>>> faster to recast to an array of large integers and compare those to
>>>>>> zero.
If the "large integers" he's referring to have alignment requirements,
there's no guarantee that buf satisfies those requirements. If it does
not, the "cast" he's referring to in "recast" will have undefined behavior.
--
James Kuyper
Top posting messes up the discussion. I've moved your remark to where I
would have put it and I think you'll see that it makes things
simpler to follow.
> "Ben Bacarisse" <ben.u...@bsb.me.uk>
>> Ian Collins <ian-...@hotmail.com> writes:
>>
>>> On 10/11/11 10:11 PM, Ben Bacarisse wrote:
>>>> Ian Collins<ian-...@hotmail.com> writes:
>>>>
>>>>> On 10/11/11 05:35 PM, Mark Adler wrote:
>>>>>> On 2011-10-10 14:38:59 -0700, Mark said:
<snip>
>>>>>> if (MAX_LEN> 0&& buf[0] == 0&& memcmp(buf, buf + 1, MAX_LEN - 1)
>>>>>> == 0)
>>>>>> puts("buf is all zeros");
>>>>>> else
>>>>>> puts("buf has at least one non-zero value");
>>>>>>
>>>>>> Though it's probably faster to compare each entry to zero, or even
>>>>>> faster to recast to an array of large integers and compare those to
>>>>>> zero.
>>>>>
>>>>> Usual alignment caveats apply!
>>>>
>>>> Are there any in this case?
>>>
>>> What if the buffer is at an odd address?
>>
>> Yes, I missed the last sentence right up to the point I hit send! Then
>> I [saw] what was being proposed (and you were commenting on).
>
> How does it make sense that "the buffer is at odd address"?
Mark was suggesting testing for zero in bigger chunk by doing something
like this:
long long int *ip = (void *)buf;
for (size_t i; i < something && *ip == 0; i++);
but that does not work reliably on machines that need long long ints to
be aligned more strictly than char arrays. The classic example is when
buf is placed at an odd address rather than an even one.
[I've also cut out material that I don't think is needed and marked the
edits. These are things that keep Usenet posts readable in their own.]
--
Ben.
And of course, there's an error in there as an exercise for the reader
to find. Anyway, it illustrates the approach.
Mark
It would seem that your compiler is not replacing memcmp when
optimising. Taking this into account, the two sets of results agree
pretty well -- there's a nearly constant factor between your results and
mine.
I could leave it at that and pretend that my machine is old and ten
times slower than yours, but I'll come clean -- to get accurate times I
execute the test 10 times and I did not divide by the 10! I usually
don't bother to divide because I normally only care about ratios, not
actual MB/s.
> i think this can be ok, since it would be reasonable that call to ext.
> library lasts the same amount of time, regardless of release/debug
> builds and hand-made code to be faster, when optimized.
>
> source code for this is:
> http://pastebin.com/sP9hexnW#
> (note: this code is only a PoC - it does not check all errors)
It's C++. Probably makes no difference. I still get slower times when
optimising memcmp:
for loop memcmp
g++ 1.4 0.10
g++ -O2 0.43 0.83
> can you check your code as well, please?
I think the code is alright -- the units were wrong.
http://pastebin.com/cGZxpSia
> any way the conclusion is 100% right. it is always worth checking, even
> if you think you know the results. hardware is changing and it may
> affect results a lot.
--
Ben.
int is_all_bytes_set(void const * data, int value, unsigned int size)
{
memset(&value, value, sizeof(value));
if (size > sizeof(value))
{
char const * ptr = (char const *)data;
return !(memcmp(ptr, &value, sizeof(value)) ||
memcmp(ptr, ptr + sizeof(value), size - sizeof(value)));
}
else
{
return !memcmp(&value, data, size);
}
}
ok - this explains a lot. :)
> It's C++. Probably makes no difference. I still get slower times when
> optimising memcmp:
>
> for loop memcmp
> g++ 1.4 0.10
> g++ -O2 0.43 0.83
>
>> can you check your code as well, please?
>
> I think the code is alright -- the units were wrong.
> http://pastebin.com/cGZxpSia
after compiling & comparing execution times of both codes i still
couldn't understand the results. why, in (logically) identical programs
for()-version gives similar times (buf[i] vs. buf++ was the only
difference), while memcmp() differs about 6x! :/
after doing some digging and exercising i found what looks like a gcc
performance problem.
http://pastebin.com/nMcD7f6z#
when compiling with "inline", time is far different then when no inline
is present. it does not affect debug version though. in short:
inline non-inline
-g3 ~0.8[s] ~0.8[s]
-O3 ~0.14[s] ~0.8[s]
i got this results for gcc-4.6. gcc-4.5 compiled code runs always
(deb/rel) in ~0.14[s] (i.e. fine). you probably observe similar effect
on your machine.
I thought I explained that when I posted the first times. I certainly
had an explanation that for what was happening on my machine with my
version of gcc.
> after doing some digging and exercising i found what looks like a gcc
> performance problem.
> http://pastebin.com/nMcD7f6z#
> when compiling with "inline", time is far different then when no inline
> is present. it does not affect debug version though. in short:
>
> inline non-inline
> -g3 ~0.8[s] ~0.8[s]
> -O3 ~0.14[s] ~0.8[s]
>
> i got this results for gcc-4.6. gcc-4.5 compiled code runs always
> (deb/rel) in ~0.14[s] (i.e. fine). you probably observe similar effect
> on your machine.
No, I don't, but I don't think it matters. For one thing, you are using
C++ and I am using C. We might have different versions of gcc and/or
the C library. Who knows? The key points, for me, are:
(a) Often, there is no useful notion of "faster" code. The same code
will perform differently in different environments or even with
different compiler flags.
(b) Even in one environment, what's fast will depend on the data. We'd
probably be having a different discussion if you'd chosen a 30 byte
buffer.
(c) It's useful to have a feel for the relative costs of certain basic
operations, but clarity of expression will often trump speed, unless
measurement shows that readability must be sacrificed to get a program
performing as required.[1]
(d) In most cases, it's algorithmic changes that will get you really
valuable performance gains.
[1] I've just spent some time writing a slow Reed-Solomon decoder. All
the existing code I looked at was super fast but, for me, almost
unintelligible. That was probably the right thing for those authors to
do, since there are many situations where fast decoding is essential.
But I wanted to be able to see what the code was really doing, so I have
de-optimised like mad. The result is probably 10 (maybe more!) times
slower than the fastest software decoders, but I now largely understand
it, and it is fast enough because the program spends a tiny faction of
its run time decoding a small amount of data
--
Ben.
Check the results that you are computing. If they are not what you
expected, add at least 72 hours (estimated clock time to fix the
code, especially if discovered at the beginning of a weekend) to
the execution time. Remember, if you don't need a correct answer,
any calculation can be done in zero time. And actually using the
results of the calculation should prevent gcc from deleting major
parts of the calculation.
A quick way of doing this that shouldn't disturb the timing much
is to exit(EXIT_SUCCESS) if the result matched what was expected
and exit(EXIT_FAILURE) if it didn't. (It's OK if the correct answer
is a range for floating-point calculations).
Note that if you measure by using the entire runtime of an executable,
there is a good chance that loading the program (including disk
i/o) and, if applicable, running the dynamic linker (including more
disk i/o of libraries), dominates anything your program is doing.
Another alternative for timing is to take the value of clock()
before and after executing the computation you are trying to time,
and after the timing interval is over, print the results and the
timing. Be sure that you expect the execution time of the code to
be at least 100 times the timing resolution of clock() in order to
get a decent measurement. Also be sure that the time won't exceed
the point where clock_t overflows. (If clock_t is returned in
microseconds and is an unsigned 32-bit integer quantity, it overflows
in a bit over 71 minutes). If you loop through the calculation in
Order to make it take more time, average the results.
clock_t start;
clock_t stop;
start = clock();
result = ... code you're trying to time ...;
stop = clock();
Compute stop - start (allowing for wraparound) and convert
to more reasonable units, like seconds and fractions of
seconds.
Print result and timing.
To evaluate the results properly, you will need to know the
(system-dependent) timing resolution of clock(). Just because the
value returned is in, say, microseconds doesn't mean that the return
value of clock() doesn't increment by 16667 microseconds every
1/60th of a second, or even that the timing base isn't a sundial
calibrated in 15 minute intervals. There's a good chance then that
your times will (legitimately!) come out 0 most of the time. printf()
statements, etc. won't affect the timing measurement because at
this point it's already finished.
True, but that is not widely used yet. The gcc version seemed (last
time I looked) as an experimental feature.