The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).
I have developed this code, but I am not really satisfied with
it, specifically
o portability
o compatibility with C89
I assume that an unsigned long is at least 32 bits wide.
Thanks in advance for suggestions/comments.
static unsigned bitcount(uintmax_t x)
{
if (64 == sizeof(x)*CHAR_BIT) {
x -= (x>>1) & 0x5555555555555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x0101010101010101UL;
return x>>56;
}
else { // Assume that 32 bit unsigned long exists.
x -= (x>>1) & 0x55555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x33333333UL) + (x & 0x33333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x01010101UL;
return x>>24;
}
}
/* In C89 uintmax_t doesn't exist. I use: */
#ifndef NO_C99
#include <stdint.h>
#else
typedef unsigned long uintmax_t;
#endif
static uintmax_t PopulationCount(BitString *b)
{
uintmax_t *pumax,x,result=0;
size_t iterations;
int count;
unsigned char *p;
if (b == NULL || b->count == 0)
return 0;
/* Treat 32 or 64 bits at once */
pumax = (uintmax_t *)b->contents;
/* Since b->contents is the result of a malloc operation,
it must be aligned correctly for all types, including
uintmax_t
*/
iterations = b->count/(sizeof(x)*CHAR_BIT);
/* Position the pointer p to get the last few bytes
that are left once the uintmax_t pointer finishes.
*/
p = b->contents+(iterations*sizeof(x));
while (iterations > 0) {
x = *pumax++;
if (x != 0)
result += bitcount(x);
iterations--;
}
count = (b->count%(sizeof(x)*CHAR_BIT));
if (count) {
x=0;
/* count is NOT unsigned! This supposes that an
integer can hold CHAR_BIT-1 characters.
*/
while (count > 0) {
x = (x << CHAR_BIT) | *p++;
count -= CHAR_BIT;
}
result += bitcount(x);
}
return result;
}
---------------------------------------------------------------------------------
I don't have much to say about the algorithm at the time, but I do
have some minor comments.
> I assume that an unsigned long is at least 32 bits wide.
There's no need to state this assumption; unsigned long is guaranteed
to be at least 32 bits wide.
I think you're ignoring the possibility of padding bits. That's not a
bad assumption to make, but you might want to make it explicit.
[...]
> /* In C89 uintmax_t doesn't exist. I use: */
> #ifndef NO_C99
> #include <stdint.h>
> #else
> typedef unsigned long uintmax_t;
> #endif
How is NO_C99 defined?
In principle, you should be able to test the value of
__STDC_VERSION__, but there could be non-C99 compilers that provide
<stdint.h> and/or have an unsigned type bigger than unsigned long.
Perhaps you could configure it by generating an executing a small test
program, much like autoconf does.
> static uintmax_t PopulationCount(BitString *b)
> {
[...]
> /* count is NOT unsigned! This supposes that an
> integer can hold CHAR_BIT-1 characters.
> */
Why isn't count unsigned?
You should rephrase that second sentence. First, I think you mean
"int" rather than "integer". Second, an int almost certainly can't
hold CHAR_BIT-1 characters; you probably mean that it can hold values
up to CHAR_BIT-1.
You might consider a compile-time assertion of some sort, in addition
to describing your assumptions in comments.
[...]
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
This is a case where straightforward code will often be faster,
I think, than a bit-twiddling trick. Here's what I use:
/* Table for fast bit counting */
#define BC1(x) x, x+1, x+1, x+2
#define BC2(x) BC1(x), BC1(x+1), BC1(x+1), BC1(x+2)
#define BC3(x) BC2(x), BC2(x+1), BC2(x+1), BC2(x+2)
#define BC4(x) BC3(x), BC3(x+1), BC3(x+1), BC3(x+2)
#define BC5(x) BC4(x), BC4(x+1), BC4(x+1), BC4(x+2)
static char BitCnt[1 << 11] = {
BC5(0), BC5(1),
};
int ones32(Uint32 g)
{
return 0
+ BitCnt[g & 0x7ff]
+ BitCnt[g>>11 & 0x7ff]
+ BitCnt[g>>22 & 0x3ff];
}
This code assumes the argument to ones32 has 32 bits
exactly. Other cases left as exercise. :-)
James Dow Allen
This is in C99, I do not know if C89 made this assumption.
> I think you're ignoring the possibility of padding bits. That's not a
> bad assumption to make, but you might want to make it explicit.
>
> [...]
>
>> /* In C89 uintmax_t doesn't exist. I use: */
>> #ifndef NO_C99
>> #include <stdint.h>
>> #else
>> typedef unsigned long uintmax_t;
>> #endif
>
> How is NO_C99 defined?
>
By a configurable define in the header file.
> In principle, you should be able to test the value of
> __STDC_VERSION__, but there could be non-C99 compilers that provide
> <stdint.h> and/or have an unsigned type bigger than unsigned long.
>
> Perhaps you could configure it by generating an executing a small test
> program, much like autoconf does.
>
>> static uintmax_t PopulationCount(BitString *b)
>> {
> [...]
>> /* count is NOT unsigned! This supposes that an
>> integer can hold CHAR_BIT-1 characters.
>> */
>
> Why isn't count unsigned?
>
Because I test for count > 0 !!!
If you have a count of bits left of (say) 5, subtracting 8 will give
a negative number. I test for that. If it would be unsigned the test would
not work.
C89 int is at least 16 bits.
C89 long is at least 32 bits.
--
pete
It did.
And it's not an assumption, it's a requirement. If an implementation
has unsigned long narrower than 32 bits, that implementation does not
conform either to C89 or to C99.
The only minor quibble with that code is that if the bit string is a
representation of a signed quantity, maintaining the sign could be a
mistake - depending on the semantics of the situation.
Other than that, it seems quite OK. C89 required unsigned long
integers to be at least 32-bits wide, so you're in the clear there.
Portability though is a different issue, you're likely to encounter
weird issues there.