Population count of a bit string

jacob navia

unread,

Nov 22, 2009, 4:01:55 PM11/22/09

to

Hi

The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).

I have developed this code, but I am not really satisfied with
it, specifically

o portability
o compatibility with C89

I assume that an unsigned long is at least 32 bits wide.

Thanks in advance for suggestions/comments.

static unsigned bitcount(uintmax_t x)
{
if (64 == sizeof(x)*CHAR_BIT) {
x -= (x>>1) & 0x5555555555555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x0101010101010101UL;
return x>>56;
}
else { // Assume that 32 bit unsigned long exists.
x -= (x>>1) & 0x55555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x33333333UL) + (x & 0x33333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x01010101UL;
return x>>24;
}
}

/* In C89 uintmax_t doesn't exist. I use: */
#ifndef NO_C99
#include <stdint.h>
#else
typedef unsigned long uintmax_t;
#endif
static uintmax_t PopulationCount(BitString *b)
{
uintmax_t *pumax,x,result=0;
size_t iterations;
int count;
unsigned char *p;

if (b == NULL || b->count == 0)
return 0;
/* Treat 32 or 64 bits at once */
pumax = (uintmax_t *)b->contents;
/* Since b->contents is the result of a malloc operation,
it must be aligned correctly for all types, including
uintmax_t
*/
iterations = b->count/(sizeof(x)*CHAR_BIT);
/* Position the pointer p to get the last few bytes
that are left once the uintmax_t pointer finishes.
*/
p = b->contents+(iterations*sizeof(x));
while (iterations > 0) {
x = *pumax++;
if (x != 0)
result += bitcount(x);
iterations--;
}
count = (b->count%(sizeof(x)*CHAR_BIT));
if (count) {
x=0;
/* count is NOT unsigned! This supposes that an
integer can hold CHAR_BIT-1 characters.
*/
while (count > 0) {
x = (x << CHAR_BIT) | *p++;
count -= CHAR_BIT;
}
result += bitcount(x);
}
return result;
}

---------------------------------------------------------------------------------

Keith Thompson

unread,

Nov 22, 2009, 8:33:58 PM11/22/09

to

jacob navia <ja...@nospam.org> writes:
> The "population count" in a bit string should be the number of
> bits set (1) in it. It is identical to the "accumulate" function
> (The sum of all bits).
>
> I have developed this code, but I am not really satisfied with
> it, specifically
>
> o portability
> o compatibility with C89

I don't have much to say about the algorithm at the time, but I do
have some minor comments.

> I assume that an unsigned long is at least 32 bits wide.

There's no need to state this assumption; unsigned long is guaranteed
to be at least 32 bits wide.

I think you're ignoring the possibility of padding bits. That's not a
bad assumption to make, but you might want to make it explicit.

[...]

> /* In C89 uintmax_t doesn't exist. I use: */
> #ifndef NO_C99
> #include <stdint.h>
> #else
> typedef unsigned long uintmax_t;
> #endif

How is NO_C99 defined?

In principle, you should be able to test the value of
__STDC_VERSION__, but there could be non-C99 compilers that provide
<stdint.h> and/or have an unsigned type bigger than unsigned long.

Perhaps you could configure it by generating an executing a small test
program, much like autoconf does.

> static uintmax_t PopulationCount(BitString *b)
> {
[...]

> /* count is NOT unsigned! This supposes that an
> integer can hold CHAR_BIT-1 characters.
> */

Why isn't count unsigned?

You should rephrase that second sentence. First, I think you mean
"int" rather than "integer". Second, an int almost certainly can't
hold CHAR_BIT-1 characters; you probably mean that it can hold values
up to CHAR_BIT-1.

You might consider a compile-time assertion of some sort, in addition
to describing your assumptions in comments.

[...]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Dow Allen

unread,

Nov 23, 2009, 2:49:10 AM11/23/09

to

On Nov 23, 4:01 am, jacob navia <ja...@nospam.org> wrote:
> The "population count" in a bit string should be the number of
> bits set (1) in it. It is identical to the "accumulate" function
> (The sum of all bits).

This is a case where straightforward code will often be faster,
I think, than a bit-twiddling trick. Here's what I use:

/* Table for fast bit counting */
#define BC1(x) x, x+1, x+1, x+2
#define BC2(x) BC1(x), BC1(x+1), BC1(x+1), BC1(x+2)
#define BC3(x) BC2(x), BC2(x+1), BC2(x+1), BC2(x+2)
#define BC4(x) BC3(x), BC3(x+1), BC3(x+1), BC3(x+2)
#define BC5(x) BC4(x), BC4(x+1), BC4(x+1), BC4(x+2)
static char BitCnt[1 << 11] = {
BC5(0), BC5(1),
};

int ones32(Uint32 g)
{
return 0
+ BitCnt[g & 0x7ff]
+ BitCnt[g>>11 & 0x7ff]
+ BitCnt[g>>22 & 0x3ff];
}

This code assumes the argument to ones32 has 32 bits
exactly. Other cases left as exercise. :-)

James Dow Allen

jacob navia

unread,

Nov 23, 2009, 4:44:09 AM11/23/09

to

Keith Thompson a �crit :

> jacob navia <ja...@nospam.org> writes:
>> The "population count" in a bit string should be the number of
>> bits set (1) in it. It is identical to the "accumulate" function
>> (The sum of all bits).
>>
>> I have developed this code, but I am not really satisfied with
>> it, specifically
>>
>> o portability
>> o compatibility with C89
>
> I don't have much to say about the algorithm at the time, but I do
> have some minor comments.
>
>> I assume that an unsigned long is at least 32 bits wide.
>
> There's no need to state this assumption; unsigned long is guaranteed
> to be at least 32 bits wide.
>

This is in C99, I do not know if C89 made this assumption.

> I think you're ignoring the possibility of padding bits. That's not a
> bad assumption to make, but you might want to make it explicit.
>

> [...]
>
>> /* In C89 uintmax_t doesn't exist. I use: */
>> #ifndef NO_C99
>> #include <stdint.h>
>> #else
>> typedef unsigned long uintmax_t;
>> #endif
>
> How is NO_C99 defined?
>

By a configurable define in the header file.

> In principle, you should be able to test the value of
> __STDC_VERSION__, but there could be non-C99 compilers that provide
> <stdint.h> and/or have an unsigned type bigger than unsigned long.
>
> Perhaps you could configure it by generating an executing a small test
> program, much like autoconf does.
>
>> static uintmax_t PopulationCount(BitString *b)
>> {
> [...]
>> /* count is NOT unsigned! This supposes that an
>> integer can hold CHAR_BIT-1 characters.
>> */
>
> Why isn't count unsigned?
>

Because I test for count > 0 !!!
If you have a count of bits left of (say) 5, subtracting 8 will give
a negative number. I test for that. If it would be unsigned the test would
not work.

pete

unread,

Nov 23, 2009, 7:01:56 AM11/23/09

to

jacob navia wrote:
> Keith Thompson a �crit :
>
>> jacob navia <ja...@nospam.org> writes:
>>
>>> The "population count" in a bit string should be the number of
>>> bits set (1) in it. It is identical to the "accumulate" function
>>> (The sum of all bits).
>>>
>>> I have developed this code, but I am not really satisfied with
>>> it, specifically
>>>
>>> o portability
>>> o compatibility with C89
>>
>>
>> I don't have much to say about the algorithm at the time, but I do
>> have some minor comments.
>>
>>> I assume that an unsigned long is at least 32 bits wide.
>>
>>
>> There's no need to state this assumption; unsigned long is guaranteed
>> to be at least 32 bits wide.
>>
>
> This is in C99, I do not know if C89 made this assumption.

C89 int is at least 16 bits.
C89 long is at least 32 bits.

--
pete

Keith Thompson

unread,

Nov 23, 2009, 11:27:38 AM11/23/09

to

jacob navia <ja...@nospam.org> writes:
> Keith Thompson a écrit :
>> jacob navia <ja...@nospam.org> writes:
[...]

>>> I assume that an unsigned long is at least 32 bits wide.
>>
>> There's no need to state this assumption; unsigned long is guaranteed
>> to be at least 32 bits wide.
>
> This is in C99, I do not know if C89 made this assumption.

It did.

Keith Thompson

unread,

Nov 23, 2009, 11:43:26 AM11/23/09

to

Keith Thompson <ks...@mib.org> writes:
> jacob navia <ja...@nospam.org> writes:
>> Keith Thompson a écrit :
>>> jacob navia <ja...@nospam.org> writes:
> [...]
>>>> I assume that an unsigned long is at least 32 bits wide.
>>>
>>> There's no need to state this assumption; unsigned long is guaranteed
>>> to be at least 32 bits wide.
>>
>> This is in C99, I do not know if C89 made this assumption.
>
> It did.

And it's not an assumption, it's a requirement. If an implementation
has unsigned long narrower than 32 bits, that implementation does not
conform either to C89 or to C99.

Michael Foukarakis

unread,

Nov 24, 2009, 7:52:16 AM11/24/09

to

On Nov 22, 11:01 pm, jacob navia <ja...@nospam.org> wrote:
> Hi
>
> The "population count" in a bit string should be the number of
> bits set (1) in it. It is identical to the "accumulate" function
> (The sum of all bits).
>
> I have developed this code, but I am not really satisfied with
> it, specifically
>
> o portability
> o compatibility with C89
>
> I assume that an unsigned long is at least 32 bits wide.
>
> Thanks in advance for suggestions/comments.
>
> static unsigned bitcount(uintmax_t x)
> {
> if (64 == sizeof(x)*CHAR_BIT) {
> x -= (x>>1) & 0x5555555555555555UL; // 0-2 in 2 bits
> x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL); // 0-4 in 4 bits
> x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL; // 0-8 in 8 bits
> x *= 0x0101010101010101UL;
> return x>>56;
> }
> else { // Assume that 32 bit unsigned long exists.
> x -= (x>>1) & 0x55555555UL; // 0-2 in 2 bits
> x = ((x>>2) & 0x33333333UL) + (x & 0x33333333UL); // 0-4 in 4 bits
> x = ((x>>4) + x) & 0x0f0f0f0fUL; // 0-8 in 8 bits
> x *= 0x01010101UL;
> return x>>24;
> }
>
> }

The only minor quibble with that code is that if the bit string is a
representation of a signed quantity, maintaining the sign could be a
mistake - depending on the semantics of the situation.

Other than that, it seems quite OK. C89 required unsigned long
integers to be at least 32-bits wide, so you're in the clear there.
Portability though is a different issue, you're likely to encounter
weird issues there.