nr of bits set to "1" in a byte

aku

unread,

Dec 19, 2003, 4:24:49 PM12/19/03

to

I'm looking for the absolute fastest way
to count the nr of bits that are set to "1"
in a string. Presumably I then first need the
fastest way to do this in a byte.

I think this is it, but welcome any improvements:

i = 0;
if (g && 1) i++;
if (g && 2) i++;
if (g && 3) i++;
if (g && 4) i++;
if (g && 5) i++;
if (g && 6) i++;
if (g && 7) i++;
if (g && 8) i++;

Ben Pfaff

unread,

Dec 19, 2003, 4:29:24 PM12/19/03

to

aku <a...@europe.com> writes:

Did you try it? Not only is that probably not the fastest way,
it doesn't work.
--
Just another C hacker.

Bertrand Mollinier Toublet

unread,

Dec 19, 2003, 4:57:35 PM12/19/03

to

Besides Ben's accurate remark that your code will not do what you
expect, table lookup is generally considered the "absolute fastest way"
to do something. What you gain in speed, though, you lose in data size,
which is a classical tradeof. At the scale of a byte, this is still a
manageable table:

int bitsPerByte[] =
{
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
/* etc. I am getting lazy */
}

For any given unsigned char g, you get the number of bits set to 1 in g
with bitsPerByte[g];

--
Bertrand Mollinier Toublet
- Le top-posting.
- Quelle est la pratique la plus chiante sur Usenet ?

Alex

unread,

Dec 19, 2003, 5:04:22 PM12/19/03

to

Did you actually try this? It is not even close.

Even if you corrected the obvious mistake and changed
logical AND to bitwise AND (&), it is still incorrect.

Below is a 4 bit unsigned integer and the the values
that it would have if the corresponding bit was set.

[ 0 ][ 0 ][ 0 ][ 0 ]
8 4 2 1

Now, try again, preferably by using CHAR_BIT and a loop.
When you get it right, feel free to optimize.

Alex

Stephen Mayes

unread,

Dec 19, 2003, 5:22:51 PM12/19/03

to

"Alex" <alex...@hotmail.com> wrote in message
news:brvsl5$8202l$1...@ID-190529.news.uni-berlin.de...

> Now, try again, preferably by using CHAR_BIT and a loop.
> When you get it right, feel free to optimize.
>
> Alex

Can I try, please?

I've been checking this group now for two weeks and have been very much
impressed with how much I don't know about this language which I thought I
knew.
I would appreciate any critique of this code. It might not be fast, but is
it correct and will it work?

#include <stdio.h>
#include <stdlib.h> // is this required for below code ??
#include <limits.h>

int count_ones( char c )
{
int i = 0;
int count = 0;
unsigned char t = c;
unsigned char bit = 0x01; // what if CHAR_BIT != 8 ??

for( i = 0; i < CHAR_BIT; i++, bit <<= 1 ) {
if ( t & bit )
count++;
}
return count;
}

int main( void )
{
char c = (char)0; // whatever
int count = 0;

count = count_ones( c );
printf( "There are %d bits set to 1.\n", count );
return 0;
}

Richard Heathfield

unread,

Dec 19, 2003, 5:36:39 PM12/19/03

to

aku wrote:

> I'm looking for the absolute fastest way
> to count the nr of bits that are set to "1"
> in a string. Presumably I then first need the
> fastest way to do this in a byte.
>
> I think this is it, but welcome any improvements:
>
> i = 0;
> if (g && 1) i++;

You mean &

> if (g && 2) i++;
> if (g && 3) i++;

You mean g & 4

> if (g && 4) i++;

You mean g & 8

> if (g && 5) i++;

You mean g & 16

etc

A lookup table would probably be faster, though.

--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Alex

unread,

Dec 19, 2003, 5:51:12 PM12/19/03

to

Stephen Mayes <ste...@themayeshouse.us> wrote:

> "Alex" <alex...@hotmail.com> wrote in message
> news:brvsl5$8202l$1...@ID-190529.news.uni-berlin.de...

>> Now, try again, preferably by using CHAR_BIT and a loop.
>> When you get it right, feel free to optimize.

> I would appreciate any critique of this code. It might not be fast, but is

> it correct and will it work?

Looks pretty good to me. My comments below are minor.

> #include <stdio.h>
> #include <stdlib.h> // is this required for below code ??

No, you don't need it. Also, unless you have a C99 compiler,
the double-slash comments are syntax errors. When posting in
news groups, they may wrap incorrectly, hence try not to use
them here.

> #include <limits.h>

> int count_ones( char c )

Why not have this function take an unsigned char in the
first place, then lose the 't'.

> {
> int i = 0;
> int count = 0;
> unsigned char t = c;
> unsigned char bit = 0x01; // what if CHAR_BIT != 8 ??

It's not a problem here. You are effectively doing

unsigned char bit = 1;

> for( i = 0; i < CHAR_BIT; i++, bit <<= 1 ) {
> if ( t & bit )
> count++;

As far as I can see, there is nothing wrong with this. However,
stylistically I'd prefer to see the bit shift done inside the
body of the loop.

> }
> return count;
> }

> int main( void )
> {
> char c = (char)0; // whatever

You don't need the cast. 0 is also not the most exciting
number to test this with :-)

> int count = 0;

> count = count_ones( c );
> printf( "There are %d bits set to 1.\n", count );
> return 0;
> }

Alex

Andrey Tarasevich

unread,

Dec 19, 2003, 7:38:38 PM12/19/03

to

For a 8-bit byte the look-up table is one way to go. Another solution
would be to do something along the lines of the following

g = (g & 0x55u) + ((g >> 1) & 0x55u);
g = (g & 0x33u) + ((g >> 2) & 0x33u);
g = (g & 0x0fu) + ((g >> 4) & 0x0fu);
/* now g contains the total number of 1 bits in the
original value of g */

--
Best regards,
Andrey Tarasevich

Sidney Cadot

unread,

Dec 19, 2003, 9:27:15 PM12/19/03

to

That's not correct, as others pointed out. What you're writing here is
equivalent to:

i = g ? 8 : 0;

(...how I yearn for a triple '&&&' operator...)

The lookup table, as suggested by others, is as fast as it gets on many
architectures, though it will bloat your code a bit if written
explicitly (and might lead to unexpected performance loss under some
circumstances if the LUT thrashes the processor cache, although this
should be rare).

One obvious solution that wasn't mentioned so far:

unsigned bits(const unsigned n)
{
return n ? n&1 + bits(n>>1) : 0;
}

Something like this would be useable for initializing your lookup table
at program start.

Best regards,

Sidney

Jack Klein

unread,

Dec 20, 2003, 12:05:44 AM12/20/03

to

On Fri, 19 Dec 2003 13:57:35 -0800, Bertrand Mollinier Toublet
<bertrand.mollin...@enst-bretagne.fr> wrote in
comp.lang.c:

> aku wrote:
> > I'm looking for the absolute fastest way
> > to count the nr of bits that are set to "1"
> > in a string. Presumably I then first need the
> > fastest way to do this in a byte.
> >
> > I think this is it, but welcome any improvements:
> >
> > i = 0;
> > if (g && 1) i++;
> > if (g && 2) i++;
> > if (g && 3) i++;
> > if (g && 4) i++;
> > if (g && 5) i++;
> > if (g && 6) i++;
> > if (g && 7) i++;
> > if (g && 8) i++;
> >
> Besides Ben's accurate remark that your code will not do what you
> expect, table lookup is generally considered the "absolute fastest way"
> to do something. What you gain in speed, though, you lose in data size,
> which is a classical tradeof. At the scale of a byte, this is still a
> manageable table:

You mean at the scale of a byte in the implementations you are
familiar with. On some platforms the table would need to have 65,536
entries. On some platforms the table would need to have 16,777,216
entries. On some platforms the table would need to have 4,294,967,296
entries. I don't know of any platform off-hand where an even larger
table would be needed, but I wouldn't be too surprised if there were
some.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Richard Heathfield

unread,

Dec 20, 2003, 2:01:35 AM12/20/03

to

Stephen Mayes wrote:

<snip>

> I would appreciate any critique of this code. It might not be fast, but
> is it correct and will it work?
>
> #include <stdio.h>
> #include <stdlib.h> // is this required for below code ??

No, I don't see any need for it. By the way, unless you have a C99 compiler,
the above line contains a syntax error (BCPL comments were not introduced
into C until C99 arrived).

> #include <limits.h>
>
> int count_ones( char c )
> {
> int i = 0;
> int count = 0;
> unsigned char t = c;
> unsigned char bit = 0x01; // what if CHAR_BIT != 8 ??

This is fine (apart from the "comment" again), but unsigned char bit = 1;
works just as well. :-)

>
> for( i = 0; i < CHAR_BIT; i++, bit <<= 1 ) {
> if ( t & bit )
> count++;
> }
> return count;

I find this unnecessarily complicated (although it should work okay). The
following loop is simpler, and thus easier to maintain:

for(i = 0; i < CHAR_BIT; i++)
{
if(t & bit)
{
count++;
}
bit <<= 1;
}

Keep things simple.

> }
>
> int main( void )
> {
> char c = (char)0; // whatever

No need for the cast. char c = 0; is perfectly legal code and does exactly
what you want.

Yes, as far as I can see, the code should work fine.

Robert Stankowic

unread,

Dec 20, 2003, 2:45:17 AM12/20/03

to

"Jack Klein" <jack...@spamcop.net> schrieb im Newsbeitrag
news:mvl7uvon42n50lftm...@4ax.com...

> On Fri, 19 Dec 2003 13:57:35 -0800, Bertrand Mollinier Toublet
> <bertrand.mollin...@enst-bretagne.fr> wrote in
> comp.lang.c:
>
> > aku wrote:

[....]

> > Besides Ben's accurate remark that your code will not do what you
> > expect, table lookup is generally considered the "absolute fastest way"
> > to do something. What you gain in speed, though, you lose in data size,
> > which is a classical tradeof. At the scale of a byte, this is still a
> > manageable table:
>
> You mean at the scale of a byte in the implementations you are
> familiar with. On some platforms the table would need to have 65,536
> entries. On some platforms the table would need to have 16,777,216
> entries. On some platforms the table would need to have 4,294,967,296
> entries. I don't know of any platform off-hand where an even larger
> table would be needed, but I wouldn't be too surprised if there were
> some.
>

What about

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
int tbl[] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
int count = 0;
/*"faked" example for CHAR_BIT == 32
(int is 32 bit in my implementation)*/
unsigned int to_test = 0x22098157;
printf("%d", to_test);

/*would be implementation defined for signed negative*/
for(; to_test; to_test >>= 4)
{
count += tbl[to_test &0x0f];
}
printf(" has %d bits set\n", count);
return EXIT_SUCCESS;
}

cheers
Robert

Paul Hsieh

unread,

Dec 20, 2003, 4:29:50 AM12/20/03

to

Andrey Tarasevich <andreyta...@hotmail.com> wrote:
> aku wrote:
> > I'm looking for the absolute fastest way
> > to count the nr of bits that are set to "1"
> > in a string. Presumably I then first need the
> > fastest way to do this in a byte.
>

> For a 8-bit byte the look-up table is one way to go. Another solution
> would be to do something along the lines of the following
>
> g = (g & 0x55u) + ((g >> 1) & 0x55u);
> g = (g & 0x33u) + ((g >> 2) & 0x33u);
> g = (g & 0x0fu) + ((g >> 4) & 0x0fu);
> /* now g contains the total number of 1 bits in the
> original value of g */

Oooooo! Someone who actually knows something. I take it you were
*NOT* part of the C standard language committee. In any event you are
missing the finisher:

/* For 32 bits */
g *= 0x001010101;
return (g >> 24) & 0x3f;

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

pete

unread,

Dec 20, 2003, 4:56:04 AM12/20/03

to

Sidney Cadot wrote:

> One obvious solution that wasn't mentioned so far:
>
> unsigned bits(const unsigned n)
> {
> return n ? n&1 + bits(n>>1) : 0;
> }
>

unsigned bit_count(unsigned n)
{
unsigned count;

for (count = 0; n; n &= n - 1) {
++count;
}
return count;
}

--
pete

Jarno A Wuolijoki

unread,

Dec 20, 2003, 7:22:47 AM12/20/03

to

On Sat, 20 Dec 2003, Sidney Cadot wrote:

B> One obvious solution that wasn't mentioned so far:

>
> unsigned bits(const unsigned n)
> {
> return n ? n&1 + bits(n>>1) : 0;
> }

I'm not a fan of excessive parenthesizing either, but I would still add
a couple of them there..

Sidney Cadot

unread,

Dec 20, 2003, 8:01:33 AM12/20/03

to

pete wrote:

> unsigned bit_count(unsigned n)
> {
> unsigned count;
>
> for (count = 0; n; n &= n - 1) {
> ++count;
> }
> return count;
> }

This doesn't work. If you want to get rid of the recursion, this will do:

unsigned bit_count(unsigned n)
{
unsigned count;

for (count = 0; n; n>>=1) {
count+=n&1;
}
return count;
}

Best regards,

Sidney

pete

unread,

Dec 20, 2003, 9:10:59 AM12/20/03

to

Sidney Cadot wrote:
>
> pete wrote:
>
> > unsigned bit_count(unsigned n)
> > {
> > unsigned count;
> >
> > for (count = 0; n; n &= n - 1) {
> > ++count;
> > }
> > return count;
> > }
>
> This doesn't work.

You're wrong.

> If you want to get rid of the recursion, this will do:
>
> unsigned bit_count(unsigned n)
> {
> unsigned count;
>
> for (count = 0; n; n>>=1) {
> count+=n&1;
> }
> return count;
> }

Try harder.

/* BEGIN new.c */

#include <stdio.h>

#define LIMIT 256

unsigned bit_count(unsigned n)
{
unsigned count;

for (count = 0; n; n &= n - 1) {
++count;
}
return count;
}

unsigned bit_count2(unsigned n)
{
unsigned count;

for (count = 0; n; n>>=1) {
count+=n&1;
}
return count;
}

int main(void)
{
unsigned number;

for (number = 0; LIMIT != number; ++number) {
if (bit_count(number) != bit_count2(number)) {
break;
}
}
if (number == LIMIT) {
puts("Try harder.");
}
return 0;
}

/* END new.c */

nrk

unread,

Dec 20, 2003, 9:27:34 AM12/20/03

to

Sidney Cadot wrote:

> pete wrote:
>
>> unsigned bit_count(unsigned n)
>> {
>> unsigned count;
>>
>> for (count = 0; n; n &= n - 1) {
>> ++count;
>> }
>> return count;
>> }
>
> This doesn't work. If you want to get rid of the recursion, this will do:

Wrong! It definitely works. This is the well-known "sparse-ones" log2
algorithm (the sparse-zeroes simply negates the input and counts the 0's
instead). Its running time is proportional to the number of set bits in
the input. Credited originally to DMR, I think. I've been told by a
knowledgeable ex-colleague that on machines with an FPU, frexp is faster
than any bit counting methods to obtain log2. Don't know if that's true or
not.

Also, I find no hint of recursion in the above code.

-nrk.

Sidney Cadot

unread,

Dec 20, 2003, 11:54:23 AM12/20/03

to

pete wrote:

> Sidney Cadot wrote:
>
>>pete wrote:
>>
>>
>>>unsigned bit_count(unsigned n)
>>>{
>>> unsigned count;
>>>
>>> for (count = 0; n; n &= n - 1) {
>>> ++count;
>>> }
>>> return count;
>>>}
>>
>>This doesn't work.
>
> You're wrong.

You're right. /my/ implementation was wrong because of some lacking
parentheses. Sorry.

Your solution is quite a clever thing; took me a couple of seconds to
see how it succeeds at knocking out a single bit with n even.

>>If you want to get rid of the recursion, this will do:
>>
>>unsigned bit_count(unsigned n)
>>{
>> unsigned count;
>>
>> for (count = 0; n; n>>=1) {
>> count+=n&1;
>> }
>> return count;
>>}
>
>
> Try harder.

How's that? I don't see anything wrong with it.

> if (number == LIMIT) {
> puts("Try harder.");
> }

I think you meant: number != LIMIT.

Best regards, Sidney

Sidney Cadot

unread,

Dec 20, 2003, 11:57:17 AM12/20/03

to

nrk wrote:

>>This doesn't work. If you want to get rid of the recursion, this will do:
>
>
> Wrong! It definitely works. This is the well-known "sparse-ones" log2
> algorithm (the sparse-zeroes simply negates the input and counts the 0's
> instead). Its running time is proportional to the number of set bits in
> the input. Credited originally to DMR, I think.

I botched it.... It's a clever little algorithm, this.

> I've been told by a
> knowledgeable ex-colleague that on machines with an FPU, frexp is faster
> than any bit counting methods to obtain log2. Don't know if that's true or
> not.
>
> Also, I find no hint of recursion in the above code.

I thought Pete was trying to get rid of the recursion in my algorithm.

Best regards, Sidney

Sidney Cadot

unread,

Dec 20, 2003, 11:59:17 AM12/20/03

to

Jarno A Wuolijoki wrote:

You're right, it should be

unsigned bits(const unsigned n)
{
return n ? (n&1) + bits(n>>1) : 0;
}

That should teach me to excrete untested code. Mea maxima culpa.

Best regards,

Sidney

Bertrand Mollinier Toublet

unread,

Dec 20, 2003, 12:00:07 PM12/20/03

to

That's right. My apologies for the confusion. I am spending much more
time these days programming in a language where bytes have exactly eight
bits than in C.

--
Bertrand Mollinier Toublet
"Breathtaking C++, Amazing C99, Fabulous C90."
-- Greg Comeau

Richard Heathfield

unread,

Dec 20, 2003, 1:31:48 PM12/20/03

to

Sidney Cadot wrote:

This English idiom, rather like "head over tails", is head over tails, (or
rather, it's tails over heads). The meaning you intend to convey (and yes,
you've said it correctly; this isn't an English lesson!) is that it ought
to teach you /not/ to publish (is that a fair substitute? <g>) untested
code.

In fact, though, it will do no such thing. At least, the lesson didn't work
for the rest of us, so I don't see why it should work for you! :-)

Chris Torek

unread,

Dec 20, 2003, 1:52:10 PM12/20/03

to

In article <vu6r514...@corp.supernews.com> aku <a...@europe.com> wrote:
>I'm looking for the absolute fastest way

>to count the nr of bits that are set to "1" ...

<http://groups.google.com/groups?selm=602%40shrike.AUSTIN.LOCKHEED.COM>

Someone reformatted at least some of the code in that, so here is
bct.c (nonportabilities and all).

#ifndef lint
static char rcsid[] = "$Id: bct.c,v 1.5 90/10/13 08:44:12 chris Exp $";
#endif

/*
* bct - bitcount timing
*
* Runs a bunch of different functions-to-count-bits-in-a-longword
* (many assume 32 bits per long) with a timer around each loop, and
* tries to calcuate the time used.
*/
#include <stdio.h>
#include <sys/types.h>

#ifdef FD_SETSIZE
# define USE_GETRUSAGE
# include <sys/time.h>
# include <sys/resource.h>
#else
# include <sys/times.h>
#endif

#define SIZEOF(a) (sizeof(a)/sizeof(a[0]))

/*
* This function is used to calibrate the timing mechanism.
* This way we can subtract the loop and call overheads.
*/
int
calibrate(n)
register unsigned long n;
{
return 0;
}

/*
* This function counts the number of bits in a long.
* It is limited to 63 bit longs, but a minor mod can cope with 511 bits.
*
* It is so magic, an explanation is required:
* Consider a 3 bit number as being
* 4a+2b+c
* if we shift it right 1 bit, we have
* 2a+b
* subtracting this from the original gives
* 2a+b+c
* if we shift the original 2 bits right we get
* a
* and so with another subtraction we have
* a+b+c
* which is the number of bits in the original number.
* Suitable masking allows the sums of the octal digits in a
* 32 bit number to appear in each octal digit. This isn't much help
* unless we can get all of them summed together.
* This can be done by modulo arithmetic (sum the digits in a number by molulo
* the base of the number minus one) the old "casting out nines" trick they
* taught in school before calculators were invented.
* Now, using mod 7 wont help us, because our number will very likely have
* more than 7 bits set. So add the octal digits together to get base64
* digits, and use modulo 63.
* (Those of you with 64 bit machines need to add 3 octal digits together to
* get base512 digits, and use mod 511.)
*
* This is HACKMEM 169, as used in X11 sources.
*/
int
t0_hackmemmod(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}

/*
* This is the same as the above, but does not use the % operator.
* Most modern machines have clockless division, and so the modulo is as
* expensive as, say, an addition.
*/
int
t1_hackmemloop(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
tmp = (tmp + (tmp >> 3)) & 030707070707;
while (tmp > 63)
tmp = (tmp & 63) + (tmp >> 6);
return tmp;
}

/*
* Above, without using while loop. It takes at most 5 iterations of the
* loop, so we just do all 5 in-line. The final result is never 63
* (this is assumed above as well).
*/
int
t2_hackmemunrolled(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
tmp = (tmp + (tmp >> 3)) & 030707070707;
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
return (tmp);
}

/*
* This function counts the bits in a long.
*
* It removes the lsb and counting the number of times round the loop.
* The expression (n & -n) yields the lsb of a number,
* but it only works on 2's compliment machines.
*/
int
t3_rmlsbsub(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n -= (n & -n))
count++;
return count;
}

int
t4_rmlsbmask(n)
register unsigned long n;
{
register int count;

for (count = 0; n; count++)
n &= n - 1; /* take away lsb */
return (count);
}

/*
* This function counts the bits in a long.
*
* It works by shifting the number down and testing the bottom bit.
*/
int
t5_testlsb(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n >>= 1)
if (n & 1)
count++;
return count;
}

/*
* This function counts the bits in a long.
*
* It works by shifting the number left and testing the top bit.
* On many machines shift is expensive, so it uses a cheap addition instead.
*/
int
t6_testmsb(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n += n)
if (n & ~(~(unsigned long)0 >> 1))
count++;
return count;
}

int
t7_testsignandshift(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n <<= 1)
if ((long)n < 0)
count++;
return (count);
}

/*
* This function counts the bits in a long.
*
* It works by masking each bit.
* This is the second most intuitively obvious method,
* and is independent of the number of bits in the long.
*/
int
t8_testeachbit(n)
register unsigned long n;
{
register int count;
register unsigned long mask;

count = 0;
for (mask = 1; mask; mask += mask)
if (n & mask)
count++;
return count;
}

/*
* This function counts the bits in a long.
*
* It works by masking each bit.
* This is the most intuitively obvious method,
* but how do you a priori know how many bits in the long?
* (except for ''sizeof(long) * CHAR_BITS'' expression)
*/
int
t9_testeachbit1shl(n)
register unsigned long n;
{
register int count;
register int bit;

count = 0;
for (bit = 0; bit < 32; ++bit)
if (n & ((unsigned long)1 << bit))
count++;
return count;
}

static char nbits[256] = {

0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

static int inbits[256] = {

0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,

2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,

3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

int
tA_tableshift(n)
register unsigned long n;
{

return (nbits[n & 0xff] + nbits[(n >> 8) & 0xff] +
nbits[(n >> 16) & 0xff] + nbits[n >> 24]);
}

int
tB_tableuchar(n)
unsigned long n;
{
register unsigned char *p = (unsigned char *)&n;

return (nbits[p[0]] + nbits[p[1]] + nbits[p[2]] + nbits[p[3]]);
}

int
tC_tableshiftcast(n)
register unsigned long n;
{

return nbits[(unsigned char) n] +
nbits[(unsigned char) (n >> 8)] +
nbits[(unsigned char) (n >> 16)] +
nbits[(unsigned char) (n >> 24)];
}

int
tD_itableshift(n)
register unsigned long n;
{

return (inbits[n & 0xff] + inbits[(n >> 8) & 0xff] +
inbits[(n >> 16) & 0xff] + inbits[n >> 24]);
}

int
tE_itableuchar(n)
unsigned long n;
{
register unsigned char *p = (unsigned char *)&n;

return (inbits[p[0]] + inbits[p[1]] + inbits[p[2]] + inbits[p[3]]);
}

int
tF_itableshiftcast(n)
register unsigned long n;
{

return inbits[(unsigned char) n] +
inbits[(unsigned char) (n >> 8)] +
inbits[(unsigned char) (n >> 16)] +
inbits[(unsigned char) (n >> 24)];
}

/*
* Explanation:
* First we add 32 1-bit fields to get 16 2-bit fields.
* Each 2-bit field is one of 00, 01, or 10 (binary).
* We then add all the two-bit fields to get 8 4-bit fields.
* These are all one of 0000, 0001, 0010, 0011, or 0100.
*
* Now we can do something different, becuase for the first
* time the value in each k-bit field (k now being 4) is small
* enough that adding two k-bit fields results in a value that
* still fits in the k-bit field. The result is four 4-bit
* fields containing one of {0000,0001,...,0111,1000} and four
* more 4-bit fields containing junk (sums that are uninteresting).
* Pictorially:
* n = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
* n>>4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
* sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
* where W, X, Y, and Z are the interesting sums (each at most 1000,
* or 8 decimal). Masking with 0x0f0f0f0f extracts these.
*
* Now we can change tactics yet again, because now we have:
* n = 0000WWWW0000XXXX0000YYYY0000ZZZZ
* n>>8 = 000000000000WWWW0000XXXX0000YYYY
* so sum = 0000WWWW000ppppp000qqqqq000rrrrr
* where p and r are the interesting sums (and each is at most
* 10000, or 16 decimal). The sum `q' is junk, like i, j, and
* k above; but it is not necessarry to discard it this time.
* One more fold, this time by sixteen bits, gives
* n = 0000WWWW000ppppp000qqqqq000rrrrr
* n>>16 = 00000000000000000000WWWW000ppppp
* so sum = 0000WWWW000ppppp000sssss00tttttt
* where s is at most 11000 and t is it most 100000 (32 decimal).
*
* Now we have t = r+p = (Z+Y)+(X+W) = ((h+g)+(f+e))+((d+c)+(b+a)),
* or in other words, t is the number of bits set in the original
* 32-bit longword. So all we have to do is return the low byte
* (or low 6 bits, but `low byte' is typically just as easy if not
* easier).
*
* This technique is also applicable to 64 and 128 bit words, but
* 256 bit or larger word sizes require at least one more masking
* step.
*/
int
tG_sumbits(n)
register unsigned long n;
{

n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n + (n >> 4)) & 0x0f0f0f0f;
n += n >> 8;
n += n >> 16;
return (n & 0xff);
}

/*
* build a long random number.
* The standard rand() returns at least a 15 bit number.
* We use the top 9 of 15 (since the lower N bits of the usual rand()
* repeat with a period of 2^N).
*/
unsigned long
bigrand()
{
#define randbits() ((unsigned long)((rand() >> 6) & 0777))
register unsigned long r;

r = randbits() << 27;
r |= randbits() << 18;
r |= randbits() << 9;
r |= randbits();
return (r);
}

/*
* Run the test many times.
* You will need a running time in the 10s of seconds
* for an accurate result.
*
* To give a fair comparison, the random number generator
* is seeded the same for each measurement.
*
* Return value is seconds per iteration.
*/
#ifndef REPEAT
#if defined(mips) || defined(sparc)
#define REPEAT (1L<<20)
#else
#define REPEAT (1L<<18)
#endif
#endif

double
measure(func)
register int (*func)();
{
#ifdef USE_GETRUSAGE
struct rusage ru0, ru1;
register long j;

srand(1);
(void) getrusage(RUSAGE_SELF, &ru0);
for (j = 0; j < REPEAT; ++j)
func(bigrand());
(void) getrusage(RUSAGE_SELF, &ru1);
ru1.ru_utime.tv_sec -= ru0.ru_utime.tv_sec;
if ((ru1.ru_utime.tv_usec -= ru0.ru_utime.tv_usec) < 0) {
ru1.ru_utime.tv_usec += 1000000;
ru1.ru_utime.tv_sec--;
}
return ((ru1.ru_utime.tv_sec + (ru1.ru_utime.tv_usec / 1000000.0)) /
(double)REPEAT);
#else
register long j;
struct tms start;
struct tms finish;

srand(1);
times(&start);
for (j = 0; j < REPEAT; ++j)
func(bigrand());
times(&finish);
return ((finish.tms_utime - start.tms_utime) / (double)REPEAT);
#endif
}

struct table {
char *name;
int (*func)();
double measurement;
int trial;
};

struct table table[] =
{
{ "hackmemmod", t0_hackmemmod },
{ "hackmemloop", t1_hackmemloop },
{ "hackmemunrolled", t2_hackmemunrolled },
{ "rmlsbsub", t3_rmlsbsub },
{ "rmlsbmask", t4_rmlsbmask },
{ "testlsb", t5_testlsb },
{ "testmsb", t6_testmsb },
{ "testsignandshift", t7_testsignandshift },
{ "testeachbit", t8_testeachbit },
{ "testeachbit1shl", t9_testeachbit1shl },
{ "tableshift", tA_tableshift },
{ "tableuchar", tB_tableuchar },
{ "tableshiftcast", tC_tableshiftcast },
{ "itableshift", tD_itableshift },
{ "itableuchar", tE_itableuchar },
{ "itableshiftcast", tF_itableshiftcast },
{ "sumbits", tG_sumbits },
};

main(argc, argv)
int argc;
char **argv;
{
double calibration, v, best;
int j, k, m, verbose;

verbose = argc > 1;
/*
* run a few tests to make sure they all agree
*/
srand(getpid());
for (j = 0; j < 10; ++j) {
unsigned long n;
int bad;

n = bigrand();
for (k = 0; k < SIZEOF(table); ++k)
table[k].trial = table[k].func(n);
bad = 0;
for (k = 0; k < SIZEOF(table); ++k)
for (m = 0; m < SIZEOF(table); ++m)
if (table[k].trial != table[m].trial)
bad = 1;
if (bad) {
printf("wrong: %08lX", n);
for (k = 0; k < SIZEOF(table); ++k)
printf(" %3d", table[k].trial);
printf("\n");
}
}

/*
* calibrate the timing mechanism
*/
calibration = measure(calibrate);
if (verbose)
printf("calibration: %g\n", calibration);

/*
* time them all, keeping track of the best (smallest)
*/
for (j = 0; j < SIZEOF(table); ++j) {
v = measure(table[j].func) - calibration;
if (verbose)
printf("%s: %g\n", table[j].name, v);
table[j].measurement = v;
if (!j || v < best)
best = v;
}
(void) printf("%-24s %-14sratio\n", "function", "time");
for (j = 0; j < SIZEOF(table); ++j) {
(void) printf("%-24s %#10.8g %6.3f\n",
table[j].name,
table[j].measurement,
table[j].measurement / best);
}
exit(0);
}
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

pete

unread,

Dec 20, 2003, 6:46:08 PM12/20/03

to

Sidney Cadot wrote:
> pete wrote:

> > Sidney Cadot wrote:
> >>

> >>This doesn't work.
> >
> > You're wrong.
>
> You're right.

> > Try harder.
>
> How's that?

I tested the code and you didn't.
I get caught almost every time that I post untested code,
so I almost never do.

--
pete

Sidney Cadot

unread,

Dec 20, 2003, 9:47:29 PM12/20/03

to

pete wrote:

> Sidney Cadot wrote:
>
>>pete wrote:
>>
>>>Sidney Cadot wrote:
>>>
>
>>>>This doesn't work.
>>>
>>>You're wrong.
>>
>>You're right.
>
>
>>>Try harder.
>>
>>How's that?
>
>
> I tested the code and you didn't.

Guilty as charged. However, I hope you agree that this is correct:

unsigned bit_count(unsigned n)
{
unsigned count;

for (count = 0; n; n>>=1) {
count+=n&1;
}
return count;
}

You seemed to be implying it isn't, that's why I ask.

> I get caught almost every time that I post untested code,
> so I almost never do.

The type of silly mistakes I can make in three lines of code is
shattering. I think I should take this to heart.

Best regards,

Sidney

Robert Stankowic

unread,

Dec 20, 2003, 11:26:05 PM12/20/03

to

"Richard Heathfield" <dont...@address.co.uk.invalid> schrieb im Newsbeitrag
news:bs24ij$nhn$2...@hercules.btinternet.com...

> Sidney Cadot wrote:
>
> > Jarno A Wuolijoki wrote:
> >

[....]

> is that it ought
> to teach you /not/ to publish (is that a fair substitute? <g>) untested
> code.
>
> In fact, though, it will do no such thing. At least, the lesson didn't
work
> for the rest of us, so I don't see why it should work for you! :-)

Ha, I sometimes even manage to post _tested_ stupidities :)
cheers
Robert

Randy Howard

unread,

Dec 21, 2003, 1:10:53 AM12/21/03

to

If you want to see some very bizarre appearing, but surprisingly fast
implementations of this ("population count" might make a good search
term BTW), check out Hacker's Delight, Chapter 5 by Henry S. Warren, Jr.

--
Randy Howard _o
2reply remove FOOBAR \<,
______________________()/ ()______________________________________________
SCO Spam-magnet: postm...@sco.com

pete

unread,

Dec 21, 2003, 6:34:48 AM12/21/03

to

Sidney Cadot wrote:

> Guilty as charged. However, I hope you agree that this is correct:
>
> unsigned bit_count(unsigned n)
> {
> unsigned count;
>
> for (count = 0; n; n>>=1) {
> count+=n&1;
> }
> return count;
> }
>
> You seemed to be implying it isn't, that's why I ask.

It's very nice.

--
pete

Robert Stankowic

unread,

Dec 21, 2003, 1:34:13 PM12/21/03

to

"Randy Howard" <randy....@FOOmegapathdslBAR.net> schrieb im Newsbeitrag
news:MPG.1a4ef5647...@news.megapathdsl.net...

> If you want to see some very bizarre appearing, but surprisingly fast
> implementations of this ("population count" might make a good search
> term BTW), check out Hacker's Delight, Chapter 5 by Henry S. Warren, Jr.

Thanks, I know, my response was just addressing the "need for _huge_ lookup
tables"
regards
Robert

Andrey Tarasevich

unread,

Dec 22, 2003, 10:04:33 PM12/22/03

to

Paul Hsieh wrote:
> Andrey Tarasevich <andreyta...@hotmail.com> wrote:
>> aku wrote:
>> > I'm looking for the absolute fastest way
>> > to count the nr of bits that are set to "1"
>> > in a string. Presumably I then first need the
>> > fastest way to do this in a byte.
>>
>> For a 8-bit byte the look-up table is one way to go. Another solution
>> would be to do something along the lines of the following
>>
>> g = (g & 0x55u) + ((g >> 1) & 0x55u);
>> g = (g & 0x33u) + ((g >> 2) & 0x33u);
>> g = (g & 0x0fu) + ((g >> 4) & 0x0fu);
>> /* now g contains the total number of 1 bits in the
>> original value of g */
>
> Oooooo! Someone who actually knows something. I take it you were
> *NOT* part of the C standard language committee.

Huh? I don't exactly get what you mean to say.

> In any event you are missing the finisher:
>
> /* For 32 bits */
> g *= 0x001010101;
> return (g >> 24) & 0x3f;

> ...

For 32-bit data it can be "finished" in several different ways. But I
don't understand why you think I'm "missing" something - the original
question was about a "byte", not about a 32-bit word.

Paul Hsieh

unread,

Dec 22, 2003, 10:45:26 PM12/22/03

to

andreyta...@hotmail.com said:
> Paul Hsieh wrote:
> > Andrey Tarasevich <andreyta...@hotmail.com> wrote:
> >> aku wrote:
> >> > I'm looking for the absolute fastest way
> >> > to count the nr of bits that are set to "1"
> >> > in a string. Presumably I then first need the
> >> > fastest way to do this in a byte.
> >>
> >> For a 8-bit byte the look-up table is one way to go. Another solution
> >> would be to do something along the lines of the following
> >>
> >> g = (g & 0x55u) + ((g >> 1) & 0x55u);
> >> g = (g & 0x33u) + ((g >> 2) & 0x33u);
> >> g = (g & 0x0fu) + ((g >> 4) & 0x0fu);
> >> /* now g contains the total number of 1 bits in the
> >> original value of g */
> >
> > Oooooo! Someone who actually knows something. I take it you were
> > *NOT* part of the C standard language committee.
>
> Huh? I don't exactly get what you mean to say.

I'm just saying that people who understand real world solutions are not likely
to be the same people had anything to do with the C language standard.

> > In any event you are missing the finisher:
> >
> > /* For 32 bits */
> > g *= 0x001010101;
> > return (g >> 24) & 0x3f;
> > ...
>
> For 32-bit data it can be "finished" in several different ways. But I
> don't understand why you think I'm "missing" something - the original
> question was about a "byte", not about a 32-bit word.

No, in the original he was guessing that he should do it byte by byte. He
actually wanted to know the *absolute fastest* way to do it for a
"string". Given his imprecise requirements, I just latched onto the
*absolutely fastest* part, but noticed that you responded before I got the
chance to, and that it wouldn't have been accurate for me to post with my
typical "you're not going to get useful answers from people who post here ..."
thing.

Keith Thompson

unread,

Dec 22, 2003, 11:20:42 PM12/22/03

to

Andrey Tarasevich <andreyta...@hotmail.com> writes:
> Paul Hsieh wrote:
[...]

> > In any event you are missing the finisher:
> >
> > /* For 32 bits */
> > g *= 0x001010101;
> > return (g >> 24) & 0x3f;
> > ...
>
> For 32-bit data it can be "finished" in several different ways. But I
> don't understand why you think I'm "missing" something - the original
> question was about a "byte", not about a 32-bit word.

Are you assuming that a "byte" is 8 bits? It is on most systems, but
the C standard allows a byte to be any size >= 8 bits. 32-bit bytes
are not unknown (I think some DSPs have them).

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)

Andrey Tarasevich

unread,

Dec 23, 2003, 1:14:42 AM12/23/03

to

Keith Thompson wrote:
> [...]
>> > In any event you are missing the finisher:
>> >
>> > /* For 32 bits */
>> > g *= 0x001010101;
>> > return (g >> 24) & 0x3f;
>> > ...
>>
>> For 32-bit data it can be "finished" in several different ways. But I
>> don't understand why you think I'm "missing" something - the original
>> question was about a "byte", not about a 32-bit word.
>
> Are you assuming that a "byte" is 8 bits? It is on most systems, but
> the C standard allows a byte to be any size >= 8 bits. 32-bit bytes
> are not unknown (I think some DSPs have them).

> ...

No, I'm not assuming that any "byte" is 8 bits. I'm just assuming that
OP's "byte" is 8 bits since his original code (although compeletely
incorrect) seems to suggest that.

pete

unread,

Dec 23, 2003, 8:10:07 AM12/23/03

to

When somebody has a code problem with code for an 8 bit byte system,
they frequently get answers for 8 bit systems and also
portable answers.
In my opinion, the 8 bit answers are at best, barely on topic,
but so many people post them, and according to the charter,
this newsgroup is ruled by the mob.

--
pete

Richard Bos

unread,

Dec 23, 2003, 9:00:18 AM12/23/03

to

pete <pfi...@mindspring.com> wrote:

> In my opinion, the 8 bit answers are at best, barely on topic,
> but so many people post them, and according to the charter,
> this newsgroup is ruled by the mob.

One question: _what_ charter?

Richard

Mark McIntyre

unread,

Dec 23, 2003, 6:50:23 PM12/23/03

to

On Tue, 23 Dec 2003 03:45:26 GMT, in comp.lang.c , q...@pobox.com (Paul
Hsieh) wrote:

>andreyta...@hotmail.com said:
>>
>> Huh? I don't exactly get what you mean to say.
>
>I'm just saying that people who understand real world solutions are not likely
>to be the same people had anything to do with the C language standard.

This is a quite ludicrous statement, given the people who were on the
C standardisation committee.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

Message has been deleted

Richard Heathfield

unread,

Dec 23, 2003, 8:38:53 PM12/23/03

to

Arthur J. O'Dwyer wrote:

> On Tue, 23 Dec 2003, pete wrote:
>>
>> In my opinion, the 8 bit answers are at best, barely on topic,
>> but so many people post them, and according to the charter,
>> this newsgroup is ruled by the mob.
>

> For values of "charter" equal to "mob," yes. ;-)
> [c.l.c has no charter, as I'm sure we all know. And yes, it
> is ruled by the mob.]

Possibly the most formal, stylish, up-market, self-disciplined, and
intelligent mob in history... but still a mob.

Joe Wright

unread,

Dec 23, 2003, 9:52:07 PM12/23/03

to

Richard Heathfield wrote:
>
> Arthur J. O'Dwyer wrote:
>
> > On Tue, 23 Dec 2003, pete wrote:
> >>
> >> In my opinion, the 8 bit answers are at best, barely on topic,
> >> but so many people post them, and according to the charter,
> >> this newsgroup is ruled by the mob.
> >
> > For values of "charter" equal to "mob," yes. ;-)
> > [c.l.c has no charter, as I'm sure we all know. And yes, it
> > is ruled by the mob.]
>
> Possibly the most formal, stylish, up-market, self-disciplined, and
> intelligent mob in history... but still a mob.
>

Heh yeah. I especially like the intelligent part. :-)
--
Joe Wright http://www.jw-wright.com
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Jirka Klaue

unread,

Dec 24, 2003, 6:29:43 AM12/24/03

to

Sidney Cadot wrote:
> pete wrote:

...

>>>> unsigned bit_count(unsigned n)
>>>> {
>>>> unsigned count;
>>>>
>>>> for (count = 0; n; n &= n - 1) {
>>>> ++count;
>>>> }
>>>> return count;
>>>> }

...

> Your solution is quite a clever thing;

It's not "his" solution. Try a web search for "kernighan bit count" or go
directly to http://graphics.stanford.edu/~seander/bithacks.html

Jirka