Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parallel version of popcount

22 views
Skip to first unread message

Paul

unread,
Nov 10, 2018, 6:54:50 AM11/10/18
to
Googling for fast ways of doing popcount, I read about this for doing it "in parallel".
It doesn't seem parallelized to me. There are no threads or anything.
I would have thought a parallel approach would involve different threads tackling
different bits?
Thanks for your advice about C++ parallel versions of popcount.

Kind Regards,
Paul

unsigned int countBits(unsigned int x)
{
// count bits of each 2-bit chunk
x = x - ((x >> 1) & 0x55555555);
// count bits of each 4-bit chunk
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
// count bits of each 8-bit chunk
x = x + (x >> 4);
// mask out junk
x &= 0xF0F0F0F;
// add all four 8-bit chunks
return (x * 0x01010101) >> 24;
}

Alf P. Steinbach

unread,
Nov 10, 2018, 10:59:39 AM11/10/18
to
This code is well commented about the parallelism. E.g. "count bits of
each 2-bit chunk", that's about doing all the 2-bit chunks in parallel.
Up to the width of the architecture's `int`, which you can infer from
the code is 32.

However, how it works is a different matter: there's no comment about
that. Studying the hand-optimized code resulting from some idea is like
reverse-engineering machine code, or (slight exaggeration) trying to
deduce the shape of a human of a given age from a DNA sequencing.
Instead I would look for a description of the basic idea, and how that
idea was expressed in code.

And instead of reinventing this particular wheel, why not use one of the
umpteen existing solutions? In C++ you have `std::bitset::count`,
portably. And in both C and C++ there are somewhat less portable C level
compiler intrinsics, e.g. as listed at <url:
https://en.wikichip.org/wiki/population_count>.


Cheers & hth.,

- Alf

Melzzzzz

unread,
Nov 10, 2018, 11:03:51 AM11/10/18
to
On 2018-11-10, Paul <peps...@gmail.com> wrote:
> Googling for fast ways of doing popcount, I read about this for doing it "in parallel".

Depending on architecture, you can do it pretty fast, but popcount
native instruction on x86 is as fast as avx2 version with pshufb.

> It doesn't seem parallelized to me. There are no threads or anything.
> I would have thought a parallel approach would involve different threads tackling
> different bits?

This means parallel as multiple chunks in same register in one op. Like
SIMD ;)
> Thanks for your advice about C++ parallel versions of popcount.
>
> Kind Regards,
> Paul
>
> unsigned int countBits(unsigned int x)
> {
> // count bits of each 2-bit chunk
> x = x - ((x >> 1) & 0x55555555);
> // count bits of each 4-bit chunk
> x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
> // count bits of each 8-bit chunk
> x = x + (x >> 4);
> // mask out junk
> x &= 0xF0F0F0F;
> // add all four 8-bit chunks
> return (x * 0x01010101) >> 24;
> }


--
press any key to continue or any other to quit...
0 new messages