Optimized bitcounting on FPGA

Steven Derrien

unread,

Oct 4, 2007, 6:38:03 AM10/4/07

to Florent BERTHELOT

Hi,

I am currently working on a circuit which has to perform Hamming
distance computation between large bit vectors (>500 bits).

I was surprised not to find much information on how to implement this
type of operation *efficiently* on FPGA technology.

So far I have been investigating two approaches (combining tables and
counter for the bitcouting part). I observed that the choice of table
size (3 or 4 address bits) had a significant impact (20%) on the area
cost of the operator.

I feel that there are many subtle trade-offs in such implementations,
and I was wondering if anybody had been looking at this problem (most of
the articles I stumbled accross dealt with the correcting code issue,
rather than focusing on the Hamming distance realization in itself).

Thanks in advances for the input.

Regards,

Steven

Ray Andraka

unread,

Oct 4, 2007, 4:06:02 PM10/4/07

to

Steven Derrien wrote:

Most Modern FPGAs have block RAM that can be used for look-ups. Use
these as larger LUTs to get partial counts, and then sum up the
partials. If you can afford a multiplied clock, you can time multiplex
segments of your input vector into a smaller number of RAMs followed by
accumulators.

Peter Alfke

unread,

Oct 4, 2007, 4:18:00 PM10/4/07

to

> Most Modern FPGAs have block RAM that can be used for look-ups. Use
> these as larger LUTs to get partial counts, and then sum up the
> partials. If you can afford a multiplied clock, you can time multiplex
> segments of your input vector into a smaller number of RAMs followed by
> accumulators.

Thank you, Ray.
Let me just add that a dual-ported RAM can then be used as two
separate look-up tables, by addressing the two ports independently,
using common data. This is obvious once you think about it, and it
cuts the number of BRAMs in half...
Peter Alfke

Andreas Schwarz

unread,

Oct 4, 2007, 5:58:31 PM10/4/07

to

On 4 Okt., 12:38, Steven Derrien <sderrienREM...@irisa.fr> wrote:
> I am currently working on a circuit which has to perform Hamming
> distance computation between large bit vectors (>500 bits).
>
> I was surprised not to find much information on how to implement this
> type of operation *efficiently* on FPGA technology.
>
> So far I have been investigating two approaches (combining tables and
> counter for the bitcouting part). I observed that the choice of table
> size (3 or 4 address bits) had a significant impact (20%) on the area
> cost of the operator.

There was a lengthy discussion about bitcounting in the FPGA forum on
mikrocontroller.net:
http://www.mikrocontroller.net/topic/34231#new
http://www.mikrocontroller.net/topic/34250#new

comp.arch.fpga

unread,

Oct 5, 2007, 3:03:33 AM10/5/07

to

On 4 Okt., 23:58, Andreas Schwarz <use...@andreas-s.net> wrote:
> On 4 Okt., 12:38, Steven Derrien <sderrienREM...@irisa.fr> wrote:
>
> > I am currently working on a circuit which has to perform Hamming
> > distance computation between large bit vectors (>500 bits).
>
> > I was surprised not to find much information on how to implement this
> > type of operation *efficiently* on FPGA technology.

You find the background in theoretical text books on parallel
computing.
Prefix sum is a fundamental sub problem and a lot has been written
about
the implementation of sums and prefix sums on various architecturs
including
networks of operators.
The results translate directly to FPGAs. (Counting bits reduces to
summing
numbers, albeit small numbers).

Kolja Sulimma

Steven Derrien

unread,

Oct 5, 2007, 4:17:01 AM10/5/07

to

Hi Ray,

That's indeed what I did, except that BRAM are somewhat overkill for the
purpose.

So far, the best approach I came up with was to store in a LUT the
bitcount of a 3 bit wide vector (after xoring the 2 operand for
Hamming), and use an adder tree ot obtain the Hamming distance value.

Using larger tables (for vectors of 4 and more bits) ends up in a
signifcant area penalty (~20%).

Steven

Ray Andraka a écrit :

Peter Alfke

unread,

Oct 5, 2007, 12:26:20 PM10/5/07

to

On Oct 4, 2:58 pm, Andreas Schwarz <use...@andreas-s.net> wrote:
>>
> There was a lengthy discussion about bitcounting in the FPGA forum on
> mikrocontroller.net:http://www.mikrocontroller.net/topic/34231#newhttp://www.mikrocontroller.net/topic/34250#new

For the English-only readers:
There is nothing of value in that German discussion, just repetitive
trivialities.
You don't miss anything if you cannot read it.
Peter Alfke

glen herrmannsfeldt

unread,

Oct 6, 2007, 6:01:26 AM10/6/07

to

Steven Derrien wrote:

> So far, the best approach I came up with was to store in a LUT the
> bitcount of a 3 bit wide vector (after xoring the 2 operand for
> Hamming), and use an adder tree ot obtain the Hamming distance value.

You mean two LUTs for the 1's and 2's bit of the sum?
That is the first level of a carry save adder tree.

For a true carry save adder tree you add the 1's to form
a second set of 1's and 2's, add the 2's to form a
set of 2's and 4's. Repeat as needed.

-- glen