I am currently working on a circuit which has to perform Hamming
distance computation between large bit vectors (>500 bits).
I was surprised not to find much information on how to implement this
type of operation *efficiently* on FPGA technology.
So far I have been investigating two approaches (combining tables and
counter for the bitcouting part). I observed that the choice of table
size (3 or 4 address bits) had a significant impact (20%) on the area
cost of the operator.
I feel that there are many subtle trade-offs in such implementations,
and I was wondering if anybody had been looking at this problem (most of
the articles I stumbled accross dealt with the correcting code issue,
rather than focusing on the Hamming distance realization in itself).
Thanks in advances for the input.
Regards,
Steven
Most Modern FPGAs have block RAM that can be used for look-ups. Use
these as larger LUTs to get partial counts, and then sum up the
partials. If you can afford a multiplied clock, you can time multiplex
segments of your input vector into a smaller number of RAMs followed by
accumulators.
Thank you, Ray.
Let me just add that a dual-ported RAM can then be used as two
separate look-up tables, by addressing the two ports independently,
using common data. This is obvious once you think about it, and it
cuts the number of BRAMs in half...
Peter Alfke
There was a lengthy discussion about bitcounting in the FPGA forum on
mikrocontroller.net:
http://www.mikrocontroller.net/topic/34231#new
http://www.mikrocontroller.net/topic/34250#new
You find the background in theoretical text books on parallel
computing.
Prefix sum is a fundamental sub problem and a lot has been written
about
the implementation of sums and prefix sums on various architecturs
including
networks of operators.
The results translate directly to FPGAs. (Counting bits reduces to
summing
numbers, albeit small numbers).
Kolja Sulimma
That's indeed what I did, except that BRAM are somewhat overkill for the
purpose.
So far, the best approach I came up with was to store in a LUT the
bitcount of a 3 bit wide vector (after xoring the 2 operand for
Hamming), and use an adder tree ot obtain the Hamming distance value.
Using larger tables (for vectors of 4 and more bits) ends up in a
signifcant area penalty (~20%).
Steven
Ray Andraka a écrit :
For the English-only readers:
There is nothing of value in that German discussion, just repetitive
trivialities.
You don't miss anything if you cannot read it.
Peter Alfke
> So far, the best approach I came up with was to store in a LUT the
> bitcount of a 3 bit wide vector (after xoring the 2 operand for
> Hamming), and use an adder tree ot obtain the Hamming distance value.
You mean two LUTs for the 1's and 2's bit of the sum?
That is the first level of a carry save adder tree.
For a true carry save adder tree you add the 1's to form
a second set of 1's and 2's, add the 2's to form a
set of 2's and 4's. Repeat as needed.
-- glen