Help with parallel CRC's

stephen henry

unread,

Sep 26, 2002, 4:33:58 PM9/26/02

to

Hi all,

I was hoping one of you out there could possibly help me. I'm
currently looking into a Serial ATA interface system as part of a
final year university project. I've been looking all day for
information regarding on how parallel CRC's are implemented. I pretty
much understand how normal serial CRC's operate -but i just dont
understand where the vast arrays of xors come from when doing a
parallel implementation, (very little resemblance to the original CRC
polynominal).

I search google but the only things i've been able to find relating to
it were things called G and H Matrix -i dont know what these are.

Could somebody please help me? or at least point me to some
information that would let me get my head around it.

Cheers,

Stephen Henry

Mike Treseler

unread,

Sep 26, 2002, 5:30:48 PM9/26/02

to

stephen henry wrote:

> I was hoping one of you out there could possibly help me. I'm
> currently looking into a Serial ATA interface system as part of a
> final year university project. I've been looking all day for
> information regarding on how parallel CRC's are implemented. I pretty
> much understand how normal serial CRC's operate -but i just dont
> understand where the vast arrays of xors come from when doing a
> parallel implementation, (very little resemblance to the original CRC
> polynominal).

It's real simple. The bit shifter is just a shift
and an inversion of the poly bit locations if the
input or msb, but not both, are '1'.

For the parallel version you just invoke the serial
version for each bit in a for loop. See below.

Let leo or synplify worry about the vast array of xors.

-- Mike Treseler

function crc_shift ---------------------------------------------------------
-- Mike Treseler
-- serial data version, see overload for parallel data below
-- Purpose : Single bit shift for a CRC register using any polynomial.
-- Inputs : X_load : Current CRC vector.
-- D_bit : Data to shift into CRC vector.
-- Poly : CRC polynomial. Default is Frame Relay.
--
-- Outputs : CRC vector after CRC shift.
(
constant X_load : in unsigned; -- register start value
constant D_bit : in std_ulogic := '0'; -- input bit
constant Poly : in unsigned := Poly_16_12_5 -- poly bits
)
return unsigned is
variable X_out : unsigned(X_load'range); -- CRC register
begin ----------------------------------------------------------------------
-- we assume that X and Poly are in downto format
-- to match the textbook definition of LSFR
-- and to match the CCITT FCS bit assigments
-- for frame relay, note that X(15) becomes the lsb of octet n-2
-- and that X(7 ) becomes the lsb of octet n-1
----------------------------------------------------------------------
-- Procedure: Left shift a '0' into the current X0
-- and the previous X(14) into the current X15 etc.
-- if the original X15 is '1' or the data is '1'
-- but not both, then invert the poly bit locations
----------------------------------------------------------------------
-- Sample Invocation:
-- crc_shift( "0001000100010001", '1'));
-- SLL 0010001000100010 -- shift the variable
-- D (not X15) [ ! ! !] -- invert poly locations?
-- expect("shift1 0011001000000011"); -- expected result
----------------------------------------------------------------------
assert X_load'length = Poly'length
report "crc_shift: Vectors X_load and Poly must be of equal length."
severity error;
X_out := X_load sll 1;
if (X_load(X_load'left) xor D_bit) = '1' then
X_out := X_out xor Poly;
end if;
return unsigned(X_out); -- returns each shift
end function crc_shift;

-----------------------------------------------

function crc_shift
-- Mike Treseler
-- parallel data version
(constant X_load : in unsigned;
constant D_vec : in unsigned ;
constant Poly : in unsigned := Poly_16_12_5)
return unsigned is
variable X_out : unsigned(X_load'range);
begin
X_out := X_load;
for I in D_vec'range loop -- call serial version for each bit
X_out := crc_shift(X_out, D_vec(I), Poly);
end loop;
return X_out;
end function crc_shift;

Clyde R. Shappee

unread,

Sep 26, 2002, 9:22:38 PM9/26/02

to

You might also have a look at this web site....

I have found it useful.

Clyde

http://www.easics.com/webtools/crctool

Allan Herriman

unread,

Sep 26, 2002, 10:30:49 PM9/26/02

to

On 26 Sep 2002 13:33:58 -0700, steh...@yahoo.com (stephen henry)
wrote:

You might also like to look at this previous discussion on the issue:

http://groups.google.com/groups?threadm=c2088d4a.0111290138.10577818%40posting.google.com

Regards,
Allan.

Ralf Hildebrandt

unread,

Sep 27, 2002, 2:51:45 AM9/27/02

to

Hi "stephen henry"!

> I search google but the only things i've been able to find relating to
> it were things called G and H Matrix -i dont know what these are.

These matrices are used in "coding theory" and describe the
generator(G)- and the proof(H)-matrix of a "linear block code".

Ralf

Renaud Pacalet

unread,

Sep 27, 2002, 2:58:44 AM9/27/02

to

stephen henry wrote:

Just go there and search CRC:
http://www.cypress.com/support/

Regards,
--
Renaud Pacalet, ENST / COMELEC, 46 rue Barrault 75634 Paris Cedex 13
Tel. : 01 45 81 78 08 | Fax : 01 45 80 40 36 | Mel : pac...@enst.fr
###### Fight Spam! Join EuroCAUCE: http://www.euro.cauce.org/ ######

Alan Coppola

unread,

Sep 27, 2002, 10:00:03 AM9/27/02

to

Stephen,

Check out my website:

http://www.nwlink.com/~ajjc

There is a public domain tool, pdcodes-0.01, that handles
not only generating parametrized CRC's of any pipeline size, but also
cross verifying
software, math, and hardware models.

alan

Damien Hubaux

unread,

Oct 1, 2002, 6:01:55 AM10/1/02

to

Hello,

As a starting point, you can have a look at the intuitive introduction to
parallel CRC at the folowing address

http://www.repairfaq.org/filipg/LINK/F_crc_v3.html

(or see: http://www.ross.net/crc)

regards

--
dh

Allan Herriman

unread,

Oct 1, 2002, 10:35:36 PM10/1/02

to

On Tue, 01 Oct 2002 12:01:55 +0200, Damien Hubaux
<damien...@cetic.be> wrote:

>
>Hello,
>
>As a starting point, you can have a look at the intuitive introduction to
>parallel CRC at the folowing address
>
>http://www.repairfaq.org/filipg/LINK/F_crc_v3.html
>
>(or see: http://www.ross.net/crc)

That paper describes a software based approach that doesn't map
particularly well to hardware.

Regards,
Allan.

Petter Gustad

unread,

Oct 2, 2002, 3:00:07 AM10/2/02

to

steh...@yahoo.com (stephen henry) writes:

The serial form of CRC is defined using xors (no carry add operator)
of polynomial arithmetic. See Donald E. Knuth "The Art of Computer
Programming" vol. 2 for a good background on polynomial arithmetic.

Then you can take the serial form of the CRC expression and do the
shifts and xors *symbolically*, e.g. you shift symbols (d0,d1 etc.)
into the expression which you can substitute for data later. This is
one way of getting the parallel form. This can of course be done using
a computer program. I've written a Common Lisp program to do this.
It's currently using the X16 + X12 + X5 + 1 CCITT CRC-16 polynomal but
could modified for other polynomials.

;; crc tools by Petter Gustad
;; (format nil "~&~A@~A.com" 'petter 'gustad)

;; print crc expression
(defun print-crc (crc)
(dotimes (i 16 crc)
(format t "~&c~D=~A" i (aref crc i))))

;; make a symbol, either cn (crc) or dn (data)
(defun make-crcsym (c n)
(intern (concatenate 'string c (princ-to-string n))))

;; optimize a xor expression, a^a=0 and b^0=b, i.e. remove all even
;; number of duplicated, or remove all duplicates and insert if odd

(defun count-crcsym (c l)
(apply #'+ (mapcar #'(lambda (x) (if (eq c x) 1 0)) l)))

(defun optimize-xor (e)
(cond ((null e) nil)
((= 1 (length e)) e)
(t (let* ((f (first e))
(r (rest e))
(i (count-crcsym f r)))
(cond ((zerop i) (cons f (optimize-xor r)))
((oddp i) (optimize-xor (remove f r)))
(t (cons f (optimize-xor (remove f r)))))))))

;; the X16 + X12 + X5 + 1 CCITT CRC-16 polynomal on serial form,
;; iterate to do parallel version

(defun make-crc (width)
(let ((crc (make-array 16))
(c0))
(dotimes (i 16 nil)
(setf (aref crc i) (list (make-crcsym "c" i))))
(dotimes (i width crc)
(setq c0 (append (aref crc 0) (list (make-crcsym "d" i))))
(dotimes (j 15 t)
(setf (aref crc j) (aref crc (1+ j)))
(if (or (= j 3) (= j 10))
(setf (aref crc j) (append (aref crc j) c0))))
(setf (aref crc 15) c0))
(dotimes (i 16 crc)
(setf (aref crc i) (optimize-xor (aref crc i))))))

To make a 16-bit wide data input you do:

(print-crc (make-crc 16))

c0=(c5 d5 c12 c8 c4 d4 d8 d12)
c1=(c6 d6 c13 c9 c5 d5 d9 d13)
c2=(c7 d7 c14 c10 c6 d6 d10 d14)
c3=(c8 c0 d0 d8 c15 c11 c7 d7 d11 d15)
c4=(c4 c0 d0 d4 c9 c5 c1 d1 d5 d9)
c5=(c5 c1 d1 d5 c10 c6 c2 d2 d6 d10)
c6=(c6 c2 d2 d6 c11 c0 d0 c7 c3 d3 d7 d11)
c7=(c7 c3 d3 d7 c12 c1 d1 c8 c4 c0 d0 d4 d8 d12)
c8=(c8 c4 c0 d0 d4 d8 c13 c2 d2 c9 c5 c1 d1 d5 d9 d13)
c9=(c9 c5 c1 d1 d5 d9 c14 c3 d3 c10 c6 c2 d2 d6 d10 d14)
c10=(c10 c6 c2 d2 d6 d10 c15 c4 d4 c11 c7 c3 d3 d7 d11 d15)
c11=(c11 c0 d0 c7 c3 d3 d7 d11)
c12=(c12 c1 d1 c8 c4 c0 d0 d4 d8 d12)
c13=(c13 c2 d2 c9 c5 c1 d1 d5 d9 d13)
c14=(c14 c3 d3 c10 c6 c2 d2 d6 d10 d14)
c15=(c15 c4 d4 c11 c7 c3 d3 d7 d11 d15)

There is a xor between at each space in the list. cx is the previous
CRC value, dx is the data input.

Petter

--
________________________________________________________________________
Petter Gustad 8'h2B | ~8'h2B http://www.gustad.com/petter

Damien Hubaux

unread,

Oct 2, 2002, 8:14:34 AM10/2/02

to

Allan Herriman wrote:

> On Tue, 01 Oct 2002 12:01:55 +0200, Damien Hubaux wrote:
>
>>Hello,
>>
>>As a starting point, you can have a look at the intuitive introduction to
>>parallel CRC at the folowing address
>>
>>http://www.repairfaq.org/filipg/LINK/F_crc_v3.html
>

> That paper describes a software based approach that doesn't map
> particularly well to hardware.

Sure, it is not the way you will implement it in hadware.

I was focusing on the first question of Stephen:

> I pretty
> much understand how normal serial CRC's operate -but i just dont
> understand where the vast arrays of xors come from when doing a
> parallel implementation, (very little resemblance to the original CRC
> polynominal)

In the previous document, chapter 10 comes with an intuitive answer abut that.
But where software deals with a pre-computed table, hardware comes to the same
result using an array of XOR.

Further, starting from normal serial CRC's, with a Linear Feeback Shift
Register; it can be seen as a matrix transformation (this example use the 8 bits
CRC used in ADSL).

If you have R0 ..R7, the registers, and you shift in D0, then the new register
value will be: (R0 = R7 xor D0, etc):

|R0| |0 0 0 0 0 0 0 1| |R0| |0 |
|R1| |1 0 0 0 0 0 0 0| |R1| |0 |
|R2| |0 1 0 0 0 0 0 1| |R2| |0 |
|R3| =|0 0 1 0 0 0 0 1|.|R3| XOR |0 |
|R4| |0 0 0 1 0 0 0 1| |R4| |0 |
|R5| |0 0 0 0 1 0 0 0| |R5| |0 |
|R6| |0 0 0 0 0 1 0 0| |R6| |0 |
|R7| |0 0 0 0 0 0 1 0| |R7| |D0|

Where the diagonal of 1s represent the shift and the last column the polynomial.
In order to make 8 bits parallel computation, you will multiply the matrix 8
times by itself and shift in 8 data bits:

|R0| |1 0 0 0 1 0 1 0| |R0| |D0|
|R1| |0 1 0 0 0 0 1 1| |R1| |D1|
|R2| |1 0 1 0 1 0 0 1| |R2| |D2|
|R3| =|1 1 0 1 1 0 0 0|.|R3| XOR |D3|
|R4| |1 1 1 0 0 0 1 0| |R4| |D4|
|R5| |0 1 1 1 0 0 0 1| |R5| |D5|
|R6| |0 0 1 1 1 0 0 0| |R6| |D6|
|R7| |0 0 0 1 1 1 0 0| |R7| |D7|

And you get something like this. The array of XORs 'implement' that matrix. Of
course the real implementation comes with optimisation,...

If possible have a look at the following papers at the library of your university:
- R. Hobson, K. Cheung, A high performance CMOS 32-bits parallel CRC engine,
IEEE journal of solid state circuits, vol 34, 02/1999, pp.233-235
- L Seong, B. Dewar, Parallel realisation of the ATM cell header CRC, Computer
Communication, Elsevier, vol 19, 1996, pp.257-263.