8051 and CRC-CCITT

Jon Buller

unread,

Mar 11, 1996, 3:00:00 AM3/11/96

to

steve.m...@ifrsys.com (Steve Meirowsky) wrote:
>The fastest code is uses a look up table to help. I'm at home and don't
>have the code with me. Send me Email.

Well yes, a table is generally the fastest way to do it, but do you have
the space for a table? It's been a long time since I coded CRC routines,
but a fast version would need (as I remember) a 64Kx16 table which is the
ENTIRE memory of the 8051. I can do it with no table, 40 bytes of code,
and 28 cycles. I'd be curious to see if (and how) you can do it faster,
and still fit in the 64K program space. I'll post my code if people are
interested, otherwise it's by email request at jo...@metronet.com

Jon Buller <bul...@nortel.com> home: <jo...@metronet.com>
Include quotes, graphics, and disclaimers as necessary or desired.

Will Colley

unread,

Mar 12, 1996, 3:00:00 AM3/12/96

to

> Well yes, a table is generally the fastest way to do it, but do you have
> the space for a table? It's been a long time since I coded CRC routines,
> but a fast version would need (as I remember) a 64Kx16 table which is the
> ENTIRE memory of the 8051. I can do it with no table, 40 bytes of code,
> and 28 cycles. I'd be curious to see if (and how) you can do it faster,
> and still fit in the 64K program space. I'll post my code if people are
> interested, otherwise it's by email request at jo...@metronet.com

XRL A, CRCL ;This bit of code updates the CRC-16 accumula-
MOV CRCL, A ; tion in registers CRCH:CRCL using the data
MOV DPTR, #CRCTBL ; byte in the A register.
MOVC A, @A + DPTR
XRL A, CRCH ;The nice table-driven algorithm comes out of
XCH A, CRCL ; Jack Crenshaw's article _Implementing_CRCs_
INC DPH ; in the January 1992 issue of _Embedded_
MOVC A, @A + DPTR ; _Systems_Programming_ magazine.
MOV CRCH, A
;Burns registers A, DPTR, CRCH, CRCL. Runs in
; 12 machine cycles. Costs 8 more if you have
; to PUSH/POP DPH and DPL, though. No extra
; cost for using internal data RAM for CRCH
; CRCL.

;
; This 512-byte table is used to implement a CCITT CRC-16 error detection code.
;
CRCTBL DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 000H, 0C1H, 081H, 040H, 001H, 0C0H, 080H, 041H
DB 001H, 0C0H, 080H, 041H, 000H, 0C1H, 081H, 040H

DB 000H, 0C0H, 0C1H, 001H, 0C3H, 003H, 002H, 0C2H
DB 0C6H, 006H, 007H, 0C7H, 005H, 0C5H, 0C4H, 004H
DB 0CCH, 00CH, 00DH, 0CDH, 00FH, 0CFH, 0CEH, 00EH
DB 00AH, 0CAH, 0CBH, 00BH, 0C9H, 009H, 008H, 0C8H
DB 0D8H, 018H, 019H, 0D9H, 01BH, 0DBH, 0DAH, 01AH
DB 01EH, 0DEH, 0DFH, 01FH, 0DDH, 01DH, 01CH, 0DCH
DB 014H, 0D4H, 0D5H, 015H, 0D7H, 017H, 016H, 0D6H
DB 0D2H, 012H, 013H, 0D3H, 011H, 0D1H, 0D0H, 010H
DB 0F0H, 030H, 031H, 0F1H, 033H, 0F3H, 0F2H, 032H
DB 036H, 0F6H, 0F7H, 037H, 0F5H, 035H, 034H, 0F4H
DB 03CH, 0FCH, 0FDH, 03DH, 0FFH, 03FH, 03EH, 0FEH
DB 0FAH, 03AH, 03BH, 0FBH, 039H, 0F9H, 0F8H, 038H
DB 028H, 0E8H, 0E9H, 029H, 0EBH, 02BH, 02AH, 0EAH
DB 0EEH, 02EH, 02FH, 0EFH, 02DH, 0EDH, 0ECH, 02CH
DB 0E4H, 024H, 025H, 0E5H, 027H, 0E7H, 0E6H, 026H
DB 022H, 0E2H, 0E3H, 023H, 0E1H, 021H, 020H, 0E0H
DB 0A0H, 060H, 061H, 0A1H, 063H, 0A3H, 0A2H, 062H
DB 066H, 0A6H, 0A7H, 067H, 0A5H, 065H, 064H, 0A4H
DB 06CH, 0ACH, 0ADH, 06DH, 0AFH, 06FH, 06EH, 0AEH
DB 0AAH, 06AH, 06BH, 0ABH, 069H, 0A9H, 0A8H, 068H
DB 078H, 0B8H, 0B9H, 079H, 0BBH, 07BH, 07AH, 0BAH
DB 0BEH, 07EH, 07FH, 0BFH, 07DH, 0BDH, 0BCH, 07CH
DB 0B4H, 074H, 075H, 0B5H, 077H, 0B7H, 0B6H, 076H
DB 072H, 0B2H, 0B3H, 073H, 0B1H, 071H, 070H, 0B0H
DB 050H, 090H, 091H, 051H, 093H, 053H, 052H, 092H
DB 096H, 056H, 057H, 097H, 055H, 095H, 094H, 054H
DB 09CH, 05CH, 05DH, 09DH, 05FH, 09FH, 09EH, 05EH
DB 05AH, 09AH, 09BH, 05BH, 099H, 059H, 058H, 098H
DB 088H, 048H, 049H, 089H, 04BH, 08BH, 08AH, 04AH
DB 04EH, 08EH, 08FH, 04FH, 08DH, 04DH, 04CH, 08CH
DB 044H, 084H, 085H, 045H, 087H, 047H, 046H, 086H
DB 082H, 042H, 043H, 083H, 041H, 081H, 080H, 040H

Works for me! Check out Crenshaw's article. It even made a hardware
body like me understand and be able to use CRCs very easily, indeed.

Will

eug...@bitsoft.msk.su

unread,

Mar 13, 1996, 3:00:00 AM3/13/96

to

Jon Buller <bul...@bnr.ca> wrote:

> ... I can do it with no table, 40 bytes of code,
> and 28 cycles. ...

My code seems to be faster. Of course, if it works ;) (I have no time to test it)
I hope it calculates x^16+x^12+x^5+1.

push ACC
xrl A,lo
mov lo,A
swap A
anl A,#0f0h
xrl A,lo
mov lo,A
swap A
anl A,#0fh
xrl A,hi
xch A,lo
mov hi,A
swap A
rr A
xrl hi,A
anl A,#0f8h
xrl lo,A
xrl hi,A
pop ACC
ret

Eugene (eug...@bitsoft.msk.su)

Dwight Elvey

unread,

Mar 14, 1996, 3:00:00 AM3/14/96

to

Hi
If one is trying to save space and still
use a table look up, one can use a nibble lookup
table instead of the byte wise one. The 8051
has efficient nibble swaps so this should
be easy. To make it a little faster
one can use two tables, one for the hi input
nibble and one for the low input nibble.
This would require 2Bytes * 2^4 * 2 = 64Bytes
of tables. The extra code to do the second
lookup would still beat any code for the
simplified algorithm. It of course takes
more space.
I should make a note here as to how to
make the tables because from what I read,
many don't understand it. The tables are just
the equivilent XOR pattern that would be caused
if a specific input was given. All the CRC
algorithms have the effect of:
CRCSum = ( F(x) XOR CRCSum ) Rolled-By n
Where, F(x) is a value that is related to
the input x and the polynomial used and
n is the number of bits in x. This can also
be re-written as:
CRCSum = F'(x) XOR ( CRCSum Rolled-By n )
In each case F'(x) and F(x) are functions of
the input only and can be done with a lookup
table for any given polynomial. One can create
the XOR patterns by simply taking the 256 different
inputs, in the case of the 8 bit at a time, and
feeding them through the algorithm, with the
seed value of zero. All one has to do to determine
what value to XOR into the running sum for
a particular byte is to use the 8 bit value
as an index into the 16 bit table, fetch the
value and XOR it to the running sum and swap
the hi 8 bits with the low 8 bits of the sum
( in the case of a 16 bit crc sum ).
One can of course swap the low and hi before
doing the XOR if the table was generated to
reflect this. One can apply the same basic technic
to any sized input group. This is why I've
suggested doing two nibbles. It seems the
best compromise of speed and memory size.
( table = 64 bytes vs. 512 bytes )
Now back to the original questions. I haven't
totally figured out how the clever code by Jon Buller
works, but it looks like it takes advantage
of the fact that the polynomial x^16+x^15+x^2+1
has no terms in the range of x^7 to x^14. The
polynomial x^16+x^12+x^5+1 has the term x^12
that looks like it would destroy much chance
of making such a simple algorithm. When I figure
out Jon Buller's algorithm, I'll know better if
what I suspect is true and if a simple code
sequence as his can be applied to the polynomial
x^16+x^12+x^5+1. If so I'll publish it here for
all to use.
Dwight

Joel Estes

unread,

Mar 14, 1996, 3:00:00 AM3/14/96

to

Jon Buller <bul...@bnr.ca> wrote:
>steve.m...@ifrsys.com (Steve Meirowsky) wrote:
>>The fastest code is uses a look up table to help. I'm at home and don't
>>have the code with me. Send me Email.
>

It's been a long time since I coded CRC routines,
>but a fast version would need (as I remember) a 64Kx16 table which is the
>ENTIRE memory of the 8051.

I believe that the table necessary for the CRC is 256 bytes, plus
some for the code. (Reference CRC article in Summer 1987 issue of
C Users Journal).

Regardz(tm),

Joel Estes

Ulf Soderberg

unread,

Mar 14, 1996, 3:00:00 AM3/14/96

to

Joel Estes wrote:
> I believe that the table necessary for the CRC is 256 bytes, plus
> some for the code. (Reference CRC article in Summer 1987 issue of
> C Users Journal).

If I don't remember wrong the table is 256 16-bit words, which would be 512 bytes.

--
Ulf Soderberg System Innovation Development AB
ul...@sysinno.se http://www.sysinno.se/

Jon Buller

unread,

Mar 15, 1996, 3:00:00 AM3/15/96

to

Dwight Elvey wrote:

[stuff about using tables to do CRCs 4 bits at a time deleted]

> Now back to the original questions. I haven't
> totally figured out how the clever code by Jon Buller
> works, but it looks like it takes advantage
> of the fact that the polynomial x^16+x^15+x^2+1
> has no terms in the range of x^7 to x^14. The
> polynomial x^16+x^12+x^5+1 has the term x^12
> that looks like it would destroy much chance
> of making such a simple algorithm. When I figure
> out Jon Buller's algorithm, I'll know better if
> what I suspect is true and if a simple code
> sequence as his can be applied to the polynomial
> x^16+x^12+x^5+1. If so I'll publish it here for
> all to use.

Well, it's not really that hard... What I did (after
someone pointed out the trick to me) was to run 8 bits
of the specific CRC in question symbolically, instead of
on real data. I labeled the low CRC bits as A,B,C...H,
the high bits as P,Q...X, and the incoming data as 0,1...7

I used a Turbo-Pascal program to spit out the table, since I
had that available at the time. it can be done with some
spreadsheet macros as well, because I did it that way once too.
You can do it by hand, but it gets real tedious, and it's semi
prone to making small mistakes, which will blow the thing out
of the water before you get your toes wet.

After 8 bits of shifting and XORing, you get the following
code. (for CRC-CCITT) Of course, once you have the table of
which bits you need to XOR together, you need to figure out
a way to collect them into the same bytes, in the proper
order, and fast helps too 8-) And, yes, it is a bit worse
than the CRC-16 code (Which I included below as well, for
completeness mostly, but I discuss it a tiny bit after the
CCITT code...)

It really is relatively automatic, except for the part about
going over your code several times to squeeze out every last
cycle. Having a friend doing the same thing really helps,
because Jeff Owens (in this case) would take my code and
squeeze about 2 bytes and 1 cycle out of my code, then I came
back and squeezed 1 byte and 3 cycles out of his. Do this for
an hour or two every day for a week or two, and you get this...
(BTW, in this particular contest I won, but not by much, and I
had to beat Jeff about 3 or 4 different times, most of which
were from him beating my previous try.)

To figure out the code, take the table, and compare groups of
bits added into the result with small blocks of instructions.
In general the code shifts and rolls some data a bit, then XORs
it, then does it again to get another block of the table into
the CRC result.

;------------------------------------------------------------------------
;
; A DS5000 subroutine to generate CCITT-CRC data
; By Jon Buller Feb 27, 1991
;
; CRC Input: Data: CRC Result:
; A 0 P A0 E4
; B 1 Q B1 F5
; C 2 R C2 G6
; D 3 S D3 H7 A0
; E 4 T B1
; F 5 U C2
; G 6 W D3
; H 7 X A0 E4
; P A0 B1 F5
; Q B1 C2 G6
; R C2 D3 H7
; S D3
; T E4 A0
; U F5 B1
; W G6 C2
; X H7 D3
;
; All bits listed in the result should be xor'ed together. Bit '0' and
bit
; 'A' are the least significant bits of incoming data. The CRC input
data
; will be changed into the CRC result by this routine. If there is some
; reason that the previous CRC data must be saved, it must be done by
the
; calling routine. (The same thing goes for the 'psw' register.)
;
; This routine uses 28 instruction cycles per call.
;
;------------------------------------------------------------------------
lo equ 40h
hi equ 41h

ccitt:
push acc ; save some state information for caller
xrl a, lo ; data bits always appear with these crc
bits
mov lo, hi ; start accumulating crc information
mov hi, a

anl a, #0fh
swap a ; add this nibble to the crc hi byte
xrl a, hi
mov hi, a ; this is used everywhere

swap a
anl a, #0fh ; take the hi nibble just made
xrl lo, a ; and add it to the low byte

mov a, hi
rr a ; now shift everything 3 bits
swap a
push acc ; and save it for later
anl a, #0f8h ; take these 5 bits
xrl lo, a ; and finish of the lo byte with them

pop acc ; get the shifted byte back
anl a, #07h ; take 3 of the bits this time
xrl hi, a ; and finish the hi byte
pop acc ; restore this for the caller
ret ; and return the results

end

Now for all the people who didn't see the original CRC16 code with
comments here it is, in hopes that the comments will help a bit, more
than they will hinder. (I'd put even money on that at this point, so
if they confuse you, strip them out.) The main trick to this one is
to recognize that the real work needed is a parity calculation, and
that is readily available on the 8051. This does make it simpler
than the CCITT code, as you can see from the table, in fact, if I
hadn't done this one first, I may not have had any try at CCITT,
thinking that it was way too complicated to be worth the bother.

;------------------------------------------------------------------------
;
; A DS5000 subroutine to generate CRC16 data
;
; CRC Input: Data: CRC Result:
; A 0 P ABCDEFGH01234567
; B 1 Q
; C 2 R
; D 3 S
; E 4 T
; F 5 U
; G 6 W A0
; H 7 X A0 B1
; P B1 C2
; Q C2 D3
; R D3 E4
; S 4F E5
; T F5 G6
; U G6 H7
; W H7 ABCDEFGH01234567
; X ABCDEFGH01234567
;
; All bits listed in the result should be xor'ed together. Bit '0' and
bit
; 'A' are the least significant bits of incoming data. The CRC input
data
; will be changed into the CRC result by this routine. If there is some
; reason that the previous CRC data must be saved, it must be done by
the
; calling routine. (The same thing goes for the 'psw' register.)
;
; Notice that the sequence ABCDEFGH01234567 is the parity of the data
and
; the low byte of the crc input. The code that follows tries to put
large
; pieces of the CRC into place, and then clean up the bits that need
fixing.
;
; This routine will execute in 23.5 cycles on average (22 best case, 25
worst)
; The 23.5 cycle time average is based on all 3 conditional branches
being taken
; half of the time. CRC's usually produce a uniform distribution of
outputs
; for any change of the input, so it should be a fair to guess that the
3
; conditional branches will, in fact, be taken half the time, although
this
; has not been verified experimentally.
;
; The variable 'hi' does not need to be bit addressable, and if memory
is so
; tight that 1 byte in the bit addressable space is not desirable and/or
; available, the three instructions that cpl a bit in 'lo' could be
changed
; to xrl instructions, at the cost of 3 bytes of code and 1.5 cycles on
; average.
;
;------------------------------------------------------------------------
lo equ 20h ; low byte of crc
hi equ 21h ; high byte of crc

crc16:
push acc ; save this in case the caller needs it

xrl a, lo
mov lo, hi ; put the high byte of the crc in its
dest.
mov hi, a ; save data xor low(crc) for later
mov c, p
jnc crc0
cpl lo.0 ; add the parity to crc bit 0
crc0:
rrc a ; get the low bit in c
jnc crc1
cpl lo.6 ; need to fix bit 6 of the result
crc1:
mov c, acc.7
xrl a, hi ; compute the results for bits P...U
rrc a ; shift them into place
mov hi, a ; and save them
jnc crc2
cpl lo.7 ; now clean up bit 7
crc2:
pop acc ; restore everything and return
ret

end

Hope you all enjoy, hope it helps too...
Jon Buller
.include lawyer.txt
.include quote.txt

Tim Hogard

unread,

Mar 15, 1996, 3:00:00 AM3/15/96

to

Ulf Soderberg (ul...@sysinno.se) wrote:
: Joel Estes wrote:
: > I believe that the table necessary for the CRC is 256 bytes, plus
: > some for the code. (Reference CRC article in Summer 1987 issue of
: > C Users Journal).

: If I don't remember wrong the table is 256 16-bit words, which would be 512 bytes.

I use two tables that are each 256 bytes. The one table could be reduced
to a bit of code if needed

I have a program to generate the CRC tables for different polynomials as well.

-tim
http://www.inmind.com/~thogard GPS, VW and Usenet topics.

Andre Roos

unread,

Mar 15, 1996, 3:00:00 AM3/15/96

to

steve.m...@ifrsys.com (Steve Meirowsky) wrote:

>The fastest code is uses a look up table to help. I'm at home and don't
>have the code with me. Send me Email.

You are right. I used a 16 bit lookup table for CCITT-CRC (256 values
by 16bit). What I did was to split the 16 bits into two seperate
tables of 256 by 8bits each. That is storing the low order bytes
together and then the high order bytes in their table immediatly
following the low-order-byte-table. The code following was my
sollution using these tables:

;****************************************************************
; CRC accum update
; Initialize:
;crc_accum = 0
;crc_accum+01H = 0
;
; Enter: ACC: char to calc CRC for
; Leave: Nothing (crc_accumulator updated)
; USED: DPTR, ACC
;******************************************************************
CRCupdate:
XRL A,crc_accum ;(1) Calc offset in crctable
MOV DPTR,#crctable ;(2)
MOV Tmp_Store,A ;(1) Using this var to save time
MOVC A,@A+DPTR ;(2) Get CRC msb
XRL A,crc_accum+01H ;(1) accum lsb
MOV crc_accum,A ;(1) Save msb of accum
INC DPH ;(1) To lsb table (NEXT PAGE)
MOV A,Tmp_Store ;(1) Crctable offset
MOVC A,@A+DPTR ;(2) Get CRC lsb
MOV crc_accum+01H,A ;(1) Save lsb of accum
;(13) TOTAL MACHINE CYCLES

As you can see it takes only 13 cycles for each byte to update the
accumulator and 512 bytes in code for the lookup table.(plus the
routine) This fast sollutiuon worked for me in an interrupt driven
serial routine.

Any comments welcome.

Andre Roos
roo...@telkom.co.za