
When is regular CRC32 better or worse than CRC32-C


Dex

Feb 24, 2015, 12:03:36 PM

I've been reading about CRC all day, but I'm still a bit fuzzy about one
thing, so I'm asking here.

Regarding CRC32, the polynomial 0x82608edb (Koopman notation; 0x104c11db7
with the x^32 term written out) has been commonly used (ZIP, IEEE 802.3,
etc.), whereas the 'newer' CRC32C Castagnoli poly, 0x8f6e37a0 (or
0x11edc6f41 in full), is the one implemented in the SSE4 instruction set.
So seemingly Intel at least suggests that CRC32C is "better" in some sense
for long data?
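
For reference, a minimal bit-at-a-time sketch in C showing that, in code, the
two checksums differ only in the polynomial constant (written here in
reflected form: 0xEDB88320 for CRC-32, 0x82F63B78 for CRC-32C); the function
name and test string are just illustrative.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POLY_CRC32  0xEDB88320u          /* reflected 0x04C11DB7 (ZIP / 802.3) */
#define POLY_CRC32C 0x82F63B78u          /* reflected 0x1EDC6F41 (Castagnoli)  */

/* reflected, table-less CRC with the standard init and final XOR */
static uint32_t crc32_generic(uint32_t poly, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ poly : (crc >> 1);
    }
    return ~crc;
}

int main(void)
{
    const char *msg = "123456789";       /* common CRC check vector */
    printf("CRC-32 : %08x\n", crc32_generic(POLY_CRC32,  msg, strlen(msg)));  /* cbf43926 */
    printf("CRC-32C: %08x\n", crc32_generic(POLY_CRC32C, msg, strlen(msg)));  /* e3069283 */
    return 0;
}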

The Hamming weights of each are here:
http://users.ece.cmu.edu/~koopman/crc/hw_data.html
CRC-32: *4294967263,91606,2974,268,171,91,34,21,12,10,10,10,10
CRC32C: *2147483615,*2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6

From those numbers, can somebody tell me when the regular CRC poly is
"better" at error detection than CRC32C, and vice versa? For example, if a
file is 1 GB? Or 3 GB? Or 64 bytes?

Many thanks in advance




netalat

Feb 25, 2015, 10:11:52 AM

CRC-32 is just one specific polynomial out of the many prime (irreducible)
polynomials that are 32 bits long; any of the others could be used.

The metric here is the probability of undetected error (PuE), where one uses
the weight set (A_i) as coefficients of the binomial expansion:

  PuE = sum over i = dmin..n of [ A_i * p^i * (1-p)^(n-i) ]

(detection only; the upper bound on A_i is C(n,i), and n is the word length)
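
A minimal sketch of that sum in C, assuming the weight set A_i for the code
is already known from somewhere else. Real CRC weight sets have to be
measured or looked up, so the toy values below are for the (8,7)
single-parity-check code, whose undetectable error patterns are exactly the
even-weight ones (A_i = C(8,i) for even i); everything here is illustrative
only.

#include <math.h>
#include <stdio.h>

/* PuE = sum_{i=dmin..n} A[i] * p^i * (1-p)^(n-i), for bit error probability p
 * on a binary symmetric channel; A[i] counts undetectable patterns of weight i. */
static double prob_undetected(const double *A, int dmin, int n, double p)
{
    double pue = 0.0;
    for (int i = dmin; i <= n; i++)
        pue += A[i] * pow(p, i) * pow(1.0 - p, n - i);
    return pue;
}

int main(void)
{
    /* toy weight set: (8,7) single parity check, undetectable = even weight */
    double A[9] = { 0, 0, 28, 0, 70, 0, 28, 0, 1 };
    printf("PuE = %g\n", prob_undetected(A, 2, 8, 1e-3));
    return 0;
}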

Optimization depends on your most common error patterns, and you choose
the CRC which can detect most of those.

For example, if it was 8 bits in error out of 1024, you would try to choose
a code with a small A_8.

I am rusty on this, but I don't think the Hamming weight set is the same as
the PuE weight set for a given poly.

Check out this deck around slide 21:
http://www.slideshare.net/FranJEscribano/introduction-to-channel-coding-15682765

There is not a whole lot around on calculating PuE. For some codes you can;
for many you cannot. I think with CRCs you cannot in general, so one must go
and measure them. For linear block codes the weight set is determined.

Again, it depends upon the types and number of errors within a word length.
A 32-bit CRC over a whole 1 Gbit file is not good; but if the file is
composed of 1024-bit words, each carrying its own 32-bit CRC, that is much
better.
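
A minimal sketch of that per-block layout in C, assuming fixed 1024-bit
(128-byte) blocks and using zlib's crc32() for the per-block checksum (link
with -lz); the struct and function names are illustrative, not any particular
format.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define BLOCK_BYTES 128                  /* 1024 bits of payload per block */

struct block {
    uint8_t  data[BLOCK_BYTES];
    uint32_t crc;                        /* CRC-32 over data[] only */
};

static void seal_block(struct block *b)
{
    b->crc = (uint32_t)crc32(0L, b->data, BLOCK_BYTES);
}

static int check_block(const struct block *b)
{
    return (uint32_t)crc32(0L, b->data, BLOCK_BYTES) == b->crc;
}

int main(void)
{
    struct block b;
    memset(b.data, 0xA5, sizeof b.data);
    seal_block(&b);
    printf("block ok: %d\n", check_block(&b));        /* expect 1 */
    b.data[7] ^= 0x01;                                /* flip one bit */
    printf("after bit flip: %d\n", check_block(&b));  /* expect 0 */
    return 0;
}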


Dex

Feb 25, 2015, 12:48:26 PM

Or perhaps another way of asking the question: if WinZip was invented today,
would they use CRC32 or CRC32C, and why?

Dex

Feb 25, 2015, 2:17:11 PM

> Optimization depends on your most common error patterns, and you choose
> the CRC which can detect most of those.

That's part of the problem with my question, though ... I'm not sure "most
common error patterns" applies (?), as it can be a buffer of any size and any
data ... that's why I'm wondering if anyone can explain in more detail
exactly why Intel have gone with Castagnoli's poly over the far more common
(granted, also longer-known/implemented) "Zip poly" :)

Dex

Feb 25, 2015, 2:32:07 PM

From https://software.intel.com/en-us/forums/topic/302146 :
"I worked in the lab team that proposed the CRC32 instruction for Nehalem.
CRC32 uses a fixed polynomial to make the instruction a stateless two
operand instruction, minimize instruction latency and minimize gate count.
Castagnoli showed that this polynomial has nice mathematical properties that
make it optimal vs. other choices for data sets less than 64KB and very
nearly optimal in other cases. This poly is used for some networking
protocols such as RDMA and SCTP that are difficult to offload in hardware."
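
For what it's worth, a minimal sketch of driving that instruction through the
SSE4.2 intrinsic _mm_crc32_u8 (byte-at-a-time for simplicity; wider steps
such as _mm_crc32_u64 exist). The instruction does the raw reflected CRC-32C
update, so the usual all-ones init and final inversion are applied around it
here. Compile with -msse4.2 on an SSE4.2-capable CPU; names are illustrative.

#include <nmmintrin.h>                   /* SSE4.2 CRC32 intrinsics */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static uint32_t crc32c_hw(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;          /* standard CRC-32C init value */
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);   /* hardware update, one byte per call */
    return ~crc;                         /* standard final XOR */
}

int main(void)
{
    const char *msg = "123456789";       /* common check vector */
    printf("CRC-32C: %08x\n", crc32c_hw(msg, strlen(msg)));  /* expect e3069283 */
    return 0;
}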

nitialt

Feb 25, 2015, 8:35:03 PM

On 2/25/2015 1:31 PM, Dex wrote:
> From https://software.intel.com/en-us/forums/topic/302146 :
> "I worked in the lab team that proposed the CRC32 instruction for Nehalem.
> CRC32 uses a fixed polynomial to make the instruction a stateless two
> operand instruction, minimize instruction latency and minimize gate count.
> Castagnoli showed that this polynomial has nice mathematical properties
> that make it optimal vs. other choices for data sets less than 64KB and
> very nearly optimal in other cases. This poly is used for some networking
> protocols such as RDMA and SCTP that are difficult to offload in hardware."

CRC32 is the name of one specific binary prime poly out of a hundred or more
that are 32 bits long. I think there is another one used a lot. There must
have been committee meetings on it; I am not sure which organization came up
with the (two) standards for the industry.

All the binary prime polys are easy to do in hardware nowadays; back in the
old days, one would try to minimize gate count.

But there isn't a clear choice of one poly over another unless the error
statistics are known, and the weight set for the probability of undetected
error is known.

Later in that same forum, they say:
"CRC32c has been adopted by newer protocols that are attempting to be
Jumbo frame compliant. The CRC-IEEE 802.3 polynomial has a smaller
Hamming distance where error detection rates drop off beyond the 12,176
bits (1522 bytes) indicated by the maximum frame size set in 802.3
standards."

But see, they assume that errors spanning more than 12,176 bits are the
distinction on which to choose between codes. That is only one of several
error sources or error patterns that can exist, and it is unusual to choose
a code for detecting errors of that length. Typically one uses multiple
codes, e.g. a short linear block code over 32 to 2048 bits and then a CRC
across multiple blocks; there are lots of ways. It depends upon the error
stats of the comm channel. The opinion stated above runs contrary to having
a small number of errors. So either they have another short code for small
numbers of errors, or they only ever get large numbers of errors, or they
made a mistake.

On your WinZip question, I think you can buy chips that come with CRC32
already hardware encoded, or get it as a call in a SW/HW function.

But you can still put another type of 32-bit CRC poly on the block lengths
you want as data comes into WinZip, and decode it as it goes out. However,
you get better error detection with longer polys, like a 64-bit one.

Think of the CRC as simply dividing the dataword set into 2^64 sets for a
64-bit CRC, or 2^32 sets for a 32-bit one; simply a lot more sets. All the
codewords in a particular set have the same CRC checksum, but the codewords
differ from one another by a distance d or greater.

nitialt

Feb 25, 2015, 8:36:24 PM

On 2/25/2015 1:16 PM, Dex wrote:
>> Optimization depends on your most common error patterns, and you choose
>> the CRC which can detect most of those.
>
> That's part of the problem with my question, though ... I'm not sure "most
> common error patterns" applies (?), as it can be a buffer of any size and
> any data ... that's why I'm wondering if anyone can explain in more detail
> exactly why Intel have gone with Castagnoli's poly over the far more common
> (granted, also longer-known/implemented) "Zip poly" :)
>

that would be interesting to find out!

Dex

Feb 25, 2015, 9:10:40 PM

Thank you very much, that is getting a lot closer to the info/answer that I'm
after :)
Now let me take some time to re-read and process it some more!