Deriving partial CRCs from full

rwf_20

unread,

Feb 17, 2009, 12:08:30 PM2/17/09

to

I have the CRC of a large dataset, which is arranged as follows:
- Header (512 bytes)
- Payload (~300 MB)

I have the CRC-32 value of the full dataset, and I have the CRC-32 of
the Header.

Is there any way I can derive the CRC-32 value of the Payload from the
information I have? That is, without actually reading the Payload and
computing it? From what I know of CRC, I can't see how, but I thought
I'd pose the question in case I'm missing something.

Thanks,
Ryan

nm...@cam.ac.uk

unread,

Feb 17, 2009, 12:38:28 PM2/17/09

to

In article <0db151fd-58be-4cbe...@f33g2000vbf.googlegroups.com>,

Yes, you are, but this is probably the wrong group. It's a pure
mathematics question, though you might call it theoretical computer
science.

I should need to restudy the problem to remind myself how, so can't
tell you the details offhand. The technique is roughly to turn the
operation into a ring, with it as multiplication and exclusive or as
addition. You then work out what the header CRC would be after the
number of passes defined by the payload (O(log(N)) operations) and
exclusive or it out.

I may be misremembering, but don't think so.

Regards,
Nick Maclaren.

Terje Mathisen

unread,

Feb 17, 2009, 3:52:25 PM2/17/09

to

nm...@cam.ac.uk wrote:
> In article <0db151fd-58be-4cbe...@f33g2000vbf.googlegroups.com>,
> rwf_20 <rfr...@gmail.com> wrote:
>> I have the CRC of a large dataset, which is arranged as follows:
>> - Header (512 bytes)
>> - Payload (~300 MB)

[snip]

> I should need to restudy the problem to remind myself how, so can't
> tell you the details offhand. The technique is roughly to turn the
> operation into a ring, with it as multiplication and exclusive or as
> addition. You then work out what the header CRC would be after the
> number of passes defined by the payload (O(log(N)) operations) and
> exclusive or it out.
>
> I may be misremembering, but don't think so.

I think you're right, it should be possible to run a CRC in reverse.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Glen Herrmannsfeldt

unread,

Feb 17, 2009, 4:24:25 PM2/17/09

to

nm...@cam.ac.uk wrote:
(snip)

>>Is there any way I can derive the CRC-32 value of the Payload from the
>>information I have? That is, without actually reading the Payload and
>>computing it? From what I know of CRC, I can't see how, but I thought
>>I'd pose the question in case I'm missing something.

> Yes, you are, but this is probably the wrong group. It's a pure
> mathematics question, though you might call it theoretical computer
> science.

The problem might not be as theoretical as you say.

The easy case comes up in IP/NAT routers, since IP uses a checksum
instead of CRC. It is easy to figure out the new checksum from
the values of the bytes changed.

Consider a layer 3 ethernet/IP switch which needs to change the
value of an address of TTL value inside an ethernet frame.
It would be nice to be able to do that without completely
recomputing the CRC.

(as for c.a.arithmetic, some processors do have a CRC instruction.)

> I should need to restudy the problem to remind myself how, so can't
> tell you the details offhand. The technique is roughly to turn the
> operation into a ring, with it as multiplication and exclusive or as
> addition. You then work out what the header CRC would be after the
> number of passes defined by the payload (O(log(N)) operations) and
> exclusive or it out.

> I may be misremembering, but don't think so.

-- glen

Kai Harrekilde-Petersen

unread,

Dec 22, 2009, 2:42:45 PM12/22/09

to

Glen Herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> nm...@cam.ac.uk wrote:
> (snip)
>
>>>Is there any way I can derive the CRC-32 value of the Payload from the
>>>information I have? That is, without actually reading the Payload and
>>>computing it? From what I know of CRC, I can't see how, but I thought
>>>I'd pose the question in case I'm missing something.
>
>> Yes, you are, but this is probably the wrong group. It's a pure
>> mathematics question, though you might call it theoretical computer
>> science.
>
> The problem might not be as theoretical as you say.
>
> The easy case comes up in IP/NAT routers, since IP uses a checksum
> instead of CRC. It is easy to figure out the new checksum from
> the values of the bytes changed.
>
> Consider a layer 3 ethernet/IP switch which needs to change the
> value of an address of TTL value inside an ethernet frame.
> It would be nice to be able to do that without completely
> recomputing the CRC.
>
> (as for c.a.arithmetic, some processors do have a CRC instruction.)

As correctly remembered by Nick, the insight here is that CRCs are
linear codes. That means that if you have two datasets, A and B, then
CRC(A) xor CRC(B) = CRC(A xor B)

In the OP's case, A and B are not the same size. Not a problem, simply
add zeroes at the head/tail of A and B to make them of size(A+B).

If it's a CRC worth it's salt, the syndrome is initialized with a
non-zero value (typically all-ones) to detect pre-pended zeros in the
datastream. Again, using the linear property of the CRC can make you
work around that.

HTH,

Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>

Wolfgang Ehrhardt

unread,

Dec 23, 2009, 12:31:35 PM12/23/09

to

On Tue, 22 Dec 2009 20:42:45 +0100, Kai Harrekilde-Petersen
<k...@harrekilde.dk> wrote:

>As correctly remembered by Nick, the insight here is that CRCs are
>linear codes. That means that if you have two datasets, A and B, then
> CRC(A) xor CRC(B) = CRC(A xor B)
>
>In the OP's case, A and B are not the same size. Not a problem, simply
>add zeroes at the head/tail of A and B to make them of size(A+B).
>
>If it's a CRC worth it's salt, the syndrome is initialized with a
>non-zero value (typically all-ones) to detect pre-pended zeros in the
>datastream. Again, using the linear property of the CRC can make you
>work around that.
>

The problem is basically solved for example in zlib's crc32_combine
in the crc32.c zlib source, or see Mark Adler's comments posted to the
comp.compression and sci.crypt groups from 2008-09-17, Message-ID:
<95e1583e-3def-4931...@p10g2000prf.googlegroups.com>

Wolfgang

--
Do not use the e-mail address from the header! You can get a
valid address from http://home.netsurf.de/wolfgang.ehrhardt
(Free open source Crypto, CRC/Hash, MPArith for Pascal/Delphi)

Deriving partial CRCs from full - is it possible?

rwf_20

nm...@cam.ac.uk

Terje Mathisen

Glen Herrmannsfeldt

Kai Harrekilde-Petersen

Wolfgang Ehrhardt