
CRC on Files


Michael Lueck

Sep 7, 2001, 9:51:30 PM
I am looking to understand any CRC standards which may exist, C source
code for them, etc... so I can build a CRC function in a Rexx DLL.

Taking a wild guess, the CRC of a file is pretty much like getting a
hash key for a large piece of data. Thus, reducing many files all 15 Meg
in size will someday lead to a collision of the same CRC for two
different files, correct? Or has someone worked such good magic that you
can reduce a 15 Meg file to 32 chars and never get the same 32 chars...
I doubt that.

So, what standards are out there?

And which ones come with some source code.

(I am looking at, for my electronic software distribution systems, using
the byte count and CRC of a file to make sure it is the correct version
vs date/time/size which I currently use as the date/time math gets to be
a real pain when working on networks across time zones / across file
systems some of which are doing UTC time vs local time... uuggg!!!)

--
Michael Lueck
Lueck Data Systems

Remove the upper case letters NOSPAM to contact me directly.


peter volsted

Sep 8, 2001, 7:09:45 AM

hi

> Michael Lueck wrote:
>
> I am looking to understand any CRC standards which may exist, C source
> code for them, etc... so I can build a CRC function in a Rexx DLL.
>

This:
- - - - -
Name      A REXX interface DLL for calculating CRC-32 checksums
Version   1.00
Author    Mads Orbesen Troest & SIRIUS Cybernetics
          (see EMail Addresses)
Distrib.  Freeware (?)
Type      DLL
Price     -
Source    Internet
          Name: rxcrc32.*


This DLL contains two functions - one to calculate
the CRC-32 of a string and one to calculate the
CRC-32 of an entire file.
- - - - -
is a snip from the extremely valuable Rexx Tips & Tricks V. 3.20 by Bernd
Schemmer, latest update Aug. 5 this year,
to be found on hobbes as rxtt32.zip,
but you probably knew that already.


--
good luck

peter

Mike Ruskai

Sep 8, 2001, 11:30:05 AM
On Fri, 07 Sep 2001 21:51:30 -0400, Michael Lueck wrote:

>I am looking to understand any CRC standards which may exist, C source
>code for them, etc... so I can build a CRC function in a Rexx DLL.
>
>Taking a wild guess, the CRC of a file is pretty much like getting a
>hash key for a large piece of data. Thus, reducing many files all 15 Meg
>in size will someday lead to a collision of the same CRC for two
>different files, correct? Or has someone worked such good magic that you
>can reduce a 15 Meg file to 32 chars and never get the same 32 chars...
>I doubt that.
>
>So, what standards are out there?
>
>And which ones come with some source code.

There is a standard of sorts for CRC16 and CRC32.

It's a whole lot of dry math, so you'll probably want to skip right to some
source code. Here's what I use in one of my programs:

----- crc32.h -----
#ifndef __CRC32_H__
#define __CRC32_H__

#define ULONG unsigned long
#define LONG long

#define POLYNOMIAL 0x04c11db7L
void GenCRCTable(ULONG *table);
ULONG UpdateCRC(ULONG currCRC, ULONG *table, char *data, LONG dataLen);
ULONG Reflect(ULONG ref, char ch);

#endif
----- end crc32.h -----

----- crc32.c -----
#include "crc32.h"

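void GenCRCTable(ULONG *table)
{
    int i, j;

    for (i = 0; i <= 0xFF; i++)
    {
        /* Start from the bit-reversed index, placed in the top byte. */
        table[i] = Reflect(i, 8) << 24;
        for (j = 0; j < 8; j++)
        {
            /* 1UL keeps the bit-31 test unsigned; a plain 1 << 31 overflows int. */
            table[i] = (table[i] << 1) ^
                       ((table[i] & (1UL << 31)) ? POLYNOMIAL : 0);
        }
        table[i] = Reflect(table[i], 32);
    }
}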

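ULONG UpdateCRC(ULONG currCRC, ULONG *table, char *data, LONG dataLen)
{
    while (dataLen--)
    {
        /* Cast to unsigned char: where char is signed, a byte of 0x80 or
           above would sign-extend and index outside the 256-entry table. */
        currCRC = (currCRC >> 8) ^
                  table[(currCRC & 0xFF) ^ (unsigned char)*data++];
    }

    return currCRC;
}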

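ULONG Reflect(ULONG ref, char ch)
{
    ULONG value = 0;
    int i;

    /* Reverse the order of the low ch bits of ref. */
    for (i = 1; i < (ch + 1); i++)
    {
        if (ref & 1)
        {
            /* 1UL: with ch == 32 a plain int 1 would be shifted into the sign bit. */
            value |= 1UL << (ch - i);
        }
        ref >>= 1;
    }

    return value;
}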
----- end crc32.c -----

Here's a small sample showing how to use the above:

#include <stdio.h>
#include <string.h>
#include "crc32.h"

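int main(void)
{
    char tstring[] = "Test string for CRC32\r\n";
    ULONG table[256];
    ULONG crcval = 0xffffffff;

    GenCRCTable(table);

    /* Initial value 0xffffffff going in, final XOR with 0xffffffff coming out. */
    crcval = UpdateCRC(crcval, table, tstring, (LONG)strlen(tstring)) ^ 0xffffffff;

    /* crcval is an unsigned long, so the format needs the l length modifier. */
    printf("\nCRC32 value: %08lX\n", crcval);

    return 0;
}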

The important points to remember are starting your CRC value out as
0xffffffff, and doing an exclusive OR with that same value when all of the
updates are done. The above does the entire string in one shot. For a file,
you'd call UpdateCRC() for each chunk read from the file (obviously, you
wouldn't do the XOR for each chunk - only after the last one). You can test
the above by echoing the string (sans carriage return and linefeed, which the
ECHO command will add for you) to a file, then zipping it up and looking at
the contents.
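
The chunk-by-chunk approach for a file can be sketched as follows. This is a minimal sketch, not the code from the post: it uses a bit-at-a-time update with the reflected constant 0xEDB88320 (the bit-reversed form of 0x04c11db7) in place of the table, and the names crc32_update and crc32_file, the 4 KB chunk size and the 0 / -1 return convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Bit-at-a-time CRC-32 update (reflected polynomial 0xEDB88320).
   Slower than a table lookup, but short enough to check by eye.
   Takes and returns the running value; no initial or final XOR here. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Hypothetical helper: CRC-32 of a whole file, read in 4 KB chunks.
   Returns 0 and stores the result in *out, or returns -1 on an I/O error. */
static int crc32_file(const char *path, uint32_t *out)
{
    unsigned char chunk[4096];
    uint32_t crc = 0xFFFFFFFFu;            /* initial value, set once */
    size_t n;
    FILE *fp;

    assert(path != NULL && out != NULL);
    fp = fopen(path, "rb");                /* binary mode: no CR/LF translation */
    if (fp == NULL)
        return -1;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        crc = crc32_update(crc, chunk, n); /* no XOR between chunks */

    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    *out = crc ^ 0xFFFFFFFFu;              /* final XOR, once, after the last chunk */
    return 0;
}
```

Opening the file in binary mode matters on OS/2 and Win32, where text mode would translate line endings and change the bytes being summed. A caller would pass a file name and a pointer to a uint32_t, and compare the stored value against the CRC recorded for the expected version.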

Obviously, since there are only 2^32 possible values, a CRC32 value can
definitely apply to more than one string of bytes. The probability, however,
is very low: for two unrelated byte strings it is about 1 in 2^32, or roughly
1 in 4.3 billion. The polynomial for the standard was chosen to make this
probability as independent as possible from the similarity of the two byte
strings that happen to have the same CRC32 value. You can see this easily by
changing a single bit in a large file. The CRC32 changes drastically.

So, if you see two files with the same size and same CRC, the odds are
greatly in favor of them having identical contents.
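
The single-bit experiment can also be tried in memory. The sketch below is an illustration under assumptions, not code from the post: crc32_buf is a bit-at-a-time CRC-32 using the reflected constant 0xEDB88320, and crc_bits_changed_by_flip is a hypothetical helper that flips one bit of a buffer, recomputes the CRC, and counts how many of the 32 CRC bits changed. For a single flipped bit the count is never zero, because a one-bit difference is not divisible by the CRC polynomial.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Bit-at-a-time CRC-32 of a whole buffer, including the initial
   0xFFFFFFFF value and the final XOR. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: flip bit number `bit` of buf (bit 0 is the low bit
   of buf[0]), and return how many CRC bits differ from the original CRC.
   The buffer is restored before returning. */
static int crc_bits_changed_by_flip(unsigned char *buf, size_t len, size_t bit)
{
    uint32_t before, after, diff;
    int count = 0;

    assert(bit < len * 8);

    before = crc32_buf(buf, len);
    buf[bit / 8] ^= (unsigned char)(1u << (bit % 8));   /* flip one bit */
    after = crc32_buf(buf, len);
    buf[bit / 8] ^= (unsigned char)(1u << (bit % 8));   /* put it back */

    /* Count the set bits in the XOR of the two CRC values. */
    for (diff = before ^ after; diff != 0; diff >>= 1)
        count += (int)(diff & 1u);
    return count;
}
```

The same reasoning covers any change confined to 32 consecutive bits or fewer, such as two adjacent characters swapped: CRC-32 always detects those. Larger or scattered changes fall back on the roughly 1 in 2^32 odds.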


--
- Mike

Remove 'spambegone.net' and reverse to send e-mail.


Allodoxaphobia

Sep 8, 2001, 9:02:45 PM
On Fri, 07 Sep 2001 21:51:30 -0400, Michael Lueck scribbled:

> I am looking to understand any CRC standards which may exist, C source
> code for them, etc... so I can build a CRC function in a Rexx DLL.
>
> Taking a wild guess, the CRC of a file is pretty much like getting a
> hash key for a large piece of data. Thus, reducing many files all 15 Meg
> in size will someday lead to a collision of the same CRC for two
> different files, correct? Or has someone worked such good magic that you
> can reduce a 15 Meg file to 32 chars and never get the same 32 chars...
> I doubt that.

The CRC of a file is just that -- the CRC of _that_ file. It has
*nothing* to do with the CRC of *any other* file in the universe.

If another file should happen to have the same CRC code -- so what?
It deserves it!

Now, if you're looking for a unique ID for a file, well, pretty
much the file, itself, is the unique ID. ZIP'ping the file
will, in turn, generate another, smaller unique ID (the ZIP file)
that will be singularly related to the original _when_ 'viewed'
through the unzip program.

Jonesy
--
| Marvin L Jones | jonz | W3DHJ | OS/2
| Gunnison, Colorado | @ | Jonesy | linux __
| 7,703' -- 2,345m | frontier.net | DM68mn SK

Michael Lueck

Sep 8, 2001, 10:19:21 PM

Allodoxaphobia wrote:

> The CRC of a file is just that -- the CRC of _that_ file. It has
> *nothing* to do with the CRC of *any other* file in the universe.
>
> If another file should happen to have the same CRC code -- so what?
> It deserves it!
>
> Now, if you're looking for a unique ID for a file, well, pretty
> much the file, itself, is the unique ID. ZIP'ping the file
> will, in turn, generate another, smaller unique ID (the ZIP file)
> that will be singularly related to the original _when_ 'viewed'
> through the unzip program.

More my angle was on "if I use a size and CRC to detect when a file is
different and thus needs updating": how likely would it be to, for example,
take a 15MB text file, flip a couple of chars around inside the file (spell
checking, let's say) and have neither the byte count nor the CRC change? If
it is less than nil then it's good enough for me.

Michael Lueck

Sep 8, 2001, 10:20:49 PM
Thanks for the sample code, Mike. Now to hold my gripes of C/C++ long enough to
figure out how to turn that code into a working DLL for Win32 first and maybe
OS/2.

Carl Byington

Sep 10, 2001, 11:56:43 PM
-----BEGIN PGP SIGNED MESSAGE-----

In article <3B9AD1A9...@SlueckPdataAsystemsM.com>,
Nmlu...@SlueckPdataAsystemsM.com says...

[snip]

>More my angle was on "if I use a size and CRC to detect when a file is
>different and thus needs updating": how likely would it be to, for example,
>take a 15MB text file, flip a couple of chars around inside the file (spell
>checking, let's say) and have neither the byte count nor the CRC change? If
>it is less than nil then it's good enough for me.

The common CRCs (cyclic redundancy checks), CRC-16 and CRC-32, are NOT
designed to detect arbitrary malicious changes in an arbitrary file. They
were designed to detect common transmission errors over noisy links.

I think you are looking for a cryptographic hash function like MD5 or SHA1.
MD5 has sample code in RFC1321. SHA1 code is available many places on the
net.
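
One way to see why a CRC offers no protection against deliberate changes is its XOR structure: for three buffers of the same length, crc(a) ^ crc(b) ^ crc(c) equals the CRC of a ^ b ^ c. The sketch below checks that relation. It is an illustration only: crc32_buf is a bit-at-a-time CRC-32 with the reflected constant 0xEDB88320, and crc32_xor_relation_holds and its 64-byte limit are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Bit-at-a-time CRC-32 of a whole buffer, including the initial
   0xFFFFFFFF value and the final XOR. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: returns 1 if crc(a) ^ crc(b) ^ crc(c) equals the
   CRC of the byte-wise XOR of a, b and c, and 0 otherwise.
   All three buffers must be len bytes long, with len at most 64. */
static int crc32_xor_relation_holds(const unsigned char *a,
                                    const unsigned char *b,
                                    const unsigned char *c,
                                    size_t len)
{
    unsigned char mixed[64];
    size_t i;

    assert(len <= sizeof mixed);

    for (i = 0; i < len; i++)
        mixed[i] = (unsigned char)(a[i] ^ b[i] ^ c[i]);

    return (crc32_buf(a, len) ^ crc32_buf(b, len) ^ crc32_buf(c, len))
           == crc32_buf(mixed, len);
}
```

Because the CRC of a combination can be predicted from the CRCs of its parts, someone who wants to alter a file can work out compensating bytes that leave the CRC unchanged. MD5 and SHA1 were designed so that no such shortcut is known to be practical for an arbitrary given file.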


- --
PGP key available from the key servers.
Key fingerprint 95 F4 D3 94 66 BA 92 4E 06 1E 95 F8 74 A8 2F A0

-----BEGIN PGP SIGNATURE-----
Version: 4.5

iQCVAgUBO52LOtZjPoeWO7BhAQFhGwP8DnaX6VtemEU4QolMmoPAkUbSIY39lGwP
zm2cvVrErgHqXFYVizLuxnVm1Cr5GjIp7j4PYyMoAynbNdVM6jyaJkl/L9LK+MNj
81AxHEwmQMImOTpqfITzvYYcfTkvgkLEOcrDoq8FjwsIYeyt4TiffCzdVfXuv9eA
Z3mbHVjEx34=
=CWzZ
-----END PGP SIGNATURE-----

Peter Skye

Sep 18, 2001, 3:04:14 AM
Michael Lueck wrote:
>
> More my angle was on "if I use a size and CRC to detect
> when a file is different and thus needs updating": how
> likely would it be to, for example, take a 15MB text file,
> flip a couple of chars around inside the file (spell
> checking, let's say) and have neither the byte count nor the
> CRC change? If it is less than nil then it's good enough for me.

Michael,

As already mentioned, see the RFCs for MD4 and MD5:

ftp://ftp.isi.edu/in-notes/rfc1320.txt
ftp://ftp.isi.edu/in-notes/rfc1321.txt
or
http://www.rfc-editor.org/rfc/rfc1320.txt
http://www.rfc-editor.org/rfc/rfc1321.txt

Also, for an OS/2 implementation see Daniel Hellerstein's rexx_md5:

http://www.srehttp.org/apps/rexx_md5/

- Peter

Brad Van Duser

Sep 19, 2001, 8:36:52 AM
IBM recently developed a new MVS software distribution function for the
mainframe that utilizes a similar hash calculation to validate that the file
delivered to your mainframe over the 'net is the same file you ordered. For
guys who know what SMP/E is on MVS, it is a "network receive" function.
Basically, you order a fix or upgrade through a web interface; then, when you
run a small job on your mainframe, it pulls down the zipped file along with
an XML packing list that I think contains the hash. The hash is then
computed for the unzipped file and compared to the packing list. It
requires that hardware cryptographic services (ICSF) be active on your
processor; otherwise you have to disable the hash check for that receive.

Brad


"Carl Byington" <ca...@five-ten-sg.com> wrote in message
news:9nk21r$qvv$2...@la-mail4.digilink.net...
