Taking a wild guess, the CRC of a file is pretty much like getting a
hash key for a large piece of data. Thus, reducing many files all 15 Meg
in size will someday lead to a collision of the same CRC for two
different files, correct? Or has someone worked such good magic that you
can reduce a 15 Meg file to 32 chars and never get the same 32 chars...
I doubt that.
So, what standards are out there?
And which ones come with some source code?
(I am looking at, for my electronic software distribution systems, using
the byte count and CRC of a file to make sure it is the correct version
vs date/time/size which I currently use as the date/time math gets to be
a real pain when working on networks across time zones / across file
systems some of which are doing UTC time vs local time... uuggg!!!)
--
Michael Lueck
Lueck Data Systems
Remove the upper case letters NOSPAM to contact me directly.
> Michael Lueck wrote:
>
> I am looking to understand any CRC standards which may exist, C source
> code for them, etc... so I can build a CRC function in a Rexx DLL.
>
This:
- - - - -
Name A REXX interface DLL for calculating CRC-32 checksums
Version 1.00
Author Mads Orbesen Troest & SIRIUS Cybernetics
(see EMail Addresses)
Distrib. Freeware (?)
Type DLL
Price -
Source Internet
Name: rxcrc32.*
This DLL contains two functions - one to calculate
the CRC-32 of a string and one to calculate the
CRC-32 of an entire file.
- - - - -
is a snip from the extremely valuable Rexx Tips & Tricks V. 3.20 by Bernd
Schemmer, latest update Aug. 5 this year;
to be found on hobbes as rxtt32.zip
but you probably knew that already.
--
good luck
peter
>I am looking to understand any CRC standards which may exist, C source
>code for them, etc... so I can build a CRC function in a Rexx DLL.
>
>Taking a wild guess, the CRC of a file is pretty much like getting a
>hash key for a large piece of data. Thus, reducing many files all 15 Meg
>in size will someday lead to a collision of the same CRC for two
>different files, correct? Or has someone worked such good magic that you
>can reduce a 15 Meg file to 32 chars and never get the same 32 chars...
>I doubt that.
>
>So, what standards are out there?
>
>And which ones come with some source code?
There is a standard of sorts for CRC16 and CRC32.
It's a whole lot of dry math, so you'll probably want to skip right to some
source code. Here's what I use in one of my programs:
----- crc32.h -----
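#ifndef CRC32_H
#define CRC32_H
/* ULONG must be at least 32 bits wide; unsigned long guarantees that. */
#define ULONG unsigned long
#define LONG long
#define POLYNOMIAL 0x04c11db7L
/* Fill a 256-entry lookup table for the reflected CRC-32. */
void GenCRCTable(ULONG *table);
/* Fold dataLen bytes into a running CRC and return the new value. */
ULONG UpdateCRC(ULONG currCRC, ULONG *table, char *data, LONG dataLen);
/* Reverse the order of the low ch bits of ref. */
ULONG Reflect(ULONG ref, char ch);
#endif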
----- end crc32.h -----
----- crc32.c -----
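#include "crc32.h"
/* Build the 256-entry lookup table for the reflected CRC-32. */
void GenCRCTable(ULONG *table)
{
    int i, j;
    for (i = 0; i <= 0xFF; i++)
    {
        table[i] = Reflect(i, 8) << 24;
        for (j = 0; j < 8; j++)
        {
            /* Test bit 31 with an unsigned constant; 1 << 31 overflows int. */
            table[i] = (table[i] << 1) ^
                       ((table[i] & 0x80000000UL) ? POLYNOMIAL : 0);
        }
        /* Reflect() keeps only the low 32 bits, even if ULONG is wider. */
        table[i] = Reflect(table[i], 32);
    }
}
/* Fold dataLen bytes from data into the running CRC value. */
ULONG UpdateCRC(ULONG currCRC, ULONG *table, char *data, LONG dataLen)
{
    while (dataLen--)
    {
        /* Cast to unsigned char so that bytes >= 0x80 cannot sign-extend
           into an out-of-range table index. */
        currCRC = (currCRC >> 8) ^
                  table[(currCRC ^ (unsigned char)*data++) & 0xFF];
    }
    return currCRC;
}
/* Reverse the order of the low ch bits of ref. */
ULONG Reflect(ULONG ref, char ch)
{
    ULONG value = 0;
    int i;
    for (i = 1; i < (ch + 1); i++)
    {
        if (ref & 1)
        {
            /* 1UL keeps the shift unsigned when ch is 32. */
            value |= 1UL << (ch - i);
        }
        ref >>= 1;
    }
    return value;
}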
----- end crc32.c -----
Here's a small sample showing how to use the above:
#include <stdio.h>
#include <string.h>
#include "crc32.h"
int main(void)
{
    char tstring[] = "Test string for CRC32\r\n";
    ULONG table[256];
    ULONG crcval = 0xffffffff;
    GenCRCTable(table);
    crcval = UpdateCRC(crcval, table, tstring, (LONG)strlen(tstring)) ^ 0xffffffff;
    /* crcval is an unsigned long, so the format needs the l modifier. */
    printf("\nCRC32 value: %08lX\n", crcval);
    return 0;
}
The important points to remember are starting your CRC value out as
0xffffffff, and doing an exclusive OR with that same value when all of the
updates are done. The above does the entire string in one shot. For a file,
you'd call UpdateCRC() for each chunk read from the file (obviously, you
wouldn't do the XOR for each chunk - only after the last one). You can test
the above by echoing the string (sans carriage return and linefeed, which the
ECHO command will add for you) to a file, then zipping it up and looking at
the contents.
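For the file case described above, a minimal sketch might look like the following. It is not taken from the program quoted above: crc32_update and file_crc32 are hypothetical names, and the CRC is computed bit by bit with the reflected polynomial 0xEDB88320 (the bit-reversed form of 0x04c11db7) instead of through the lookup table, only to keep the example self-contained. It also returns the byte count, since the original question pairs size with CRC.

```c
#include <stdio.h>

/* Bitwise CRC-32 update using the reflected polynomial 0xEDB88320.
   Slower than a table lookup, but short enough for a sketch. */
static unsigned long crc32_update(unsigned long crc,
                                  const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1UL) ? (crc >> 1) ^ 0xEDB88320UL : (crc >> 1);
    }
    return crc & 0xFFFFFFFFUL;
}

/* Hypothetical helper: CRC-32 and byte count of a whole file, read in
   chunks. Returns 0 on success, -1 if the file cannot be opened or read. */
static int file_crc32(const char *path,
                      unsigned long *crc_out, unsigned long *size_out)
{
    unsigned char buf[4096];
    unsigned long crc = 0xFFFFFFFFUL;   /* initial value, set once */
    unsigned long size = 0;
    size_t n;
    FILE *fp = fopen(path, "rb");       /* binary mode: no CR/LF translation */

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        crc = crc32_update(crc, buf, n);    /* no XOR between chunks */
        size += (unsigned long)n;
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *crc_out = crc ^ 0xFFFFFFFFUL;      /* final XOR, applied once at the end */
    *size_out = size;
    return 0;
}
```

A quick sanity check for any CRC-32 routine of this kind: the nine ASCII bytes 123456789 should give CBF43926.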
Obviously, since there are only 2^32 possible values, a CRC32 value can
definitely apply to more than one string of bytes. The probability, however,
is very low. The polynomial for the standard was chosen to make this
probability as independent as possible from the similarity of the two byte
strings that happen to have the same CRC32 value. You can see this easily by
changing a single bit in a large file. The CRC32 changes drastically.
So, if you see two files with the same size and same CRC, the odds are
greatly in favor of them having identical contents.
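To put rough numbers on "very low", here is a small sketch of the arithmetic. It assumes CRC-32 values of unrelated files behave like uniformly random 32-bit numbers, which is an idealization, and any_collision_probability is a hypothetical name. Comparing one file against one other file is a single pair, so the chance of an accidental match is about 1 in 2^32 (roughly 1 in 4.3 billion). The chance that some two files out of a large collection share a CRC grows much faster, as in the birthday problem.

```c
/* Chance that at least two of n unrelated files share a CRC-32,
   assuming each CRC is an independent, uniformly random 32-bit value.
   Computed as 1 - product over k = 1..n-1 of (1 - k / 2^32). */
static double any_collision_probability(unsigned long n)
{
    double no_collision = 1.0;
    unsigned long k;

    for (k = 1; k < n; k++)
        no_collision *= 1.0 - (double)k / 4294967296.0;   /* 2^32 */
    return 1.0 - no_collision;
}
```

Under that assumption, any_collision_probability(2) is 1 in 2^32, while any_collision_probability(65536) comes out at roughly 39%. So a CRC plus size is a reasonable "did this file change" check, but a poor unique key for a big catalogue of files.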
--
- Mike
Remove 'spambegone.net' and reverse to send e-mail.
The CRC of a file is just that -- the CRC of _that_ file. It has
*nothing* to do with the CRC of *any other* file in the universe.
If another file should happen to have the same CRC code -- so what?
It deserves it!
Now, if you're looking for a unique ID for a file, well, pretty
much the file, itself, is the unique ID. ZIP'ping the file
will, in turn, generate another, smaller unique ID (the ZIP file)
that will be singularly related to the original _when_ 'viewed'
through the unzip program.
Jonesy
--
| Marvin L Jones | jonz | W3DHJ | OS/2
| Gunnison, Colorado | @ | Jonesy | linux __
| 7,703' -- 2,345m | frontier.net | DM68mn SK
Allodoxaphobia wrote:
> The CRC of a file is just that -- the CRC of _that_ file. It has
> *nothing* to do with the CRC of *any other* file in the universe.
>
> If another file should happen to have the same CRC code -- so what?
> It deserves it!
>
> Now, if you're looking for a unique ID for a file, well, pretty
> much the file, itself, is the unique ID. ZIP'ping the file
> will, in turn, generate another, smaller unique ID (the ZIP file)
> that will be singularly related to the original _when_ 'viewed'
> through the unzip program.
More my angle was on "if I use a size and CRC to detect when a file is
different and thus needs updating" how likely would it be to, for example,
take a 15MB text file, flip a couple of chars around inside the file (spell
checking, let's say) and have neither the byte count nor the CRC change. If the
odds are next to nil, then it's good enough for me.
In article <3B9AD1A9...@SlueckPdataAsystemsM.com>,
Nmlu...@SlueckPdataAsystemsM.com says...
[snip]
>More my angle was on "if I use a size and CRC to detect when a file is
>different and thus needs updating" how likely would it be to, for example,
>take a 15MB text file, flip a couple of chars around inside the file (spell
>checking, let's say) and have neither the byte count nor the CRC change. If the
>odds are next to nil, then it's good enough for me.
The common CRCs (cyclic redundancy checks), CRC-16 and CRC-32, are NOT designed to
detect arbitrary malicious changes in an arbitrary file. They were designed to
detect common transmission errors over noisy links.
I think you are looking for a cryptographic hash function like MD5 or SHA1.
MD5 has sample code in RFC1321. SHA1 code is available many places on the
net.
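One way to see why a CRC offers no protection against deliberate changes: CRC-32 is affine over XOR. For three buffers a, b and c of the same length, crc(a ^ b ^ c) equals crc(a) ^ crc(b) ^ crc(c). Because of that structure, someone who wants a particular CRC can work out the bytes to patch instead of searching for them, which is not possible with a cryptographic hash. The sketch below only checks the identity; crc32_buf and crc32_is_affine are hypothetical names, and the CRC is the usual reflected CRC-32 computed bit by bit.

```c
#include <stddef.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF). */
static unsigned long crc32_buf(const unsigned char *buf, size_t len)
{
    unsigned long crc = 0xFFFFFFFFUL;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc & 1UL) ? (crc >> 1) ^ 0xEDB88320UL : (crc >> 1);
    }
    return (crc ^ 0xFFFFFFFFUL) & 0xFFFFFFFFUL;
}

/* Returns 1 if crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c) for three
   buffers of the same length. Returns 0 if the identity fails or if len
   exceeds the 64-byte scratch buffer used by this sketch. */
static int crc32_is_affine(const unsigned char *a, const unsigned char *b,
                           const unsigned char *c, size_t len)
{
    unsigned char mixed[64];
    size_t i;

    if (len > sizeof mixed)
        return 0;
    for (i = 0; i < len; i++)
        mixed[i] = (unsigned char)(a[i] ^ b[i] ^ c[i]);
    return crc32_buf(mixed, len) ==
           (crc32_buf(a, len) ^ crc32_buf(b, len) ^ crc32_buf(c, len));
}
```

None of this matters for catching accidental corruption or an ordinary edit, which is the job a CRC was built for. It only matters when someone is actively trying to slip a modified file past the check.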
--
PGP key available from the key servers.
Key fingerprint 95 F4 D3 94 66 BA 92 4E 06 1E 95 F8 74 A8 2F A0
Michael,
As already mentioned, see the RFCs for MD4 and MD5:
ftp://ftp.isi.edu/in-notes/rfc1320.txt
ftp://ftp.isi.edu/in-notes/rfc1321.txt
or
http://www.rfc-editor.org/rfc/rfc1320.txt
http://www.rfc-editor.org/rfc/rfc1321.txt
Also, for an OS/2 implementation see Daniel Hellerstein's rexx_md5:
http://www.srehttp.org/apps/rexx_md5/
- Peter
Brad
"Carl Byington" <ca...@five-ten-sg.com> wrote in message
news:9nk21r$qvv$2...@la-mail4.digilink.net...