
Checksum not working for some files

Alexandru

Aug 29, 2019, 12:27:46 PM
Hi,

I have a C extension for Tcl that computes the checksum (md5 hash) of a given file.

I just noticed that for some files the function returns the wrong value.

I have a hunch that it might be due to a wrong EOF marker or something similar.

Here is the function:
https://www.meshparts.de/download/checksum.c

Here is the binary file, for which checksum fails:
https://www.meshparts.de/download/FS-10022835.sldprt

Can anybody help solve this (and possibly test the function)?

Many thanks.
Alexandru


Nicolas

Aug 29, 2019, 12:41:55 PM
Hi,
did you try with:
f = _wfopen(FileName, L"rb");

++

Rich

Aug 29, 2019, 12:42:23 PM
Alexandru <alexandr...@meshparts.de> wrote:
> Hi,
>
> I have a C extension for Tcl that computes the checksum (md5 hash) of
> a given file.
>
> I just noticed that for some files the function returns the wrong
> value.
>
> I have a hunch that it might be due to a wrong EOF marker or something
> similar.

First thing that sticks out at me:

fd = _wfopen(FileName,L"rb, ccs=UNICODE");

If the ccs=UNICODE overrides the "b" part, then there is your problem.

MD5 performs a checksum on the raw binary content of a file. You need
to read the raw binary untranslated bytes out and feed them into the
md5 algorithm.

No translations (eol, eof, utf-X, etc.) of anything should be
happening.
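
A minimal sketch of such an untranslated read in C. The helper name for_each_raw_chunk and the callback type are hypothetical (they are not taken from checksum.c), and plain fopen is used instead of _wfopen so that the sketch compiles on any platform:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: receives each chunk exactly as stored on disk. */
typedef void (*chunk_fn)(const unsigned char *data, size_t len, void *ctx);

/* Reads `path` in binary mode and hands every chunk to `consume`.
   Returns 0 on success, -1 if the file cannot be opened or a read fails. */
static int for_each_raw_chunk(const char *path, chunk_fn consume, void *ctx)
{
    unsigned char buf[4096];
    size_t n;
    FILE *f;
    int rc;

    assert(path != NULL && consume != NULL);

    /* "rb": the "b" disables CR/LF and Ctrl-Z translation on Windows. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* fread returns the number of bytes actually read; 0 ends the loop. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        consume(buf, n, ctx);

    rc = ferror(f) ? -1 : 0;
    fclose(f);
    return rc;
}

/* Example consumer: only counts the bytes it is given. */
static void count_bytes(const unsigned char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}
```

In a real checksum function the hash update step would take the place of count_bytes. If the count reported this way differs from the file size on disk, some translation is still happening.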

Christian Gollwitzer

Aug 30, 2019, 12:13:18 AM
On 29.08.19 at 18:27, Alexandru wrote:
How do you know what's right or wrong? Please post, at a minimum, both
MD5 hashes for this file so that we can compare them to the value that we
get from this code.

Christian

Alexandru

Aug 30, 2019, 1:41:13 AM
I just tested that: No change.

Alexandru

Aug 30, 2019, 1:45:22 AM
The hash looks very sparse (lots of zeros), and it is identical for many different files whose content looks similar in a text editor. The files are similar but not identical, yet the hash is the same. I then computed the MD5 hash with openssl.exe and with an online tool and got a different, more realistic-looking result, so I'm sure the openssl.exe result is correct. I cannot generate the hashes right now, but I hope you also have openssl or something similar.

Thanks!

Alexandru

Aug 30, 2019, 1:46:30 AM
See my previous answer. Removing ccs=UNICODE made no difference.

s.effe...@googlemail.com

Aug 30, 2019, 8:52:11 AM
Alexandru,

You have an improper notion of when the algorithm is done. In the function md5 there's the test "if(i || len == 0) { done = 1; ... }". If the length of the data is a nonzero multiple of 64, that test fails and the data is not flagged as done. Your example file has a length that is a multiple of 64, so it is never flagged as done. While examining all possible paths, please note that md5 is only called once because your test file is smaller than the buffer.

I assume you introduced the error when you increased the read buffer so that all files can be read at once and the md5 function must cope with all possible cases in a single call. If you make the read buffer smaller so that at least two read actions are needed, then the md5 function will always be called with a final data set that is either not divisible by 64 or has a len == 0, and everything works fine. Go ahead, try with a read buffer of e.g. 128 (!).

About 128: Your md5 function offers a special case for padding to 128 bytes instead of 64. Is that correct?

Verdict: Your calculation of when you're done is convoluted.
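
For reference, here is a small sketch of the padding arithmetic, based on the padding rule in RFC 1321 (append one 0x80 byte, then zero bytes until the length is 56 mod 64, then an 8-byte bit count). The function names md5_padded_len and md5_final_bytes are hypothetical and do not appear in checksum.c. The sketch shows that a message whose length is a multiple of 64 still needs one more block, and that a 128-byte final step is legitimate whenever the last partial block holds 56 to 63 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total number of bytes MD5 hashes for a message of
   `len` bytes, i.e. the message plus RFC 1321 padding and length field. */
static uint64_t md5_padded_len(uint64_t len)
{
    /* Guard against overflow in the arithmetic below. */
    assert(len <= UINT64_MAX - 72);

    /* At least 1 marker byte and 8 length bytes are always appended,
       then the total is rounded up to the next multiple of 64. */
    return ((len + 8) / 64 + 1) * 64;
}

/* Hypothetical helper: bytes the finalization step still has to hash after
   all complete 64-byte blocks of the message itself have been processed. */
static unsigned md5_final_bytes(uint64_t len)
{
    /* With fewer than 56 leftover bytes, marker and length field fit into
       one block; otherwise they spill over into a second one. */
    return (len % 64 < 56) ? 64u : 128u;
}
```

Note that md5_padded_len(64) is 128, not 64: the finalization has to run even when the data ends exactly on a block boundary. Tying "done" to an explicit final call after the read loop, rather than to the length of the last chunk, avoids the problem for every buffer size.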

-- Stephan

Gerald Lester

Aug 30, 2019, 1:50:54 PM
What does the md5 from TclLib produce?

--
+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+----------------------------------------------------------------------+

Alexandru

Aug 30, 2019, 9:18:19 PM
Yes indeed, that was the bug! I reused code (see https://groups.google.com/d/msg/comp.lang.tcl/95H6I6iAMfQ/x--Bulm2AQAJ) and then increased the buffer so that the function runs faster. Reducing the buffer size solves the issue, but only temporarily, since I could generate another test case with a file smaller than the current buffer size.

I'll try to understand what the buffer operations do and modify the function. The code in the md5 function is insanely complicated...

Alexandru

Aug 30, 2019, 10:44:47 PM
Problem solved, thanks to Stephan.
Updated code: https://www.meshparts.de/download/checksum.c