Checksum not working for some files

Alexandru

Aug 29, 2019, 12:27:46 PM
Hi,

I have a C extension for Tcl that computes the checksum (md5 hash) of a given file.

I just noticed that for some files the function returns the wrong value.

I have a hunch that it might be due to a wrong EOF marker or something similar.

Here is the function:
https://www.meshparts.de/download/checksum.c

Here is the binary file, for which checksum fails:
https://www.meshparts.de/download/FS-10022835.sldprt

Can anybody help solve this (possibly by testing the function)?

Many thanks.
Alexandru


Nicolas

Aug 29, 2019, 12:41:55 PM
Hi,
Did you try:
f = _wfopen(FileName, L"rb");

++

Rich

Aug 29, 2019, 12:42:23 PM
Alexandru <alexandr...@meshparts.de> wrote:
> Hi,
>
> I have a C extension for Tcl that computes the checksum (md5 hash) of
> a given file.
>
> I just noticed that for some files the function returns the wrong
> value.
>
> I have a hunch that it might be due to a wrong EOF marker or something
> similar.

First thing that sticks out at me:

fd = _wfopen(FileName,L"rb, ccs=UNICODE");

If the ccs=UNICODE overrides the "b" part, then there is your problem.

MD5 performs a checksum on the raw binary content of a file. You need
to read the raw binary untranslated bytes out and feed them into the
md5 algorithm.

No translations (eol, eof, utf-X, etc.) of anything should be
happening.
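
Rich's point about feeding untranslated bytes to the hash can be sketched as follows. This is a minimal illustration, not code from checksum.c: `read_raw_chunks`, `chunk_sink`, `byte_collector` and `collect_bytes` are hypothetical names. It assumes the caller has already opened the file in plain binary mode, e.g. `fopen(path, "rb")` or, on Windows with wide file names, `_wfopen(FileName, L"rb")` without any `ccs=` flag. Every chunk, including the short last one, is passed on unchanged.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback: receives each chunk of raw file bytes. */
typedef void (*chunk_sink)(const unsigned char *data, size_t len, void *user);

/* Read an open stream to the end in fixed-size chunks and pass every
   chunk, including a short final one, to `sink` unchanged. The stream
   is assumed to be opened in plain binary mode ("rb", no ccs= flag),
   so CR/LF pairs and 0x1A (Ctrl-Z) bytes arrive as ordinary data.
   Returns the total number of bytes read, or -1 on a read error. */
long long read_raw_chunks(FILE *fp, chunk_sink sink, void *user) {
    unsigned char buf[4096];
    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        sink(buf, n, user);
        total += (long long)n;
    }
    return ferror(fp) ? -1 : total;
}

/* Example sink (hypothetical): copy bytes into a fixed-size buffer,
   silently dropping anything beyond its capacity. */
typedef struct { unsigned char *dst; size_t cap, len; } byte_collector;

void collect_bytes(const unsigned char *data, size_t len, void *user) {
    byte_collector *bc = user;
    for (size_t i = 0; i < len && bc->len < bc->cap; i++)
        bc->dst[bc->len++] = data[i];
}
```

A hashing sink would update the digest inside the callback instead of copying bytes. On POSIX systems the "b" flag has no effect, so this kind of translation problem shows up mainly on Windows.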

Christian Gollwitzer

Aug 30, 2019, 12:13:18 AM
On 29.08.19 at 18:27, Alexandru wrote:
How do you know what's right or wrong? Please post at a minimum both
MD5 hashes for this file so that we can compare them to the value that we
get from this code.

Christian

Alexandru

Aug 30, 2019, 1:41:13 AM
I just tested that: No change.

Alexandru

Aug 30, 2019, 1:45:22 AM
The hash looks very sparse (lots of zeros), and it is the same for many different files that look similar when I view their content in a text editor. The files are not identical, only similar, yet the hash is the same. Then I computed the MD5 hash with openssl.exe and with an online tool and got a different result that looks more realistic, so I'm sure the openssl.exe result is correct. I cannot generate the hashes right now, but I hope you also have openssl or something similar.

Thanks!

Alexandru

Aug 30, 2019, 1:46:30 AM
See my previous answer. Removing ccs=UNICODE made no change.

s.effe...@googlemail.com

Aug 30, 2019, 8:52:11 AM
Alexandru,

You have an improper notion of when the algorithm is done. In the function md5 there's the test "if(i || len == 0) { done = 1; ... }". If the length of the data is evenly divisible by 64, it will not be done. Your example file has a length that is a multiple of 64, so it is never flagged as done. When examining all possible paths, please note that md5 is called only once, because your test file is smaller than the buffer.

I assume you introduced the error when you increased the read buffer so that whole files can be read at once, which means the md5 function must cope with all possible cases in a single call. If you make the read buffer smaller so that at least two read actions are needed, then the md5 function will always be called with a final data set that is either not divisible by 64 or has len == 0, and everything works fine. Go ahead, try a read buffer of e.g. 128 (!).

About 128: Your md5 function offers a special case for padding to 128 bytes instead of 64. Is that correct?
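
Both observations above, that a length divisible by 64 must still be finalized and that padding can span 128 bytes, follow from the MD5 padding rule in RFC 1321: append one 0x80 byte, then zero bytes until the length is 56 modulo 64, then the 64-bit message length. The helpers below are hypothetical and only illustrate that arithmetic; they are not taken from checksum.c.

```c
#include <stdint.h>

/* Total length in bytes after MD5 padding (RFC 1321): message, one 0x80
   byte, zeros up to 56 mod 64, then an 8-byte bit count. This equals
   n + 9 rounded up to the next multiple of 64. */
uint64_t md5_padded_length(uint64_t n) {
    return (n + 9 + 63) / 64 * 64;
}

/* Number of 64-byte blocks still to hash once all complete 64-byte
   message blocks are done: always 1 or 2, never 0. A length divisible
   by 64 still needs one block of pure padding. */
unsigned md5_final_blocks(uint64_t n) {
    return (unsigned)((md5_padded_length(n) - n / 64 * 64) / 64);
}
```

So a file of exactly 64 bytes needs one extra block of pure padding, and a final partial block holding 56 to 63 bytes needs two blocks, which is where a 128-byte padding buffer comes from.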

Verdict: Your calculation of when you're done is convoluted.

-- Stephan
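
One way to avoid a "done" flag entirely is a streaming interface in the style of RFC 1321 (init/update/final). The following is a minimal sketch under that assumption, not Alexandru's updated checksum.c; `md5_ctx`, `md5_init`, `md5_update`, `md5_final` and `md5_hex` are hypothetical names. `md5_update` hashes each complete 64-byte block as soon as it has one and buffers the remainder; `md5_final` always appends padding, so a length that is a multiple of 64 automatically gets its extra padding block. It assumes a C99 compiler.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical streaming MD5 state (not the checksum.c API). */
typedef struct {
    uint32_t h[4];          /* running state A, B, C, D */
    uint64_t total;         /* message bytes seen so far */
    unsigned char buf[64];  /* pending partial block */
    size_t used;            /* bytes currently in buf */
} md5_ctx;

/* Per-step constants and rotation amounts from RFC 1321. */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};
static const unsigned char R[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

/* Hash exactly one 64-byte block into the state. */
static void md5_block(md5_ctx *ctx, const unsigned char *p) {
    uint32_t M[16], a = ctx->h[0], b = ctx->h[1], c = ctx->h[2], d = ctx->h[3];
    for (int i = 0; i < 16; i++)  /* message words are little-endian */
        M[i] = (uint32_t)p[4*i] | (uint32_t)p[4*i+1] << 8 |
               (uint32_t)p[4*i+2] << 16 | (uint32_t)p[4*i+3] << 24;
    for (int i = 0; i < 64; i++) {
        uint32_t f; int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5*i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3*i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7*i) % 16; }
        f += a + K[i] + M[g];
        a = d; d = c; c = b;
        b += (f << R[i]) | (f >> (32 - R[i]));
    }
    ctx->h[0] += a; ctx->h[1] += b; ctx->h[2] += c; ctx->h[3] += d;
}

void md5_init(md5_ctx *ctx) {
    ctx->h[0] = 0x67452301; ctx->h[1] = 0xefcdab89;
    ctx->h[2] = 0x98badcfe; ctx->h[3] = 0x10325476;
    ctx->total = 0; ctx->used = 0;
}

/* Feed any number of bytes, in chunks of any size. */
void md5_update(md5_ctx *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    ctx->total += len;
    while (len > 0) {
        size_t n = 64 - ctx->used;
        if (n > len) n = len;
        memcpy(ctx->buf + ctx->used, p, n);
        ctx->used += n; p += n; len -= n;
        if (ctx->used == 64) { md5_block(ctx, ctx->buf); ctx->used = 0; }
    }
}

/* Always pads: 0x80, zeros up to 56 mod 64, then the bit count. */
void md5_final(md5_ctx *ctx, unsigned char out[16]) {
    uint64_t bits = ctx->total * 8;  /* capture before padding is fed */
    unsigned char pad[64] = {0x80}, lenbytes[8];
    size_t padlen = ctx->used < 56 ? 56 - ctx->used : 120 - ctx->used;
    for (int i = 0; i < 8; i++) lenbytes[i] = (unsigned char)(bits >> (8*i));
    md5_update(ctx, pad, padlen);
    md5_update(ctx, lenbytes, 8);
    for (int i = 0; i < 16; i++) out[i] = (unsigned char)(ctx->h[i/4] >> (8*(i%4)));
}

/* Convenience: hex digest of one buffer (hex needs 33 bytes). */
void md5_hex(const void *data, size_t len, char hex[33]) {
    md5_ctx ctx; unsigned char dg[16];
    md5_init(&ctx); md5_update(&ctx, data, len); md5_final(&ctx, dg);
    for (int i = 0; i < 16; i++) sprintf(hex + 2*i, "%02x", dg[i]);
}
```

A file hasher would call `md5_update` once per `fread` chunk and `md5_final` once after the read loop, so the digest no longer depends on the read buffer size.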

Gerald Lester

Aug 30, 2019, 1:50:54 PM
What does the md5 from TclLib produce?

--
+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+----------------------------------------------------------------------+

Alexandru

Aug 30, 2019, 9:18:19 PM
Yes indeed, that was the bug! I reused code (see https://groups.google.com/d/msg/comp.lang.tcl/95H6I6iAMfQ/x--Bulm2AQAJ) and then increased the buffer so that the function runs faster. Reducing the buffer size solves the issue, but only temporarily, since I could generate another test case with a file smaller than the current buffer size.

I'll try to understand what the buffer operations do and modify the function. The code in the md5 function is insanely complicated...

Alexandru

Aug 30, 2019, 10:44:47 PM
Problem solved, thanks to Stephan.
Updated code: https://www.meshparts.de/download/checksum.c