[SLUG] fast hashing/checksumming tool


Dave Kempe

Feb 24, 2009, 4:35:52 AM
to slug
Hi,
I need to recursively checksum a lot of data and store the checksums in
a database. I can do most of it with a shell script, but was wondering if
anyone could recommend the fastest checksumming tool.
I know about md5sum, sha1sum, cfv (not recursive enough). I want to be
able to produce a checksum of many files (2.1TB worth) for verification
against other copies of the files in various locations. I need the
fastest available algorithm, not necessarily the most secure etc.
Any suggestions?

Dave
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html

Dave Kempe

Feb 24, 2009, 5:52:27 AM
to slug
(replying to myself)
http://md5deep.sourceforge.net/
seems to be the answer, after a recompile to get the new CSV feature.
Using sha1 rather than md5 seems fast enough.

dave
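
For anyone wanting to see the shape of the job described above (recursive checksums stored in a database), here is a minimal Python sketch. It is not md5deep and does not reproduce its CSV output; the SQLite table layout and the names file_sha1 and checksum_tree are made up for illustration, and SQLite itself is an assumption since the thread never names a database.

```python
import hashlib
import os
import sqlite3


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root, db_path):
    """Walk root recursively and store (relative path, size, sha1) rows.

    The table name and columns are hypothetical, chosen for this sketch.
    Returns the number of files hashed.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "path TEXT PRIMARY KEY, size INTEGER, sha1 TEXT)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full) or os.path.islink(full):
                continue  # skip symlinks and other non-regular files
            rel = os.path.relpath(full, root)
            conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?)",
                (rel, os.path.getsize(full), file_sha1(full)),
            )
            count += 1
    conn.commit()
    conn.close()
    return count
```

Storing paths relative to the root means the same table built at another location can be compared row by row, which is the verification use case in the original question.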

Peter Chubb

Feb 24, 2009, 5:34:48 AM
to Dave Kempe, slug
I think that sum is probably fastest; but overall the CPU time is
negligible compared with the time to read the data.

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia

Amos Shapira

Feb 24, 2009, 6:32:41 AM
to Dave Kempe, slug
2009/2/24 Dave Kempe <da...@solutionsfirst.com.au>:

> Hi,
> I need to checksum recursively a lot of data, and store the checksums in a
> database. I can do most via a shell script, but was wondering if anyone
> could recommend a checksumming tool that was the fastest.
> I know about md5sum, sha1sum, cfv (not recursive enough). I want to be able
> to produce a checksum of many files (2.1TB worth) for verification against
> other copies of the files in various locations. I need the fastest available
> algorithm, not necessarily the most secure etc.
> Any suggestions?

I'm no expert, but I think md4 is considered very weak cryptographically
yet faster than other hash algorithms. It is therefore used where security
is less of a concern (e.g. to checksum data which is already signed by
stronger algorithms). According to its Wikipedia article, rsync uses
it.

openssl comes with md4 so you can do, for instance, "openssl md4 /etc/passwd".

Try comparing the relative performance by replacing "md4" with "md5".

On my system, I ran it multiple times on an 892MB file, and once the
file was fully cached in memory md4 consistently took 2.06 seconds
elapsed time while md5 settled at 3.2 seconds. That's about 36% less
time than md5 (roughly a 1.55x speedup).

Maybe there are faster algorithms around.

--Amos
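
The comparison above can also be repeated without openssl on the command line. The following is a rough Python sketch using the standard hashlib module on an in-memory buffer, so disk speed is excluded just as in the cached-file runs. The function name time_hashes and its parameters are made up for this example, and md4 is skipped when the local OpenSSL build does not provide it.

```python
import hashlib
import time


def time_hashes(data, names=("md4", "md5", "sha1"), repeats=3):
    """Return {algorithm: best elapsed seconds} for hashing data in memory."""
    results = {}
    for name in names:
        try:
            hashlib.new(name)
        except ValueError:
            continue  # algorithm not available here (md4 often is not)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results
```

Calling time_hashes(bytes(64 * 1024 * 1024)) hashes 64 MiB of zeros with each available algorithm; dividing the buffer size by each result gives a throughput figure. The numbers will differ from machine to machine, so treat them as relative only.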

Jake Anderson

Feb 24, 2009, 6:51:19 AM
to Amos Shapira, slug
That's the thing, though: the OP has 2.1TB worth of data.
Most hash algorithms on standard hardware will be disk I/O bound rather
than CPU bound.
In other words, the choice of algorithm doesn't really matter much.

I think I have heard of hashing algorithms being implemented on video
cards (GPGPU and CUDA),
so if you really wanted some high-speed hashing that would be the way to
go ;-> Getting enough data to hash at that rate is left as an exercise
for the reader.
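
To put rough numbers on the I/O-bound argument, here is a back-of-the-envelope Python sketch. The hash rates come from the timings Amos posted earlier in the thread (892MB in 2.06 s for md4 and 3.2 s for md5); the 100 MB/s sustained disk read rate is purely an assumption for illustration, not a measurement from anyone here, and the helper names are made up.

```python
def throughput_mb_s(size_mb, seconds):
    """Hash throughput in MB/s from a timed run."""
    return size_mb / seconds


def bottleneck_hours(total_mb, hash_rate_mb_s, disk_rate_mb_s):
    """Hours to process total_mb when the slower stage sets the pace."""
    return total_mb / min(hash_rate_mb_s, disk_rate_mb_s) / 3600


md4_rate = throughput_mb_s(892, 2.06)  # from the cached-file timing in the thread
md5_rate = throughput_mb_s(892, 3.2)   # from the cached-file timing in the thread

DISK_RATE_MB_S = 100.0  # ASSUMPTION: sustained sequential read rate
TOTAL_MB = 2.1e6        # 2.1TB, taken as decimal terabytes

md4_hours = bottleneck_hours(TOTAL_MB, md4_rate, DISK_RATE_MB_S)
md5_hours = bottleneck_hours(TOTAL_MB, md5_rate, DISK_RATE_MB_S)
```

Under that assumed disk rate both algorithms outrun the disk several times over, so md4_hours and md5_hours come out identical. The algorithm only starts to matter once storage can deliver data faster than the slower hash can consume it.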

Sridhar Dhanapalan

Feb 25, 2009, 7:38:18 AM
to Dave Kempe, slug
2009/2/24 Dave Kempe <da...@solutionsfirst.com.au>:

> Hi,
> I need to checksum recursively a lot of data, and store the checksums in a
> database. I can do most via a shell script, but was wondering if anyone
> could recommend a checksumming tool that was the fastest.
> I know about md5sum, sha1sum, cfv (not recursive enough). I want to be able
> to produce a checksum of many files (2.1TB worth) for verification against
> other copies of the files in various locations. I need the fastest available
> algorithm, not necessarily the most secure etc.
> Any suggestions?

sum and cksum do basic checksums and don't try to be incredibly secure.

md4 is used by some p2p networks and rsync.
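
As an illustration of a basic, non-cryptographic checksum in the same spirit as sum and cksum, here is a small Python sketch using CRC-32 from the standard zlib module. Note that this is the zlib/PNG flavour of CRC-32, so the value will not match what the cksum command prints for the same file; the function name file_crc32 is made up for this example.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the zlib CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc & 0xFFFFFFFF
```

A checksum like this catches accidental corruption between copies but gives no protection against deliberate tampering, which matches the "fast, not necessarily secure" requirement in the original question.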


--
Bring choice back to your computer.
http://www.linux.org.au/linux
