Raymond Lin's MD5 & SHA-1 Checksum Utility is a standalone freeware tool that generates and verifies cryptographic hashes in MD5 and SHA-1. Cryptographic hash functions are commonly used to guard against malicious changes to protected data in a wide variety of software, Internet, and security applications, including digital signatures and other forms of authentication. Two of the most common cryptographic hash functions are the Secure Hash Algorithm (SHA) and Message Digest Algorithm 5 (MD5). Checksum utilities use these hashes to verify the integrity of data. There are two basic types: those that only calculate checksum values, and those that also validate the data by checking its checksums against a list of known values, which is the only way such verification can be done.
MD5 & SHA-1 Checksum Utility is free to download and use, though Ray accepts donations from satisfied users. At a mere 57KB, his checksum tool is about as small as a useful, functioning utility can be in this age of bloatware, which is all the more impressive considering that it's certified to work in Windows Vista and 7.
File Checksum Tool allows you to verify a hash against the matching file to confirm that the file's integrity is intact, or to create new checksums for your important data. The application is portable and requires no installation.
TLDR: Got tired of having no way to generate/verify checksum files on Windows without 3rd-party software. So I wrote my own in PS and made it a module. I wanted to share it with anyone who would find it useful. Bug reports, issues and improvements are encouraged and supported.
Edit: Just wanted to point out that this is not a replacement for Get-FileHash. The script actually uses Get-FileHash to do the file hashing. What this script does is extend the functionality of Get-FileHash so that it works something like *nix-based checksum programs such as md5sum/sha256sum.
I use file checksums to verify files and downloads multiple times a day, and on my Linux/macOS machines it's never been a problem since they have tools built into the OS for this purpose. It always bothered me that to do this on Windows you have to either install a third-party program or write a quick one-liner in PowerShell with Get-FileHash or CertUtil. While these methods get the job done, they don't really scale to huge backups with hundreds or even thousands of files, and they don't allow much flexibility without writing a more complicated script, such as this one.
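For reference, the kind of one-liner I mean looks something like this (the file name and expected hash below are just placeholders):

    # Hash a download with Get-FileHash and compare it to the published value by hand
    $expected = '9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08'  # placeholder
    $actual   = (Get-FileHash -Path .\backup.zip -Algorithm SHA256).Hash
    if ($actual -eq $expected) { 'PASS' } else { 'FAIL' }

    # Or print the hash with the built-in certutil.exe and eyeball the comparison
    certutil -hashfile .\backup.zip SHA256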
You use it by supplying a path to a file, directory, or checksum file and then specifying which mode you want to operate in. The output quickly shows failed checksums using colored output (green for PASS and red for FAIL). It can use the MD5, SHA1, SHA256, SHA384 and SHA512 algorithms.
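To give a rough idea of what check mode does under the hood, here is a stripped-down sketch built directly on Get-FileHash (this is not the module's actual code, and the checksum file name is a placeholder):

    # Read a sha256sum-style checksum file ("<hash>  <path>" per line) and verify each entry
    Get-Content .\backups.sha256 | ForEach-Object {
        $hash, $file = $_ -split '\s+', 2
        $actual = (Get-FileHash -LiteralPath $file -Algorithm SHA256).Hash
        if ($actual -eq $hash) {
            Write-Host "PASS  $file" -ForegroundColor Green
        } else {
            Write-Host "FAIL  $file" -ForegroundColor Red
        }
    }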
I'm currently testing a -Recurse flag (recursive search for directory mode), along with some other code refactoring and error handling. After this I will move on to implementing a -Quiet flag (only output failed checksums in check mode).
So I wrote this little tool in Rust: it can automatically locate the installed MSFS packages and compute a hash value for each file. After the calculation is complete, you can compare the results against hashes computed by other users with the same MSFS version to determine whether any files are corrupted.
This tool uses the 128-bit xxHash algorithm (aka XXH128) and automatically uses all CPU cores for parallel computing. So it generates hashes very fast, and the performance bottleneck is almost solely determined by the read speed of your hard drive.
I'd like to find a tool that allows me to efficiently find redundancy across remote filesystems, so that we can delete redundant data and copy non-redundant data when decommissioning storage bricks. (Side note: distributed filesystems like Ceph promise to handle these cases; that will be the future route, but for now we have to deal with the existing system as-is.)
Since many objects have been moved and renamed by hand, I cannot rely on their file names to compare with diff or rsync. I'd rather use a crypto checksum such as sha256 to identify my data files.
Is there an existing tool to do this? Maybe something that stores a checksum in a POSIX extended attribute (using the timestamp to check the checksum's freshness), plus a tool that can extract that information to efficiently diff the contents of the filesystems without caring about the filenames?
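To illustrate the kind of content-based comparison I'm after, here is a rough sketch that just hashes everything on the fly (PowerShell 7 syntax, ignoring the xattr caching; the mount points are placeholders):

    # Index two trees by content hash only, ignoring file names and paths,
    # then report content that exists on brick-a but not on brick-b.
    $a = Get-ChildItem -Recurse -File /mnt/brick-a | Get-FileHash -Algorithm SHA256
    $b = Get-ChildItem -Recurse -File /mnt/brick-b | Get-FileHash -Algorithm SHA256

    $bHashes = @($b.Hash)
    $onlyInA = $a | Where-Object { $bHashes -notcontains $_.Hash }

    # Anything listed here has no copy (by content) on brick-b yet
    $onlyInA | Select-Object Hash, Path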
If these are large files, you could consider setting up a system that lets users duplicate data using BitTorrent; it has a built-in way of checksumming data, and if you have several places that store the files, you gain the added benefit of not loading down one or two systems with all the transfers.
I would caution that duplicating files like that and using checksums isn't technically a backup; it's a duplicate. A backup means that when your master file is corrupt you can "roll back" to a previous version (want to set up something similar to CVS to check out your large data files?), while duplication, even with checksums, means that if your original is corrupted (accidental deletion, a bad sector on the drive, etc.), that corruption will get copied out, checksum and all, to your duplicates, rendering them useless. You'll want to plan for that scenario.
pt-table-checksum is designed to do the right thing by default in almost every case. When in doubt, use --explain to see how the tool will checksum a table. The following is a high-level overview of how the tool functions.
In contrast to older versions of pt-table-checksum, this tool is focused on a single purpose, and does not have a lot of complexity or support many different checksumming techniques. It executes checksum queries on only one server, and these flow through replication to re-execute on replicas. If you need the older behavior, you can use Percona Toolkit version 1.0.
pt-table-checksum connects to the server you specify, and finds databases and tables that match the filters you specify (if any). It works one table at a time, so it does not accumulate large amounts of memory or do a lot of work before beginning to checksum. This makes it usable on very large servers. We have used it on servers with hundreds of thousands of databases and tables, and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.
The tool monitors replicas continually. If any replica falls too far behind in replication, pt-table-checksum pauses to allow it to catch up. If any replica has an error, or replication stops, pt-table-checksum pauses and waits. In addition, pt-table-checksum looks for common causes of problems, such as replication filters, and refuses to operate unless you force it to. Replication filters are dangerous, because the queries that pt-table-checksum executes could potentially conflict with them and cause replication to fail.
There are several other safeguards. For example, pt-table-checksum sets its session-level innodb_lock_wait_timeout to 1 second, so that if there is a lock wait, it will be the victim instead of causing other queries to time out. Another safeguard checks the load on the database server, and pauses if the load is too high. There is no single right answer for how to do this, but by default pt-table-checksum will pause if there are more than 25 concurrently executing queries. You should probably set a sane value for your server with the --max-load option.
If pt-table-checksum encounters a condition that causes it to stop completely, it is easy to resume it with the --resume option. It will begin from the last chunk of the last table that it processed. You can also safely stop the tool with CTRL-C. It will finish the chunk it is currently processing, and then exit. You can resume it as usual afterwards.
After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to see if they have the same data as the master, and then prints a line of output with the results. You can see a sample of its output later in this documentation.
The tool prints progress indicators during time-consuming operations. It prints a progress indicator as each table is checksummed. The progress is computed by the estimated number of rows in the table. It will also print a progress report when it pauses to wait for replication to catch up, and when it is waiting to check replicas for differences from the master. You can make the output less verbose with the --quiet option.
If you wish, you can query the checksum tables manually to get a report of which tables and chunks have differences from the master. The following query will report every database and table with differences, along with a summary of the number of chunks and rows possibly affected:
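With the default --replicate table, percona.checksums, that query looks roughly like this:

    SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
    FROM percona.checksums
    WHERE (
      master_cnt <> this_cnt
      OR master_crc <> this_crc
      OR ISNULL(master_crc) <> ISNULL(this_crc)
    )
    GROUP BY db, tbl;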