Hello all,

I work on bagit-java, and I was wondering which checksum algorithms everyone uses with BagIt. How many people would want to be able to use algorithms like SHA-3 that don't ship standard with the JVM?
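For anyone curious, here is a minimal sketch of what supporting a non-standard algorithm could look like via a third-party JCE provider such as Bouncy Castle (SHA3-256 only ships with the JDK from Java 9 onward, so on older JVMs you'd register a provider like this; treat this as an illustration, not bagit-java's actual code):

    import java.security.MessageDigest;
    import java.security.Security;
    import org.bouncycastle.jce.provider.BouncyCastleProvider;

    public class Sha3Example {
        public static void main(String[] args) throws Exception {
            // Register Bouncy Castle so MessageDigest can find SHA3-256
            // on JVMs that don't ship it (pre-Java 9).
            Security.addProvider(new BouncyCastleProvider());

            MessageDigest sha3 = MessageDigest.getInstance("SHA3-256");
            byte[] digest = sha3.digest("hello".getBytes("UTF-8"));

            // Hex-encode the digest the way a manifest line expects it.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex);
        }
    }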
There is also the case for computing several checksums in parallel: since you have to read the entire file anyway, you can reuse the same bytes to feed multiple digest algorithms in a single pass. That way, if a weakness is later discovered in one of the algorithms, you already have manifests in the others.
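A minimal sketch of that single-pass approach (the particular algorithm list is just an example):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class MultiDigest {
        public static void main(String[] args) throws Exception {
            Path file = Paths.get(args[0]);
            MessageDigest[] digests = {
                MessageDigest.getInstance("MD5"),
                MessageDigest.getInstance("SHA-256")
            };

            // Read the file once, feeding every digest from the same buffer.
            byte[] buffer = new byte[8192];
            try (InputStream in = Files.newInputStream(file)) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    for (MessageDigest md : digests) {
                        md.update(buffer, 0, read);
                    }
                }
            }

            for (MessageDigest md : digests) {
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) {
                    hex.append(String.format("%02x", b));
                }
                System.out.println(md.getAlgorithm() + ": " + hex);
            }
        }
    }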
Hi Simon,

I was indeed referring to running them in parallel, i.e. manifest-md5.txt as well as manifest-sha256.txt, etc.
The main use case would be guarding against hash collisions. MD5 produces a 128-bit digest, so by the birthday problem you can expect a collision after roughly 2^64 hashes. SHA-256 produces 256 bits (as its name implies), which pushes that expectation out to roughly 2^128; doubling the digest length squares the collision resistance rather than merely halving the risk. Here at the Library of Congress we have an unimaginable number of files, so from a curation point of view it makes sense for us to use as many bits as possible to ensure we don't get a collision.
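For a rough sense of scale, the usual birthday-bound approximation for a b-bit hash over n files is:

    p(n) \approx \frac{n^2}{2^{b+1}}

so a collision only becomes likely around n \approx 2^{b/2}: roughly 2^64 files for MD5's 128 bits, and roughly 2^128 for SHA-256.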
On Wed, Jan 25, 2017 at 12:00 PM, John Scancella <blacksmi...@gmail.com> wrote:
> [snip]

Beyond needing to archive things like security researchers' work, one reason for doing this is that the tools we use are based on standard crypto libraries, so algorithm availability is driven by the larger security community. That touches on performance, since recent processors often have dedicated SHA-2 support, but more importantly it means that algorithms which are now considered insecure may not be available at all.
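To see exactly which digest algorithms a given JVM and its installed providers actually offer, a quick diagnostic sketch:

    import java.security.Security;

    public class ListDigests {
        public static void main(String[] args) {
            // Every MessageDigest algorithm registered by the
            // currently installed security providers.
            for (String algorithm : Security.getAlgorithms("MessageDigest")) {
                System.out.println(algorithm);
            }
        }
    }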
On Wed, Jan 25, 2017 at 12:13 PM, Chris Adams <ch...@improbable.org> wrote:
> [snip]

Probably a dumb question, but are we discussing using the algorithms for hashing the files in the bag (i.e., bit auditing) or for encrypting the bag itself? Those seem like two fairly different use cases.
One of the challenges we encountered while working on https://github.com/LibraryOfCongress/bagger-js was that the WebCrypto API and asmcrypto.js don't implement MD5 at all (browser SHA-2 performance is actually surprisingly good). That isn't a deal-breaker for creating new bags, but it means anyone trying to validate existing bags in a JavaScript-based environment either needs to add extra complexity or be ready to do some sort of in-place upgrade to add manifests using newer algorithms.
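A minimal sketch of that in-place upgrade on the Java side, assuming a bag rooted at bagDir with payload files under data/ (a real implementation would also update the tag manifests and normalize path separators to "/"):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class AddSha256Manifest {
        public static void main(String[] args) throws Exception {
            Path bagDir = Paths.get(args[0]);
            Path payload = bagDir.resolve("data");
            StringBuilder manifest = new StringBuilder();

            // Hash every payload file and record it BagIt-style:
            // "<digest>  <relative/path>", one per line.
            List<Path> files;
            try (Stream<Path> walk = Files.walk(payload)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
                byte[] digest = sha256.digest(Files.readAllBytes(file));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b));
                }
                manifest.append(hex).append("  ")
                        .append(bagDir.relativize(file)).append('\n');
            }

            Files.write(bagDir.resolve("manifest-sha256.txt"),
                    manifest.toString().getBytes("UTF-8"));
        }
    }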
Hi Chris,

Makes sense! Guess we might be rolling our own for all those legacy CRC32 hashes in the coming years...