Recommendation on Using BLAKE3 as a Cryptographic Hash Instead of MD5

Tanvee...@protonmail.com

May 2, 2020, 12:03:57 PM
to bup-list
Dear BUP Mailing List Members,

Hello! I was reading the bup documentation. The documentation writers kindly requested users to post recommendations on what they thought would be good alternatives to hashes to their users.

I suggest we use a standard C implementation of BLAKE3 instead of the MD5 hash that rsync regularly uses.

BLAKE3 is a significantly faster hash than MD5, especially on Intel/AMD architectures.

The following hyperlink takes us to the BLAKE3 GitHub homepage, which gives a table comparison of how much faster BLAKE3 is than MD5:

Since BLAKE3 is so much faster than MD5, and almost certainly faster than the Adler-32-style rolling checksum, I think it would be fine if we simply used BLAKE3 as a rolling hash as well.


Please let me know what you all think of making this switch. Predecessors of BLAKE3, such as BLAKE2, have already been adopted for their speed and collision resistance by other well-known free and open source projects, such as WireGuard (https://www.wireguard.com/).

Johannes Berg

May 2, 2020, 2:20:54 PM
to Tanvee...@protonmail.com, bup-list
Hi,

> Hello! I was reading the bup documentation. The documentation writers
> kindly requested users to post recommendations on what they thought
> would be good alternatives to hashes to their users.

Are you referring to this?

| (If you're a computer scientist and can demonstrate that some other rolling
| checksum would be faster and/or better and/or have fewer screwy edge cases,
| we need your help! Avery's out of control! Join our mailing list! Please!
| Save us! ... oh boy, I sure hope he doesn't read this)

I mean, that doesn't really imply you should throw an algorithm over the
wall ;-)

Bup doesn't even use md5 though, and you cannot use blake3 as a rolling
hash (cheaply).

johannes

Tanvee...@protonmail.com

May 2, 2020, 8:06:55 PM
to bup-list
Dear Johannes,

No, I was not referring to using it as a rolling hash primarily, although I confess I suggested that in a previous email. I thought Bup used md5 since the documentation admitted they copied code from the rsync library. Bup did say it borrowed a lot of ideas from rsync, including its rolling checksum technique.

I have to admit, after grepping through the bup GitHub source code I just realized that bup certainly does not use MD5 as a stronger hash algorithm. Rsync uses MD5 as a stronger algorithm when the weaker rolling checksum algorithm claims it has found a match. If Bup does not use MD5 as a stronger algorithm then what does it use?

Anyways, may you please explain to me why BLAKE3 cannot be used as a rolling hash cheaply? I am the one here that is learning about bup, git, and rsync and how they compete against each other for performance.

Johannes Berg

May 3, 2020, 3:17:22 AM
to Tanvee...@protonmail.com, bup-list
Hi,

> Bup did say it borrowed a lot of ideas from rsync, including its
> rolling checksum technique.

Yeah, but that's not related to md5.

> Rsync uses MD5 as a stronger algorithm when the weaker rolling
> checksum algorithm claims it has found a match.

I'm not sure this is quite right, but I haven't looked at rsync
specifically.

In bup, the rolling hash is just used to split the data into chunks, and
then the chunks are checksummed independently.
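
To make the split-then-checksum idea concrete, here is a minimal Python sketch. It is not bup's actual code: the plain byte sum, the 64-byte WINDOW, the MASK value, and the names split_chunks and chunk_ids are all assumptions chosen for illustration. A cheap rolling sum decides where each chunk ends, and every chunk is then hashed on its own with SHA-1.

```python
import hashlib

WINDOW = 64             # bytes covered by the rolling sum (illustrative value)
MASK = (1 << 11) - 1    # cut when the low 11 bits of the sum are all ones


def split_chunks(data: bytes):
    """Yield chunks of data, cutting wherever the rolling sum matches MASK."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                    # byte entering the window
        if i >= WINDOW:
            rolling -= data[i - WINDOW]    # byte leaving the window
        # Only test for a cut once the window is full, so that each decision
        # depends on exactly the last WINDOW bytes and nothing else.
        if i >= WINDOW - 1 and (rolling & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                 # trailing bytes after the last cut


def chunk_ids(data: bytes):
    """Checksum each chunk independently (plain SHA-1 of the chunk bytes)."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in split_chunks(data)]
```

Because each cut depends only on the last WINDOW bytes, inserting bytes at the start of the input changes the first chunk in this sketch but leaves the later chunks, and therefore their checksums, unchanged.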

> If Bup does not use MD5 as a stronger algorithm then what does it use?

It's compatible with git, so it uses git's blob/checksum construction
that uses sha1.
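
For reference, git's blob ID is the SHA-1 of a short header followed by the file content. The Python sketch below reproduces only that calculation; the function name git_blob_sha1 is made up for this example, and how bup invokes the construction internally is not shown here.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID git assigns to a blob with this content."""
    # git hashes the header "blob <size in bytes>\0" followed by the content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result can be compared against `git hash-object <file>` for any file.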

This is being changed in git, and I expect bup might follow eventually.
Note that blake3 isn't a contender in git at this point, afaict.

> Anyways, may you please explain to me why BLAKE3 cannot be used as a
> rolling hash cheaply?

Because by design you cannot remove any content prefix bytes from a
cryptographic hash, which is a key property you need for a rolling hash.
Otherwise you have to recalculate the checksum over every window again
and again.
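
A small Python sketch of the difference, under stated assumptions: the checksum is a toy byte sum rather than bup's or rsync's real one, WINDOW is an arbitrary size, and BLAKE2b from the standard library stands in for BLAKE3, which is not in Python's hashlib. The rolling sum moves to the next window in constant time by subtracting the outgoing byte, while the cryptographic hash has no such operation and must rehash each window in full.

```python
import hashlib

WINDOW = 4  # illustrative window size; input is assumed to be at least this long


def window_sum(window: bytes) -> int:
    """Toy checksum: sum of the bytes in the window, truncated to 32 bits."""
    return sum(window) & 0xFFFFFFFF


def roll(checksum: int, outgoing: int, incoming: int) -> int:
    """Slide the window one byte in O(1): drop one byte, add one byte."""
    return (checksum - outgoing + incoming) & 0xFFFFFFFF


def rolling_sums(data: bytes):
    """Checksums of every WINDOW-byte window, one cheap update per step."""
    checksum = window_sum(data[:WINDOW])
    sums = [checksum]
    for i in range(WINDOW, len(data)):
        checksum = roll(checksum, data[i - WINDOW], data[i])
        sums.append(checksum)
    return sums


def windowed_digests(data: bytes):
    """Digest of every WINDOW-byte window with a cryptographic hash.

    There is no roll() equivalent here, so each window is hashed from
    scratch: roughly WINDOW times more work per step than the rolling sum.
    """
    return [
        hashlib.blake2b(data[i:i + WINDOW]).digest()
        for i in range(len(data) - WINDOW + 1)
    ]
```

With realistic window sizes of tens of bytes and a checksum evaluated at every byte offset of a backup, that per-step factor is what makes the cryptographic hash expensive in this role.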

> I am the one here that is learning about bup, git, and rsync and how
> they compete against each other for performance.

But they don't.

johannes


Tanveer Salim

May 3, 2020, 12:42:30 PM
to bup-list
Dear Johannes Berg,

Thanks for informing me about how bup uses git's blob/checksum construction as a stronger algorithm instead.

However, you did mention this blob/checksum algorithm technique is actually being edited in git?

How will git change its blob/checksum algorithm technique?

May you hyperlink any online resources that give more information about how git's blob/checksum construction is being edited?

I thank you and the rest of the Bup Mailing List Team for any responses they send back to me.

Brian Minton

May 5, 2020, 9:47:15 AM
to Tanveer Salim, bup-list
On Sun, May 03, 2020 at 09:42:30AM -0700, 'Tanveer Salim' via bup-list wrote:
> May you hyperlink any online resources that give more information about
> how git's blob/checksum construction is being edited?

Here's the official document on the git transition away from SHA1:

https://git-scm.com/docs/hash-function-transition/