Another implementation of rolling adler32

280 views
Skip to first unread message

Christophe-Marie Duquesne

unread,
Apr 21, 2015, 10:25:27 AM4/21/15
to golan...@googlegroups.com
Hi,

I was looking for a rolling implementation of adler32, but I did not
find one with an interface I liked, so I wrote mine.

code: https://github.com/chmduquesne/rollinghash
doc: https://godoc.org/github.com/chmduquesne/rollinghash

To initialize the rolling window, just Write() it to the Hash. Then
you can just Roll() the entering byte to update the rolling hash. It
internally keeps (and updates) a copy of the rolling window in order
to determine the leaving byte.

I tested it with the same data as the vanilla package. The lib comes
with benchmarking code, to compare vanilla and rolling implementations
on 1024 and 128 bytes rolling windows:

% go test -bench .
PASS
BenchmarkVanillaKB 100000 17974 ns/op 56.97 MB/s
BenchmarkRollingKB 20000000 80.6 ns/op 12705.49 MB/s
BenchmarkVanilla128B 1000000 1059 ns/op 966.14 MB/s
BenchmarkRolling128B 50000000 77.6 ns/op 13195.44 MB/s
ok github.com/chmduquesne/rollinghash/adler32 8.767s

If you like it, use it!

Cheers,
Christophe-Marie Duquesne

Tamás Gulácsi

unread,
Apr 21, 2015, 3:13:04 PM4/21/15
to golan...@googlegroups.com
Camlistore.org has one.

Christophe-Marie Duquesne

unread,
Apr 21, 2015, 4:51:23 PM4/21/15
to golan...@googlegroups.com
Oh, that is nice. Searches for adler32 on godoc.org did not yield that result.

To comment about the differences between implementations, in case anyone wonders:

- From what I see, the window size is hardcoded to 64 bytes in Camlistore, while in my implementation it is determined by the first Write(). That is certainly not a problem, because I don't see a reason to change the window size once someone has settled for one in a given project, but that's a difference.

- My implementation satisfies the hash.Hash interface, while the one in Camlistore does not. But that will not bring you much, because the point of a rolling checksum is to be able to Roll, which is not possible with this hash.Hash interface.

- I have a bunch of tests trying to show that it yields the same results as the vanilla adler32 implementation, and I see no such tests in Camlistore. But Camlistore seems like a large, well maintained project, so there is no reason not to trust that code.

And that's it, roughly. I am not saying one is better than the other, I just wanted to point out the differences in case that helps someone make their choice.

andrewc...@gmail.com

unread,
Apr 22, 2015, 12:08:24 AM4/22/15
to golan...@googlegroups.com, ch...@chmd.fr
Using an interface for something that must be called on each byte is probably going to be slow for huge files. What are you using it for?

Christophe-Marie Duquesne

unread,
Apr 22, 2015, 5:36:43 AM4/22/15
to andrewc...@gmail.com, golan...@googlegroups.com
Typically, splitting a file in chunks. The end of a chunk is reached
when some criteria is met on the checksum.

Personally, I plan to use this to implement some kind of clone of bup
[1]. However, looking at this [2], I am not sure how much explanation
you need. Since we seem to share similar interests, you are welcome to
reach me out of this list :)

[1]: https://github.com/bup/bup
[2]: https://github.com/buppyio/buppy

andrewc...@gmail.com

unread,
Apr 22, 2015, 5:52:27 AM4/22/15
to golan...@googlegroups.com, ch...@chmd.fr, andrewc...@gmail.com
Will do :)
Reply all
Reply to author
Forward
0 new messages