Proposal: creating a "hash" subrepo


mortdeus

Apr 1, 2015, 10:18:16 PM
to golan...@googlegroups.com
I was looking at the non-cryptographic hashing algorithms in the stdlib's "hash" package earlier, and there are a few more that I think should be added, namely xxhash32/64 and cityhash/farmhash.

And if this proposal is shot down, can we at least expose the hashing algorithm used in the runtime (the hash inspired by and modeled on the xxhash and cityhash algorithms) to clients via a new hashing package under std's "hash" directory?

Keith Randall

Apr 1, 2015, 10:27:54 PM
to mortdeus, golang-dev
Feel free to extract the runtime's hash function into a separate package.  I don't think it should live in the standard library, however.  It would make a fine go-gettable package.

I don't want to expose the runtime's hashing package directly.  It can and does vary from release to release and from architecture to architecture.  We wouldn't want to expose clients to that.


roger roach

Apr 2, 2015, 12:03:52 AM
to Keith Randall, golang-dev
On Wed, Apr 1, 2015 at 9:27 PM, Keith Randall <k...@google.com> wrote:
> Feel free to extract the runtime's hash function into a separate package.  I don't think it should live in the standard library, however.  It would make a fine go-gettable package.

That's why I proposed a "hash" subrepo. Doesn't the rationale used to justify the creation and maintenance of 3rd party subrepos for stdlib packages like "crypto" and "image" also apply to the "hash" package?

> I don't want to expose the runtime's hashing package directly.  It can and does vary from release to release and from architecture to architecture.  We wouldn't want to expose clients to that.

The value of hash algorithms like xxhash and farmhash (the successor to cityhash) is that they are fast and robust enough to use at runtime in scenarios that require the client to hash a lot of data and to vet any collisions at runtime too.

For example, when implementing the diff utility it's optimal to strip, hash, and sort all the lines of text in all the files being compared before you walk through the string blobs looking for the graph that reveals the LCS.
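A rough sketch of the line-hashing step described above, under stated assumptions: it uses the standard library's FNV-1a as a stand-in for a faster hash such as xxhash, it indexes hashes in a map rather than sorting them, and it stops short of the LCS search itself. The names `lineKey` and `commonLines` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// lineKey hashes a line after stripping surrounding whitespace.
// FNV-1a is used here only because it ships with Go.
func lineKey(line string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(strings.TrimSpace(line)))
	return h.Sum64()
}

// commonLines returns the stripped lines of b that also occur in a.
// Candidates are found by hash and then confirmed by comparing the
// actual text, since equal hashes do not prove equal lines.
func commonLines(a, b []string) []string {
	index := make(map[uint64][]string)
	for _, l := range a {
		k := lineKey(l)
		index[k] = append(index[k], strings.TrimSpace(l))
	}

	var out []string
	for _, l := range b {
		t := strings.TrimSpace(l)
		for _, cand := range index[lineKey(l)] {
			if cand == t { // collision check at runtime
				out = append(out, t)
				break
			}
		}
	}
	return out
}

func main() {
	oldFile := []string{"  foo ", "bar", "qux"}
	newFile := []string{"baz", "foo", "bar  "}
	fmt.Println(commonLines(oldFile, newFile))
}
```

Nothing in this sketch persists the hash values, so a change in the hash function between runs would not affect correctness, only speed and collision rate.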

If a change to the underlying hashing algorithm changes the output for a given input, that is totally fine, because nobody should be using a non-cryptographic hash to store important data on disk for later retrieval and reuse anyway.

Keith Randall

Apr 2, 2015, 2:02:56 AM
to roger roach, golang-dev
On Wed, Apr 1, 2015 at 9:03 PM, roger roach <mort...@gmail.com> wrote:
> That's why I proposed a "hash" subrepo. Doesn't the rationale used to justify the creation and maintenance of 3rd party subrepos for stdlib packages like "crypto" and "image" also apply to the "hash" package?

I guess I don't understand what you're proposing then.  3rd party subrepos and the standard repo are disjoint things.  Are you proposing to add something to the standard repo in hash/???, or in a third party repo?
 
> The value of hash algorithms like xxhash and farmhash (the successor to cityhash) is that they are fast and robust enough to use at runtime in scenarios that require the client to hash a lot of data and to vet any collisions at runtime too.
>
> For example, when implementing the diff utility it's optimal to strip, hash, and sort all the lines of text in all the files being compared before you walk through the string blobs looking for the graph that reveals the LCS.
>
> If a change to the underlying hashing algorithm changes the output for a given input, that is totally fine, because nobody should be using a non-cryptographic hash to store important data on disk for later retrieval and reuse anyway.

If we're going to add a new hash to the standard library, it should be well-specified and invariant over time and architecture.  I understand that you don't care particularly about those qualities but others will.  The internal runtime hash is not that.  Maybe it could be with some work extracting a stand-alone copy.  Probably better would be to implement something like farmhash for which we can match behavior with a known code base.
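One way to read "well-specified and invariant" is that the output for a given input can be pinned in a test against published reference values. The sketch below does that for two hashes already in the standard library; the vectors are the FNV-1a offset basis for the empty string, the FNV-1a value for "a", and the standard CRC-32 check value for "123456789". The helper names `fnv1a32` and `checkGolden` are hypothetical, and this is only an illustration of the property, not a test from the Go tree.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"hash/fnv"
)

// fnv1a32 returns the 32-bit FNV-1a hash of s.
func fnv1a32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// checkGolden reports whether the hashes still match their
// published reference values on this release and architecture.
func checkGolden() bool {
	fnvGolden := []struct {
		in   string
		want uint32
	}{
		{"", 0x811c9dc5},  // FNV-1a 32-bit offset basis
		{"a", 0xe40c292c}, // reference vector for "a"
	}
	for _, g := range fnvGolden {
		if fnv1a32(g.in) != g.want {
			return false
		}
	}
	// Standard check value for CRC-32 (IEEE polynomial).
	return crc32.ChecksumIEEE([]byte("123456789")) == 0xcbf43926
}

func main() {
	fmt.Println("golden values match:", checkGolden())
}
```

A hash whose output is allowed to vary by release or architecture, as described for the runtime's internal hash, could not be pinned this way.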

In any case, I'm not convinced there is demand for such a thing.  Make an implementation in a 3rd party repo and import it from there.  If we see sufficient demand for it we can pull it into the standard library.

Brad Fitzpatrick

Apr 2, 2015, 4:22:37 AM
to mortdeus, Keith Randall, golang-dev
I think "hash" for non-"crypto" hashes is too specific of a topic to promote to be an official subrepo.

(Keith: he means one of these: https://go.googlesource.com/)

I think things like xxhash32/64, cityhash, and farmhash are best on GitHub. In fact, I think they already exist there.




mortdeus

Apr 2, 2015, 6:50:37 AM
to golan...@googlegroups.com
And go.crypto and go.image aren't too specific?

And including the hashes closer to the standard library would encourage developers to actually use the faster hashing algorithms more often. The only rationale for even having the subrepos is to ameliorate the fact that we can no longer trivially extend the "gopher-approved" blessing that packages in the standard library receive, since we made a rigorous commitment not to introduce breaking changes to std's API after Go 1.0's release.

Honestly, it's fine to be overly cautious and zealous about what we put into std, because what we put in can't be pulled out after the code ships.

With that being said, however, checksum hashing is something developers do all the time... There is a reason we don't use FNV or CRC hashing for the runtime... And right now the standard library does nothing to encourage developers to look for a faster checksum hashing algorithm. Most developers are going to stick with what we put in their toyboxes for them to play with.

I mean we can thumbs up go.crypto/twofish and image/webp but that xxhash nonsense better get back in its damn github repos where it belongs! >:o

Dave Cheney

Apr 2, 2015, 6:55:43 AM
to mortdeus, golang-dev
Roger, you need to calm down.

If you want to propose some new hashes for the x/crypto repo, then that
sounds like a good solution.

Apart from that, I believe you are projecting some bias about what is
good code because it is "gopher approved" that simply is not there.
Good code is good code because it is high quality, well tested, and
relied on by many, irrespective of where its source lives.

Dave