Way to obtain hash from a struct?

1,679 views
Skip to first unread message

Oliver Plohmann

unread,
Jul 18, 2014, 10:18:29 AM7/18/14
to golan...@googlegroups.com
Hello,

I'm working on a segmented map (to get better throughput when accessed concurrently). It has n segments where each has a map of its own. Depending on the hash of the key the segmented map decides into which segment to put the value. Now it should also be possible for the key to be a struct. Therefore I need a way to figure out how to obtain the hash of some struct. I figured this out for a plain string using the hash package. But I still can't find a way to obtain the hash of an arbitrary struct. The struct doesn't have to be completely arbitrary, some meaningful restrictions are ok.

Any hints appreciated.
Regards, Oliver

Jan Mercl

unread,
Jul 18, 2014, 10:53:24 AM7/18/14
to Oliver Plohmann, golang-nuts
On Fri, Jul 18, 2014 at 4:18 PM, Oliver Plohmann <saxo...@gmail.com> wrote:

Maybe compute the hash from a serialization of the struct. A "flat"
struct might use hash(fmt.Sprintf("%#v", foo)) sufficiently.

-j

Konstantin Khomoutov

unread,
Jul 18, 2014, 11:00:59 AM7/18/14
to Oliver Plohmann, golan...@googlegroups.com
If you know the set of all the struct types your code will work with,
just have each structure implement a method (this might be a method of
an interface if you so wish) which walks the struct and updates a hash
value by hasing primitive types it finds (and further walking through
structural types it finds).

If your code is to work with arbitrary structs then you have to use
reflection (through the `reflect` package) to implement "walking" code.
Look at packages like `encoding/binary` and such to see how this works.

On the other hand, you might hack something quickly using existing code
which supports marashaling of arbitrary structs to binary data -- for
instance using `encoding/gob` with any hasher supporting stream
(io.Reader) should be just fine: you shovel your data of unknown type
to an instance of gob.Encoder, which writes its stream to an io.Writer
which is a hasher.

Kevin Malachowski

unread,
Jul 18, 2014, 2:14:58 PM7/18/14
to golan...@googlegroups.com
If you didn't want to manually implement serialization methods to append values to a hash you could probably use unsafe to get the raw bytes of the struct.

Note, though, that would only give you a "shallow" hash as pointers and references would be interpreted as their literal bits rather than the data they point to.

Dave Cheney

unread,
Jul 19, 2014, 11:52:34 PM7/19/14
to golan...@googlegroups.com
This would be a very bad idea if your structure contains any padding. Padded bytes are not required to be zero, so hashing several conceptually equal values may result in different hashes.

Oliver Plohmann

unread,
Jul 20, 2014, 6:40:27 AM7/20/14
to golan...@googlegroups.com
Thanks for all the suggestions. I'm thinking of defining an Hashed interface like this:

type Hashed interface {
    hash() *int
}

The user is required for keys to implement the Hashed interface where hash() may return nil.
In that case the segmented map would create some hash on its own using some arbitrary approach
where the hash generator is pluggable so the user can plug in his own in case it doesn't fit
his purpose for some reason.

I'm not sure yet what approach to take for the hash generator. I think I will extract the values
of inst vars that are of a basic type using meta-programming and create a hash out of them. This may
not give the best possible performance, but in Java it is also often done that way and nobody cares about
performance. The hash to be meaningful seems to be more important, otherwise you loose the support
by the users.

Cheers, Oliver

Dave Cheney

unread,
Jul 20, 2014, 6:43:12 AM7/20/14
to Oliver Plohmann, golan...@googlegroups.com
If you're using a signed int, return -1 for "no hash", then you don't need to work about the indirection. 


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/TkAqPxbdGEY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Oliver Plohmann

unread,
Jul 20, 2014, 7:09:11 AM7/20/14
to golan...@googlegroups.com, saxo...@gmail.com
Yes, I though about that, too. What made me move away from it is that for example in Java a negative hash is legal. I think this is because overflow may happen when creating the hash. For example an empty HashMap in Java has a hash which is 0. The more elements are added to it the larger the hash may get resulting somewhen in overflow. Using a hash of 0 to denote that no hash was created seemed obvious. But, again, in Java an empty HashMap has a hash of 0. This does not really make sense to me, though. An empty list then would also have hash 0, but a list is a completely different thing than a map, whether empty or not. AFAIKS, the type name should also be encoded in the hash to cover for this and hence would never be 0. In that way using 0 to denote "no hash" would make sense. Hm ...

Oliver Plohmann

unread,
Jul 20, 2014, 9:58:38 AM7/20/14
to golan...@googlegroups.com, saxo...@gmail.com
Oh, I see what you mean. If the hash is that large that it runs over the positive max value into the negative max value it has to be in addition insanely large to reach -1 coming from negative max. So it is safe to use -1 as a special value denoting "no hash".


Am Sonntag, 20. Juli 2014 12:43:12 UTC+2 schrieb Dave Cheney:

Nick Craig-Wood

unread,
Jul 21, 2014, 5:48:04 AM7/21/14
to Oliver Plohmann, golan...@googlegroups.com
On 20/07/14 11:40, Oliver Plohmann wrote:
> Thanks for all the suggestions. I'm thinking of defining an Hashed
> interface like this:
>
> type Hashed interface{
>
> hash() *int
>
> }

This would be more idiomatic in my opinion

type Hasher interface {
Hash() (int, error)
}

Return an error, don't mess with nil pointers which will come to bite
you later!


--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

David DENG

unread,
Jul 21, 2014, 10:24:38 AM7/21/14
to golan...@googlegroups.com
When should the `Hash()` return `nil`?

Oliver Plohmann

unread,
Jul 21, 2014, 10:30:08 AM7/21/14
to golan...@googlegroups.com
In case the user is to lazy to calculate the hash himself for some struct. In that case the application will calculate it but not necessarily be ultimately fast in doing so.

David DENG

unread,
Jul 21, 2014, 1:17:50 PM7/21/14
to golan...@googlegroups.com
Why don't you let the user use a predefined function in this case.
Reply all
Reply to author
Forward
0 new messages