Any way of simulating a map[[]byte] without serializing to string?

6,751 views
Skip to first unread message

ptolomy23

unread,
Jul 12, 2010, 7:32:19 PM7/12/10
to golang-nuts
I think this has come up before, but I don't recall the resolution.
If I want to have a map keyed on a slice of bytes, is there any other way aside from
serializing to a string or writing my own hash table?

I have some code that is dealing with byte slices, and converting to/from string is seriously hampering the performance.
I considered using a map[int]int and hashing myself, but that doesn't work out too well when there are collisions.

(I can write my own hash table for this specific purpose, but it is awkward and only worth using when the key isn't a string/int/etc.)

Brad Fitzpatrick

unread,
Jul 12, 2010, 7:52:29 PM7/12/10
to ptolomy23, golang-nuts

Serge Hulne

unread,
Jul 13, 2010, 2:07:33 AM7/13/10
to golang-nuts
One efficient way to make map with an arbitrary object as a key is to:

1) Define a hash function on said type of object.
2) Store said objets in an Array (possibly resizing it when
necessary).
3) Instead of storing said object *direcly* as a key to a map, use
indirection: i.e. : Store the hash value as a key and the *address* of
the corresponding object as a value.

I posted an example on this forum a few months ago, it can be
retrieved with the keywords "map" and "bigram".
As to the hash table, I defined my own, but I do not think it is
necessary : there seems to be some predefined ones in the standard
library.

Serge.

Serge Hulne

unread,
Jul 13, 2010, 8:10:53 PM7/13/10
to golang-nuts
I do, indeed, not see a reason not to extend the type of objects
allowed as maps keys to objects simply satisfying an interface (say:
"mappable") defined simply as having methods like : Hash() and
Equal(), in particlular, since the interface paradigm is the closest
construct the Go language offers to a "Class".

Extending map keys to objects satisfying a "mappable" interface would
be handy (and would not break the existing maps concept, since it
would be valid with strings and integers).

Serge.



> Btw, this ishttp://code.google.com/p/go/issues/detail?id=283

David Roundy

unread,
Jul 13, 2010, 8:52:40 PM7/13/10
to Serge Hulne, golang-nuts
How do you handle the case where the key changes? If you can assume
there are no hash collisions, this isn't an issue, since you can just
use the value when a given element was added, but if you aren't
willing to make that assumption, then you need to add an extra copy
when adding something to the hash, which doesn't seem very nice. For
small types, it would be fine, but for larger types, it seems like
this could be a nasty performance pitfall...

David

Serge Hulne

unread,
Jul 14, 2010, 3:26:11 AM7/14/10
to golang-nuts
A "mappable" type would have a Hash() method.

Therefore, a given oject, say : k satisfying said "mappable"
interface, would be represented internally (in the map) as k.Hash(),
in other words as an interger.

Since the existing map container works for integers, it would work
for k.Hash(), hence for k.

In the case of int, resp. strings, Hash() would just be the indentity
function.



On the other hand, as far as performance is concerned, I think that in
most cases there would be no difference between:

1. Providing a generalized map working on the "mappable" interface.

2. Storing arbitrary objects via serialization.

(since providing a Hash() method or a serailization function would be
very similar in cases like structs, for instance).

For example : I think there would not be any performance gain, in
storing a struct which holds strings, between :

a. providing a function rediucing the struct to a string (and vice
versa)

b. providing a Hash() method acting on that type of struct

basically because solution b. would use solution a. (in this example)

Serge.



Serge.



On Jul 14, 2:52 am, David Roundy <roun...@physics.oregonstate.edu>
wrote:
> How do you handle the case where the key changes? If you can assume
> there are no hash collisions, this isn't an issue, since you can just
> use the value when a given element was added, but if you aren't
> willing to make that assumption, then you need to add an extra copy
> when adding something to the hash, which doesn't seem very nice.  For
> small types, it would be fine, but for larger types, it seems like
> this could be a nasty performance pitfall...
>
> David
>

chris dollin

unread,
Jul 14, 2010, 4:34:05 AM7/14/10
to Serge Hulne, golang-nuts
On 14 July 2010 08:26, Serge Hulne <serge...@gmail.com> wrote:

> On the other hand, as far as performance is concerned, I think that in
> most cases there would be no difference between:
>
> 1. Providing a generalized map working on the "mappable" interface.
>
> 2. Storing arbitrary objects via serialization.
>
> (since providing a Hash() method or a serailization function would be
> very similar in cases like structs, for instance).

I don't see why; you can accumulate the hash in a working variable,
whereas a serialisation is going to need arbitrarily much store.

Chris

--
Chris "allusive" Dollin

Serge Hulne

unread,
Jul 14, 2010, 6:50:23 AM7/14/10
to golang-nuts
Yes, in terms of storage space, seralization would be more costly.

However, in terms of sheer execution time, serialization would
probably be comparable with the alternative (providing a generalized
map, accepting as keys all objects satisfying a "mappable" interface
(as defined before in this thread)).

Serge.
Reply all
Reply to author
Forward
0 new messages