restrictions on map keys

5,627 views
Skip to first unread message

Jonathan Amsterdam

unread,
Nov 20, 2009, 11:35:34 AM11/20/09
to golang-nuts
I'm surprised that no one has mentioned the restriction on map keys.
Allow me...

The restriction on map keys is regrettable. Granted, 90% of the time
the key type is string or numeric or maybe pointer. But the other 10%
of the time, it seems I have only two choices:

a. Transform my key type into one of the valid ones (e.g. format it as
a string). This is expensive and cumbersome.

b. Write my own hashtable package. This is unfortunate, since it's
unlikely to be as good as the built-in one.

Here are four proposals, in increasing order of conservativeness:

1. Tuples. See http://groups.google.com/group/golang-nuts/browse_thread/thread/a3c49b10ce462d1c.

2. Let the key type of map be

type MapKey interface {
Hash() int;
Equals() bool;
}

Have all currently valid key types support this interface natively.

3. Allow equality for some structs. Perhaps a struct with all public
fields. Or a keyword could be used, like Ada's "limited" or its
inverse, to indicate that the programmer wants or does not want the
naive definition of equality for this type.

4. This is a library change, not a language change. Provide a package
that looks something like the following, and hook it into the built-in
implementation:

package hash

type Map ...

type Key interface {
Hash() int;
Equals() bool;
}

func New() Map ...

func (m *Map) Put(k Key, val interface{}) ...

func (m *Map) Get(k Key) interface{} ...

func (m *Map) Find(k Key) (val interface{}, found bool) ...

func (m *Map) Remove(k Key) ...

// etc.

///// Expose the built-in hash functions so users can compose their
own.

func int hashString(s string) ...

// etc.

Ben Bullock

unread,
Nov 20, 2009, 12:26:25 PM11/20/09
to golang-nuts
On Nov 21, 1:35 am, Jonathan Amsterdam <jbamster...@gmail.com> wrote:

> Granted, 90% of the time
> the key type is string or numeric or maybe pointer. But the other 10%
> of the time, it seems I have only two choices:

I am struggling to understand. Could you provide concrete examples of
the 10% of cases where you need to have an associative array going
from a non-boolean, numeric, string, pointer, function, interface,
map, or channel type A to type B? The only other possible type A's are
structs, arrays, and slices. If type A is a struct, you simply add
another field to struct A pointing at B, so that would never be a
problem. That leaves us with arrays and slices only. Do you have a
case where you want to use an array or slice as a map key? What is it?

> a. Transform my key type into one of the valid ones (e.g. format it as
> a string). This is expensive and cumbersome.

Could you please provide some kind of concrete example of what you
want to do which Go doesn't let you, preferably in code if possible?

> b. Write my own hashtable package. This is unfortunate, since it's
> unlikely to be as good as the built-in one.

But, since you are saying that there is no built-in hash function to
go from type A to type B, how can your hashtable package not be as
good as something which doesn't exist?

> ///// Expose the built-in hash functions so users can compose their
> own.
>
> func int hashString(s string) ...

Here I am getting lost; you don't want to create a string for Go's
current "map" but you want access to Go's string hash function in
order that you can create a hash key from some kind of string, which
you said that either you didn't have, or was too cumbersome or
expensive to get, in order to create your own Map library.


Jonathan Amsterdam

unread,
Nov 20, 2009, 1:10:54 PM11/20/09
to golang-nuts
> Could you provide concrete examples...

I'll do better than that; I'll provide an actual example from my job,
which is in mortgage finance (yes, I know, shoot me now).

Mortgage interest rates vary across two dimensions: the term of the
mortgage (15 years, 30 years, etc.) and the entity issuing the
mortgage (one of those quasi-governmental things like Freddie Mac that
we all so love to hate now, or the so-called "Jumbo" rate for loans
that don't meet those agencies' criteria). At startup I load vectors
of historical interest rates, and I retrieve them based on those two
criteria. In the actual C++ code, the key type of the map is
pair<Term, Agency>. I don't see how I'd represent this is with a Go
map, short of creating a string or int represented a combined Term and
Agency.

> > b. Write my own hashtable package. This is unfortunate, since it's
> > unlikely to be as good as the built-in one.
>
> But, since you are saying that there is no built-in hash function to
> go from type A to type B, how can your hashtable package not be as
> good as something which doesn't exist?

Very clever! Yes, well, what I mean is that under the hood, I'm
guessing that there is a pretty general hashtable implementation that
could, with a little tweaking, work on any type that could be hashed
and compared for equality.

> Here I am getting lost; you don't want to create a string for Go's
> current "map" but you want access to Go's string hash function in
> order that you can create a hash key from some kind of string, which
> you said that either you didn't have, or was too cumbersome or
> expensive to get, in order to create your own Map library.

On the assumption that the Go implementation has a good hash function
on strings, I want to use it as part of my hash function on my own
type:

type MyKey struct { s string, i int }

function (k *MyKey) Hash() int { return hash.hashString(k.s) +
k.i }

Russ Cox

unread,
Nov 20, 2009, 1:27:26 PM11/20/09
to Jonathan Amsterdam, golang-nuts
This issue--not being able to use something like
struct { x string; y int } as a map key--is certainly
a pain point that we feel too. We're not sure how
to resolve certain details of the implementation,
but we'd certainly like to lift at least some of the
restriction. It just complicates things, which means
we want to take the time to do it right. (Same reason
for exceptions, generics, etc., though I think this
is simpler than those.)

Feel free to file a bug in the bug tracker if you want
to get notified when this changes.

Russ

Jonathan Amsterdam

unread,
Nov 20, 2009, 1:56:32 PM11/20/09
to golang-nuts
> Feel free to file a bug in the bug tracker if you want
> to get notified when this changes.

Done, it is #283.

Ben Bullock

unread,
Nov 20, 2009, 10:36:00 PM11/20/09
to golang-nuts
On Nov 21, 3:10 am, Jonathan Amsterdam <jbamster...@gmail.com> wrote:
> > Could you provide concrete examples...
>
> I'll do better than that; I'll provide an actual example from my job,
> which is in mortgage finance (yes, I know, shoot me now).
>
> Mortgage interest rates vary across two dimensions: the term of the
> mortgage (15 years, 30 years, etc.) and the entity issuing the
> mortgage (one of those quasi-governmental things like Freddie Mac that
> we all so love to hate now, or the so-called "Jumbo" rate for loans
> that don't meet those agencies' criteria). At startup I load vectors
> of historical interest rates, and I retrieve them based on those two
> criteria. In the actual C++ code, the key type of the map is
> pair<Term, Agency>.

But the C++ map is based on a sorted binary tree not a hash table, so
it's not a very good analogy.

>  I don't see how I'd represent this is with a Go
> map, short of creating a string or int represented a combined Term and
> Agency.

I don't see how you'd do that either, but if I had that problem I
might try something like the following:

package main
import "os";

type year_to_rate map[int] float64;
type agency_to_year map[string] year_to_rate;

type agency_year_rate struct {
agency string;
year int;
rate float64;
};

var stuff = [...] agency_year_rate {
agency_year_rate{"Bailey", 1947, 0.01},
agency_year_rate{"Potter", 1947, 10.0},
agency_year_rate{"Bailey", 1946, 0.02},
agency_year_rate{"Potter", 1946, 20.0},
};

func fill_in_data () (*agency_to_year) {
aty := make (agency_to_year);
for _, item := range stuff {
if _, ok := aty[item.agency]; !ok {
aty[item.agency] = make (year_to_rate);
}
var ytra year_to_rate = aty[item.agency];
ytra[item.year] = item.rate;
}
return & aty;
}

func (aty *agency_to_year) look_up_rate (agency string, year int)
float64 {
ytr, ok := (*aty)[agency];
if (! ok) {
println (agency, " not found");
os.Exit (1);
}
rate, ok := ytr[year];
if (! ok) {
println (year, " not found");
os.Exit (1);
}
return rate;
}

func main () {
aty := fill_in_data ();
println (aty.look_up_rate ("Bailey", 1947));
}


Maybe a Go-Ru can do better.

> > > b. Write my own hashtable package. This is unfortunate, since it's
> > > unlikely to be as good as the built-in one.
>
> > But, since you are saying that there is no built-in hash function to
> > go from type A to type B, how can your hashtable package not be as
> > good as something which doesn't exist?
>
> Very clever! Yes, well, what I mean is that under the hood, I'm
> guessing that there is a pretty general hashtable implementation that
> could, with a little tweaking, work on any type that could be hashed
> and compared for equality.

The string thing certainly seems to be a hashtable since the strings
come out in a random order.

> On the assumption that the Go implementation has a good hash function
> on strings, I want to use it as part of my hash function on my own
> type:
>
>     type MyKey struct { s string, i int }
>
>     function (k *MyKey) Hash() int { return hash.hashString(k.s) +
> k.i }

Re-hashing using a hash output doesn't look like a good way to me. But
I'm probably wrong.


Jonathan Amsterdam

unread,
Nov 20, 2009, 10:57:03 PM11/20/09
to golang-nuts
The map-of-maps idea is not bad -- I should have listed it as
alternative (c) in my original list. It's not as expensive as
converting my key into a string, but it is more expensive than if I
could use my key directly, and, as your code demonstrates, it's a bit
cumbersome.

Ben Bullock

unread,
Nov 21, 2009, 1:18:32 AM11/21/09
to golang-nuts
Most of the example program is just filling in the data though. It's
only a few lines to implement the actual search, plus the clunky "die
if not found" code there would be replaced with some kind of error
handling, and once the search function exists it's just a method call,
which is about as simple as anything built-in would be:

rate:=aty.look_up_rate ("Bailey", 1947)

as opposed to

rate:=aty["Bailey"][1947]

I agree with you, though, if multi-key maps could be implemented as a
part of the language, it would make life easier:

map[string][int] float64

or

map[string,int]float64

perhaps?

Graham

unread,
Dec 1, 2009, 4:34:15 AM12/1/09
to golang-nuts
> Here are four proposals, in increasing order of conservativeness:

> 1. Tuples. Seehttp://groups.google.com/group/golang-nuts/browse_thread/thread/a3c49....

I vote Tuples. They are hashable, equality-comparable, and
immutable. Everything you need for a map key.

Dictionary keys are the primary use for tuples in Python: I've written
code that maps pairs of records to similarity vectors, example mymap
[(("Joe","Bloggs",23),("James","Doe",33))] = (0.5,0.0,0.3)

The reason for represinting the data as a map of keys to values
instead of a list of {key,value} structs as one might use in Go, is
that the keys need to be unique. The code checks whether a (A,B) is
in the map and if (A,B) is not in the map, it calculates the value V
then sets mymap[(A,B)] = V. The calculation of V is fully cached, and
(A,B) pairs are guaranteed to be unique.

I might add that once you have tuples (=immutable lists), you will
also find immutable sets to be useful hash keys.

> 2. Let the key type of map be
>
>     type MapKey interface {
>         Hash() int;
>         Equals() bool;
>    }
>
>    Have all currently valid key types support this interface natively.

Sorry, but mere Hash and Equality interface is a problem if the Hash()
can change due to external modification of a type implementing
MapKey.

Types stored in a map need to be immutable, or they corrupt the
hashtable. Strings already are immutable in Go, and ints are by-value
so are de-facto cannot be changed once copied into the hash key. Not
so for a large struct which should be returned by-reference.

> 3. Allow equality for some structs. Perhaps a struct with all public
> fields. Or a keyword could be used, like Ada's "limited" or its
> inverse, to indicate that the programmer wants or does not want the
> naive definition of equality for this type.

An efficient map requires a hash (and immutability), so equality is
not sufficient.

Graham

unread,
Dec 1, 2009, 5:04:55 AM12/1/09
to golang-nuts
> Dictionary keys are the primary use for tuples in Python

Ok... tuples are also how Python does multiple return values, but Go
does it differently.

I've written
> code that maps pairs of records to similarity vectors, example mymap
> [(("Joe","Bloggs",23),("James","Doe",33))] = (0.5,0.0,0.3)

The tuples are like immutable heterogenous lists, and are hashable/
immutable and equality comparable only if the components are.

However, map keys in Go will need to be typed: in Python you can use a
(str,str) and an (int,float) as keys in the same dict. Not so in Go.

To keep tuples anonymous while retaining strong typing on dictionary
keys, Go would need a type declaration for accepting tuples, like a map
[(string,string,int)]
(float,float,float) so you can then do m[("A","B",23)]=(0.1,0.2,0.3)

Go could even define an interface syntax for tuples - for example
interface T = (int,int,float) and then the anonymous (1,2,5.3) will
automatically satisfy T, and you can declare a map[T] float and use
the map like m[(1,2,5.3)] = 8.2

Qtvali

unread,
Dec 16, 2009, 5:29:02 AM12/16/09
to golang-nuts
On Nov 20, 8:27 pm, Russ Cox <r...@golang.org> wrote:
> This issue--not being able to use something like
> struct { x string; y int } as amapkey--is certainly
> a pain point that we feel too.  We're not sure how
> to resolve certain details of the implementation,
> but we'd certainly like to lift at least some of the
> restriction.  It just complicates things, which means
> we want to take the time to do it right.  (Same reason
> for exceptions, generics, etc., though I think this
> is simpler than those.)

I don't know, possibly I just miss some point here, but in Java,
HashMap behaves like that:
* Giving object to it, it by default uses memory address of that
object as hash code, so you can check if *the same* object is stored
* If object has "hashCode" and "equals" defined, it uses this hashCode
as integer to store an item to map. It is allowed to give same
hashCode to several equal objects. When you request an object, it will
try "equals" method of each object stored to map with same code.
* When you ask an object from map by key, it asks hashCode from this
object. Object itself is responsible to calculate it fast.

HashCode could return just ID field for SQL row (as this uniquely
identifies it), calculate string hash or do some other calculations.

I am personally mostly using two cases:
* One can reverse integer from big endian and back in Java quite
easily. If I have two integers x and y, which probably take about last
half or little more of uint64, I will do x + reversed(y) to get
probably unique hash of a position.
* Having several integers, I will shift them do different positions,
especially if they are probably or certantly small.

Java does not care at all if those objects are mutable or not - it
simply asks the hashCode when needed, stores it also to it's table and
supposes that you are not going to change an object. There could be
some "hashCheck" method for a map to check that all objects return the
same hash, which is stored - for debugging - but mostly one can cast
an object to some read-only interface and if someone other is going to
cast back, it's her problem; also one can manually copy an object when
setting it to be map key. Those are pretty trivial things a programmer
should be able to handle or there is no way to keep bugs out anyway :)
I would also like to see debugger class, which allows to get an
average of object count associated with the same hashCode - but I
mostly trust few simple tricks to get an unique code.
Reply all
Reply to author
Forward
0 new messages