Valid hash keys?

660 views
Skip to first unread message

Autrijus Tang

unread,
Feb 27, 2005, 3:50:50 AM2/27/05
to perl6-l...@perl.org
Just a quick question: Is Hash keys still Strings, or can they
be arbitary values? If the latter, can Int 2, Num 2.0 and Str "2"
point to different values?

Thanks,
/Autrijus/

Luke Palmer

unread,
Feb 27, 2005, 4:20:59 AM2/27/05
to Autrijus Tang, perl6-l...@perl.org
Luke Palmer writes:

> Autrijus Tang writes:
> > Just a quick question: Is Hash keys still Strings, or can they be
> > arbitary values?
>
> They can be declared to be arbitrary:
>
> my %hash is shape(Any);

>
>
> > If the latter, can Int 2, Num 2.0 and Str "2" point to different
> > values?
>
> That's an interesting question. Some people would want them to, and
> some people would definitely not want them to. I think the general
> consensus is that people would not want them to be different, since in
> the rest of perl, 2 and "2" are the same.

I forgot an important concretity. Hashes should compare based on the
generic "equal" operator, which knows how to compare apples and apples,
and oranges and oranges, and occasionally a red orange to an apple.

That is:

3 equal 3 ==> true
3 equal { codeblock } ==> false
3 equal "3" ==> true

Luke

Luke Palmer

unread,
Feb 27, 2005, 4:18:06 AM2/27/05
to Autrijus Tang, perl6-l...@perl.org
Autrijus Tang writes:
> Just a quick question: Is Hash keys still Strings, or can they be
> arbitary values?

They can be declared to be arbitrary:

my %hash is shape(Any);


> If the latter, can Int 2, Num 2.0 and Str "2" point to different
> values?

That's an interesting question. Some people would want them to, and


some people would definitely not want them to. I think the general
consensus is that people would not want them to be different, since in
the rest of perl, 2 and "2" are the same.

The object model that I'm working on actually identifies 2 and "2" as
the same object, indistinguishable in every respect. But that hasn't
been accepted (or even proposed)... yet.

Luke

Nigel Sandever

unread,
Feb 27, 2005, 7:19:20 AM2/27/05
to perl6-l...@perl.org

I would have assumed a hash who shape was defined as C<Int> to perform the
hashing function directly on the (nominally 32-bit) binary representation Int 2.

Likewise, c<my % hash is shape(Double)>, would perform the hashing on the binary
rep of the (nom.64-bit) Double value.

And C<my %hash is shape(Ref)>, on the address of the key passed?

By extension, a C<%hash is shape( Any )> would hash the binary representation of
whatever (type of) key it was given, which would make keys of 2, 2.0, '2',
'2.0', (Int2)2 etc. all map to different keys.

If C<%hash is shape(Any)> maps all the above representation of 2 to the same
value, then C<my %hash is shape(Any)> becomes a synonym for

C<%hash is shape(Scalar)>.

(is that the same as C<my %hash of Scalar>?).

If my auumption is correct, then that would imply that each type hash or
inherits a .binary or .raw method to allow access to the internal
representation?

> Luke


Alex Burr

unread,
Feb 27, 2005, 2:54:47 PM2/27/05
to perl6-l...@perl.org

On Sun, Feb 27, 2005 at 02:20:59AM -0700, Luke Palmer wrote:
> I forgot an important concretity. Hashes should compare based on the
> generic "equal" operator, which knows how to compare apples and apples,
> and oranges and oranges, and occasionally a red orange to an apple.

Um. Hashes don't really compare, though, do they? Maybe you
just mean a notional equals operator, which isn't really used; but
it seems to me that what hashes acutally implement is more of a
'canonicalize' operator. Actually, it would be useful sometimes
to be able to give a hash an explicit canonicalizer:

my %msdos_files is canonicalized_by lc;

my %fractions is canonicalized_by gcd;

Alex

Luke Palmer

unread,
Feb 27, 2005, 5:36:42 PM2/27/05
to Nigel Sandever, perl6-l...@perl.org
Nigel Sandever writes:
> On Sun, 27 Feb 2005 02:20:59 -0700, lu...@luqui.org (Luke Palmer) wrote:
> > I forgot an important concretity. Hashes should compare based on the
> > generic "equal" operator, which knows how to compare apples and apples,
> > and oranges and oranges, and occasionally a red orange to an apple.
> >
> > That is:
> >
> > 3 equal 3 ==> true
> > 3 equal { codeblock } ==> false
> > 3 equal "3" ==> true
> >
>
> I would have assumed a hash who shape was defined as C<Int> to perform
> the hashing function directly on the (nominally 32-bit) binary
> representation Int 2.

I wasn't even thinking about implementation. Sometimes it's good to let
implementation drive language, but I don't think it's appropriate here.

When we're talking about hashes of everything, there are a couple of
routes we can take. We can go Java's way and define a .hash method on
every object. We can go C++'s way and not hash at all, but instead
define a transitive ordering on every object. We can go Perl's way and
find a string representation of the object and map the problem back to
string hashing, which we can do well.

But the biggest problem is that if the user overloads 'equal' on two
objects, the hash should consider them equal. We could require that to
overload 'equal', you also have to overload .hash so that you've given
some thought to the thing. The worry I have is that people will do:

method hash() { 0 }

But I suppose that's okay. That just punts the work off to 'equal',
albeit in linear time.

That may be the right way to go. Use a Javaesque .hash method with a
sane default (an introspecting default, perhaps), and use a sane
equivalent default for 'equal'.

As far as getting 2, 2.0, and "2" to hash to the same object, well, we
know they're 'equal', so we just need to know how to hash them the same
way. In fact, I don't believe 2.0 can be represented as a Num. At
least in Perl 5, it translates itself back to an int. So we can just
stringify and hash for the scalar types.

Luke

Thomas Sandlaß

unread,
Feb 27, 2005, 5:57:30 PM2/27/05
to perl6-l...@perl.org
Alex Burr wrote:

> [..] Actually, it would be useful sometimes


> to be able to give a hash an explicit canonicalizer:
>
> my %msdos_files is canonicalized_by lc;
>
> my %fractions is canonicalized_by gcd;

Shouldn't that be handled by container subclasses of Hash?
Like PersitentScalar or SparseArray?

Regards,
--
TSa (Thomas Sandlaß)

Nigel Sandever

unread,
Feb 27, 2005, 5:57:22 PM2/27/05
to perl6-l...@perl.org

My thought is that if c<my %hash is shape(Any)> uses the stringyfied values of
the keys, then it is no different to C<my %hash>,

I think it would be useful for shape(Any) be different to an ordinary hash, and
hashing the binary representation of the key, so that

(Int)2, (Num)2, (String)2, (uint)2 (uint4)2 etc.

would be a useful way of collating things according to their "type" rather than
their value....?
>
> Luke

njs.

Luke Palmer

unread,
Feb 27, 2005, 6:55:20 PM2/27/05
to Nigel Sandever, perl6-l...@perl.org
Nigel Sandever writes:
> On Sun, 27 Feb 2005 15:36:42 -0700, lu...@luqui.org (Luke Palmer) wrote:
> > As far as getting 2, 2.0, and "2" to hash to the same object, well, we
> > know they're 'equal', so we just need to know how to hash them the same
> > way. In fact, I don't believe 2.0 can be represented as a Num. At
> > least in Perl 5, it translates itself back to an int. So we can just
> > stringify and hash for the scalar types.
>
> My thought is that if c<my %hash is shape(Any)> uses the stringyfied values of
> the keys, then it is no different to C<my %hash>,

Indeed, but I meant just for our non-reference scalar types, such as
Num, Int, and Str.

> I think it would be useful for shape(Any) be different to an ordinary
> hash, and hashing the binary representation of the key, so that
>
> (Int)2, (Num)2, (String)2, (uint)2 (uint4)2 etc.
>
> would be a useful way of collating things according to their "type"
> rather than their value....?

That may indeed be a useful kind of map to have, but it's hardly what
people expect when they declare a hash keyed by Any. The reason I want
to identify 2 and "2" is because they are identical everywhere else in
the language.

Also, if you hash on the binary representation, what's to say that uint
hashes differently from uint4. And even if we do guarantee that they
hash differently, there will be collisions. And then uint(2) equal
uint4(2) (for any reasonable implementation of equal), and they are the
same, but only for collision values.

A better way to make a type-collated hash would be to:

class TypeHash is Hash[shape => [Class; Any]] {
method postcircumfix:<{ }> (Any $arg) {
.SUPER::{$arg.type; $arg}
}
}

And if you really think that it's going to be that common, then you can
write a module that implements that. It would be about five lines long.

Luke

Rod Adams

unread,
Feb 27, 2005, 7:42:17 PM2/27/05
to Luke Palmer, perl6-l...@perl.org
Luke Palmer wrote:

>The object model that I'm working on actually identifies 2 and "2" as
>the same object, indistinguishable in every respect.
>

Okay, that's fine, since C< 2 eq "2" > and C< 2 == "2" >. But what about
2.0 and "2.0"?

In Perl5, C< 2.0 == "2.0" >, but C< 2.0 ne "2.0" >.

-- Rod Adams


Alex Burr

unread,
Feb 27, 2005, 7:21:06 PM2/27/05
to perl6-l...@perl.org

Possibly. Clearly that's what one would do in any other language.
What I was thinking was that if hashes are going to have a
canonicalizer function *anyway*, maybe the default implementation
could be overridable with a trait (or role?). But I can't
actually claim to have followed perl6 development enough to
be able to argue that it really makes sense.

Alex

Nigel Sandever

unread,
Feb 27, 2005, 7:41:43 PM2/27/05
to perl6-l...@perl.org
On Sun, 27 Feb 2005 15:36:42 -0700, lu...@luqui.org (Luke Palmer) wrote:
> Nigel Sandever writes:
>
> When we're talking about hashes of everything, there are a couple of
> routes we can take. We can go Java's way and define a .hash method on
> every object. We can go C++'s way and not hash at all, but instead
> define a transitive ordering on every object.

The more I think about this, please, no. The reason hashes are called hashes is
because they hash.

If we need bags, sets, or orderThingies with overloadable transitive ordering,
they can be written as classes--that possibly overload the hash syntax

obj<<key>> = value

or whatever, but don't tack all that overhead on to the basic primitive.


> We can go Perl's way and
> find a string representation of the object and map the problem back to
> string hashing, which we can do well.

The only question is why does the thing that gets hashed have to be stringyfied
first?

In p5, I often use hash{ pack'V', $key } = $value. # or 'd'

1) Because for large hashes using numeric keys it use up less space for the
keys. 4-bytes rather than 10 for 2**32.

2) By using all 256 values of each byte, it tends to spread the keys more even
across fewer buckets;

use Devel::Size qw[total_size size];
undef $h{ pack 'V', $_ } for map{ $_ * 10000 } 0 .. 999999;

print total_size \%h;
18418136

print scalar %h;
292754/524288

versus

use Devel::Size qw[total_size size];
undef $h{ $_ } for map{ $_ * 10000 } 0 .. 999999;

print total_size \%h;
48083250

print scalar %h;
644301/1048576

It would also avoid the need for hacks like Tie:RefHash, by hashing the address
of the ref rather than the stringyfied ref and forcing the key to be stored
twice and the creation of zillions of anonymous arrays to hold the unstrigyfied
ref+value.

The same could be extended to hashing the composite binary representations of
whole structures and objects.

njs.


Alex Burr

unread,
Mar 1, 2005, 6:51:44 PM3/1/05
to Luke Palmer, perl6-l...@perl.org
On Sun, Feb 27, 2005 at 03:36:42PM -0700, Luke Palmer wrote:
> But the biggest problem is that if the user overloads 'equal' on two
> objects, the hash should consider them equal. We could require that to
> overload 'equal', you also have to overload .hash so that you've given
> some thought to the thing. The worry I have is that people will do:
>
> method hash() { 0 }

This is why I think a 'canonicalize' method is better than a
.hash' method: if canonicalize is defined correctly, then .equal
is just {$a.canon() is_bitwise_identical_to $b.canon()}, so
people don't have to supply both. Quite often when you would ordinarily
define a .equal method, it is of that form anyhow. And then .hash
can be straightforwardly defined as {$_.canon().binaryhash()}, where
binaryhash() just means hash all the memory of this object.

Stringify would be an instance of a canonicalization function; and
perhaps the default.


Alex

Luke Palmer

unread,
Mar 1, 2005, 7:23:27 PM3/1/05
to Alex Burr, perl6-l...@perl.org
Alex Burr writes:
> On Sun, Feb 27, 2005 at 03:36:42PM -0700, Luke Palmer wrote:
> > But the biggest problem is that if the user overloads 'equal' on two
> > objects, the hash should consider them equal. We could require that to
> > overload 'equal', you also have to overload .hash so that you've given
> > some thought to the thing. The worry I have is that people will do:
> >
> > method hash() { 0 }
>
> This is why I think a 'canonicalize' method is better than a .hash'
> method: if canonicalize is defined correctly, then .equal is just
> {$a.canon() is_bitwise_identical_to $b.canon()}, so people don't have
> to supply both.

Sure. And you know what's better than all of that? Transitive ordering
on all objects, so that .equal is just:

!($a < $b && $b < $a)

But the reason we're not going with that is because it's very hard (both
from a user perspective and computationally) to do that in the general
case. And the same is true for canonicalization. Sure, when you have
certain kinds of objects in mind, like strings, or trees, it's very easy
to canonicalize. But when you have other objects in mind, like
filehandles, graphs, or references to remote objects, it's hard or
impossible to canonicalize.

And in fact, there are some objects for which it's hard or impossible to
define an .equal method. And those objects can't be hashed, plain and
simple. If you need to hash them, you can define a type of hash that
uses =:= as comparison, so that things are only equal if they are
referentially equal.

But I expect that we have can have a standard role that defines .hash
and .equal in terms of .canonicalize, if it exists. We can also have a
magical role that defines .canonicalize when it can. If .hash isn't
defined when an object is used as a key of a hash (but .equal is), it
will have to bite the speed bullet and use 0, probably along with a
warning.

Luke

Reply all
Reply to author
Forward
0 new messages