Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Ada.Containers Hash function for a set of small integers

240 views
Skip to first unread message

Michael R

unread,
Apr 22, 2010, 7:26:18 PM4/22/10
to
Hi Folks,

I'd like to use the generic Hashed_Maps container to store mappings
from a set of small integers (three) to Wide_Strings
(Indefinite_Hashed_Maps):

(1, 10, 4) => "ABC",
(10, 3, 5) => "XYZ",

Does anyone have recommendations on how best to implement the Hash
function for keys like this?

Take care,
Michael.

John B. Matthews

unread,
Apr 22, 2010, 9:39:46 PM4/22/10
to
In article
<50701baa-7c05-450c...@t14g2000prm.googlegroups.com>,
Michael R <mic...@zanyblue.com> wrote:

I'd try something simple like shift and add, but it might depend on the
definition of "small".

Whatever you decide, here's some code I used to examine the distribution
of collisions in a map keyed by "/usr/share/dict/words":

$ ./collisions
0: 218586 (55.59%)
1: 126250 (32.10%)
2: 38432 (9.77%)
3: 8362 (2.13%)
4: 1354 (0.34%)
5: 229 (0.06%)
6: 24 (0.01%)
7: 4 (0.00%)
8: 1 (0.00%)

<http://home.roadrunner.com/~jbmatthews/jumble.html#sec2>

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

Simon Wright

unread,
Apr 23, 2010, 5:39:52 PM4/23/10
to
Michael R <mic...@zanyblue.com> writes:

I don't know about 'best', but in ColdFrame
(http://coldframe.sourceforge.net/coldframe/) I would generate this hash
for use with Booch Maps, where the key components of Instance are
{A:Integer, B:Integer, C:Integer}:

function Instance_Hash (I : Instance) return Natural is
type M is mod 2**31;
Result : M := 0;
begin
Result := Result xor M (I.A mod 2**31);
Result := Result * 10019;
Result := Result xor M (I.B mod 2**31);
Result := Result * 10019;
Result := Result xor M (I.C mod 2**31);
Result := Result * 10019;
return Natural (Result);
end Instance_Hash;

I believe the 10019 came from Knuth, but can't see a reference.

Michael R

unread,
Apr 23, 2010, 6:47:38 PM4/23/10
to
On Apr 23, 2:39 pm, Simon Wright <si...@pushface.org> wrote:

Hi,

Thank you for this suggestion. Just fyi, my GNAT 4.4.3 compiler
generated

xtest.adb:20:40: value not in range of type "Standard.Integer"
xtest.adb:20:40: static expression fails Constraint_Check

for "M (I.A mod 2**31)" expressions. Rephrasing as

Result := Result xor M'Mod (I.A);

compiled OK.

Take care,
Michael.

Jeffrey R. Carter

unread,
Apr 23, 2010, 7:08:17 PM4/23/10
to
Michael R <mic...@zanyblue.com> writes:

> I'd like to use the generic Hashed_Maps container to store mappings
> from a set of small integers (three) to Wide_Strings
> (Indefinite_Hashed_Maps):
>
> (1, 10, 4) => "ABC",
> (10, 3, 5) => "XYZ",
>
> Does anyone have recommendations on how best to implement the Hash
> function for keys like this?

What is the definition of "small"? If it has a defined range, for example:

subtype Small is Natural range Low .. High;

then you can define

Offset = 0 - Low
Max = High + Offset

and for X = # bits needed to represent Max, if 3 * X <=
Ada.Containers.Hash_Type'Size, then you can do a perfect hash:

type Triple is record
A : Small;
B : Small;
C : Small;
end record;

Hash (R : Triple) = Shift_Left (R.A + Offset, 2 * X) or
Shift_Left (R.B + Offset, X) or
(R.C + Offset)

--
Jeff Carter
"I was in love with a beautiful blonde once, dear.
She drove me to drink. That's the one thing I'm
indebted to her for."
Never Give a Sucker an Even Break
109

Simon Wright

unread,
Apr 24, 2010, 7:28:23 AM4/24/10
to
Michael R <mic...@zanyblue.com> writes:

> On Apr 23, 2:39 pm, Simon Wright <si...@pushface.org> wrote:

>> I don't know about 'best', but in ColdFrame
>> (http://coldframe.sourceforge.net/coldframe/) I would generate this hash
>> for use with Booch Maps, where the key components of Instance are
>> {A:Integer, B:Integer, C:Integer}:
>>
>>    function Instance_Hash (I : Instance) return Natural is
>>       type M is mod 2**31;
>>       Result : M := 0;
>>    begin
>>       Result := Result xor M (I.A mod 2**31);
>>       Result := Result * 10019;
>>       Result := Result xor M (I.B mod 2**31);
>>       Result := Result * 10019;
>>       Result := Result xor M (I.C mod 2**31);
>>       Result := Result * 10019;
>>       return Natural (Result);
>>    end Instance_Hash;
>>
>> I believe the 10019 came from Knuth, but can't see a reference.
>

> Thank you for this suggestion. Just fyi, my GNAT 4.4.3 compiler
> generated
>
> xtest.adb:20:40: value not in range of type "Standard.Integer"
> xtest.adb:20:40: static expression fails Constraint_Check
>
> for "M (I.A mod 2**31)" expressions. Rephrasing as
>
> Result := Result xor M'Mod (I.A);
>
> compiled OK.

I looked into this and found that for this specific case I would in fact
generate rather different code -- sorry about that, a bit like posting
untested Ada :-(

Actually it would be

function Instance_Hash (I : Instance) return Natural is
type M is mod 2**31;
Result : M := 0;
begin

Result := Result xor Integer'Pos (I.A);


Result := Result * 10019;

Result := Result xor Integer'Pos (I.B);


Result := Result * 10019;

Result := Result xor Integer'Pos (I.C);


return Natural (Result);
end Instance_Hash;

and, before anyone rushes to check it out, yes, it fails with negative
values and I'm about to file a bug against it.

Thanks for the response.

Warren

unread,
Apr 26, 2010, 11:33:03 AM4/26/10
to
Simon Wright expounded in news:m28w8dy...@pushface.org:
..

> for use with Booch Maps, where the key components of Instance are
> {A:Integer, B:Integer, C:Integer}:
>
> function Instance_Hash (I : Instance) return Natural is
> type M is mod 2**31;
> Result : M := 0;
> begin
> Result := Result xor M (I.A mod 2**31);
> Result := Result * 10019;
> Result := Result xor M (I.B mod 2**31);
> Result := Result * 10019;
> Result := Result xor M (I.C mod 2**31);
> Result := Result * 10019;
> return Natural (Result);
> end Instance_Hash;
>
> I believe the 10019 came from Knuth, but can't see a reference.

I think this is just a selected prime number:

http://www.mathematical.com/primes3000kto4000k.html

Probably a compromise between "wideness" and memory
used in the map itself (affects number of buckets perhaps).

Warren.

Jeffrey R. Carter

unread,
Apr 26, 2010, 2:14:41 PM4/26/10
to
Warren wrote:
> Simon Wright expounded in news:m28w8dy...@pushface.org:
>>
>> I believe the 10019 came from Knuth, but can't see a reference.
>
> I think this is just a selected prime number:

Except that 10_019 is not a prime number, according to:

http://www.mathematical.com/primes0to1000k.html

9949,9967,9973,10007,10009,10037,10039,10061,10067,10069,10079,10091,10093

--
Jeff Carter
"Ah, go away or I'll kill ya."


Never Give a Sucker an Even Break

100

Charmed Snark

unread,
Apr 26, 2010, 2:32:34 PM4/26/10
to
Jeffrey R. Carter expounded in news:hr4lnl$6m8$1...@tornado.tornevall.net:

> Warren wrote:
>> Simon Wright expounded in news:m28w8dy...@pushface.org:
>>>
>>> I believe the 10019 came from Knuth, but can't see a reference.
>>
>> I think this is just a selected prime number:
>
> Except that 10_019 is not a prime number, according to:
>
> http://www.mathematical.com/primes0to1000k.html
>
> 9949,9967,9973,10007,10009,10037,10039,10061,10067,10069,10079,10091,10
> 093

Oh I see - these numbers start at 3000017-- my haste, my bad.

Warren

Robert A Duff

unread,
Apr 26, 2010, 2:37:08 PM4/26/10
to
Simon Wright <si...@pushface.org> writes:

> type M is mod 2**31;
> Result : M := 0;
> begin
> Result := Result xor Integer'Pos (I.A);

> and, before anyone rushes to check it out, yes, it fails with negative


> values and I'm about to file a bug against it.

I don't think that's a compiler bug. Conversion of negative numbers
to modular raises Constraint_Error. I think that's a language
design flaw, but not everyone agrees with me. The 'Mod attribute
works around the problem.

- Bob

Simon Wright

unread,
Apr 26, 2010, 5:05:05 PM4/26/10
to

No, I meant a bug against my code generator!

On the other hand, I found some strange behaviour with GNAT GPL 2009
(and GCC 4.5.0) while investigating possible solutions:

------------------------------------------------------------------------
with Ada.Text_IO; use Ada.Text_IO;

procedure Odd_Warning is

procedure Inner (X : Integer) is


type M is mod 2**31;

Result : M;
begin

begin
Result := 0;
Result := Result xor (Integer'Pos (X) mod 2**30);
-- No compiler warning, but raises exception
exception
when others => Put_Line ("exception a");
end;

begin
Result := 0;
Result := Result xor (Integer'Pos (X) mod 2**31);
-- odd_warning.adb:20:53: warning: mod with zero divisor
-- odd_warning.adb:20:53: warning: "Constraint_Error" will be
-- raised at run time
exception
when others => Put_Line ("exception b");
end;

begin
Result := 0;
Result := Result xor M (Integer'Pos (X) mod 2**31);
-- No warning (and executes as I had expected)
Put_Line ("result:" & M'Image (Result));
exception
when others => Put_Line ("exception c");
end;

end Inner;

begin

Inner (-1);

end Odd_Warning;
------------------------------------------------------------------------

The second block's compiler warning is strange, especially the part
about the zero divisor. The Aonix ObjectAda 7.2 compiler gives a very
similar warning referring to 95LRM4.5.5(22), and gives an error at the
third block (95LRM4.9(35)).

I think my route will be via unchecked conversion, if I don't decide
that it's a feature - in 10 years of use no one has needed to use a
negative value as a component of a hash.

Adam Beneschan

unread,
Apr 26, 2010, 5:50:31 PM4/26/10
to
On Apr 26, 2:05 pm, Simon Wright <si...@pushface.org> wrote:

The warning looks correct to me. The way the expression is
constructed in the second block, all of the operators have operands of
type M (except for the right operand of "**"). This means that you're
using "**" with the definition

function "**" (Left : M; Right : Integer) return M;

And modular operations wrap around. The result of the above operator
with arguments Left=2, Right=31 is therefore 0, since the range of M
is 0 .. 2**31-1.

In the third block, the type conversion to M changes everything; the
argument of this type conversion can be anything that works (as
opposed to the second block, in which the right side must be an
expression of type M). Since the type of Integer'Pos(X) is a
universal integer, the end result is that the operators inside the
type conversion are interpreted as being the operations of the
"root_integer" type. Whether the error message is correct, incorrect,
or implementation-dependent, I'm not sure without further study, but I
suspect that it may be implementation-dependent and may depend on what
base ranges the implementation has chosen for its built-in integer
types.

In any case, unless you're trying to develop code that compiles with
an Ada95 compiler, I'd prefer to use 'Mod over Unchecked_Conversion.
The whole purpose of 'Mod was to eliminate the need for
Unchecked_Conversion in cases like this, since it was impossible to
write "normal" code that would do the computation correctly.

-- Adam

AdaMagica

unread,
Apr 27, 2010, 12:50:19 AM4/27/10
to
> >          Result := Result xor M (Integer'Pos (X) mod 2**31);
> >          --  No warning (and executes as I had expected)
> >          Put_Line ("result:" & M'Image (Result));
> >       exception
> >          when others => Put_Line ("exception c");
> >       end;
> about the zero divisor. The Aonix ObjectAda 7.2 compiler ...

> gives an error at the third block (95LRM4.9(35)).

I'm no language lawyer, but I'll give it a try.

RM95 4.9(35): If the expression is not part of a larger static
expression, then its value shall be within the base range of its
expected type. Otherwise, the value may be arbitrarily large or small.

RM TC1 AM1 4.9(35/2) {AI95-00269-01} If the expression is not part of
a larger static expression *and the expression is expected to be of a
single specific type*, then its value shall be within the base range
of its expected type. Otherwise, the value may be arbitrarily large or
small.

Since Aonix quotes 95RM4.9(35), it must be a pure Ada95 compiler.
2**31 is not part or a larger static expression. The further condition
of being of a single specific type (which is not the case here, the
expected type is any integer type) was added later.

So Aonix is correct, and so is GNAT, if the latter is new enough to
obey AM1.

Simon Wright

unread,
Apr 27, 2010, 3:04:56 PM4/27/10
to
AdaMagica <christo...@eurocopter.com> writes:

> Since Aonix quotes 95RM4.9(35), it must be a pure Ada95 compiler.

Thanks for the analysis.

The '95' in that quote was actually me. But since the compiler binaries
are no later than Feb 2000 I think your conclusion is correct!

Simon Wright

unread,
Apr 27, 2010, 3:08:02 PM4/27/10
to
Adam Beneschan <ad...@irvine.com> writes:

> In any case, unless you're trying to develop code that compiles with
> an Ada95 compiler, I'd prefer to use 'Mod over Unchecked_Conversion.
> The whole purpose of 'Mod was to eliminate the need for
> Unchecked_Conversion in cases like this, since it was impossible to
> write "normal" code that would do the computation correctly.

Ada95 it is .. though I could actually change now with no impact on the
current user project, which is not going to migrate.

Yannick Duchêne (Hibou57)

unread,
May 5, 2010, 12:29:02 AM5/5/10
to
Le Fri, 23 Apr 2010 23:39:52 +0200, Simon Wright <si...@pushface.org> a
écrit:

> I don't know about 'best', but in ColdFrame
> (http://coldframe.sourceforge.net/coldframe/) I would generate this hash
> for use with Booch Maps, where the key components of Instance are
> {A:Integer, B:Integer, C:Integer}:
>
> function Instance_Hash (I : Instance) return Natural is
> type M is mod 2**31;
> Result : M := 0;
> begin
> Result := Result xor M (I.A mod 2**31);
> Result := Result * 10019;
> Result := Result xor M (I.B mod 2**31);
> Result := Result * 10019;
> Result := Result xor M (I.C mod 2**31);
> Result := Result * 10019;
> return Natural (Result);
> end Instance_Hash;
>
> I believe the 10019 came from Knuth, but can't see a reference.
Unlucky, I was to ask you how this 10_019 was computed. At least, this
magic number is not a prime number, the nearest is 1009.

If this can help, may be some goodies there :
http://www.isthe.com/chongo/tech/comp/fnv/
http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

--
No-no, this isn't an oops ...or I hope (TM) - Don't blame me... I'm just
not lucky

Warren

unread,
May 6, 2010, 11:46:39 AM5/6/10
to
=?iso-8859-15?Q?Yannick_Duch=EAne_=28Hibou57=29?= expounded in
news:op.vb7teo1pxmjfy8@garhos:

>> I believe the 10019 came from Knuth, but can't see a reference.

> Unlucky, I was to ask you how this 10_019 was computed. At least, this
> magic number is not a prime number, the nearest is 1009.
>
> If this can help, may be some goodies there :
> http://www.isthe.com/chongo/tech/comp/fnv/
> http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Thanks for those references. That is good information
for my own project, since several maps are used in it.

Warren

0 new messages