How to implement a hash function for different types?

29 views
Skip to first unread message

Jesse Wang

unread,
Aug 2, 2019, 9:22:16 PM8/2/19
to Racket Users
Racket allows different types of keys in a single hash, so there must be a hash function that works for different types(different primitive types), I wonder how Racket implements this under the hood.
Also, If I want to implement a hash table which allows different types of keys, how can I implement such a hash function in Racket?

Thanks!

Justin Zamora

unread,
Aug 2, 2019, 9:37:20 PM8/2/19
to Jesse Wang, Racket Users
Racket doesn't implement hash tables using a hash function. If I
recall correctly, it uses b-trees as the representation for a hash
table. The Racket reference states that "Immutable hash tables
actually provide O(log N) access and update. Since N is limited by the
address space so that log N is limited to less than 30 or 62
(depending on the platform), log N can be treated reasonably as a
constant." See. https://docs.racket-lang.org/reference/hashtables.html

Justin
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/ad183814-d9f1-454b-b1a2-d1a9b394f49d%40googlegroups.com.

Jon Zeppieri

unread,
Aug 2, 2019, 9:52:28 PM8/2/19
to Justin Zamora, Jesse Wang, Racket Users
On Fri, Aug 2, 2019 at 9:37 PM Justin Zamora <jus...@zamora.com> wrote:
>
> Racket doesn't implement hash tables using a hash function. If I
> recall correctly, it uses b-trees as the representation for a hash
> table. The Racket reference states that "Immutable hash tables
> actually provide O(log N) access and update. Since N is limited by the
> address space so that log N is limited to less than 30 or 62
> (depending on the platform), log N can be treated reasonably as a
> constant." See. https://docs.racket-lang.org/reference/hashtables.html
>
> Justin
>
> On Fri, Aug 2, 2019 at 9:22 PM Jesse Wang <hell...@gmail.com> wrote:
> >
> > Racket allows different types of keys in a single hash, so there must be a hash function that works for different types(different primitive types), I wonder how Racket implements this under the hood.
> > Also, If I want to implement a hash table which allows different types of keys, how can I implement such a hash function in Racket?
> >
> > Thanks!
> >
> > --


- Racket has both mutable and immutable hashes. Both use hash
functions -- both use the same hash functions, but these hash
functions take into account what kind of data they're called on. See
https://github.com/racket/racket/blob/master/racket/src/cs/rumble/hash-code.ss.
- Immutable hashes are implemented as hash array-mapped tries in
Racket "classic" and Patricia tries in Racket-on-Chez.
- There are equal?-, eqv?-, and eq?-based hashes (both mutable and immutable).
- Each equivalence relation has a corresponding hash function.
(Actually equal? hash two corresponding hash functions:
equal-hash-code? and equal-secondary-hash-code?.)
- Struct types can provide their own implementations of equal?,
equal-hash-code, and equal-secondary-hash-code (see gen:equal+hash).
- So: the built-in hashes can be used with new data types.
Message has been deleted

Jesse Wang

unread,
Aug 3, 2019, 12:52:35 AM8/3/19
to Racket Users
It seems for functions like equal-hash-code and eq-hash-code, different primitive type
have different rules for calculating the hash value, for example:

> (equal-hash-code 123)
123
> (equal-hash-code '123)
123
> (equal-hash-code "123")
> (equal-hash-code 123.0)
1432682342767248795

the result hash values are not uniformly distributed.
If I want to turn the hash code into array index in a hash table, do I need to
apply another uniform hash function such as md5 on the result of equal-hash-code? 

Jon Zeppieri

unread,
Aug 3, 2019, 1:36:01 AM8/3/19
to Jesse Wang, Racket Users
On Sat, Aug 3, 2019 at 12:52 AM Jesse Wang <hell...@gmail.com> wrote:
>
> If I want to turn the hash code into array index in a hash table, do I need to
> apply another uniform hash function such as md5 on the result of equal-hash-code?
>

That wouldn't accomplish anything. The defining feature of a function
is that the output depends solely on the input. Applying the same
function to colliding hash codes will get you more colliding hash
codes, only at a higher cost.

If you're going to implement your own hash tables instead of using the
ones that Racket provides, you can use whatever hash function you
want, but in that case you want to start with the key itself, not with
the result of Racket's hash function. Hash functions, by their very
nature, tend to discard information. If you start with the result of
Racket's hash functions, you're not going to be able to do better than
them.

Jesse Wang

unread,
Aug 3, 2019, 1:56:07 AM8/3/19
to Racket Users


On Saturday, August 3, 2019 at 1:36:01 PM UTC+8, Jon Zeppieri wrote:
On Sat, Aug 3, 2019 at 12:52 AM Jesse Wang <hell...@gmail.com> wrote:
>
> If I want to turn the hash code into array index in a hash table, do I need to
> apply another uniform hash function such as md5 on the result of equal-hash-code?
>

That wouldn't accomplish anything. The defining feature of a function
is that the output depends solely on the input. Applying the same
function to colliding hash codes will get you more colliding hash
codes, only at a higher cost.

I think you are right here.
 

If you're going to implement your own hash tables instead of using the
ones that Racket provides, you can use whatever hash function you
want, but in that case you want to start with the key itself, not with
the result of Racket's hash function. Hash functions, by their very
nature, tend to discard information. If you start with the result of
Racket's hash functions, you're not going to be able to do better than
them. 

What I want is a uniform hash function for different types.
Racket's equal-hash-code seems to give non-uniform hash codes for different types
of primitive types, so I wonder how Racket achieves good performance in a
hash table with different primitive type keys at the same time...

Jon Zeppieri

unread,
Aug 3, 2019, 2:07:18 AM8/3/19
to Jesse Wang, Racket Users
On Sat, Aug 3, 2019 at 1:35 AM Jon Zeppieri <zepp...@gmail.com> wrote:
>
> On Sat, Aug 3, 2019 at 12:52 AM Jesse Wang <hell...@gmail.com> wrote:
> >
> > If I want to turn the hash code into array index in a hash table, do I need to
> > apply another uniform hash function such as md5 on the result of equal-hash-code?
> >
>
> That wouldn't accomplish anything. [...]

Sorry! My response assumed that you're concerned with the hash codes
themselves colliding, but of course you're concerned about the codes
modulo the length of the table (or bit-masked) colliding. (I've been
thinking too much about immutable hashes lately, which aren't actually
hash tables.)

So, yes, applying a cryptographic hash function to the result could
help with this case, but at some point you need to consider whether
the improvement is worth the cost. Cryptographic hash functions are
usually a lot more expensive to compute than the hash functions
commonly used in hash tables.

---

I just saw your response. If you're concerned about performance, you
should try it out and measure. Collisions certainly do slow down hash
tables, but so do expensive hash functions. It's a trade-off.
Reply all
Reply to author
Forward
0 new messages