Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

(make-hash-table :test #'mytest)

38 views
Skip to first unread message

Knut Olav Bøhmer

unread,
Nov 2, 2002, 4:43:38 AM11/2/02
to

Hi,

Is it possible to make ones own :test to use with hash-tables? if yes,
how (where is the doc)? if no, why?


/me newbie
--
Knut Olav Böhmer
_ _
/ / (_)__ __ ____ __
/ /__/ / _ \/ // /\ \/ / ... The choice of a
/____/_/_//_/\.,_/ /_/\.\ GNU generation

An ideal world is left as an exercise to the reader. (Paul Graham)

Coby Beck

unread,
Nov 2, 2002, 5:17:44 AM11/2/02
to

"Knut Olav Bøhmer" <kn...@linpro.no> wrote in message
news:ujp8z0c...@patch.linpro.no...

>
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

Can't say why, but the answer is no.

from the make-hash-table entry in the hyperspec:

<quote>
make-hash-table &key test size rehash-size rehash-threshold => hash-table

Arguments and Values:

test---a designator for one of the functions eq, eql, equal, or equalp. The
default is eql.

</quote>


--
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")


David Lichteblau

unread,
Nov 2, 2002, 6:43:46 AM11/2/02
to
"Knut Olav Bøhmer" <kn...@linpro.no> writes:

> Is it possible to make ones own :test to use with hash-tables? if yes,

Not portably, but most implementations support non-standard hash
functions and test predicates, either as MAKE-HASH-TABLE keyword
arguments or (in the case of CMUCL) using a separate registration
mechanism.

> how (where is the doc)?

Ships with your Lisp.

> if no, why?

No idea.

Paul F. Dietz

unread,
Nov 2, 2002, 8:30:52 AM11/2/02
to Knut Olav Bøhmer
Knut Olav Břhmer wrote:
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

How would the hash table be able to obtain a hash function H
with the property

(funcall testfn x y) ==> (eql (H x) (H y))

The only way it could do that would be if H were a function
returning a constant, but in that case you might as well
use a list with FIND.

It would make sense to augment the hash table interface to
take a hash function as well as a comparison function.

Paul

Wade Humeniuk

unread,
Nov 2, 2002, 10:29:41 AM11/2/02
to

"Knut Olav Bøhmer" <kn...@linpro.no> wrote in message news:ujp8z0c...@patch.linpro.no...
>
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

It is likely that if you need your own test for a hash-table that should
implement some specialized data structure to hold your data (and
not use the built in hash tables). For example I had some data
sorted by date, so I created arrays of months for each year. Each
month contained an array of days-in-the-month, each day was an
ordered list of entries. This specialized storage is faster than a
hash-table, but its structure is based on knowledge of the nature
of the data.


--
Wade

(format t "Email: ~A"
(map 'string
'code-char
'(119 104 117 109 101 110 105 117 64
116 101 108 117 115 46 110 101 116)))

Kaz Kylheku

unread,
Nov 2, 2002, 11:28:15 AM11/2/02
to
"Knut Olav Bøhmer <kn...@linpro.no> wrote in message news:<ujp8z0c...@patch.linpro.no>...
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

The language specification says that the test function must be one of
EQ, EQL, EQUAL or EQUALP. So the restriction does not just rule out
your own test functions, but even some standard ones like STRING=.

One productive way to deal with hash tables is to view them as a low
level optimization tool. You can wrap them behind interfaces that
don't require the user to select the appropriate test. Internally, you
know which of the four applies.

E.g. how about this hack:

;; real version would pass through other make-hash-table args
(defgeneric make-hash-for-slot (obj slot))


(defmethod make-hash-for-slot ((obj employee) (slot symbol))
(ecase slot
((employee-number) (make-hash-table :test #'eql))
((first-name) (make-hash-table :test #'equal))
(otherwise

(error "MAKE-HASH-FOR-SLOT: Unsupported slot ~a."
slot)))

Another useful technique is a wrapper that maps some standard
comparison functions to the closes EQ- one.

(defconstant hash-fun-substitution-list
`((nil ,#'eql)
(,#'eq ,#'eq)
(,#'eql ,#'eql)
(,#'equal ,#'equal)
(,#'equalp ,#'equalp)
(,#'string= ,#'equal)
(,#'string-equal ,#'equalp)
(,#'= ,#'eql)
;; ... etc
))

(defun make-hash-table* (&key test) ;; other args
(let ((fun (find test hash-fun-substitution-list
:key #'first :test #'eq)))
(if fun
(make-hash-table :test (second fun))
(error "MAKE-HASH-TABLE*: Unsupported function ~a." test))))

Vijay L

unread,
Nov 2, 2002, 11:55:23 AM11/2/02
to
"Knut Olav Bøhmer <kn...@linpro.no> wrote in message news:<ujp8z0c...@patch.linpro.no>...
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?
>
>
> /me newbie

AFAIK it isn't possible. I don't think it is really necessary to keep
any a test other than the four already provided: EQ, EQL, EQUAL,
EQUALP

EQ takes care of symbols
EQL symbols and numbers
EQUAL conses
EQUALP structures (even copies of structures with the same data), like
an EQUAL test for all structures

The :test tests for equality when searching the hash-table. If you
want to alter the key value use the :key. However check the manual and
see if the key function is applied to both the hash-key in the table
and key you give, I don't have access to the manual right now.

Thanks,

Vijay L

Duane Rettig

unread,
Nov 2, 2002, 4:00:00 PM11/2/02
to
"Knut Olav Bøhmer" <kn...@linpro.no> writes:

> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

Not portably, other than the ones given in the Ansi Spec. However,
some implementations provide such features, also including hash-code
generator function specification, as well as weak hash-tables and key-only
tables for space savings. For documentation on Allegro CL's extensions,
see

http://www.franz.com/support/documentation/6.2/doc/implementation.htm#cl-make-hash-table-2

--
Duane Rettig du...@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182

Matthew Danish

unread,
Nov 3, 2002, 6:16:02 AM11/3/02
to
On Sat, Nov 02, 2002 at 10:43:38AM +0100, Knut Olav B?hmer wrote:
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

The problem is that you need more than a :TEST to hash values in a
table; you need a hash-function for that type of value. You have the
option of using the hash-table interface extensions that various
implementations provide, using a third-party hash-table package, or
writing your own hash function which maps your values to something that
can be inserted into the standard hash-tables.

Generally I take the third option; for example I encoded a Checkers
board as a 96-bit integer and stored using that number in a EQL
hash-table. In this case the key was always unique to the board, so
there were no collisions, but if there is the possibility of a collision
then you will need to handle it yourself via chaining or some other
mechanism. The built-in hash-table would not know the difference
between two different Checkers boards that give the same key value, for
example.

--
; Matthew Danish <mda...@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."

Erik Naggum

unread,
Nov 3, 2002, 1:58:46 PM11/3/02
to
* Knut Olav Bøhmer

| Is it possible to make ones own :test to use with hash-tables? if yes,
| how (where is the doc)? if no, why?

This is actually very simple, but not immediately obvious because of the
many choices to `make-hash-table´, which tends to make you look for ways
to expand on those choices. The better solution is to look another way
entirely and realize that the key in the hash-table does not necessarily
have to be the actual key.

Even with `equalp´, which traverses structures, you still end up with a
hash bucket that is some index into an array. So the function has to do
two things: compute some magic number and compare the item found in the
hash-table to the actual key before returning the value in the hash-table.

Your `mytest´ function would therefore have to consist of two functions:
(A) One of one argument that returns a (numeric) hash key, and (B) one of
two arguments that returns true if the two arguments are the same under
your test and nil otherwise. You might argue that the way we specify the
`:test´ argument to `make-hash-table´ today is deficient in that it hides
the details and pretends that real functions are involved when they are
really keyword arguments from a finite set of values that select two or
more aspects of the hash-table at once.

To make your own hashtable work, you would therefore have to implement
the bucket yourself, and each hash key you use would have to retrieve a
list of potential values and scan it for the actual value you want. One
good way to do this is to find an orthogonal quality that can also be
reduced to a `eq´-testable value and then use plists. Then you build a
few simple wrapper macros to make use of your improved hash table.

In practice, your knowledge of the data you wish to store and retrieve
will in the general case be superior to what you can express in any
declarative language and it is therefore alluring the way politcal
campaign promises are to choose some "simple" mechanism that covers
another small area of the problem space in addition to the existing hash
test functions, but then someone will come along to ask exactly the same
question you did when they step outside that area, too. Therefore, a
standard has to strike a balance between convenience and simplicity.

To wrap this up with a reference back to what I said at the top, the fact
that you can get a bunch of test functions tends to make you look for a
way to extend that list. This is the wrong approach in general, but it
is known to be hard to leave a path you think ought to be good. There is
even an idiom for it: To be stuck in a rut, where rut is literally a deep
track made by the passage of a wheel.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Peter Seibel

unread,
Nov 5, 2002, 12:23:45 AM11/5/02
to
Erik Naggum <er...@naggum.no> writes:

> You might argue that the way we specify the `:test´ argument to
> `make-hash-table´ today is deficient in that it hides the details
> and pretends that real functions are involved when they are really
> keyword arguments from a finite set of values that select two or
> more aspects of the hash-table at once.

So, is it the case that internally gethash uses different hash
functions depending on which :test you choose?

-Peter

--
Peter Seibel
pe...@javamonkey.com

Paul F. Dietz

unread,
Nov 5, 2002, 6:36:38 AM11/5/02
to
Peter Seibel wrote:

> So, is it the case that internally gethash uses different hash
> functions depending on which :test you choose?

Almost certainly. It would be inefficient to have an EQ hash
table use the same hash function as an EQUAL hash table, since the
latter must traverse the S expression to compute the function, while
the former can just hash the address of the object.

Paul


Marcus Breiing

unread,
Nov 5, 2002, 7:19:43 AM11/5/02
to

Beyond efficiency, the reason to have different hash functions is
modifications to hash table keys. You can't reasonably expect
EQUAL-hash tables to work correctly under such modifications. If
you'd use the EQUAL-hash for EQ-tables, such modifications wouldn't
work there, either. However, EQ-Hash tables are rightly required to
work correctly if you modify objects used as keys (using standard
functions;) see 18.1.2 in the Hyperspec.


--
Marcus Breiing
mailto:news-g...@breiing.com (will expire)

Frode Vatvedt Fjeld

unread,
Nov 5, 2002, 7:46:07 AM11/5/02
to
"Paul F. Dietz" <di...@dls.net> writes:

> Almost certainly. It would be inefficient to have an EQ hash table
> use the same hash function as an EQUAL hash table, since the latter
> must traverse the S expression to compute the function, while the
> former can just hash the address of the object.

And rehash at GC-time? Is this really being done in practice?

--
Frode Vatvedt Fjeld

Tim Bradshaw

unread,
Nov 5, 2002, 8:01:47 AM11/5/02
to
* Frode Vatvedt Fjeld wrote:
> And rehash at GC-time? Is this really being done in practice?

It more-or-less has to be I think. Unless object contain some magic
bit of data which uniquely identifies them but is not their address,
then if their address changes, you need to rehash.

--tim

Paul F. Dietz

unread,
Nov 5, 2002, 8:10:51 AM11/5/02
to
Frode Vatvedt Fjeld wrote:

> And rehash at GC-time? Is this really being done in practice?

If the collector moves the objects then, yes, it has to rehash
at GC time. ACL does this, I believe.

GC clearly also needs to be involved if the lisp supports a weak key
extension for EQ hash tables.

Paul


Peter Seibel

unread,
Nov 5, 2002, 10:57:01 AM11/5/02
to
"Paul F. Dietz" <di...@dls.net> writes:

That's pretty much what I figured, thanks. Anyone know why (I'm sure
someone here does) the API was designed with this sort of "fake"
function argument as opposed to actually taking a :test function
argument that can be any comparator function and something like a
:hash argument that takes a hash function (which might be optional if
you pass one of the standard comparators in the :test argument.)

I've read Erik Naggum's response to the OP saying this isn't what you
really want a couple times but I'm still not quite getting it. I
understand why--given the current API--you can't pass any old function
to :test; what I don't get is why it would have been so horrible to do
something like what I proposed to support different
comparators/hash-functions.

Duane Rettig

unread,
Nov 5, 2002, 12:00:01 PM11/5/02
to
"Paul F. Dietz" <di...@dls.net> writes:

> Frode Vatvedt Fjeld wrote:
>
> > And rehash at GC-time? Is this really being done in practice?
>
> If the collector moves the objects then, yes, it has to rehash
> at GC time. ACL does this, I believe.

No. Rehashing after every gc would be incredibly inefficient. Objects
whose hash codes will change due to a gc only _flag_ the hash table
for rehash before the next gethash or puthash operation (if further
GCs occur before the next hashing operation, then if automatic
rehashing had been forced each gc would have wasted one rehash
operation). Also, some objects provably don't move, and some objects
can be hashed in ways other than their address, so that they don't
"dirty" the hash-table. So no rehashing is ever needed if the
hash-table only contains those objects whose hash codes have provably
not changed.

> GC clearly also needs to be involved if the lisp supports a weak key
> extension for EQ hash tables.

Yes.

Daniel Barlow

unread,
Nov 5, 2002, 12:29:31 PM11/5/02
to
Tim Bradshaw <t...@cley.com> writes:

Though note that you can do this lazily (I think CMUCL does) - do the
rehash on the first gethash after the gc has happened, instead of at
gc time itself. This is a win if you don't access the hash table very
often (you could do two or more GCs in between rehashes), and a lose
if you were expecting that gethash takes a predictable time to run.


-dan

--

http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources

Michael Hudson

unread,
Nov 5, 2002, 1:37:51 PM11/5/02
to
Duane Rettig <du...@franz.com> writes:

> No. Rehashing after every gc would be incredibly inefficient. Objects
> whose hash codes will change due to a gc only _flag_ the hash table
> for rehash before the next gethash or puthash operation

How do you efficiently tell which hashtables a given object is in? I
can't think of a blindly obvious way of doing that. I haven't spent
long thinking about it, so sorry if I'm just being dumb.

Cheers,
M.

--
All programs evolve until they can send email. -- Richard Letts
Except Microsoft Exchange. -- Art
-- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

Duane Rettig

unread,
Nov 5, 2002, 4:00:02 PM11/5/02
to
Michael Hudson <m...@python.net> writes:

> Duane Rettig <du...@franz.com> writes:
>
> > No. Rehashing after every gc would be incredibly inefficient. Objects
> > whose hash codes will change due to a gc only _flag_ the hash table
> > for rehash before the next gethash or puthash operation
>
> How do you efficiently tell which hashtables a given object is in? I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

Who is "you" in this case? Why do "you" _want_ to tell which hashtables


a given object is in?

--

Erik Naggum

unread,
Nov 5, 2002, 5:30:40 PM11/5/02
to
* Michael Hudson

| How do you efficiently tell which hashtables a given object is in? I
| can't think of a blindly obvious way of doing that. I haven't spent long
| thinking about it, so sorry if I'm just being dumb.

You are not allowed to change the key in a way that would change the
value of the test function applied to it. This is not when rehashing
takes place. Rehashing takes place when you have an `eq´ hashtable of
objects that do not have an internal hash code (such as is commonly done
with symbols), and a garbage collection moves the objects around. As
Duane explained, there is no point in actually performing the rehash
until the hash-table is accessed after a garbage collection that actually
moved an object. The information necessary to make this decision is all
local to the hash-table.

Tim Bradshaw

unread,
Nov 5, 2002, 5:26:41 PM11/5/02
to
* Michael Hudson wrote:

> How do you efficiently tell which hashtables a given object is in? I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

You don't need to. When you insert an object whose hash-code may
change into a table, you flag the table as needing a rehash after a
GC, then, next time you access it, if a GC has happened, you rehash
then. (I bet implementations do even smarter things of course).

The classic case where you can avoid rehashing is EQL tables which
only have numeric keys.

--tim

Paul F. Dietz

unread,
Nov 5, 2002, 7:09:39 PM11/5/02
to
Michael Hudson wrote:

> How do you efficiently tell which hashtables a given object is in? I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

If GC is moving objects around it already has to solve the problem
of finding all the pointers to those objects.

Paul

Paul F. Dietz

unread,
Nov 5, 2002, 7:11:57 PM11/5/02
to
Duane Rettig wrote:

> No. Rehashing after every gc would be incredibly inefficient. Objects
> whose hash codes will change due to a gc only _flag_ the hash table
> for rehash before the next gethash or puthash operation (if further
> GCs occur before the next hashing operation, then if automatic
> rehashing had been forced each gc would have wasted one rehash
> operation).

Ah. Thanks.

Paul

Rob Warnock

unread,
Nov 5, 2002, 9:56:41 PM11/5/02
to
Tim Bradshaw <t...@cley.com> wrote:
+---------------
+---------------

Some kinds of objects might conveniently have a hash computed on them
when they're created or read in (or lazily, the first time they're used
in any hash-related operation) that is permanently stored with the object
(and recomputed if the object is mutated). Strings & symbols (with the
hash being based on their name-strings) come to mind as the main candidates
for this, though in some cases bignums might also benefit.

In that case, hash tables containing only keys with such "persistent"
hash values [or values whose hashes don't depend on location] need not
be re-hashed when GC'd, nor even when expanded (if the stored hash value
contains "extra" bits which were unused in doing lookups with the smaller
hash table size). Indeed, a hash table might contain a hint bit that says
whether the table is currently "GC-clean", which is cleared whenever a
non-cooperating key is stored into it (and perhaps set again during some
future GC if the table becomes "clean" again).


-Rob

-----
Rob Warnock, PP-ASEL-IA <rp...@rpw3.org>
627 26th Avenue <URL:http://www.rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Tim Bradshaw

unread,
Nov 6, 2002, 4:28:18 AM11/6/02
to
* Rob Warnock wrote:

> Some kinds of objects might conveniently have a hash computed on
> them when they're created or read in (or lazily, the first time
> they're used in any hash-related operation) that is permanently
> stored with the object (and recomputed if the object is
> mutated). Strings & symbols (with the hash being based on their
> name-strings) come to mind as the main candidates for this, though
> in some cases bignums might also benefit.

This can only work, I think, for immutable objects, or for objects
whose hash-code doesn't need to change if they are mutated.
Otherwise, if you change the object, then any hashtable which contains
it as a key needs to be marked for rehash, and that is hideous -
either you need a global bit which says `some hash code may have
changed, rehash any table on access', which bit needs to be set by any
mutation of any such hash-coded object, or such objects need to keep
tabs on which tables they are in.

(This isn't to say that such a trick isn't very useful.)

--tim

Duane Rettig

unread,
Nov 6, 2002, 2:00:01 PM11/6/02
to
Tim Bradshaw <t...@cley.com> writes:

> * Rob Warnock wrote:
>
> > Some kinds of objects might conveniently have a hash computed on
> > them when they're created or read in (or lazily, the first time
> > they're used in any hash-related operation) that is permanently
> > stored with the object (and recomputed if the object is
> > mutated). Strings & symbols (with the hash being based on their
> > name-strings) come to mind as the main candidates for this, though
> > in some cases bignums might also benefit.
>
> This can only work, I think, for immutable objects, or for objects
> whose hash-code doesn't need to change if they are mutated.

This is what I understood Rob to be saying. Symbols are easy, but
strings are not quite as easy, since they may or may not be
immutable, depending on whether they were created as constants or
via make-string. So you can still use strings in eq/eql hash-tables,
though we don't assume immutability for strings.

In Allegro CL, we have an internal function called excl::sxhash-if-fast
(which also implies "if-immutable-hash-code"). It yields results for
all of the following object types (which are thus "clean" by definition
for eq/eql hashtables): null, symbol, character, single-float, double-float,
fixnum, bignum, ratio, complex, function-object, standard-instance.

> Otherwise, if you change the object, then any hashtable which contains
> it as a key needs to be marked for rehash, and that is hideous -
> either you need a global bit which says `some hash code may have
> changed, rehash any table on access', which bit needs to be set by any
> mutation of any such hash-coded object, or such objects need to keep
> tabs on which tables they are in.
>
> (This isn't to say that such a trick isn't very useful.)

Useful, and used. A hash-table is marked as dirty when an "unclean"
object is put into it.

However, there is yet one more trick that we employ, even if an object
is not clean wrt its hash-code generation: For objects of types other
than listed above, where for eq/eql hastables the hash-code is generated
from the object's address, the object will only dirty the table if it
moves. Since some objects move less often than others, we can sometimes
determine _how_ dirty the hash-table is by where its objects were when
they were allocated. In our own generational GC, the nursery generations
are kept in a newspace and are likely to move each scavenge (generational
gc) and the aged objects are kept in an old area and cannot move unless
a global gc is performed. Thus, when an "unclean" object is put into an
eq/eql hash-table, its age is checked and the hash-table is only dirtied
according to that age. We maintain two dirty bits instead of one:
a dirty-after-scavenge bit and a dirty-after-global-gc bit. Thus, if all
of the objects have been tenured (aged) before being placed into the
hash-table, the hash table need never be rehashed until the before the
access after the next global-gc, rather than before the access after the
next scavenge.

Erik Naggum

unread,
Nov 6, 2002, 3:58:17 PM11/6/02
to
* Tim Bradshaw

| Otherwise, if you change the object, then any hashtable which contains
| it as a key needs to be marked for rehash, and that is hideous -

It is also undefined behavior in Common Lisp.

Tim Bradshaw

unread,
Nov 6, 2002, 7:37:16 PM11/6/02
to
* Erik Naggum wrote:

> It is also undefined behavior in Common Lisp.

Good point.

Thomas A. Russ

unread,
Nov 7, 2002, 5:47:20 PM11/7/02
to
Michael Hudson <m...@python.net> writes:

>
> Duane Rettig <du...@franz.com> writes:
>
> > No. Rehashing after every gc would be incredibly inefficient. Objects
> > whose hash codes will change due to a gc only _flag_ the hash table
> > for rehash before the next gethash or puthash operation
>
> How do you efficiently tell which hashtables a given object is in? I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

I don't think you have to do that at all. I can imagine it being done
with a gc-counter and a flag. If any item whose hash changes on a GC
is added to a particular hash-table, it sets the hash-changes flag. The
GC counter is set on each rehash of the table, and on accesses checked
to see if one (or more) GC operations have occurred since the last rehash.

--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu

0 new messages