> Is it possible to make ones own :test to use with hash-tables? if yes,
Not portably, but most implementations support non-standard hash functions and test predicates, either as MAKE-HASH-TABLE keyword arguments or (in the case of CMUCL) using a separate registration mechanism.
> Is it possible to make ones own :test to use with hash-tables? if yes, > how (where is the doc)? if no, why?
It is likely that if you need your own test for a hash-table that should implement some specialized data structure to hold your data (and not use the built in hash tables). For example I had some data sorted by date, so I created arrays of months for each year. Each month contained an array of days-in-the-month, each day was an ordered list of entries. This specialized storage is faster than a hash-table, but its structure is based on knowledge of the nature of the data.
> Is it possible to make ones own :test to use with hash-tables? if yes, > how (where is the doc)? if no, why?
The language specification says that the test function must be one of EQ, EQL, EQUAL or EQUALP. So the restriction does not just rule out your own test functions, but even some standard ones like STRING=.
One productive way to deal with hash tables is to view them as a low level optimization tool. You can wrap them behind interfaces that don't require the user to select the appropriate test. Internally, you know which of the four applies.
E.g. how about this hack:
;; real version would pass through other make-hash-table args (defgeneric make-hash-for-slot (obj slot))
> Is it possible to make ones own :test to use with hash-tables? if yes, > how (where is the doc)? if no, why?
> /me newbie
AFAIK it isn't possible. I don't think it is really necessary to keep any a test other than the four already provided: EQ, EQL, EQUAL, EQUALP
EQ takes care of symbols EQL symbols and numbers EQUAL conses EQUALP structures (even copies of structures with the same data), like an EQUAL test for all structures
The :test tests for equality when searching the hash-table. If you want to alter the key value use the :key. However check the manual and see if the key function is applied to both the hash-key in the table and key you give, I don't have access to the manual right now.
> Is it possible to make ones own :test to use with hash-tables? if yes, > how (where is the doc)? if no, why?
Not portably, other than the ones given in the Ansi Spec. However, some implementations provide such features, also including hash-code generator function specification, as well as weak hash-tables and key-only tables for space savings. For documentation on Allegro CL's extensions, see
On Sat, Nov 02, 2002 at 10:43:38AM +0100, Knut Olav B?hmer wrote: > Is it possible to make ones own :test to use with hash-tables? if yes, > how (where is the doc)? if no, why?
The problem is that you need more than a :TEST to hash values in a table; you need a hash-function for that type of value. You have the option of using the hash-table interface extensions that various implementations provide, using a third-party hash-table package, or writing your own hash function which maps your values to something that can be inserted into the standard hash-tables.
Generally I take the third option; for example I encoded a Checkers board as a 96-bit integer and stored using that number in a EQL hash-table. In this case the key was always unique to the board, so there were no collisions, but if there is the possibility of a collision then you will need to handle it yourself via chaining or some other mechanism. The built-in hash-table would not know the difference between two different Checkers boards that give the same key value, for example.
-- ; Matthew Danish <mdan...@andrew.cmu.edu> ; OpenPGP public key: C24B6010 on keyring.debian.org ; Signed or encrypted mail welcome. ; "There is no dark side of the moon really; matter of fact, it's all dark."
* Knut Olav Bøhmer | Is it possible to make ones own :test to use with hash-tables? if yes, | how (where is the doc)? if no, why?
This is actually very simple, but not immediately obvious because of the many choices to `make-hash-table´, which tends to make you look for ways to expand on those choices. The better solution is to look another way entirely and realize that the key in the hash-table does not necessarily have to be the actual key.
Even with `equalp´, which traverses structures, you still end up with a hash bucket that is some index into an array. So the function has to do two things: compute some magic number and compare the item found in the hash-table to the actual key before returning the value in the hash-table.
Your `mytest´ function would therefore have to consist of two functions: (A) One of one argument that returns a (numeric) hash key, and (B) one of two arguments that returns true if the two arguments are the same under your test and nil otherwise. You might argue that the way we specify the `:test´ argument to `make-hash-table´ today is deficient in that it hides the details and pretends that real functions are involved when they are really keyword arguments from a finite set of values that select two or more aspects of the hash-table at once.
To make your own hashtable work, you would therefore have to implement the bucket yourself, and each hash key you use would have to retrieve a list of potential values and scan it for the actual value you want. One good way to do this is to find an orthogonal quality that can also be reduced to a `eq´-testable value and then use plists. Then you build a few simple wrapper macros to make use of your improved hash table.
In practice, your knowledge of the data you wish to store and retrieve will in the general case be superior to what you can express in any declarative language and it is therefore alluring the way politcal campaign promises are to choose some "simple" mechanism that covers another small area of the problem space in addition to the existing hash test functions, but then someone will come along to ask exactly the same question you did when they step outside that area, too. Therefore, a standard has to strike a balance between convenience and simplicity.
To wrap this up with a reference back to what I said at the top, the fact that you can get a bunch of test functions tends to make you look for a way to extend that list. This is the wrong approach in general, but it is known to be hard to leave a path you think ought to be good. There is even an idiom for it: To be stuck in a rut, where rut is literally a deep track made by the passage of a wheel.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
Erik Naggum <e...@naggum.no> writes: > You might argue that the way we specify the `:test´ argument to > `make-hash-table´ today is deficient in that it hides the details > and pretends that real functions are involved when they are really > keyword arguments from a finite set of values that select two or > more aspects of the hash-table at once.
So, is it the case that internally gethash uses different hash functions depending on which :test you choose?
Peter Seibel wrote: > So, is it the case that internally gethash uses different hash > functions depending on which :test you choose?
Almost certainly. It would be inefficient to have an EQ hash table use the same hash function as an EQUAL hash table, since the latter must traverse the S expression to compute the function, while the former can just hash the address of the object.
> Peter Seibel wrote: >> So, is it the case that internally gethash uses different hash >> functions depending on which :test you choose? > Almost certainly. It would be inefficient to have an EQ hash table > use the same hash function as an EQUAL hash table, since the latter > must traverse the S expression to compute the function, while the > former can just hash the address of the object.
Beyond efficiency, the reason to have different hash functions is modifications to hash table keys. You can't reasonably expect EQUAL-hash tables to work correctly under such modifications. If you'd use the EQUAL-hash for EQ-tables, such modifications wouldn't work there, either. However, EQ-Hash tables are rightly required to work correctly if you modify objects used as keys (using standard functions;) see 18.1.2 in the Hyperspec.
-- Marcus Breiing mailto:news-gwif...@breiing.com (will expire)
> Almost certainly. It would be inefficient to have an EQ hash table > use the same hash function as an EQUAL hash table, since the latter > must traverse the S expression to compute the function, while the > former can just hash the address of the object.
And rehash at GC-time? Is this really being done in practice?
> And rehash at GC-time? Is this really being done in practice?
It more-or-less has to be I think. Unless object contain some magic bit of data which uniquely identifies them but is not their address, then if their address changes, you need to rehash.
> > So, is it the case that internally gethash uses different hash > > functions depending on which :test you choose?
> Almost certainly. It would be inefficient to have an EQ hash > table use the same hash function as an EQUAL hash table, since the > latter must traverse the S expression to compute the function, while > the former can just hash the address of the object.
That's pretty much what I figured, thanks. Anyone know why (I'm sure someone here does) the API was designed with this sort of "fake" function argument as opposed to actually taking a :test function argument that can be any comparator function and something like a :hash argument that takes a hash function (which might be optional if you pass one of the standard comparators in the :test argument.)
I've read Erik Naggum's response to the OP saying this isn't what you really want a couple times but I'm still not quite getting it. I understand why--given the current API--you can't pass any old function to :test; what I don't get is why it would have been so horrible to do something like what I proposed to support different comparators/hash-functions.
> > And rehash at GC-time? Is this really being done in practice?
> If the collector moves the objects then, yes, it has to rehash > at GC time. ACL does this, I believe.
No. Rehashing after every gc would be incredibly inefficient. Objects whose hash codes will change due to a gc only _flag_ the hash table for rehash before the next gethash or puthash operation (if further GCs occur before the next hashing operation, then if automatic rehashing had been forced each gc would have wasted one rehash operation). Also, some objects provably don't move, and some objects can be hashed in ways other than their address, so that they don't "dirty" the hash-table. So no rehashing is ever needed if the hash-table only contains those objects whose hash codes have provably not changed.
> GC clearly also needs to be involved if the lisp supports a weak key > extension for EQ hash tables.
Tim Bradshaw <t...@cley.com> writes: > * Frode Vatvedt Fjeld wrote: >> And rehash at GC-time? Is this really being done in practice?
> It more-or-less has to be I think. Unless object contain some magic > bit of data which uniquely identifies them but is not their address, > then if their address changes, you need to rehash.
Though note that you can do this lazily (I think CMUCL does) - do the rehash on the first gethash after the gc has happened, instead of at gc time itself. This is a win if you don't access the hash table very often (you could do two or more GCs in between rehashes), and a lose if you were expecting that gethash takes a predictable time to run.
Duane Rettig <du...@franz.com> writes: > No. Rehashing after every gc would be incredibly inefficient. Objects > whose hash codes will change due to a gc only _flag_ the hash table > for rehash before the next gethash or puthash operation
How do you efficiently tell which hashtables a given object is in? I can't think of a blindly obvious way of doing that. I haven't spent long thinking about it, so sorry if I'm just being dumb.
Michael Hudson <m...@python.net> writes: > Duane Rettig <du...@franz.com> writes:
> > No. Rehashing after every gc would be incredibly inefficient. Objects > > whose hash codes will change due to a gc only _flag_ the hash table > > for rehash before the next gethash or puthash operation
> How do you efficiently tell which hashtables a given object is in? I > can't think of a blindly obvious way of doing that. I haven't spent > long thinking about it, so sorry if I'm just being dumb.
Who is "you" in this case? Why do "you" _want_ to tell which hashtables a given object is in?
* Michael Hudson | How do you efficiently tell which hashtables a given object is in? I | can't think of a blindly obvious way of doing that. I haven't spent long | thinking about it, so sorry if I'm just being dumb.
You are not allowed to change the key in a way that would change the value of the test function applied to it. This is not when rehashing takes place. Rehashing takes place when you have an `eq´ hashtable of objects that do not have an internal hash code (such as is commonly done with symbols), and a garbage collection moves the objects around. As Duane explained, there is no point in actually performing the rehash until the hash-table is accessed after a garbage collection that actually moved an object. The information necessary to make this decision is all local to the hash-table.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
* Michael Hudson wrote: > How do you efficiently tell which hashtables a given object is in? I > can't think of a blindly obvious way of doing that. I haven't spent > long thinking about it, so sorry if I'm just being dumb.
You don't need to. When you insert an object whose hash-code may change into a table, you flag the table as needing a rehash after a GC, then, next time you access it, if a GC has happened, you rehash then. (I bet implementations do even smarter things of course).
The classic case where you can avoid rehashing is EQL tables which only have numeric keys.
Michael Hudson wrote: > How do you efficiently tell which hashtables a given object is in? I > can't think of a blindly obvious way of doing that. I haven't spent > long thinking about it, so sorry if I'm just being dumb.
If GC is moving objects around it already has to solve the problem of finding all the pointers to those objects.
Duane Rettig wrote: > No. Rehashing after every gc would be incredibly inefficient. Objects > whose hash codes will change due to a gc only _flag_ the hash table > for rehash before the next gethash or puthash operation (if further > GCs occur before the next hashing operation, then if automatic > rehashing had been forced each gc would have wasted one rehash > operation).