
Problems with EQUALP hash tables and Arrays


Kenneth P. Turvey

May 21, 1999
Am I doing something wrong here? I am using EQUALP hash tables to
associate a key that is an array of 20 integers with a single integer.

One would think this would be simple, but it isn't working out that way.
CMUCL doesn't even support EQUALP hash tables. ACL 4.0 works great, but
if I dump the table to a file and read it back in (using maphash and
read, respectively), all of a sudden it quits working.

It no longer finds the association based on the array. It behaves as if
it is an EQ hash table. Even copying the hash table using setf to the
same slot in another instance (not the values, just the table) seems to
mess things up.
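
For reference, here is roughly what my dump and reload look like (a
simplified sketch, not my actual code; the names are illustrative):

(defun dump-table (table file)
  ;; Write each key/value pair readably, one form per line.
  (with-open-file (out file :direction :output :if-exists :supersede)
    (maphash #'(lambda (key val) (print (cons key val) out)) table)))

(defun load-table (file)
  ;; Rebuild an EQUALP table by READing each pair back in.
  (with-open-file (in file)
    (let ((table (make-hash-table :test #'equalp)))
      (loop for pair = (read in nil nil)
            while pair
            do (setf (gethash (car pair) table) (cdr pair)))
      table)))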

Can someone point me in the right direction? Are there pitfalls of
using EQUALP hash tables that I might be missing? Are there concerns
that I may not be addressing?

Are the implementations of EQUALP hash tables consistently broken when
it comes to using array keys?

Thanks,
--
Kenneth P. Turvey <ktu...@SprocketShop.com>
----------------- http://www.tranquility.net/~kturvey

Careful attention to small detail, often proves superior to genius.
-- Cicero

Lyman S. Taylor

May 22, 1999
"Kenneth P. Turvey" wrote:

> it is an EQ hash table. Even copying the hash table using setf to the
> same slot in another instance (not the values, just the table) seems to
> mess things up.

I'm confused. SETF doesn't "copy" anything. It may make two places
refer to the same "thing". I can't fathom how that, in and of itself,
could cause a state change in the "thing" being referred to.

(defclass ht-holder () ((table :accessor ht-holder-table)))

(setf hth1 (make-instance 'ht-holder)
      hth2 (make-instance 'ht-holder))

(setf (ht-holder-table hth1) (make-hash-table :test #'equalp))

The above seems to imply that the following changes the table:

(setf (ht-holder-table hth2) (ht-holder-table hth1))


This all works correctly in ACL 5.0. I'm not sure why that would be
ACL 5.0 specific. [ Not sure why instances should all refer to the
same table; a class-allocated slot would be better. ]

Or do you mean copying the key/value references into a new table:

(defun copy-ht-holder-table (from to)
  (let ((new-ht (make-hash-table :test #'equalp)))
    (maphash #'(lambda (key val) (setf (gethash key new-ht) val))
             (ht-holder-table from))
    (setf (ht-holder-table to) new-ht)))

Correctness should be entirely dependent upon the make-hash-table
expression.
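
In any case, a couple of quick checks are worth running against the
original problem (a sketch, reusing the accessors above):

;; The test should survive the "copy", and both slots should refer
;; to the very same table object:
(hash-table-test (ht-holder-table hth2))           ; expect EQUALP
(eq (ht-holder-table hth1) (ht-holder-table hth2)) ; expect T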


---

Lyman

Kenneth P. Turvey

May 23, 1999
On Fri, 21 May 1999 15:40:41 -0500,
I wrote:
>Am I doing something wrong here? I am using EQUALP hash tables to
>associate a key that is an array of 20 integers with a single integer.
>
>One would think this would be simple, but it isn't working out that way.
>CMUCL doesn't even support EQUALP hash tables. ACL 4.0 works great, but
>if I dump the table to a file and read it back in (using maphash and
>read, respectively), all of a sudden it quits working.
>
>It no longer finds the association based on the array. It behaves as if
>it is an EQ hash table. Even copying the hash table using setf to the
>same slot in another instance (not the values, just the table) seems to
>mess things up.
>
>Can someone point me in the right direction? Are there pitfalls of
>using EQUALP hash tables that I might be missing? Are there concerns
>that I may not be addressing?
>
>Are the implementations of EQUALP hash tables consistently broken when
>it comes to using array keys?

I hate to follow up my own post, but I thought I would provide an
example that I just can't understand. Maybe one or more of you can
tell me whether the problem is in my code (and what to look for) or in
the implementation. I can't really provide complete source; the code that
uses this hash table is quite long. I can provide an example of
something that seems contradictory:

USER(29): (describe *test*)
#<RUBIK-WITH-HASH-HEURISTIC @ #x8351782> is an instance of
#<STANDARD-CLASS RUBIK-WITH-HASH-HEURISTIC>:
The following slots have :INSTANCE allocation:
FACE-LIST (ORANGE BLUE GREEN YELLOW WHITE RED)
STATE #(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
DEPTH 3
TABLE #<EQUALP hash-table with 1288 entries @ #x8350db2>
USER(30): (gethash (get-internal-state *test*))
Error: GETHASH got 1 arg, wanted from 2 to 3.
[condition type: PROGRAM-ERROR]
[1] USER(31): :pop
USER(31): (gethash (get-internal-state *test*) (slot-value *test*
'table))
NIL
NIL
USER(32): (gethash #(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
(slot-value *test* 'table))
NIL
NIL
USER(33): (maphash #'(lambda (x y) (when (equalp x #(0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0)) (print (list x y)))) (slot-value *test* 'table))

(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 0)
(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2)
(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2)
NIL

Note that this seems to indicate that there are three entries in the
hash table that all have EQUALP keys. I don't understand this at all.
My program is unable to find any of the three when calling gethash.

Thanks,

--
Kenneth P. Turvey <ktu...@SprocketShop.com>
----------------- http://www.tranquility.net/~kturvey

The enumeration in the Constitution, of certain rights, shall not be
construed to deny or disparage others retained by the people.
-- Amendment IX to The United States Constitution


P.S. Here is the source code for a couple of the functions that may be
helpful. If you spend the time to look at them, you are truly a generous
person. :-) Thanks.

(defmethod heuristic ((puz puz-with-hash-heuristic))
  (let ((temp-heur (gethash (get-internal-state puz) (slot-value puz 'table))))
    (if (not (eql temp-heur nil))
        temp-heur
        (1+ (max-depth puz)))))

(defun increment-list (move-list num-moves)
  "This function increments the provided list of moves to allow the
program to generate all of the possible combinations of moves of a
given length. It returns two values: the first is the actual list, the
second is T or NIL, indicating whether the list was extended by one
element."
  (cond ((equal move-list ()) (setf move-list (cons 0 move-list)))
        (t (setf (first move-list) (1+ (first move-list)))
           (cond ((= (first move-list) num-moves)
                  (setf (first move-list) 0)
                  (cond ((equal (rest move-list) ())
                         (setf move-list (cons 0 move-list)))
                        (t (setf (rest move-list)
                                 (increment-list
                                  (rest move-list) num-moves))))))))
  move-list)

;;; This function assumes that the value returned by the get-internal-state
;;; method is suitable for use as the key to an equalp hash table. If this
;;; is not the case, this function will not work.
(defmethod initialize :after ((puz puz-with-hash-heuristic)
                              &key (depth 5) &allow-other-keys)
  (let* ((ops (get-operators puz))
         (num-ops (length ops)))
    (setf (slot-value puz 'table)
          (make-hash-table :test #'equalp
                           :rehash-size (/ num-ops 2.0)
                           :size num-ops))
    (setf (slot-value puz 'depth) depth)
    ;; Generate the puzzles
    (do ((moves '() (increment-list moves num-ops))
         (puzzle (copy-puzzle puz))
         (hash nil)
         (exists nil))
        ((> (length moves) (max-depth puz)) puz)
      (solved-state puzzle)
      ;; Generate one of the puzzles based on the moves list
      (dotimes (index (length moves))
        (funcall (get-function (elt ops (elt moves index))) puzzle))
      ;; If it doesn't already exist in the hash table, put it there.
      (multiple-value-bind (hash exists)
          (gethash (get-internal-state puzzle)
                   (slot-value puz 'table))
        (unless exists
          (setf (gethash (get-internal-state puzzle)
                         (slot-value puz 'table))
                (length moves)))))))


Kenneth P. Turvey

May 25, 1999
On Sun, 23 May 1999 22:31:34 -0500,
I wrote:
>>Am I doing something wrong here? I am using EQUALP hash tables to
>>associate a key that is an array of 20 integers with a single integer.
>>
[Big-Snip]

>
>(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 0)
>(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2)
>(#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) 2)
>NIL
>
>Note that this seems to indicate that there are three entries in the
>hash table that all have EQUALP keys. I don't understand this at all.
>My program is unable to find any of the three when calling gethash.

To followup my own post a second time....

This does seem to be an implementation bug. I downloaded version 5.0
and it behaves as expected. Well, not exactly as expected: it provides
the correct answer after expending an unbearable amount of time to
determine it. For now, I have given up on using EQUALP hash tables and
moved back to just converting the array into a list before using it as a
key. It works.

Thanks,

--
Kenneth P. Turvey <ktu...@SprocketShop.com>
----------------- http://www.tranquility.net/~kturvey

Anyone who challenges the prevailing orthodoxy finds himself silenced
with surprising effectiveness. A genuinely unfashionable opinion is
almost never given a fair hearing. -- George Orwell

Lyman S. Taylor

May 25, 1999
"Kenneth P. Turvey" wrote:
....

> Well, not exactly as expected: it provides
> the correct answer after expending an unbearable amount of time to
> determine it. For now, I have given up on using EQUALP hash tables and
> moved back to just converting the array into a list before using it as a
> key. It works.

If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
following will return the same result:

(sxhash #(1 2 3)) <==> (length #(1 2 3))

If Franz is using an equally inspired hash code algorithm (or sxhash
itself) to implement equalp hash tables, then every single vector in
your table hashes to the same code (I presume they are all the same
length). Therefore, you would definitely not get the O(1) hash
retrieval you would expect. Every access is a collision.
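
If so, it's easy to check in a listener (a hypothetical session; the
actual values depend on the implementation):

;; If SXHASH of a vector is just its length, distinct same-length
;; vectors all produce the same hash code:
(sxhash #(1 2 3))  ; => 3 (hypothetically)
(sxhash #(9 9 9))  ; => 3
(sxhash #(0 1 0))  ; => 3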

You might try using a transform to convert the vector just for use as
the key to the hash retrieval. If the transformed keys are ephemeral
enough, they should get collected pretty easily, and you can still get
vector-like characteristics elsewhere. If the vector is solely a key,
then abandoning vectors is the correct move.

---

Lyman

Kenneth P. Turvey

May 25, 1999
On Tue, 25 May 1999 11:47:28 -0700,
Lyman S. Taylor <lyman....@mindspring.com> wrote:

>"Kenneth P. Turvey" wrote:
>
> If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
> following will return the same result:
>
> (sxhash #(1 2 3)) <==> (length #(1 2 3))
>
> If Franz is using an equally inspired hash code algorithm (or sxhash
> itself) to implement equalp hash tables, then every single vector in
> your table hashes to the same code (I presume they are all the same
> length). Therefore, you would definitely not get the O(1) hash
> retrieval you would expect. Every access is a collision.

That could be it... The performance is absolutely horrible and declines
rapidly with the size of the hash table.

> You might try using a transform to convert the vector just for use as
> the key to the hash retrieval. If the transformed keys are ephemeral
> enough, they should get collected pretty easily, and you can still get
> vector-like characteristics elsewhere. If the vector is solely a key,
> then abandoning vectors is the correct move.

I'm currently converting the vector to a list for use as the key in the
hash table. It seems to work well, but it means that I have added a
COERCE form to an inner loop. I was hoping to be done with it.
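
Concretely, the inner-loop lookup now looks something like this (a
sketch, using the names from my earlier post):

;; The list copy, not the vector, is now the key.
(gethash (coerce (get-internal-state puz) 'list)
         (slot-value puz 'table))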

--
Kenneth P. Turvey <ktu...@SprocketShop.com>
----------------- http://www.tranquility.net/~kturvey

The scum also rises.
-- Dr. Hunter S. Thompson

Marco Antoniotti

May 26, 1999

ktu...@pug1.sprocketshop.com (Kenneth P. Turvey) writes:

> On Tue, 25 May 1999 11:47:28 -0700,
> Lyman S. Taylor <lyman....@mindspring.com> wrote:
> >"Kenneth P. Turvey" wrote:
> >
> > If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
> > following will return the same result:
> >
> > (sxhash #(1 2 3)) <==> (length #(1 2 3))
> >
> > If Franz is using an equally inspired hash code algorithm (or sxhash
> > itself) to implement equalp hash tables, then every single vector in
> > your table hashes to the same code (I presume they are all the same
> > length). Therefore, you would definitely not get the O(1) hash
> > retrieval you would expect. Every access is a collision.
>
> That could be it... The performance is absolutely horrible and declines
> rapidly with the size of the hash table.
>
> > You might try using a transform to convert the vector just for use as
> > the key to the hash retrieval. If the transformed keys are ephemeral
> > enough, they should get collected pretty easily, and you can still get
> > vector-like characteristics elsewhere. If the vector is solely a key,
> > then abandoning vectors is the correct move.
>
> I'm currently converting the vector to a list for use as the key in the
> hash table. It seems to work well, but it means that I have added a
> COERCE form to an inner loop. I was hoping to be done with it.
>


This is definitively one for the ANSI review. SXHASH should be made
into a generic function and some guarantee about the recursive
behavior of it on vectors (i.e. not general arrays) should be required
for "conformance".

--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa

Barry Margolin

May 26, 1999
In article <lwyaic1...@copernico.parades.rm.cnr.it>,

Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> wrote:
>This is definitively one for the ANSI review. SXHASH should be made
>into a generic function and some guarantee about the recursive
>behavior of it on vectors (i.e. not general arrays) should be required
>for "conformance".

Hash tables are supposed to be fast. Requiring the hash function to be a
recursive function of the hashes of the vector contents would likely
conflict with this goal.

It's true that most implementations recurse through conses. This is
necessary because conses are so common and there's nothing that
distinguishes them except what they point to. Arrays, on the other hand,
have some properties (such as length and type) that can be determined
quickly and used instead of the contents. Also, arrays tend to be used
more for large quantities of data, so a recursive SXHASH would have to
recurse on lots of elements (unless we gave it license to limit the number
of elements it examines, perhaps something like the first 10).

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Kent M Pitman

May 26, 1999
Barry Margolin <bar...@bbnplanet.com> writes:

> In article <lwyaic1...@copernico.parades.rm.cnr.it>,
> Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> wrote:
> >This is definitively one for the ANSI review. SXHASH should be made
> >into a generic function and some guarantee about the recursive
> >behavior of it on vectors (i.e. not general arrays) should be required
> >for "conformance".
>
> Hash tables are supposed to be fast. Requiring the hash function to be a
> recursive function of the hashes of the vector contents would likely
> conflict with this goal.

Right. This is a very tricky matter. You might say (as I almost did)
that computing a hash function recursively would take only one recursion
down the tree, but that if this avoided several compares using EQUALP
(which is a fully recursive compare) then it would "obviously" be worth
it. But depending on the data, it could be that EQUALP will return NIL
much faster than a full recursive descent, so comparing a tree of 100 nodes
to 10 other trees of the same size might still take only 10
compares total (one per node, best case) whereas just computing an
efficient hash function might take 100 node visits plus some computation at
each node.

Bounded efficiency is the kind of thing that you DEFINITELY want BUT
it is also something for which there is no industry-wide accepted theory
of how to do it. As such, it is not (IMO) a candidate for standardization
per se. HOWEVER, I do think it would be good for standards to start
to document what you can expect in terms of behavior even if that isn't
all you might wish for--so as not to trick people into thinking they are
getting better performance than they actually are.

Further, it would be good to offer more user-accessible options, but
the standard is not the right place to start, since likely some of
those options will turn out to be half-baked.

My favorite example is the hash table hack for EQ hash tables, where
EQ hash tables need rehashing on every GC but sometimes are not
accessed for several GC cycles. A lot of vendors automatically
"optimize" this case by rehashing on demand, rather than "gratuitously"
on every GC. This gives good "long term" behavior at the expense of
bad "real time" behavior. Most people use GETHASH thinking they
are getting approximately constant-time access, and many are surprised
to take the occasional huge hit that comes from a GETHASH on a "dirty"
hash table (one that was marked "needs rehash" after a GC but now needs
to be demand-rehashed). My personal feeling is that this is now a
well-enough-understood situation that a hash table should take an
init option saying :REHASH-MODE :LAZY or :REHASH-MODE :AGGRESSIVE,
since only the application, and not the system, can know for each
hash table which will give the best performance characteristics for
the application. Applications that sit idle a lot may want to aggressively
rehash, even when it's going to be wasted, in order to improve real-time
response. Applications that are compute-intensive may not care about
real-time response. It's a mistake for implementations to try to do
this by magic. But this one situation I'm describing is a special
case for which we have a lot of "current practice" to reason from. The same
is arguably not true for other cases of SXHASH and hash tables.
Also, you want to avoid making the control so specific that it precludes
further optimizations in the future (which is why I chose :LAZY and
:AGGRESSIVE above, instead of :AFTER-GC and :ON-GETHASH); such details
greatly affect the durability of design decisions.
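
Purely as a sketch of the shape of such an option (these keywords are
hypothetical; no implementation accepts them today):

;; Hypothetical init option, per the proposal above -- not a real API:
(make-hash-table :test #'eq :rehash-mode :aggressive) ; rehash right after each GC
(make-hash-table :test #'eq :rehash-mode :lazy)       ; rehash on the next GETHASH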

> It's true that most implementations recurse through conses. This is
> necessary because conses are so common and there's nothing that
> distinguishes them except what they point to. Arrays, on the other hand,
> have some properties (such as length and type) that can be determined
> quickly and used instead of the contents. Also, arrays tend to be used
> more for large quantities of data, so a recursive SXHASH would have to
> recurse on lots of elements (unless we gave it license to limit the number
> of elements it examines, perhaps something like the first 10).

I agree with Barry's analysis here, and would only add that if we
followed his "suggestion" of looking only at the first 10, someone would
immediately report a bug in which the 11th element was what mattered,
calling the decision to look at only 10 "obviously wrong since there are
arrays bigger than 10". It's very hard to win totally.
That's why there are marketplaces. Marketplaces are about keeping
the tension on the vendor to do better than what standards require.

Lyman S. Taylor

May 26, 1999
Barry Margolin wrote:
>
> In article <lwyaic1...@copernico.parades.rm.cnr.it>,
> Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> wrote:
....
> >into a generic function
...
> Hash tables are supposed to be fast.

Those two (without Dylan-like sealing directives) are somewhat mutually
exclusive. I think the motivation for a generic is so that the individual
can "fix it" to something better. That might be more appropriately served
by another related hash table type where you can specify the
hash code function. I'm not sure that needs to be in the standard
(however, a generally available "template" for this would be nice).

The dispatch overhead may not be that high, but it is definitely not
without cost.

> It's true that most implementations recurse through conses. This is
> necessary because conses are so common and there's nothing that
> distinguishes them except what they point to. Arrays, on the other hand,
> have some properties (such as length and type) that can be determined
> quickly and used instead of the contents.

They're convenient. However, I don't think those two are appropriate.
The "so common" has to be in the context of the hash table the code is
used for. I would strongly suspect that most hash tables are not used
with heterogeneous keys. My intuition is that the keys would very likely
be quite homogeneous in "size" and type. That makes those two
attributes quite inappropriate for a hash code. Yes, you can access
them quickly, but that seems penny wise and pound foolish.

When a "container" is used as a key I would expect that the homogeneity
of the container's type and "size" would hold true. The discriminator
should be based on the contents.

> Also, arrays tend to be used
> more for large quantities of data,

A reasonable tradeoff would be to do a deterministic sampling of the
contents. Not the first 10, but perhaps every 10th index.
Or every XXth, where XX is based on the array's size.
[ This isn't perfect. Data that happens to be the same across the
"stride" will exhibit bad behaviour, but that seems more generally
unlikely than with the first N, and significantly better than basing
it on the container's generic attributes. ]

For "small" (e.g., < 30) arrays, such as the case that spawned this
thread, it could be based on each element. A cursory glance at the
Dragon book reveals a hash code algorithm for hashing strings which
uses each element to make a contribution. [ Essentially, make "large"
arrays a special case and more likely to do poorly. You can dynamically
trade off accuracy for time without too much overhead. ]
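
A sketch of the kind of thing I mean (illustrative only; the constants
are arbitrary):

;; Hash a deterministic sample of the contents; the stride scales
;; with the array's size.
(defun sampled-vector-hash (v)
  (let* ((n (length v))
         (stride (max 1 (floor n 10))) ; roughly every 10th element
         (h n))
    (do ((i 0 (+ i stride)))
        ((>= i n) h)
      (setf h (mod (+ (* 65599 h) (sxhash (aref v i)))
                   most-positive-fixnum)))))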

Hash tables critically depend upon the hash code algorithm generating a
reasonable distribution of values. Returning the "size" or "type" of
an object is not very likely to yield a reasonable distribution unless
you have a pretty weird key domain.


---

Lyman

Hrvoje Niksic

May 26, 1999
"Lyman S. Taylor" <lyman....@mindspring.com> writes:

> If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
> following will return the same result:
>
> (sxhash #(1 2 3)) <==> (length #(1 2 3))

Why do they do that? It's practically *guaranteed* to degrade hash
searches to O(n) with equal-length keys.

The XEmacs hash-table implementation hashes vectors by calculating the
hash keys of five of their elements and hashing them together (the
maximum recursion depth is limited to five). This calculation is of
course slower than just returning the length of the vector, but that's
what you ask for when you use vectors as hash keys.
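
In Lisp terms, the idea is roughly this (a sketch of the approach, not
XEmacs's actual C code):

;; Depth-limited, content-based vector hash in the XEmacs style.
(defun depth-limited-hash (obj &optional (depth 0))
  (if (and (vectorp obj) (< depth 5))
      (let ((h (length obj)))
        (dotimes (i (min 5 (length obj)) h)
          (setf h (mod (logxor (* 31 h)
                               (depth-limited-hash (aref obj i) (1+ depth)))
                       most-positive-fixnum))))
      (sxhash obj)))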

Raymond Toy

May 26, 1999
>>>>> "Hrvoje" == Hrvoje Niksic <hni...@srce.hr> writes:

Hrvoje> "Lyman S. Taylor" <lyman....@mindspring.com> writes:
>> If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
>> following will return the same result:
>>
>> (sxhash #(1 2 3)) <==> (length #(1 2 3))

Hrvoje> Why do they do that? It's practically *guaranteed* to degrade hash
Hrvoje> searches to O(n) with equal-length keys.

Because they were lazy and didn't expect anyone to use vectors as
keys? :-)

CMUCL certainly has this fault, except it's array-rank instead of
length. It's also deficient in not having equalp hash tables.

Ray

Duane Rettig

May 26, 1999
Hrvoje Niksic <hni...@srce.hr> writes:

> "Lyman S. Taylor" <lyman....@mindspring.com> writes:
>
> > If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
> > following will return the same result:
> >
> > (sxhash #(1 2 3)) <==> (length #(1 2 3))
>
> Why do they do that? It's practically *guaranteed* to degrade hash
> searches to O(n) with equal-length keys.

I'm surprised that nobody has caught this yet, so I'll say some words
about sxhash and equalp hash tables:

Sxhash is meant for equal hash tables, and though it can be used for
equalp hashing, it is not as useful for equalp hashing. The problem
is in what we can and can't do with sxhash itself: sxhash cannot change
its return value based on any externally visible change on the object
itself. If you change the contents of a vector being used as a key,
the hash code must not change, because the key/value pair would
otherwise get lost in the hash-table.

For EQUALP hash-tables, we don't use sxhash; we use a similar function
(whose equivalent is the undocumented function excl::equalp-hash). This
function does return values based on the contents of the key (and the
user must be careful not to modify such keys).

For EQUAL hash tables, sxhash is used, and hash-tables using vectors as
keys in an equal setting are not going to be as useful. Perhaps an EQ
hashtable would be more appropriate anyway.

A little more on the tradeoffs of EQUAL hashing: some kinds of objects
have some room to store a hash value, such as symbols, function objects,
and standard-instances. Vectors have no extra storage (we would have
many angry customers if we took up an extra slot for a hash code in
every vector). Instead of forcing an extra slot in every vector to
hold a hash code, we provide an extension to make-hash-table, to allow
for a couple of extra keywords to help in such situations. One such
extension is to allow the :test keyword to be used for other functions
besides the normal four (eq, eql, equal, and equalp). The function must
be predefined and accept two arguments. The other extension we provide
is the :hash-function keyword. It must be a function of one argument,
and must return a hash code, generated in any way desired from the
argument (or not :-). In this way, one can simulate a descending
hash-code generation technique, or better yet, one can set aside a slot
for a unique hash code which is generated when the key is created, so
the hashing can be fast without forcing data enlargement.
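
For the case that started this thread, using these keywords might look
like the following (a sketch; the hash function itself is illustrative,
not something we ship):

;; A content-based hash supplied via the extensions described above.
(defun vector-contents-hash (v)
  (let ((h 0))
    (dotimes (i (length v) h)
      (setf h (mod (+ (* 31 h) (sxhash (aref v i)))
                   most-positive-fixnum)))))

(make-hash-table :test #'equalp :hash-function #'vector-contents-hash)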

> The XEmacs hash-table implementation hashes vectors by calculating the
> hash keys of five of their elements and hashing them together (the
> maximum recursion depth is limited to five). This calculation is of
> course slower than just returning the length of the vector, but that's
> what you ask for when you use vectors as hash keys.

The problem comes in when you start modifying your keys. If you start
out with a key of (setq key1 #(1 2 3)) and then after it has been used
in a hash-table, modify it at all, say, (setf (aref key1 0) 4), then
if the hash code is different the key will no longer be found. If you're
willing to live with this, then perhaps an equalp hash-table is the right
thing to use. But for equal hashing, we cannot select any hash code
that might change based on the value of the key.
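
Here is the pitfall in miniature (a sketch; whether the final lookup
fails is implementation-dependent):

;; Mutating a key in place can orphan its entry:
(let ((key (vector 1 2 3))
      (ht (make-hash-table :test #'equalp)))
  (setf (gethash key ht) :found)
  (setf (aref key 0) 4)  ; externally visible change to the key
  (gethash key ht))      ; likely NIL: the entry sits in the bucket
                         ; for the old hash code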

--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)

Barry Margolin

May 26, 1999
In article <4zp2rl...@beta.franz.com>,

Duane Rettig <du...@franz.com> wrote:
>Hrvoje Niksic <hni...@srce.hr> writes:
>
>> "Lyman S. Taylor" <lyman....@mindspring.com> writes:
>>
>> > If I remember correctly, in ACL 5.0 (and perhaps previous versions) the
>> > following will return the same result:
>> >
>> > (sxhash #(1 2 3)) <==> (length #(1 2 3))
>>
>> Why do they do that? It's practically *guaranteed* to degrade hash
>> searches to O(n) with equal-length keys.
>
>I'm surprised that nobody has caught this yet, so I'll say some words
>about sxhash and equalp hash tables:
>
>Sxhash is meant for equal hash tables, and though it can be used for
>equalp hashing, it is not as useful for equalp hashing. The problem
>is in what we can and can't do with sxhash itself: sxhash cannot change
>its return value based on any externally visible change on the object
>itself. If you change the contents of a vector being used as a key,
>the hash code must not change, because the key/value pair would
>otherwise get lost in the hash-table.

In fact, if the vector is adjustable, even the length cannot be used as the
hash code, since changing the length doesn't change the result of EQUAL.
For arrays other than strings or bit vectors, EQUAL == EQL, and
ADJUST-ARRAY doesn't change the identity of an adjustable array.
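
For example (a sketch; ADJUST-ARRAY returns the same object when the
array is adjustable):

;; Identity survives adjustment, so length can't serve as the hash:
(let ((v (make-array 3 :adjustable t :initial-contents '(1 2 3))))
  (eql v (adjust-array v 5))  ; => T: the same array ...
  (length v))                 ; => 5: ... with a new length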

Tim Bradshaw

May 26, 1999
* Barry Margolin wrote:

> It's true that most implementations recurse through conses. This is
> necessary because conses are so common and there's nothing that
> distinguishes them except what they point to. Arrays, on the other hand,
> have some properties (such as length and type) that can be determined
> quickly and used instead of the contents. Also, arrays tend to be used
> more for large quantities of data, so a recursive SXHASH would have to
> recurse on lots of elements (unless we gave it license to limit the number
> of elements it examines, perhaps something like the first 10).

Another optimisation that would be possible for an implementation, but
hard for a user to extend, would be to have a slot in an array (like
the LispM `array header' stuff) where the hash code was cached. Then
you could compute it once by doing some hairy computation (say, when
the array was put in the table) and then reads would be very fast,
which is kind of a nice way around the problem.
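
You can approximate this at user level with a wrapper (a sketch,
leaning on a :hash-function-style extension like the one mentioned
elsewhere in this thread):

;; Compute the hairy hash once, when the key is created; the table's
;; hash function then just reads the slot.
(defstruct hashed-key vec hash)

(defun wrap-key (vec)
  (make-hashed-key
   :vec vec
   :hash (let ((h 0))  ; stand-in for any hairy computation, done once
           (dotimes (i (length vec) h)
             (setf h (mod (+ (* 31 h) (sxhash (aref vec i)))
                          most-positive-fixnum))))))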

--tim


Erik Naggum

May 27, 1999
* Hrvoje Niksic <hni...@srce.hr>

| Why do they do that? It's practically *guaranteed* to degrade hash
| searches to O(n) with equal-length keys.

I don't know _why_ they do it, but it's silly to take a special case and
argue that it's the world's most important issue. if hash tables don't
give you the performance you need, you engage your programmer brain and
write something that does. if you are one of those people who think that
everything should be free and easy, programming is the wrong trade.

let me give you an example. I have a pretty decent MD5 algorithm that I
use in various protocols to ensure that lines don't kill data. MD5 is a
fairly complex algorithm, and until I can interface the assembly version
I wrote directly with Allegro CL, I use one written in Common Lisp, and
heavily bummed (if I ever start to lecture on optimization, I can boast a
real life example of 13 000 times factor difference between the naive
code and the optimized-to-death version). however, I sometimes need to
compute the checksum of strings that happen to be STRING= several times.
finding the appropriate string in a cache is not as trivial as it sounds:
it needs to be significantly faster than computing the value, since the
cache will be tried before computing the value. also, a cache would tend
to grow without bounds in the absence of weak pointers and the like. in
the end, I decided to let the caller signal "cachability" of a string,
reorder strings in the cache when a hit is made to make it more likely to
win the next time, flush the cache when it gets full and upon global GC.
all of this means it doesn't pay to cache strings 119 characters or less
in length if I am to retain up to 256 strings in the cache. a lot of CPU
time went into finding the point where it made sense to cache. why all
this work? because the naïve caching strategy caused a very serious
performance degradation in actual use, with a usage pattern that had gone
completely unnoticed during testing.
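
a rough shape of the policy, if it helps (illustrative only, not my
actual code):

;; linear scan; on a hit, swap the entry one step toward the front so
;; frequently used strings get cheaper to find; flush when full.
(defparameter *cache* (make-array 256 :fill-pointer 0))

(defun cache-lookup (string)
  (dotimes (i (fill-pointer *cache*) nil)
    (let ((entry (aref *cache* i)))
      (when (string= string (car entry))
        (unless (zerop i)
          (rotatef (aref *cache* i) (aref *cache* (1- i))))
        (return (cdr entry))))))

(defun cache-insert (string digest)
  (when (= (fill-pointer *cache*) 256) ; full: flush, as described above
    (setf (fill-pointer *cache*) 0))
  (vector-push (cons string digest) *cache*)
  digest)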

if there's a moral to this story, it is that optimization holds surprises
by its very nature: if we knew how to write it as fast as possible,
that's precisely what we'd do from the start. we start to optimize
because the system is unexpectedly slow, right? so _of_course_ we'll
find some special case where performance sucks real hard.

| This calculation is of course slower than just returning the length of
| the vector, but that's what you ask for when you use vectors as hash keys.

it isn't clear to me what you "ask for" when you use vectors as hash
keys, but it is nonetheless obvious that user expectations have not been
met. sometimes, user expectations are just plain wrong. in the field of
optimal performance, user expectations are normally wrong: users are like
politicians that way, they want something very badly, but have absolutely
no idea how to get it. bad performance is a computer's way of showing
its approval rating of the programmers.

#:Erik
--
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century

Marco Antoniotti

May 27, 1999

Kent M Pitman <pit...@world.std.com> writes:

> Barry Margolin <bar...@bbnplanet.com> writes:

Many, many good things deleted. But.....

I believe you missed my - and others' - point (apart from the
"standardization" process).

Basically, right now you do not have any control over (that is, no
way to specify) how the hash code of an object gets computed.

In CMUCL this is what happens.

* (defstruct a x)
A
* (dotimes (i 20)
(print (sxhash (make-a :x i))))
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
65
NIL
*

ACL 4.3 on Solaris exhibits the same behavior.

What I would like is to be able to do

(define-sxhash ((an-a a))
  (let-s-shoot-ourselves-in-you-know-what an-a))

or

(define-sxhash ((an-a a) (ht (eql *my-hash-table*)))
  (let-s-shoot-ourselves-in-you-know-what an-a))

or

(make-hash-table :hash-function #'let-s-shoot-ourselves-in-you-know-what)

Where

(defun let-s-shoot-ourselves-in-you-know-what (an-a)
  (cl:sxhash (a-x an-a)))

You get the general idea. As you can see, I may accept something
different from a (defgeneric SXHASH (obj)).

Note that given the actual definition of SXHASH there is no point in
making an EQUALP hash table if you want to key on a structure (or an object).

Java pretty much gets this right. Each class has the right to redefine
its hash code function.

The current situation forces you to write two-level hashing schemes
whenever you need to hash on an object or a structure. I.e. you write
code like

(defun insert-in-hash (obj ht whatever)
  (let ((hash-code (compute-obj-hash-code obj)))
    (setf (gethash hash-code ht) whatever)))

(defun get-in-hash (obj ht)
  (let ((hash-code (compute-obj-hash-code obj)))
    (gethash hash-code ht)))

This is perfectly acceptable ('compute-obj-hash-code' could be made
into a generic function), but it does not beat

(setf (gethash obj ht) whatever)

given that you were able to define a SXHASH yourself.

Going back to the standardization effort, I'd say that KMP is
right. A reference document should be available and experience gained
before incorporating it in the standard.

Weak hash tables seem almost ready for incorporation, and that is
already a lot of meat on the fire. (Apart from the fact that I do not
like the idea of doing

(defvar wht (make-hash-table :weak-p t))
(setf wp (make-weak-reference (something)))
(setf (gethash wp wht) (something-else))

:-) )

Cheers

Kent M Pitman

May 27, 1999
Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:

> Kent M Pitman <pit...@world.std.com> writes:
>
> > Barry Margolin <bar...@bbnplanet.com> writes:
>
> Many, many good things deleted. But.....
>
> I believe you missed my - and others' - point (apart from the
> "standardization" process).
>
> Basically, right now you do not have any control over (that is, no
> way to specify) how the hash code of an object gets computed.
>
> In CMUCL this is what happens.
>
> * (defstruct a x)
> A
> * (dotimes (i 20)
> (print (sxhash (make-a :x i))))

> 65 65 ...

The point of Erik Naggum's remark was that neither sxhash nor hash tables
gives you the power you need. I understand what you're saying, but I'm
saying that the right solution to this is to (a) write your own and
(b) lobby for its acceptance as standard. You're making a 100% correct
observation but it's not accompanied by any suggestion. It's a property
of sxhash that it's not supposed to change its function over time--I
don't know if that's written down, but vendors know it to be so. It would
break shared data written to files if the formula changed from release
to release. The right answer is for you to propose a better scheme,
not to claim this scheme is broken, and then to get all vendors to adopt
it. Short of that, rolling your own is the right answer. The language
certainly doesn't keep you from doing that.


> What I would like is to be able to do
>
> (define-sxhash ((an-a a))
> (let-s-shoot-ourselves-in-you-know-what an-a))

If you saved the hashes to a file, and later changed your function,
you might be screwed. One virtue of regular hash tables is that they
hide the hashing. It might be that what you're suggesting wouldn't be
bad, but it is untested.

> (make-hash-table :hash-function #'let-s-shoot-ourselves-in-you-know-what)

Proposals were bandied about to do this, but it's more complicated
than you think. The Lisp Machine had a way to do this, I think.
The problem as I recall is that you need more than one function
that have to work in sync--a hasher and a comparer. And they have
to really work together right in various ways or they get all
screwed up.

> Going back to the standardization effort, I'd say that KMP is
> right. A reference document should be available and experience gained
> before incorporating it in the standard.

<sigh of relief>


Marco Antoniotti

May 27, 1999

Kent M Pitman <pit...@world.std.com> writes:

> Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:
>
> > Kent M Pitman <pit...@world.std.com> writes:
> >
> > > Barry Margolin <bar...@bbnplanet.com> writes:
> >
> > Many, many good things deleted. But.....
> >

..
>
> The point of Erik Naggum's remark was that neither sxhash nor hash tables
> gives you the power you need. I understand what you're saying, but I'm
> saying that the right solution to this is to (a) write your own and
> (b) lobby for its acceptance as standard. You're making a 100% correct
> observation but it's not accompanied by any suggestion. It's a property
> of sxhash that it's not supposed to change its function over time--I
> don't know if that's written down, but vendors know it to be so. It would
> break shared data written to files if the formula changed from release
> to release. The right answer is for you to propose a better scheme,
> not to claim this scheme is broken, and then to get all vendors to adopt
> it. Short of that, rolling your own is the right answer. The language
> certainly doesn't keep you from doing that.
> > What I would like is to be able to do
> >
> > (define-sxhash ((an-a a))
> > (let-s-shoot-ourselves-in-you-know-what an-a))
>
> If you saved the hashes to a file, and later changed your function,
> you might be screwed. One virtue of regular hash tables is that they
> hide the hashing. It might be that what you're suggesting wouldn't be
> bad, but it is untested.

If your hash function were "bad", you'd be screwed from the beginning.
That is the point of having a user-definable SXHASH. The user can
shoot himself (female users usually being more intelligent :) ) in the
foot or other more painful parts of the body.
Besides, a certain "restrictive" interpretation of the HyperSpec's
requirement #3 for hash codes in the SXHASH description may make you
think that writing hash codes to a file is not very kosher.

>
> > (make-hash-table :hash-function #'let-s-shoot-ourselves-in-you-know-what)
>
> Proposals were bandied about to do this, but it's more complicated
> than you think. The Lisp Machine had a way to do this, I think.
> The problem as I recall is that you need more than one function
> that have to work in sync--a hasher and a comparer. And they have
> to really work together right in various ways or they get all
> screwed up.

This may well be. I have not really thought about the interaction of
the hasher and the comparer.

> > Going back to the standardization effort, I'd say that KMP is
> > right. A reference document should be available and experience gained
> > before incorporating it in the standard.
>
> <sigh of relief>

Made you shiver, didn't I? :)

Duane Rettig

May 27, 1999
Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:

> Basically, right now you do not have any control over (that is, no
> way to specify) how the hash code of an object gets computed.

Well, not in a standard way. Allegro CL provides an extension.

> In CMUCL this is what happens.
>
> * (defstruct a x)
> A
> * (dotimes (i 20)
> (print (sxhash (make-a :x i))))
> 65

...


> 65
> NIL
> *
>
> ACL 4.3 on Solaris exhibits the same behavior.

Both CMUCL and Allegro CL were designed with the same determination
that an extra space to hold a hash code in _every_ structure object
was unacceptable, as is the case with arrays.

> What I would like is to be able to do

...

> or
>
> (make-hash-table :hash-function #'let-s-shoot-ourselves-in-you-know-what)
>
> Where
>
> (defun let-s-shoot-ourselves-in-you-know-what (an-a)
> (cl:sxhash (a-x an-a)))

You can do this in Allegro CL. However, there are a couple of
things that you should be aware of in your example:

- Introduction of a hash-function into the make-hash-table form
doesn't change the default value of the :test function, which is
EQL. Thus, it depends on whether your hash-function is
identity-based as to whether you will find all of the keys you
put into the table, or whether they will be otherwise lost. Perhaps

(make-hash-table ... :test 'let-s-test-ourselves-in-the-you-know-what)

might be warranted.

- Each instance of the structure should be initialized either at
structure-instance creation time, or lazily at hash-time. So either
let-s-shoot-ourselves-in-you-know-what could store a hash-value into
(a-x an-a) if it is null, or else you could define a as

(defstruct a (x (new-you-know-what-hash-value)))

to get a unique hash code (if that is what you want).

> You get the general idea. As you can see I may accept something
> different than a (defgeneric SXHASH (obj)).

I don't know if this was discussed in this group (it was discussed
briefly at Franz) but making sxhash generic is unworkable because
adding methods to it would thus change the hash code and lose keys
in any tables already built.

> Note that given the actual definition of SXHASH there is no point in
> making an EQUALP hash table if you want to key on a structure (or an object).

SXHASH is not the right hashing function to use in EQUALP hash tables.
Allegro uses an internal one, but no standard function exists.

Duane Rettig

May 27, 1999
Kent M Pitman <pit...@world.std.com> writes:

> It's a property
> of sxhash that it's not supposed to change its function over time--I
> don't know if that's written down, but vendors know it to be so.

It's written down. It's also logical (imagine that!-)


> > (make-hash-table :hash-function #'let-s-shoot-ourselves-in-you-know-what)
>
> Proposals were bandied about to do this, but it's more complicated
> than you think. The Lisp Machine had a way to do this, I think.
> The problem as I recall is that you need more than one function
> that have to work in sync--a hasher and a comparer. And they have
> to really work together right in various ways or they get all
> screwed up.

Our first effort in Allegro CL to provide an extension was as a patch
to the hash-table implementation in version 4.2; it extended the :test
argument to be a function of two or three arguments which performed
the hash code generation or the comparison, depending on how it was
called. It did the job of forcing the user to define both hash-generation
and comparison at the same time, but in my opinion it was a horrible
hack, and we went quickly (starting with version 4.3) to separating
the :hash-function and :test functionality. This separated functionality
is much more direct and understandable.

Lieven Marchand

May 27, 1999
Kent M Pitman <pit...@world.std.com> writes:

> HOWEVER, I do think it would be good for standards to start
> to document what you can expect in terms of behavior even if that isn't
> all you might wish for--so as not to trick people into thinking they are
> getting better performance than they actually are.
>

I think it would make more sense for the standard to demand of
implementors to document the expected performance of their
implementation. If you start putting this kind of stuff in the
standard, you may end up cutting off some worthwhile parts of the
design space that would be appropriate for some
implementations. Alternatively, you end up with requirements that are
so loose as to be almost meaningless. The C++ standard demands certain
performance bounds in terms of O(f(n)) of the STL library components,
but since the implementation is not required to support objects or
arrays above a certain size any vendor can claim O(1) performance of
everything by choosing a suitably large constant.

--
Lieven Marchand <m...@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker

Erik Naggum

May 28, 1999
* Lieven Marchand <m...@bewoner.dma.be>

| I think it would make more sense for the standard to demand of
| implementors to document the expected performance of their implementation.

this would mean that no implementation would ever be conforming to the
standard, and reasonable people would shrug off that requirement, because
it is actually an unreasonable thing to request people to do. the reason
for support channels back to a vendor is to be able to ask when you need
to know. perhaps this is foreign to people who don't like commercial
vendors in the programming language market, but the support I get from
Franz Inc when I need it is tremendously valuable.

| Alternatively, you end up with requirements that are so loose as to be
| almost meaningless. The C++ standard demands certain performance bounds
| in terms of O(f(n)) of the STL library components, but since the
| implementation is not required to support objects or arrays above a
| certain size any vendor can claim O(1) performance of everything by
| choosing a suitably large constant.

I don't think you know what the big-O notation means, but I'm sure the
people who read the C++ standard don't either, so it's probably a useful
requirement for that community.

Lieven Marchand

May 28, 1999
Erik Naggum <er...@naggum.no> writes:

> * Lieven Marchand <m...@bewoner.dma.be>
> | I think it would make more sense for the standard to demand of
> | implementors to document the expected performance of their implementation.
>
> this would mean that no implementation would ever be conforming to the
> standard, and reasonable people would shrug off that requirement, because
> it is actually an unreasonable thing to request people to do.

Why? It seems more reasonable than demanding a certain performance requirement.

> the reason
> for support channels back to a vendor is to be able to ask when you need
> to know. perhaps this is foreign to people who don't like commercial
> vendors in the programming language market, but the support I get from
> Franz Inc when I need it is tremendously valuable.
>

Personally, I have nothing against commercial vendors but this issue
seems orthogonal to the discussion. Besides, it would make sense for
the vendors to document things like that when it gets asked
frequently. That is why a lot of vendors put (a sanitized version of)
their support knowledge base online.

> I don't think you know what the big-O notation means, but I'm sure the
> people who read the C++ standard don't either, so it's probably a useful
> requirement for that community.
>

I think I do understand. That's why I don't think such requirements
are useful in a standard.

Christopher Browne

May 29, 1999
On 28 May 1999 20:20:54 +0200, Lieven Marchand <m...@bewoner.dma.be> wrote:
>Erik Naggum <er...@naggum.no> writes:
>
>> * Lieven Marchand <m...@bewoner.dma.be>
>> | I think it would make more sense for the standard to demand of
>> | implementors to document the expected performance of their implementation.
>>
>> this would mean that no implementation would ever be conforming to the
>> standard, and reasonable people would shrug off that requirement, because
>> it is actually an unreasonable thing to request people to do.
>
>Why? It seems more reasonable than demanding a certain performance requirement.

It would surely be nice for implementors to indicate *some* "Big O"
information for operators, but it's not necessarily as useful as you
think it is... More below...

>> the reason
>> for support channels back to a vendor is to be able to ask when you need
>> to know. perhaps this is foreign to people who don't like commercial
>> vendors in the programming language market, but the support I get from
>> Franz Inc when I need it is tremendously valuable.
>
>Personally, I have nothing against commercial vendors but this issue
>seems orthogonal to the discussion. Besides, it would make sense for
>the vendors to document things like that when it gets asked
>frequently. That is why a lot of vendors put (a sanitized version of)
>their support knowledge base online.

This may well be something that would be "nice to have." I agree that
this is orthogonal to the discussion.

>> I don't think you know what the big-O notation means, but I'm sure the
>> people who read the C++ standard don't either, so it's probably a useful
>> requirement for that community.
>
>I think I do understand. That's why I don't think such requirements
>are useful in a standard.

This is less useful than people may think it would be.

Consider that there is not necessarily an unambiguously "best"
algorithm.

1. Looking just at sorting, Quicksort may be provably "optimal" for
some purposes, but it has rather bad O(n^2) worst-case performance and
doesn't preserve the ordering of equal elements as mergesort does. There
are pivot-selection refinements (e.g., median-of-three) that make the
worst case unlikely, but supposing you have an array with a *HUGE*
number of identical items, it still breaks down.

Heapsort provides better worst-case results, which means that sometimes
it's preferable.

Mergesort preserves ordering, which means that sometimes it's
preferable.

That's an example based simply on "stability of results" and "speed."

2. Gonnet's "Handbook of Algorithms and Data Structures" records not
only quantities of operations, but also memory consumption.

With sorting, there may be tradeoffs between performance and
"stability"; for other sorts of algorithms, there may be a tradeoff
where, if you can afford a bigger O(g(m)) (e.g., the characterization of
memory consumption in "Big O" terms), you can get a better O(f(n))
quantity of CPU operations.

In Lisp, this probably gets exemplified almost invisibly by the
consideration that different GC strategies will perform better for
different applications.

3. Bonus marks go to the consideration that if you build a compound
data structure that is composed of multiple data structures,
determining the "Big O" for that composition may be a nontrivial task.

The C++ folk doubtless *couldn't* establish enforceable "Big O"
quantifications for things like the STL, because all three of the
considerations above kick in in *spades,* making analytic solutions
impossible. You can't define things at the "STL Standard" level,
because it needs to be portable across the variety of possible C++
implementations, and there is a fairly wide scope of permissible ways of
implementing C++. (If you are skeptical, I have two words to say:
"Real Time.")

--
"There are three kinds of program statements: sequence,
repetition, and seduction."
cbbr...@hex.net- <http://www.ntlug.org/~cbbrowne/computing.html>

Erik Naggum

May 29, 1999
* Lieven Marchand <m...@bewoner.dma.be>
| I think it would make more sense for the standard to demand of
| implementors to document the expected performance of their implementation.

* Erik Naggum <er...@naggum.no>
| this would mean that no implementation would ever be conforming to the
| standard, and reasonable people would shrug off that requirement, because
| it is actually an unreasonable thing to request people to do.

* Lieven Marchand <m...@bewoner.dma.be>
| Why? It seems more reasonable than demanding a certain performance
| requirement.

I have tried, but I do not understand what you are trying to say. are
people "demanding a certain performance requirement" (whatever this
means) today? or is it an example of something much less reasonable than
your already unreasonable request that you are kindly offering us because
you actually have the much more unreasonable request in mind?

in case you missed my point entirely, language standards are not the
place to require performance statistics from implementations. it doesn't
matter how much you think you need it or how useful you think it might
be. it's incredibly stupid to put such requirements in a standard.

| Personally, I have nothing against commercial vendors but this issue
| seems orthogonal to the discussion. Besides, it would make sense for the
| vendors to document things like that when it gets asked frequently. That
| is why a lot of vendors put (a sanitized version of) their support
| knowledge base online.

and what? are you unhappy with the current support offerings? it looks
like you don't have anything actually constructive in mind.

| I think I do understand.

you wrote:

||... but since the implementation is not required to support objects or
||arrays above a certain size any vendor can claim O(1) performance of
||everything by choosing a suitably large constant.

and this indicates that you have very little understanding of what the
big-O notation is supposed to express, and that you use it in a very
sloppy fashion that I would expect from newspaper hacks. I certainly do
not expect any vendor to be equally devoid of understanding, or willing
to compromise his credibility with users who actually know what you
suggest they say means.

| That's why I don't think such requirements are useful in a standard.

so the standard should not include them, but you think it's smart to
demand such data of every conforming implementation, even though you
think that by jacking up the constant, you can get away with grossly
misleading numbers?

but never mind -- programmers who can't write stuff they need on their
own should be doing something else entirely.

Lieven Marchand

May 29, 1999
cbbr...@news.hex.net (Christopher Browne) writes:

> On 28 May 1999 20:20:54 +0200, Lieven Marchand <m...@bewoner.dma.be> wrote:
> >I think I do understand. That's why I don't think such requirements
> >are useful in a standard.
>
> This is less useful than people may think it would be.

We seem to agree here ;-)

> Consider that there is not necessarily an unambiguously "best"
> algorithm.
>

Consider even that there is not necessarily an unambiguously "best"
metric. Some nifty data structures have very good amortized cost over
a sequence of operations but can be very bad when used in a
"pessimal" order. They can also cause problems for real time.

> The C++ folk doubtless *couldn't* establish enforceable "Big O"
> quantifications for things like the STL, because all three of the
> considerations above kick in in *spades,* making analytic solutions
> impossible. You can't define things at the "STL Standard" level,
> because it needs to be portable across the variety of possible C++
> implementations, and there is a fairly wide scope of permissible ways of
> implementing C++.

Did they take it out? The last time I looked (1st or 2nd committee
draft) it was still in.

Lieven Marchand

May 29, 1999
Erik Naggum <er...@naggum.no> writes:

> * Lieven Marchand <m...@bewoner.dma.be>
> | I think it would make more sense for the standard to demand of
> | implementors to document the expected performance of their implementation.
>
> * Erik Naggum <er...@naggum.no>
> | this would mean that no implementation would ever be conforming to the
> | standard, and reasonable people would shrug off that requirement, because
> | it is actually an unreasonable thing to request people to do.
>
> * Lieven Marchand <m...@bewoner.dma.be>
> | Why? It seems more reasonable than demanding a certain performance
> | requirement.
>
> I have tried, but I do not understand what you are trying to say. are
> people "demanding a certain performance requirement" (whatever this
> means) today? or is it an example of something much less reasonable than
> your already unreasonable request that you are kindly offering us because
> you actually have the much more unreasonable request in mind?
>

Pay attention. In the beginning of the thread Kent Pitman wrote:

HOWEVER, I do think it would be good for standards to start
to document what you can expect in terms of behavior even if that isn't
all you might wish for--so as not to trick people into thinking they are
getting better performance than they actually are.

to which I replied the above.

> in case you missed my point entirely, language standards are not the
> place to require performance statistics from implementations. it doesn't
> matter how much you think you need it or how useful you think it might
> be. it's incredibly stupid to put such requirements in a standard.
>

It seems we're in violent agreement.

> and what? are you unhappy with the current support offerings? it looks
> like you don't have anything actually constructive in mind.
>

For the record, I use a commercial implementation and I'm very
satisfied with the support. If I wasn't, I would shop around for
another vendor. Quality of support is something the market place is
very good at sorting out.

> so the standard should not include them, but you think it's smart to
> demand such data of every conforming implementation, even though you
> think that by jacking up the constant, you can get away with grossly
> misleading numbers?
>

Ever heard of benchmarks?

Erik Naggum

May 29, 1999
* Lieven Marchand <m...@bewoner.dma.be>
| Pay attention.

I always do.

| to which I replied the above.

and which still doesn't mean it makes sense. but never mind.

| For the record, I use a commercial implementation and I'm very satisfied
| with the support. If I wasn't, I would shop around for another vendor.
| Quality of support is something the market place is very good at sorting
| out.

ok, so we're agreeing on that aspect, too. amazing.

| Ever heard of benchmarks?

yes, so much that I know that they are almost completely worthless.
