
hashtable w/o keys stored...


David Bakhash

Jan 12, 1999
hey,

I am using hash-tables, and realized that Lisp, by the way hash-tables are
defined, must store the keys of the hash-table. I am trying to run something
which quickly starts sucking up memory because storing all the keys is
heinous. I'd rather just have collisions, in which case I have a novel way to
resolve them. I'm trying to figure a way out.

The idea is pretty simple and probably generic:

instead of using (setf gethash) to insert, I'll use pushnew + (setf gethash),
and instead of using gethash I'll use find + gethash. That way I can make
my own hash-table, but somehow using keys which don't necessarily suck up as
much memory for storage as what I'd originally wanted.

does this make sense to anyone? I'm sure people have, at some point in time,
wanted a more relaxed version of hash-tables which doesn't guarantee "no
collisions", and (maybe) one for which you cannot iterate over the keys.
Anyone have a clue here?

dave

Barry Margolin

Jan 12, 1999
In article <cxjlnj8...@acs1.bu.edu>, David Bakhash <ca...@bu.edu> wrote:
>I am using hash-tables, and realized that Lisp, by the way hash-tables are
>defined, must store the keys of the hash-table. I am trying to run something
>which quickly starts sucking up memory because storing all the keys is
>heinous. I'd rather just have collisions, in which case I have a novel way to
>resolve them. I'm trying to figure a way out.

Use SXHASH to compute a hash code for the key, and then use this as the key
in the hash table.
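
For instance, a minimal sketch of that suggestion (the names here are
invented for illustration):

;; Hash the string once with SXHASH and use the resulting fixnum as the
;; key, so the table never holds the string itself.  Distinct strings that
;; happen to share an SXHASH value will silently collide, which is the
;; trade-off discussed later in this thread.
(defvar *table* (make-hash-table))     ; EQL test is fine for fixnum keys

(defun put-by-hash (string value)
  (setf (gethash (sxhash string) *table*) value))

(defun get-by-hash (string)
  (gethash (sxhash string) *table*))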

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Don't bother cc'ing followups to me.

Will Fitzgerald

Jan 12, 1999
You could do the following:

(defun make-table (size)
  (make-array size :initial-element nil))

(defun get-value (key table)
  (svref table (mod (sxhash key) (length table))))

(defun (setf get-value) (value key table)
  (setf (svref table (mod (sxhash key) (length table))) value))


David Bakhash wrote in message ...
>hey,
>
>I am using hash-tables, and realized that Lisp, by the way hash-tables are
>defined, must store the keys of the hash-table. I am trying to run something
>which quickly starts sucking up memory because storing all the keys is
>heinous. I'd rather just have collisions, in which case I have a novel way to
>resolve them. I'm trying to figure a way out.
>

Lyman S. Taylor

Jan 12, 1999
In article <cxjlnj8...@acs1.bu.edu>, David Bakhash <ca...@bu.edu> wrote:
>hey,
>
>I am using hash-tables, and realized that Lisp, by the way hash-tables are
>defined, must store the keys of the hash-table.
....

> I'd rather just have collisions, in which case I have a novel way to
>resolve them. I'm trying to figure out a way out.

The reason you must store keys is to resolve the collisions, which is
what allows collisions to happen at all. I think it would be more precise
to say you need something else to be the key, preferably something with a
more concise encoding.

Two other responses have pointed out the function SXHASH.
However, this doesn't guarantee unique values for each object.
You would still need the key to resolve collisions. For example:

USER(11): (sxhash #( a b ))
2
USER(12): (sxhash #( c d ))
2

Here the implementation is apparently returning the size of the vector
as the code. For other data types the distribution of values is much
more (or less) uniform. :-) [ It is implementation specific how well the
hash codes are distributed over all possible values. The restriction that
the codes be independent of an "image" likely means that in some
situations it will be far from "perfect". ]

What I think you are looking for is a "perfect hashing" function.
In some circumstances you can create these. See Knuth (or another good
algorithms book) on perfect hashes. If each hash code is unique,
then you don't need to store the keys to resolve collisions, because
there aren't any; the index into the underlying array is the
"key".


[ Another partial solution would be to only store the keys for
the cases where SXHASH did produce a collision. Although in
the worst case everything could collide and you'd have an
even larger "problem". ]


>wanted a more relaxed version of hash-tables which don't guarantee "no
>collisions", and (maybe) one for which you cannot iterate over the keys.
>Anyone have a clue here?

The predefined tables don't guarantee "no collisions".

And I'm not sure what you mean by iterating over the keys.
You can iterate over a hash table (this iteration process is
independent of the keys). MAPHASH and WITH-HASH-TABLE-ITERATOR
iterate over the table, but there is no "order" in which the
keys are encountered.

--

Lyman S. Taylor "Twinkie Cream; food of the Gods"
(ly...@cc.gatech.edu) Jarod, "The Pretender"


Barry Margolin

Jan 12, 1999
In article <77g8fd$j...@pravda.cc.gatech.edu>,

Lyman S. Taylor <ly...@cc.gatech.edu> wrote:
>In article <cxjlnj8...@acs1.bu.edu>, David Bakhash <ca...@bu.edu> wrote:
>>hey,
>>
>>I am using hash-tables, and realized that Lisp, by the way hash-tables are
>>defined, must store the keys of the hash-table.
>....
>> I'd rather just have collisions, in which case I have a novel way to
>>resolve them. I'm trying to figure out a way out.
>
> The reason you must store keys is to resolve the collisions, which is
> what allows collisions to happen at all. I think it would be more precise
> to say you need something else to be the key, preferably something with a
> more concise encoding.

He said he's going to resolve the collisions his own way. Perhaps there's
something in the values that does that. He didn't go into any more detail,
so I took his claim at face value.

Lyman S. Taylor

Jan 12, 1999
In article <wWNm2.88$oD6....@burlma1-snr1.gtei.net>,
Barry Margolin <bar...@bbnplanet.com> wrote:
...

>He said he's going to resolve the collisions his own way. Perhaps there's
>something in the values that does that. He didn't go into any more detail,

That bothered me. If this "something" can resolve collisions, why isn't
it the key? Otherwise, if you have a piece of data (no key) in a
"hash bucket", how do you know which "key" it belongs to if there is no
one-to-one mapping between keys and buckets?

I also wondered if this memory growth problem was that the keys never got
flushed. Garbage collection is nice, but with non-ephemeral hash tables
you have to explicitly flush the key->data mapping once you are done with
it. [ Unless you have weak tables. ]

David Bakhash

Jan 13, 1999
ly...@cc.gatech.edu (Lyman S. Taylor) writes:

> In article <wWNm2.88$oD6....@burlma1-snr1.gtei.net>,
> Barry Margolin <bar...@bbnplanet.com> wrote:
> ...
> >He said he's going to resolve the collisions his own way. Perhaps there's
> >something in the values that does that. He didn't go into any more detail,
>
> That bothered me. If this "something" can resolve collisions, why isn't
> it the key? Otherwise, if you have a piece of data (no key) in a
> "hash bucket", how do you know which "key" it belongs to if there is no
> one-to-one mapping between keys and buckets?

okay. Let me explain. I'm only doing this b/c you're interested,
though it's really not that important.

suppose, ideally, I want a hash table whose keys are strings and whose
values are objects. However, I have so many keys, and each key (string)
is, say, 50 characters long. With 50,000 keys, this can start to really
swallow up memory. But if, instead, the objects in the hash buckets
somehow contain the strings efficiently, then I can create those strings
when needed, but not store them permanently.

If you're wondering how that can be done, just imagine that two numbers
(pos . length) specify a sub-string inside one huge string. So just
storing the pos and length is enough.
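
As a rough illustration of that idea (the corpus string and function name
below are invented, not from the thread):

;; The corpus lives in one big string, and a "key" is just a cons of
;; start position and length; the 50-character string is materialized
;; only when actually needed.
(defvar *corpus* "the quick brown fox jumps over the lazy dog")

(defun key-string (key)
  "Materialize the substring named by KEY, a (pos . length) cons."
  (subseq *corpus* (car key) (+ (car key) (cdr key))))

;; (key-string '(4 . 5)) => "quick"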

dave

vnik...@poboxes.com

Jan 13, 1999
In article <wkiueb2...@mit.edu>,
David Bakhash <ca...@mit.edu> wrote:
(...)

> suppose, ideally, I want a hash table whose keys are strings and whose
> values are objects. However, I have so many keys, and each key (string)
> is, say 50 characters long. With 50,000 keys, this can start to really
> swallow up memory. But if, instead, the objects in the hash buckets
> somehow contain the strings efficiently, then I can create those strings
> when needed, but not store them permanently.

So you need a hash table of a kind different from what Common Lisp
has. You (seem to) need:

* to pass just one object to the hash table when you do the store
(instead of one key and one object, as with CL);

* to pass a function to the hash table constructor which will later
be called on each object being stored to compute the key.

Unless someone has already implemented such tables somewhere,
it is up to you.
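
A minimal sketch of such a table follows (every name here is invented for
illustration; collisions are resolved by recomputing the key from each
stored object via the supplied key function):

;; Hypothetical keyed-table: keys are never stored; they are recomputed
;; on demand with KEY-FN.
(defstruct keyed-table key-fn buckets)

(defun make-object-table (key-fn size)
  (make-keyed-table :key-fn key-fn
                    :buckets (make-array size :initial-element nil)))

(defun table-put (object table)
  (let* ((key (funcall (keyed-table-key-fn table) object))
         (buckets (keyed-table-buckets table))
         (i (mod (sxhash key) (length buckets))))
    (push object (aref buckets i))
    object))

(defun table-get (key table)
  (let* ((buckets (keyed-table-buckets table))
         (i (mod (sxhash key) (length buckets))))
    (find key (aref buckets i)
          :key (keyed-table-key-fn table) :test #'equal)))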

Good luck,
Vassil.

Vassil Nikolov <vnik...@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

Barry Margolin

Jan 13, 1999
In article <wkiueb2...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
>If you're wondering how that can be done, just imagine that two numbers
>(pos . length) specify a sub-string inside one huge string. So just
>storing the pos and length is enough.

So use (pos . length) as the keys of your hash table, rather than the
substrings themselves.
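
That would look something like this (a sketch; an EQUAL test makes the
conses compare by contents rather than identity):

;; Key the table on (pos . length) conses.  Two separately consed (4 . 5)
;; keys name the same entry under an EQUAL hash table.
(defvar *entries* (make-hash-table :test #'equal))

(setf (gethash (cons 4 5) *entries*) 'some-object)
(gethash (cons 4 5) *entries*)       ; => SOME-OBJECT, T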

David Bakhash

Jan 13, 1999
Barry Margolin <bar...@bbnplanet.com> writes:

> In article <wkiueb2...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
> >If you're wondering how that can be done, just imagine that two numbers
> >(pos . length) specify a sub-string inside one huge string. So just
> >storing the pos and length is enough.
>
> So use (pos . length) as the keys of your hash table, rather than the
> substrings themselves.

Not quite. I was going to do it that way, but then the hash-values
couldn't easily refer back to their (pos . length), which I needed. I
didn't want to figure out how to make the hash-values store a pointer
back to the keys, so I just did it the way someone else suggested,
which was (IMO) a very neat idea: basically, to use the sxhash
function.

thanks, everyone.
dave

Erik Naggum

Jan 14, 1999
* David Bakhash <ca...@mit.edu>

| suppose, ideally, I want a hash table whose keys are strings and whose
| values are objects. However, I have so many keys, and each key (string)
| is, say 50 characters long. With 50,000 keys, this can start to really
| swallow up memory. But if, instead, the objects in the hash buckets
| somehow contain the strings efficiently, then I can create those strings
| when needed, but not store them permanently.
|
| If you're wondering how that can be done, just imagine that two numbers
| (pos . length) specifiy a sub-string inside one huge string. So just
| storing the pos and length is enough.

I'd like to see the actual size of the application that needs this kind
of anal-retentive "efficiency" tinkering. I have an application that I
thought would gobble up all available memory in production, so I wrote a
little thing using SLOT-UNBOUND that gave me lazy-loaded messages that
could be "unloaded" and read back from disk on demand, and prepared to
tie this into GC hooks and all. a message is between 100 bytes and 40K
long, and contains a _lot_ of structure, so the 100-byte message conses
about 1K and the 40K message about 44K. there are approximately 350 of
them per day, 5 days a week today. there will be approximately 3000 of
them per day at the end of this month, albeit the increase will be in
very small messages. the machine has only 64M RAM, so I worried a bit.
a test run of 15000 evenly paced messages (so GC could run near normally
and tenure these bastards at a normal pace) caused Allegro CL to swell to
34M RAM. it was no problem at all. the SLOT-UNBOUND stuff is useful for
a lot of other reasons, so it wasn't a waste of time, but UNLOAD-MESSAGE
never gets called in the production version. the part of my brain that
still remembers when I thought C was a programming language had voiced
its usual misguided protests. a C programmer would have worried that 34M
was a lot of wasted memory. after all, the active lifetime of a message
averages out at 15 seconds, but worst case, it's hanging around in memory
for a week, just because some client _may_ want an old copy of it. the
optimization problem is basically this: it would cost more to figure out
when to unload these messages than I would ever hope to save by doing so
and since the memory is not used for anything else, _any_ time spent on
figuring out when to unload would be a waste, and the occasional load
would _also_ be a loss. still, I'll bet a C programmer would cringe at
the "inefficiency" of having more or less dead objects take up many
megabytes of RAM for a whole week, but what would I do with it? (it's
not like I run Windows or anything.)

the hardest part for a programmer used to C and C++ and that crap is to
shed the _invalid_ concerns. psychologists call them "obsessions" and
charge people a lot to get rid of them. some programmers charge their
users a lot to be able to keep them. go figure.

#:Erik

David Bakhash

Jan 14, 1999
trust me eric. I'm the laziest programmer I know, and there are times
that I would argue that laziness in an engineer is a damn good thing.
A lazy person that was still determined to complete a hard task would
tend to do it in a way that was quick, efficient, and that would
minimize energy consumption over the usage of that program, meaning that
his/her program would be extensible as far as he/she could have
forecast initially.

In the actual case that I'm working on, there is an optimal way to store
the data (compressed, in the Huffman sense, which would require
probabilistic analysis, etc). I'm just doing something in-between.
Though I'm probably sweating a lot on this now, it's so that I don't
have to sweat later, and thus, I'm being lazy. I don't want to
re-design later.

Still, after reading your few posts-of-the-day (and Kent's) I think that
the best way is to take advantage of CLOS, and do it inefficiently now,
and then, if it needs to be tweaked later, I should be able to fix it
w/o changing the interface to the class objects. I guess this is how
all methods in CLOS are virtual. I'd just have to make sure that in the
subsequent coding, I'll have to make sure to use `with-accessors'
instead of `with-slots'. I don't know anymore. When a programming task
requires you to swallow the 6+ Meg Brown corpus, and run recursively
optimal segmentation on it, I start to worry about storage. I don't
think this is so crazy. Plus, I've always used defstruct in the past.
I've always been reluctant to use CLOS. I'll do what you suggest. I
think you're probably right, since this stuff is sucking up wads of
time. My girlfriend told me that last nite I said, in my sleep, "I
really don't understand object orientation in Lisp." It's probably the
first complete sentence I've ever uttered in my sleep that was
grammatically and semantically correct.

dave

Kent M Pitman

Jan 14, 1999
David Bakhash <ca...@mit.edu> writes:

> trust me eric.

His name is Erik.

> Still, after reading your few posts-of-the-day (and Kent's) I think that
> the best way is to take advantage of CLOS, and do it inefficiently now,

What makes you think it's going to be any more or less efficient if
you do it the way we've suggested than the way you were wanting to?
I suspect you are making some incorrect assumption about what is fast
and slow in CLOS.

CLOS does not do compile-time method resolution.

CLOS also does not necessarily take longer to find a method on a
parent class than a child class. Effective method computation
is not done every time you do dispatch, and once it is done, there
is no necessary inefficiency of finding the method that can be
described as a cost in terms of distance between the actual class
and the class being inherited from in the type tree.

As to using defstruct instead of defclass, there are sometimes reasons
for doing this. But they are very few. In general, the only
difference there will be constant factors in speed, and if all that
stands between you and a perfect product is constant factors on that
order of magnitude, you're going to be well ahead of the game.
Further, using defclass gives you good leverage that will help assure
you are far enough along in your work that it'll be worth caring about
performance. Using defstruct, except as an after-the-fact
optimization to accommodate an observed make-or-break efficiency
problem that is specifically proven to be making a complete product
unacceptably slow, is very suspect.

>[sleeping] "I really don't understand object orientation in Lisp."

You're not going to get any closer to understanding it without using it.

Kenneth P. Turvey

Jan 14, 1999
On 12 Jan 1999 17:09:05 -0500, Lyman S. Taylor <ly...@cc.gatech.edu> wrote:
>In article <wWNm2.88$oD6....@burlma1-snr1.gtei.net>,
>Barry Margolin <bar...@bbnplanet.com> wrote:
>>He said he's going to resolve the collisions his own way. Perhaps there's
>>something in the values that does that. He didn't go into any more detail,
>
> That bothered me. If this "something" can resolve collisions, why isn't
> it the key? Otherwise, if you have a piece of data (no key) in a
> "hash bucket", how do you know which "key" it belongs to if there is no
> one-to-one mapping between keys and buckets?

I have been working on an application that doesn't really care if a
collision happens every once in a while. Only the minimum of the
associated values need be stored. As long as collisions are infrequent
this is sufficient.

There are circumstances where resolving collisions doesn't require
knowing the keys.
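
A minimal sketch of that idea (the names are invented; collisions are
never detected, colliding entries simply merge by keeping the smaller
value, which is acceptable for an application like this one):

(defun make-min-table (size)
  (make-array size :initial-element nil))

(defun note-value (key value table)
  (let ((i (mod (sxhash key) (length table))))
    (setf (svref table i)
          (if (svref table i) (min value (svref table i)) value))))

(defun min-value (key table)
  (svref table (mod (sxhash key) (length table))))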


--
Kenneth P. Turvey <ktu...@SprocketShop.com>

A computer lets you make more mistakes faster than any invention in
human history with the possible exceptions of handguns and tequila.
-- Mitch Ratliffe, Technology Review, April, 1992

David Bakhash

Jan 14, 1999
Kent M Pitman <pit...@world.std.com> writes:

> CLOS does not do compile-time method resolution.

I always wondered about this. It never seemed to make much sense. From
looking at my code, and the declarations therein, it seems that an
optimizing compiler would be able to avoid certain class resolution
statements at certain points in the code. I guess that since the whole
system is subject to change at runtime, and since new classes can pop up
at runtime, these optimizations can't actually be done. Furthermore,
b/c CLOS doesn't have the concept of sealing classes, this is even
harder.

But still, if I had this:

(defclass city () ((name :accessor name)))
(defclass person () ((name :accessor name)))

(defmethod city-with-same-name-as-person ((p person))
  (with-accessors ((pname name)) p
    (loop for city of-type city in *cities*
          until (string= pname (name city))
          finally (return city))))

Now, wouldn't it be nice if #'name in (name city) could take advantage
of the fact that it knew that its argument was a subtype of city? Maybe
this argument isn't so concrete, but if you note that CLOS dispatches on
the types of multiple arguments, then you'll see how compile-time method
resolution can be (at least partially) computed. i.e. you have
something like:

(defmethod f ((a type-1) (b type-1)) ...)
(defmethod f ((a type-1) (b type-2)) ...)
(defmethod f ((a type-2) (b type-1)) ...)
(defmethod f ((a type-2) (b type-2)) ...)

Then if a declaration in another method told you that `a' was of-type
`type-2', then you'd only have to check the type of `b' at run-time.

But, of course, with the dynamism of CL, this might be in conflict with
run-time class- and method-definition.

dave

Kelly Murray

Jan 14, 1999
A hashtable w/o keys isn't really a hashtable it seems to me.
It's trivial to build such a thing that returns all values
for the key:

;; (Note: GETHASH names a standard Common Lisp function, so these would
;; need to live in their own package, or be renamed, in practice.)
(defun make-keyless-hash () (make-array 20 :initial-element nil))

(defun gethash (key table)
  (aref table (mod (sxhash key) (length table))))

(defun sethash (value key table)
  (push value (aref table (mod (sxhash key) (length table)))))

If you make the table big enough and the hash values distribute
well, you might be ok.
Another hack is to use a "sparse matrix" array if the hash keys
are highly distributed, that is, make the "array" huge, but don't
actually create the whole table until needed; in the code linked
below, the array is 65,535 elements large.

see http://www.intellimarket.com/oodb/keyless.cl

-Kelly Murray

Mike McDonald

Jan 14, 1999
In article <369E7787...@intellimarket.com>,

Kelly Murray <k...@IntelliMarket.Com> writes:
> A hashtable w/o keys isn't really a hashtable it seems to me.
> It's trivial to build such a thing that returns all values
> for the key:
>
> (defun make-keyless-hash () (make-array 20 :initial-element nil))
>
> (defun gethash (key table)
>   (aref table (mod (sxhash key) (length table))))
>
> (defun sethash (value key table)
>   (push value (aref table (mod (sxhash key) (length table)))))

Without saving the key, you can't tell the difference between a collision
and replacing the entry. For example:

(sethash a 'foo *hash-table*)
(sethash b 'foo *hash-table*)
(gethash 'foo *hash-table*)

I'd have expected the answer to be b, not (b a). I personally would have
written it so that if a collision happens, the last one wins, aka:

(defun sethash (value key table)
  (setf (aref table (mod (sxhash key) (length table))) value))

Mike McDonald
mik...@mikemac.com

Lyman S. Taylor

Jan 14, 1999
In article <77lug0$emo$1...@spitting-spider.aracnet.com>,
Mike McDonald <mik...@mikemac.com> wrote:
...

> Without saving the key, you can't tell the difference between a collision
>and replacing the entry. For example:

Additionally, if FOO and BAZ happen to have the same "SXHASH mod length"
encoding....

(sethash a 'foo *hash-table*)
(gethash 'baz *hash-table*)

Then the latter expression will return the entry for FOO. I
would presume that this is an error, not a feature.

IMHO, only in extremely special circumstances can you not include
the key (or an "alias" thereof) along with the value in the hash table
entry. Specifically, when there are unique hash buckets for each
possible entry, the bucket "index" can serve as the key. This
index may not be an effective "alias" if you ever wish to reverse
index the key.

SXHASH doesn't intentionally provide this property. ( It may happen to
work in a specific implementation in a specific context. )

Pierre Mai

Jan 14, 1999
Kent M Pitman <pit...@world.std.com> writes:

[ sound advice elided ]

> >[sleeping] "I really don't understand object orientation in Lisp."
>
> You're not going to get any closer to understanding it without using it.

Furthermore, I'd suggest that David get a copy of Sonya E. Keene's
classic "Object-Oriented Programming in Common Lisp -- A Programmer's
Guide to CLOS", Addison-Wesley/Symbolics Press, 1988, ISBN
0-201-17589-4. Reading this will probably clarify a number of basic
and not so basic issues in dealing with CLOS in the way that was
intended. This book is all the more valuable to David, as it
introduces you to CLOS programming and the CLOS way of OOP, without
trying to (incorrectly) map the given concepts to those found in other
languages and approaches.

Regs, Pierre.

PS: Last time I looked, the book was available via amazon.com.

--
Pierre Mai <pm...@acm.org> http://home.pages.de/~trillian/
"One smaller motivation which, in part, stems from altruism is Microsoft-
bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

Barry Margolin

Jan 15, 1999
In article <wkaezl3...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
>Kent M Pitman <pit...@world.std.com> writes:
>
>> CLOS does not do compile-time method resolution.
>
>I always wondered about this. It never seemed to make much sense. From
>looking at my code, and the declarations therein, it seems that an
>optimizing compiler would be able to avoid certain class resolution
>statements at certain points in the code. I guess that since the whole
>system is subject to change at runtime, and since new classes can pop up
>at runtime, these optimizations can't actually be done. Furthermore,
>b/c CLOS doesn't have the concept of sealing classes, this is even
>harder.

Right. Dylan added sealing precisely to allow these types of
optimizations.

>But still, if I had this:
>
>(defclass city () ((name :accessor name)))
>(defclass person () ((name :accessor name)))
>
>(defmethod city-with-same-name-as-person ((p person))
> (with-accessors ((pname name)) p
> (loop for city of-type city in *cities*
> until (string= pname (name city))
> finally (return city))))
>
>Now, wouldn't it be nice if #'name in (name city) could take advantage
>of the fact that it knew that its argument was a subtype of city?

How? After this code was compiled, someone could write another module
like:

(defclass wonderful-city (city) ())

(defmethod name :around ((self wonderful-city))
  (format nil "The wonderful city of ~A" (call-next-method)))

Most optimizations that were done to (name city) would probably not work on
objects of class wonderful-city. It might help if the implementation of
method dispatch were some kind of search through the class hierarchy, since
it would give you a starting point in the tree; but that's not generally
how dispatching is implemented, so the hint doesn't really help.

> Maybe
>this argument isn't so concrete, but if you note that CLOS dispatches on
>the types of multiple arguments, then you'll see how compile-time method
>resolution can be (at least partially) computed. i.e. you have
>something like:
>
>(defmethod f ((a type-1) (b type-1)) ...)
>(defmethod f ((a type-1) (b type-2)) ...)
>(defmethod f ((a type-2) (b type-1)) ...)
>(defmethod f ((a type-2) (b type-2)) ...)
>
>Then if a declaration in another method told you that `a' was of-type
>`type-2', then you'd only have to check the type of `b' at run-time.
>
>But, of course, with the dynamism of CL, this might be in conflict with
>run-time class- and method-definition.

Precisely. The solution that's generally used in CLOS implementations is
caching. An expensive lookup is done the first time a method is called on
a particular set of leaf classes, and the particular combined method is
then cached in a table keyed off those classes. Certain runtime
redefinitions will cause the cache to clear.
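
The caching idea can be sketched like this (this is a toy illustration,
not how any particular CLOS implementation is actually coded; the names
are invented):

;; The expensive lookup (COMPUTE-FN) runs once per distinct combination
;; of argument classes; later calls with arguments of the same classes
;; hit the cache.  A real implementation also invalidates the cache when
;; classes or methods are redefined.
(defvar *dispatch-cache* (make-hash-table :test #'equal))

(defun cached-dispatch (gf-name args compute-fn)
  (let ((key (cons gf-name (mapcar #'class-of args))))
    (or (gethash key *dispatch-cache*)
        (setf (gethash key *dispatch-cache*) (funcall compute-fn)))))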

Erik Naggum

Jan 15, 1999
* David Bakhash <ca...@mit.edu>

| trust me eric. I'm the laziest programmer I know, and there are times
| that I would argue that laziness in an engineer is a damn good thing.

well, there's constructive laziness and there's careless laziness.

| A lazy person that was still determined to complete a hard task would
| tend to do it in a way that was quick, efficient, and that would minimize
| energy consumption over the usage of that program, meaning that his/her
| program would be extensible as far as he/she could have forecast
| initially.

sure. one way to accomplish this is not to worry about needless things.

| I don't want to re-design later.

but you will. you don't know enough about the _real_ efficiency issues
at this early time to make the right design choices. trust me on this.

| I guess this is how all methods in CLOS are virtual.

you're being carelessly lazy, yet you don't appear to see it. "virtual"
doesn't exist in the CLOS vocabulary. you show that you don't understand
CLOS by using terms that are meaningless in its context, and, moreover,
you show that you don't care to do the _necessary_ work to understand
what the differences are all about. this is not the good lazy.

| I'd just have to make sure that in the subsequent coding, I'll have to
| make sure to use `with-accessors' instead of `with-slots'.

huh?

| When a programming task requires you to swallow the 6+ Meg Brown corpus,
| and run recursively optimal segmentation on it, I start to worry about
| storage. I don't think this is so crazy.

well, I don't think it's crazy to worry about, because that makes you
spend more time on the important properties of storage efficiency, but
obsessing about the unimportant issues _is_ crazy, and complaining about
it before you know what's really a problem is definitely crazy.

I've seen some horribly "efficient" code try to deal with large amounts
of text, and it was just so badly done that _any_ change to it would mean
a rewrite of major parts of the rest of the system. these guys also
argued vociferously about efficiency -- their report said they had to
because they were severely limited in machine power. so I rewrote their
entire system just to see how much it would be pessimized by being more
elegant. (it took a few months of spare time, compared to the week I had
envisioned when I started.) well, the first rewrite took twice as long
to run and used 10% more memory. I profiled it, and I located some
boneheaded duplication of effort, so I memoized a couple functions, which
used another 10% more memory, but it used only 5% more CPU than the
highly optimized code I had rewritten. then I noticed that a lot of time
was spent in the memoization code for one of the functions, and realized
that memoization should be taken one step further: the whole relationship
that was being computed was basically the same from run to run, and it
was "learning" from more and more input. they had tried to save that
stuff to disk, but that was slower than training it anew, and it didn't
really work, either. so I made a *huge* array and used twice as much
memory as the original code, but it ran in 75% of the time. there was
still ample room for improvement, and my 64M SPARC was not at all unhappy
about the memory consumption. once this relationship vector had proven
to converge, it could indeed be stored more efficiently, at the cost of
slightly more expensive updates, and one shot at that brought the memory
size to 80% of the original program had spent on _its_ second round.

I couldn't quite figure this out at the time, but it dawned on me some time
later: I was willing to be "temporarily inefficient", so I discovered a
very different "optimization path" than the one where _every_ step was
supposed to be efficient. it was about that time that I started to figure
out why it is so tremendously hard to improve things in society, too: at
every opportunity, people will optimize their observed political reality,
and they become very good at it. any significant improvement comes at
such a high cost to everyone who has optimized their personal political
reality that it takes a "revolution" to change almost anything. stepwise
refinements to this realization took the form of a desire to find the
real constructive element in what keeps traditions around and what makes
people "conservative", and I think I got it: "it was painful to get here,
so if you change anything, it will be painful to make that as cozy as we
have made this place". that personal pain will far outweigh any real or
projected benefits to any person or any part of the system. I don't
think this is a valid objection. I also don't think "sacrifices" is a
valid argument. the point is to be secure and safe enough to be able to
relinquish control for a short period of time and then regain it, maybe
in a new or at least _slightly_ different form.



| Plus, I've always used defstruct in the past. I've always been reluctant
| to use CLOS.

why? are you using CMUCL or CLISP or something? (I know I abhorred CLOS
while I only had CMUCL and CLISP, and that's one reason why I don't think
people should optimize for free software until they know what it should
be like. (with a Unix work-alike, that's pretty easy, since a child can
see what's wrong with Unix and improve upon it.) too often, they settle
for what can be done without a lot of effort and only through incremental
improvements.)

#:Erik

Barry Margolin

Jan 15, 1999
In article <31253516...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* David Bakhash <ca...@mit.edu>

>| I guess this is how all methods in CLOS are virtual.
>
> you're being carelessly lazy, yet you don't appear to see it. "virtual"
> doesn't exist in the CLOS vocabulary. you show that you don't understand
> CLOS by using terms that are meaningless in its context, and, moreover,
> you show that you don't care to do the _necessary_ work to understand
> what the differences are all about. this is not the good lazy.

It may not exist in the CLOS vocabulary, but it certainly exists in the OOP
vocabulary. And in that context, all CLOS methods are virtual -- they
dispatch on the dynamic type of the object, rather than the static type of
a variable declaration. Since CLOS doesn't provide non-virtual methods, it
doesn't need a syntactic way to indicate which are virtual and non-virtual,
as C++ does.

Erik Naggum

Jan 15, 1999
* Barry Margolin <bar...@bbnplanet.com>

| It may not exist in the CLOS vocabulary, but it certainly exists in the
| OOP vocabulary.

well, what is this purported single OOP vocabulary? is it a denial of
the fact that certain terms have meaning only in the context of specific
languages? does it make any more sense to talk about "virtual methods"
in CLOS than it does to talk about "generic functions" in C++ just
because there's multiple inheritance from "the C++ vocabulary" and "the
CLOS vocabulary" into "the OOP vocabulary"?

| And in that context, all CLOS methods are virtual -- they dispatch on the
| dynamic type of the object, rather than the static type of a variable
| declaration. Since CLOS doesn't provide non-virtual methods, it doesn't
| need a syntactic way to indicate which are virtual and non-virtual, as
| C++ does.

great. I'm sure you made him feel better for having been consoled and
told "you're not completely wrong", and now he'll continue to talk about
virtual functions in CLOS forever, instead of having the potential of
being rewarded for getting it right some time in the future. just great.

#:Erik

Barry Margolin

Jan 15, 1999
In article <31254165...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* Barry Margolin <bar...@bbnplanet.com>
>| It may not exist in the CLOS vocabulary, but it certainly exists in the
>| OOP vocabulary.
>
> well, what is this purported single OOP vocabulary? is it a denial of
> the fact that certain terms have meaning only in the context of specific
> languages? does it make any more sense to talk about "virtual methods"
> in CLOS than it does to talk about "generic functions" in C++ just
> because there's multiple inheritance from "the C++ vocabulary" and "the
> CLOS vocabulary" into "the OOP vocabulary"?

OOP vocabulary is the terminology that someone who is familiar with a
multitude of OO languages would be expected to understand. Just because we
happen to be in a CLOS context doesn't mean we can't make references to
features of other languages like C++. If we can't borrow the term "virtual
function", we have to say "method that dispatches on the dynamic type",
which is unnecessarily verbose. If someone were doing a survey of
programming languages, and there were a question like "Does your language
include virtual functions?", I would answer "yes" for Common Lisp.

However, I've discovered from past conversations that you believe it's
totally wrong to make any references or analogies between programming
paradigms (it's come up in threads comparing Lisp and Perl, for instance).
In comp.lang.lisp we must act as if no other languages exist, and woe be
unto one who lapses and mentions a term they've learned in another
environment -- the wrath of Erik will be upon them.

Gareth McCaughan

Jan 15, 1999
Erik Naggum wrote:

> | And in that context, all CLOS methods are virtual -- they dispatch on the
> | dynamic type of the object, rather than the static type of a variable
> | declaration. Since CLOS doesn't provide non-virtual methods, it doesn't
> | need a syntactic way to indicate which are virtual and non-virtual, as
> | C++ does.
>
> great. I'm sure you made him feel better for having been consoled and
> told "you're not completely wrong", and now he'll continue to talk about
> virtual functions in CLOS forever, instead of having the potential of
> being rewarded for getting it right some time in the future. just great.

If he does much CLOS, he'll get so fed up of calling *everything*
virtual that he'll stop bothering. Then there'll be no problem.
(Until he goes back to C++ and starts asking about generic functions
in C++. (-: )

--
Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics,
gj...@dpmms.cam.ac.uk Cambridge University, England.

Barry Margolin

Jan 16, 1999
In article <86lnj4j...@g.pet.cam.ac.uk>,

Gareth McCaughan <gj...@dpmms.cam.ac.uk> wrote:
>Erik Naggum wrote:
>
>> | And in that context, all CLOS methods are virtual -- they dispatch on the
>> | dynamic type of the object, rather than the static type of a variable
>> | declaration. Since CLOS doesn't provide non-virtual methods, it doesn't
>> | need a syntactic way to indicate which are virtual and non-virtual, as
>> | C++ does.
>>
>> great. I'm sure you made him feel better for having been consoled and
>> told "you're not completely wrong", and now he'll continue to talk about
>> virtual functions in CLOS forever, instead of having the potential of
>> being rewarded for getting it right some time in the future. just great.
>
>If he does much CLOS, he'll get so fed up of calling *everything*
>virtual that he'll stop bothering. Then there'll be no problem.
>(Until he goes back to C++ and starts asking about generic functions
>in C++. (-: )

He wasn't calling everything virtual, he was making a point in a discussion
about programming methodologies. The fact that CLOS methods have the
properties associated with virtual member functions in another language I
won't name was germane to his point. He expressed it very clearly, IMHO.

Erik Naggum

Jan 16, 1999
* Barry Margolin <bar...@bbnplanet.com>

| OOP vocabulary is the terminology that someone who is familiar with a
| multitude of OO languages would be expected to understand.

precisely, and _understand_ is the operative word here. someone who
believes that other languages has to have "virtual functions" just
because he's used to C++, does not understand what they are in the C++
context, _either_. you help nobody by defending their pretense to know
something they don't. since you continue to pretend that you have every
right to post your stupid drivel about _me_, as opposed to what I do and
which you _can_ observe, I can only assume that you defend your _own_
pretenses to know something you don't, not actually those of anybody else.

| Just because we happen to be in a CLOS context doesn't mean we can't make
| references to features of other languages like C++.

"references"? the introduction was "I'm trying to figure out how to make
a virtual function", and the conclusion, _long_ before there had been
time to answer him, was "actually, I'm not even sure that this is
possible in CL (and if it isn't, then that's a bit dissappointing)".
what the hell kind of "reference to features of other languages" is that?

why _is_ it necessary for you to start talking about something else that
_isn't_ wrong every time you're trying to argue against my pointing out
something that _is_ wrong? what do you _gain_ from twisting things
around so much? I can't see _any_ constructive element to anything
you're doing in your moronic defenses of specific failures to understand
something. again, I think you're defending yourself, not anybody else.

if we _were_ talking about references to features of other languages,
there wouldn't be any issue. if you would be happy in your pretense that
this is a "reference to features of other languages" and not the request
for help on doing C++ particulars in CLOS that it is, there might be
grounds for a discussion, but I don't think you want that. all I see
from you is an immense desire to attack _me_, no matter what I do, and
every time I point to a serious lack of understanding and a willingness
to make unwarranted, unfounded conclusions, you rise to the defense and
you prove beyond any doubt that you are the champion of the unwarranted
and the unfounded conclusions in the way you attack me.

would you please have somebody you listen to tell you that if you keep
making false accusations against somebody, they _will_ continue to be
pissed at you, because _you_ do something wrong towards _them_, no matter
how morally outraged you are and no matter what you really defend. you
just _don't_ get people to treat you well if you ignore their objections
and continue to attack them for stuff they have already denied or which
has already been disproven. only braindamaged fanatics do that.

| If we can't borrow the term "virtual function", we have to say "method
| that dispatches on the dynamic type", which is unnecessarily verbose.

why do you pretend we "have" to say that? there's already the prefectly
usable term "generic function", and there's no lack of precision to what
it means. it is no more verbose than "virtual function", and it doesn't
talk about CLOS in terms unique to C++. now, _why_ did you discard the
term "generic function"? you _did_ know about it, didn't you?

| If someone were doing a survey of programming languages, and there were a
| question like "Does your language include virtual functions?", I would
| answer "yes" for Common Lisp.

so would I, because every person has a moral obligation to lie to stupid
people who are about to do damage in order to limit that damage: "no"
would be true because CLOS isn't C++, but it would be more damaging than
"yes" because any idiot asking that question has no clue what he's asking
about or what either "yes" or "no" would mean. all that he _could_ be
after is proving that C++ is the best language, just like Bill Gates
ordering _surveys_ that "prove" his point.

| However, I've discovered from past conversations that you believe it's
| totally wrong to make any references or analogies between programming
| paradigms (it's come up in threads comparing Lisp and Perl, for instance).

holy shit. you can't even _read_ something you don't agree with! why am
I not surprised by your willingness to make up even _more_ moronic
bullshit from what you have _not_ observed? I wouldn't trust you to be
able to discover your own navel, much less any protruding body parts.

| In comp.lang.lisp we must act as if no other languages exist, and woe be
| unto one who lapses and mentions a term they've learned in another
| environment -- the wrath of Erik will be upon them.

do you learn the wrong things from everything you experience? if you
break the law and get caught, you'll study how not to get caught, not how
to obey the law, right? when arguing with me, you "learn" that you must
press harder, be even more moronic, make even more unfounded accusations
and post even more insane drivel, because I might start to _accept_ it
once it gets beyond a certain acceptance threshold, is that it? (but
what _am_ I doing using rhetorical devices to a guy so illiterate that he
doesn't even recognize when his own position is ridiculed?) just because
_you_ are incapable of dealing with what I have said about suspending
one's knowledge of something else until there is a _reason_ to compare it
to other knowledge on their _respective_ premises, you will wind up with
meaningless comparisons (or surveys) of apples and oranges, which I have
to ask you if you believe is wrong: DO YOU BELIEVE THAT COMPARING APPLES
AND ORANGES AND DISCUSSING APPLENESS OF ORANGES and ORANGENESS OF APPLES
IS A GOOD WAY TO LEARN ABOUT APPLES AND ORANGES? if you do not, there is
perhaps hope that the view might one day penetrate your thick skull that
I'm arguing against the concept of using appleness as a means of judging
oranges, I'm _not_ arguing against either apples or oranges, which you
have stupidly concluded over and over and over.

I wish you would some day grow smart enough to realize that the reason my
wrath is upon you in particular is that you make so many fucking annoying
accusations that do nothing but help maintain your _own_ mental image of
something despite a continued flow of information that would have negated
it in a _living_ brain. I have to ask: what _do_ you gain from this? if
you find the wrath uncomfortable, stop accusing me of stuff you have no
way of knowing and which couldn't be any more false. if you have to make
stuff up, be my guest, but be _honest_ enough to know that you make it up
and don't pretend that you have knowledge you don't have, and _don't_
pretend that your amazing ability to reinforce your own prejudices is any
smarter than poking people in the eye to see if they will get mad at you.

to conclude, I think your defense of the doofuses here is something you
do only because your own amazing inability to separate conjecture from
fact is under implicit attack, and I guess that if I were allowed to
successfully attack the phenomenon of baseless conjectures masquerading
as fact or knowledge, something _you_ do would be in serious jeopardy.
in order for this not to take place, my guess is that you have to make up
a demonic image of me and do your best to portray me as something I'm not
and as attacking something perfectly benign that I'm _not_ attacking in
order that nobody turn around and ask _you_ why you keep your projection
nonsense going at full speed. people _do_ notice that you argue against
something other than that which people actually say, you know. the more
you keep doing it, the easier it is to expose it as a pattern of yours.

I wonder what nonsense you'll get out of this and which pathetic lies you
will serve the next time you feel fear of exposure or whatever it is that
keeps you going. when I argued that in order to learn something that is
very close to something one already knows, one must effectively suspend
what one knows from the prior, similar experiences and listen to the new
as if it was _all_ new, your moronic "conclusion" is "I've discovered


from past conversations that you believe it's totally wrong to make any

references or analogies between programming paradigms". what the fuck
possessed your pathetic excuse for a brain to arrive at such an amazingly
unintelligent conclusion? you used to be quite able at what you were
doing, but you've become incapable of dealing even with the simplest
summaries of somebody else's position! what _is_ it that gives you the
right to pretend you know better than somebody else what they think? my
guess is some seriously misguided piece of fundamentalist religion, but
please feel free to tell me you're not a religious nut -- such have been
the only kind of people to make the kind of mistake I see you do, but it
would be interesting to learn if there is a more basic error than that
producing fundamentalist religious beliefs at work with you.

now, I can't make you stop producing insane drivel about me, and I can't
stop you from lying about what I say or from arguing against something I
have _not_ said in order to defend whatever it is you are afraid of
having exposed or from producing summaries of my position or arguments
that would have got a failing grade from anybody who can read reasonably
long sentences, but I _will_ continue to show how you defend something
people _don't_ do in order to detract from my legitimate criticism of
something they _do_ do, and I _will_ continue to object to every single
piece of insane drivel you post about me, and I _will_ continue to
correct your false statements of fact when you impute words or positions
to me that are in fact _contradicted_ by the easily available evidence, a
modus operandi you have stuck to for quite a while. if you can't stop
posting insane drivel and impute all sorts of bullshit to me, talk to
somebody about it, and just get _over_ your obsession, OK? I'm getting
sick of it.

#:Erik

Barry Margolin

Jan 16, 1999
In article <31254437...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> "references"? the introduction was "I'm trying to figure out how to make
> a virtual function", and the conclusion, _long_ before there had been

Wasn't that a completely different thread? This is the thread about the
weird hash table where he doesn't mind collisions.

Erik Naggum

Jan 16, 1999
* Erik Naggum <er...@naggum.no>

| "references"? the introduction was "I'm trying to figure out how to make
| a virtual function", and the conclusion, _long_ before there had been

* Barry Margolin <bar...@bbnplanet.com>


| Wasn't that a completely different thread? This is the thread about the
| weird hash table where he doesn't mind collisions.

I'm trying to answer your replies, hard as it may be. if there's a mixup
of the threads, it isn't mine.

#:Erik

Barry Margolin

Jan 16, 1999
In article <31254459...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* Erik Naggum <er...@naggum.no>

>| "references"? the introduction was "I'm trying to figure out how to make
>| a virtual function", and the conclusion, _long_ before there had been
>
>* Barry Margolin <bar...@bbnplanet.com>
>| Wasn't that a completely different thread? This is the thread about the
>| weird hash table where he doesn't mind collisions.
>
> I'm trying to answer your replies, hard as it may be. if there's a mixup
> of the threads, it isn't mine.

Sorry, it *is* you. The guy was trying to design his funny hash table
mechanism. At some point he decided that he could use CLOS to do it, as it
would allow him to provide different implementations with the same
interface. He said, "I guess this is how all CLOS methods are virtual."
Obviously there's something wrong with the syntax there (maybe he's not a
native English speaker, or he simply mis-edited the sentence) -- I
interpreted it as "I guess this is because all CLOS methods are virtual."
Given what the term "virtual" means in OO contexts, this is a true
statement.

What I see there is someone making a realization about the language, and
relating it to what he already knows. A light bulb went off above his
head, he learned something. This is something to be applauded! But even
when someone demonstrates that they're starting to "get it" you knock them
down because they have the nerve to use another language's terminology in
describing it.

If you can't even keep straight what you're flaming about, maybe you should
slow down a bit.

Erik Naggum

Jan 16, 1999
* Barry Margolin <bar...@bbnplanet.com>

| What I see there is someone making a realization about the language, and
| relating it to what he already knows. A light bulb went off above his
| head, he learned something. This is something to be applauded! But even
| when someone demonstrates that they're starting to "get it" you knock
| them down because they have the nerve to use another language's
| terminology in describing it.

you're amazingly forgiving with everybody else, Barry. I wonder what's
_really_ your concern with your incredible insistence on projecting
evilness that isn't there onto me all the time. again, it seems like a
mission from a God to strike blindly at whatever looks bad, never mind
what it is. this time, you make another one of your amazingly stupid
generalizations. I guess either people are bad or they are good in your
eyes, and you'll do _everything_ to maintain your mental images: since
I'm bad in your eyes, you see nothing but the bad you want to see, and
since doofuses are good, you see nothing but the good you want to see.
I'm trying _real_ hard to figure you out. it isn't easy. every time I
think I have something, you do something amazingly contradictory, like
this time, when you suddenly switch from your modus operandi of using the
past to attack me ever more forcefully, to arguing that _threads_ are to
be kept _entirely_ separate, obviously because that serves your need
right now, but still. the constant element appears to be to attack me
and defend doofuses, no matter what anybody actually _does_.

here's what I wrote, to refresh your memory:

| I guess this is how all methods in CLOS are virtual.

you're being carelessly lazy, yet you don't appear to see it. ...

what's the context here? I had already talked about constructive vs
careless lazy, and he's made a point out of being the laziest programmer
he knows. you wish to see a light bulb that isn't there, I see another
carelessness. I want hard evidence that somebody has stopped doing stuff
I have already pointed out. you'll settle for a vision. yet, when it
comes to judging me, you'll also settle for a vision of something you
don't like, and proceed without caution.

| If you can't even keep straight what you're flaming about, maybe you
| should slow down a bit.

I'm sorry I was tricked by your sudden switch to keep the contexts of
separate threads separate, because you have never done so before. you
see, there is nothing in what you write to make anyone realize you are
such a champion of suspending all knowledge from previous threads (or
languages) and behave as if every thread (or language) is new, at least
until you know that you can compare them. it has indeed appeared as
though you think it is morally right to maintain a strong memory of other
threads (or experiences in general) when you react to what _I_ do. it
is manifestly _not_ enough to start a new thread for you to treat me
differently, and it is manifestly _not_ enough for you to limit yourself
to what you see -- your _insistence_ on vile generalizations about my
character is the only thing that is unchanging about what you write in
our exchanges. what's a man to believe about you?

when you want to, however, it is obviously your right to change your
attitude _completely_ and do what I have suggested about learning stuff,
which you have consistently refused to listen to and have ridiculed on
many an occasion: judge something for what it is there and then, instead
of remembering the invalid past. you have previously been so strongly
opposed to the very concept of delaying judgment that one must wonder
what has suddenly possessed you. perhaps what I have asked you to do for
the longest time only suddenly appeared to be the best option to attack
me, and so you used it, without even realizing what you were getting
yourself into. you have been the strongest champion of remembering the
invalid past so long it's going to take me a while to really believe that
you have finally turned around and see the value of suspending memory to
better learn something that differs from the past. so far, it has seemed
that what you have really been after is attacking me, not for what I do,
but for who you want to portray me as being. that has pissed me off.

I'm not ready to believe there's a light bulb over your head, though --
this experience appears more to contradict your actual position than to
be an instance of learning. I'd like some hard evidence that you will
stop using your mental image and your incredibly persistent willingness
to portray the evilness you somehow need to think I possess onto me and
indeed do as you strongly imply is now the _only_ good thing: keep the
threads separate.

however, I don't react to anything you don't do, so if you manage to keep
your threads separate and your willingness to generalize about me in
check, _I_ have no reason to remember your disgusting moralization and
hypocritical, self-contradictory behavioral patterns and self-serving
reversals of your positions. I let stuff be until it comes up again.

#:Erik

Barry Margolin

Jan 16, 1999
In article <31254833...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> you're amazingly forgiving with everybody else, Barry. I wonder what's
> _really_ your concern with your incredible insistence on projecting
> evilness that isn't there onto me all the time.

You're venomous in practically every post, and I have a hard time
attributing it to anything else. Perhaps it's because you don't bother
posting unless something riles you, so all we ever see is the negative
side. Or maybe I'm just a poor judge of character. That's a possibility,
but I think people who know me will confirm that I'm pretty easy going, and
it takes someone who's really annoying to get on my bad side (as you said,
I'm amazingly forgiving). I'll also admit that I'm prone to
generalizations, sometimes inappropriate ones.

vnik...@poboxes.com

unread,
Jan 16, 1999, 3:00:00 AM1/16/99
to
In article <Qs3o2.267$oD6....@burlma1-snr1.gtei.net>,
Barry Margolin <bar...@bbnplanet.com> wrote:
> [Erik Naggum is] venomous in practically every post (...)

In some posts, yes (I am not going to take the time to count,
or to consider if `venomous' is the most appropriate word), but
`practically every' is a significant exaggeration. (Very roughly
speaking, I'd say that the number of `venomous' posts is on the
same order of magnitude as that of `non-venomous' ones.)

Important note: this is no more than a statement of the
impressions of one reader of comp.lang.lisp. All parties
directly involved may want to object...

David Bakhash

unread,
Jan 16, 1999, 3:00:00 AM1/16/99
to
I'm sure that some people see Erik as good and bad. I see him as good
and better. The good side is that he knows his stuff. The better side
is that he makes it fun to read, and inter-sperses humor with academia
-- something I don't think people do enough. Even Erik might not
disagree with my analysis, but it's not even important: if from the
point of view of the observer, it's constructive, funny, useful,
interesting, and/or provocative, then it's a good thing. But since
destructiveness is often funny, this definition is pretty flexible, and it
means that many people are beneficial to the group, at least as far as
this doofus is concerned.

I am a bit slow. I tend to irritate really sharp people a lot, because
concepts sometimes sit on me for a while before getting absorbed. I
don't mind the noise so much, as long as some signal comes through too.

I still can't sit here and laugh at the thread as much as I'd like to.
I probably could if it were all against me. But I feel as though some
of my posts have been a vehicle for people to be mean to each other.

Erik. I've read lots of material on Lisp. I read both of Graham's
books, the Norvig AI book, and other little things. I try the best I
can. But it's not the first language I've ever learned, so I'm still
not at the point where I can easily view Lisp as a very separate entity
among programming paradigms. The language I used in the virtual
function thread was supplemented by a code example of what I was looking
for, and I tried at least a little to beware of misusing vocabulary. I
shouldn't have thrown in the part about being disappointed if the
ability to do what I wanted wasn't there. The single most beneficial
sentence I read (don't remember from whom) was "methods belong to
generic functions, not to classes." This was the light bulb that helped
me understand, and that was a big step for me, though when you were
learning this stuff, that was probably a baby step.

Regardless of what others think of you, or what you _think_ they think
of you, you should not be worried about being mis-represented. Anyone
can read the full histories of these threads and judge for themselves,
and form an opinion.

One thing, though. I figure you're using GNUS. Most Emacs users do,
and you're pretty diehard, as far as I can remember. So if you don't
want to read certain people's posts, use a very simple learning
algorithm: if they're doofuses, then just down-score them, or eliminate
them altogether. I hope you don't do that to me, because I really learn
a lot from your replies (and b/c we use the same Lisp implementation).
But in case you don't laugh off all these things and really do get
irritated, then I don't want to add to your irritation. Hopefully, one
day I'll be sharper, and then I'll be able to help you out with
answering some of the hard questions you usually reply to.

One thing I have to agree with Barry on is that we do have to encourage
learning at all levels. If a doofus learns something, then that's a
good thing. Take me for example. I try to help the community, much as
you do. I tried my best to make elisp do CL-style arrays, and wrote
strokes.el, and everywhere I've worked, I try to convince my superiors to
open up as much source code as possible. I also promote Lisp in my
community. E.g., compared to you guys I know nothing, but compared to
newbies, I know enough to help out a lot, and do so at school.

dave

Erik Naggum

unread,
Jan 16, 1999, 3:00:00 AM1/16/99
to Barry Margolin
* Barry Margolin <bar...@bbnplanet.com>

| You're venomous in practically every post, and I have a hard time
| attributing it to anything else.

I have a very hard time attributing what you see to anything but the way
your mind works, and you have given me an important clue today: the key
to the way it works is that you attack _people_ and react to _people_.

_you_ see me as venomous in practically every post because that's what
you _expect_ when you see my name and since you cannot even read a
helpful reply in an Emacs group without misrepresenting what I write and
vilifying me, and this you have been doing for _months_, you see me
_really_ pissed at you in return. you see what you want, and if it isn't
there you either invent or provoke it. if you could have seen something
else, you turn away and effectively ignore it.

you're a fucking _pest_ the way you follow me around and are venomous to
_me_ at every opportunity for things I _don't_ do. I don't attack you.
I don't follow up to your articles. I stay out of your way, until and
unless _you_ post some of _your_ venomous drivel about _me_, but you just
keep doing it, over and over and over, like a sick moron. you attack me
for things I _haven't_ done and you portray me as something _way_ beyond
what you could ever have observed or even extrapolated from what you have
observed. like the true evil in the history of mankind has been for some
"good cause", I'm sure there's a fucked-up idea of a "good cause" in you,
too, but it's gone _completely_ out of control. there's nothing I can
do, anymore: your mental image of me is so cast in stone that you will
continue to provoke me to get confirmation that I'm bad because your
fucked-up mind can't deal with invalid generalizations. this is really a
matter between you and your shrink, not anybody else, but as long as you
keep pestering me with your idiocy, I'll continue to expose you for what
you are: a destructive being.

now, I'm dead certain that you are incapable of reading anything I write
at all, and that you'll continue to think you're free of all possible
guilt in this matter. you'll go home and polish your glory and feel
morally superior because once again, you have poked a bad guy in the eye,
and by god, he did not forgive you this time, either, but would have
killed you on the spot if he could, so _he's_ bad and you're innocent.
still, if there's a single shred of decency left in you, you'll at least
hear what the object of your vilification and misrepresentation and your
baseless accusations has to say. but I wouldn't be surprised if you do
what you have done so far: damn the objections, full speed ahead: you're
on a mission to destroy, and if he objects, that's because he's guilty.
I'm sure you would have been a witch hunter if you could.



| Perhaps it's because you don't bother posting unless something riles you,
| so all we ever see is the negative side. Or maybe I'm just a poor judge
| of character.

or perhaps it's because you cannot even _see_ what you don't already
think is there. I'm already guilty according to your ethics, so why
bother with contrary evidence?

you have proven incapable of comprehending anything I write without
imputing evil intentions or _introducing_ faults in it that never would
have been there if you had read it carefully (which you admit you don't
do in the first place), so I wouldn't be surprised at all if you have so
strong mental blocks you cannot even comprehend that _you_ do something
evil towards me that I have every right to object to. whatever you
perhaps once thought could be accomplished with criticism is irrevocably
lost because you become the aggressor, who responds not to something I
do, but to something within yourself: nothing I do could ever make you
change your mind, so it is in fact not something I do that you criticize
or react to, it's your mental image! you have also proven that you will
never relinquish that mental image no matter _what_ happens. this is
something that only fucked-up religious nutcases do, like Scientologists
who are instructed to suspend all ethics and destroy whoever criticizes
them by whatever means available. fairness and justice would be an
impediment to speedy execution, so you just dispense with it.

| That's a possibility, but I think people who know me will confirm that
| I'm pretty easy going, and it takes someone who's really annoying to get
| on my bad side (as you said, I'm amazingly forgiving).

well, gee, I could say the same thing, with two very important exceptions:

1 I never latch onto "SOMEONE who's really annoying". I latch on to
SOMETHING that's really annoying. then I let go once that something goes
away. but you don't let go, because your fucked-up psycho brain attacks
_people_, not _actions_. there's no way for you to let go, because you
judge what you imagine and cannot see, and you're always free to assume
that when there's something that doesn't fit, you can ignore it. for me,
that is inherently never an option, I judge actions, and all I do is hold
the person responsible as if it was done consciously if I cannot find a
constructive element that indicates that there's a simple and easy way to
achieve what was _really_ desired. stop doing it; my criticism vanishes.
do something else that shows that the old wrong will not be repeated; my
criticism will never be repeated or even remembered: mission accomplished.

a related difference is that I attack actions _immediately_. you attack
people a _long_ time after you were first annoyed by them, because it
builds up within: you _collect_ their sins. I respond right away, and
then don't remember it until the _same_ thing happens again that shows me
that they haven't learned since last time. you will remember it when
something _unrelated_ happens, and you will use it against the _person_,
preferably to maximize the pain and the destructive effect. I optimize
for getting their attention and not letting them go until they listen.
you want to destroy, I reprimand. I guess you hate people, while I hate
some of their actions. I don't deal well with people who hate people.
that's why I don't deal well with Barry Margolin's unfair accusations.

2 I don't forgive, because I have no need to, because I don't go around and
remember people's "sins". granting forgiveness is a stupid pretense that
something didn't happen so that the object of your forgiveness is free to
try again, but only because you're willing to deny some part of history.
this is a _great_ means to keep people in debt to whoever is forgiving.

people who forgive are also likely to withdraw their forgiveness if they
are once again morally outraged by something the same person does. they
are also extremely subjective in whom they forgive and for what. the
typical example is hypocrites who forgive mass murderers, but not people
who don't stop on red lights when there's no other traffic. generally,
only the really big evil is forgiven, while the petty evil is denounced
very heavily. Bill Clinton is wanted impeached for lying about a sexual
affair that was nobody's business but his own, not for killing lots of
innocent people in Iraq. _that's_ how the forgiveness ethics works.

such forgiveness ethics is, however, _very_ important in religions where
your sins are being tallied and used against you. that I consider to be
the singularly most evil way to treat people ever invented by either man
or any sick, revengeful god ever invented in their image. it could not
have "prospered" without a religion and some supernatural "god" figure to
back it up, because people just aren't that evil over extended periods of
time when left to themselves. they grow up. religions never do, because
with a religion, you are expected not to grow up, and you're thrown out
of them if you do. peer pressure keeps people unmatured for millennia.

the problem is that there _is_ no need to forgive anybody to begin with.
people _are_ already free to learn from their mistakes and try again
without having somebody else "forgive" them first or using past mistakes
against them. the key is to learn. if you learn, whatever you did
before you learned should never be held against you. (of course, you may
need to prove that you have indeed learned if the risks of you not having
learned is too high.) this view is inconsistent with the very revengeful
religious views and the view that people _are_ somehow evil. it's that
view that caused society to _punish_ people, too, which has been known
for several hundred years to cause _more_ criminal behavior. but facts
don't matter when you've made up your mind about _people_.

Barry Margolin judges your character and forgives practically everything
and defends you against all kinds of criticism if you're his sort of good
person, i.e., pretty stupid, but I judge your actions and expect you to
improve your act and do the best you can. all is forgotten if you do.
if you don't learn, even more shame on you. Barry Margolin goes after
_people_ he thinks are bad and he suspends his ethics if he thinks that
person is bad enough. this he will continue to do no matter what you do
in response, and if he can't prove that you're bad, he'll be a good witch
hunter and invent something that works just as well: vilification and
innuendo. I don't go after people in the first place (even people like
Barry Margolin; if he stops doing his shit, I'll leave him alone, as I do
_between_ every time he rears his ugly head).

if someone is _always_ free to try again and nobody will hold their past
against them, there is no need to _fear_ that anybody will use irrelevant
past information against anybody out of an evil desire to destroy them or
what they have done, either. Barry Margolin, however, sees destruction
of _people_ as his moral obligation, and _nothing_ will deter him from
his destructive task, nothing at all. it's the Barry Margolins of the
United States Senate who want to impeach William Jefferson Clinton: the
religious, conservative Republicans who are so blinded by their moral
outrage that they cannot see anything but their own mental images, and
especially not themselves, not even the voice of the voters, which used
to be most important.

| I'll also admit that I'm prone to generalizations, sometimes
| inappropriate ones.

what's wrong with your generalizations is that they are not rescinded
when they prove inappropriate. that you refuse to rescind them when you
receive contrary information is all I need to know about you -- I shall
just have to deal with your inappropriate generalizations and their
consequences for me as long as you continue to post your insane drivel
about me. you have proven that nothing whatsoever will ever make you
change your mind, so all I can do is fight you back every time you try
one of your stunts, and this has been obvious for months, now.

however, had I been inclined to think that you could not change this
behavior if you somehow woke up from your insanity at the hands of
trained professionals, I would have been forced to think _you_ were evil
and that you should be destroyed. I don't think that way. all I want is
that your evil _actions_ not hurt me or anybody I know, but as long as
you continue to accuse me of things I have not done, as long as you
misrepresent what I write and make me look bad out of _your_ malice, and
as long as you act on your inappropriate generalizations, I will _have_
to fight you. but even if you learn from this, there is nothing you can
do to repair the damage you have done. an apology from people who act
out of moral outrage is a contradiction in terms: it's the morally
outraged version of themselves that needs to apologize, not the timid
little fuck who will do the exact same thing again the next time he's
morally outraged. so far, it doesn't seem that people who become morally
outraged and act during that state of mind are legally sane, and thus
they cannot change their ways except for being stopped from being morally
outraged. I still hold out for that being under volitional control if it
hasn't gone too far. however, moral outrage is a product of a religion,
and people seem to have a very hard time rescinding their religions no
matter how much evidence they receive that it's really bad for them, just
like some people can't rescind inappropriate generalizations.

to summarize: not revoking an accusation against somebody for doing
something they have not in fact done is unforgivable. misrepresenting
others in _order_ to hurt them is unforgivable. Barry Margolin does
both, systematically. on top of it, he forgives everybody else anything,
but goes after those he does not forgive for anything at all, real or,
preferably, imagined, since myth is so much harder to kill than fact.

here's my advice to you, Barry Margolin: just fucking quit it. if you
can't, and I strongly suspect it will take serious effort to stop, at
least limit yourself to what I actually do wrong. there should be plenty
to choose from, but somehow, I don't think I can do _that_ much wrong
when you have to invent something to attack in order to make me look bad.

#:Erik
--
SIGTHTBABW: a signal sent from Unix to its programmers at random
intervals to make them remember that There Has To Be A Better Way.

Erik Naggum

unread,
Jan 16, 1999, 3:00:00 AM1/16/99
to
* David Bakhash <ca...@mit.edu>

| One thing, though. I figure you're using GNUS. Most Emacs users do, and
| you're pretty diehard, as far as I can remember. So if you don't want to
| read certain people's posts, use a very simple learning algorithm: if
| they're doofuses, then just down-score them, or eliminate them altogether.

but that would be doing what I'm really strongly opposed to: it isn't
people who annoy me, it's _some_ of the stuff they _do_. and if people
_do_ differently and I had down-scored them, there's no way I could find
out that I should up-score them again, or if I did it too late, I'd lose
the window where they have learned something and start to change. the
educator part of me positively _lives_ to see people "get" something
difficult. the only people I think are really hopeless are those who
think they have a moral right to do the stupid things they do, but they
don't talk about technical Lisp stuff, anyway.

| I hope you don't do that to me, because I really learn a lot from your
| replies (and b/c we use the same Lisp implementation). But in case you
| don't laugh off all these things and really do get irritated, then I
| don't want to add to your irritation.

most of the time, I'm trying to be harsh and humorous at the same time.
some people see it. some don't even notice puns or jokes or ridicule of
current political farces, or they get really upset about them. what are
these guys doing on USENET in the first place? they should be senators!

however, I don't take lightly to people who accuse me of things I haven't
done or who criticize me for things I have either already fixed or have
explained why I do the way I do. (in decreasing order of seriousness.)
I get _really_ pissed when somebody comes back after a while and accuses
me of stuff he _knows_ I never did. such just isn't a laughing matter.

| Hopefully, one day I'll be sharper, and then I'll be able to help you out
| with answering some of the hard questions you usually reply to.

thanks. :) I look forward to it. and I think you'll get there. always
remember that the truly hopeless face silence.

| One thing I have to agree with Barry on is that we do have to encourage
| learning at all levels.

I'm sorry to see that he's succeeded in setting that up as something I
disagree with. (he does that a lot.) the problem is this: you can't
learn good stuff if bad stuff and good stuff are rewarded on equal terms:
bad stuff _pays_ much better than good stuff does, so the good stuff has
to be rewarded and the bad stuff reprimanded to keep the balance. the
reason is simple: doing only good stuff is painful up front and good
later. bad stuff is painless up front but hurts later. by showing
people that good stuff is enjoyable and making bad stuff painful up
front, you _might_ be able to keep them in line.

it takes a special kind of dedication to avoid bad stuff altogether when
the rewards for bad stuff are lots of cash from a skill-starved industry
and you get no rewards for the good stuff you do until later, when everybody
around is in pain, but if that's the sort of thing that keeps you going,
you're too cynical even by my standards.

David Bakhash

unread,
Jan 16, 1999, 3:00:00 AM1/16/99
to
Barry Margolin <bar...@bbnplanet.com> writes:

> It's also not my place to defend others, and I'm going to try to avoid it
> in the future.

I cannot disagree with this statement more. In fact, when I read this
reply from Barry, I was quite relieved to hear that someone was sticking
up for what made perfect sense:

========================================


>> great. I'm sure you made him feel better for having been consoled and
>> told "you're not completely wrong", and now he'll continue to talk about
>> virtual functions in CLOS forever, instead of having the potential of
>> being rewarded for getting it right some time in the future. just great.
>

>[?:] If he does much CLOS, he'll get so fed up of calling *everything*
>virtual that he'll stop bothering. Then there'll be no problem.
>(Until he goes back to C++ and starts asking about generic functions
>in C++. (-: )

[barry:] He wasn't calling everything virtual, he was making a point in a
discussion about programming methodologies. The fact that CLOS methods
have the properties associated with virtual member functions in another
language I won't name was germane to his point. He expressed it very
clearly, IMHO.

========================================

Barry was absolutely right to defend this. Just think about that
thread: there's nothing wrong with making an analogy between similar
concepts in two programming languages. They're not incommensurable. I
get the feeling that [?:] just wanted to get his $0.02 in and say
something unconstructive.

Also, I think here's an okay place to explain a statement I made a while
back in response to Erik's followup. Here's the background. I said:

...I guess this is how all methods in CLOS are virtual. I'd just have
to make sure that in the subsequent coding, I'll have to make sure to
use `with-accessors' instead of `with-slots'.

and then Erik said:

huh?

Well, what I was trying to say was that I tended to use `with-slots' a
lot. I figured that since CLOS doesn't really allow you to hide data,
the way _I'd_ go about it was to move away from accessing data
directly, and rather to use the accessors as an interface. That way I
wouldn't hang myself with code re-writing if something changed. Here.
Take a look:

(defclass class-A () ((text :type string)))

;; in a far away place...
(defmethod meth ((a class-A))
  (with-slots (text) a
    ...
    (concatenate 'string text "blah")))

If I do that, then I'm screwed if I later on have to somehow re-design
the way strings are stored. i.e. I might do what I hinted at in a post:

(defclass displaced-string ()
  ((source :type string
           :reader source)
   (start :type fixnum
          :accessor start)
   (size :type unsigned-byte
         :accessor size)))

and then define a method on it:

(defmethod substring ((d-string displaced-string) &optional (start 0) end)
  ;; START and END index into the displaced string; map them onto SOURCE.
  (with-accessors ((source source)
                   (start-1 start)
                   (size size)) d-string
    (assert (<= start size))
    (if end
        (assert (<= end size))
        (setq end size))
    (let* ((first (+ start-1 start))
           (last (+ start-1 end)))
      (subseq source first last))))

And then now, I might be able to re-write the original method like this:

(defmethod meth ((a class-A))
  (with-accessors ((text substring)) a
    ...
    (concatenate 'string text "blah")))

as long as I've written my accessor:

(defmethod substring ((a class-A) &optional (start 0) end)
  (let ((text (slot-value a 'text)))
    (unless end (setq end (length text)))
    (subseq text start end)))

(this code is untested) I was just trying to make a point as to why to
prefer `with-accessors' to `with-slots'. Now, I can change the way
strings are stored in class-A (i.e. to `displaced-strings') and the rest
of the code will still work. I think that there still may be holes in
my reasoning, even in my example. Doesn't that make sense?
Isn't this why OO language designers originally wanted data hidden in
the first place?

dave

Barry Margolin

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
In article <77r541$7h0$1...@nnrp1.dejanews.com>, <vnik...@poboxes.com> wrote:
>In article <Qs3o2.267$oD6....@burlma1-snr1.gtei.net>,
> Barry Margolin <bar...@bbnplanet.com> wrote:
>> [Erik Naggum is] venomous in practically every post (...)
>
>In some posts, yes (I am not going to take the time to count,
>or to consider if `venomous' is the most appropriate word), but
>`practically every' is a significant exaggeration. (Very roughly
>speaking, I'd say that the number of `venomous' posts is on the
>same order of magnitude as that of `non-venomous' ones.)

I've done a little research at DejaNews, and I have to admit that I was
wrong. I looked at a dozen or so of Erik's recent posts, and most of them
were polite. Maybe it's not my job to characterize Erik's style, but I
wanted to know for myself if I was seeing only what I expected to see, as
Erik has accused, and now I realize that I was. I'm sorry.

However, when he gets set off he curses, calls people stupid, etc. These
are the ones that stick in my memory, since I don't expect this type of
language in a technical newsgroup, and color my general impression of Erik.

It's also not my place to defend others, and I'm going to try to avoid it
in the future.

--

Christopher R. Barry

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
On 12-24-1998 Barry Margolin started an `OT: Usenet lack of civility'
thread where Erik's behavior was already attacked and everything that
could be said about this whole topic was said[1]. So let's please not
have this exact same thread again under `OT: how many venomous
posts'. Go and reread the thread if you must, it will be forever
archived at Dejanews. And then maybe we can instead discuss Lisp.

I'm seeing lots of large, multi-page 5-10k posts on this issue (Erik's
last reply to Barry today was about 15k) and this is just a (IMHO)
huge waste of a skilled Lisper's time. (I'd like to see 15k posts from
Erik on Lisp, but he probably can't find the time or motivation or the
heart to when he's actually taking the time to type out 15k(!)
replies with zero Lisp content. The same applies to you other guys
that get caught up in this.)

I'm all for tough love, and I don't think Erik's ever viciously ripped
apart anyone that was truly humble, unassuming, open-minded and
on-topic. Of course, not many of us meet this criteria 100% of the
time....

Christopher

1. The thread is archived here:
http://x5.dejanews.com/getdoc.xp?AN=425567652&CONTEXT=916531280.129958037&hitnum=22

Lyman S. Taylor

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
In article <wkzp7i2...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
....

>
>Well, what I was trying to say was that I tended to use `with-slots' a
>lot. I figured that since CLOS doesn't really allow you to hide data,
>the way _I'd_ go about it was to make move away from accessing data
>directly, and rather to use the accessors as an interface. That way I
>wouldn't hang myself with code re-writing if something changed. Here.
>Take a look:

CLOS allows/enables data hiding. That is one reason why _all_ of the
access to a class's data/state is through a method. This is
distinct from hiding "names". Packages do namespace enforcement/management.
CLOS doesn't try to do what is already in Common Lisp. If you don't
want a "user" of a class to know the slot names then encode the
class and methods in a package. Export what they are supposed to use.
[ Anyone who ignores your wishes can "cheat" anyway. Even in
C++. It is simply the degree of effort involved.... ]

The slot directives :reader, :writer, and :accessor are
syntactic sugar to automagically generate the typical access
to slots and policies. You can always add a "virtual slot" ("virtual"
in the Dylan sense of the term: a slot for which there is no
corresponding storage in the class).


(defclass do-whop ()
  ((slot1 :reader do-whop-s1 :initform 22)
   (slot2 :accessor do-whop-s2)))

(defmethod do-whop-s3 ((obj do-whop))
  (do-whop-s1 obj))

(defmethod (setf do-whop-s3) (value (obj do-whop))
  (setf (slot-value obj 'slot1) value))


"externally" the accessor of the virtual slot , DO-WHOP-S3, behaves
no differently that DO-WHOP-S2. That's one advantage of all access
to data being through methods. Later if you wanted to make a slot
for V1 or take away the slot2, the "user" of the class need not know.


USER(25): (setf my-dwhop (make-instance 'do-whop ))
#<DO-WHOP @ #x204cf722>
USER(26): (setf (do-whop-s2 my-dwhop) 23)
23
USER(27): (with-accessors ((s3 do-whop-s3)
                           (s2 do-whop-s2))
              my-dwhop
            (setf s3 66)
            (setf s2 (+ 100 s2)))
123
USER(28): (do-whop-s1 my-dwhop )
66
USER(29): (do-whop-s2 my-dwhop )
123

Personally, I never use (nor encourage) SLOT-VALUE for access to an
instance's data outside of the package where the class is "defined" in
CLOS solutions I create. However, "internal" to the package it
is useful for setting up things like the virtual slot above.
Since WITH-SLOTS implicitly uses SLOT-VALUE (or a low-level equivalent),
it falls in the same usage pattern ("internal" not "external").


Note also if you "package" your classes your can export both the
class name and methods in the exports. Since methods belong the
generics that is one way to tie them class (it's name anyway)
and the methods ( member functions) together. [ There are no
"data memembers" since you interface entirely through methods,
from the "user"'s perspective. You can follow the same protocol
in Java too. Only let "users" see methods and use setters/getters
exclusively. Note unlike Smalltalk. ]
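
A minimal sketch of that arrangement (hypothetical package name, reusing
the DO-WHOP example above): export the class name and the accessor
generic functions, and keep the slot names internal.

(defpackage :do-whop-stuff
  (:use :common-lisp)
  (:export #:do-whop #:do-whop-s1 #:do-whop-s2 #:do-whop-s3))

(in-package :do-whop-stuff)

(defclass do-whop ()
  ((slot1 :reader do-whop-s1 :initform 22)   ; SLOT1/SLOT2 stay internal
   (slot2 :accessor do-whop-s2)))

;; A client package only sees the exported symbols:
;;   (do-whop-stuff:do-whop-s1 (make-instance 'do-whop-stuff:do-whop)) => 22
;; (slot-value obj 'do-whop-stuff::slot1) still works, but the "::"
;; makes the cheating visible.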



--

Lyman S. Taylor "The Borg -- party poopers of the Galaxy. "
(ly...@cc.gatech.edu) EMH Doctor Star Trek Voyager.

Marco Antoniotti

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to

David Bakhash <ca...@mit.edu> writes:

> ly...@cc.gatech.edu (Lyman S. Taylor) writes:
>
> > In article <wWNm2.88$oD6....@burlma1-snr1.gtei.net>,
> > Barry Margolin <bar...@bbnplanet.com> wrote:
> > ...
> > >He said he's going to resolve the collisions his own way. Perhaps there's
> > >something in the values that does that. He didn't go into any more detail,
> >
> > That bothered me. If this "something" can resolve colllisions why isn't
> > it the key? Otherwise, if you have a piece of data (no key) in a
> > "hash bucket" how do you know which "key" it belongs to if there is not
> > one-to-one mapping between keys and buckets?
>
> okay. Let me explain. I'm only doing this b/c you're interested,
> though it's really not that important.
>
> suppose, ideally, I want a hash table whose keys are strings and whose
> values are objects. However, I have so many keys, and each key (string)
> is, say 50 characters long. With 50,000 keys, this can start to really
> swallow up memory. But if, instead, the objects in the hash buckets
> somehow contain the strings efficiently, then I can create those strings
> when needed, but not store them permanently.
>
> If you're wondering how that can be done, just imagine that two numbers
> (pos . length) specifiy a sub-string inside one huge string. So just
> storing the pos and length is enough.
>


Since the thing *is* interesting, and since I really like hash-tables,
but see many limitations in the way they are specified in CL, I'd like
to see a little more of the specs of your problem.

First of all, are your 50000 strings distinct? What kind of structure
do they share? If you do not want to store them as keys in the
hash-table, where do you keep them? On a file?

Cheers

--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 10 03 17, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it

Erik Naggum

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
* David Bakhash <ca...@mit.edu>

| Isn't this why OO language designers originally wanted data hidden in
| the first place?

since "originally" may well refer to the SIMULA work in the late 1960s
and I was, uh, fortunate enough to have SIMULA as my freshman language at
the University of Oslo, maybe I should recognize it, but I don't. in
fact, I haven't found the _one_ argument or the _one_ set of arguments
for data hiding that would have made it so prevalent. there are lots of
arguments, some of them mutually contradictory, for why you don't want to
share some of your slots or want to control the way they are accessed or
inherited. in practice, people aren't able to chart the consequences,
either, and they make serious mistakes in what's hidden and what's not,
but rather than to see that who's owning what is a late-binding property
of a class, they set out to assume ownerships from the very beginning.
all I have gotten out of this is that "it's mine! all mine!" somehow
explains everything.

if ownership were so important, why does it change so much? the one
property that changes the most in class definitions is who can access the
slot directly, according to some report I read in 1994 when I was trying
to figure out why C++ hurt so much. and why do people go to such lengths
to violate the access barriers when there's something in there that they
actually need to get at? remember that this is stuff that happens inside
a team, not something you do to keep outsiders out. although 2/3 of all
goods stolen from shops are reportedly stolen by their employees, I don't
see the need to _make_ "information criminals" out of team members.

imagine people who behave like compilers: "do you sell thiocyanates?" (to
take something that you might legitimately want to hide from the public.)
"no." that's, OK, isn't it? you just go elsewhere, right? "yes, but
not to you." do you ask why? do you get just a wee bit annoyed?

what I think would make sense in this data hiding business is _real_
hiding. in C++, the compiler complains about access violations, not that
the name is effectively _undefined_. if you inherit from two different
classes that have public and private slots with the same name, what
happens? I think the obvious answer is that the private slot would only
be visible to the methods of the class to which it was private. my
reading of the C++ standard says that slot-merging occurs no matter what
the status of the slot is. _that's_ annoying.

incidentally, I remember that I thought very early on that all this type
declaration nonsense was really a half-measure: if the compiler _knew_
what the type of something was, why was it so bitchy about declarations?
in today's Common Lisp terms, I wanted to declare everything of type T
first, but still use only one type per variable. the compiler should
then tell me which types I had used (and could complain about multiple
types stored in the same variable), and that could be part of exported
interfaces. anyway.

so, I don't think the desire to see data hidden was really well founded
_before_ it was invented, because a number of clearly distinct reasons
for wanting it that way have emerged after the fact, none really
explaining why somebody came _up_ with the idea. the needs it covers are
probably mostly a personal preference, or perhaps fear. fear is a great
motivator for protecting stuff that doesn't need protecting.

David Bakhash

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
yeah, okay.

but you never actually mentioned the example code in that previous
post. I put it there specifically b/c stuff like that happens a lot,
when you change things around in a class.

The point I was making is that CL kind of lets you hang yourself by
using `with-slots' and `slot-value' (and esp. `(setf slot-value)') for
libraries that are subject to change. It's simply guaranteeing that
code will break as soon as classes start to change, even just a little.

Another thing that bugs me (unless I find out how to fix it) is that
:initarg seems to do nothing more than shove data into the slot. I
guess that's all right, but it means that a convention I will try to
adopt in the future is:

o if I'm using a foreign library that is subject to change, _always_
use `with-accessors' when possible.
o use accessors to mutate an object, instead of `(setf slot-value)'
(e.g. `with-slots' + `(setf slot)')
o purposely use :reader instead of :accessor and :writer when you want
to hint that those slots probably shouldn't be touched, except maybe
in that very source code file, (i.e. inside the accessors
themselves).

I'm sure this is all very obvious to most people. Or maybe it's not,
and I'm way off. But it makes sense to me.

dave

David Bakhash

unread,
Jan 17, 1999, 3:00:00 AM1/17/99
to
Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:

> Since the thing *is* interesting, and since I really like hash-tables,
> but see many limitations in the way they are specified in CL, I'd like
> to see a little more of the specs of your problem.
>
> First of all, are your 50000 strings distinct? What kind of structure
> do they share? If you do not want to stro them as keys in the
> hash-table, where do you keep them? On a file?

A lot of them do share structure. It's really quite nice. I wrote a
library called `array-table', and it does everything that gethash does
(except that it's called `get-array-table'). It has the following
properties:

o it resizes automatically (though I don't know where the optimal place
to put the rehash hook. I should research it. I either have to
examine statistics on a `(setf get-array-table)' or (another idea I
have) is whenever there's a collision on a `get-array-table'. I
figure it's better to do it on the put than the get, because there'll be
lots more puts than gets.)
o it has a little prime-number generator that helps the table grow to a
nice size (some ratio bigger than it was, user-specified), and then
takes the next prime. (it uses `sxhash' for the hash function).
o it doesn't store the keys, but rather, all array-table values must
inherit from a class called `array-table-entry', which has a generic
function on it, called `id', which somehow computes the key.
o it does not use adjustable arrays, b/c I couldn't figure out how to
rehash the array in-place, and didn't want to store another "flag"
slot in the `array-table-entry' class. I'd rather just make a new
array, rehash, and let the GCer do its thing.

If you provide an `id' method for the entries, then it's pretty good.
One thing, though: unlike built-in hash-tables, a hash value of nil
implies that, for this table, there's really nothing there.
i.e. the `get-array-table' doesn't return multiple-values. I guess b/c
I don't ever plan to store a `nil' for a given key. Doing that is how
my (analogous) remhash works.
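
Roughly, the core of such a table might look like the sketch below (an
illustration of the idea only, not the actual library; it assumes an ID
generic function on the stored entries, SXHASH as the hash function, one
entry per bucket, and leaves out resizing and the collision policy):

(defgeneric id (entry)
  (:documentation "Recompute the string key ENTRY was filed under."))

(defun make-array-table (size)
  (make-array size :initial-element nil))

(defun get-array-table (key table)
  ;; The key is never stored; the bucket holds only the value, and the
  ;; ID method is used to check that we found the right one.
  (let ((entry (svref table (mod (sxhash key) (length table)))))
    (when (and entry (string= key (id entry)))
      entry)))

(defun (setf get-array-table) (entry key table)
  (setf (svref table (mod (sxhash key) (length table))) entry))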

dave

Steve Gonedes

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to

David Bakhash <ca...@mit.edu> writes:

< o it does not use adjustable arrays, b/c I couldn't figure out how to
< rehash the array in-place, and didn't want to store another "flag"
< slot in the `array-table-entry' class. I'd rather just make a new
< array, rehash, and let the GCer do its thing.

Have you thought about using bit-vectors as flags? I've found them to
be very well suited for things like this; _much_ better than using
small integers. I wrote another goofy macro that works like a very
simplified defstruct (without `include'). If you're interested I'd
post it. Bit vectors are just great time savers but very difficult to
use without some kind of abstraction - (sbit flags 2) just isn't very
informative :)
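
Something along these lines, for instance (the flag names are made up);
the bit vector stays as compact as before, but callers never see a raw
index:

(defun make-flags ()
  (make-array 3 :element-type 'bit :initial-element 0))

(defmacro define-flag (name index)
  ;; Defines a reader and a SETF function over bit INDEX of a flag vector.
  `(progn
     (defun ,name (flags) (= 1 (sbit flags ,index)))
     (defun (setf ,name) (value flags)
       (setf (sbit flags ,index) (if value 1 0))
       value)))

(define-flag dirty-p 0)
(define-flag rehash-pending-p 1)
(define-flag frozen-p 2)

;; (let ((flags (make-flags)))
;;   (setf (rehash-pending-p flags) t)
;;   (rehash-pending-p flags))    => T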


Marco Antoniotti

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to

David Bakhash <ca...@mit.edu> writes:

> Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:
>
> > Since the thing *is* interesting, and since I really like hash-tables,
> > but see many limitations in the way they are specified in CL, I'd like
> > to see a little more of the specs of your problem.
> >
> > First of all, are your 50000 strings distinct? What kind of structure
> > do they share? If you do not want to stro them as keys in the
> > hash-table, where do you keep them? On a file?
>
> A lot of them do share structure. It's really quite nice. I wrote a
> library called `array-table', and it does everything that gethash does
> (except that it's called `get-array-table'). It has the following
> properties:
>
> o it resizes automatically (though I don't know where the optimal place
> to put the rehash hook. I should research is. I either have to
> examine statistics on a `(setf get-array-table)' or (another idea I
> have) is whenever there's a collision on a `get-array-table'. I
> figure it's better to do it on the put than get, because there'll be
> lots more put's then get's.)
> o it has a little prime-number generator that helps the table grow to a
> nice size (some ratio bigger than it was, user-specified), and then
> takes the next prime. (it uses `sxhash' for the hash function).
> o it doesn't store the keys, but rather, all array-table values must
> inherit from a class called `array-table-entry', which has a generic
> function on it, called `id', which somehow computes the key.

> o it does not use adjustable arrays, b/c I couldn't figure out how to
> rehash the array in-place, and didn't want to store another "flag"
> slot in the `array-table-entry' class. I'd rather just make a new
> array, rehash, and let the GCer do its thing.
>

> If you provide an `id' method for the entries, then it's pretty good.
> One thing, though: unlike built-in hash-tables, a hash value of nil
> means implies that, for this table, there's really nothing there.
> i.e. the `get-array-table' doesn't return multiple-values. I guess b/c
> I don't ever plan to store a `nil' for a given key. Doing that is how
> my (analogous) remhash works.
>


This is all very interesting, but still, have you considered using a
more standard data structure like a string TRIE to keep and access
your data? Again, if you keep your strings in memory anyway, the
space-time tradeoffs should balance very well.

Marc Battyani

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to

Marco Antoniotti wrote in message ...

>This is all very interesting, but still, have you considered using a
>more standard data structure like a string TRIE to keep and access
>your data? Again, if you keep your strings in memory anyway, the
>space-time tradeoffs should balance very well.


There are also the "Ternary Search Trees" a TRIE variation.
They can be faster than hashtables.
More on these here: http://www.cs.princeton.edu/~rs/strings/index.html

Cheers,

Marc Battyani

David Bakhash

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
"Marc Battyani" <Marc_B...@csi.com> writes:

> There are also the "Ternary Search Trees" a TRIE variation.
> They can be faster than hashtables.
> More on these here: http://www.cs.princeton.edu/~rs/strings/index.html

this sounds really interesting. I must admit that I had not known of
TRIEs, and still know nothing.

I highly appreciate this information, though I must admit that I'm
skeptical about how it can _possibly_ be faster than a hashtable.

dave

David Bakhash

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
ly...@cc.gatech.edu (Lyman S. Taylor) writes:

> In article <wkzp7i2...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
> ....
> >
> >Well, what I was trying to say was that I tended to use `with-slots' a
> >lot. I figured that since CLOS doesn't really allow you to hide data,
> >the way _I'd_ go about it was to make move away from accessing data
> >directly, and rather to use the accessors as an interface. That way I
> >wouldn't hang myself with code re-writing if something changed. Here.
> >Take a look:
>
> CLOS allows/enables data hiding. That is one reason why _all_ of the
> access to a class's data/state is through a method. This is
> distinct from hiding "names". Packages do namespace enforcement/management.
> CLOS doesn't try to do what is already in Common Lisp. If you don't
> want a "user" of a class to know the slotnames then encode the
> class and methods in a package. Export what they are suppose to use.
> [ Anyone who ignores you wishes can "cheat" anyway. Even in
> C++. It is simply the degree of effort involved.... ]

This, somehow, I had overlooked. I cannot thank you enough for
clearing up this issue. it was bothering me. I never thought about
using the package system + CLOS for hiding data. I guess I'll just
_NOT_ export the slot names, and be done with that. That makes perfect
sense, though doing it might open up some issues.

Thanks for the post. This post basically restored my faith in CLOS.

thanks,
dave

Marco Antoniotti

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to

David Bakhash <ca...@mit.edu> writes:

A hash table gives you an 'expected' constant time access to the
stored object. Not a guarantee. You may hit the linear time worst case
every once in a while (hence it is necessary to specify the frequency
of this 'every once in a while'). A TRIE (which is essentially a tree
based data structure) gives you a worst-case access time bounded by
the length of the key. Moreover, since a trie basically encodes the
keys, you may also get the memory savings you were after. Finally, a
TRIE already keeps your strings sorted, which is not something you get
with a hash table.
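
For the curious, here is a bare-bones trie sketch (alist children,
nothing clever); the point is that both hits and misses cost at most one
node per character of the key, and a miss can stop early:

(defstruct (trie-node (:conc-name tn-))
  value
  terminal-p
  (children nil))     ; alist of (character . trie-node)

(defun trie-insert (root string value)
  (let ((node root))
    (loop for char across string
          for child = (cdr (assoc char (tn-children node)))
          do (unless child
               (setf child (make-trie-node))
               (push (cons char child) (tn-children node)))
             (setf node child))
    (setf (tn-value node) value
          (tn-terminal-p node) t)
    value))

(defun trie-lookup (root string)
  ;; Returns VALUE and T on a hit, NIL and NIL otherwise.
  (let ((node root))
    (loop for char across string
          do (setf node (cdr (assoc char (tn-children node))))
             (when (null node)             ; early exit on a miss
               (return-from trie-lookup (values nil nil))))
    (values (tn-value node) (tn-terminal-p node))))

;; (let ((root (make-trie-node)))
;;   (trie-insert root "smith" 1)
;;   (trie-lookup root "smith"))    => 1, T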

Barry Margolin

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
In article <wkr9ss1...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
>
>This, somehow, I had overlooked. I cannot thank you enough for
>clearing up this issue. it was bothering me. I never thought about
>using the package system + CLOS for hiding data. I guess I'll just
>_NOT_ export the slot names, and be done with that. That makes perfect
>sense, though doing it might open up some issues.

Be careful, though. I've seen some people try to use the package system to
implement the same granularity of data hiding as they can get with some
other OO languages' public/private distinctions. It may be possible to do
this, but I would recommend against trying. The package system is tricky,
even for expert CL programmers (it's no coincidence that the package system
is described in Chapter 11 of both CltL editions and the ANSI spec). It's
best used for coarse levels of sharing -- typically an application or
library would define one or two packages. You'll likely go crazy if you
try to implement per-class packages.

Marc Battyani

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to

David Bakhash wrote in message ...

>"Marc Battyani" <Marc_B...@csi.com> writes:
>
>> There are also the "Ternary Search Trees" a TRIE variation.
>> They can be faster than hashtables.
>> More on these here: http://www.cs.princeton.edu/~rs/strings/index.html
>
>this sounds really interesting. I must admit that I had not known of
>TRIE, and still know nothing.
>
>I highly appreciate this information, though I must admit that I'm
>skeptical at how it can _possibley_ be faster than a hashtable.


It's because to compute a hash value you have to process all the characters
of your string.

In a hash table you:
1. compute the hash value in O(length(key))
2. compare the found item to the key, also in O(length(key))
3. compare more keys if there are collisions

On the other hand, TRIEs and their lookalikes process the key character by
character, so if the key is not there you can know it faster.

All this depends on the statistical distribution of your keys.
You can imagine a data set with 100-character keys where the first 5 are
usually enough to differentiate them.
In that case it's likely that hash tables will be slower.

Cheers,
Marc Battyani

Barry Margolin

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
In article <7F6861BC8A65CC8B.F0645AE6...@library-proxy.airnews.net>,

Marc Battyani <Marc_B...@csi.com> wrote:
>You can imagine a data set with 100 characters keys on where the first 5 are
>usually enough to differentiate them.
>In that case It's likely that hashtables will be slower.

If you have control over the hashing function, you could simply hash on
only the first 5 characters. Items where the difference is only in the
remaining characters would collide, and be differentiated by the collision
resolution routine.
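
Standard CL hash tables don't portably let you supply the hash function,
so a sketch of this has to be hand-rolled (names made up); keys that
agree in their first 5 characters share a bucket and are told apart by a
full STRING= within it:

(defun prefix-hash (key &optional (prefix-length 5))
  (sxhash (subseq key 0 (min prefix-length (length key)))))

(defun prefix-get (key buckets)
  (cdr (assoc key (svref buckets (mod (prefix-hash key) (length buckets)))
              :test #'string=)))

(defun (setf prefix-get) (value key buckets)
  ;; Each bucket is an alist of (key . value); STRING= resolves collisions.
  (let* ((index (mod (prefix-hash key) (length buckets)))
         (pair (assoc key (svref buckets index) :test #'string=)))
    (if pair
        (setf (cdr pair) value)
        (push (cons key value) (svref buckets index)))
    value))

;; (defparameter *table* (make-array 101 :initial-element nil))
;; (setf (prefix-get "margolin" *table*) 42)
;; (prefix-get "margolin" *table*)    => 42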

However, tries and their ilk don't require you to know where the cutoff is
a priori. They'll get faster all by themselves depending on the actual
distribution of the data.

Erik Naggum

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
* "Marc Battyani" <Marc_B...@csi.com>

| It's because to compute a hash value you have to process all the
| characters of your string.
:
| All this depends on the statistical repartition of your keys. You can

| imagine a data set with 100 characters keys on where the first 5 are
| usually enough to differentiate them. In that case It's likely that
| hashtables will be slower.

in all likelihood, a hashing function is able to take advantage of the
same property of the input, but then pays somewhat more for collisions.

the real difference, as I see it, is that tries will be able to take
advantage of _any_ smallest set of differences, so if you have a bunch of
people names starting with "Smith", the trie will effectively just move
past the first 5 and use the next 5 characters, instead. a hash function
would have to be excessively intelligent to adapt the same way.

Barry Margolin

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
In article <31256821...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> in all likelihood, a hashing function is able to take advantage of the
> same property of the input, but then pays somewhat more for collisions.
>
> the real difference, as I see it, is that tries will be able to take
> advantage of _any_ smallest set of differences, so if you have a bunch of
> people names starting with "Smith", the trie will effectively just move
> past the first 5 and use the next 5 characters, instead. a hash function
> would have to be excessively intelligent to adapt the same way.

Well, everyone can stop worrying about the Y2K problem. Erik and I just
posted almost the same comment, and I think agreement between us is one of
the signs of the Apocalypse. :-)

Lyman S. Taylor

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
In article <AXLo2.319$oD6....@burlma1-snr1.gtei.net>,

Barry Margolin <bar...@bbnplanet.com> wrote:
>In article <wkr9ss1...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
....

>Be careful, though. I've seen some people try to use the package system to
>implement the same granularity of data hiding as they can get with some
>other OO languages' public/private distinctions.

Yes. I'd recommend more of a Modula-3 type of approach with an
interface file which did the namespace definition and the import/export
business and a definition file which did the "defining". [ Even though
keeping the interface and definition synchronized involves lots of
manual grunt work.]

Long chains of "public" hierarchies can all be in the same file since
they, for the most part, share a namespace. Even "private" hierarchies, as
long as the author internally respects the abstraction boundaries.
You do have to be more "disciplined" when using CLOS+packages than when
depending upon the C++ compiler as an abstraction boundary "enforcer".

This gives you "libraries" (or a framework) of classes.

Probably a closer analogy would be to Java (and its packages) than to
C++ (namespaces). You can develop quite a maze of namespaces
with namespace control commingled with the object class definitions quite
a bit more easily with public/protected/private than with namespace
and class definition mechanisms separated.

> The package system is tricky,
>even for expert CL programmers

Unfortunately this is true. :-( Not so much that the mechanisms' syntax
is tricky, but it is the "landmines" you may stumble across if certain
aspects of the language interact in nonintuitive ways. And the fact
that the package system changed significantly, so there are legacy issues
to deal with.

The "defsystem" , "modules" , and "package" solution is one of those
things CL doesn't quite do in an "inspired" fashion. You can make it work.
However, it usually seems more painful than it "should be".

I think some people get "bit" by some "package problem" and then just
shy away from it in the future. And I'm sure there are also more than a
few for whom, in their standard "modus operandi", it all works well.

Erik Naggum

unread,
Jan 18, 1999, 3:00:00 AM1/18/99
to
* Barry Margolin <bar...@bbnplanet.com>

| Well, everyone can stop worrying about the Y2K problem. Erik and I just
| posted almost the same comment, and I think agreement between us is one
| of the signs of the Apocalypse. :-)

problem is, we might have caused the OSCE incident in Kosovo. the sum
total of hostilities in the world has to remain constant, so what was the
world to do? things just _had_ to get ugly somewhere else, instead.

Marco Antoniotti

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to

Erik Naggum <er...@naggum.no> writes:

> * Barry Margolin <bar...@bbnplanet.com>
> | Well, everyone can stop worrying about the Y2K problem. Erik and I just
> | posted almost the same comment, and I think agreement between us is one
> | of the signs of the Apocalypse. :-)
>
> problem is, we might have caused the OSCE incident in Kosovo. the sum
> total of hostilities in the world has to remain constant, so what was the
> world to do? things just _had_ to get ugly somewhere else, instead.
>

On top of that I just got (again) the Internet Joke stating that

(reduce #'+ (beastly-encode "BILL GATES III")) => 666

and that you can encode my home telephone number in Rome as

666 666 666

and you can start to think that every hash function eventually
converges to the Number of The Beast.

Rob Warnock

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to
Erik Naggum <er...@naggum.no> wrote:
+---------------

| the real difference, as I see it, is that tries will be able to take
| advantage of _any_ smallest set of differences, so if you have a bunch of
| people names starting with "Smith", the trie will effectively just move
| past the first 5 and use the next 5 characters, instead.
+---------------

Exactly. Plus, if you're doing typical symbol-table or other plaintext stuff,
you can *significantly* cut the amount of storage used with tries by various
forms of "compression" on each node (much like the way most C compilers
generate different styles of code for "switch" statements depending on
the pattern of "case" values), tagging each node with its "style" or kind:

- As Erik notes, you can collapse a run of fanout==1 nodes into a single
"match"-only node consisting of a string and one further node pointer.

- If the degree of fanout is >1 but still very low, the node might be
encoded as a small string that's searched sequentially, together with
a vector of the same length (indexed by the searched-for character's
position in the string). While this is a good deal slower than a pure
"radix-sort" style index, it's also *much* smaller.

- With higher fanout, a node might contain "min" & "max" character codes,
and a vector as large as "max - min + 1", indexed by "this_char - min".

- With still higher fanout, revert to a full max-char-code size vector.
In most practical applications, you'll only need a few nodes like this
(maybe even only the root node).
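
One way those node kinds might look as Lisp data (a rough rendering of
the list above, not the original code):

(defstruct match-node    ; collapsed run of fanout==1 nodes
  string next)
(defstruct sparse-node   ; fanout > 1 but low: linear search in a small string
  chars children)        ; CHARS is a string, CHILDREN a parallel vector
(defstruct range-node    ; medium fanout: dense vector over [min, max]
  min children)
(defstruct full-node     ; high fanout: vector indexed by the full char code
  children)

(defun node-child (node char)
  ;; Dispatch on the node's "style".  MATCH-NODEs are handled separately,
  ;; since they consume a whole run of characters at once.
  (etypecase node
    (sparse-node (let ((pos (position char (sparse-node-chars node))))
                   (and pos (svref (sparse-node-children node) pos))))
    (range-node  (let ((i (- (char-code char)
                             (char-code (range-node-min node)))))
                   (and (<= 0 i (1- (length (range-node-children node))))
                        (svref (range-node-children node) i))))
    (full-node   (svref (full-node-children node) (char-code char)))))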

+---------------
| a hash function would have to be excessively intelligent to adapt
| the same way.
+---------------

That reminds me of a trick that can help hash tables: If you use a really
good hash, one for which nearly every bit position contains nearly a full
bit of information (a CRC-32 is such a hash), then if you store the *full*
hash value with the entry, when you want grow the hash table because the
per-bucket chains get too long, you don't have to re-hash the keys -- you
can just use more bits of the original hash value.
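
In sketch form (hypothetical names, chained buckets, SXHASH standing in
for the CRC-32):

(defstruct entry key value hash)   ; HASH is the full hash code, kept forever

(defun table-put (buckets key value)
  (let* ((hash (sxhash key))
         (index (mod hash (length buckets))))
    (push (make-entry :key key :value value :hash hash)
          (svref buckets index))))

(defun table-grow (buckets new-size)
  ;; Redistribute entries into a bigger vector; no key is ever re-hashed,
  ;; only the stored hash code is reduced MOD the new size.
  (let ((new (make-array new-size :initial-element nil)))
    (dotimes (i (length buckets) new)
      (dolist (entry (svref buckets i))
        (push entry (svref new (mod (entry-hash entry) new-size)))))))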

Finally, there's an interesting hybrid of hash tables and tries -- a
multi-level hash table that uses different bits of the hash at each level
of the tree/trie (that is, it's a binary tree if you only use one bit of
the hash per level). When you hit a leaf node you do a comparison on
the full hash first before wasting time with a full string compare.
If the hashes are the same but the strings differ, then the strings are
probably *very* different (assuming a "good" hash, such as a CRC), so
hanging a "compressed trie" (as above) off that node should be a cheap
way to resolve the conflict (that is, the strings will very likely differ
in the first character!).
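
A rough lookup-only sketch of such a multi-level table, using one hash bit
per level; the names are illustrative, keys are assumed to be strings, and
insertion plus the conflict-resolving trie are omitted:

(defstruct hnode left right)        ; interior node: branch on one hash bit
(defstruct hleaf hash key value)    ; leaf: keeps the full hash and the key

(defun hlookup (node hash key &optional (bit 0))
  (etypecase node
    (null  nil)                                       ; nothing stored here
    (hleaf (when (and (= hash (hleaf-hash node))      ; compare hashes first,
                      (string= key (hleaf-key node))) ; strings only if needed
             (hleaf-value node)))
    (hnode (hlookup (if (logbitp bit hash)
                        (hnode-right node)
                        (hnode-left node))
                    hash key (1+ bit)))))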

[Hmmm... It would be interesting to compare these three methods for
space & speed to implement "intern"...]


-Rob

-----
Rob Warnock, 8L-855 rp...@sgi.com
Applied Networking http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
2011 N. Shoreline Blvd. FAX: 650-964-0811
Mountain View, CA 94043 PP-ASEL-IA

Marco Antoniotti

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to

rp...@rigden.engr.sgi.com (Rob Warnock) writes:

> That reminds me of a trick that can help hash tables: If you use a really
> good hash, one for which nearly every bit position contains nearly a full
> bit of information (a CRC-32 is such a hash), then if you store the *full*
> hash value with the entry, when you want grow the hash table because the
> per-bucket chains get too long, you don't have to re-hash the keys -- you
> can just use more bits of the original hash value.

It is not really a 'trick'; it is the basis for some fancy indexing
schemes used in databases as an alternative to B-tree-based data structures.

>
> Finally, there's an interesting hybrid of hash tables and tries -- a
> multi-level hash table that uses different bits of the hash at each level
> of the tree/trie (that is, it's a binary tree if you only use one bit of
> the hash per level). When you hit a leaf node you do a comparison on
> the full hash first before wasting time with a full string compare.
> If the hashes are the same but the strings differ, then the strings are
> probably *very* different (assuming a "good" hash, such as a CRC), so
> hanging a "compressed trie" (as above) off that node should be a cheap
> way to resolve the conflict (that is, the strings will very likely differ
> in the first character!).
>
> [Hmmm... It would be interesting to compare these three methods for
> space & speed to implement "intern"...]
>

Very good point indeed.

Kenneth P. Turvey

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to
On Mon, 18 Jan 1999 19:28:32 GMT, Barry Margolin <bar...@bbnplanet.com> wrote:
>In article <wkr9ss1...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:
>>
>Be careful, though. I've seen some people try to use the package system to
>implement the same granularity of data hiding as they can get with some
>other OO languages' public/private distinctions. It may be possible to do
>this, but I would recommend against trying. The package system is tricky,
>even for expert CL programmers (it's no coincidence that the package system
>is described in Chapter 11 of both CltL editions and the ANSI spec). It's
>best used for coarse levels of sharing -- typically an application or
>library would define one or two packages. You'll likely go crazy if you
>try to implement per-class packages.

This is exactly what I'm doing in an application I'm currently working
on. Why would per-class packages be so difficult? They seem to
implement the exact behavior I desire without any great complexity.

Could you please expand on the difficulties that one might encounter
using this method of data encapsulation?

Thanks,

--
Kenneth P. Turvey <ktu...@SprocketShop.com>

Anyone who challenges the prevailing orthodoxy finds himself silenced
with surprising effectiveness. A genuinely unfashionable opinion is
almost never given a fair hearing. -- George Orwell

Gareth McCaughan

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to
Barry Margolin wrote:

[I wrote, replying to Erik, replying to Barry]


>> If he does much CLOS, he'll get so fed up of calling *everything*
>> virtual that he'll stop bothering. Then there'll be no problem.
>> (Until he goes back to C++ and starts asking about generic functions
>> in C++. (-: )
>

> He wasn't calling everything virtual, he was making a point in a discussion
> about programming methodologies. The fact that CLOS methods have the
> properties associated with virtual member functions in another language I
> won't name was germane to his point. He expressed it very clearly, IMHO.

He said "all methods in CLOS are virtual". I have no quarrel with
that statement; I agree that it's a sensible thing for someone
whose background is in C++ to use; I have (so far as I can see)
no disagreement with you here. (I also understand why Erik is
bothered by people using foreign terminology to describe Lisp,
though I think he's wrong. That's why I made the observation that
anyone who spends a lot of time doing CLOS will get used to all
methods being "virtual" in CLOS, and therefore stop using the word
except when trying to relate CLOS and C++; for which reason, I
don't think his concern that you were encouraging someone in a
bad habit was justified.)

--
Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics,
gj...@dpmms.cam.ac.uk Cambridge University, England.

Lieven Marchand

unread,
Jan 19, 1999, 3:00:00 AM1/19/99
to
ly...@cc.gatech.edu (Lyman S. Taylor) writes:

> In article <AXLo2.319$oD6....@burlma1-snr1.gtei.net>,


> Barry Margolin <bar...@bbnplanet.com> wrote:
> >In article <wkr9ss1...@mit.edu>, David Bakhash <ca...@mit.edu> wrote:

> ....


> >Be careful, though. I've seen some people try to use the package system to
> >implement the same granularity of data hiding as they can get with some
> >other OO languages' public/private distinctions.
>

> Yes. I'd recommend more of Modula3 type of approach with an
> interface file which did the namespace definition and the import/export
> business and a definition file which did the "defining". [ Even though
> keeping the interface and definition synchronized involves lots of
> manual grunt work.]
>

Another alternative I've seen is having an interface file just defining
the package and a macro DEFEXPORT to define exported functions (and
analogous macros for exported macros, constants, etc.). This solves the
synchronisation problem.
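
A minimal sketch of what such a DEFEXPORT might expand into (the exact
signature here is a guess based on the description above, not the original
macro):

(defmacro defexport (name lambda-list &body body)
  ;; Export NAME from the current package and then define it, so the
  ;; export list can never drift out of sync with the definitions.
  `(progn
     (eval-when (:compile-toplevel :load-toplevel :execute)
       (export ',name))
     (defun ,name ,lambda-list ,@body)))

;; (defexport frob (x) (* x 2))   ; defines and exports FROB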

Does anyone have experience with this style?

I find it hard to understand what exactly about the CL package mechanism
hinders people from creating many small (fine-grained) name spaces. Perhaps
it is the fact that, for easy interactive use, they would all tend to get
USEd in COMMON-LISP-USER anyway. Garnet seems to be the exception, as its
recommended style advises against USE-ing the package and in favour of an
explicit package:symbol style.

--
Lieven Marchand <m...@bewoner.dma.be>
------------------------------------------------------------------------------
Few people have a talent for constructive laziness. -- Lazarus Long

Paul Rudin

unread,
Jan 20, 1999, 3:00:00 AM1/20/99
to
Barry Margolin <bar...@bbnplanet.com> writes:

[]


> The package system is tricky,
> even for expert CL programmers (it's no coincidence that the package system
> is described in Chapter 11 of both CltL editions and the ANSI spec). It's
> best used for coarse levels of sharing -- typically an application or
> library would define one or two packages.

Choosing an appropriate size for packages, especially in large
applications, is something that is difficult to call and, I guess,
there's no "right" answer.

What do people tend to do? How large a logical unit should a package
be? How large in terms of lines of code or exported symbols should a
package be? What other measures might be useful when considering the
size of packages? Are mutually dependent (either directly or
indirectly) packages always a Bad Thing? Are there any style guides
that address these issues in detail?

TIA.

Barry Margolin

unread,
Jan 21, 1999, 3:00:00 AM1/21/99
to
In article <787nse$nih$1...@nickel.uunet.be>,

Lieven Marchand <m...@bewoner.dma.be> wrote:
>I find it hard to understand what exactly about the CL package mechanism
>hinders people to create many small (fine grained) name spaces. Perhaps
>the fact that for easy interactive use they all would tend to get used
>in COMMON-LISP-USER anyway. Garnet seems to be the exception as their
>recommended style advises against using the package and for an explicit
>package:symbol style.

Unlike most other things in Lisp, it can be tricky to make package changes
incrementally. Things work best if all the package settings (imports,
exports, shadowing, etc.) are made before any other uses of the package,
because of the way they pervade. If you have lots of little packages, the
web of relationships gets more complicated.

Lieven Marchand

unread,
Jan 22, 1999, 3:00:00 AM1/22/99
to
Barry Margolin <bar...@bbnplanet.com> writes:

> In article <787nse$nih$1...@nickel.uunet.be>,
> Lieven Marchand <m...@bewoner.dma.be> wrote:
> >I find it hard to understand what exactly about the CL package mechanism
> >hinders people to create many small (fine grained) name spaces.
>

> Unlike most other things in Lisp, it can be tricky to make package changes
> incrementally. Things work best if all the package settings (imports,
> exports, shadowing, etc.) are made before any other uses of the package,
> because of the way they pervade. If you have lots of little packages, the
> web of relationships gets more complicated.
>

Good point. The languages in which I most like the module/package
system, Ada and Modula-3, are both statically compiled and come with a
build system that figures out the package dependencies automagically,
which takes care of the web of relationships. In such a language,
small packages are beneficial to avoid long compilation times for
localized changes, but in Lisp you already have a better way for this.

Graham Hughes

unread,
Jan 23, 1999, 3:00:00 AM1/23/99
to
>>>>> "David" == David Bakhash <ca...@mit.edu> writes:

David> this sounds really interesting. I must admit that I had
David> not known of TRIE, and still know nothing.

David> I highly appreciate this information, though I must admit
David> that I'm skeptical at how it can _possibley_ be faster than
David> a hashtable.

I'll use the regular trie here, because I know it better. Roughly
speaking:

Searching a trie and coming up with an element if one exists is
roughly an O(n) operation, with n being the length of the indexing
string. So (trie-ref trie "foo") takes three steps. But if the
element does not exist (suppose there are no strings starting with #\f
in the trie), then it can take less than n steps to reject the
reference.

A hashing function on strings takes some number of characters and
combines them to come up with an index into an array. This is an O(n)
operation with n being the length of the indexing string (although
some implementations arbitrarily cut the hashing function off at say
the 10th character). This hashing function must be performed
regardless of success or failure, and so a hashing function always
requires n steps.

In practice, this isn't really very interesting, and tries may or may
not have poor cache behaviour etc., so a simple version just says hash
tables and tries are about as fast unless you deal with ridiculously
long strings with the same, say, first 10 characters as a prefix.

Tries have another advantage; they grow better than hash tables
(lookup *always* takes about O(n) in length of string, regardless of
how big the trie is; closed hash tables often are worse than O(n) in
the length of the string for large tables and open hash tables often
need resizing, so they're only O(n) in the length of the string
amortized), but they're also nice from a functional point of view
because you can add nodes to a trie and retain the old one.

It's an interesting data structure to be sure, but not particularly
well known, when it is better than a hash table it is often not much
better, implementing them without wasting a great deal of space can be
very annoying, and various other caveats. It is very useful for a
programmer in a purely functional language like Haskell because
Haskell doesn't *get* hash tables (copying the array around to update
one element is ridiculous but necessary), and I can see the use for a
symbol table in a compiler, but for the most part it's not really
worth bothering with, like splay trees[0].

[0] Splay trees are a famous amortized data structure that have
advantages but are often not *so* much better than e.g. red black
trees that it's worth the effort to implement them.
--
Graham Hughes <ghu...@cs.ucsb.edu>
PGP Fingerprint: 36 15 AD 83 6D 2F D8 DE EC 87 86 8A A2 79 E7 E6


David Bakhash

unread,
Jan 23, 1999, 3:00:00 AM1/23/99
to
hey,

It turns out that these data structures -- tries -- are exactly what I
was looking for. I guess it's not important to explain why, because
what I'm doing is so specific that it's not too interesting. But since
you guys have been so helpful, I may as well try to thank you by
describing what I was doing.

I am making a huge string table. There are times when I'll look up
the string "the" and, right afterwards, I might look up the string
"them". In this case, it might help to retain the node that the "the"
got me to, and just keep going from there. Either way, the point is
that there's a lot of shared data, and so the potential for these
tries in my code is great. I was still hoping that someone would lend
me some code that implements tries (in CL, of course). but otherwise,
I'll simply try to find out where the algorithms are documented and
redo the work.

but I have, very successfully, implemented keyless hash-tables (I call
them `array-tables'), and they perform comparably to ACL's built-in
hash-tables, and (in my opinion) are extremely easily integrated into
programs which use hash-tables. I implemented a macro called
`array-table-loop', which lets you do things like:

(array-table-loop for entry being the array-table-values of atable
[do|maximize|..etc..] ...[etc.])

so you can still get your hash-table iteration working. CLOS made
implementing this very elegant, and it seems to run pretty fast. Oh, and
one nice thing about the keyless array-tables: they aren't limited in
the :test that they can use. So, for example, you can use
#'string-equal instead of #'equalp. I think this is a big plus, and
I'm still wondering why ANSI CL limits the :test for hash-tables to
(member #'eql #'eq #'equal #'equalp). I guess it's because
hash-tables in CL can store _any_ type of object, and #'string-equal,
#'char=, and friends don't operate on just any Lisp object.
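
For reference, a minimal sketch of a keyless table along these lines; the
names below are illustrative only (not the actual array-table code), and
collision handling is left out entirely:

(defclass array-table ()
  ((slots :initarg :slots :accessor array-table-slots)
   (hash  :initarg :hash  :accessor array-table-hash)))

(defun make-array-table (size &key (hash #'sxhash))
  (make-instance 'array-table
                 :slots (make-array size :initial-element nil)
                 :hash hash))

(defun array-table-index (key table)
  (mod (funcall (array-table-hash table) key)
       (length (array-table-slots table))))

(defun array-table-value (key table)
  ;; Only the value is stored in the slot KEY hashes to; no key is kept.
  (svref (array-table-slots table) (array-table-index key table)))

(defun (setf array-table-value) (value key table)
  (setf (svref (array-table-slots table) (array-table-index key table))
        value))

;; e.g. a STRING-EQUAL-style table just needs a case-insensitive hash:
;; (make-array-table 1021 :hash (lambda (s) (sxhash (string-upcase s))))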

dave

Hrvoje Niksic

unread,
Jan 23, 1999, 3:00:00 AM1/23/99
to
David Bakhash <ca...@engc.bu.edu> writes:

> I think this is a big plus, and I'm still wondering why ANSI CL
> limits the :test for hash-tables to (member #'eql #'eq #'equal
> #'equalp). I guess it's because hash-tables in CL can store _any_
> type of object, and #'string-equal,
> #'char=, and friends don't operate on just any Lisp object.

I don't think this is the real reason the test functions are limited
to those you enumerated.

The choice of test function is limited because the test function must
always correspond to a hash function (specifically, for any two keys A
and B, it must be that `A equals B' implies `hash(A) == hash(B)'; the
reverse needn't be true.)

So if the user specifies a test function of his own, he should also
specify a hash function of his own, and that hash function should be
able to cope with all the Lisp types that could be a part of the key.
Implementing such a hash function without delving into object
allocation internals does not sound feasible.
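
Concretely (illustrative names only): if the test were STRING-EQUAL, the
matching hash function must ignore case, so that keys which are
STRING-EQUAL always land in the same bucket.

(defun string-equal-hash (s)
  (sxhash (string-upcase s)))

;; (string-equal "Foo" "FOO")                                => T
;; (= (string-equal-hash "Foo") (string-equal-hash "FOO"))   => T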

Lyman S. Taylor

unread,
Jan 23, 1999, 3:00:00 AM1/23/99
to
In article <cxjk8yd...@engc.bu.edu>,
David Bakhash <ca...@engc.bu.edu> wrote:
...

>I'm still wondering why ANSI CL limits the :test for hash-tables to
>(member #'eql #'eq #'equal #'equalp). I guess it's because

If your key is object identity, EQ (or to a lesser extent EQL),
think about what happens when you do a garbage collection. Namely,
you have a non-conservative collector that moves objects.

When those objects "move", the addresses their hash codes were derived
from change. The GC may update all references to the new locations, but
the hash table was indirectly dependent upon the old ones: your object
will likely hash to a different spot now.

That's OK: you can simply rehash the whole table before the next lookup,
or follow some other amortized resolution scheme.

EQUAL and EQUALP should be unaffected by this; they don't care because
they're based on a different sense of identity. So you need not only
a test but also something that "says" what sort of policy you follow
after a GC. I think the Common Lisp designers chose to let only the
implementors worry about these issues.

P.S. This may mean that EQ is probably _not_ a good :TEST argument
to your new construct. You just may not have been bitten by
it yet.
[ If you exhaustively search the table until all keys are
examined, then fine. However, you can't really say that
it has O(1) lookup characteristics. ]

You may have used SXHASH in all cases regardless of the resolution
predicate. That negates the problem of starting to "look" in a
different place. I'm not sure whether the designers wanted to constrain
the implementors to this, or whether it has hash-code distribution
problems where there are more collisions than "necessary".
[ For instance, I think I posted an example where an implementation
used the size of a vector as its SXHASH code, regardless of
contents. A hash table of same-sized vectors would have horrific
(OK, linear) performance in this case. ]

Steve Gonedes

unread,
Jan 24, 1999, 3:00:00 AM1/24/99
to

David Bakhash <ca...@engc.bu.edu> writes:

< hey,
<
< It turns out that these data structures -- tries -- are exactly what I
< was looking for. I guess it's not important to explain why, because
< what I'm doing is so specfic that it's not too interesting. But since
< you guys have been so helpful, I may as well try to thank you by
< describing what I was doing.
<
< I am making a huge string table. There are times when I'll look up
< the string "the" and, right afterwards, I might look up the string
< "them". In this case, it might help to retain the node that the "the"
< got me to, and just keep going from there. Either way, the point is
< that there's a lot of shared data, and so the potential for these
< tries in my code is great. I was still hoping that someone would lend
< me some code that implements tries (in CL, of course). but otherwise,
< I'll simply try to find out where the algorithms are documented and
< redo the work.

I wrote some tries just recently... I think I got the main idea from
AICP - I've read about them in many algorithm books, but they usually
make it seem much more complicated than it needs to be. These aren't
too `fancy', but that's what I like about them. Ideally you would be
able to specify the equality function and other such-nots...

;;; Tries

(defstruct (trie)
  (value nil)                     ; value stored at this node, if any
  (arcs ())                       ; alist of (character . child trie)
  parent)

(defun follow-trie-no-extend (key trie)
  "Return the child of TRIE along the arc KEY, or NIL if there is none."
  (cdr (assoc key (trie-arcs trie))))

(defun follow-trie-extending (key trie)
  "Return the child of TRIE along the arc KEY, creating it if necessary."
  (let ((arc (assoc key (trie-arcs trie))))
    (if arc
        (cdr arc)
        (let ((newtrie (make-trie :parent trie)))
          (push (cons key newtrie) (trie-arcs trie))
          newtrie))))

(defun find-trie-extending (key trie)
  "Return the node for the string KEY, creating any missing nodes."
  (dotimes (i (length key) trie)
    (setq trie (follow-trie-extending (aref key i) trie))))

(defun find-trie (key trie)
  "Return the node for the string KEY, or NIL if no such path exists."
  (loop for code across key
        while trie
        do (setq trie (follow-trie-no-extend code trie))
        finally (return trie)))

(defun put-trie (key trie value)
  (setf (trie-value (find-trie-extending key trie)) value))

(defun get-trie (key trie)
  (let ((found (find-trie key trie)))
    (if (not found)
        nil
        (trie-value found))))

(defsetf get-trie put-trie)

(defvar *trie* (make-trie))

(setf (get-trie "them" *trie*) "Nope!")
(setf (get-trie "the" *trie*) "Closer")
(setf (get-trie "the dog" *trie*) "has hair")

(trie-value (find-trie "them" *trie*)) => "Nope!"
(trie-value (find-trie "the" *trie*)) => "Closer"
(get-trie " dog" (trie-parent (find-trie "them" *trie*))) => "has hair"

Useful stuff; can be much better than hash tables.
