[Q] Sorting a Hash

Sandeep Koranne

unread,

Mar 9, 2000, 3:00:00 AM3/9/00

to

Hello,
Is it possible to sort a Hash based on a function on the
values ?

TIA
Sandeep

--
"Lets Make Things Better"
ED&T Test, WAY 3.37
Philips Research Laboratories Botstraat, 7
5656 AA, Eindhoven 5654 NL, Eindhoven
The Netherlands The Netherlands
Phone: +31-(040)-27 45250 +31-(040)-2573492
Fax : +31-(040)-27 44626
E-mail: sandeep...@philips.com

Barry Margolin

unread,

Mar 9, 2000, 3:00:00 AM3/9/00

to

In article <yzwn1o8...@natlab.research.philips.com>,

Sandeep Koranne <kor...@natlab.research.philips.com> wrote:
>Hello,
>Is it possible to sort a Hash based on a function on the
>values ?

What does it mean to "sort" a hash table? Hash tables don't have a defined
ordering. They're typically implemented using hashing, which means that
the location of an entry is based on computing a function on the key.
Reordering the hash table would totally break this.

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Christopher R. Barry

unread,

Mar 9, 2000, 3:00:00 AM3/9/00

to

Sandeep Koranne <kor...@natlab.research.philips.com> writes:

> Hello,
> Is it possible to sort a Hash based on a function on the
> values ?

Sounds like you want a heap, not a hash table.

Christopher

Pierre R. Mai

unread,

Mar 9, 2000, 3:00:00 AM3/9/00

to

Sandeep Koranne <kor...@natlab.research.philips.com> writes:

> Is it possible to sort a Hash based on a function on the
> values ?

You can't sort a hash table, because hash tables don't maintain any
kind of order (this is a feature of hashing, not a specific of CL).

What you can do is create a sequence of keys that will give you
ordered access to the hash table (an index, if you like).

If you know in advance the order you'll need, you can keep the index
up-to-date on the fly on insertions/changes in the hash-table, hiding
all of this behind a struct and setf-functions, so that

(get-value-ordered-hash key hash)

works as expected, as do

(setf (get-value-ordered-hash key hash) value)

and

(rem-value-ordered-hash key hash)

which will also update the index as needed.

Then

(get-value-ordered-hash-values hash)

will return the ordered sequence of values, and possibly

(get-value-ordered-hash-keys hash)

will return the corresponding sequence of keys. Whatever you want.

Or you do it the other way round, keeping the values in an ordered
vector, and storing the indices into this in the hash-table. Either
way some handywork is needed.

OTOH, you might be better of with a different data-structure
altogether, such as some forms of self-adjusting trees (=> splay
trees), which will keep items in key order, and give logarithmic
key-based access. OTOH you wanted to it sorted on values... Maybe
giving more information on your real set of requirements would help
here.

Regs, Pierre.

--
Pierre Mai <pm...@acm.org> PGP and GPG keys at your nearest Keyserver
"One smaller motivation which, in part, stems from altruism is Microsoft-
bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

Erik Naggum

unread,

Mar 9, 2000, 3:00:00 AM3/9/00

to

* Sandeep Koranne <kor...@natlab.research.philips.com>

| Is it possible to sort a Hash based on a function on the values?

as others have indicated in indirect ways, this is not a well-defined
question, since the result of sorting a hash table would have to be
something other than a hash table.

so we need to know what kind of result you want in order to answer your
question. e.g., yes, you can easily _print_ the (key value) pairs of a
hash table sorted on the key or on the value, but you would do that by
creating an intermediary sequence of keys or whatever, sort on that, and
then do the actual printing.

#:Erik

Robert Monfera

unread,

Mar 10, 2000, 3:00:00 AM3/10/00

to

Sandeep Koranne wrote:
>
> Hello,

> Is it possible to sort a Hash based on a function on the
> values ?

Not directly, unless your hash function is not a hash function(*). Your
question is underspecified, but I'll try (maybe you know all these
options):

You could create a simple vector to contain all keys, and you would
define the :KEY function of SORT such that it GETHASHes the value or a
part of the value on which sorting is based. The disatvantage is that
it will GETHASH on every comparison (no big deal if the key is a fixnum
and collisions are rare).

You could avoid it if you aren't worried about simultaneously ordering
the key with the value, or you have the key in the value also, or if you
CONS key - value pairs together. Your vector will contain (refer to)
these objects, and the :KEY function of SORT would be used appropriately
(e.g., #'(lambda (key-value-pair) (first key-value-pair)) or simply
#'first).

Another way is to store most of your data in an array (or arrays), and
create a thin hash table, where the value is just an index to the
array(s). If data are not static, read Pierre's notes on file access
and just translate it to arrays:

<87sny0y...@orion.dent.isdn.cs.tu-berlin.de>

Implementation notes:

In ACL 5.0.1, it's fastest to use a SIMPLE-VECTOR with element type T -
even if objects are fixnums. See Duane's postings several months ago.

In LispWorks 4.1, use a list instead of a vector (SORT works on
sequences). Although a list takes up much more space than a vector,
SORT would convert the vector to a list as the first step anyway. A
positive side effect is that you can't possibly bump into the maximum
array size.

(*) You could implement bucket sorting with hash tables simply by using
an appropriate "hashing" function aware of the data distribution.
Nothing guarantees that it works, but most likely the hash table
iterators go through the hash vector from row 0 to n, resulting in the
ordered delivery of values. Implementing bucket sort yourselves is
almost the same effort, and you could make it platform-independent.

Robert

Sandeep Koranne

unread,

Mar 10, 2000, 3:00:00 AM3/10/00

to

Sorry for not explaining the question in more detail.
I kow that you cannot "sort" the Hash itself, but
I wanted an ordered "lookup".

Step I: Many items are inserted into the Hash :
I dont want insert to take more time : like a Heap.
Now we close the hash for writing in :

Step II: I want a function which will return the
item haveing the max value :

The way I am doing it right now :

(defun my-predicate-function (item)
(gethash item *Hash-Table*)

;; There is a list of items in *item-list*

(setq *sorted-item-list* (sort *item-list* #'> :key #'my-predicate-function))

;; Then *sorted-item-list* contains the ordered items.

What I wanted to know was is there some standard way of doing this.
Main thig to remember is that the Hash-Table is used only
for reading in Step II.

Joe Marshall

unread,

Mar 10, 2000, 3:00:00 AM3/10/00

to

Sandeep Koranne <kor...@natlab.research.philips.com> writes:

> Sorry for not explaining the question in more detail.
> I kow that you cannot "sort" the Hash itself, but
> I wanted an ordered "lookup".
>
> Step I: Many items are inserted into the Hash :
> I dont want insert to take more time : like a Heap.
> Now we close the hash for writing in :
>
> Step II: I want a function which will return the
> item haveing the max value :
>
> The way I am doing it right now :
>
> (defun my-predicate-function (item)
> (gethash item *Hash-Table*)
>
> ;; There is a list of items in *item-list*
>
> (setq *sorted-item-list* (sort *item-list* #'> :key #'my-predicate-function))
>
> ;; Then *sorted-item-list* contains the ordered items.

Sorting the list to get the max value is a little wasteful,
but I'm assuming that you'll use the sorted list for other
things. If you only want the max value, just cache it
when you do Step I.

The function MY-PREDICATE-FUNCTION should be running in
O(1) time if the hash is effective, so using it as the key
to sort shouldn't increase the time except by a constant
factor. The sort should run at O(n log n), and there is
no way to improve on that for random data. If you try to
move some of the work of sorting into Step I, you'll only
increase the time spent in Step I (which you said you don't
want to do).

I don't know your exact application, but I would point out
that a good heap-based algorithm (like red-black trees) will
give you O(log n) lookup and insert time. Log n grows
really slowly, so for nearly all practical purposes it will
only be a constant factor difference from using a hash table.
If your hashing algorithm is much slower than your sorting
predicate, a heap will beat out a hash table.

--
~jrm

Marco Antoniotti

unread,

Mar 10, 2000, 3:00:00 AM3/10/00

to

Joe Marshall <jmar...@alum.mit.edu> writes:

> Sandeep Koranne <kor...@natlab.research.philips.com> writes:
>
...

>
> I don't know your exact application, but I would point out
> that a good heap-based algorithm (like red-black trees) will
> give you O(log n) lookup and insert time. Log n grows
> really slowly, so for nearly all practical purposes it will
> only be a constant factor difference from using a hash table.
> If your hashing algorithm is much slower than your sorting
> predicate, a heap will beat out a hash table.

And... as a shameless plug :) Red-Black trees are available in the
AI.Repository and soon they will be in the CLOCC.

Cheers

--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa