In article <yzwn1o8ma92....@natlab.research.philips.com>, Sandeep Koranne <kora...@natlab.research.philips.com> wrote:
>Hello, >Is it possible to sort a Hash based on a function on the >values ?
What does it mean to "sort" a hash table? Hash tables don't have a defined ordering. They're typically implemented using hashing, which means that the location of an entry is based on computing a function on the key. Reordering the hash table would totally break this.
-- Barry Margolin, bar...@bbnplanet.com GTE Internetworking, Powered by BBN, Burlington, MA *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups. Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
Sandeep Koranne <kora...@natlab.research.philips.com> writes: > Is it possible to sort a Hash based on a function on the > values ?
You can't sort a hash table, because hash tables don't maintain any kind of order (this is a feature of hashing, not a specific of CL).
What you can do is create a sequence of keys that will give you ordered access to the hash table (an index, if you like).
If you know in advance the order you'll need, you can keep the index up-to-date on the fly on insertions/changes in the hash-table, hiding all of this behind a struct and setf-functions, so that
(get-value-ordered-hash key hash)
works as expected, as do
(setf (get-value-ordered-hash key hash) value)
and
(rem-value-ordered-hash key hash)
which will also update the index as needed.
Then
(get-value-ordered-hash-values hash)
will return the ordered sequence of values, and possibly
(get-value-ordered-hash-keys hash)
will return the corresponding sequence of keys. Whatever you want.
Or you do it the other way round, keeping the values in an ordered vector, and storing the indices into this in the hash-table. Either way some handywork is needed.
OTOH, you might be better of with a different data-structure altogether, such as some forms of self-adjusting trees (=> splay trees), which will keep items in key order, and give logarithmic key-based access. OTOH you wanted to it sorted on values... Maybe giving more information on your real set of requirements would help here.
Regs, Pierre.
-- Pierre Mai <p...@acm.org> PGP and GPG keys at your nearest Keyserver "One smaller motivation which, in part, stems from altruism is Microsoft- bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]
* Sandeep Koranne <kora...@natlab.research.philips.com> | Is it possible to sort a Hash based on a function on the values?
as others have indicated in indirect ways, this is not a well-defined question, since the result of sorting a hash table would have to be something other than a hash table.
so we need to know what kind of result you want in order to answer your question. e.g., yes, you can easily _print_ the (key value) pairs of a hash table sorted on the key or on the value, but you would do that by creating an intermediary sequence of keys or whatever, sort on that, and then do the actual printing.
> Hello, > Is it possible to sort a Hash based on a function on the > values ?
Not directly, unless your hash function is not a hash function(*). Your question is underspecified, but I'll try (maybe you know all these options):
You could create a simple vector to contain all keys, and you would define the :KEY function of SORT such that it GETHASHes the value or a part of the value on which sorting is based. The disatvantage is that it will GETHASH on every comparison (no big deal if the key is a fixnum and collisions are rare).
You could avoid it if you aren't worried about simultaneously ordering the key with the value, or you have the key in the value also, or if you CONS key - value pairs together. Your vector will contain (refer to) these objects, and the :KEY function of SORT would be used appropriately (e.g., #'(lambda (key-value-pair) (first key-value-pair)) or simply #'first).
Another way is to store most of your data in an array (or arrays), and create a thin hash table, where the value is just an index to the array(s). If data are not static, read Pierre's notes on file access and just translate it to arrays:
In ACL 5.0.1, it's fastest to use a SIMPLE-VECTOR with element type T - even if objects are fixnums. See Duane's postings several months ago.
In LispWorks 4.1, use a list instead of a vector (SORT works on sequences). Although a list takes up much more space than a vector, SORT would convert the vector to a list as the first step anyway. A positive side effect is that you can't possibly bump into the maximum array size.
(*) You could implement bucket sorting with hash tables simply by using an appropriate "hashing" function aware of the data distribution. Nothing guarantees that it works, but most likely the hash table iterators go through the hash vector from row 0 to n, resulting in the ordered delivery of values. Implementing bucket sort yourselves is almost the same effort, and you could make it platform-independent.
Sandeep Koranne <kora...@natlab.research.philips.com> writes: > Sorry for not explaining the question in more detail. > I kow that you cannot "sort" the Hash itself, but > I wanted an ordered "lookup".
> Step I: Many items are inserted into the Hash : > I dont want insert to take more time : like a Heap. > Now we close the hash for writing in :
> Step II: I want a function which will return the > item haveing the max value :
> ;; Then *sorted-item-list* contains the ordered items.
Sorting the list to get the max value is a little wasteful, but I'm assuming that you'll use the sorted list for other things. If you only want the max value, just cache it when you do Step I.
The function MY-PREDICATE-FUNCTION should be running in O(1) time if the hash is effective, so using it as the key to sort shouldn't increase the time except by a constant factor. The sort should run at O(n log n), and there is no way to improve on that for random data. If you try to move some of the work of sorting into Step I, you'll only increase the time spent in Step I (which you said you don't want to do).
I don't know your exact application, but I would point out that a good heap-based algorithm (like red-black trees) will give you O(log n) lookup and insert time. Log n grows really slowly, so for nearly all practical purposes it will only be a constant factor difference from using a hash table. If your hashing algorithm is much slower than your sorting predicate, a heap will beat out a hash table.
Joe Marshall <jmarsh...@alum.mit.edu> writes: > Sandeep Koranne <kora...@natlab.research.philips.com> writes:
...
> I don't know your exact application, but I would point out > that a good heap-based algorithm (like red-black trees) will > give you O(log n) lookup and insert time. Log n grows > really slowly, so for nearly all practical purposes it will > only be a constant factor difference from using a hash table. > If your hashing algorithm is much slower than your sorting > predicate, a heap will beat out a hash table.
And... as a shameless plug :) Red-Black trees are available in the AI.Repository and soon they will be in the CLOCC.
Cheers
-- Marco Antoniotti =========================================== PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26 http://www.parades.rm.cnr.it/~marcoxa