Will 2.0 come with a good ordered hash?
Thanks
What's buggy about it? I have an ohash in the code to PDF::Writer that
you're welcome to use.
> Will 2.0 come with a good ordered hash?
The native hashing mechanism has been changed to preserve that order, I think.
-austin
--
Austin Ziegler * halos...@gmail.com
* Alternate: aus...@halostatue.ca
It threw an error on '==', I did not dig much deeper after that.
> I have an ohash in the code to PDF::Writer that
> you're welcome to use.
I'll take a look, thanks.
> > Will 2.0 come with a good ordered hash?
>
> The native hashing mechanism has been changed to preserve that order, I
think.
That's good to know. Is this also in the 1.8 stream, or just 1.9 ?
>> Will 2.0 come with a good ordered hash?
> The native hashing mechanism has been changed to preserve that order, I think.
That sounds a bit silly. Wouldn't there be a performance overhead
associated with that? Shouldn't that price be paid only by those
people who want that functionality, given that it can be easily
implemented in C or Ruby?
A C-based extension providing an InsertOrderHash, distributed with
Ruby, would be very handy.
Gavin
At Thu, 2 Dec 2004 12:30:00 +0900,
Austin Ziegler wrote in [ruby-talk:122119]:
> The native hashing mechanism has been changed to preserve that order, I think.
No.
--
Nobu Nakada
Whenever I hear "ordered hash" then I think of a hash whose keys are
ordered according to a definable order - not necessarily insertion order.
In fact, insertion order seems to me a very special case. Is this really
widely used?
> Will 2.0 come with a good ordered hash?
Dunno.
Kind regards
robert
This works fine for me:
class OrderedHash < Hash
def initialize
@order_of_keys = []
super
end
def []=(key, value)
@order_of_keys.delete(key) if has_key?(key)
@order_of_keys << key
super
end
def each
@order_of_keys.each do |key|
yield key, self[key]
end
end
end
Regards,
Michael
Sure, but then any operation that is sequence-sensitive (it modifies state)
can be shoehorned into implicit ordering keys. This does not make it
necessarily a good way to understand (i.e. model or specify) them, or to
implement them, imho.
stack.push(x)
push order is merely another implicit ordering key?
array merely a map from explicit (except for << and the like) integer keys
to value?
stack merely a map from implicit integer keys to value?
Having an ordered Hash is like saying that arrays should be indexed by
strings--it's simply not what they are meant to be. My RDSL library
(http://rdsly.rubyforge.org/) contains an implementation of the Treap
ADT. It works like a Hash, but preserves order. It is implemented much
like a Tree but with Heap-like abilities--hence the name. Anyway, the
items are ordered, not by insertion but by comparison. It does not use
a two-level scheme like OrderedHash seems to use, so there's no wasted
memory on keeping ordering information. Also, benchmarks have shown
that it works very fast, even though it's purely ruby.
If a Treap isn't your style, RDSL also includes a SkipList
implementation, that has similar capabilities to that of a Treap; it's
yet another sorted ADT.
Please, if you haven't already commited to some solution, give RDSL a
try. It needs some testing. I'm going to get back to working on it
when I get time, but until then, some love from other users wouldn't
hurt.
End of plug. Thanks,
nikolai
--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
> On Thu, 2 Dec 2004 23:37:34 +0900, Nikolai Weibull
> <mailing-lis...@rawuncut.elitemail.org> wrote:
>> * itsme213 <itsm...@hotmail.com> [Dec 02, 2004 14:00]:
>>> Is there a pure-ruby ordered hash? I'm looking for something that will
[...]
>> Having an ordered Hash is like saying that arrays should be indexed by
>> strings--it's simply not what they are meant to be.
>
> Be that as it may, it is still necessary to have an insertion-ordered
> hash-like object.
Is anyone friendly with the Lua crowd? Could they donate us a
table? :-) http://www.lua.org/pil/11.html
Hugh
Which one? The Treap you said is comparison ordered, and you didn't
make it clear whether the SkipList is insertion ordered. The order for
PDF::Writer *must* be insertion ordered. I'll happily use RDSL in a
future version of PDF::Writer rather than the hackish ohash
implementation that I have, but ...
I repeat from my earlier message:
"if you gained significant speed or ease of
implementation from using insertion order specifically."
A stack is simple to implement as a list -- much more simple than
a lookup tree -- and O(1) for push/pop operations. A hash provides
O(1) key lookup instead of O(log n) key lookup. So does an array.
These are obvious.
It gets a little less obvious when you start comparing the things
we were _actually_ talking about, which are considerably more
complex.
a) Multiple key ordered data structure
+ General, can be used for arbitrary multiple key lookups
- O(log n) operations
b) Hash + doubly-linked list
+ O(1) operations (_if_ you implement the list traversal right)
- Can only handle insertion order
In many cases, you might be using insertion order as a synonym
for time. If you wanted at some later point to permit
the user to insert objects retroactively, you'd have to
move to an explicit representation. (a) would let you do that;
(b) would not.
In an interpreted, high-level language, I'd probably prefer to
have a good, solid implementation of (a), because it's more useful
to more people, and for the size of things we usually do in Ruby,
O(log n) operations in fast C for N in the thousands issn't too
much slower -- relative to the speed of the entire program running
in ruby -- as using an O(1) operation in C.
(The arguments above should not be construed as suggesting
that it should also replace general hash tables -- they're
more commonly used, and so benefit more from a little attention
to speed).
If this were in the kernel, it would usually be implemented the
other way.
-Dave
--
work: d...@lcs.mit.edu me: d...@pobox.com
MIT Laboratory for Computer Science http://www.angio.net/
<snipcode>
well, but it works just unless you use a different method to access it
(i.e. shift). We definitely need an Association/Map mixin, imo.
> > Having an ordered Hash is like saying that arrays should be indexed
> > by strings--it's simply not what they are meant to be.
> Be that as it may, it is still necessary to have an insertion-ordered
> hash-like object.
I just gave you one--you seem to have missed the gist of my posting.
(You chopped it right off it seems...)
> Austin Ziegler wrote:
> > On Thu, 2 Dec 2004 23:37:34 +0900, Nikolai Weibull
> >>
> >>Having an ordered Hash is like saying that arrays should be indexed by
> >>strings--it's simply not what they are meant to be.
> >
> > Be that as it may, it is still necessary to have an insertion-ordered
> > hash-like object.
> >
> > I use it in PDF::Writer for page objects that can be referred to
> > meaningfully -- but still render in the order in which they were
> > inserted.
>
> How is an "OrderedHash" different from something we might call a
> "HashedArray", i.e. an array where elements also have a string-like
> address? With this hybrid type, the array-ness and the hash-ness are
> orthogonal and neither takes precedence. To me, this says that the only
> solution (that can maintain an arbitrary ordering) is something like
> Michael's two-layered implementation, where a hash and an array both
> refer to the same set of objects internally. Hashing destroys ordering,
> so you have to pay for it somewhere else.
>
> A more flexible altnerative to Michael's OrderedHash implementation
> might embed an index within each element, or maintain a parallel hash
> with indices, then sort the elements by index on demand. If you access
> the elements in sorted order infrequently, this might be a win for
> performance.
>
> SkipList and related patterns impose an external sort-ordering on the
> hash elements, so they would not suit Austin's requirements.
>
Why is everybody in this thread so fixed on using array and hash. As I see it
the requirements are:
insert(k1, k2, value)
delete_by_k1(k1, value)
delete_by_k2(k2, value)
get_by_k1(k1)
get_by_k2(k2)
each_by_k1
and maybe things like delete_min etc. Any datastructure that fullfills these
purposes fits. If I can construct one by using an Array and a Hash thats good,
but if I can achieve this using a fibonaccy heap or a skiplist or something
else that should not matter.
Maybe we should first get the specifications right? (Austin, what exactly are
your needs, have I hit them more or less?)
kind regards,
Brian
--
Brian Schröder
http://www.brian-schroeder.de/
> Austin Ziegler wrote:
>> On Thu, 2 Dec 2004 23:37:34 +0900, Nikolai Weibull
>>>
>>> Having an ordered Hash is like saying that arrays should be indexed
>>> by
>>> strings--it's simply not what they are meant to be.
>> Be that as it may, it is still necessary to have an insertion-ordered
>> hash-like object.
>> I use it in PDF::Writer for page objects that can be referred to
>> meaningfully -- but still render in the order in which they were
>> inserted.
>
> How is an "OrderedHash" different from something we might call a
> "HashedArray", i.e. an array where elements also have a string-like
> address?
I think an ordered hash typically keeps track of the insertion order.
So you basically have two ways of accessing elements - by the order of
insertion (index?) and the hash key.
> With this hybrid type, the array-ness and the hash-ness are
> orthogonal and neither takes precedence. To me, this says that the
> only solution (that can maintain an arbitrary ordering) is something
> like Michael's two-layered implementation, where a hash and an array
> both refer to the same set of objects internally. Hashing destroys
> ordering, so you have to pay for it somewhere else.
>
Yeah. Seems like you have to have an array or linked list combined
with a hash table to make this work.
Here is one way to do it:
Have a hash table which maps hash keys to insertion indices like so
(note: key.hash #=> hash key):
Hash Table: { hash key => insertion index, hash key => insertion index }
Then an array, indexed by insertion order which contains key, value
pairs.
Array: [ [ key, value ], [key, value] ]
So to find an element you generate its hash key from the given key.
Then get the insertion index from the hash table (may be the wrong
insertion index), if it exists. Check to make sure it is the right
insertion index by comparing the key at the insertion index to the key
you were given. If it is then you return the value, if it isn't the
either the hash doesn't contain that key or it is deeper in the hash
bucket.
You don't have to do it the way described above, but it has some
advantages. The main one being you can go both ways insertion index =>
key and key => insertion index. This is a very useful property. The
other big advantage is iteration is very fast. You have everything you
need for any type of iteration in the array - the key, the value, and
the insertion index.
> A more flexible altnerative to Michael's OrderedHash implementation
> might embed an index within each element, or maintain a parallel hash
> with indices, then sort the elements by index on demand. If you
> access the elements in sorted order infrequently, this might be a win
> for performance.
You could also include a reorder function, it would be kind of
expensive, but it is possible because you can go both ways insertion
index <=> key.
> SkipList and related patterns impose an external sort-ordering on the
> hash elements, so they would not suit Austin's requirements.
>
-Charlie
How is an "OrderedHash" different from something we might call a
"HashedArray", i.e. an array where elements also have a string-like
address? With this hybrid type, the array-ness and the hash-ness are
orthogonal and neither takes precedence. To me, this says that the only
solution (that can maintain an arbitrary ordering) is something like
Michael's two-layered implementation, where a hash and an array both
refer to the same set of objects internally. Hashing destroys ordering,
so you have to pay for it somewhere else.
A more flexible altnerative to Michael's OrderedHash implementation
might embed an index within each element, or maintain a parallel hash
with indices, then sort the elements by index on demand. If you access
the elements in sorted order infrequently, this might be a win for
performance.
SkipList and related patterns impose an external sort-ordering on the
hash elements, so they would not suit Austin's requirements.
--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>
Which, if you had Brian's mechanism as a base, you could
implement as:
class OrderedThingy < BriansMultiKeyThingy
def initialize
@sequence = 0
end
def insert(key, value)
# XXX: Note lack of thread safety here.
super(@sequence, key, value)
@sequence += 1
end
def delete(key)
delete_by_k2(key)
end
def get(key)
get_by_k2(key)
end
def each_by_insertion_order(&proc)
each_by_k1(&proc)
end
end
Brian's point is the right one to consider: insertion order, in a
sense, is merely another key, one that's defined implicitly. The
only reason to not use the general case were if you gained significant
speed or ease of implementation from using insertion order specifically.
Insertion order _is_ particularly easy to manage using a linked
list in a way that the doubly keyed implementation above isn't
(though the naive list.delete operation has to go - you'd be better
off maintaining a doubly linked list so that you can do the delete
from the key), but unless you're super concerned with performance,
the generality of the above implementation is a pretty nice thing
to have.
Actually, my needs are:
insert(key, value)
delete(key)
get(key)
each_by_insertion_order
I don't want to manage two sets of keys.
Basically, I need to be able to manipulate a number of objects on
the page queue, but they have to be rendered in the order inserted.
I handle other issues through transaction support via
Transaction::Simple.
You are right.
> In fact, insertion order seems to me a very special case. Is this really
> widely used?
It's not uncommon. And not meant to replace comparison -based ordered ADTs.
Be that as it may, it is still necessary to have an insertion-ordered
hash-like object.
I use it in PDF::Writer for page objects that can be referred to
meaningfully -- but still render in the order in which they were
inserted.
-austin
I may have to add some Hash conveniences, like default blocks. Thanks.
> Which one? The Treap you said is comparison ordered, and you didn't
> make it clear whether the SkipList is insertion ordered. The order for
> PDF::Writer *must* be insertion ordered. I'll happily use RDSL in a
> future version of PDF::Writer rather than the hackish ohash
> implementation that I have, but ...
Hm, OK, sorry. I misunderstood. I thought you wanted comparison.
At Thu, 2 Dec 2004 15:51:46 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:122141]:
> Nobu himself made a patch to preserve hash order. I have not decided
> yet to merge it. The only concern is performance.
The performance wouldn't increase for insertion, iteration and
lookup, but do for direct deleting (except for st_delete_safe).
And the memory usage increases 2 pointers for each hash
entries.
--
Nobu Nakada
Hm, 2 pointers for each hash entry, that's an increase of 50%. On the
other side, 8 bytes more per hash entry isn't very much (if it's useful
enough).
Regards,
Michael
# Assumes you're not initializing with values at the beginning, i.e.
# h = InsertOrderHash.new
# Extension to the general case is left as an exercise for the reader
# (I've always wanted to say that ;-)
class InsertOrderHash < Hash
def initialize(*args)
@insert_order = Array.new
end
def []=(key, val)
@insert_order.push key unless self.key? key
super
end
def each # Assumes block follows
@insert_order.each do |key|
Definitely!
> Secondly, insertion order should IMO be the default -- because of
> hash literals, if for no other reason.
Hm, I'd prefer the "natural" order of key elements, i.e. the order defined
by <=>. The reason being that you can get away with no additional data
stored. If you make insertion oder the default order you're likely going
to have to wrap keys with some kind of other object that maintains an
insertion count (which then is used for ordering) and references the
original key. If you base order on the key values only you can safe all
the hassle with maintaining insertion oder information, taking care of key
comparisons, hash values etc.
> If I say {this=>1, that=>2, other=>3} then that is what I expect
> (assuming there is any order at all). I would not want my hash
> literals scrambled (assuming some ordering, but one not based on
> insertion).
I can understand that from the point of view of source code, but in which
situations in applications does this really matter?
> But as long as a hash HAS an order, you can order it any way you
> like -- just as we can sort an array any way we like.
Yep, I'd do it similar to java.util.TreeMap - set a block as comparator
during hash creation that receives two values and returns -1,0 or 1 (same
as <=> does).
> And yes, I know there are fundamental differences between hashes
> and arrays. ;)
What differences? Just kiddin'... I'm in a strange mood today. :-)
Kind regards
robert
There are many more methods than just #[]= and #each that should be
(re)defined for an insert-order Hash --- including #delete, #delete_if,
#clear, #keys, #values, and others. The OrderedHash on RAA goes a
long way towards (re)defining the methods that might make sense
for such an object.
Fixing (or overriding) the #== method (depending on the semantics
you desire) would probably be a trivial matter (adding a check
that the second hash is also an OrderedHash seems a likely choice). That
module's author has been quick to respond to bug reports/fixes in the
past.
regards,
andrew
<kidding>
but sometimes they are :)
>> a=[1,2]
=> [1, 2]
>> b=['one' ,'two']
=> ['one', 'two']
>> c=a,b
=> [[1, 2], ['one', 'two']]
>> c.assoc 'one'
=> ['one', 'two']
</kidding>
I'll reply to this, since an ordered hash is on my wish list.
First of all, if it HAS an order, then it can be sorted (reordered)
in any way you want. The (first) important thing to me is that it
HAVE an order.
Secondly, insertion order should IMO be the default -- because of
hash literals, if for no other reason.
If I say {this=>1, that=>2, other=>3} then that is what I expect
(assuming there is any order at all). I would not want my hash
literals scrambled (assuming some ordering, but one not based on
insertion).
But as long as a hash HAS an order, you can order it any way you
like -- just as we can sort an array any way we like.
And yes, I know there are fundamental differences between hashes
and arrays. ;)
Hal
But that's okay: you'll get bitten by an exception if you insert such
keys. It will work perfectly for the most common key types (I guess these
are String and maybe Fixnum). If you want such keys then you can easily
define your own ordering with the lambda you provide.
class Foo
attr_accessor :bar
end
oh = OrderedHash( lambda {|foo1,foo2| foo1.bar.size <=> foo2.bar.size} )
oh[Foo.new]="bar"
...
Kind regards
robert
I meant "natural" to refer to the ordering of the instance class. I guess
I didn't make that clear enough. But yes, I'd use that as the default
ordering of a hash, too. :-)
> I guess I'd prefer to see hash order be a
> characteristic of every hash (without an exception raised), regardless
Please, not of every Hash - just the OrderedHashes. IMHO space and time
overheads are significant enough to not include this into every hash.
> of the keys, and then have things like ordering by string value, etc.,
> be the more specialized case.
Well, I beg to differ for stated reasons. Let's see what Matz and others
think about this.
Kind regards
robert
Hm, might be. I can't remember such a statement but then again - I don't
read every single posting. So this might have slipped through my fingers.
Still I don't think it's a good idea to make all hashes insertion ordered.
I could imagine a solution that uses a callback instance for greatest
flexibility. That way even insertion order could be easily realized:
class OrderedHash
include Enumerable
# callbacks for various events
class InstanceCallbacks
def hash_instance(o) o.hash end
def insert(key,val) end
def update(key,val) end
def remove(key,val) end
def compare(a,b) a<=>b end
end
class InsertionOrderCB < InstanceCallbacks
def initialize() @ins = [] end
def insert(key,val) @ins << key end
def remove(key,val) @ins.delete key end
def compare(a,b)
return 0 if a <=> == 0
@ins.each do |x|
if a <=> x == 0
return -1
elsif b <=> x == 0
return 1
end
end
raise "Internal Error"
end
end
def initialize(ih = InsertionOrderCB.new,&b)
@ih = ih
# ...
end
def store(key,val)
# use @ih instance handler to determine order...
end
def delete(key)
# ...
end
end
Kind regards
robert
On Fri, 3 Dec 2004, Robert Klemme wrote:
>
> "David A. Black" <dbl...@wobblini.net> schrieb im Newsbeitrag
> news:Pine.LNX.4.44.0412030519240.28725-100000@wobblini...
> > Hi --
> >
> > On Fri, 3 Dec 2004, Robert Klemme wrote:
> >
> > > Hm, I'd prefer the "natural" order of key elements, i.e. the order
> defined
> > > by <=>.
> >
> > The problem is, though, that keys can be anything, including things
> > that don't respond to <=> or that can't be meaningfully compared with
> > other keys.
>
> But that's okay: you'll get bitten by an exception if you insert such
> keys. It will work perfectly for the most common key types (I guess these
> are String and maybe Fixnum). If you want such keys then you can easily
> define your own ordering with the lambda you provide.
There may be a lot of hashes with such keys, but I wouldn't describe
that uniformity as more "natural" to a hash than the act of key/value
insertion itself. I guess I'd prefer to see hash order be a
characteristic of every hash (without an exception raised), regardless
of the keys, and then have things like ordering by string value, etc.,
be the more specialized case.
David
--
David A. Black
dbl...@wobblini.net
On Fri, 3 Dec 2004, Robert Klemme wrote:
> > I guess I'd prefer to see hash order be a
> > characteristic of every hash (without an exception raised), regardless
>
> Please, not of every Hash - just the OrderedHashes. IMHO space and time
> overheads are significant enough to not include this into every hash.
Right, just to the extent that a Hash is Ordered in the first place.
(I had the feeling Matz was considering having all of them be ordered,
but that may be wrong and/or a different matter.)
On Fri, 3 Dec 2004, Robert Klemme wrote:
> Hm, I'd prefer the "natural" order of key elements, i.e. the order defined
> by <=>.
The problem is, though, that keys can be anything, including things
that don't respond to <=> or that can't be meaningfully compared with
other keys.
In my opinion, it would be worth it.
I am curious: How often do people use "large" hashes? Nearly all
of mine are under 50 keys, I think.
It becomes more of an issue if the hash has 1000000 keys, of
course.
Hal
The point is that a hash ordered by insertion order is pretty
easy to maintain. Your hash table changes from something like:
hash_item *hash_table;
where:
struct hash_item {
something *item_ref;
}
update(new_item) {
/* Yes, yes, you have to do some kind of chaining or
* whatever.. but you understand the point here. :) */
hash_table[ hash_of(new_item) ] = new_item;
}
to:
hash_item *hash_table;
struct hash_item {
something *item_ref;
hash_item *prev, *next;
}
hash_item *head, *tail;
update(new_item) {
hash_table[ hash_of(new_item) ] = new_item;
new_item.prev = tail;
tail = new_item;
}
And, as Nobu pointed out, all of the operations remain O(1),
except, obviously, for each(). They do a little more work, but
not exceedingly more.
If you want to maintain by something other than insertion order,
you have to sort things, which _would_ be likely to slow down the
basic hash table.
>>
>>
>> The performance wouldn't increase for insertion, iteration and
>> lookup, but do for direct deleting (except for st_delete_safe).
>> And the memory usage increases 2 pointers for each hash
>> entries.
>>
>
> In my opinion, it would be worth it.
>
> I am curious: How often do people use "large" hashes? Nearly all
> of mine are under 50 keys, I think.
I don't use ruby that often these days but mine
were and are often much larger then this
>
> It becomes more of an issue if the hash has 1000000 keys, of
> course.
I am strongly against an insertion-ordered Hash Class (as the
default implementation) . Feature overloading a fundamental data
structure like Hash, instead of defining a separate class like
"InsOrderedHash", to maintain a shallow tree of bases classes,
violates my sense of aesthetics - I would probably seriously consider
changing to the Python-camp:-(
/Christoph
I use hashes as large as this. (For building my chordlist.
http://chordlist.brian-schroeder.de/)
Regards,
Forbidden
You don't have permission to access /index.rb on this server.
--
dave
Thanks for pointing me to it. I just buyed some more space on the webserver,
and the admin messed it up and disabled cgi. Now its working again.
Regards,
Brian
--
Brian Schröder
http://ruby.brian-schroeder.de/
Yeah, but space *can* be crucial sometimes. So I'd vote for keeping the
current Hash as it is and maybe adding another class InsertionOrderedHash
(or whatever) that keeps insertion order.
> If you want to maintain by something other than insertion order,
> you have to sort things, which _would_ be likely to slow down the
> basic hash table.
I think for ordered maps usually some kind of tree implementation is
chosen, which naturally is a bit slower than a hash. I think in Java and
C++ (STL) they use red-black-trees.
Kind regards
robert