Earlier tonight I tried to iterate a hash. I have somewhat thoughtlessly
been doing things like:
PMC *iter = VTABLE_get_iter(interp, some_hash);
while (VTABLE_get_bool(iter)) {
PMC *key_pmc = VTABLE_shift_pmc(interp, iter);
STRING *key = VTABLE_get_string(interp, key_pmc);
...
}
Rather than:
PMC *iter = VTABLE_get_iter(interp, some_hash);
while (VTABLE_get_bool(iter)) {
STRING *key = VTABLE_shift_string(interp, iter);
...
}
So now you're thinking "so what, Jonathan did something stupid in his
code yet again, big deal". What is perhaps surprising (well, I found it
surprising) is that if you do the first of these you only ever get the
first key in the hash - and then no more! If you do the second you'll
get all of the keys.
The whole load of hash/iterator code and the way they are so closely
tied looks kinda evil. If I'm reading it right, hash iteration ain't
threadsafe either (iterating a hash in two threads simultaneously will
run into issues); I'm not sure if that's a design goal, but anyways...
Does anyone have any thoughts on why these two code snippets do
different things, or have any reasons why they should? Or a ruling that
they shouldn't so we can decide if it's a bug...
Thanks,
Jonathan
P.S. Just for inspiration, a few comments from the current hash/iterator
code!
/* XXX int keys may be zero - use different iterator
*/
/* found next key - FIXME hash iter does auto next */
Get number of remaining elements. Does not work for hashes yet.
Jonathan Worthington wrote:
> Hi,
>
> Earlier tonight I tried to iterate a hash. I have somewhat thoughtlessly
> been doing things like:
>
> PMC *iter = VTABLE_get_iter(interp, some_hash);
> while (VTABLE_get_bool(iter)) {
> PMC *key_pmc = VTABLE_shift_pmc(interp, iter);
> STRING *key = VTABLE_get_string(interp, key_pmc);
> ...
> }
>
> Rather than:
>
> PMC *iter = VTABLE_get_iter(interp, some_hash);
> while (VTABLE_get_bool(iter)) {
> STRING *key = VTABLE_shift_string(interp, iter);
> ...
> }
>
> So now you're thinking "so what, Jonathan did something stupid in his
> code yet again, big deal". What is perhaps surprising (well, I found it
> surprising) is that if you do the first of these you only ever get the
> first key in the hash - and then no more! If you do the second you'll
> get all of the keys.
> Does anyone have any thoughts on why these two code snippets do
> different things, or have any reasons why they should? Or a ruling that
> they shouldn't so we can decide if it's a bug...
The significant differences between the two are that Iterator's
shift_pmc throws an exception if the iteration key is -1 while
shift_string doesn't bother to check (it probably should), and
shift_string calls VTABLE_get_string_keyed on the aggregate it contains,
while shift_pmc calls VTABLE_get_pmc_keyed. And the only significant
differences between Hash's get_string_keyed and get_pmc_keyed are to be
expected for returning a string instead of a PMC. So, I can't explain
why they're giving different results. I can say they shouldn't be giving
different results. And note that in PIR, you can do either:
iter = new Iterator, some_hash
iter_loop:
unless iter, iter_end
shift $S2, iter
say $S2
goto iter_loop
iter_end:
Or:
iter = new Iterator, myhash
iter_loop:
unless iter, iter_end
shift $P2, iter
$S2 = $P2
say $S2
goto iter_loop
iter_end:
And get the same result.
> The whole load of hash/iterator code and the way they are so closely
> tied looks kinda evil. If I'm reading it right, hash iteration ain't
> threadsafe either (iterating a hash in two threads simultaneously will
> run into issues); I'm not sure if that's a design goal, but anyways...
Yes, the code for the two seems to be excessively intertwined, and needs
a review. As long as all the state for the iteration is stored in the
iterator object, and the iterator is only doing ordinary keyed lookups
on the hash, it should be threadsafe. (If that's not the case, it's a bug.)
Allison
The former is also implementing the Hash iteration internals. E.g.
switch (PObj_get_FLAGS(key) & KEY_type_FLAGS) {
case KEY_hash_iterator_FLAGS:
...
return parrot_hash_get_idx(INTERP, hash, key);
This implementation is somehow inside out and a bit weird. Earlier it was
insider the iterator, but then the iterator had to know some internals of the
hash to work, which was suboptimal too.
The major problem is probably, that there are no specific iterator interface
vtable functions that an iterator would use.
leo
Yeah, that part's pretty horrifying, but also the same between
get_string_keyed and get_pmc_keyed, so not the cause of Jonathan's problem.
> This implementation is somehow inside out and a bit weird. Earlier it was
> insider the iterator, but then the iterator had to know some internals of the
> hash to work, which was suboptimal too.
We'll have to untangle the two at some point. Consider it an advanced
cage cleaner task someone can tackle when they need a break from other
stuff.
> The major problem is probably, that there are no specific iterator interface
> vtable functions that an iterator would use.
Isn't that what nextkey_keyed is for? Though, looking at it, it seems
more like part of the problem of entanglement than part of the solution.
Allison
Yep. I should of course have mentioned C<nextkey_keyed()>. But it's not
sufficient for an interface. There's also C<get_iter()>, which returns a new
iterator for the aggregate/string, which is probably ok.
The main problem is C<nextkey_keyed()>:
- you sometimes want to specify iteration direction backwards (arrays) [1]
- you sometimes want to retrieve keys (hash) or values (array) or maybe both
(hash.each)
And the iterator should be able to iterate over all possible thingies like
slices or lazy PMCs by using just one specified interface.
leo
[1] this might probably be stated during iterater creation