> > This would mean that most people will need to stop upgrading...
> > Any code which uses hashes in scalar context will need to be wrapped
> > in eval{} etc...
> That's mostly right. I just sent a message to P5P about this.
> (are you subscribed to P5P ?)
No.
Actually, the return value of scalar(%tied) should rather be something
like "23/-1" if 23 == keys %tied. This -1 is not 0, but conveys
"meaningless" better than the format "23/23" I proposed before.
Or just return "-1/-1"...
Thanks,
Ilya
join '/',keys %tied, tied %tied
On Wed, 2003-12-03 at 01:24, Ilya Zakharevich wrote:
>
> Actually, the return value of scalar(%tied) should rather be something
> like "23/-1" if 23 == keys %tied. This -1 is not 0, but conveys
> "meaningless" better than the format "23/23" I proposed before.
>
> Or just return "-1/-1"...
>
> Thanks,
> Ilya
--
david nicol "I'll be working, working; but if you come visit I'll
put down what I'm doing: my friends are important" -- David Byrne
Since I doubt anyone uses the literal scalar hash return value for anything
but optimizing Perl's hashing algorithm, it really doesn't matter what a tied
hash returns as long as it's:
A) true if there's keys.
B) false if there's no keys.
C) Matches \d+/\d+.
D) cheap.
Calculating the number of keys in a tied hash is not always cheap, so I'd
suggest we just drop any requirement that SCALAR has to report an accurate
number of keys.
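
A minimal sketch of criteria A, B and D, assuming a perl that supports the SCALAR tie method described in perltie. The class name CheapScalarHash and the fixed "1/1" answer are illustrative assumptions, not anything proposed verbatim in this thread.

```perl
use strict;
use warnings;

package CheapScalarHash;          # hypothetical class name
require Tie::Hash;
our @ISA = ('Tie::StdHash');      # Tie::StdHash keeps its data in the object itself

# Answer "are there keys?" without counting them:
# a true "1/1" when keys exist, a false 0 otherwise.
sub SCALAR {
    my ($self) = @_;
    return %$self ? '1/1' : 0;
}

package main;

tie my %tied, 'CheapScalarHash';
my $empty = scalar(%tied);        # no keys stored yet
$tied{a} = 1;
my $full  = scalar(%tied);        # one key stored

print "empty: $empty, non-empty: $full\n";
```

The true value still matches \d+/\d+ (criterion C), but the numbers are deliberately not an accurate key or bucket count.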
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
Perl_croak(aTHX_ "Believe me, you don't want to use \"-u\" on a Macintosh");
-- toke.c
On Wed, 2003-12-03 at 13:12, Michael G Schwern wrote:
> On Wed, 2003-12-03 at 01:24, Ilya Zakharevich wrote:
> > Actually, the return value of scalar(%tied) should rather be something
> > like "23/-1" if 23 == keys %tied. This -1 is not 0, but conveys
> > "meaningless" better than the format "23/23" I proposed before.
> >
> > Or just return "-1/-1"...
>
> Since I doubt anyone uses the literal scalar hash return value for anything
> but optimizing Perl's hashing algorithm, it really doesn't matter what a tied
> hash returns as long as it's:
>
> A) true if there's keys.
> B) false if there's no keys.
> C) Matches \d+/\d+.
> D) cheap.
>
> Calculating the number of keys in a tied hash is not always cheap, so I'd
> suggest we just drop any requirement that SCALAR has to report an accurate
> number of keys.
--
david nicol
Where the hell did I put my coffee?
Why not use tied()?
> > A) true if there's keys.
> > B) false if there's no keys.
Both needed. But maybe also apply the same to components of \d+/\d+
notation?
> > C) Matches \d+/\d+.
Nice to have too. Then something like "01/01" may be an answer, right?
> > D) cheap.
Something like doing one "new" each() call
> > Calculating the number of keys in a tied hash is not always cheap, so I'd
> > suggest we just drop any requirement that SCALAR has to report an accurate
> > number of keys.
This is why I suggested "-1/-1"...
Yours,
Ilya
Because it emulates what a normal hash returns. Tied hashes are supposed
to look like regular hashes. It is an edge case, but there's no reason
to screw up someone that's doing
($keys, $buckets) = split '/', scalar %hash;
just because %hash is tied.
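
A small sketch of why parsing this value is fragile even for untied hashes. It assumes nothing about the perl version: older perls return a "used/allocated" bucket string such as "3/8", while perl 5.26 and later return just the key count, so the second field may be missing.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Split defensively: the "/buckets" part is not guaranteed to be present.
my $raw = scalar(%hash);
my ($first, $buckets) = split m{/}, $raw;

printf "raw=%s first=%s buckets=%s\n",
    $raw, $first, defined $buckets ? $buckets : '(none)';
```

Code that only tests the value for truth is unaffected by the format, which is why A and B matter more than C.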
A and B dominate. C is a nice-to-have.
> C prevents us from finding out,
> in the case of a tied hash, what it is tied to.
We have tied() for that!
> Am I the only one who imagines that that would be useful information?
In the scalar return value of a tied hash? Think so. :)
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
...let me think it over while Cheese beats you with a baseball bat.
I don't quite understand. To clarify, I'm saying the solution should have
the features A, B, C and D. A, B and D being most important. C being
fairly minor since very few people actually parse the scalar return value of
a hash.
> > > C) Matches \d+/\d+.
>
> Nice to have too. Then something like "01/01" may be an answer, right?
1/1 would be fine when there are keys and 0/1 when there's not. Returning
the correct number of keys would be nice, but not a requirement. So long
as A and B are met.
> > > D) cheap.
>
> Something like doing one "new" each() call
Using each() to determine if there are keys will call NEXTKEY incrementing
the key counter, so this will go wrong:
    while(($k,$v) = each %tied) {
        print scalar %tied;
    }
You'll skip every other key. :(
There's also this odd edge case. Consider a tied hash with one key.
print scalar %tied;
print scalar %tied;
The first line will call FIRSTKEY and then NEXTKEY getting the one key in
the hash causing a correct return value of true. The next call will call
NEXTKEY and since there are no more keys to list it will return false causing
an incorrect return value of false. :(
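
Both failure modes can be reproduced at the Perl level. This is only a sketch: naive_has_keys is a hypothetical stand-in for a default SCALAR that advances the iterator, and it uses plain hashes because each() shares one iterator per hash there too.

```perl
use strict;
use warnings;

# Hypothetical helper: guess "has keys?" by advancing the iterator once.
sub naive_has_keys {
    my ($h) = @_;
    my @pair = each %$h;
    return @pair ? 1 : 0;
}

# Case 1: probing inside an each() loop eats every other key.
my %h = map { $_ => 1 } 1 .. 4;
my $visited = 0;
while (my ($k, $v) = each %h) {
    $visited++;
    naive_has_keys(\%h);
}
print "visited $visited of 4 keys\n";

# Case 2: a one-key hash answers true, then false.
my %one = (a => 1);
my @answers = (naive_has_keys(\%one), naive_has_keys(\%one));
print "answers: @answers\n";
```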
I don't think there's any way we can supply a default SCALAR method without
messing up the key counter or supplying the wrong value.
> > > Calculating the number of keys in a tied hash is not always cheap, so I'd
> > > suggest we just drop any requirement that SCALAR has to report an accurate
> > > number of keys.
>
> This is why I suggested "-1/-1"...
The only problem I have with that is it doesn't quite match \d+/\d+. This
isn't a big deal, but it would be nice to keep the same format. It does
have the advantage of indicating "this key/bucket value is bogus".
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
Stupid am I? Stupid like a fox!
I say that this may be also useful:
> > > > C) Matches \d+/\d+.
C') ... and the components are TRUE (so != 0 if under restriction \d+)
And I think \d+ for components is too restrictive. I think we should
also discuss the variant -1/-1, which conveys "something fishy" better
than 1/1.
> > Nice to have too. Then something like "01/01" may be an answer, right?
>
> 1/1 would be fine when there are keys and 0/1 when there's not. Returning
> the correct number of keys would be nice, but not a requirement. So long
> as A and B are met.
I propose 01 instead of 1 to allow distinguishing the "fake" case from
the real one...
> There's also this odd edge case. Consider a tied hash with one key.
>
> print scalar %tied;
> print scalar %tied;
>
> The first line will call FIRSTKEY and then NEXTKEY getting the one key in
> the hash causing a correct return value of true. The next call will call
> NEXTKEY and since there are no more keys to list it will return false causing
> an incorrect return value of false. :(
I do not see how the existence of a wrong implementation should ruin
the idea. ;-). Of course, it should be FIRSTKEY which is called.
> I don't think there's any way we can supply a default SCALAR method without
> messing up the key counter or supplying the wrong value.
Of course there is.
If there is a key counter, then the hash is not empty.
If there is no key counter, create one; if it is created, the hash
is not empty.
Hope this helps,
Ilya
Well, no. Given a/b: a is only true if the hash has keys, false otherwise.
It doesn't really matter what the value of b is.
> And I think \d+ for components is too restrictive. I think we should
> also discuss the variant -1/-1, which conveys "something fishy" better
> than 1/1.
A bucket value of -1 would be acceptable to me. Like I said, I doubt
anyone's doing this:
my($keys,$buckets) = scalar %hash =~ m{(\d+)/(\d+)};
so it's not too important that \d+/\d+ be strictly upheld.
> > > Nice to have too. Then something like "01/01" may be an answer, right?
> >
> > 1/1 would be fine when there are keys and 0/1 when there's not. Returning
> > the correct number of keys would be nice, but not a requirement. So long
> > as A and B are met.
>
> I propose 01 instead of 1 to allow distinguishing the "fake" case from
> the real one...
A bucket value of -1 handles that, but again, the format isn't terribly
important to me.
> Of course there is.
>
> If there is a key counter, then the hash is not empty.
>
> If there is no key counter, create one; if it is created, the hash
> is not empty.
Maybe I'm not up on how each() is implemented with tied hashes. I thought
there was no key counter on the HV and it's all handled inside FIRSTKEY
and NEXTKEY.
Could you clarify 'key counter'? Do you mean xpvhv->xhv_riter and
xpvhv->xhv_eiter?
So what you're saying is SCALAR would...
Check if xpvhv->xhv_eiter exists. If so, return a true value
because we're in the middle of an iteration which means there's
keys.
Else call FIRSTKEY. If it returns true, reset the hash iterator
and return true. Otherwise return false.
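
A Perl-level model of the steps above, offered as a sketch only. Perl code cannot see xpvhv->xhv_eiter, so the hypothetical class IterAwareHash keeps its own "iterating" flag as a stand-in; the real check would live in the core.

```perl
use strict;
use warnings;

package IterAwareHash;            # hypothetical class name

sub TIEHASH { return bless { data => {}, iterating => 0 }, shift }
sub STORE   { $_[0]{data}{ $_[1] } = $_[2] }
sub FETCH   { return $_[0]{data}{ $_[1] } }

sub FIRSTKEY {
    my ($self) = @_;
    keys %{ $self->{data} };                  # void-context keys resets the iterator
    my $k = each %{ $self->{data} };
    $self->{iterating} = defined $k ? 1 : 0;  # stand-in for xhv_eiter being set
    return $k;
}

sub NEXTKEY {
    my ($self) = @_;
    my $k = each %{ $self->{data} };
    $self->{iterating} = 0 unless defined $k; # iteration finished
    return $k;
}

sub SCALAR {
    my ($self) = @_;
    return 1 if $self->{iterating};           # mid-iteration: there must be keys
    my $first = $self->FIRSTKEY;              # otherwise probe with FIRSTKEY
    $self->{iterating} = 0;                   # ...and reset the iteration state
    keys %{ $self->{data} };
    return defined $first ? 1 : 0;
}

package main;

tie my %tied, 'IterAwareHash';
my $empty_answer = scalar(%tied);

@tied{qw(a b c)} = (1, 2, 3);
my ($seen, $all_true) = (0, 1);
while (my ($k, $v) = each %tied) {
    $seen++;
    $all_true = 0 unless scalar(%tied);       # must not disturb the loop
}
my $idle_answer = scalar(%tied);

print "empty=$empty_answer seen=$seen all_true=$all_true idle=$idle_answer\n";
```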
Ingenious!
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
I need a SHOWER a BURGER and some ROBOTS, STAT!
-- http://www.angryflower.com/allrigh.gif
Er, when there are no keys, you should return 0, of course. "0/1" isn't
false. :)
As to the other suggestion, a tied hash in a scalar context simply can't
call FIRSTKEY. Otherwise you've completely broken the transparency with
non-tied hashes!
Ronald
Oh. Right.
> As to the other suggestion, a tied hash in a scalar context simply can't
> call FIRSTKEY. Otherwise you've completely broken the transparency with
> non-tied hashes!
I don't see why.
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
My breasts are arousing weapons.
Well, your argument is incompatible with your
A) true if there's keys.
B) false if there's no keys.
So if one gets a/b, which is automatically TRUE, there are keys; thus
a must be TRUE as well. It is good to have b TRUE too.
> > Of course there is.
> >
> > If there is a key counter, then the hash is not empty.
> >
> > If there is no key counter, create one; if it is created, the hash
> > is not empty.
>
> Maybe I'm not up on how each() is implemented with tied hashes. I thought
> there was no key counter on the HV and it's all handled inside FIRSTKEY
> and NEXTKEY.
How would it know then that it should call FIRSTKEY?
> Could you clarify 'key counter'? Do you mean xpvhv->xhv_riter and
> xpvhv->xhv_eiter?
I do not remember; what I said was based on common sense, not an
implementation. The HV must know when iteration through
each/keys/values has been started, and when it is finished.
> So what you're saying is SCALAR would...
>
> Check if xpvhv->xhv_eiter exists. If so, return a true value
> because we're in the middle of an iteration which means there's
> keys.
>
> Else call FIRSTKEY. If it returns true, reset the hash iterator
> and return true. Otherwise return false.
>
> Ingenious!
If xpvhv->xhv_eiter has the necessary semantic, then yes, this is what
I meant...
Yours,
Ilya
I'd forgotten that a hash returns 0 in scalar context if there's no keys.
> > Maybe I'm not up on how each() is implemented with tied hashes. I thought
> > there was no key counter on the HV and it's all handled inside FIRSTKEY
> > and NEXTKEY.
>
> How would it know then that it should call FIRSTKEY?
Right.
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
I knew right away that my pants and your inner child could be best friends.
When following this thread, I come to think that maybe my proposal in
<http://www.mail-archive.com/perl5-...@perl.org/msg72484.html>
wasn't such a bad idea. It transfers all these considerations done in
this thread onto the user by letting him craft his own SCALAR method for
tied hashes.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
I like your proposal, esp. if you add a default Tie::Hash::SCALAR method.
But I've already mentioned this. I think that adding a new tie method
is worthwhile in this case ; and SCALAR is not the kind of thing that
everyone will want to override anyway.
Thanks for the support. I'll complete and polish up my previous patch a
little so that it could theoretically be applied to blead.
Well wait a second. The algorithm I outlined using FIRSTKEY and the
internal hash iterator values looks like it can give a tied hash the
proper scalar value, no user defined method necessary. If we can get tied
hashes to behave correctly in scalar context by default, I don't think
SCALAR is worthwhile. Who's really going to want to redefine the key/bucket
value of a hash?
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
"A Masterpiece."
"Well, better than average, maybe."
But you can't really make them behave correctly because you don't know
what the user implementing a tied hash considers correct. If I
understand your algorithm correctly you first check for the existence
of xpvhv->xhv_eiter. If it exists, return true, otherwise trigger
FIRSTKEY and return its value.
Considering that tying is about changing the behaviour of a data-type, I
think this is too limiting. With the above, there is either only false
or true returned. Secondly, you might end up triggering a method (namely
FIRSTKEY) anyway, so why not just trigger SCALAR in the first place?
The amount of work necessary to implement the xhv_eiter/FIRSTKEY trick is
around the same as the SCALAR approach.
However (and now it gets sophisticated:-), we could do both. If SCALAR
does not exist, fallback to your method. Of course, this is the most
work implementation-wise. But it would be fully backwards-compatible.
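
A sketch of how the two paths look from the tie author's side, assuming a perl that implements both the SCALAR method and the FIRSTKEY-based fallback as perltie documents them. The class names NoScalarHash and WithScalarHash and the "custom:" prefix are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical minimal tie class WITHOUT a SCALAR method.
package NoScalarHash;
our $firstkey_calls = 0;
sub TIEHASH  { return bless {}, shift }
sub STORE    { $_[0]{ $_[1] } = $_[2] }
sub FETCH    { return $_[0]{ $_[1] } }
sub FIRSTKEY {
    $firstkey_calls++;
    keys %{ $_[0] };              # void-context keys resets the iterator
    my $k = each %{ $_[0] };
    return $k;
}
sub NEXTKEY  { my $k = each %{ $_[0] }; return $k }

# Hypothetical subclass WITH a SCALAR method: the author decides the value.
package WithScalarHash;
our @ISA = ('NoScalarHash');
sub SCALAR { return 'custom:' . keys %{ $_[0] } }

package main;

# No SCALAR method: perl has to guess, and only truth or falsehood comes back.
tie my %plain, 'NoScalarHash';
my $empty_is_true = scalar(%plain) ? 1 : 0;
$plain{a} = 1;
my $full_is_true  = scalar(%plain) ? 1 : 0;

# SCALAR method present: its return value is used as-is.
tie my %custom, 'WithScalarHash';
@custom{qw(x y)} = (1, 2);
my $custom_value = scalar(%custom);

print "fallback: empty=$empty_is_true full=$full_is_true\n";
print "FIRSTKEY calls made by the fallback: $NoScalarHash::firstkey_calls\n";
print "SCALAR method: $custom_value\n";
```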
Whatever the eventual solution will be, the one I dislike most is a
solution where a user cannot control the behaviour of the tied hash in
scalar context. This violates the principle behind tied variables.
If we are to have SCALAR, this is the only acceptable way to do it. Hashes
in scalar context are so trivial a feature that it should just DWIM without
user intervention.
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
Monkey tennis
Alright, there's something we both agree on. Tomorrow I'll send a
modified patch that incorporates your method as a fallback into mine.
It is ingenious. The only flaw is that clearing the hash or deleting
the last element will leave xhv_eiter set. Obviously hv_clear can zero
out xhv_eiter, but I don't think catching the delete case is possible,
and even if it were, it isn't desirable. Consider code like this:
    while (my ($k,$v) = each %h) {
        # do some stuff
        delete $h{$k};
        if (%h) {
            # if there are more keys to come, do some other stuff
        }
    }
If xhv_eiter is 0, we can call FIRSTKEY; otherwise we should just croak.
Ilya has pointed out that we could just document that scalar(%hash)
*may* perturb the iterator. If we go that route, I'd rather make it
*always* clear the iterator even for regular hashes (which would break
code like that above, but at least provide consistency).
> On Thu, Dec 04, 2003 at 02:19:47PM -0800, Michael G Schwern <sch...@pobox.com> wrote:
> > So what you're saying is SCALAR would...
> >
> > Check if xpvhv->xhv_eiter exists. If so, return a true value
> > because we're in the middle of an iteration which means there's
> > keys.
> >
> > Else call FIRSTKEY. If it returns true, reset the hash iterator
> > and return true. Otherwise return false.
> >
> > Ingenious!
>
> It is ingenious. The only flaw is that clearing the hash or deleting
> the last element will leave xhv_eiter set. Obviously hv_clear can zero
> out xhv_eiter, but I don't think catching the delete case is possible,
> and even if it were, it isn't desirable. Consider code like this:
This was the thing that made me tear my hair out when doing the patch.
It took me a while to figure out that xhv_eiter was not reset on
clearing the hash. This is now done in magic_wipepack().
The problem which I didn't think of was deleting key/value pairs until
there are none left. This is currently _not_ handled by my patch.
Thinking about the delete problem, I think this could be solved.
Pseudocode follows:
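    SV*
    Perl_magic_isempty(pTHX_ SV *sv, MAGIC *mg)
    {
        /* Remember where the current iteration stands. */
        HE* oldhe = HvEITER((HV*)sv);
        if (hv_iternext((HV*)sv)) {
            /* A further key exists: restore the iterator, not empty. */
            HvEITER((HV*)sv) = oldhe;
            return &PL_sv_no;
        }
        /* No further key: report the hash as empty. */
        return &PL_sv_yes;
    }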
I think saving and restoring the iterator would solve it, no?
> Ilya has pointed out that we could just document that scalar(%hash)
> *may* perturb the iterator. If we go that route, I'd rather make it
> *always* clear the iterator even for regular hashes (which would break
> code like that above, but at least provide consistency).
If the above algorithm doesn't work then we would have to document that
the default scalar operation may return a true value when the hash is in
fact empty (namely after many deletes). The iterator itself is not
touched by my patch.
No. For tied hashes, HvEITER is basically only going to determine
whether FIRSTKEY or NEXTKEY will be called for each(). Which element
gets returned by NEXTKEY is up to the tying class, and you perturb
that by calling hv_iternext. I think the patch you came up with already
is as good as it gets.
> If the above algorithm doesn't work then we would have to document that
> the default scalar operation may return a true value when the hash is in
> fact empty (namely after many deletes). The iterator itself is not
> touched by my patch.
You have to avoid messing with both the hv's iterator and the tied object's
internal iterator. Calling FIRSTKEY if and only if the hv's iterator is
NULL, as you do, should be safe.
I'd rather croak than return bad data, but I can see the counterarguments.
Aww, too bad!
> > If the above algorithm doesn't work then we would have to document that
> > the default scalar operation may return a true value when the hash is in
> > fact empty (namely after many deletes). The iterator itself is not
> > touched by my patch.
>
> You have to avoid messing with both the hv's iterator and the tied object's
> internal iterator. Calling FIRSTKEY if and only if the hv's iterator is
> NULL, as you do, should be safe.
Now it's clear. Since tied hashes can be implemented in an arbitrary way
perl can't ever know about the internal iterator.
> I'd rather croak than return bad data, but I can see the counterarguments.
Especially since we cannot detect the rotten case so we'd always have to
croak when scalar on tied hashes is detected. The best thing IMO is to
just document this problem and advise people to always define a SCALAR
method when the scalar value of their tied hashes is supposed to have
any meaning. Patch to perltie.pod to be expected a little later today.
Tassilo
The rotten case is just when HvEITER is true for a tied hash.
[broken approach deleted]
> I think saving and restoring the iterator would solve it, no?
[Already answered in another message.]
Did you not note another approach I outlined? As documented, each()
is not supported when mixed with write-access to hash. *Enforce* it
for the combination each(%hash)+modify+scalar(%hash):
Keep a flag; set it to true on FIRSTKEY/NEXTKEY; set it to false on
write access. If the flag is set and hv_iter (sp?) is present, reset
the iterator in the beginning of scalar(%hash).
Hope this helps,
Ilya
> On Sun, Dec 07, 2003 at 09:23:54AM +0100, Tassilo von Parseval wrote:
> > The problem which I didn't think of was deleting key/value pairs until
> > there are none left. This is currently _not_ handled by my patch.
> >
> > Thinking about the delete problem, I think this could be solved.
> > Pseudocode follows:
>
> [broken approach deleted]
>
> > I think saving and restoring the iterator would solve it, no?
>
> [Already answered in another message.]
>
> Did you not note another approach I outlined? As documented, each()
> is not supported when mixed with write-access to hash. *Enforce* it
> for the combination each(%hash)+modify+scalar(%hash):
To get this straight, the iterator should manually be reset when write
access (delete or adding a key/value pair) happens while being inside an
iteration. Right so far?
This would be too strict since there is one exception from the
no-delete-while-eaching rule: Namely when you delete a key/value pair
that was just returned by each().
> Keep a flag; set it to true on FIRSTKEY/NEXTKEY; set it to false on
> write access. If the flag is set and hv_iter (sp?) is present, reset
> the iterator in the beginning of scalar(%hash).
This would break with the example given in 'perldoc -f each', I think:
    while (($key, $value) = each %hash) {
        print $key, "\n";
        delete $hash{$key}; # This is safe
    }
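
The documented exception can be checked directly on a plain hash. This is a sketch with made-up keys; it only confirms that deleting the pair just returned by each() neither skips entries nor ends the loop early, which is the behaviour an iterator-resetting flag would have to preserve.

```perl
use strict;
use warnings;

my %hash = map { $_ => 1 } 'a' .. 'e';
my $visited = 0;

while (my ($key, $value) = each %hash) {
    $visited++;
    delete $hash{$key};           # deleting the most recently returned key is safe
}

my $left = keys %hash;
print "visited $visited keys, $left left\n";
```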
Appears as though the closer one looks at scalared tied hashes, the more
pathological it gets. :-/
What is so bad about this?
To get meaningful information requires access to tie-class internals
we seem to have consensus that SCALAR would be the method name
let's just document the current behavior
and support SCALAR in future releases (when it exists)
That's a common misconception.
$ perl -MTie::Hash -we'%h = 0..99; tie %h, "Tie::StdHash"; print scalar(%h)'
36/64
> What is so bad about this?
For starters, it's caused a number of bug reports.
> To get meaningful information requires access to tie-class internals
>
> we seem to have consensus that SCALAR would be the method name
>
> let's just document the current behavior
>
> and support SCALAR in future releases (when it exists)
But Tassilo has done a great job coming up with a fallback, and it's
been applied. Why get rid of it now?
Because it breaks the hash API. Where have you been?
> To get meaningful information requires access to tie-class internals
Tassilo already has most of an implementation that can resolve the scalar
problem without making assumptions on the user's tie implementation. He and
Ilya are hashing out the edge cases.
> we seem to have consensus that SCALAR would be the method name
>
> let's just document the current behavior
>
> and support SCALAR in future releases (when it exists)
Tassilo already has a patch in for this. Keep up in back!
--
Michael G Schwern sch...@pobox.com http://www.pobox.com/~schwern/
"The method employed I would gladly explain,
While I have it so clear in my head,
If I had but the time and you had but the brain--
But much yet remains to be said."
-- "Hunting of the Snark", Lewis Carroll
I did
perldoc perl
and can't find *any* place which would explain what a hash is and how
to deal with it. Well there is one such place (perldata) which could
document what is a hash, but it looks like the topic is somewhat
different (after a quick glance I have no idea what questions this
document is supposed to answer).
So: if such an exception exists, I did not take it into consideration.
Back to the drawing board...
Sorry,
Ilya