test two hash(refs) for equality

Rainer Weikusat

Jul 7, 2011, 3:09:14 PM
I'm somewhat surprised that there is no answer for this in the FAQ
(besides 'turn the content of both into a string and compare that').
Assuming that hash values can be compared with string comparisons and
that a value of undef does not need to be distinguished from an empty
string, the following subroutine seems to accomplish that:

sub cmp_href($$)
{
    my ($a, $b) = @_;
    my ($ka, $va, $kb, $vb, $rc);

    OUTER: {
        while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
            last OUTER unless defined($ka) && defined($kb);

            last OUTER unless
                $a->{$kb} eq $vb && $b->{$ka} eq $va
                && exists($a->{$kb}) && exists($b->{$ka});
        }

        $rc = 1;
    }

    # reset the each() iterators of both hashes
    values(%$a);
    values(%$b);
    return $rc;
}

Any comments except references to CPAN modules and general "I don't
care about that [and neither should you]" statements would be very
much appreciated.

Uri Guttman

Jul 7, 2011, 3:49:43 PM
>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

RW> I'm somewhat surprised that there is no answer for this in the FAQ
RW> (besides 'turn the content of both into a string and compare that').
RW> Assuming that hash values can be compared with string comparisons and
RW> that a value of undef does not need to be distinguished from an empty
RW> string, the following subroutine seems to accomplish that:

RW> sub cmp_href($$)
RW> {
RW> my ($a, $b) = @_;

don't use $a and $b for vars. they are reserved for use by sort. even
lexically declared it is bad style. of course you won't listen to me.

RW> my ($ka, $va, $kb, $vb, $rc);

why not a quick test to see if the key counts are the same?

RW> OUTER: {
RW> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
RW> last OUTER unless defined($ka) && defined($kb);

keys are always defined so that test makes no sense. values can be
undef. the order of keys will likely be different so that won't check
key matching. at best it may check if the number of keys is the same but
that is a slow way to do it.

RW> last OUTER unless
RW> $a->{$kb} eq $vb && $b->{$ka} eq $va

that will generate warnings if any value is undef. oh, you don't
care. but then an undef value will eq ''. also you use eq and that will
fail for number values in some cases. and the same issue applies to 0
and undef if you used ==.
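A small illustrative snippet of the cases described above (not from the thread; the values are arbitrary examples). It compares a few values with both eq and ==, with the 'uninitialized' warnings switched off so that the results are visible instead of the warnings:

```perl
use strict;
use warnings;
no warnings 'uninitialized';    # otherwise each undef comparison warns

my $nothing;                    # stands in for a hash value that is undef

my %result = (
    'string 1.0 vs 1'       => ('1.0' eq '1'   ? 'equal' : 'different'),
    'number 1.0 vs 1'       => ('1.0' == '1'   ? 'equal' : 'different'),
    'string undef vs empty' => ($nothing eq '' ? 'equal' : 'different'),
    'number undef vs 0'     => ($nothing == 0  ? 'equal' : 'different'),
);
print "$_ => $result{$_}\n" for sort keys %result;
```

So eq treats "1.0" and "1" as different while == does not, and undef matches '' under eq and 0 under ==.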

RW> && exists($a->{$kb}) && exists($b->{$ka});

why test these after you test for equality? if the equality passes, then
exists will pass except for the undef issue i brought up.

RW> Any comments except references to CPAN modules and general "I don't
RW> care about that [and neither should you]" statements would be very
RW> much appreciated.

just bad code. and it has been solved in several places. look in the
Test:: modules for some solutions.
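For reference, one such Test:: routine is is_deeply() from Test::More, which ships with Perl. A minimal sketch with made-up data; note that it is meant for test suites (it prints TAP output such as "ok 1 - ..."), and that it treats undef and '' as different, which is stricter than the comparison specified at the top of the thread:

```perl
use strict;
use warnings;
use Test::More;    # core module, bundled with Perl

my %got      = (name => 'frog', legs => 4, tags => [qw(green small)]);
my %expected = (legs => 4, tags => [qw(green small)], name => 'frog');

# is_deeply() walks both structures, nested references included; on a
# mismatch it prints a diagnostic showing where the structures differ.
my $same = is_deeply(\%got, \%expected, 'frog record matches');

done_testing();
```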

uri

--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------

Rainer Weikusat

Jul 7, 2011, 4:22:43 PM
"Uri Guttman" <u...@StemSystems.com> writes:
>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>
> RW> I'm somewhat surprised that there is no answer for this in the FAQ
> RW> (besides 'turn the content of both into a string and compare that').
> RW> Assuming that hash values can be compared with string comparisons and
> RW> that a value of undef does not need to be distinguished from an empty
> RW> string, the following subroutine seems to accomplish that:
>
> RW> sub cmp_href($$)
> RW> {
> RW> my ($a, $b) = @_;
>
> don't use $a and $b for vars. they are reserved for use by sort.

They are not reserved. The sort routine uses two variables with names
$a and $b in the symbol table of the package sort is invoked in (as far
as I understand the documentation). These $a and $b therefore don't
collide with lexical variables and they also don't collide with other
'package global' variables because sort localizes them (as it should do).
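The scoping described here can be checked with a small sketch (the helper longer_first is hypothetical). The lexical $a and $b inside the sub and the package $a and $b used by the sort block live in different scopes; whether reusing the names is good style is a separate question, and that is what the rest of the thread argues about:

```perl
use strict;
use warnings;

# Hypothetical helper that, like cmp_href, declares lexical $a and $b.
sub longer_first {
    my ($a, $b) = @_;    # lexical: visible only inside this sub
    return length($a) > length($b) ? 1 : 0;
}

# The sort block uses the package variables $main::a and $main::b,
# which sort sets for each comparison.
my @sorted = sort { $a cmp $b } qw(pear apple fig);
my $answer = longer_first('apple', 'fig');

print "@sorted\n";    # apple fig pear
print "$answer\n";    # 1
```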

> even lexically declared it is bad style. of course you won't listen
> to me.

In my opinion, you are wrong.

> RW> my ($ka, $va, $kb, $vb, $rc);
>
> why not a quick test to see if the key counts are the same?

Because this test wouldn't be 'quick': It requires two additional
traversals of both hashes just to determine the key lists. I've done a
few benchmarks on this and the routine included in this posting was
the fastest implementation I could come up with (for my very limited
set of test cases, admittedly).

>
> RW> OUTER: {
> RW> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
> RW> last OUTER unless defined($ka) && defined($kb);
>
> keys are always defined so that test makes no sense.

It does make sense: Provided that one of the hashes contains fewer
key-value pairs than the other, one of the each invocations will
return an empty list and in this case, either $ka or $kb will be undef
after the list assignment.
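A sketch of the list assignment inside the while condition, using plain arrays (via a hypothetical simulate helper) to stand in for what each(%$a) and each(%$b) returned. It shows that the assignment in scalar context yields the number of right-hand elements, and that an exhausted each() makes the remaining elements shift left; in this sketch it is $kb that ends up undef in both orders:

```perl
use strict;
use warnings;

# Mimics the loop condition of cmp_href: $from_a and $from_b hold what
# each(%$a) and each(%$b) returned. An exhausted each() yields an empty
# list, so the remaining elements shift left in the assignment.
sub simulate {
    my ($from_a, $from_b) = @_;
    my ($ka, $va, $kb, $vb);
    # a list assignment in scalar context yields the number of
    # right-hand elements; 0 ends the while loop
    my $count = (($ka, $va, $kb, $vb) = (@$from_a, @$from_b));
    return ($count, $ka, $kb);
}

my ($n1, $ka1, $kb1) = simulate([], ['x', 1]);    # %$a ran out first
my ($n2, $ka2, $kb2) = simulate(['y', 2], []);    # %$b ran out first
my ($n3)             = simulate([], []);          # both exhausted

for my $r ([$n1, $ka1, $kb1], [$n2, $ka2, $kb2]) {
    printf "count=%d ka=%s kb=%s\n", $r->[0], $r->[1],
        defined $r->[2] ? $r->[2] : 'undef';
}
print "count=$n3\n";
```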

[...]

> RW> last OUTER unless
> RW> $a->{$kb} eq $vb && $b->{$ka} eq $va
>
> that will generate warnings if any value is undef. oh, you don't
> care.

Indeed. A hash key that exists but maps to undef is a perfectly
possible situation.

> but then an undef value will eq ''. also you use eq and that will
> fail for number values in some cases.

I specifically wrote

,----
| Assuming that hash values can be compared with string comparisons and
| that a value of undef does not need to be distinguished from an empty
| string,
`----

meaning, while I would like to know about these cases just to know
about them, I meant to exclude anything which cannot be compared with
eq for this comparison routine from the start: It is not supposed to
do that.

> RW> && exists($a->{$kb}) && exists($b->{$ka});
>
> why test these after you test for equality? if the equality passes, then
> exists will pass except for the undef issue i brought up.

Precisely: Provided that one of the hashes contained a key whose
value was either undef or the empty string and the other hash didn't
contain this key, the eq comparison will have returned 'they are
equal' and the exists check is supposed to cope with that.
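To make this case concrete, a sketch with invented hash contents: two hashes of the same size, where one maps a key to '' and the other lacks that key entirely. Reading the missing element yields undef (without creating the key), which compares equal to '' under eq, so only the exists test tells them apart:

```perl
use strict;
use warnings;
no warnings 'uninitialized';    # the stated assumption: undef may equal ''

my %with    = (color => 'red', note => '');    # 'note' maps to ''
my %without = (color => 'red', size => '');    # same size, no 'note' key

# eq alone: the missing key reads as undef, which compares equal to ''
my $eq_only = ($with{note} eq $without{note}) ? 1 : 0;

# adding the exists test catches the missing key
my $eq_and_exists =
    (exists($without{note}) && $with{note} eq $without{note}) ? 1 : 0;

print "eq only: $eq_only, with exists: $eq_and_exists\n";
```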

> RW> Any comments except references to CPAN modules and general "I don't
> RW> care about that [and neither should you]" statements would be very
> RW> much appreciated.
>
> just bad code.

You failed to provide any reasons for this summary judgement except
'I' (meaning, you) 'want to treat undef values specially'. That's your
prerogative, but I don't.

> and it has been solved in several places.

So what? I would be interested in other algorithms for solving this
problem (except the two others I used for testing). I'm not so much
interested in 'can be downloaded for free from the internet'
'solutions', especially if these aren't even detailed enough to actually
download them.

Jürgen Exner

Jul 7, 2011, 4:28:10 PM

IMO your approach is way too complicated. And as Uri pointed out already,
it has several logical flaws, too.

As a first step I would compare the size of the two hashes and then
check the value for each key (untested, algorithmic sketch only):

my ($h1, $h2) = @_;
return 0 unless scalar(keys(%$h1)) == scalar(keys(%$h2));
#yes, scalar() is redundant, but this makes it very explicit
foreach my $elem (keys %$h1) {
    return 0 unless exists $h2->{$elem}    # see note 1
        and $h1->{$elem} == $h2->{$elem};  # see note 2
}
return 1;

1: This checks that each key from h1 exists in h2, too (i.e.
keys(h1) is a subset of keys(h2)), and because h1 and h2 also have the
same number of elements, the two sets of keys are identical.

2: You may have to adapt this comparison somewhat to accommodate your
special undef is equal to empty string equality.
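One possible runnable version of the sketch above with note 2 applied (the name hashes_equal and the test data are hypothetical). It compares values with eq and silences the 'uninitialized' warnings, so undef and '' compare equal, as the original poster's specification assumes:

```perl
use strict;
use warnings;

# Hypothetical name. Size check first, then every key of %$h1 must exist
# in %$h2 with an eq-equal value (note 2: eq instead of ==).
sub hashes_equal {
    my ($h1, $h2) = @_;
    return 0 unless keys(%$h1) == keys(%$h2);    # same number of keys
    no warnings 'uninitialized';                 # undef compares like ''
    foreach my $key (keys %$h1) {
        return 0 unless exists($h2->{$key})      # same key set (note 1)
            and $h1->{$key} eq $h2->{$key};      # same string value
    }
    return 1;
}

my %frogs = (green => 1,  spots => undef);
my %toads = (spots => '', green => '1');
my $verdict = hashes_equal(\%frogs, \%toads) ? 'equal' : 'different';
print "$verdict\n";    # equal
```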

jue

Uri Guttman

Jul 7, 2011, 4:32:50 PM
>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

RW> "Uri Guttman" <u...@StemSystems.com> writes:
>>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>>
RW> I'm somewhat surprised that there is no answer for this in the FAQ
RW> (besides 'turn the content of both into a string and compare that').
RW> Assuming that hash values can be compared with string comparisons and
RW> that a value of undef does not need to be distinguished from an empty
RW> string, the following subroutine seems to accomplish that:
>>
RW> sub cmp_href($$)
RW> {
RW> my ($a, $b) = @_;
>>
>> don't use $a and $b for vars. they are reserved for use by sort.

RW> They are not reserved. The sort routine uses two variables with names
RW> $a and $b in the symbol table of the package sort is invoked in (as far
RW> as I understand the documentation). These $a and $b therefore don't
RW> collide with lexical variables and they also don't collide with other
RW> 'package global' variables because sort localizes them (as it should do).

it is a convention. do you even care what other coders do or care about?
it is just a bad idea. don't use $a and $b outside of sort. can you even
allow this into your head?

>> even lexically declared it is bad style. of course you won't listen
>> to me.

RW> In my opinion, you are wrong.

you are very off here. too bad as it is your loss. listening to others
is a useful skill.

RW> my ($ka, $va, $kb, $vb, $rc);
>>
>> why not a quick test to see if the key counts are the same?

RW> Because this test wouldn't be 'quick': It requires two additional
RW> traversals of both hashes just to determine the key lists. I've done a
RW> few benchmarks on this and the routine included in this posting was
RW> the fastest implementation I could come up with (for my very limited
RW> set of test cases, admittedly).

and your test cases didn't cover all the bases as i pointed out.

>>
RW> OUTER: {
RW> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
RW> last OUTER unless defined($ka) && defined($kb);
>>
>> keys are always defined so that test makes no sense.

RW> It does make sense: Provided that one of the hashes contains fewer
RW> key-value pairs than the other, one of the each invocations will
RW> return an empty list and in this case, either $ka or $kb will be undef
RW> after the list assignment.

and i covered that point below. it is still a silly test. you can just
as easily scan one hash and check exists in the other and not need extra
defined tests.

RW> [...]

RW> last OUTER unless
RW> $a->{$kb} eq $vb && $b->{$ka} eq $va
>>
>> that will generate warnings if any value is undef. oh, you don't
>> care.

RW> Indeed. A hash key that exists but maps to undef is a perfectly
RW> possible situation.

and broken in other situations. then you are not looking for hash
equality as most people would define it but your limited string only, no
undef values hash similarity. you should state that in your
specification.

RW> && exists($a->{$kb}) && exists($b->{$ka});
>>
>> why test these after you test for equality? if the equality passes, then
>> exists will pass except for the undef issue i brought up.

RW> Precisely: Provided that one of the hashes contained a key whose
RW> value was either undef or the empty string and the other hash didn't
RW> contain this key, the eq comparison will have returned 'they are
RW> equal' and the exists check is supposed to cope with that.

and if you reverse the order it would be clearer. but clarity and you
don't mix well it seems.


RW> Any comments except references to CPAN modules and general "I don't
RW> care about that [and neither should you]" statements would be very
RW> much appreciated.
>>
>> just bad code.

RW> You failed to provide any reasons for this summary judgement except
RW> 'I' (meaning, you) 'want to treat undef values specially'. That's your
RW> prerogative, but I don't.

bad code is bad code. you just don't know how to recognize it yet. live
and learn.

>> and it has been solved in several places.

RW> So what? I would be interested in other algorithms for solving this
RW> problem (except the two others I used for testing). I'm not so much
RW> interested in 'can be downloaded for free from the internet'
RW> 'solutions', especially if these aren't even detailed enough to actually
RW> download them.

huh?? you asked for cpan modules and then you deny wanting them?
detailed to download them? several of the test modules COME with
perl. if you lifted a finger you could find the subs in question in a
few seconds. wow.

Rainer Weikusat

Jul 7, 2011, 4:59:50 PM

Uri has mostly 'pointed out' that he didn't understand the code, as
exemplified in his 'the keys are always defined so this test does
nothing' and 'why the exists check after the comparison' remarks.

> As a first step I would compare the size of the two hashes and then
> check the value for each key (untested, algorithmic sketch only):
>
> my ($h1, $h2) = @_;
> return 0 unless scalar(keys(%$h1)) == scalar(keys(%$h2));
> #yes, scalar() is redundant, but this makes it very explicit
> foreach my $elem (keys %$h1) {
>     return 0 unless exists $h2->{$elem}    # see note 1
>         and $h1->{$elem} == $h2->{$elem};  # see note 2
> }
> return 1;
>
> 1: This checks that each key from h1 exists in h2, too (i.e.
> keys(h1) is a subset of keys(h2)), and because h1 and h2 also have the
> same number of elements, the two sets of keys are identical.

As a complete subroutine:

sub cmp_href_4($$)
{
    my ($h1, $h2) = @_;
    my @k;

    @k = keys(%$h1);
    return 0 unless @k == keys(%$h2);

    foreach my $elem (@k) {
        return 0
            unless exists($h2->{$elem})          # see note 1
            and $h1->{$elem} == $h2->{$elem};    # see note 2
    }

    return 1;
}

That's similar to my 'naive' first implementation. Provided the hashes
are small and more often different than identical, it is not bad.

> 2: You may have to adapt this comparison somewhat to accommodate your
> special undef is equal to empty string equality.

[rw@sapphire]/tmp $perl -e 'print undef eq "", "\n"'
1

Even in the absence of that, it is not 'my special undef is equal to empty
string equality', cf.

The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.

$are_equal = compare_arrays(\@frogs, \@toads);

sub compare_arrays {
    my ($first, $second) = @_;
    no warnings;  # silence spurious -w undef complaints
    return 0 unless @$first == @$second;
    for (my $i = 0; $i < @$first; $i++) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}

(this text is part of the perlfaq4 document on the computer I was
using).

Rainer Weikusat

Jul 8, 2011, 4:29:08 AM
Rainer Weikusat <rwei...@mssgmbh.com> writes:

[...]

So far, I've found two deficiencies in this algorithm, namely,

> sub cmp_href($$)
> {
> my ($a, $b) = @_;
> my ($ka, $va, $kb, $vb, $rc);
>
> OUTER: {
> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
> last OUTER unless defined($ka) && defined($kb);

The defined($ka) will always return true because no matter which of
the hashes runs out of entries first, this will always result in $kb
being undef (tested with a couple of manual assignments).

> last OUTER unless
> $a->{$kb} eq $vb && $b->{$ka} eq $va
> && exists($a->{$kb}) && exists($b->{$ka});

Provided the two hashes are equal, this should (meaning, I haven't
tested that) result in each key/value pair being checked twice.
Since the two hashes are not guaranteed to return their keys in the
same order, there doesn't seem to be a way around that which doesn't
either turn this into a multi-pass algorithm (something I wanted to
avoid if possible, to see if avoiding it would improve something) or
require keeping track of already checked keys.

Rainer Weikusat

Jul 8, 2011, 5:06:50 AM
"Uri Guttman" <u...@StemSystems.com> writes:

>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

[...]

> RW> sub cmp_href($$)
> RW> {
> RW> my ($a, $b) = @_;
> >>
> >> don't use $a and $b for vars. they are reserved for use by sort.
>
> RW> They are not reserved. The sort routine uses two variables with names
> RW> $a and $b in the symbol table of the package sort is invoked in (as far
> RW> as I understand the documentation). These $a and $b therefore don't
> RW> collide with lexical variables and they also don't collide with other
> RW> 'package global' variables because sort localizes them (as it should do).
>
> it is a convention. do you even care what other coders do or care about?
> it is just a bad idea. don't use $a and $b outside of sort. can you even
> allow this into your head?

There are exactly two cases in which I'd agree with an opinion of
yours, namely

1. I happen to hold the same opinion.
2. You've convinced me with an argument.

and one additional case where I would behave according to it,

3. You are in a position to give orders to me.

It is not 1; neither "they are reserved", this being an untrue
statement, nor "it's just a bad idea" qualifies in the sense of 2; and
3 is most certainly not the case.

Actually, the opposite position seems to have some merit (to me):
Since the names $a and $b are 'well-known names' for 'two things being
compared' because the sort function uses them as such, the names and
their usual roles ought to be familiar to people reasonably familiar
with Perl and thus, using them for 'things being compared' in other
contexts makes sense.

> >> even lexically declared it is bad style. of course you won't listen
> >> to me.
>
> RW> In my opinion, you are wrong.
>
> you are very off here. too bad as it is your loss. listening to others
> is a useful skill.

'Listening to others' and 'unquestioningly doing their bidding' are two
very different things.

[...]

> RW> OUTER: {
> RW> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
> RW> last OUTER unless defined($ka) && defined($kb);
> >>
> >> keys are always defined so that test makes no sense.
>
> RW> It does make sense: Provided that one of the hashes contains fewer
> RW> key-value pairs than the other, one of the each invocations will
> RW> return an empty list and in this case, either $ka or $kb will be undef
> RW> after the list assignment.
>
> and i covered that point below. it is still a silly test. you can just
> as easily scan one hash and check exists in the other and not need extra
> defined tests.

This is basically the two-pass algorithm again: 1. Scan each hash
linearly to determine the key set. 2. Provided the sizes of these key
sets are identical, traverse one of them in order to determine if it is
identical to the other and if the values also compare equal.

Another option would be to traverse one hash, checking that all its
keys exist with identical values in the other and then traverse the
other hash to determine if it has any additional keys.
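That second option could be sketched like this (the name cmp_href_two_pass is hypothetical; values are compared with eq and undef is treated like '', as in the original specification). No key counts are computed; the second pass only looks for keys that the second hash has and the first lacks:

```perl
use strict;
use warnings;

# Hypothetical name. Pass 1: every key of %$x must exist in %$y with an
# eq-equal value. Pass 2: %$y must not have keys that %$x lacks.
sub cmp_href_two_pass {
    my ($x, $y) = @_;
    no warnings 'uninitialized';    # undef compares like '' under eq

    for my $key (keys %$x) {
        return 0 unless exists($y->{$key}) && $x->{$key} eq $y->{$key};
    }
    for my $key (keys %$y) {
        return 0 unless exists($x->{$key});
    }
    return 1;
}

my $same  = cmp_href_two_pass({ a => 1, b => 2 }, { b => 2, a => 1 });
my $extra = cmp_href_two_pass({ a => 1 }, { a => 1, c => 3 });
print "$same $extra\n";    # 1 0
```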


> RW> last OUTER unless
> RW> $a->{$kb} eq $vb && $b->{$ka} eq $va
> >>
> >> that will generate warnings if any value is undef. oh, you don't
> >> care.
>
> RW> Indeed. A hash key that exists but maps to undef is a perfectly
> RW> possible situation.
>
> and broken in other situations.

Any value associated with some hash key will be 'the wrong value' in
some (and probably, even a lot of) situations.

> then you are not looking for hash equality as most people would
> define it but your limited string only, no undef values hash
> similarity. you should state that in your specification.

Essentially, I copied this 'specification' from a Perl FAQ entry on
comparing arrays and it was both in my original posting and in my
first reply to you.

> RW> && exists($a->{$kb}) && exists($b->{$ka});
> >>
> >> why test these after you test for equality? if the equality passes, then
> >> exists will pass except for the undef issue i brought up.
>
> RW> Precisely: Provided that one of the hashes contained a key whose
> RW> value was either undef or the empty string and the other hash didn't
> RW> contain this key, the eq comparison will have returned 'they are
> RW> equal' and the exists check is supposed to cope with that.
>
> and if you reverse the order it would be clearer.

I agree with that and admit that the test I posted was/is biased
towards the data it will be processing.

[...]

> >> and it has been solved in several places.
>
> RW> So what? I would be interested in other algorithms for solving this
> RW> problem (except the two others I used for testing). I'm not so much
> RW> interested in 'can be downloaded for free from the internet'
> RW> 'solutions', especially if these aren't even detailed enough to actually
> RW> download them.
>
> huh?? you asked for cpan modules and then you deny wanting them?

I specifically didn't ask for 'download files from the internet'
suggestions.

> detailed to download them? several of the test modules COME with
> perl. if you lifted a finger you could find the subs in question in a
> few seconds. wow.

And if you know where they reside, you could have said so in less time
than your two statements on this took. While that still would not have
been what I was interested in (hash comparison algorithms), it would at
least have been useful (to others, admittedly; 'sharing the fact that
you have knowledge' [while carefully not sharing the knowledge itself]
might be regarded as 'being more useful to the person having the
knowledge', although I disagree with that).

Jon Du Kim

Jul 8, 2011, 11:30:52 AM
On 7/8/2011 5:06 AM, Rainer Weikusat wrote:
> "Uri Guttman"<u...@StemSystems.com> writes:
>
>>>>>>> "RW" == Rainer Weikusat<rwei...@mssgmbh.com> writes:
>
> [...]
>
>> RW> sub cmp_href($$)
>> RW> {
>> RW> my ($a, $b) = @_;
>> >>
>> >> don't use $a and $b for vars. they are reserved for use by sort.
>>
>> RW> They are not reserved. The sort routine uses two variables with names
>> RW> $a and $b in the symbol table of the package sort is invoked in (as far
>> RW> as I understand the documentation). These $a and $b therefore don't
>> RW> collide with lexical variables and they also don't collide with other
>> RW> 'package global' variables because sort localizes them (as it should do).
>>
>> it is a convention. do you even care what other coders do or care about?
>> it is just a bad idea. don't use $a and $b outside of sort. can you even
>> allow this into your head?
>
> There are exactly two cases in which I'd agree with an opinion of
> yours, namely
>
> 1. I happen to hold the same opinion.
> 2. You've convinced me with an argument.
>
> and one additional case where I would behave according to it,
>
> 3. You are in a position to give orders to me.

The problem you are facing is a strange one in the Perl community.
The spirit of Perl has long been "There Is More Than One Way To Do It"
(TIMTOWTDI). Larry Wall's IRC handle is TimToady. This is the true
spirit of Perl. Sadly, there seems to be a sub-cult of people hawking
their idea of "standards". They are trying to make Perl code conform
to some invented corporate-style coding standard. These are
small-minded people who like to be forced to do things a certain way.
Uri is one of these people. There are others. What you are doing is
fine, as you clearly know. What is not fine are false, self-described
authorities trying to invent some bland corporate coding style for the
community.


Rainer Weikusat

Jul 8, 2011, 4:38:45 PM
Rainer Weikusat <rwei...@mssgmbh.com> writes:
> Jürgen Exner <jurg...@hotmail.com> writes:

[...]

>> As a first step I would compare the size of the two hashes and then
>> check the value for each key (untested, algorithmic sketch only):
>>
>> my ($h1, $h2) = @_;
>> return 0 unless scalar(keys(%$h1)) == scalar(keys(%$h2));
>> #yes, scalar() is redundant, but this makes it very explicit
>> foreach my $elem (keys %$h1) {
>>     return 0 unless exists $h2->{$elem}    # see note 1
>>         and $h1->{$elem} == $h2->{$elem};  # see note 2
>> }
>> return 1;
>>
>> 1: This checks that each key from h1 exists in h2, too (i.e.
>> keys(h1) is a subset of keys(h2)), and because h1 and h2 also have the
>> same number of elements, the two sets of keys are identical.

[...]

> That's similar to my 'naive' first implementation. Provided the hashes
> are small and more often different than identical, it is not bad.

As it turned out, this idea wasn't really grounded in reality
but rather in a copy'n'paste-botched benchmark :-) and except for
insanely large hashes (>= 500,000 entries), traversing a hash by
building a list of keys via keys is going to be faster than doing the
same with repeated each calls. This implies that the most sensible way
(known to me) to perform this operation is indeed Juergen's suggestion
above. With a couple of other wrong assumptions removed, the resulting
code looks (or could look) like this:

sub cmp_href_0($$)
{
    my ($a, $b) = @_;

    return unless keys(%$a) == keys(%$b);

    exists($b->{$_}) && $b->{$_} eq $a->{$_} || return
        for (keys(%$a));

    return 1;
}
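The kind of measurement described above could be set up with the core Benchmark module. This is only a sketch: by_each follows cmp_href from the start of the thread and by_keys follows cmp_href_0, with $a/$b renamed to $x/$y; the hash size and contents are arbitrary, and no particular result is implied, since timings vary with perl version, hash size and data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each-based variant, after cmp_href (variables renamed).
sub by_each {
    my ($x, $y) = @_;
    my ($kx, $vx, $ky, $vy, $rc);

    OUTER: {
        while (($kx, $vx, $ky, $vy) = (each(%$x), each(%$y))) {
            last OUTER unless defined($kx) && defined($ky);
            last OUTER unless
                $x->{$ky} eq $vy && $y->{$kx} eq $vx
                && exists($x->{$ky}) && exists($y->{$kx});
        }
        $rc = 1;
    }

    keys(%$x);    # reset the each() iterators
    keys(%$y);
    return $rc;
}

# Keys-based variant, after cmp_href_0 (variables renamed).
sub by_keys {
    my ($x, $y) = @_;

    return unless keys(%$x) == keys(%$y);

    exists($y->{$_}) && $y->{$_} eq $x->{$_} || return
        for (keys(%$x));

    return 1;
}

my %h1 = map { ("key$_" => "value$_") } 1 .. 1000;
my %h2 = %h1;

# run each variant for at least one CPU second and print a comparison
cmpthese(-1, {
    each_based => sub { by_each(\%h1, \%h2) },
    keys_based => sub { by_keys(\%h1, \%h2) },
});
```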

Uri Guttman

Jul 8, 2011, 4:56:00 PM
>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

RW> sub cmp_href_0($$)

that prototype is useless. it doesn't do anything worth the
bother. prototypes are about only useful when you need to pass a whole
hash/array or a code block and you need conversion to refs.
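A sketch of the "conversion to refs" case (same_size is a hypothetical name). With a (\%\%) prototype the caller writes plain hashes and perl passes references to them instead of flattening both into one list; the prototype only applies to calls compiled after the declaration:

```perl
use strict;
use warnings;

# Hypothetical sub: the (\%\%) prototype makes perl pass references to
# the two hash arguments instead of flattening them into one list.
sub same_size(\%\%) {
    my ($x, $y) = @_;    # two hash references
    return keys(%$x) == keys(%$y) ? 1 : 0;
}

my %frogs = (green => 1, small => 1);
my %toads = (brown => 1, warty => 1);

# Reads like a flattening call, but each hash arrives as a reference.
my $same = same_size(%frogs, %toads);
print "$same\n";    # 1
```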

RW> {
RW> my ($a, $b) = @_;

regardless of your views, i say this for others, don't use $a and $b in
your code, lexically or otherwise. it is a CONVENTION that all decent
perl hackers are expected to do.

RW> return unless keys(%$a) == keys(%$b);

which i suggested at the beginning but it was tossed aside. something
about not understanding the problem. or maybe you didn't know how fast
keys is in a scalar context?

Rainer Weikusat

Jul 8, 2011, 5:27:43 PM
"Uri Guttman" <u...@StemSystems.com> writes:
>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>
> RW> sub cmp_href_0($$)
>
> that prototype is useless. it doesn't do anything worth the
> bother.

It provides at least some sort of compile-time checking of function
calls and that's something I decidedly want to have ...

> prototypes are about only useful when you need to pass a whole
> hash/array or a code block and you need conversion to refs.

... while this is something I don't. It means that someone who looks
at a function invocation needs to be aware of the function declaration
in order to know what it will do to its arguments and I think this is
bad.

> RW> {
> RW> my ($a, $b) = @_;
>
> regardless of your views, i say this for others, don't use $a and $b in
> your code, lexically or otherwise. it is a CONVENTION that all decent
> perl hackers are expected to do.

This is now actually a circular argument: Decent perl programmers
don't use variables named $a and $b because nobody who does is a
decent perl programmer.

> RW> return unless keys(%$a) == keys(%$b);
>
> which i suggested at the beginning but it was tossed aside.
> something about not understanding the problem. or maybe you didn't
> know how fast keys is in a scalar context?

You didn't bother to provide an explanation despite now suggesting
that you could have done so, and I was (as I wrote in the text you have
chosen to delete) under the impression of having some experimentally
acquired data demonstrating that this pretty obvious check was
actually a bad idea.

Uri Guttman

Jul 8, 2011, 5:52:32 PM
>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

RW> "Uri Guttman" <u...@StemSystems.com> writes:
>>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>>
RW> sub cmp_href_0($$)
>>
>> that prototype is useless. it doesn't do anything worth the
>> bother.

RW> It provides at least some sort of compile-time checking of function
RW> calls and that's something I decidedly want to have ...

barely and it can be bypassed with calls like &foo. stick to your style
if you want but even larry (yes, wall) deprecates prototypes for
checking stuff. it is best used to make syntax changes for sub calls.
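The &foo point can be seen in a short sketch (count_args is hypothetical). perlsub documents that calling a sub with & bypasses its prototype, that a prototype only affects calls compiled after the declaration, and that method calls ignore prototypes altogether:

```perl
use strict;
use warnings;

# With a ($$) prototype, direct calls compiled after the declaration
# are checked for the number of arguments.
sub count_args($$) { return scalar @_ }

my $direct = count_args(1, 2);          # accepted: exactly two arguments
# my $bad = count_args(1, 2, 3);        # rejected at compile time
my $bypassed = &count_args(1, 2, 3);    # the & form ignores the prototype

print "$direct $bypassed\n";            # 2 3
```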

>> prototypes are about only useful when you need to pass a whole
>> hash/array or a code block and you need conversion to refs.

RW> ... while this is something I don't. It means that someone who looks
RW> at a function invocation needs to be aware of the function declaration
RW> in order to know what it will do to its arguments and I think this is
RW> bad.

RW> {
RW> my ($a, $b) = @_;
>>
>> regardless of your views, i say this for others, don't use $a and $b in
>> your code, lexically or otherwise. it is a CONVENTION that all decent
>> perl hackers are expected to follow.

RW> This is now actually a circular argument: Decent perl programmers
RW> don't use variables named $a and $b because nobody who does is a
RW> decent perl programmer.

you seem to come back to circular as your attack and defense. that in
itself is circular. me thinks thou dost not know what it means. the key
word i emphasized is convention. look it up.

RW> return unless keys(%$a) == keys(%$b);
>>
>> which i suggested at the beginning but it was tossed aside.
>> something about not understanding the problem. or maybe you didn't
>> know how fast keys is in a scalar context?

RW> You didn't bother to provide an explanation despite now suggesting
RW> that you could have done so and I was (as I wrote in the text you have
RW> chosen to delete) under the impression of having some experimentally
RW> acquired data demonstrating that this pretty obvious check was
RW> actually a bad idea.

no, i didn't need to provide one. i said it was a better first pass
check and i was right. you ignored it. simple. you ignore stuff that you
shouldn't ignore. as for your experimentally acquired data, why didn't
YOU publish it? it would have been more authoritative from you but you ask
for that from me. circular again. me thinks you are a round file.

Rainer Weikusat

Jul 8, 2011, 6:20:15 PM
"Uri Guttman" <u...@StemSystems.com> writes:
>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>
> RW> "Uri Guttman" <u...@StemSystems.com> writes:
> >>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
> >>
> RW> sub cmp_href_0($$)
> >>
> >> that prototype is useless. it doesn't do anything worth the
> >> bother.
>
> RW> It provides at least some sort of compile-time checking of function
> RW> calls and that's something I decidedly want to have ...
>
> barely and it can be bypassed with calls like &foo.

It will cause the compiler to make noises when a function is called
with fewer arguments than the prototype said it should have. I found
this to be useful to me.
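A small sketch of both claims, assuming the code is compiled through string `eval` so the compile-time failure can be caught and inspected: a `($$)` sub called with one argument is rejected at compile time, while the `&name(...)` form skips the prototype check. `two_args` and `two_args_b` are hypothetical names.

```perl
use strict;
use warnings;

# Compile a ($$)-prototyped sub plus a one-argument call. The call is
# rejected while the string is being compiled, so eval returns undef.
my $compiled = eval q{
    sub two_args ($$) { return scalar @_ }
    two_args(1);
    1;
};
my $compile_error = $@;    # capture before the next eval resets $@

print "prototyped call: ",
      ($compiled ? "compiled\n" : "rejected: $compile_error");

# The &-form bypasses prototype checking, so the same mistake
# compiles and the sub simply receives one argument.
my $bypass_count = eval q{
    sub two_args_b ($$) { return scalar @_ }
    &two_args_b(1);
};
print "&-call received $bypass_count argument(s)\n";
```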

[...]

> RW> This is now actually a circular argument: Decent perl programmers
> RW> don't use variables named $a and $b because nobody who does is a
> RW> decent perl programmer.
>
> you seem to come back to circular as your attack and defense. that in
> itself is circular. me thinks thou dost not know what it means.

You thinks wrong in this case, as any definition of petitio principii/
begging the question will tell you. In case you don't know where to
find one:

http://philosophy.lander.edu/logic/circular.html

IMHO, the whole thing is worth a read.

[...]

> RW> You didn't bother to provide an explanation despite now suggesting
> RW> that you could have done so and I was (as I wrote in the text you have
> RW> chosen to delete) under the impression of having some experimentally
> RW> acquired data demonstrating that this pretty obvious check was
> RW> actually a bad idea.
>
> no, i didn't need to provide one. i said it was a better first pass
> check and i was right. you ignored it.

Trying to distinguish between gods who don't speak and real stones is
a waste of time. Feel free to behave like a stone and be ignored like
one.

Uri Guttman

Jul 8, 2011, 6:31:04 PM
>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:

RW> "Uri Guttman" <u...@StemSystems.com> writes:
>>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>>
RW> "Uri Guttman" <u...@StemSystems.com> writes:
>> >>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
>> >>
RW> sub cmp_href_0($$)
>> >>
>> >> that prototype is useless. it doesn't do anything worth the
>> >> bother.
>>
RW> It provides at least some sort of compile-time checking of function
RW> calls and that's something I decidedly want to have ...
>>
>> barely and it can be bypassed with calls like &foo.

RW> It will cause the compiler to make noises when a function is called
RW> with fewer arguments than the prototype said it should have. I found
RW> this to be useful to me.

and to few other people. also it doesn't work at all with methods and
most perl code is OO these days.
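A sketch of the method-call point, using a hypothetical class `Pair`. Perl resolves a method only at run time, so it cannot apply the `($$)` prototype, and a call with too few arguments compiles and runs without complaint.

```perl
use strict;
use warnings;

package Pair;    # hypothetical class, for illustration only

sub new { my ($class) = @_; return bless {}, $class }

# The ($$) prototype is not consulted for method calls.
sub compare ($$) {
    my @args = @_;
    return scalar @args;    # report how many arguments actually arrived
}

package main;

my $obj = Pair->new;
my $got = $obj->compare();    # compiles despite ($$)
print "method call received $got argument(s)\n";    # only the invocant
```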

RW> You didn't bother to provide an explanation despite now suggesting
RW> that you could have done so and I was (as I wrote in the text you have
RW> chosen to delete) under the impression of having some experimentally
RW> acquired data demonstrating that this pretty obvious check was
RW> actually a bad idea.
>>
>> no, i didn't need to provide one. i said it was a better first pass
>> check and i was right. you ignored it.

RW> Trying to distinguish between gods who don't speak and real stones is
RW> a waste of time. Feel free to behave like a stone and be ignored like
RW> one.

heh. ignorance is your area it seems. feel free to ignore all others
too.

John W. Krahn

Jul 8, 2011, 10:56:56 PM
Rainer Weikusat wrote:
> "Uri Guttman"<u...@StemSystems.com> writes:
>>>>>>> "RW" == Rainer Weikusat<rwei...@mssgmbh.com> writes:
>>
>> RW> "Uri Guttman"<u...@StemSystems.com> writes:
>> >>>>>>> "RW" == Rainer Weikusat<rwei...@mssgmbh.com> writes:
>> >>
>> RW> sub cmp_href_0($$)
>> >>
>> >> that prototype is useless. it doesn't do anything worth the
>> >> bother.
>>
>> RW> It provides at least some sort of compile-time checking of function
>> RW> calls and that's something I decidedly want to have ...
>>
>> barely and it can be bypassed with calls like &foo.
>
> It will cause the compiler to make noises when a function is called
> with fewer arguments than the prototype said it should have. I found
> this to be useful to me.


$ perl -le'
my %x = "a" .. "z";
my %y = "A" .. "Z";
my @refs = ( \%x, \%y );
sub cmp_href_0 ($$) {
my ( $a, $b ) = @_;
print %$a, %$b;
}
cmp_href_0( @refs );
'
Not enough arguments for main::cmp_href_0 at -e line 9, near "@refs )"
Execution of -e aborted due to compilation errors.

But I passed an array containing two elements. Why doesn't it work?

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

Henry Law

Jul 10, 2011, 9:45:16 AM
On 07/07/11 21:24, Rainer Weikusat wrote:
> They are not reserved. The sort routine uses two variables with names
> $a and $b in the symbol table of the module sort is invoked in (as far
> as I understand the documentation). These $a and $b therefore don't
> collide with lexical variables and they also don't collide with other
> 'package global' variables because sort localizes them (as it should do)

You are right in every respect except one: your assertion that because
you are right then it's right to continue doing what has been proved to
be undesirable. "Everything that is not forbidden is permitted, but not
everything that is permitted is right".

$a and $b are a special case inasmuch as they may be used without being
declared in a program which uses "strict" and "warnings" (as all should).
Look:

$ cat tryout; ./tryout
#! /usr/bin/perl

use strict;
use warnings;

$a = 1;
$c = 1;

Global symbol "$c" requires explicit package name at ./tryout line 7.
Execution of ./tryout aborted due to compilation errors.

So using $a or $b in "ordinary" code will sooner or later cause some
programmer somewhere to spend a day chasing an obscure programming error.

That is why you are wrong.

--

Henry Law Manchester, England

pepa

Jul 10, 2011, 12:14:45 PM
On 9.7.2011 5:56, John W. Krahn wrote:
>
> But I passed an array containing two elements. Why doesn't it work?

You have the answer glaring at you right in your question: "I passed
an array containing two elements." You passed an array. Your prototype
declares two elements, but instead of passing two elements, you pass
an array.

You have given a wonderful example of why perl prototypes are
deprecated.

Uri Guttman

Jul 10, 2011, 12:43:03 PM
>>>>> "p" == pepa <papa....@suomi24.fi> writes:

p> On 9.7.2011 5:56, John W. Krahn wrote:
>>
>> But I passed an array containing two elements. Why doesn't it work?

p> You have the answer glaring at you right in your question: "I passed
p> an array containing two elements." You passed an array. Your prototype
p> declares two elements, but instead of passing two elements, you pass
p> an array.

p> You have given a wonderful example of why perl prototypes are
p> deprecated.

you seem to think john didn't understand the issue. he was showing an
example of why prototypes are deprecated so the other poster who likes
them might actually learn something.

Rainer Weikusat

Jul 10, 2011, 4:04:44 PM

Why do you believe that an array evaluated in a scalar context would
be equivalent to passing two scalar values when the evaluation returns
only one?
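A minimal sketch of the scalar-context rule described above, reusing the shape of the earlier example. `show_args` is a hypothetical helper that reports what it received. An array in a `$` slot contributes its element count, so passing the two elements explicitly is what the `($$)` prototype expects.

```perl
use strict;
use warnings;

# Each $ in the prototype imposes scalar context on its argument.
sub show_args ($$) {
    # Report reference types as their type name, plain values as-is.
    return join ',', map { ref($_) ? ref($_) : $_ } @_;
}

my %x = (k => 1);
my %y = (K => 2);
my @refs = (\%x, \%y);

# scalar(@refs) is 2, so the first argument becomes the number 2.
my $with_count = show_args(@refs, 'z');

# Passing the elements explicitly gives what was intended.
my $with_elems = show_args($refs[0], $refs[1]);

print "$with_count\n";   # 2,z
print "$with_elems\n";   # HASH,HASH
```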

Rainer Weikusat

Jul 10, 2011, 4:10:53 PM

I don't see how "the Perl compiler suppresses "stricture" for package
globals named $a or $b" (so that the traditional sort interface
continues to work) would lend itself to the conclusion that 'using a
properly declared lexical variable named $a or $b' will 'sooner or
later cause someone to spend a day on chasing an obscure programming
error'. Shouldn't there at least be a programming error before someone
needs to chase one?

Rainer Weikusat

Jul 10, 2011, 4:19:34 PM
"Uri Guttman" <u...@StemSystems.com> writes:
>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
> RW> "Uri Guttman" <u...@StemSystems.com> writes:
> >>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
> >>
> RW> "Uri Guttman" <u...@StemSystems.com> writes:
> >> >>>>>>> "RW" == Rainer Weikusat <rwei...@mssgmbh.com> writes:
> >> >>
> RW> sub cmp_href_0($$)
> >> >>
> >> >> that prototype is useless. it doesn't do anything worth the
> >> >> bother.
> >>
> RW> It provides at least some sort of compile-time checking of function
> RW> calls and that's something I decidedly want to have ...
> >>
> >> barely and it can be bypassed with calls like &foo.
>
> RW> It will cause the compiler to make noises when a function is called
> RW> with fewer arguments than the prototype said it should have. I found
> RW> this to be useful to me.
>
> and to few other people. also it doesn't work at all with methods and
> most perl code is OO these days.

'Methods in Perl' can indeed be a PITA because there is no compiler
support for any kind of 'method call checks'. But that's not exactly a
reason for not trying to exploit compiler support for catching simple
errors in areas where it does exist.

> RW> You didn't bother to provide an explanation despite now suggesting
> RW> that you could have done so and I was (as I wrote in the text you have
> RW> chosen to delete) under the impression of having some experimentally
> RW> acquired data demonstrating that this pretty obvious check was
> RW> actually a bad idea.
> >>
> >> no, i didn't need to provide one. i said it was a better first pass
> >> check and i was right. you ignored it.
>
> RW> Trying to distinguish between gods who don't speak and real stones is
> RW> a waste of time. Feel free to behave like a stone and be ignored like
> RW> one.
>
> heh. ignorance is your area it seems.

It is somewhat stupid to accuse others of not being aware of facts
known to you that you weren't willing to share with them. Especially
after those others have learned these facts anyway.

John W. Krahn

Jul 10, 2011, 5:54:56 PM

But it works correctly without the prototype:

$ perl -le'
my %x = "a" .. "z";
my %y = "A" .. "Z";
my @refs = ( \%x, \%y );

sub cmp_href_0 {
my ( $a, $b ) = @_;
print %$a, %$b;
}
cmp_href_0( @refs );
'

wxefabmnstyzuvcdklqrghijopSTABOPWXKLYZEFQRMNCDIJGHUV

Rainer Weikusat

Jul 11, 2011, 8:13:50 AM

It will work 'correctly' whenever what you want to do actually happens
to be correct. A much more interesting example would be the following:

-----------
#!/usr/bin/perl
#
#

my @a = ("%s: %d\n", 'Zementmischer', 3);
print(sprintf(@a));
-----------

[assuming that the formatted string was supposed to be printed]

This, too, will 'work correctly' once you have perl changed so that it
behaves like you would like it to behave.

C.DeRykus

Jul 12, 2011, 8:39:23 PM
On Jul 11, 5:13 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> ...

>
> >> Why do you believe that an array evaluated in a scalar context would
> >> be equivalent to passing two scalar values when the evaluation returns
> >> only one?
>
> > But it works correctly without the prototype:
>
> It will work 'correctly' whenever what you want to do actually happens
> to be correct. A much more interesting example would be the following:
>
> -----------
> #!/usr/bin/perl
> #
> #
>
> my @a = ("%s: %d\n", 'Zementmischer', 3);
> print(sprintf(@a));
> -----------
>
> [assuming that the formatted string was supposed to be printed]
>
> This, too, will 'work correctly' once you have perl changed so that it
> behaves like you would like it to behave.

Hm, sprintf's prototype is '$@' so @a gets
coerced to an array. That results in a print
output of '3'. Are you suggesting that perl
could/should alter sprintf's prototype to be
just '@' ?

--
Charles DeRykus

C.DeRykus

Jul 12, 2011, 9:27:37 PM
On Jul 12, 5:39 pm, "C.DeRykus" <dery...@gmail.com> wrote:

> Hm,  sprintf's prototype is '$@' so @a gets
> coerced to an array.

                ^^^^^
                scalar

Tad McClellan

Jul 12, 2011, 10:50:11 PM
C.DeRykus <der...@gmail.com> wrote:
> On Jul 12, 5:39 pm, "C.DeRykus" <dery...@gmail.com> wrote:
>
>> Hm,  sprintf's prototype is '$@' so @a gets
>> coerced to an array.
>                ^^^^^
>                scalar
>
>> That results in a print output of '3'.

Both

my @a = ("%s: %d\n", 'Zementmischer', 100);

and the original

my @a = ("%s: %d\n", 'Zementmischer', 3);

make the same output.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.

C.DeRykus

Jul 13, 2011, 12:50:49 AM
On Jul 12, 7:50 pm, Tad McClellan <ta...@seesig.invalid> wrote:

> C.DeRykus <dery...@gmail.com> wrote:
> > On Jul 12, 5:39 pm, "C.DeRykus" <dery...@gmail.com> wrote:
>
> >> Hm,  sprintf's prototype is '$@' so @a gets
> >> coerced to an array.
> >                 ^^^^^
> >                 scalar
>
> >> That results in a print output of '3'.
>
> Both
>
>     my @a = ("%s: %d\n", 'Zementmischer', 100);
>
> and the original
>
>     my @a = ("%s: %d\n", 'Zementmischer', 3);
>
> make the same output.
>

And... ? Yes, the array-to-scalar coercion
results in these same-sized arrays printing 3
each time.


--
Charles DeRykus
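To make the `sprintf` behaviour concrete, here is a minimal sketch using the data from the example above (the array is renamed `@args` here). The array in `sprintf`'s first slot is evaluated in scalar context, so its element count becomes the format string, regardless of the values it holds. Separating the format from the remaining arguments gives the output that was presumably intended.

```perl
use strict;
use warnings;

my @args = ("%s: %d\n", 'Zementmischer', 3);

# The first argument slot is scalar context: @args yields 3,
# and the string "3" is used as the format.
my $wrong = sprintf(@args);

# Supplying the format and the list separately does what was meant.
my ($fmt, @rest) = @args;
my $right = sprintf($fmt, @rest);

print "wrong: $wrong\n";    # wrong: 3
print "right: $right";      # right: Zementmischer: 3
```

Changing the last element to 100 leaves `$wrong` unchanged, because only the element count matters.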

Rainer Weikusat

Jul 13, 2011, 6:37:06 AM

I'm "suggesting" that the person I was replying to always has two
options when something doesn't work because he is using it wrongly:

- use it correctly
- change that something so that it works how he would like it
to work

Tad McClellan

Jul 13, 2011, 1:55:09 PM
C.DeRykus <der...@gmail.com> wrote:
> On Jul 12, 7:50 pm, Tad McClellan <ta...@seesig.invalid> wrote:
>> C.DeRykus <dery...@gmail.com> wrote:
>> > On Jul 12, 5:39 pm, "C.DeRykus" <dery...@gmail.com> wrote:
>>
>> >> Hm,  sprintf's prototype is '$@' so @a gets
>> >> coerced to an array.
>> >                 ^^^^^
>> >                 scalar
>>
>> >> That results in a print output of '3'.
>>
>> Both
>>
>>     my @a = ("%s: %d\n", 'Zementmischer', 100);
>>
>> and the original
>>
>>     my @a = ("%s: %d\n", 'Zementmischer', 3);
>>
>> make the same output.
>>
>
> And... ?


It was provided for those playing along at home who may
have not caught the implications of your followups.


> Yes, the array-to-scalar coercion
> results in these same-sized arrays printing 3
> each time.


--

Ted Zlatanov

Jul 14, 2011, 9:24:37 AM
On Fri, 08 Jul 2011 22:27:43 +0100 Rainer Weikusat <rwei...@mssgmbh.com> wrote:

RW> "Uri Guttman" <u...@StemSystems.com> writes:

>> regardless of your views, i say this for others, don't use $a and $b in
>> your code, lexically or otherwise. it is a CONVENTION that all decent
>> perl hackers are expected to follow.

RW> This is now actually a circular argument: Decent perl programmers
RW> don't use variables named $a and $b because nobody who does is a
RW> decent perl programmer.

$a and $b are dangerous because they overlap with the sort built-ins $a
and $b. They will work most of the time and then one day you'll have to
hunt down a bug because of them that will waste more time than you could
possibly have saved by using $x and $y or whatever. If you're willing
to accept that risk, you can use them. But don't write code in a public
forum without commenting on the risk: anyone that uses your code will
unwittingly take that risk too.

The specific bugs that may come up are pretty unlikely. But since
you're writing a generic function and not a special-purpose one, you
have to be extra careful.

That is not to say that $a and $b are bad variable names. This is a
conflict between Perl's "don't worry about it" scoping facilities and
some syntactic sugar that has turned sour. But nevertheless we have to
beware using $a and $b for practical reasons, not to claim a "decent
programmer" title.

Ted
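A minimal sketch of the practice suggested above. It assumes flat hashes whose values compare with `eq`, with undef treated like ''. `same_content` is a hypothetical name. Its parameters are called `$x` and `$y`, so the `sort` blocks inside the same sub refer to the package `$a` and `$b` that `sort` sets, not to lexicals.

```perl
use strict;
use warnings;

sub same_content {
    my ($x, $y) = @_;    # deliberately not $a/$b: the sort blocks need those

    my @kx = sort { $a cmp $b } keys %$x;
    my @ky = sort { $a cmp $b } keys %$y;
    return 0 unless @kx == @ky;

    no warnings 'uninitialized';    # treat undef like '', as in the thread
    for my $i (0 .. $#kx) {
        return 0 unless $kx[$i] eq $ky[$i]
                     && $x->{ $kx[$i] } eq $y->{ $ky[$i] };
    }
    return 1;
}

print same_content({a => 1, b => 2}, {b => 2, a => 1}) ? "same\n" : "differ\n";   # same
print same_content({a => 1}, {a => 2}) ? "same\n" : "differ\n";                   # differ
```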
