Quoth Rainer Weikusat <
rwei...@mssgmbh.com>:
> Peter Makholm <
pe...@makholm.net> writes:
> > "Dave Saville" <da...@invalid.invalid> writes:
> >
> >> Then I know that there is a posible performance hit as the keys are
> >> all extracted and then sorted. But is this because of the sort or are
> >> they always all extracted first? If the latter is true then how do you
> >> iterate over a hash without taking the hit?
> >
> > You kan use the each() function in a while loop:
> >
> > while (($key, $value) = each %hash) {
> > print $key, "\n";
> > }
>
> There's actually a variety of options and depending on the number of
> pairs in the hash, they perform differently ('values' came out first
> everwhere for the tests I made).
>
> ----------------------------
> use Benchmark qw(cmpthese);
>
> sub traversal_bench
> {
> my %h = map { $_, 1; } 0 .. $_[0];
>
> print("\n===\n$_[0] keys\n===\n");
>
>
> sorted_keys => sub {
> my $v;
>
> $v = $h{$_} for sort(keys(%h));
> },
>
> values => sub {
> 1 for values(%h);
> },
That's hardly a fair comparison. In fact, 'values' coming out faster is
a red herring as well: it's only happening because the values are all 1
which is much faster to copy than a string.
With a fairer test like
use Benchmark qw/cmpthese/;
my %h = map +("$_", "$_"), 1..60_000;
my $x;
cmpthese -5, {
keys => sub { $x = $_ for keys %h },
sort => sub { $x = $_ for sort keys %h },
values => sub { $x = $_ for values %h },
keach => sub { 1 while $x = each %h },
veach => sub { 1 while $x = (each %h)[1] },
};
I consistently get
Rate sort veach keys values keach
sort 9.11/s -- -61% -72% -76% -81%
veach 23.6/s 159% -- -27% -37% -51%
keys 32.5/s 256% 37% -- -13% -33%
values 37.2/s 309% 58% 15% -- -23%
keach 48.4/s 431% 105% 49% 30% --
which is pretty much what I would expect: 'sort' is very expensive,
'keach' is cheap, and 'veach' is unavoidably more expensive than either
'keys' or 'values' because it has to do more work. I'm not sure why
'values' is cheaper than 'keys', but I suspect it has something to do
with the fact that hash keys are shared.
Attempting to compensate for 'veach's unfair disadvantage like this
use Benchmark qw/cmpthese/;
use constant C => "60000";
my %h = map +("$_", "$_"), 1..60_000;
my ($k, $v);
cmpthese -5, {
keys => sub { $k = 1; ($k, $v) = ($_, C) for keys %h },
values => sub { $k = 1; ($k, $v) = (C, $_) for values %h },
keach => sub { $k = 1; ($k, $v) = (scalar each %h, C) while $k },
veach => sub { $k = 1; 1 while ($k, $v) = each %h },
};
gives
Rate keys values veach keach
keys 18.4/s -- -11% -16% -49%
values 20.7/s 12% -- -5% -43%
veach 21.8/s 18% 6% -- -39%
keach 35.9/s 95% 74% 65% --
which puts 'each' fastest again; I'm not sure why 'keach' is faster than
'veach', but I suspect that comparison still isn't entirely fair.
Ben