Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

maintaining order in a hash (without Tie::IxHash)

6 views
Skip to first unread message

nolo contendere

unread,
May 22, 2008, 1:20:17 PM5/22/08
to
Many times, there exists a module on CPAN that solves a problem I'm
attempting to solve.
I'm aware that generally the better choice to solving such a problem
is to use the CPAN module.
However, regarding the problem of maintaining sort order of a hash,
and the Tie::IxHash module, I have a question.

I've heard that Tie::IxHash can be a little slow, and the cost of
using this in addition to installation costs (where developers don't
have the permission to have modules installed, especially across all
environments, and the process for having the module(s) installed by
admins can be trying to say the least) is much higher than the simple
solution of adding an extra sortval attribute to a hash value, where
the value of sortval is simply an incremented counter.

then if you need to access the hash in the order in which items were
inserted, you could simply:

for my $key ( sort { $hash{$a}{sortval} <=> $hash{$b}{sortval} } keys
%hash ) {
# do something
}


Is there something wrong with my reasoning? Would this solution be a
good candidate for addition into the FAQ How can I make my hash
remember the order I put elements
into it?


Uri Guttman

unread,
May 22, 2008, 1:35:54 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

nc> then if you need to access the hash in the order in which items were
nc> inserted, you could simply:

nc> for my $key ( sort { $hash{$a}{sortval} <=> $hash{$b}{sortval} } keys
nc> %hash ) {
nc> # do something
nc> }

and how do you set the sortval for each key? how do you know the order
of insertion without a counter (internal to this hash or external)? also
do you realize that construction means the top level keys are really
hash refs and not the original key? and where is the value for each key?

these questions and their tricky answers are why this problem is best
solved by a working and tested module. it is not a trivial one line of
perl thing to have hashes which maintain some sort of order.

nc> Is there something wrong with my reasoning? Would this solution be a
nc> good candidate for addition into the FAQ How can I make my hash
nc> remember the order I put elements
nc> into it?

well, if your answer covered all of those questions (and possibly more
as i haven't delved into this much) and showed working code, maybe. but
pointing to a working module is better and easier for almost all FAQ
answers. the excuse of not wanting/able to install modules is usually
bogus as there are many ways to install modules on almost any platform.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

nolo contendere

unread,
May 22, 2008, 1:52:57 PM5/22/08
to
On May 22, 1:35 pm, Uri Guttman <u...@stemsystems.com> wrote:

> >>>>> "nc" == nolo contendere <simon.c...@fmr.com> writes:
>
>   nc> then if you need to access the hash in the order in which items were
>   nc> inserted, you could simply:
>
>   nc> for my $key ( sort { $hash{$a}{sortval} <=> $hash{$b}{sortval} } keys
>   nc> %hash ) {
>   nc>     # do something
>   nc> }
>
> and how do you set the sortval for each key? how do you know the order
> of insertion without a counter (internal to this hash or external)? also
> do you realize that construction means the top level keys are really
> hash refs and not the original key? and where is the value for each key?

Below is an example, used in conjunction with File::Find

#!/usr/bin/perl

use strict; use warnings;
use Data::Dumper;
use File::Basename;
use Getopt::Long;
use File::Find;

my $script = basename $0;
my $USAGE = <<"END_USAGE";
DESCRIPTION:
$script - finds the difference between two directory trees

USAGE EXAMPLE:
$script --tree1=/path/to/tree1 --tree2=/path/to/tree2

OPTIONS:
--toc
file containing the 'tar tvf <file>.tar' contents

--tree1
path to the root of tree1

--tree2
path to the root of tree2

END_USAGE

my ( $tree1, $tree2 ) = ( '', '' );
GetOptions( "tree1=s" => \$tree1,
"tree2=s" => \$tree2 ) or die $USAGE;

die "'$tree1' does not exist. Please read the usage notes.\n\n$USAGE
\n" unless -d $tree1;
die "'$tree2' does not exist. Please read the usage notes.\n\n$USAGE
\n" unless -d $tree2;

my ( %src, %tgt, $ctr1, $ctr2, %diffs );

find( \&get_src_info, $tree1 );
find( \&get_tgt_info, $tree2 );

for my $key ( sort { $src{$a}{sortval} <=> $src{$b}{sortval} } keys
%src ) {
my $tgt_node = $key;
$tgt_node =~ s/$tree1/$tree2/;

$diffs{$key} = '';

if ( -d $key ) {
if ( !-d $tgt_node ) {
print "$tgt_node is missing from tree2!\n";
}
}
else {
if ( -e $tgt_node ) {
my $cmd = qq{diff $key $tgt_node};
my $diff = `$cmd`;
$diffs{$key} = $diff;
}
else {
print "$tgt_node is missing from tree2!\n";
}
}
}


my @msgs_diff;
my @msgs_nodiff;

for my $key ( sort keys %diffs ) {
my $msg;
if ( $diffs{$key} eq '' ) {
$msg = "No differences for node: '$key'";
push( @msgs_nodiff, $msg );
}
else {
$msg = "For file '$key', there were the following differences:
\n$diffs{$key}";
push( @msgs_diff, $msg );
}
}

print "No Diffs in:\n", '-'x40, "\n";
print join( "\n", @msgs_nodiff ), "\n";

if ( @msgs_diff ) {
my $mailbody = join( "\n", @msgs_diff );
print "\nDiffs in:\n", '-'x40, "\n$mailbody";
# do_mail('Diff in a file', $mailbody);
}

##=====================
## subs
##=====================

sub get_src_info {
# print "\$File::Find::dir = $File::Find::dir\n";
# print "\$File::Find::name = $File::Find::name\n";
# print "\$_ = $_\n";

my ( $mode, $uid, $gid, $size, $mtime ) =
(stat( $File::Find::name ))[2,4,5,7,9];
$src{$File::Find::name} = {
mode => $mode,
uid => $uid,
gid => $gid,
size => $size,
mtime => $mtime,
sortval => ++$ctr1,
};
}

# I know, i shouldn't duplicate the code, just getting this working
for now
# this isn't even used, but thought it would be nice to have in case i
need the stats later.
sub get_tgt_info {
# print "\$File::Find::dir = $File::Find::dir\n";
# print "\$File::Find::name = $File::Find::name\n";
# print "\$_ = $_\n";

my ( $mode, $uid, $gid, $size, $mtime ) =
(stat( $File::Find::name ))[2,4,5,7,9];
$tgt{$File::Find::name} = {
mode => $mode,
uid => $uid,
gid => $gid,
size => $size,
mtime => $mtime,
sortval => ++$ctr2,
};
}


>
> these questions and their tricky answers are why this problem is best
> solved by a working and tested module. it is not a trivial one line of
> perl thing to have hashes which maintain some sort of order.
>
>   nc> Is there something wrong with my reasoning? Would this solution be a
>   nc> good candidate for addition into the FAQ How can I make my hash
>   nc> remember the order I put elements
>   nc>           into it?
>
> well, if your answer covered all of those questions (and possibly more
> as i haven't delved into this much) and showed working code, maybe. but
> pointing to a working module is better and easier for almost all FAQ
> answers. the excuse of not wanting/able to install modules is usually
> bogus as there are many ways to install modules on almost any platform.

There are certainly ways of installing modules on machines by
circumventing the "process", but this can be risky, and it can be very
difficult to propagate the modules to prod machines.

Uri Guttman

unread,
May 22, 2008, 2:24:58 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

nc> On May 22, 1:35 pm, Uri Guttman <u...@stemsystems.com> wrote:
>> >>>>> "nc" == nolo contendere <simon.c...@fmr.com> writes:
>>
>>   nc> then if you need to access the hash in the order in which items were
>>   nc> inserted, you could simply:
>>
>>   nc> for my $key ( sort { $hash{$a}{sortval} <=> $hash{$b}{sortval} } keys
>>   nc> %hash ) {
>>   nc>     # do something
>>   nc> }
>>
>> and how do you set the sortval for each key? how do you know the order
>> of insertion without a counter (internal to this hash or external)? also
>> do you realize that construction means the top level keys are really
>> hash refs and not the original key? and where is the value for each key?

nc> Below is an example, used in conjunction with File::Find

an example of what??

nc> #!/usr/bin/perl

nc> use strict; use warnings;
nc> use Data::Dumper;
nc> use File::Basename;
nc> use Getopt::Long;
nc> use File::Find;

nc> my ( $tree1, $tree2 ) = ( '', '' );

useless initialization.

nc> GetOptions( "tree1=s" => \$tree1,
nc> "tree2=s" => \$tree2 ) or die $USAGE;

nc> my ( %src, %tgt, $ctr1, $ctr2, %diffs );


nc> find( \&get_src_info, $tree1 );
nc> find( \&get_tgt_info, $tree2 );

nc> for my $key ( sort { $src{$a}{sortval} <=> $src{$b}{sortval} } keys
nc> %src ) {

now that i see this is a dirtree compare (btw, unix diff -r can do this
for you), i still don't see the need to keep find order. the order of
entries in directories is not stable as it depends on the file system,
any deletions and inserts and how file::find reads the dirs. you have a
single level hash of each tree so where would insert order matter?


nc> sub get_src_info {

nc> my ( $mode, $uid, $gid, $size, $mtime ) =
nc> (stat( $File::Find::name ))[2,4,5,7,9];
nc> $src{$File::Find::name} = {
nc> mode => $mode,
nc> uid => $uid,
nc> gid => $gid,
nc> size => $size,
nc> mtime => $mtime,
nc> sortval => ++$ctr1,

gack, a global counter used by a callback!

this isn't even usable as a module. you hardwired it to be a standalone
script. so how would posting this complex code be a useful FAQ answer?

>> well, if your answer covered all of those questions (and possibly more
>> as i haven't delved into this much) and showed working code, maybe. but
>> pointing to a working module is better and easier for almost all FAQ
>> answers. the excuse of not wanting/able to install modules is usually
>> bogus as there are many ways to install modules on almost any platform.

nc> There are certainly ways of installing modules on machines by
nc> circumventing the "process", but this can be risky, and it can be very
nc> difficult to propagate the modules to prod machines.

installing your own code is not any easier/harder than installing pure
perl modules on production machines. you can even cut/paste the module
code into your scripts if that is the only way. as i said, there are
almost no good excuses for not using a module.

nolo contendere

unread,
May 22, 2008, 2:35:09 PM5/22/08
to
On May 22, 2:24 pm, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "nc" == nolo contendere <simon.c...@fmr.com> writes:
>
>   nc> On May 22, 1:35 pm, Uri Guttman <u...@stemsystems.com> wrote:
>   >> >>>>> "nc" == nolo contendere <simon.c...@fmr.com> writes:
>   >>
>   >>   nc> then if you need to access the hash in the order in which items were
>   >>   nc> inserted, you could simply:
>   >>
>   >>   nc> for my $key ( sort { $hash{$a}{sortval} <=> $hash{$b}{sortval} } keys
>   >>   nc> %hash ) {
>   >>   nc>     # do something
>   >>   nc> }
>   >>
>   >> and how do you set the sortval for each key? how do you know the order
>   >> of insertion without a counter (internal to this hash or external)? also
>   >> do you realize that construction means the top level keys are really
>   >> hash refs and not the original key? and where is the value for each key?
>
>   nc> Below is an example, used in conjunction with File::Find
>
> an example of what??
>

An example of my initial proposal of a substitute for Tie::IxHash that
maintains the insertion order of elements into a hash.

>   nc> #!/usr/bin/perl
>
>   nc> use strict; use warnings;
>   nc> use Data::Dumper;
>   nc> use File::Basename;
>   nc> use Getopt::Long;
>   nc> use File::Find;
>
>   nc> my ( $tree1, $tree2 ) = ( '', '' );
>
> useless initialization.
>

yeah, if i called the script with no options, i got the uninitialized
value warning.

>   nc> GetOptions( "tree1=s" => \$tree1,
>   nc>             "tree2=s" => \$tree2 ) or die $USAGE;
>
>   nc> my ( %src, %tgt, $ctr1, $ctr2, %diffs );
>
>   nc> find( \&get_src_info, $tree1 );
>   nc> find( \&get_tgt_info, $tree2 );
>
>   nc> for my $key ( sort { $src{$a}{sortval} <=> $src{$b}{sortval} } keys
>   nc> %src ) {
>
> now that i see this is a dirtree compare (btw, unix diff -r can do this
> for you), i still don't see the need to keep find order. the order of
> entries in directories is not stable as it depends on the file system,
> any deletions and inserts and how file::find reads the dirs. you have a
> single level hash of each tree so where would insert order matter?
>

diff -r looks great! wasn't aware of that one...
Well, the need isn't there. But this was just supposed to illustrate
how i set the sortval value, as you asked earlier in the thread.

>   nc> sub get_src_info {
>
>   nc>     my ( $mode, $uid, $gid, $size, $mtime ) =
>   nc> (stat( $File::Find::name ))[2,4,5,7,9];
>   nc>     $src{$File::Find::name} = {
>   nc>         mode    => $mode,
>   nc>         uid     => $uid,
>   nc>         gid     => $gid,
>   nc>         size    => $size,
>   nc>         mtime   => $mtime,
>   nc>         sortval => ++$ctr1,
>
> gack, a global counter used by a callback!
>

should i have used a closure instead?

> this isn't even usable as a module. you hardwired it to be a standalone
> script. so how would posting this complex code be a useful FAQ answer?
>

my code wasn't meant to be a submission for review, but rather a
conceptual demonstration.

>   >> well, if your answer covered all of those questions (and possibly more
>   >> as i haven't delved into this much) and showed working code, maybe. but
>   >> pointing to a working module is better and easier for almost all FAQ
>   >> answers. the excuse of not wanting/able to install modules is usually
>   >> bogus as there are many ways to install modules on almost any platform.
>
>   nc> There are certainly ways of installing modules on machines by
>   nc> circumventing the "process", but this can be risky, and it can be very
>   nc> difficult to propagate the modules to prod machines.
>
> installing your own code is not any easier/harder than installing pure
> perl modules on production machines. you can even cut/paste the module
> code into your scripts if that is the only way. as i said, there are
> almost no good excuses for not using a module.
>

cut/paste module code into my scripts? that doesn't seem like an
elegant solution, and what about pre-reqs?

A. Sinan Unur

unread,
May 22, 2008, 3:03:15 PM5/22/08
to
nolo contendere <simon...@fmr.com> wrote in
news:c488ee9d-a5e5-4fd7...@2g2000hsn.googlegroups.com:

It is elegant enough. You can even automate the process of constructing
the script to ship. Do the same for all modules which your script
depends on.

Personally, I would much prefer packaging a lib subdirectory with the
script.

In any case, the point Uri is making is simple. There is really no good
excuse not to use CPAN modules. Sure XS code requires access to a
compiler on the target machine but using CPAN modules does not require
you to install in system directories.

Sinan

--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Jim Gibson

unread,
May 22, 2008, 3:06:40 PM5/22/08
to
In article
<7d5cdff9-da1a-4877...@k37g2000hsf.googlegroups.com>,
nolo contendere <simon...@fmr.com> wrote:

I happen to agree that the FAQ on this topic is a little deficient by
only recommending a Tie'd solution. Tie is one of those things that I
avoid because I don't fully understand it, and I suspect it may cause
problems.

I would suggest two other alternatives for maintaining the order of
hash keys that might be simpler than your suggestion:

1. Prepend a fixed number of characters to each key, and strip them off
when retrieving the key (e.g. '00key1', '01key2', etc.).

2. Save the keys in a separate array and iterate over the array. This
is the simplest approach at the expense of duplicate storage of the
keys.

--
Jim Gibson

Uri Guttman

unread,
May 22, 2008, 3:12:10 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

>> an example of what??

nc> An example of my initial proposal of a substitute for Tie::IxHash that
nc> maintains the insertion order of elements into a hash.

>>   nc> my ( $tree1, $tree2 ) = ( '', '' );
>>
>> useless initialization.
>>

nc> yeah, if i called the script with no options, i got the uninitialized
nc> value warning.

you should check if they were set after the GetOptions call. if not,
then you die or set them to a default like . (doable in GetOptions).

>>   nc> GetOptions( "tree1=s" => \$tree1,
>>   nc>             "tree2=s" => \$tree2 ) or die $USAGE;
>>

>>   nc> find( \&get_tgt_info, $tree2 );
>>
>>   nc> for my $key ( sort { $src{$a}{sortval} <=> $src{$b}{sortval} } keys
>>   nc> %src ) {
>>
>> now that i see this is a dirtree compare (btw, unix diff -r can do this
>> for you), i still don't see the need to keep find order. the order of
>> entries in directories is not stable as it depends on the file system,
>> any deletions and inserts and how file::find reads the dirs. you have a
>> single level hash of each tree so where would insert order matter?
>>

nc> diff -r looks great! wasn't aware of that one...
nc> Well, the need isn't there. But this was just supposed to illustrate
nc> how i set the sortval value, as you asked earlier in the thread.

but you haven't answered my latest question, why do you need insert
order here? i pointed out that dir scanning order is meaniningless (it
isn't stable or predictable) and you use a single level hash too. why
waste all the effort on maintaining a sort order if you never needed it
to begin with?

>>   nc> sub get_src_info {
>>
>>   nc>     my ( $mode, $uid, $gid, $size, $mtime ) =
>>   nc> (stat( $File::Find::name ))[2,4,5,7,9];
>>   nc>     $src{$File::Find::name} = {
>>   nc>         mode    => $mode,
>>   nc>         uid     => $uid,
>>   nc>         gid     => $gid,
>>   nc>         size    => $size,
>>   nc>         mtime   => $mtime,
>>   nc>         sortval => ++$ctr1,
>>
>> gack, a global counter used by a callback!
>>

nc> should i have used a closure instead?

if you wanted it to be a proper module, yes. this is another advantage
of the cpan module, it can be used multiple times in a program whereas
your example is hardwired to one use. also the cpan module is meant to
be tied to a hash which is very different than your solution.

>> this isn't even usable as a module. you hardwired it to be a standalone
>> script. so how would posting this complex code be a useful FAQ answer?
>>

nc> my code wasn't meant to be a submission for review, but rather a
nc> conceptual demonstration.

still, i review code when i see it and feel like it. :)

nc> cut/paste module code into my scripts? that doesn't seem like an
nc> elegant solution, and what about pre-reqs?

why not? better than cutting/pasting your own code which may nor work
nor do all you want it to do. it may be a last resort and i don't
recommend it but it works. as for prereqs, install them the same way. if
you need code from cpan then you do what is necessary. if you can write
your own code then do it. i just say that your example is not a good one
as the ordering isn't even needed and your code is too hardwired for one
usage to be a good example. the FAQ needs quality answers and the cpan
module is one.

nolo contendere

unread,
May 22, 2008, 3:24:53 PM5/22/08
to
On May 22, 3:03 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:

> nolo contendere <simon.c...@fmr.com> wrote innews:c488ee9d-a5e5-4fd7...@2g2000hsn.googlegroups.com:
>
> > On May 22, 2:24 pm, Uri Guttman <u...@stemsystems.com> wrote:
> >> installing your own code is not any easier/harder than installing
> >> pure perl modules on production machines. you can even cut/paste the
> >> module code into your scripts if that is the only way. as i said,
> >> there are almost no good excuses for not using a module.
>
> > cut/paste module code into my scripts? that doesn't seem like an
> > elegant solution, and what about pre-reqs?
>
> It is elegant enough. You can even automate the process of constructing
> the script to ship. Do the same for all modules which your script
> depends on.
>
> Personally, I would much prefer packaging a lib subdirectory with the
> script.

Good point.

nolo contendere

unread,
May 22, 2008, 3:28:36 PM5/22/08
to
On May 22, 3:06 pm, Jim Gibson <jimsgib...@gmail.com> wrote:
> In article
> <7d5cdff9-da1a-4877-932e-68d6583be...@k37g2000hsf.googlegroups.com>,

I've done something similar before, creating a composite key, with a
delimiter (like '|'), then I split the key and sort the sortval
component of the composite key. Problem here though is that you lose
the 'uniquifying' property of hashes for the original key if you are
attaching a unique counter to the original key.

>
> 2. Save the keys in a separate array and iterate over the array. This
> is the simplest approach at the expense of duplicate storage of the
> keys.
>

I like this solution. A little redundancy, but not much can go wrong
with it. Thanks!

nolo contendere

unread,
May 22, 2008, 3:34:30 PM5/22/08
to
On May 22, 3:12 pm, Uri Guttman <u...@stemsystems.com> wrote:

> >>>>> "nc" == nolo contendere <simon.c...@fmr.com> writes:
>
>   >> an example of what??
>
>   nc> An example of my initial proposal of a substitute for Tie::IxHash that
>   nc> maintains the insertion order of elements into a hash.
>
>   >>   nc> my ( $tree1, $tree2 ) = ( '', '' );
>   >>
>   >> useless initialization.
>   >>
>
>   nc> yeah, if i called the script with no options, i got the uninitialized
>   nc> value warning.
>
> you should check if they were set after the GetOptions call. if not,
> then you die or set them to a default like . (doable in GetOptions).
>

True.

>   >>   nc> GetOptions( "tree1=s" => \$tree1,
>   >>   nc>             "tree2=s" => \$tree2 ) or die $USAGE;
>   >>
>   >>   nc> find( \&get_tgt_info, $tree2 );
>   >>
>   >>   nc> for my $key ( sort { $src{$a}{sortval} <=> $src{$b}{sortval} } keys
>   >>   nc> %src ) {
>   >>
>   >> now that i see this is a dirtree compare (btw, unix diff -r can do this
>   >> for you), i still don't see the need to keep find order. the order of
>   >> entries in directories is not stable as it depends on the file system,
>   >> any deletions and inserts and how file::find reads the dirs. you have a
>   >> single level hash of each tree so where would insert order matter?
>   >>
>
>   nc> diff -r looks great! wasn't aware of that one...
>   nc> Well, the need isn't there. But this was just supposed to illustrate
>   nc> how i set the sortval value, as you asked earlier in the thread.
>
> but you haven't answered my latest question, why do you need insert
> order here? i pointed out that dir scanning order is meaniningless (it
> isn't stable or predictable) and you use a single level hash too. why
> waste all the effort on maintaining a sort order if you never needed it
> to begin with?
>

The need isn't there, as I said before. I thought I might have a use
for it, then discovered I didn't, but left the "sorting" code there
just in case. But, if, as you say, it's unreliable, it's kind of moot.

>   >>   nc> sub get_src_info {
>   >>
>   >>   nc>     my ( $mode, $uid, $gid, $size, $mtime ) =
>   >>   nc> (stat( $File::Find::name ))[2,4,5,7,9];
>   >>   nc>     $src{$File::Find::name} = {
>   >>   nc>         mode    => $mode,
>   >>   nc>         uid     => $uid,
>   >>   nc>         gid     => $gid,
>   >>   nc>         size    => $size,
>   >>   nc>         mtime   => $mtime,
>   >>   nc>         sortval => ++$ctr1,
>   >>
>   >> gack, a global counter used by a callback!
>   >>
>
>   nc> should i have used a closure instead?
>
> if you wanted it to be a proper module, yes. this is another advantage
> of the cpan module, it can be used multiple times in a program whereas
> your example is hardwired to one use. also the cpan module is meant to
> be tied to a hash which is very different than your solution.
>
>   >> this isn't even usable as a module. you hardwired it to be a standalone
>   >> script. so how would posting this complex code be a useful FAQ answer?
>   >>
>
>   nc> my code wasn't meant to be a submission for review, but rather a
>   nc> conceptual demonstration.
>
> still, i review code when i see it and feel like it. :)
>

Guess I can't stop you :-).

>   nc> cut/paste module code into my scripts? that doesn't seem like an
>   nc> elegant solution, and what about pre-reqs?
>
> why not? better than cutting/pasting your own code which may nor work
> nor do all you want it to do. it may be a last resort and i don't
> recommend it but it works. as for prereqs, install them the same way. if
> you need code from cpan then you do what is necessary. if you can write
> your own code then do it. i just say that your example is not a good one
> as the ordering isn't even needed and your code is too hardwired for one
> usage to be a good example. the FAQ needs quality answers and the cpan
> module is one.
>

I hadn't thought of packaging up the code in a lib dir, as Sinan
suggested. That's probably what you meant earlier, but I missed it.
Good point.

Uri Guttman

unread,
May 22, 2008, 3:34:38 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

>> 2. Save the keys in a separate array and iterate over the array. This
>> is the simplest approach at the expense of duplicate storage of the
>> keys.
>>

nc> I like this solution. A little redundancy, but not much can go wrong
nc> with it. Thanks!

that is pretty much what the tie module does IIRC. plenty can still go
wrong with it and that is that the 'order' is still affected by deletes
and later additions. if you repeatedly add and delete you can fill up
the array and waste lots of storage. accessing a large hash by sorting a
subkey is slow and has to be done again each time you modify the
hash. my point is that order and hash are not compatible. there will
always be some aspect that will be exposed to the coder or some
compromise made. a hash has no order for its keys and forcing some order
means you don't really have a hash anymore, just something with hashlike
behavior.

and you still haven't answered WHY you need ordering in the example
code. i don't see how you benefit from the dir tree scan order.

Jürgen Exner

unread,
May 22, 2008, 3:50:16 PM5/22/08
to
nolo contendere <simon...@fmr.com> wrote:
>However, regarding the problem of maintaining sort order of a hash,
>and the Tie::IxHash module, I have a question.

Maybe I am old-fashioned but to me 'sorted' and 'hash' is a
contradiction in terms. A hash is a (partial) mapping from string to
scalar and mappings do not have a sequence.
Arrays are different because their domain (natural numbers) does have a
natural order, therefore arrays inherit this order.

If you are trying to force an order on a hash then probably you are
using the wrong data structure in the first place and you would be
better off using an array.

Just my 2 cents.

jue

nolo contendere

unread,
May 22, 2008, 4:04:55 PM5/22/08
to
On May 22, 3:50 pm, Jürgen Exner <jurge...@hotmail.com> wrote:

I understand that in many cases people who bring up ordered hashes
would be better served by using another data structure, but it seems
that enough people have asked about this, and it is useful enough to
have a module written for it on CPAN, and be included in the FAQ. Now
before you argue that just because lots of people ask about it and it
is a module doesn't mean that it's useful, don't forget my last caveat
that it is in the FAQ. Lots of people ask about variable variable
names too (symrefs, whatever you want to call them), but they are much
more harmful than useful, and their use is discouraged by the majority
of knowledgeable Perl hackers out there, and there is no FAQ
encouraging the use of a module that manages symrefs.

Also, I don't really see the contradiction of the terms 'sorted' and
'hash'. Just because you associate a key with a value, why can't you
want those pairs ordered?

s...@netherlands.co

unread,
May 22, 2008, 4:22:28 PM5/22/08
to

Why is it so important to have a hash sorted?
What would a hash be necessary for then?

Wouldn't an array be better in that case?

nolo contendere

unread,
May 22, 2008, 4:26:57 PM5/22/08
to
On May 22, 4:22 pm, s...@netherlands.co wrote:

One thing that pops into my head is automatic real-time uniqueness of
elements, another is O(1) lookup.
It's the best of both arrays and hashes.

Uri Guttman

unread,
May 22, 2008, 5:06:02 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

nc> I understand that in many cases people who bring up ordered hashes
nc> would be better served by using another data structure, but it seems
nc> that enough people have asked about this, and it is useful enough to
nc> have a module written for it on CPAN, and be included in the FAQ. Now
nc> before you argue that just because lots of people ask about it and it
nc> is a module doesn't mean that it's useful, don't forget my last caveat
nc> that it is in the FAQ. Lots of people ask about variable variable
nc> names too (symrefs, whatever you want to call them), but they are much
nc> more harmful than useful, and their use is discouraged by the majority
nc> of knowledgeable Perl hackers out there, and there is no FAQ
nc> encouraging the use of a module that manages symrefs.

nc> Also, I don't really see the contradiction of the terms 'sorted' and
nc> 'hash'. Just because you associate a key with a value, why can't you
nc> want those pairs ordered?

in another post i explained why sort and hash aren't compatible. you can
hack something together that sorta works but it will always have some
flaw or problem. you may not even run into those problems with any
particular use but when you scale things up stuff will break. hashes
autoresize as needed with insert/delete. arrays do not and neither do
counters. so scaling can cause issues. sorting a special subkey means
you have more overhead in hash accesses by the 'order'. normal key
access with keys() is O(1) so that is another change. it may not
matter. it may if you expect hashes to be very fast. order hashes are
not hashes but a subtly different beast. that is the point we are trying
make. sure you can do it but don't expect it to be a pure hash
anymore. caveat coder.

Uri Guttman

unread,
May 22, 2008, 5:09:23 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@fmr.com> writes:

nc> One thing that pops into my head is automatic real-time uniqueness of
nc> elements, another is O(1) lookup.
nc> It's the best of both arrays and hashes.

but you don't get that. read my other post. you actually get worse than
either of those if you scale and do lots of adds/deletes.

nolo contendere

unread,
May 22, 2008, 5:50:23 PM5/22/08
to
On May 22, 5:06 pm, Uri Guttman <u...@stemsystems.com> wrote:

Caveat Coder? Is that an insult?

J. Gleixner

unread,
May 22, 2008, 6:18:42 PM5/22/08
to
nolo contendere wrote:
[...]

> Caveat Coder? Is that an insult?

Someone using a Latin term for their identity that doesn't know what
caveat means?

Oh, and no that's not an insult either. :-P

Sherman Pendley

unread,
May 22, 2008, 6:38:10 PM5/22/08
to
"A. Sinan Unur" <1u...@llenroc.ude.invalid> writes:

> In any case, the point Uri is making is simple. There is really no good
> excuse not to use CPAN modules.

There is, on the other hand, a seemingly endless supply of bad excuses
for not doing so. :-)

sherm--

--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

nolo contendere

unread,
May 22, 2008, 6:56:07 PM5/22/08
to
On May 22, 6:18 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
wrote:

I know what caveat means, I was more shocked at the name-calling. I
asked what I thought was a reasonable question, cited the FAQ, posted
working code, answered questions, and yet got insulted. Wouldn't you
be a little incredulous? It was completely uncalled for, not
constructive, and downright rude. And I HAVE posted help here, so
according to Uri's logic, DO get a say.

Jeez!

szr

unread,
May 22, 2008, 7:02:43 PM5/22/08
to
Jürgen Exner wrote:
> nolo contendere <simon...@fmr.com> wrote:
>> However, regarding the problem of maintaining sort order of a hash,
>> and the Tie::IxHash module, I have a question.
[...]

> If you are trying to force an order on a hash then probably you are
> using the wrong data structure in the first place and you would be
> better off using an array.

If one wants to use a hash && maintain order, just keep an array of
(desired) @fields and use a construct along the lines of:

foreach my $data (@hash{@fields}) { ... }
foreach my $field (sort @fields) { my $data = $hash{$field}; ... }

Or:

my @data = @hash{@fields};

To sort in a particular order, one need only sort the @fields array.

--
szr


szr

unread,
May 22, 2008, 7:05:57 PM5/22/08
to
s...@netherlands.co wrote:
> On Thu, 22 May 2008 10:20:17 -0700 (PDT), nolo contendere
> <simon...@fmr.com> wrote:
[...]

>> Is there something wrong with my reasoning? Would this solution be a
>> good candidate for addition into the FAQ How can I make my hash
>> remember the order I put elements
>> into it?
>>
>
> Why is it so important to have a hash sorted?
> What would a hash be necessary for then?
>
> Wouldn't an array be better in that case?

You don't sort the hash. You sort an array containing fields/keys from
the hash:

my @fields = sort { ... } keys %hash;
...
foreach my $field (@fields) {
my $item = $hash{$field}
...
}

--
szr


nolo contendere

unread,
May 22, 2008, 7:20:22 PM5/22/08
to

you could just do:

for my $key ( sort keys %hash ) {
# do something with $hash{$key}
}

...but i was talking about maintaining the insertion order, not
sorting by alpha or number

A. Sinan Unur

unread,
May 22, 2008, 7:35:38 PM5/22/08
to
nolo contendere <simon...@gmail.com> wrote in news:acf382ba-21a0-40b0-
9db5-b0a...@m73g2000hsh.googlegroups.com:

> On May 22, 6:18 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
> wrote:
>> nolo contendere wrote:
>>
>> [...]
>>
>> > Caveat Coder? Is that an insult?
>>
>> Someone using a Latin term for their identity that doesn't know what
>> caveat means?
>>
>> Oh, and no that's not an insult either. :-P
>
> I know what caveat means, I was more shocked at the name-calling. I
> asked what I thought was a reasonable question, cited the FAQ, posted
> working code, answered questions, and yet got insulted. Wouldn't you
> be a little incredulous? It was completely uncalled for, not
> constructive, and downright rude.

I am sorry, I fail to see the insult. The word caveat is simply latin
for "beware". So, "caveat coder" means is a warning to the coder (in
this case you) that things are not as straightforward as they seem and
some of your statements are not necessarily valid.

Looking at it in context, Uri said:

sure you can do it but don't expect it to
be a pure hash anymore. caveat coder.

There is no insult. None. Zero. Zilch. Nil.
http://en.wiktionary.org/wiki/s%C4%B1f%C4%B1r.

> And I HAVE posted help here, so according to Uri's logic,
> DO get a say.
>
> Jeez!

Jeez indeed. You do have a say. You made some claims of dubious validity
and posted some code of dubious utility. These issues were discussed
rigorously. I cannot find anything in this thread that could be
construed as an insult to you.

nolo contendere

unread,
May 22, 2008, 7:49:27 PM5/22/08
to
On May 22, 7:35 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
> nolo contendere <simon.c...@gmail.com> wrote in news:acf382ba-21a0-40b0-
> 9db5-b0a809f5d...@m73g2000hsh.googlegroups.com:

>
>
>
> > On May 22, 6:18 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
> > wrote:
> >> nolo contendere wrote:
>
> >> [...]
>
> >> > Caveat Coder? Is that an insult?
>
> >> Someone using a Latin term for their identity that doesn't know what
> >> caveat means?
>
> >> Oh, and no that's not an insult either. :-P
>
> > I know what caveat means, I was more shocked at the name-calling. I
> > asked what I thought was a reasonable question, cited the FAQ, posted
> > working code, answered questions, and yet got insulted. Wouldn't you
> > be a little incredulous? It was completely uncalled for, not
> > constructive, and downright rude.
>
> I am sorry, I fail to see the insult. The word caveat is simply latin
> for "beware". So, "caveat coder" means is a warning to the coder (in
> this case you) that things are not as straightforward as they seem and
> some of your statements are not necessarily valid.
>
> Looking at it in context, Uri said:
>
>         sure you can do it but don't expect it to
>         be a pure hash anymore. caveat coder.
>
> There is no insult. None. Zero. Zilch. Nil.http://en.wiktionary.org/wiki/s%C4%B1f%C4%B1r.

>
> > And I HAVE posted help here, so according to Uri's logic,
> > DO get a say.
>
> > Jeez!
>
> Jeez indeed. You do have a say. You made some claims of dubious validity
> and posted some code of dubious utility. These issues were discussed
> rigorously. I cannot find anything in this thread that could be
> construed as an insult to you.
>

Ahh, I do think you're right. I interpreted 'caveat' as an adjective
(with a pejorative undertone), when I think Uri may have meant it as
an imperative verb. Apologies for being overly sensitive, Uri.

s...@netherlands.co

unread,
May 22, 2008, 8:09:03 PM5/22/08
to

I see you can pass parameters as arrays and read it as a hash.
I see you can pass parameters as hash and read it as array.

Is that true? If so, why is it important to sort hash?

A. Sinan Unur

unread,
May 22, 2008, 8:09:13 PM5/22/08
to
nolo contendere <simon...@gmail.com> wrote in
news:bd0a63b4-2604-4259...@p25g2000hsf.googlegroups.com:

> On May 22, 7:35 pm, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> nolo contendere <simon.c...@gmail.com> wrote in
>> news:acf382ba-21a0-40b0-
>> 9db5-b0a809f5d...@m73g2000hsh.googlegroups.com:
>>
>>
>>
>> > On May 22, 6:18 pm, "J. Gleixner"
>> > <glex_no-s...@qwest-spam-no.invalid>
>
>> > wrote:
>> >> nolo contendere wrote:
>>
>> >> [...]
>>
>> >> > Caveat Coder? Is that an insult?

...

>> > I know what caveat means, I was more shocked at the name-calling.

...

>> I am sorry, I fail to see the insult. The word caveat is simply latin
>> for "beware". So, "caveat coder" means is a warning to the coder (in
>> this case you) that things are not as straightforward as they seem
>> and some of your statements are not necessarily valid.
>>
>> Looking at it in context, Uri said:
>>
>>         sure you can do it but don't expect it to
>>         be a pure hash anymore. caveat coder.

...

> Ahh, I do think you're right. I interpreted 'caveat' as an adjective
> (with a pejorative undertone),

Say you tell me you are going to go to Turkey and you want to go
shopping for rugs. My response to you would be "caveat emptor". That is
a warning to you to be cautious in your dealings rather than an insult.

As far as "ordered" hashes are concerned, I feel they only slightly less
dangerous than Turkish carpet salesmen. ;-)

> when I think Uri may have meant it as an imperative verb.

That is the only apparent interpretation to me.

See, when my code is reviewed by Uri, I feel good because I know I have
just received advice that I could not have found in my immediate circle.

The same valuation applies to the comments of many others here even when
they point out embarassing errors I have made.

> Apologies for being overly sensitive, Uri.

I would like to suggest that you avoid being so sensitive to serious
criticism of your code and ideas and distinguish better among those who
provide useful input from the rest of the crowd. ;-)

Michael Carman

unread,
May 22, 2008, 9:44:34 PM5/22/08
to
Jim Gibson wrote:
>
> Tie is one of those things that I avoid because I don't fully
> understand it

The only thing you need to know about tie() to use Tie:: modules is that
tied variables behaves exactly like a normal variables but have some
extra functionality provided by the class. In the case of Tie::IxHash
that functionality is that keys(), values() and each() return data in
insertion order instead of hash order.

Obviously you need to know more if you want to write your own tie
classes. I suggest doing so. It's probably not a tool that you'll use
often (at least I don't) but when it's extremely useful when you do.

> I suspect it may cause problems.

What is this, superstition? What kind of problems? Tied variables are
definitely slower than non-tied ones -- that's the price of the power --
but the performance penalty isn't significant in most cases. Other
problems are possible, but it's always possible to have bugs in code.
There's nothing special about tie() in that regard.

-mjc

Jürgen Exner

unread,
May 22, 2008, 10:33:14 PM5/22/08
to
nolo contendere <simon...@fmr.com> wrote:
[snipping many good points]

>Also, I don't really see the contradiction of the terms 'sorted' and
>'hash'. Just because you associate a key with a value, why can't you
>want those pairs ordered?

True, but I can just as well want to order some other complex data type,
be it employee records or customer orders or US presidents by age when
inaugurated. Or even just primitive data like a set of numbers. To get
any of those into a specific order I put them into a datastructure that
has a first and a next element and conveniently maybe last, previous,
n-th, etc.
If I want a ordered sequence of pairs then an AoA would be the right
data structure, just like maybe a AoH would be the right datastructure
for employee data and the customer orders where each hash represents one
employee record or order.

Hashes don't have those operations. There is no natural 'first' or
'next' element of a hash. Which one would that be? The one that was
added first? The one with the smallest alphanumerical key? The one with
the shortest key? The one with the smallest numerical value? There is
simply no meaningful definition of 'first element of a hash'.

Taking literally sorting a hash is about as meaningful as sorting the
the lenth() function. Which one is the next element after
length('Hello')? The question itself does not make sense. Just the same
way there is no way to say what is the next element after
$myhash{'Hello'}. In contrast the next element after $myarr[5] is
obviously $myarr[6].

Therefore if you want to create some ordering of hash elements it must
be done using an additional datastructue, be it explicit (like your
suggestion with the explicit sortval) or implicit (like Tie::IxHash). In
either case you are creating an additional datastructure on top of the
hash to support that ordering. And therefore it is not a simple hash any
longer.
And that in turn is why I keep saying sorting and hash is a
contradiction. It is conceptually impossible without an additional data
structure that supports 'first' and 'next'.

jue

Uri Guttman

unread,
May 22, 2008, 10:39:09 PM5/22/08
to
>>>>> "nc" == nolo contendere <simon...@gmail.com> writes:

>> Looking at it in context, Uri said:
>>
>>         sure you can do it but don't expect it to
>>         be a pure hash anymore. caveat coder.
>>
>> There is no insult. None. Zero. Zilch. Nil.http://en.wiktionary.org/wiki/s%C4%B1f%C4%B1r.

nc> Ahh, I do think you're right. I interpreted 'caveat' as an adjective
nc> (with a pejorative undertone), when I think Uri may have meant it as
nc> an imperative verb. Apologies for being overly sensitive, Uri.

you never heard the phrase caveat emptor (buyer beware)? i just made a
variation i have seen before. like sinan said it just means if you do
make an ordered hash, you must beware of the possible issues. nothing
insulting was meant nor said.

and this thread has been perfectly normal. real perl issues discussed
calmly and in some detail. you haven't done anything wrong other than
keeping on the idea that sorted hashes are a good thing. :)

Jürgen Exner

unread,
May 22, 2008, 10:39:53 PM5/22/08
to

Why would a warning "coder beware" be an insult?

jue

nolo contendere

unread,
May 22, 2008, 10:51:17 PM5/22/08
to
On May 22, 10:39 pm, Jürgen Exner <jurge...@hotmail.com> wrote:

> nolo contendere <simon.c...@fmr.com> wrote:
> >On May 22, 5:06 pm, Uri Guttman <u...@stemsystems.com> wrote:
> >>  caveat coder.
>
> >Caveat Coder? Is that an insult?
>
> Why would a warning "coder beware" be an insult?

It isn't. My initial (erroneous) interpretation was that Uri was
calling me a "Caveat Coder"--a term I had never heard before, but took
to mean that I made up excuses for employing bad coding practices and
called them good coding practices. I misinterpreted the context.
Dealing with Perl, the irony is not lost on me. :-\. Fortunately,
everyone seems to be good-natured and forgiving about my faux pas (me
making an ass out of myself).

szr

unread,
May 23, 2008, 1:33:23 AM5/23/08
to

Ah. Well you could still use the @fields array. Just push each key you
insert but that isn't too intuitive. I believe this is why Tie::IxHash
was invented :-)

--
szr


Dr.Ruud

unread,
May 23, 2008, 4:35:32 AM5/23/08
to
Jürgen Exner schreef:

> Which one is the next element after length('Hello')?

Well, length('World') of course.

--
Affijn, Ruud

"Gewoon is een tijger."

Jürgen Exner

unread,
May 23, 2008, 8:32:58 AM5/23/08
to
"Dr.Ruud" <rvtol...@isolution.nl> wrote:
>Jürgen Exner schreef:
>> Which one is the next element after length('Hello')?
>
>Well, length('World') of course.

:-))

jue

comp.llang.perl.moderated

unread,
May 24, 2008, 7:12:24 AM5/24/08
to
On May 22, 12:50 pm, Jürgen Exner <jurge...@hotmail.com> wrote:
I agree mostly. But, even stretching the hash's natural model a bit,
an ordered hash is a convenient amenity at times. Also, with a big
array and lots of lookups, even a slow tied hash could be faster.

--
Charles DeRykus

Ben Morrow

unread,
May 24, 2008, 12:42:35 PM5/24/08
to

Quoth "comp.llang.perl.moderated" <c...@blv-sam-01.ca.boeing.com>:
>
[re: Tie::IxHash]

> I agree mostly. But, even stretching the hash's natural model a bit,
> an ordered hash is a convenient amenity at times. Also, with a big
> array and lots of lookups, even a slow tied hash could be faster.

Faster than...? Just an array, or an array and a hash in parallel? All
Tie::IxHash does for you is maintain a parallel hash and array, so the
tie overhead (not insignificant, if this is a performance-critical part
of the application) is on top of that.

Of course, attempting to emulate a hash by iterating through an array is
just daft... :)

Ben

--
The cosmos, at best, is like a rubbish heap scattered at random.
Heraclitus
b...@morrow.me.uk

comp.llang.perl.moderated

unread,
May 25, 2008, 12:42:32 AM5/25/08
to
On May 24, 9:42 am, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth "comp.llang.perl.moderated" <c...@blv-sam-01.ca.boeing.com>:
>
>
>
> [re: Tie::IxHash]
> > I agree mostly. But, even stretching the hash's natural model a bit,
> > an ordered hash is a convenient amenity at times. Also, with a big
> > array and lots of lookups, even a slow tied hash could be faster.
>
> Faster than...? Just an array, or an array and a hash in parallel? All
> Tie::IxHash does for you is maintain a parallel hash and array, so the
> tie overhead (not insignificant, if this is a performance-critical part
> of the application) is on top of that.
>
> Of course, attempting to emulate a hash by iterating through an array is
> just daft... :)
>

Quite. I wasn't sure if "hash purity" was turning down a daft
corridor or not :)
I have to admit I was once bitten by the
significant overhead of Tie::IxHash, but,
there are times when you need hash insertion order. Rolling your own
separate array in lieu
of Tie::IxHash may be worth looking at if you
need to squeeze out more speed though.

--
Charles DeRykus

--
Charles DeRykus

Martijn Lievaart

unread,
May 25, 2008, 5:03:53 PM5/25/08
to
On Thu, 22 May 2008 19:50:16 +0000, Jürgen Exner wrote:

> nolo contendere <simon...@fmr.com> wrote:
>>However, regarding the problem of maintaining sort order of a hash, and
>>the Tie::IxHash module, I have a question.
>
> Maybe I am old-fashioned but to me 'sorted' and 'hash' is a
> contradiction in terms. A hash is a (partial) mapping from string to
> scalar and mappings do not have a sequence. Arrays are different because
> their domain (natural numbers) does have a natural order, therefore
> arrays inherit this order.

Perl has a relatively limited set of data structures. As far as mapping
structures, Perl only provides a hash.

There are many applications for a mapping structure that maintains order
(although insertion order is better done by an array).

In C++ you can use a map<>. One of the arguments is the sorting order (a
less-than routine, taking two keys). In fact, in standard C++, a Perl
like hash structure is missing, but is provided by f.i. the boost library
and many vendors. This is widely seen as a deficiency in the standard.

Similarly, although Perl can simulate most data structures with ease and
elegance (like using a hash as a set), an ordered map is something Perl
could very well use.

A real world example? Storing IP(v4 only) networks and checking if an IPA
is contained in a network stored in the map. Trivial with a sorted map
(store networks as 5 byte integers, 4 for the network address 1 for the
mask, search first key for less-than-or-equal ipa/32, check if ipa in
network found).

Probably not to difficult to do in Perl as well, but I cannot come up
directly with an easy and efficient (both in time and memory) way like
the above.

M4

0 new messages