regex to clean path

Yuri Shtil

Oct 18, 2004, 3:34:40 PM

I need to clean the PATH variable from redundant entries:

a:b:c:c:c:s:a
should become a:b:v:s.

I have written a few line script that does it using hash.

Does anybody know an elegant oneliner using regex?

Jon Ericson

Oct 18, 2004, 4:39:53 PM

"Yuri Shtil" <ysh...@synopsys.com> writes:

> I need to clean the PATH variable from redundant entries:
>
> a:b:c:c:c:s:a
> should become a:b:v:s.

I think you meant a:b:c:s.

> I have written a few line script that does it using hash.
>
> Does anybody know an elegant oneliner using regex?

$ perl -e '%h = map {$_ => 1} split(/:/, "a:b:c:c:c:s:a");\
print join(":", sort keys %h), "\n"'

Finding the regex is left as an exercise for the reader. ;-) Sorting
alphabetically might not be what you want.

Seriously, the regex language probably is *not* the best choice for a
problem like this. What's wrong with using a hash?

Jon

Michele Dondi

Oct 18, 2004, 5:03:27 PM

On Mon, 18 Oct 2004 12:34:40 -0700, "Yuri Shtil" <ysh...@synopsys.com>
wrote:

>I need to clean the PATH variable from redundant entries:
>
>a:b:c:c:c:s:a
>should become a:b:v:s.
>
>I have written a few line script that does it using hash.

It should/could be a one-line script except possibly for the hash
declaration. Anyway, what's wrong with it?

>Does anybody know an elegant oneliner using regex?

It's not clear what you mean by "using regex": assuming you refer to a
substitution operator... I just wouldn't do it, but if i really have
to, then

s/(^|:)([^:]*)/$seen{$2}++?'':$1.$2/ge;

Of course if you use this in anything that is not a simple one liner
and C<use strict> as one always should, then you must also have
C<my %seen>.
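
For reference, here is a minimal sketch of that substitution under C<use strict>, wrapped in a helper sub. The sub name dedupe_subst is invented for illustration and does not come from the post.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented here): drop repeated ':'-separated
# entries in a single s///ge pass, keeping the first occurrence of each.
sub dedupe_subst {
    my ($path) = @_;
    my %seen;    # needed under strict, as noted above
    $path =~ s/(^|:)([^:]*)/$seen{$2}++ ? '' : $1 . $2/ge;
    return $path;
}

print dedupe_subst('a:b:c:c:c:s:a'), "\n";    # a:b:c:s
```

Because a repeated entry is replaced together with the colon in front of it, no stray separators are left behind.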

HTH,
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Tad McClellan

Oct 18, 2004, 5:11:36 PM

Yuri Shtil <ysh...@synopsys.com> wrote:
> I need to clean the PATH variable from redundant entries:
>
> a:b:c:c:c:s:a
> should become a:b:v:s.
>
> I have written a few line script that does it using hash.

If you had shown it to us, we could have suggested improvements.

But you didn't, so we can't.

> Does anybody know an elegant oneliner using regex?

A regex is not likely the best tool for this job.

I'd just do it using a hash:

$_ = join ':', grep !$seen{$_}++, split /:/;
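
A minimal sketch of the same idiom as a standalone sub, assuming strict mode. The name dedupe_path is made up here; the logic is the one-liner above with a lexical %seen.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented here): keep the first occurrence of
# each entry. grep returns the matching elements in list order, so the
# original precedence of the path entries is preserved.
sub dedupe_path {
    my ($path) = @_;
    my %seen;
    return join ':', grep { !$seen{$_}++ } split /:/, $path;
}

print dedupe_path('/usr/local/bin:/bin:/usr/bin:/bin'), "\n";
# /usr/local/bin:/bin:/usr/bin
```

A version based on sorted hash keys would move /bin to the front here, which changes which executable is found first.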

--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas

parv

Oct 18, 2004, 5:45:22 PM

in message <10n86m9...@corp.supernews.com>,
wrote Yuri Shtil ...

No regex, just split(); on top of that, in more than one line ...

#!/usr/local/bin/perl

use warnings;
use strict;

my @paths = @ARGV;
printf "Unordered: %s\n\nOrdered: %s\n"
  , ${ make_path( @paths ) }
  , ${ make_path_ordered( @paths ) }
  ;

# Make path string from given list of array references or strings
sub make_path
{ my @paths = @_;

  my %uniq;
  map $uniq{$_} = undef , @{ split_path( [@paths] ) };
  keep_usable(\%uniq);

  return delimit_path( [ keys %uniq ] ) ;
}

# Make ordered path -- as indicated by the first position of each unique
# path -- string from given list of array references or just plain
# strings
sub make_path_ordered
{ my @paths = @_;

  my %uniq;
  { my $i = 0;
    map $uniq{$_} = !exists $uniq{$_} ? $i++ : $uniq{$_}
      , @{ split_path( [ @paths ]) };
  }
  keep_usable(\%uniq);

  return
    delimit_path
    ( [ map $_->[0]
        , sort { $a->[1] <=> $b->[1] }
            map [$_ , $uniq{$_} ] , keys %uniq
      ]
    ) ;
}

# Make ':' delimited string from given path array reference
sub delimit_path
{ my ($paths) = @_;
  return \join ':' , @{$paths};
}

# Input is a list of paths, each element being an array reference or a
# string
sub split_path
{ my ($paths) = @_;
  return
    [ map split( /[:\s]/ , ref $_ ? @$_ : $_ ) , @{$paths} ];
}

# Keep only read/execute-able directories from the keys of given hash
# reference
sub keep_usable
{ my ($hash) = @_;
  foreach ( keys %{$hash} )
  { delete $hash->{$_} unless -d $_ || -x $_;
  }
}

__END__

- parv

--
As nice it is to receive personal mail, too much sweetness causes
tooth decay. Unless you have burning desire to contact me, do not do
away w/ WhereElse in the address for private communication.

Uri Guttman

Oct 18, 2004, 6:49:57 PM

>>>>> "TM" == Tad McClellan <ta...@augustmail.com> writes:

TM> Yuri Shtil <ysh...@synopsys.com> wrote:
>> I need to clean the PATH variable from redundant entries:
>>
>> a:b:c:c:c:s:a
>> should become a:b:v:s.
>>
>> I have written a few line script that does it using hash.

>> Does anybody know an elegant oneliner using regex?

TM> A regex is not likely the best tool for this job.

while i agree a hash is the best way it does need some ordering control
as this is a path. so here is a regex solution.

echo 'a:b:b:c:d:c:s:a' | perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'
a:b:c:d:s

and that isn't fully tested but it looks ok so far. it could use an
improvement to remove the 1 while part but i leave that as an exercise

uri

Abigail

Oct 18, 2004, 8:20:52 PM

Uri Guttman (ugut...@athenahealth.com) wrote on MMMMLXVI September
MCMXCIII in <URL:news:m3lle3m...@linux.local>:

`' >>>>> "TM" == Tad McClellan <ta...@augustmail.com> writes:
`'
`' TM> Yuri Shtil <ysh...@synopsys.com> wrote:
`' >> I need to clean the PATH variable from redundant entries:
`' >>
`' >> a:b:c:c:c:s:a
`' >> should become a:b:v:s.
`' >>
`' >> I have written a few line script that does it using hash.
`'
`' >> Does anybody know an elegant oneliner using regex?
`'
`'
`' TM> A regex is not likely the best tool for this job.
`'
`' while i agree a hash is the best way it does need some ordering control
`' as this is a path. so here is a regex solution.

The solution Tad gave,

$_ = join ':', grep !$seen{$_}++, split /:/;

does preserve order.

`' echo 'a:b:b:c:d:c:s:a' | perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'

`' a:b:c:d:s
`'
`' and that isn't fully tested but it looks ok so far. it could use an
`' improvement to remove the 1 while part but i leave that as an exercise

It ain't ok though:

echo "/bin:/usr/bin:/usr/local/bin" |\
perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'
/binusr/binusr/local/bin
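
The failure has two causes: the (.*) may run across colons, and neither ([^:]+) nor \1 is tied to entry boundaries. The sketch below only inspects what the first match captures and then reruns the loop; the variable names are invented for illustration.

```perl
use strict;
use warnings;

my $path = '/bin:/usr/bin:/usr/local/bin';

# Same pattern as in the one-liner, used here only to look at the captures.
# In list context a successful match returns ($1, $2).
my ($entry, $middle) = $path =~ /([^:]+)(.*):\1/;
print "first capture:  '$entry'\n";     # '/'
print "second capture: '$middle'\n";    # 'bin:/usr/bin'

# The loop from the one-liner, applied to a copy of the input.
my $mangled = $path;
1 while $mangled =~ s/([^:]+)(.*):\1/$1$2/g;
print "$mangled\n";                     # /binusr/binusr/local/bin
```

The engine backtracks the first group down to a bare "/" and then matches the ":/" that begins a later entry, so a separator is deleted even though no entry is repeated.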

Abigail
--
perl -le 's[$,][join$,,(split$,,($!=85))[(q[0006143730380126152532042307].
q[41342211132019313505])=~m[..]g]]e and y[yIbp][HJkP] and print'

Anno Siegel

Oct 19, 2004, 5:46:44 AM

Jon Ericson <Jon.E...@jpl.nasa.gov> wrote in comp.lang.perl.misc:

> "Yuri Shtil" <ysh...@synopsys.com> writes:
>
> > I need to clean the PATH variable from redundant entries:
> >
> > a:b:c:c:c:s:a
> > should become a:b:v:s.
>
> I think you meant a:b:c:s.
>
> > I have written a few line script that does it using hash.
> >
> > Does anybody know an elegant oneliner using regex?
>
> $ perl -e '%h = map {$_ => 1} split(/:/, "a:b:c:c:c:s:a");\
> print join(":", sort keys %h), "\n"'
>
> Finding the regex is left as an exercise for the reader. ;-) Sorting
> alphabetically might not be what you want.

Indeed. The sequence of path elements is significant, so they should
appear in the cleaned-up path in the same order they had in the original.

Anno

Uri Guttman

Oct 19, 2004, 3:00:07 PM

>>>>> "A" == Abigail <abi...@abigail.nl> writes:
A> Uri Guttman (ugut...@athenahealth.com) wrote on MMMMLXVI September
A> MCMXCIII in <URL:news:m3lle3m...@linux.local>:
A> The solution Tad gave,

A> $_ = join ':', grep !$seen{$_}++, split /:/;

A> does preserve order.

true. i was just in a regex mood :).

A> `' and that isn't fully tested but it looks ok so far. it could use an
A> `' improvement to remove the 1 while part but i leave that as an exercise

A> It ain't ok though:

A> echo "/bin:/usr/bin:/usr/local/bin" |\
A> perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'
A> /binusr/binusr/local/bin

as i said, it needed more testing. this one passes your input:

echo '/bin:/usr/bin:/usr/local/bin:/bin' |
perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
/bin:/usr/bin:/usr/local/bin

echo 'a/b:/b:b:c:d:c:s:a/b' | perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
a/b:/b:c:d:s

seems much better but again, it could use more testing. :)

uri

Michele Dondi

Oct 19, 2004, 5:24:05 PM

On Mon, 18 Oct 2004 18:49:57 -0400, Uri Guttman
<ugut...@athenahealth.com> wrote:

> TM> A regex is not likely the best tool for this job.
>
>while i agree a hash is the best way it does need some ordering control
>as this is a path. so here is a regex solution.

There's an obvious way of using a hash yet preserving order as per
Tad's solution. It can also be cast into a substitution as per my
previous post.

>echo 'a:b:b:c:d:c:s:a' | perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'
>a:b:c:d:s

Is there anything wrong with my s///olution?

Michele Dondi

Oct 19, 2004, 5:40:51 PM

On Mon, 18 Oct 2004 16:45:22 -0500, parv <pa...@yahooWhereElse.com>
wrote:

>> Does anybody know an elegant oneliner using regex?
>
>No regex, just split(); on top of that, in more than one line ...

[snip code]

Huh?!? 66 lines of code? I'm not even looking into it, but... I
heartily hope that it does *much* more than the OP requested...
;-)

> printf "Unordered: %s\n\nOrdered: %s\n"
> , ${ make_path( @paths ) }
> , ${ make_path_ordered( @paths ) }
> ;

Hmmm, I couldn't help giving a peek at least into the first few
lines... however this makes me think I should be moderately glad I'm
refusing to read it all!

Abigail

Oct 19, 2004, 5:49:50 PM

Uri Guttman (ugut...@athenahealth.com) wrote on MMMMLXVII September
MCMXCIII in <URL:news:m3u0sql...@linux.local>:
$$ >>>>> "A" == Abigail <abi...@abigail.nl> writes:
$$ A> Uri Guttman (ugut...@athenahealth.com) wrote on MMMMLXVI September
$$ A> MCMXCIII in <URL:news:m3lle3m...@linux.local>:
$$ A> The solution Tad gave,
$$
$$ A> $_ = join ':', grep !$seen{$_}++, split /:/;
$$
$$ A> does preserve order.
$$
$$ true. i was just in a regex mood :).
$$
$$ A> `' and that isn't fully tested but it looks ok so far. it could use an
$$ A> `' improvement to remove the 1 while part but i leave that as an exercise
$$
$$ A> It ain't ok though:
$$
$$ A> echo "/bin:/usr/bin:/usr/local/bin" |\
$$ A> perl -lpe '1 while s/([^:]+)(.*):\1/$1$2/g'
$$ A> /binusr/binusr/local/bin
$$
$$ as i said, it needed more testing. this one passes your input:
$$
$$ echo '/bin:/usr/bin:/usr/local/bin:/bin' |
$$ perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
$$ /bin:/usr/bin:/usr/local/bin

Still fails:

$ echo /bin:/bin | perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
/bin:/bin
$ echo /bin:/bar:/bin/bar |\
perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
/bin:/bar/bar
$ echo 'poof:oof:of' | perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
poof:oof

Abigail
--
perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'

Uri Guttman

Oct 19, 2004, 6:09:56 PM

>>>>> "A" == Abigail <abi...@abigail.nl> writes:

A> $$
A> $$ echo '/bin:/usr/bin:/usr/local/bin:/bin' |
A> $$ perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
A> $$ /bin:/usr/bin:/usr/local/bin

A> Still fails:

A> $ echo /bin:/bin | perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
A> /bin:/bin
A> $ echo /bin:/bar:/bin/bar |\
A> perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
A> /bin:/bar/bar
A> $ echo 'poof:oof:of' | perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
A> poof:oof

bah!

why don't you fix it! :)

this fixes the first 2:

echo '/bin:/bin' | perl -lpe '1 while s/([^:]+)((:?:[^:]*)*):\1(:?=:|$)/$1$2/g'
/bin

echo '/bin:/bar:/bin/bar' | perl -lpe '1 while s/([^:]+)((:?:[^:]*)*):\1(:?=:|$)/$1$2/g'
/bin:/bar:/bin/bar

the poof:oof:of bug is trickier. i need something like perl6's : (a
commit) so it doesn't backtrack past a previous path.

uri

Abigail

Oct 19, 2004, 6:37:23 PM

Uri Guttman (ugut...@athenahealth.com) wrote on MMMMLXVII September
MCMXCIII in <URL:news:m3ekjul...@linux.local>:

`' >>>>> "A" == Abigail <abi...@abigail.nl> writes:
`'
`' A> $$
`' A> $$ echo '/bin:/usr/bin:/usr/local/bin:/bin' |
`' A> $$ perl -lpe '1 while s/([^:]+)((:?:[^:]*)+):\1/$1$2/g'
`' A> $$ /bin:/usr/bin:/usr/local/bin
`'
`' A> Still fails:
`'
`' bah!
`'
`' why don't you fix it! :)

I like to see you sweat ;-)

`'
`' the poof:oof:of bug is trickier. i need something like perl6's : (a
`' commit) so it doesn't backtrack past a previous path.

Well, Perl5 has (?> ).

I think the following works:

#!/usr/bin/perl

use strict;
use warnings;
no warnings qw /syntax/;

while (<DATA>) {
    chomp;
    print "$_ -> ";
    1 while s/(^|(?<=:))([^:]*)(?=:)(.*):\2(?=:|$)/$2$3/;
    print "$_\n";
}

__DATA__
a:b:c:c:c:s:a
/bin:/usr/bin:/usr/local/bin
/bin:/usr/bin:/usr/local/bin:/bin
/bin:/usr/bin:/usr/local/bin:/bin:
/bin:/usr/bin:/usr/local/bin:/bin:/sbin
/bin:/bin
/bin:/bar:/bin/bar
/flup:/bin:/usr/bin:/usr/local/bin
/flup:/bin:/usr/bin:/usr/local/bin:/bin
/flup:/bin:/bin
/flup:/bin:/bar:/bin/bar
poof:oof:of

a:b:c:c:c:s:a -> a:b:c:s
/bin:/usr/bin:/usr/local/bin -> /bin:/usr/bin:/usr/local/bin
/bin:/usr/bin:/usr/local/bin:/bin -> /bin:/usr/bin:/usr/local/bin
/bin:/usr/bin:/usr/local/bin:/bin: -> /bin:/usr/bin:/usr/local/bin:
/bin:/usr/bin:/usr/local/bin:/bin:/sbin -> /bin:/usr/bin:/usr/local/bin:/sbin
/bin:/bin -> /bin
/bin:/bar:/bin/bar -> /bin:/bar:/bin/bar
/flup:/bin:/usr/bin:/usr/local/bin -> /flup:/bin:/usr/bin:/usr/local/bin
/flup:/bin:/usr/bin:/usr/local/bin:/bin -> /flup:/bin:/usr/bin:/usr/local/bin
/flup:/bin:/bin -> /flup:/bin
/flup:/bin:/bar:/bin/bar -> /flup:/bin:/bar:/bin/bar
poof:oof:of -> poof:oof:of
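
For comparison, a sketch of the hash-based approach run on a few of the same inputs. It is not Abigail's code; the sub name is invented, and the -1 limit to split is an addition so that a trailing empty entry (the trailing ':' case above) is kept rather than silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented here). The -1 limit makes split keep
# trailing empty fields, so a trailing ':' survives the round trip.
sub dedupe_keep_empty {
    my ($path) = @_;
    my %seen;
    return join ':', grep { !$seen{$_}++ } split /:/, $path, -1;
}

for my $path ('/bin:/usr/bin:/usr/local/bin:/bin:',
              '/bin:/bar:/bin/bar',
              'poof:oof:of') {
    print "$path -> ", dedupe_keep_empty($path), "\n";
}
```

Since whole entries are compared as hash keys, the substring cases that tripped the earlier regexes cannot arise.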

Abigail
--
perl -swleprint -- -_=Just\ another\ Perl\ Hacker

parv

Oct 20, 2004, 4:03:45 PM

in message <el1bn0la63rr4td74...@4ax.com>,
wrote Michele Dondi ...

> On Mon, 18 Oct 2004 16:45:22 -0500, parv <pa...@yahooWhereElse.com>
> wrote:
>
>>> Does anybody know an elegant oneliner using regex?
>>
>>No regex, just split(); on top of that, in more than one line ...
>

> Huh?!? 66 lines of code? I'm not even looking into it, but... I

66. How did you get that number? (No, no need to reply.)

> heartily hope that it does *much* more than the OP requested...

If interested, read or try it yourself.

>> printf "Unordered: %s\n\nOrdered: %s\n"
>> , ${ make_path( @paths ) }
>> , ${ make_path_ordered( @paths ) }
>> ;
>
> Hmmm, I couldn't help giving a peek at least into the first few
> lines... however this makes me think I should be moderately glad
> I'm refusing to read it all!

Hey, whatever suits you, fine w/ me.

Michele Dondi

Oct 21, 2004, 3:10:48 AM

On Wed, 20 Oct 2004 15:03:45 -0500, parv <pa...@yahooWhereElse.com>
wrote:

>>> printf "Unordered: %s\n\nOrdered: %s\n"
>>> , ${ make_path( @paths ) }
>>> , ${ make_path_ordered( @paths ) }
>>> ;
>>
>> Hmmm, I couldn't help giving a peek at least into the first few
>> lines... however this makes me think I should be moderately glad
>> I'm refusing to read it all!
>
>Hey, whatever suits you, fine w/ me.

I mean, yes: my cmts were not particularly constructive, and I
apologize for this, but the overall impression is that you have the
tendency to abuse printf() and referencing-dereferencing. As I said,
this is just an impression, maybe I'll take the time to look more
carefully into your code.

parv

Oct 21, 2004, 3:14:17 PM

in message <61oen0pb0c029kgk8...@4ax.com>,
wrote Michele Dondi ...

> On Wed, 20 Oct 2004 15:03:45 -0500, parv <pa...@yahooWhereElse.com>
> wrote:
>
>>>> printf "Unordered: %s\n\nOrdered: %s\n"
>>>> , ${ make_path( @paths ) }
>>>> , ${ make_path_ordered( @paths ) }
>>>> ;
>>>
>>> Hmmm, I couldn't help giving a peek at least into the first few
>>> lines... however this makes me think I should be moderately glad
>>> I'm refusing to read it all!
>>
>>Hey, whatever suits you, fine w/ me.
>

> the overall impression is that you have the tendency to abuse
> printf() and referencing-dereferencing

I do not agree about printf() at all, but do agree in some limited
ways about references.

Uri Guttman

Oct 21, 2004, 4:08:11 PM

>>>>> "p" == parv <pa...@yahooWhereElse.com> writes:

>> the overall impression is that you have the tendency to abuse
>> printf() and referencing-dereferencing

p> I do not agree about printf() at all, but do agree in some limited
p> ways about references.

printf is so rarely needed in perl. you can usually use interpolation,
sprintf, or formats (which i have never used in 11 years of perl
hacking). and i would prefer sprintf as then you can easily control
where the output goes rather than always going to a single handle. think
about the cases where you want output to be returned instead of printed,
or sent to two places like stdout and a log, etc. my rule for this is:

print rarely, print late.

print rarely means don't call print so often as it is slow. use .= to
build strings as that is fast.

print late means don't print until you have all the text you want in one
string. that way you can easily decide to print to multiple places and
control the printing. my dump, status and other text generating routines
always build strings and return them. their caller can decide to print
or add other text to them and return to a higher caller, etc. if you
print in a low level sub, you can't prepend text or mung it or control
its output.

so printf falls under that rule but since sprintf does the same work but
returns the string, i never use printf.
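
A small sketch of the "print rarely, print late" rule, under the assumption that a report is built for a list of path entries. The sub name path_report and the in-memory log handle are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical report builder (name invented here): it only builds and
# returns text, leaving the decision of where to print to the caller.
sub path_report {
    my (@entries) = @_;
    my $text = '';
    $text .= sprintf "%2d: %s\n", $_ + 1, $entries[$_] for 0 .. $#entries;
    return $text;
}

my $report = path_report(qw(/bin /usr/bin));

# The caller prints once, to as many handles as it likes. An in-memory
# handle stands in for a log file here.
open my $log, '>', \my $log_buffer or die "cannot open in-memory log: $!";
print {$_} $report for \*STDOUT, $log;
close $log;
```

Had path_report called printf itself, sending the same text to a second handle would have required changing the sub.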

uri

Michele Dondi

Oct 21, 2004, 5:58:31 PM

On Mon, 18 Oct 2004 16:45:22 -0500, parv <pa...@yahooWhereElse.com>
wrote:

> #!/usr/local/bin/perl

OK, eventually I looked at your script more carefully. Here you have
my IMHO constructive cmts...

> use warnings;
> use strict;
>
> my @paths = @ARGV;

As a side note, this may be a good thing for clarity, but in this
particular case I'd use @ARGV directly. And as a side note to the side
note I'd check that args are actually supplied.

> printf "Unordered: %s\n\nOrdered: %s\n"
> , ${ make_path( @paths ) }
> , ${ make_path_ordered( @paths ) }
> ;

Slightly awkward formatting/indenting apart, as I hinted in my other
post, (since 'Perl' neq 'C') while [s]printf() at times comes very
handy, in most cases a simple print() will do, with additional
advantage of its bells and whistles (say, stuff related to C<$\>,
C<$,>, etc.)

In this case you may have had

print 'Unordered: ', ${ make_path( @paths ) }, "\n\n";
print 'Ordered: ', ${ make_path_ordered ( @paths ) }, "\n";

instead.

But then, I would have avoided all that playing with "ref/unref", so I
would have had

print 'Unordered: ', make_path(@paths), "\n\n";
print 'Ordered: ', make_path_ordered(@paths), "\n";

(rewriting those subs suitably of course.)

As a general rule it is *my impression* that you're abusing the
"ref/unref" trick/technique in all of your script.

> # Make path string from given list of array references or strings
                                        ^^^^^^^^^^^^^^^^

Why, if they're all coming from @ARGV by means of split()ting? But I
didn't check really any single statement of your program because as an
overall judgement I found it very difficult to read. So if you *did*
take references, then you shouldn't have, for it wouldn't have been
necessary...

> sub make_path
> { my @paths = @_;
>
> my %uniq;
> map $uniq{$_} = undef , @{ split_path( [@paths] ) };
> keep_usable(\%uniq);

Now I see! You have a make_path and a make_path_ordered because the
hash-based algorithm you're using to keep unique entries relies on
keys(). But then the algorithm that Tad posted (hash-based, too), and
that one could have well devised him/herself, and that IIRC is even in
the faq, would have taken care of preserving order in the first place.

> return delimit_path( [ keys %uniq ] ) ;

Next-to-useless sub, had you avoided to take references everywhere!

> # Make ordered path -- as indicated by the first position of each unique

[snip]

> my %uniq;
> { my $i = 0;
> map $uniq{$_} = !exists $uniq{$_} ? $i++ : $uniq{$_}
> , @{ split_path( [ @paths ]) };
> }

I must admit it took me a while, and the help of the cmt above, to
understand the logic of this sub. All I can say is that it is a
reasonably working but definitely clumsy workaround for not having
chosen the "correct" algorithm in the first place. Again, give a peek
into Tad's post (and possibly mine, and possibly the faq!)

> # Make ':' delimited string from given path array reference
                                         ^^^^^^^^^^^^^^^^^^^^

> sub delimit_path
> { my ($paths) = @_;
> return \join ':' , @{$paths};
> }

Why?!?

> # Input is a list of paths, each element being an array reference or a
                                                    ^^^^^^^^^^^^^^^
> # string
> sub split_path
> { my ($paths) = @_;
> return
> [ map split( /[:\s]/ , ref $_ ? @$_ : $_ ) , @{$paths} ];
> }

Why?!?

> # Keep only read/execute-able directories from the keys of given hash
> # reference
> sub keep_usable
> { my ($hash) = @_;
> foreach ( keys %{$hash} )
> { delete $hash->{$_} unless -d $_ || -x $_;

This will keep non-readable directories as well as executable files.
You want

delete $hash->{$_} unless -d $_ && -x $_;
#                               ^^

instead. But then C<$_> is not necessary and can be omitted, and C<_>,
err well, you'll find it in the docs... so I'd do:

delete $hash->{$_} unless -d && -x _;

> __END__

All in all I'd rewrite your script like thus:

#!/usr/bin/perl -l

use strict;
use warnings;

die "Usage: $0 <path> [<paths>]\n" unless @ARGV;

my %seen;
print join ':', grep {
    !$seen{$_}++ and
      -d and -x _;
} map { split /:/ } @ARGV;

__END__

[14 lines! And certainly not golfing!!]

If I understood everything correctly, then it should yield the same
output your "Ordered" line does (apart from "Ordered: "). Or am I
missing something?

It doesn't do the "Unordered" thing, though, but I don't see why it
should, since *now* it wouldn't be portable even across different
runs!

Let me try it:

# echo $PATH
/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin:/opt/bin:~/bin
# ./parv.pl $PATH $PATH /usr/bin
/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin:/opt/bin

Seems to be OK, apart from a "missing" tilde expansion (-> faq!),
should we take care of that too?
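
If tilde expansion were wanted, one possible sketch is below. It is an assumption-laden illustration, not the faq's code: the sub name expand_tilde is invented, only a bare leading "~" is handled, and the home directory is taken from $ENV{HOME}.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented here): expand a bare leading "~"
# using $ENV{HOME}. The "~user" form is deliberately left untouched.
sub expand_tilde {
    my ($entry) = @_;
    $entry =~ s{^~(?=/|\z)}{$ENV{HOME}} if defined $ENV{HOME};
    return $entry;
}

local $ENV{HOME} = '/home/example';    # fixed value so the demo is repeatable
print expand_tilde('~/bin'), "\n";     # /home/example/bin
```

The expansion would have to run before the -d test, since no directory is literally named "~/bin".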

HTH,

parv

Oct 22, 2004, 12:08:10 AM

in message <m37jpjk...@linux.local>,
wrote Uri Guttman ...

>>>>>> "p" == parv <pa...@yahooWhereElse.com> writes:
>
> >> the overall impression is that you have the tendency to abuse
> >> printf() and referencing-dereferencing
>
> p> I do not agree about printf() at all, but do agree in some limited
> p> ways about references.
>
> printf is so rarely needed in perl. you can usually use
> interpolation, sprintf, or formats ... i would prefer sprintf as
> then you can easily control where the output goes rather than
> always going to a single handle. think about the cases where you
> want output to be returned instead of printed, or sent to two
> places like stdout and a log, etc.

Agree w/ you on use of s?printf() functions. I use sprintf() where
i need a formatted string which is not to be printed immediately as
you have listed. I personally found formats to be quite cumbersome
to work with.

Inside the print(), i want to keep the quoted text to minimum; i use
it to print a list of variables that does not involve any kind of
formatting, or to print just quoted text.

parv

Oct 22, 2004, 1:12:41 AM

in message <snbgn0hm6goa7b211...@4ax.com>,
wrote Michele Dondi ...

> On Mon, 18 Oct 2004 16:45:22 -0500, parv <pa...@yahooWhereElse.com>
> wrote:
...

>> # Make path string from given list of array references or
>> # strings

> ^^^^^^^^^^^^^^^^
> Why, if they're all coming from @ARGV by means of split()ting?

Well, @ARGV is just one source of paths *in OP's case* & in the
program posted.

>> my %uniq;
>> { my $i = 0;
>> map $uniq{$_} = !exists $uniq{$_} ? $i++ : $uniq{$_}
>> , @{ split_path( [ @paths ]) };
>> }
>

> All I can say is that it is a reasonably working but definitely
> clumsy workaround for not having chosen the "correct" algorithm in
> the first place. Again, give a peek into Tad's post (and possibly
> mine, and possibly the faq!)

Above algorithm is similar to the faq entry[0] that you refer to;
this one uses map() instead of grep().

Care to explain how above algorithm is not "correct"? Clumsy, i
agree (in comparison to [0]).

[0] It seems to be the example b in answer to question "How can I
remove duplicate elements from a list or array?" in perlfaq4.

Somebody tell me this: Is there a guarantee that grep() will always
give the filtered output in the order in which input was received?

>> sub delimit_path
>> { my ($paths) = @_;
>> return \join ':' , @{$paths};
>> }
>
> Why?!?

To delimit the path list w/ the possibility of alternate delimiter.

(What is with "?!?" anyway? Expression of high annoyance?)

>> delete $hash->{$_} unless -d $_ || -x $_;
>
> This will keep non-readable directories as well as executable
> files. You want
>
> delete $hash->{$_} unless -d $_ && -x $_;

Right you are.

> instead. But then C<$_> is not necessary and can be omitted

Thanks for the reminder about '_'; i can't remember it when it needs
to be remembered.

And, thanks, generally, for your comments.

parv

Oct 22, 2004, 1:19:52 AM

in message <slrncnh22...@localhost.holy.cow>,
wrote parv ...

> in message <m37jpjk...@linux.local>,
> wrote Uri Guttman ...
>
>>>>>>> "p" == parv <pa...@yahooWhereElse.com> writes:
>>
>> >> the overall impression is that you have the tendency to abuse
>> >> printf() and referencing-dereferencing
>>
>> p> I do not agree about printf() at all, but do agree in some limited
>> p> ways about references.
>>
>> printf is so rarely needed in perl. you can usually use
>> interpolation, sprintf, or formats

> Agree w/ you on use of s?printf() functions
...

> Inside the print(), i want to keep the quoted text to minimum

Forgot to add that using printf() lets you list all the variables
in one place, instead of being interspersed w/ print()[0], thus
variables are much more easily locatable & changeable.

[0] Does not apply if the string is already contained in single
variable as Uri G had pointed out earlier in his reply.

Michele Dondi

Oct 22, 2004, 2:59:46 AM

I was forgetting...

On Mon, 18 Oct 2004 16:45:22 -0500, parv <pa...@yahooWhereElse.com>
wrote:

> map $uniq{$_} = undef , @{ split_path( [@paths] ) };
> keep_usable(\%uniq);

It is *reasonably* recommended not to use map() for its side effects
in void context. See e.g. 'perldoc perlstyle'.

$uniq{$_} = undef for @{ split_path( [@paths] ) };

would have been just the same and intuitively clearer. Also, take
into account the possibility of a slice:

@uniq{ @{ split_path( [@paths] ) } } = ();

of course this would have been more clear had you avoided all those
references:

@uniq{ split_path @paths } = ();
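
A self-contained sketch of the hash-slice idiom, using a plain split in place of split_path; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my @entries = split /:/, 'a:b:c:c:c:s:a';

# Assigning an empty list to a hash slice creates every key with an
# undef value; repeated entries collapse into a single key.
my %uniq;
@uniq{@entries} = ();

print scalar(keys %uniq), " unique entries\n";    # 4 unique entries
print join(':', sort keys %uniq), "\n";           # a:b:c:s
```

The keys are sorted only to make the output repeatable; the slice by itself records membership, not the original order.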

> map $uniq{$_} = !exists $uniq{$_} ? $i++ : $uniq{$_}
> , @{ split_path( [ @paths ]) };
> }
> keep_usable(\%uniq);

It is recommended not to use map() for its side effects in void
context! (Repetita iuvant)

Michele Dondi

Oct 22, 2004, 7:48:09 AM

On Fri, 22 Oct 2004 00:12:41 -0500, parv <pa...@yahooWhereElse.com>
wrote:

>>> # Make path string from given list of array references or
>>> # strings
>> ^^^^^^^^^^^^^^^^
>> Why, if they're all coming from @ARGV by means of split()ting?
>
>Well, @ARGV is just one source of paths *in OP's case* & in the
>program posted.

AFAICT the OP (see <10n86m9...@corp.supernews.com>) was asking
for *a regex* to remove duplicate entries from "a path" (to be
intended as ':' separated string). Also he talked about "PATH
variable", but he didn't specify how the program was supposed to
work on it.

Leaving the OP aside, it seems to me that in *your* program paths are
input only through @ARGV. But then I explicitly asked you if I were
missing something. So, am I missing something?

If so, then I still see no benefit of mixing strings and arrayrefs all
the way. IMHO it would be best to uniform the data at an early stage.

>>> my %uniq;
>>> { my $i = 0;
>>> map $uniq{$_} = !exists $uniq{$_} ? $i++ : $uniq{$_}
>>> , @{ split_path( [ @paths ]) };
>>> }
>>
>> All I can say is that it is a reasonably working but definitely
>> clumsy workaround for not having chosen the "correct" algorithm in

[snip]

>Care to explain how above algorithm is not "correct"? Clumsy, i
>agree (in comparison to [0]).

It is not "correct" in that it is *overly* clumsy. For otherwise I
would have written that it is not correct, not that it is not
"correct"... ;-)

The point is that the "unordered algorithm" would yield different
results even across different runs with a recent perl. So if there's a
compact, self-evident, self-explanatory alternative that as an added
bonus even preserves order, then I'd tend to identify *that* with the
"correct" one.

>Somebody tell me this: Is there a guarantee that grep() will always
>give the filtered output in the order in which input was received?

I'm not sure I can grasp the sense of your words ("Somebody tell me
this"). If you're asking: "is there a guarantee that grep() will
always give the filtered output in the order in which input was
received?", then the answer is: "yes". See 'perldoc -f grep'

>>> sub delimit_path
>>> { my ($paths) = @_;
>>> return \join ':' , @{$paths};
>>> }
>>
>> Why?!?
>
>To delimit the path list w/ the possibility of alternate delimiter.

"Why" was referred to the quoted portion of the text underlined with
carets. However I hardly see how a sub that is hardly something more
than a wrapper around join() can improve the manageability of using an
alternate delimiter. Had you used something like

sub delimit_path {
my ($delim, $paths)=@_;
join $delim, @{$paths};
}

it would have made much more sense. But then I would have had
something like

use constant DELIM => ':';

at the top of my script.
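
A sketch of how such a constant might be used so that splitting and joining agree on the delimiter. The helper names split_entries and join_entries are invented here and are not part of either script in this thread.

```perl
use strict;
use warnings;
use constant DELIM => ':';

# Hypothetical helpers (names invented here); both read the delimiter
# from the one constant, so changing DELIM changes splitting and joining.
sub split_entries {
    my ($path) = @_;
    my $sep = quotemeta(DELIM);    # escape it in case it is a regex metacharacter
    return split /$sep/, $path, -1;
}

sub join_entries {
    return join(DELIM, @_);
}

print join_entries(split_entries('a:b:c')), "\n";    # a:b:c
```

Switching to another separator, for instance ';', then means editing one line.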

>(What is with "?!?" anyway? Expression of high annoyance?)

Well, to some extent... but I'd rather say educated amazement! No
offence intended, of course...

>And, thanks, generally, for your comments.

Nice to see you didn't take them as offensive...

Michele

Michele Dondi

Oct 22, 2004, 7:48:08 AM

On Fri, 22 Oct 2004 00:19:52 -0500, parv <pa...@yahooWhereElse.com>
wrote:

>> Inside the print(), i want to keep the quoted text to minimum
>
>Forgot to add that using printf() lets you list all the variables
>in one place, instead of being interspersed w/ print()[0], thus
>variables are much more easily locatable & changeable.

Well, it is obvious that readability is not an absolute concept, and
of course an experienced C programmer will find the syntax/semantics
of printf() very intuitive. However one should not forget that most of
its conversions are there because C is a low level enough language to
require you to take care of them whereas Perl is high level enough to
do that for you and is smart enough to usually do it right too!

Personally I'd find something along these lines much more readable:

print 'Unordered: ', make_path(@paths), "\n\n";
print 'Ordered: ', make_path_ordered(@paths), "\n";

in fact it clearly separates the two things that are being printed.
And given the context, efficiency concerns such as those raised by
Uri are of course negligible.

OTOH I don't see how "listing all the variables in one place" would
make the whole lot more manageable.

Also, it is sensible to (C<local>ly) set $\ to "\n" or, for short
enough (one's mileage may vary, though!) scripts, to use -l. So that
probably I'd have

print 'Unordered: ', make_path @paths;
print 'Ordered: ', make_path_ordered @paths;

or, if I wanted exactly your output,

print 'Unordered: ', make_path @paths;
print '';
print 'Ordered: ', make_path_ordered @paths;

Or an alternative may be to use a here-doc:

print <<"EOT"; # In this case @{[ ... ]} does make sense!
Unordered: @{[ make_path @paths ]}

Ordered: @{[ make_path_ordered @paths ]}
EOT

Tad McClellan

Oct 22, 2004, 7:26:22 AM

parv <pa...@yahooWhereElse.com> wrote:

> Somebody tell me this: Is there a guarantee that grep() will always
> give the filtered output in the order in which input was received?

No.

grep() is guaranteed to give the filtered output in the order in which
the original list was provided though.

ie. the "list" may not be "input".

>> delete $hash->{$_} unless -d $_ && -x $_;

>> But then C<$_> is not necessary and can be omitted

>
> Thanks for the reminder about '_';

The reminder was about $_, not about _; they are not the same thing.

The above will execute faster if you *do* use _ :

delete $hash->{$_} unless -d $_ && -x _;
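
A runnable sketch of the same test outside the hash loop, assuming a temporary directory and file created with File::Temp. The predicate name is_usable_dir is invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir tempfile);

# Hypothetical predicate (name invented here): a usable PATH entry is a
# directory that is also searchable. "-x _" reuses the stat data fetched
# by "-d $entry" instead of asking the filesystem a second time.
sub is_usable_dir {
    my ($entry) = @_;
    return (-d $entry && -x _) ? 1 : 0;
}

my $dir = tempdir(CLEANUP => 1);            # removed again at exit
my ($fh, $file) = tempfile(DIR => $dir);    # a plain file, not a directory

print "dir:  ", is_usable_dir($dir),  "\n";
print "file: ", is_usable_dir($file), "\n";
```

The special filehandle _ is only valid immediately after a file test or stat on the same path, which is why it appears on the right-hand side of the && only.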

parv

Oct 23, 2004, 10:51:15 AM

in message <slrncnhriu...@magna.augustmail.com>,
wrote Tad McClellan ...

> parv <pa...@yahooWhereElse.com> wrote:
>
>> Somebody tell me this: Is there a guarantee that grep() will always
>> give the filtered output in the order in which input was received?
>
> No.
>
> grep() is guaranteed to give the filtered output in the order in which
> the original list was provided though.

Ok then; works for me.

> ie. the "list" may not be "input".

You pedantic you! I admit i was not using correct terminology.

>>> delete $hash->{$_} unless -d $_ && -x $_;
>
>>> But then C<$_> is not necessary and can be omitted
>>
>> Thanks for the reminder about '_';
>
> The reminder was about $_, not about _, they are not the same
> thing .

Above could have been just as well for some $key ...

delete $hash->{$key} unless -d $key && -x $key;

...and i will still miss using '_' on the first writing.

See, i do not remember to use '_' myself. When i see it being used
elsewhere or somebody else brings it to my attention, i go "of course,
i forgot again", much like in this case;

> The above will execute faster if you *do* use _ :
>
> delete $hash->{$_} unless -d $_ && -x _;

I understood from Michele's post what you have stated above; tried
to convey that in my reply but obviously failed.