Traversal of a directory tree with housekeeping per directory

Michael Rolfe

Aug 23, 2001, 2:23:46 PM

<newbie warning>

I want to traverse a tree but, within each directory, I need to do a fair
amount of housekeeping. In particular, I need to read through all the files
in the directory before I have the information I need to start doing the
housekeeping in that directory.

Is use File::Find ; still the way to go?

Why I am doubtful is that it seems klutzy to check for a change in directory
for every file. And, even if I do that, detecting a change of directory
means, "Oops. Ungracefully scramble back to the directory you just left and
housekeep."

I think that doing an independent find() in each directory is what I want
but not using find() to step through those directories seems stupid.

Any insights welcome.

</newbie warning>

Ren Maddox

Aug 23, 2001, 5:21:03 PM

finddepth() may help. That way, whenever a directory is processed,
you know that all of the files have already been processed. Depending
on what exactly you need to do, that may be sufficient.
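
A minimal sketch of that idea, assuming the housekeeping only needs the names of the non-directory entries directly inside each directory. The helper name housekeep_tree and its callback signature are made up for illustration; only finddepth(), $File::Find::dir and $File::Find::name come from File::Find.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: calls $housekeep->($dir, \@file_names) once per
# directory, after every entry in that directory has been seen.
sub housekeep_tree {
    my ($root, $housekeep) = @_;
    my %files_in;    # directory path => names of non-directories found there

    finddepth(sub {
        # lstat first, so a symlink to a directory is treated as a file
        lstat $_ or return;
        if (-d _) {
            # finddepth() reports a directory only after its contents,
            # so the list collected for it is complete at this point.
            my $files = delete($files_in{$File::Find::name}) || [];
            $housekeep->($File::Find::name, [sort @$files]);
        } else {
            push @{ $files_in{$File::Find::dir} }, $_;
        }
    }, $root);
}
```

The callback for a directory runs only after all of its subdirectories are finished, so this suits housekeeping that can wait until the walk is on its way back up.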

--
Ren Maddox
r...@tivoli.com

Dave Tweed

Aug 23, 2001, 9:11:42 PM

Michael Rolfe wrote:
> I want to traverse a tree but, within each directory, I need to do a fair
> amount of housekeeping. In particular, I need to read through all the files
> in the directory before I have the information I need to start doing the
> housekeeping in that directory.

Again, I offer my simple tree-walker as an alternative to File::Find. The
compact version recurses into subdirectories as it finds them, so it may
not quite meet your requirements. The full version separates the names before
processing any of them, so you can do something to all the files, do some
housekeeping, and then start recursing into the subdirectories.

Note that in either case, unlike File::Find, these functions do not chdir()
you to the various directories. Instead, you have the full pathname to each
object to manipulate as needed.

Enjoy!

-- Dave Tweed

=============================================================================

#!perl -w
# treewalk.pl - example of walking a directory, for comp.lang.perl.misc

&process_directory ('/path/to/root');

# compact version

sub process_directory {
    my ($path) = @_;

    # get all of the names from the directory, excluding "." and ".."
    local (*DIR);
    opendir (DIR, $path) || die "can't open directory $path: $!";
    my @names = grep (!/^\.\.?$/, readdir DIR);
    closedir DIR;

    # the sort is optional
    for (sort @names) {
        my $temp = "$path/$_";
        if (-d $temp) {
            &process_directory ($temp);
        } else {
            &process_file ($temp);
        }
    }
}

sub process_file {
    my ($path) = @_;

    # whatever ...

}

=============================================================================

#!perl -w
# treewalk.pl - example of walking a directory, for comp.lang.perl.misc

&process_directory ('/path/to/root');

# full version

sub process_directory {
    my ($path) = @_;

    # get the names out of the current directory and separate them into
    # files and subdirectories
    my (@files, @directories);
    my @names = &read_directory ($path);
    for (@names) {
        if (-d "$path/$_") {
            push @directories, $_;
        } else {
            push @files, $_;
        }
    }

    # process all the files
    for (@files) {
        &process_file ("$path/$_");
    }

    # do any housekeeping here, before recursing into subdirectories

    # process all the subdirectories
    for (@directories) {
        &process_directory ("$path/$_");
    }
}

sub process_file {
    my ($path) = @_;

    # whatever ...

}

# customize the filtering and sorting of names here

sub read_directory {
    my ($path) = @_;

    # get all of the names from a directory, excluding "." and ".."
    local (*DIR);
    opendir (DIR, $path) || die "can't open directory $path: $!";
    my @names = grep (!/^\.\.?$/, readdir DIR);
    closedir DIR;

    # optional - filter out all other names starting with '.'
    @names = grep (!/^\./, @names);

    # optional - sort the names
    @names = sort @names;

    @names;
}

=============================================================================

Randal L. Schwartz

Aug 24, 2001, 2:12:04 AM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> Again, I offer my simple tree-walker as an alternative to File::Find. The
Dave> compact version recurses into subdirectories as it finds them, so it may
Dave> not quite meet your requirements.

...

Dave> if (-d $temp) {
Dave> &process_directory ($temp);
Dave> } else {
Dave> &process_file ($temp);
Dave> }

Bad. It chases symlinks. Please make it not do that, or you will
ruin a good day.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Dave Tweed

Aug 24, 2001, 1:50:54 PM

"Randal L. Schwartz" wrote:
> Bad. It chases symlinks. Please make it not do that, or you will
> ruin a good day.

Not an issue for me, since my platform doesn't support them.

The OP may or may not want to follow links.

You, of all people, should be able to insert "unless -l _" where needed.

These were obviously skeletal scripts, not polished platform-independent
applications. Sheesh.

-- Dave Tweed

Randal L. Schwartz

Aug 24, 2001, 3:07:30 PM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> "Randal L. Schwartz" wrote:
>> Bad. It chases symlinks. Please make it not do that, or you will
>> ruin a good day.

Dave> Not an issue for me, since my platform doesn't support them.

Fine.

Dave> The OP may or may not want to follow links.

Well, if you follow links, and you aren't doing duplicate elimination,
you'll ruin a good day. That's my point. Do I need to repeat it?

Dave> You, of all people, should be able to insert "unless -l _" where needed.

Right. *I* can. But I'm not the only person reading this newsgroup (I
hope :). The warning was as much for the people reading this group as
it was for you.

Dave> These were obviously skeletal scripts, not polished
Dave> platform-independent applications. Sheesh.

Perhaps a flag that said "tested on DOS where symlinks don't exist"
might have been prudent. It wasn't clear to me that you were running
on DOS.

Dave Tweed

Aug 24, 2001, 7:23:11 PM

"Randal L. Schwartz" wrote:
> Dave> The OP may or may not want to follow links.
>
> Well, if you follow links, and you aren't doing duplicate elimination,
> you'll ruin a good day. That's my point. Do I need to repeat it?

Well, yes. Help me understand the issue here. You seem to be asserting
that if a filesystem supports symlinks, then people will invariably
use them, and when they do, they cause problems (what, infinite loops?).

> The warning was as much for the people reading this group as
> it was for you.

Then you need to be less oblique with your comments. You made it look
like my script was somehow creating a horrendous problem that would
render it completely useless. However, I would note that File::Find does
not address this problem either; the user needs to put the appropriate
test in his wanted() function. Therefore, you should have responded in
a way that made it clear that you were talking about an issue common
to all tree-walkers. In fact, it isn't a Perl issue at all.

> Perhaps a flag that said "tested on DOS where symlinks don't exist"
> might have been prudent. It wasn't clear to me that you were running
> on DOS.

It shouldn't matter. Even if my platform supported symlinks, it isn't
clear that I'd be using them in the trees I'd be running this tool on.

Neither one of us knows why the OP was walking his trees.

-- Dave Tweed

Randal L. Schwartz

Aug 24, 2001, 9:59:45 PM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> "Randal L. Schwartz" wrote:
Dave> The OP may or may not want to follow links.
>>
>> Well, if you follow links, and you aren't doing duplicate elimination,
>> you'll ruin a good day. That's my point. Do I need to repeat it?

Dave> Well, yes. Help me understand the issue here. You seem to be asserting
Dave> that if a filesystem supports symlinks, then people will invariably
Dave> use them, and when they do, they cause problems (what, infinite loops?).

Yes. Sorry. Let me slow down a bit.

If someone creates:

ln -s .. FOO

in a directory they are searching with your routine, it will go into
an infinite loop, because it'll keep reading FOO then FOO/FOO then
FOO/FOO/FOO, etc etc. The problem is that a symlink can point to a
directory, especially directories that are above the search point.
And you weren't testing for that.
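
To make the loop concrete, here is a small sketch that counts how many directories a walker reaches, once with a plain -d test and once with the symlink check. The depth cap exists only so the naive variant terminates; count_dirs, naive_test, safe_test and the cap are all made up for this illustration.

```perl
use strict;
use warnings;

# Count the directories reachable from $path, giving up once $max_depth
# levels have been descended. $is_dir is the directory test being compared.
sub count_dirs {
    my ($path, $is_dir, $max_depth) = @_;
    return 0 if $max_depth < 0;
    opendir(my $dh, $path) or die "can't open directory $path: $!";
    my @names = grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    my $count = 1;
    for my $name (@names) {
        my $child = "$path/$name";
        $count += count_dirs($child, $is_dir, $max_depth - 1)
            if $is_dir->($child);
    }
    return $count;
}

sub naive_test { -d $_[0] }                 # uses stat(), follows symlinks
sub safe_test  { !-l $_[0] && -d $_[0] }    # refuses to descend into symlinks
```

On a tree that contains a directory with "ln -s .. FOO" in it, count_dirs($root, \&naive_test, 10) keeps descending until the cap stops it, while count_dirs($root, \&safe_test, 10) counts each real directory once.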

>> The warning was as much for the people reading this group as
>> it was for you.

Dave> Then you need to be less oblique with your comments. You made it look
Dave> like my script was somehow creating a horrendous problem that would
Dave> render it completely useless.

It renders it useless in directories that may contain symlinks. :)

Dave> However, I would note that File::Find does not address this
Dave> problem either; the user needs to put the appropriate test in
Dave> his wanted() function.

Yes it does. It avoids following symlinks. It goes to special pains
to do that. You do not need to test in the wanted(), because it'll be
avoided in the recursion part, not the wanted part. You'll get the
symlink in your wanted(), but it won't follow it.

Dave> Therefore, you should have responded in
Dave> a way that made it clear that you were talking about an issue common
Dave> to all tree-walkers. In fact, it isn't a Perl issue at all.

Well, it's an issue in *YOUR* perl code. It's not an issue for
tree-walkers written with File::Find (which is also Perl code) or in
properly written Perl tree walkers. So it's not an issue for *all*
tree walkers, just the ones that don't do the right thing there. :)

>> Perhaps a flag that said "tested on DOS where symlinks don't exist"
>> might have been prudent. It wasn't clear to me that you were running
>> on DOS.

Dave> It shouldn't matter. Even if my platform supported symlinks, it
Dave> isn't clear that I'd be using them in the trees I'd be running
Dave> this tool on.

Symlinks can be anywhere.

Dave> Neither one of us knows why the OP was walking his trees.

Granted. :)

print "Just another Perl hacker,"

Joe Smith

Aug 24, 2001, 11:36:20 PM

In article <3B86E1DE...@acm.org>, Dave Tweed <dtw...@acm.org> wrote:
>"Randal L. Schwartz" wrote:
>> Dave> The OP may or may not want to follow links.
>>
>> Well, if you follow links, and you aren't doing duplicate elimination,
>> you'll ruin a good day. That's my point. Do I need to repeat it?
>
>Well, yes. Help me understand the issue here. You seem to be asserting
>that if a filesystem supports symlinks, then people will invariably
>use them, and when they do, they cause problems (what, infinite loops?).

Yes. It is very easy to create a symlink that will cause a naive
treewalker to go into an infinite loop.

Another example is if you roll your own treewalker to do the
equivalent of "rm -r foo". If you naively follow the symlink to
a directory somewhere else, you can wipe out a lot more than you
intended.
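
A sketch of the careful variant, assuming a Unix-like system: the entry is examined with lstat(), so a symlink is removed as a link and never descended into. The name remove_tree_carefully is hypothetical; for real work the File::Path module that ships with Perl is the usual choice.

```perl
use strict;
use warnings;

# Remove $path and everything below it without following symlinks.
sub remove_tree_carefully {
    my ($path) = @_;
    lstat $path or return 0;          # nothing there (or not accessible)
    if (-d _) {
        # A real directory: empty it first, then remove it.
        opendir(my $dh, $path) or die "can't open directory $path: $!";
        my @names = grep { !/^\.\.?$/ } readdir $dh;
        closedir $dh;
        remove_tree_carefully("$path/$_") for @names;
        rmdir $path or die "can't rmdir $path: $!";
    } else {
        # Plain file, symlink, or anything else: remove the entry itself.
        unlink $path or die "can't unlink $path: $!";
    }
    return 1;
}
```

Because the test is -d _ after an lstat(), a symlink pointing at a directory somewhere else lands in the unlink branch, and its target is left alone.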

>> The warning was as much for the people reading this group as
>> it was for you.
>
>Then you need to be less oblique with your comments. You made it look
>like my script was somehow creating a horrendous problem that would
>render it completely useless.

From experience, I can tell you that the horrendous problem does exist.
It doesn't render the script completely useless; the problem makes
the script extremely destructive.

> However, I would note that File::Find does
>not address this problem either;

It appears that you are using an out-of-date version of File/Find.pm
as a reference. To quote from the version included with perl-5.6.1:

NAME
    find - traverse a file tree

    finddepth - traverse a directory structure depth-first

SYNOPSIS
    use File::Find;
    find(\&wanted, '/foo', '/bar');
    sub wanted { ... }

    use File::Find;
    finddepth(\&wanted, '/foo', '/bar');
    sub wanted { ... }

    use File::Find;
    find({ wanted => \&process, follow => 1 }, '.');


> the user needs to put the appropriate test in his wanted() function.

No, the appropriate test needs to be in the find function. Always test
for -l() before testing for -d(). If explicitly following symlinks, use
a hash with device_number&inode_number to make sure that you never
process the same directory twice.
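
A sketch of that hash, assuming a filesystem where stat() returns meaningful device and inode numbers (true on Unix, not necessarily elsewhere). The function name dirs_following_links is made up; it deliberately follows symlinks and relies on the hash alone to stop loops.

```perl
use strict;
use warnings;

# Return every directory reachable from $path, following symlinks,
# but listing each real directory only once.
sub dirs_following_links {
    my ($path, $seen) = @_;
    $seen ||= {};

    # stat() follows symlinks, so two names for the same directory
    # produce the same device:inode key.
    my ($dev, $ino) = stat $path or return;
    return if $seen->{"$dev:$ino"}++;

    opendir(my $dh, $path) or return;
    my @names = sort grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;

    my @dirs = ($path);
    for my $name (@names) {
        my $child = "$path/$name";
        push @dirs, dirs_following_links($child, $seen) if -d $child;
    }
    return @dirs;
}
```

With a "ln -s .. FOO" link in the tree, the walk reaches FOO, finds its key already in the hash, and returns without descending again.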

>Therefore, you should have responded in
>a way that made it clear that you were talking about an issue common
>to all tree-walkers. In fact, it isn't a Perl issue at all.

It was a problem with the sample code that was posted.

Anyone who has been burned by code like that wants to make sure
that no-one else has to suffer such an ignominious fate.

>> Perhaps a flag that said "tested on DOS where symlinks don't exist"
>> might have been prudent. It wasn't clear to me that you were running
>> on DOS.
>
>It shouldn't matter. Even if my platform supported symlinks, it isn't
>clear that I'd be using them in the trees I'd be running this tool on.

Always check for dangerous conditions, even if you intend to never
run into such a situation. The people who copy-and-paste the posted
code may not be so diligent.
-Joe

--
See http://www.inwap.com/ for PDP-10 and "ReBoot" pages.

Dave Tweed

Aug 25, 2001, 12:16:33 AM

"Randal L. Schwartz" wrote:
> Dave> However, I would note that File::Find does not address this
> Dave> problem either; the user needs to put the appropriate test in
> Dave> his wanted() function.
>
> Yes it does. It avoids following symlinks. It goes to special pains
> to do that.
[snip]

> It's not an issue for tree-walkers written with File::Find (which is
> also Perl code) or in properly written Perl tree walkers. So it's not
> an issue for *all* tree walkers, just the ones that don't do the right
> thing there. :)

OK, you're right. The latest versions of File::Find do check for symlinks.
However, this was not true of the module supplied with Perls up through
version 5.005, and I'm sure there are lots of machines out there running
old code. People still need to beware.

Let me guess: You were the one who finally fixed it ...

-- Dave Tweed

Randal L. Schwartz

Aug 25, 2001, 2:14:08 AM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> OK, you're right. The latest versions of File::Find do check for
Dave> symlinks. However, this was not true of the module supplied
Dave> with Perls up through version 5.005, and I'm sure there are lots
Dave> of machines out there running old code. People still need to
Dave> beware.

I think you need to read the older code a bit better. Here's the code
from 5.005_03:


    foreach $topdir (@_) {
        (($topdev,$topino,$topmode,$topnlink) =
          ($Is_VMS ? stat($topdir) : lstat($topdir)))
          || (warn("Can't stat $topdir: $!\n"), next);
        if (-d _) {

Notice the "lstat" rather than "stat", followed by -d. If it's a
symlink, the -d cannot report true at this point, so a symlink
pointing at a directory is *not* followed. And I'm very very sure
that this is also the behavior all the way back through the find.pl
subroutine included in perl4 (perl3?). Because following symlinks is
universally a *bad* thing if you don't also keep from looping, and
you've got to give credit to Larry for certainly knowing that.

In fact, I just hunted down 4.036 in the CPAN, and found in find.pl:

    # Get link count and check for directoriness.

    ($dev,$ino,$mode,$nlink) = lstat($_) unless $nlink;

    if (-d _) {

        # It really is a directory, so do it recursively.

There it is. lstat() followed by -d _. Won't pull true for a symlink
pointing to a directory.

Again, since you seem to want everything pointed out to you, the
following are dangerous, because they can report true on a symlink
pointing to a directory:

-d $foo
stat($foo) ... -d _

the following are safe:

not -l $foo and -d $foo
lstat($foo) ... -d _

The last one is the one used by File::Find (5.5.3) and find.pl (4.036)
above.
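
The four tests can be tried side by side with a small helper. This is only an illustration: link_tests and its result keys are made-up names, and it assumes a platform where symlink() works.

```perl
use strict;
use warnings;

# Run the four directory tests from the post on $foo and report
# which of them come out true (1) or false (0).
sub link_tests {
    my ($foo) = @_;
    my %result;

    # The two dangerous forms: both look through a symlink.
    $result{plain_d} = (-d $foo)               ? 1 : 0;
    $result{stat_d}  = (stat($foo) && -d _)    ? 1 : 0;

    # The two safe forms: a symlink never counts as a directory.
    $result{not_l_and_d} = (not -l $foo and -d $foo) ? 1 : 0;
    $result{lstat_d}     = (lstat($foo) && -d _)     ? 1 : 0;

    return \%result;
}
```

For a real directory all four entries are 1. For a symlink that points at a directory, plain_d and stat_d are still 1, while not_l_and_d and lstat_d are 0.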

Does that help? I'm sorry I'm having to reiterate... I guess I
presumed you had more knowledge than you seem to be showing. :)

Dave Tweed

Aug 25, 2001, 10:51:45 AM

"Randal L. Schwartz" wrote:
> Does that help? I'm sorry I'm having to reiterate... I guess I
> presumed you had more knowledge than you seem to be showing. :)

Yes, thank you very much. I apologize for not realizing that lstat()
would prevent -d. I guess I didn't think about it very hard.

As I said, I don't generally use links (symbolic or otherwise) in
my own directory structures, even on platforms that support them.
I presumed that people who use them would be aware of the issues,
especially if they're deliberately creating loops.

All bets are off if you're using a tree walker on something that
isn't a tree; what you really need is a generalized directed-graph
walker. I guess that's what File::Find really is.

Now all it needs is the housekeeping hook that the OP was looking
for.

(But I think we've scared him off ... :-)
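
On the housekeeping hook: current File::Find documentation describes a postprocess option, a code reference that is called just before find() leaves a directory, with the directory name in $File::Find::dir. Whether the File::Find on a given older installation already has it is worth checking in its own documentation. A minimal sketch, with walk_with_housekeeping and the callback signature being assumptions made for the example:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collects the file names of each directory in
# wanted(), then hands them to $housekeep from the postprocess hook.
sub walk_with_housekeeping {
    my ($root, $housekeep) = @_;
    my %files_in;    # directory path => file names collected so far

    find({
        wanted => sub {
            lstat $_ or return;
            return if -d _;          # directories are handled by the hook
            push @{ $files_in{$File::Find::dir} }, $_;
        },
        postprocess => sub {
            # Runs once per directory, just before find() leaves it.
            my $files = delete($files_in{$File::Find::dir}) || [];
            $housekeep->($File::Find::dir, [sort @$files]);
        },
    }, $root);
}
```

The hook fires after the subdirectories of a directory have been walked, and the documentation notes that postprocess does nothing when follow or follow_fast is in effect.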

-- Dave Tweed

Randal L. Schwartz

Aug 26, 2001, 9:50:45 AM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> All bets are off if you're using a tree walker on something that
Dave> isn't a tree; what you really need is a generalized directed-graph
Dave> walker. I guess that's what File::Find really is.

Well, it *is* a tree if you ignore the symlinks! Thus, you either
write a tree-walker by ignoring the symlinks, or a
directed-graph-walker by doing a lot of housekeeping.

Dave> Now all it needs is the housekeeping hook that the OP was looking
Dave> for.

Maybe I should fess up that I was thinking a lot about File::Find
because my upcoming column article in Linux Magazine shows a
treewalker that works as an iterator, not a callback. Can't tip my
hand more than that, but it'll probably end up in the CPAN as my first
real module submission after a bit more polish.
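
For readers unfamiliar with the distinction, here is a generic sketch of a treewalker shaped as an iterator. It is not the code from the column, which is not quoted anywhere in this thread; make_tree_iterator is a made-up name and the closure-over-a-queue design is just one common way to do it.

```perl
use strict;
use warnings;

# Return a closure that yields one path per call (parents before
# children) and undef when the tree is exhausted.
sub make_tree_iterator {
    my ($root) = @_;
    my @pending = ($root);

    return sub {
        return unless @pending;
        my $path = shift @pending;

        # lstat, then -d _: symlinks are reported but never descended into.
        if (lstat($path) && -d _) {
            if (opendir my $dh, $path) {
                my @children = sort grep { !/^\.\.?$/ } readdir $dh;
                closedir $dh;
                unshift @pending, map { "$path/$_" } @children;
            }
        }
        return $path;
    };
}
```

The caller drives the walk, for example with "my $next = make_tree_iterator($dir); while (defined(my $p = $next->())) { ... }", and can do whatever bookkeeping it likes between calls instead of fitting it into a callback.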

I can't prepublish the article here (work for hire, ya know), but
it'll eventually show up along with the other ones at

<http://www.stonehenge.com/merlyn/LinuxMag/>

Dave Tweed

Aug 26, 2001, 2:09:23 PM

"Randal L. Schwartz" wrote:
> Well, it *is* a tree if you ignore the symlinks! Thus, you
> either write a tree-walker by ignoring the symlinks, or a
> directed-graph-walker by doing a lot of housekeeping.

No, hard links can create loops as well.

-- Dave Tweed

Rich Lafferty

Aug 26, 2001, 2:59:39 PM

In comp.lang.perl.moderated,

But multiple hard links to directories are /expected/ to produce nasal
demons. That's why they require a specific option to ln, and why
they're only do-able by root. The idea is that if you do hard links of
directories, you expect breakage.

On the other hand, symlinks of directories are expected to /work/,
therefore programs that encounter them are expected to deal with them
nicely.

-Rich

--
Rich Lafferty --------------+-----------------------------------------------
Montreal, Quebec, Canada | Help save the endangered Mountain Walrus!
http://www.lafferty.ca/ | http://www.end.com/~jynx/walrus/
rich+...@lafferty.ca ----+-----------------------------------------------

Mark-Jason Dominus

Aug 26, 2001, 2:19:29 PM

Says Dave Tweed:

> No, hard links can create loops as well.

You are mistaken. Great pains are taken so that this never occurs; if
it does occur, the filesystem is completely broken.

There is a restriction on the 'link' operation that prevents anyone
but the superuser from making a hard link to a directory, and it's
precisely to avoid loops in the file system.

Here's a quote from "The UNIX Time-Sharing System" by D.M. Ritchie and
Ken Thompson:

The directory structure is constrained to have the form of a
rooted tree. Except for the special entries ``.'' and ``..'',
each directory must appear as an entry in exactly one other
directory, which is its parent.

Why is this restriction enforced?

The reason for this is to simplify the writing of programs
that visit subtrees of the directory structure, and more
important, to avoid the separation of portions of the
hierarchy. If arbitrary links to directories were permitted,
it would be quite difficult to detect when the last connection
from the root to a directory was severed.

There is no reason for a program like 'find' to worry about detecting
a malformed file system. That is the job of the 'fsck' program.

(http://cm.bell-labs.com/cm/cs/who/dmr/cacm.html)

Dave Tweed

Aug 26, 2001, 10:12:13 PM

Mark-Jason Dominus wrote:
> Here's a quote from "The UNIX Time-Sharing System" by D.M. Ritchie and
> Ken Thompson:

Thanks for looking that up. I had a copy of that book many many years ago,
but a friend borrowed it and never returned it, and I never bothered to
track down another copy.

This whole exchange has been a real learning experience for me, and I want
to thank everyone for their time. One of these days RSN, I'll escape MS
hell and run Linux on most of the machines around here. Then I'll be able
to get back up to speed on how real systems work.

-- Dave Tweed

Randal L. Schwartz

Aug 27, 2001, 6:39:26 AM

>>>>> "Dave" == Dave Tweed <dtw...@acm.org> writes:

Dave> This whole exchange has been a real learning experience for me,
Dave> and I want to thank everyone for their time. One of these days
Dave> RSN, I'll escape MS hell and run Linux on most of the machines
Dave> around here. Then I'll be able to get back up to speed on how
Dave> real systems work.

One thing you might also meta-learn is that you appeared to be quick
to dismiss my first complaint and absolutely defend your original
code, instead of looking into it more thoroughly even though in the
back of some part of your head, you *knew* that you didn't know
everything there was about symlinks and directory structures on Perl's
home turf, Unix.

I don't complain about a lot of stuff here, but when I do, I'm usually
right. :) It's the sign of a good programmer to accept criticism
directed at their code with "I'll look into that" rather than "gosh
durn it, it's exactly right already, shove off". I'm no spring
chicken when it comes to programming (I got paid for my first program
some 25 years ago), and I've had my share of criticism to accept.

So, besides the takeaway of some specific Unix info, I suggest you
re-review this thread for how you reacted at each statement thrown at
you (except this last), and notice how much work we had to do to beat
the truth into you. :) That makes you unhireable if taken to an
extreme, or at least a bit less of a team player.

print "Just another Perl hacker," # with 30 years of programming experience!

Ilya Zakharevich

Aug 27, 2001, 9:23:59 AM

[A complimentary Cc of this posting was sent to
Mark-Jason Dominus
<m...@plover.com>], who wrote in article <2001082618192...@plover.com>:
> There is a restriction on the 'link' operation that prevents anyone
> but the superuser from making a hard link to a directory, and it's
> precisely to avoid loops in the file system.

AFAIU, Loops are not a big deal. '..' is.

Suppose you hardlink /d/c/ to /a/b/. Before this, .. of /a/b/ was
/a/. For simplicity, suppose that this fact is still true after the
hardlinking (does not matter much as you will see in a moment).

Now unlink /a/b/. You can still reach this directory as /d/c/. But
its .. is still /a/ - or nothing (how to find /d/ other than by
scanning the whole filesystem?). You got a broken filesystem.

Ilya

James Copland

Aug 27, 2001, 9:45:31 AM

Hello CLPM,

Just to beat this dead horse some more and since I'm really dumb and
need everything spelled out for me. 8=)

--Start Quotes
Dave Tweed <dtw...@acm.org> Wrote:

    for (@names) {
        if (-d "$path/$_") {
            push @directories, $_;
        } else {
            push @files, $_;
        }
    }

and mer...@stonehenge.com (Randal L. Schwartz) Wrote:

the following are safe:

not -l $foo and -d $foo
lstat($foo) ... -d _

-- End Quotes
This implies that:

    for (@names) {
        if (not -l "$path/$_" and -d "$path/$_") {
            push @directories, $_;
        } else {
            push @files, $_;
        }
    }

is safe, is this correct? Will it work on both *nix and Win32?

--
Best regards,
James


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Dave Tweed

Aug 27, 2001, 10:55:38 AM

Randal --

Sorry, I can't let you have the last word with inflammatory statements
like these.

> One thing you might also meta-learn is that you appeared to be quick
> to dismiss my first complaint and absolutely defend your original
> code, instead of looking into it more thoroughly even though in the
> back of some part of your head, you *knew* that you didn't know
> everything there was about symlinks and directory structures on Perl's
> home turf, Unix.

> I don't complain about a lot of stuff here, but when I do, I'm usually
> right. :) It's the sign of a good programmer to accept criticism
> directed at their code with "I'll look into that" rather than "gosh
> durn it, it's exactly right already, shove off".

I never defended my original code, and I never asserted that it was
exactly right. I merely pointed out that it was a skeleton and that
the issue you were raising was not relevant to me and may or may not
have been relevant to Michael, the OP.

The entire rest of the thread was simply educating me on some extremely
subtle (to me) details of Unix filesystems and how people use them, and
the implementation of File::Find. I was learning at a furious rate,
studying several different versions of File::Find, the Perl documentation,
and some old Unix manuals before each post.

The tone of my posts may have seemed a bit arrogant, but they were
designed to elicit information, and basically followed the tone you
set with your first post:

Bad. It chases symlinks. Please make it not do that, or you
will ruin a good day.

Let's face it, this was very blunt and full of hidden assumptions.

> I'm no spring chicken when it comes to programming (I got paid for my
> first program some 25 years ago), and I've had my share of criticism
> to accept.

We're not that dissimilar. I was programming an 8008 for a real-time
embedded application in the power industry in 1976. My formal training
is in hardware engineering, but my work these days is about a 50-50 mix
of hardware and software design for embedded computers. I started using
Perl3 and Perl4 in my work around 1989. At that time, I was using the
Unix-ish Aegis operating system on Apollo workstations.

> So, besides the takeaway of some specific Unix info, I suggest you
> re-review this thread for how you reacted at each statement thrown at
> you (except this last), and notice how much work we had to do to beat
> the truth into you. :) That makes you unhireable if taken to an
> extreme, or at least a bit less of a team player.

OK, here's a very brief summary:

Michael: I need to do some housekeeping that File::Find doesn't support.
Dave:    Here's a template script that I often use. Modify as needed.
Randal:  Bad. Chases symlinks.
Dave:    You should know how to deal with that.
Randal:  I was warning people.
Dave:    Ask for clarification.
         Mistakenly state that old File::Find doesn't check.
Randal:  Explain loops. Yes, File::Find checks. (no details)
Joe:     More details about File::Find internals.
Dave:    Point out differences in versions of File::Find, not realizing
         that lstat() prevents -d.
Randal:  Point out that lstat() prevents -d.
Dave:    Thank you. Comment about tree vs. directed graph.
Randal:  Without symlinks, directories are trees.
Dave:    Hard links can create loops.
Rich & Mark: No, the filesystem code prevents that.
Dave:    Thank you.
Randal:  Various personal comments, based only on this exchange.

I would suggest you talk to people I've actually worked beside before
you start making public statements about my personality, skills or
ability to find work.

I do defend my positions vigorously, and make no apologies for that.
But I *do* change when presented with sufficient details. I was missing
two key facts: lstat() prevents -d, and Unix filesystem code (but not
the underlying data structures) prevents hard links to directories. I
think we cleared those up rather nicely, all in all.

If you think this was a difficult exchange, then I would venture to
say that you wouldn't be prepared to work in some of the teams that
I've been on. Also keep in mind that if we had been face-to-face,
this would have been a 5-10 minute hallway conversation at most, and
no one would have been upset when it was over.

-- Dave Tweed

Christopher Biow

Aug 27, 2001, 4:40:50 PM

mer...@stonehenge.com (Randal L. Schwartz) wrote:

>you *knew* that you didn't know everything there was about symlinks
>and directory structures on Perl's home turf, Unix.

It's not just Unix. Win2K/NTFS5 has had symlinks for a while--they're just
not terribly well documented or widely used:

|E:\Temp\parent>
|e:\util\junction junc e:\temp
|
|Junction v1.02 - Win2K junction creator and reparse point viewer
|Copyright (C) 2000 Mark Russinovich
|Systems Internals - http://www.sysinternals.com
|
|Created: E:\Temp\parent\junc
|Targetted at: e:\temp
|
|E:\Temp\parent>
|dir junc\parent\junc\parent\junc\parent
| Volume in drive E is WIN2K PRO
| Volume Serial Number is B1F4-E5F7
|
| Directory of E:\Temp\parent\junc\parent\junc\parent\junc\parent
|
|08/27/2001 04:19p <DIR> .
|08/27/2001 04:19p <DIR> ..
|08/27/2001 04:19p <JUNCTION> junc
| 0 File(s) 0 bytes
| 3 Dir(s) 4,994,040,320 bytes free
|
|E:\Temp\parent>
|perl -e "print -d 'junc'"
|1
|E:\Temp\parent>
|perl -e "print -l 'junc'"
|
|E:\Temp\parent>
|perl -v
|
|This is perl, v5.6.1 built for MSWin32-x86-multi-thread
|(with 1 registered patch, see perl -V for more detail)
|
|Copyright 1987-2001, Larry Wall
|
|Binary build 628 provided by ActiveState Tool Corp. http://www.ActiveState.com
|Built 15:41:05 Jul 4 2001

I suspect that the default Win32 Perl port is far from the only software
not to support NTFS5 junctions. Win32::NTFS from
<http://www.generation.net/~aminer/Perl/NTFS.html> should help.

Dave Tweed

Aug 27, 2001, 5:42:31 PM

James Copland wrote:
> This implies that:
>
> for (@names) {
> if (not -l "$path/$_" and -d "$path/$_") {
> push @directories, $_;
> } else {
> push @files, $_;
> }
> }
>
> is safe, is this correct? Will it work on both *nix and Win32?

Yes, now that I thoroughly understand the issues, it is both safe*
and portable. On any system on which symlinks are not supported by
the Perl implementation, -l will simply never return true.

*Actually, now that I think about it some more, I'm not sure.
(Scans perlfunc one more time...) It depends on whether
not -l $foo and -d $foo # [1]
is exactly equivalent to
not -l $foo and -d _ # [2]

In other words, is Perl guaranteed to recognize that the -d test
in [1] should reuse the cached lstat() information and not do its
own stat()? It would be safer to use [2], for which that behavior
is documented.

In any case, this will put all of your symlinks into the @files list,
even if they point to directories. This might not be exactly what
you want. It might be better to have a 3-way splitting of the list:

    for (@names) {
        if (-l "$path/$_") {
            push @symlinks, $_;
        } elsif (-d _) {
            push @directories, $_;
        } else {
            push @files, $_;
        }
    }
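
Folded back into the full tree-walker, the three-way split might look like the sketch below. The $hooks hash with its file and housekeep callbacks is an assumption made for the example, not part of the original script; the -l test performs an lstat(), and -d _ reuses that result.

```perl
use strict;
use warnings;

# Names in $path, sorted, without "." and "..".
sub read_directory {
    my ($path) = @_;
    opendir(my $dh, $path) or die "can't open directory $path: $!";
    my @names = sort grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    return @names;
}

# Process the files of a directory, do the housekeeping, then recurse.
# Symlinks are collected separately and never followed.
sub process_directory {
    my ($path, $hooks) = @_;
    my (@symlinks, @directories, @files);

    for my $name (read_directory($path)) {
        if (-l "$path/$name") {
            push @symlinks, $name;
        } elsif (-d _) {
            push @directories, $name;
        } else {
            push @files, $name;
        }
    }

    $hooks->{file}->("$path/$_") for @files;

    # All files of this directory have been seen; housekeeping goes here,
    # before any subdirectory is entered.
    $hooks->{housekeep}->($path, \@files, \@symlinks);

    process_directory("$path/$_", $hooks) for @directories;
}
```

The housekeeping callback receives the symlink names as well, so the caller can decide what to do with them instead of having them silently mixed into the file list.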

-- Dave Tweed

Yves Orton

Aug 28, 2001, 8:35:39 AM

Dave Tweed <dtw...@acm.org> wrote in message news:<3B86E1DE...@acm.org>...

> "Randal L. Schwartz" wrote:
> > Dave> The OP may or may not want to follow links.
> >
> > Well, if you follow links, and you aren't doing duplicate elimination,
> > you'll ruin a good day. That's my point. Do I need to repeat it?
>
SNIP

> However, I would note that File::Find does
> not address this problem either; the user needs to put the appropriate
> test in his wanted() function.

From File::Find:
follow

Causes symbolic links to be followed. Since directory trees with
symbolic links (followed) may contain files more than once and may
even have cycles, a hash has to be built up with an entry for each
file. This might be expensive both in space and time for a large
directory tree. See follow_fast and follow_skip below. If either
follow or follow_fast is in effect:
It is guaranteed that an lstat has been called before the user's
wanted() function is called. This enables fast file checks involving
_.

There is a variable $File::Find::fullname which holds the absolute
pathname of the file with all symbolic links resolved

....

no_chdir

Does not chdir() to each directory as it recurses. The wanted()
function will need to be aware of this, of course. In this case, $_
will be the same as $File::Find::name.
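
A short sketch of what no_chdir changes in practice: wanted() has to work with full paths because the current directory never moves. The function name list_files is made up for the example; find(), the no_chdir option and the wanted key are the documented File::Find interface.

```perl
use strict;
use warnings;
use File::Find;

# Return the full paths of all plain files below $root, sorted.
sub list_files {
    my ($root) = @_;
    my @found;

    find({
        no_chdir => 1,
        wanted   => sub {
            # With no_chdir, $_ holds the same path as $File::Find::name,
            # so it can be passed to lstat() from any working directory.
            lstat $_ or return;
            push @found, $_ if -f _;
        },
    }, $root);

    return sort @found;
}
```

This gives the same full-pathname style of processing as the hand-written tree-walkers earlier in the thread, while leaving the recursion to File::Find.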

> Neither one of us knows why the OP was walking his trees.

Shouldn't that be climbing his trees? :-)

Yves
