Re: recursive search

Jeff 'japhy' Pinyan

Dec 2, 2005, 10:36:41 AM

to The Ghost, beginners@perl.org Beginners

On Dec 2, The Ghost said:

> I want to know how many new line chars there are in all files in a directory
> (and its subdirectories). What's the best way?

You'll want to use File::Find (a standard module) to do your directory
recursion for you. For each file you get to, open it, count its newlines,
and add that to your running total.
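A minimal sketch of that approach, assuming a plain Perl 5 script and only core modules. The sub names count_newlines and count_tree_newlines are hypothetical helpers, not part of File::Find; only find() comes from the module. It counts newline characters (not lines) and reads one line at a time, so no file is ever loaded whole.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count the "\n" characters in one file, reading
# line by line so only one line is held in memory at a time.
sub count_newlines {
    my ($path) = @_;
    open my $fh, '<', $path or do {
        warn "Skipping $path: $!\n";
        return 0;
    };
    binmode $fh;    # read bytes as-is, so we count raw "\n" characters
    my $count = 0;
    while (my $line = <$fh>) {
        $count += ($line =~ tr/\n//);    # 0 for a final unterminated line
    }
    close $fh;
    return $count;
}

# Hypothetical helper: walk each directory (and its subdirectories) and
# keep a running total over every plain file that find() visits.
sub count_tree_newlines {
    my @dirs  = @_;
    my $total = 0;
    find(
        sub {
            # By default find() chdirs into each directory, so $_ is the
            # bare file name; $File::Find::name is the path from the start.
            return unless -f $_;
            $total += count_newlines($_);
        },
        @dirs
    );
    return $total;
}

# Usage: perl count_newlines.pl DIR [DIR ...]   (script name is hypothetical)
if (@ARGV) {
    print count_tree_newlines(@ARGV), "\n";
}
```

Opening $_ rather than $File::Find::name matters here: because find() has already changed into the file's directory, a relative $File::Find::name would no longer point at the file.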

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart

The Ghost

Dec 2, 2005, 10:24:54 AM

to beginners@perl.org Beginners

I want to know how many new line chars there are in all files in a
directory (and its subdirectories). What's the best way?

Thanks!

Chris Devers

Dec 2, 2005, 10:39:07 AM

to The Ghost, beginners@perl.org Beginners

On Fri, 2 Dec 2005, The Ghost wrote:

> I want to know how many new line chars there are in all files in a
> directory (and its subdirectories). What's the best way?

I'm sure this isn't how you want to do it, but this might work:

$ cat `find . -type f` | wc -l

It'll choke if you have too many files in the directory in question, as
there are limits to how long the argument list can be in the shell, but
provided that you don't exceed that limit, this will get you a quick and
dirty answer to your question.

Otherwise, you'll need to build up a list with File::Find or similar
module, then work through the list looking for newline characters for
each file in that list. It should get the same result as above, but will
take more hand-coding to get to the final result, and it shouldn't hit
the limitation of too many files that the shell approach will have.

--
Chris Devers

Charles K. Clarkson

Dec 2, 2005, 10:39:32 AM

to begi...@perl.org

The Ghost <mailto:gh...@madisonip.com> wrote:

: I want to know how many new line chars there are in all files
: in a directory (and its subdirectories). What's the best way?

A lot depends on your idea of "best". It might be that the
best way is to hand the project off to someone else and reap the
benefits of their skills. In fact, many very rich people say this
is the best way to do just about anything. So what do you mean by
"best"?

One way to tackle this problem is to figure out how to find
the number of new lines in any file. Since any file may be very
large, assume at least one file cannot be loaded into memory.
Now take that solution and use File::Find to apply it to many files.
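A sketch of the single-file piece under that memory constraint, modeled on the chunked sysread plus tr/// technique from perlfaq5 ("How do I count the number of lines in a file?"). The name count_newlines_chunked and the 64 KB default chunk size are assumptions; any positive chunk size gives the same count and only changes how much memory is used.

```perl
use strict;
use warnings;

# Hypothetical helper: count "\n" characters in $path while holding at
# most one fixed-size chunk in memory, so files larger than RAM are fine.
sub count_newlines_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default; any positive size works
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my ($count, $buffer) = (0, '');
    while (1) {
        my $read = sysread $fh, $buffer, $chunk_size;
        die "Read error on $path: $!" unless defined $read;
        last if $read == 0;                # 0 bytes read means end of file
        $count += ($buffer =~ tr/\n//);    # tr/// returns the match count
    }
    close $fh;
    return $count;
}
```

Plugging this into a File::Find callback in place of a slurp keeps memory use flat no matter how large any single file is.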

HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
254 968-8328

. . . With Liberty and Justice for all (heterosexuals).

Thomas Bätzler

Dec 2, 2005, 10:43:43 AM

to beginners@perl.org Beginners, The Ghost

The Ghost <gh...@madisonip.com> asked:

> I want to know how many new line chars there are in all files
> in a directory (and its subdirectories). What's the best way?

Use File::Find to iterate over the files and then sum up the
newlines you find in each file. Counting the newlines in a
single file is left as an exercise for the reader.

HTH,
Thomas

Jennifer Garner

Dec 2, 2005, 10:53:25 AM

to begi...@perl.org

#!/usr/local/bin/perl
#
# recurs.pl
#
# This script executes recursively on subdirs the command you supply as a
# parameter
#
# Run "program -h" to see the run options
#
# Last modified: Apr 10 1997
# Author: Bekman Stas <c040...@techst02.technion.ac.il>;
# <sbe...@iil.intel.com>;

$|=1;

# Set here the pattern extensions of your image files

# Usage

(@ARGV == 1 ) || die ("Usage: recurs.pl [-h] \n\t-h this help\n\n");

$command=$ARGV[0];
#$command=~s/(.*)/'$1'/;

&recursive();

# Subroutine "recursive" goes recursively down the dir tree and
# runs $ARGV[0] in each directory. After reaching the bottom, it comes
# back up the tree.

sub recursive {
system($command);
#print "$command\n";
#die;
foreach $dir (<*>) {
if (-d $dir) {
# print "$dir\n";
chdir $dir;
&recursive();
chdir "..";
}
}
}

Charles K. Clarkson

Dec 2, 2005, 11:32:40 AM

to begi...@perl.org

Jennifer Garner <mailto:mlis...@gmail.com> wrote:

[snip]
: # Last modified: Apr 10 1997
[snip]

Please do not provide outdated, buggy solutions to a beginners
list. We are trying to do much more than just solve problems. We
are (hopefully) fostering good programming skills first and good
solutions second. Your solution not only didn't run, it provided
several excellent reasons why simple approaches do not always work
across many OS platforms.

use...@davidfilmer.com

Dec 2, 2005, 1:44:51 PM

Jeff 'japhy' Pinyan wrote:
> > I want to know how many new line chars there are in all files in a directory

As others have said, you can use File::Find (or, my favorite module,
IO::All) to identify the files. For counting the number of lines, you
ought to check your docs:

perldoc -q "How do I count the number of lines in a file?"

Shawn Corey

Dec 2, 2005, 1:18:26 PM

to begi...@perl.org

Jennifer Garner wrote:
> $|=1;
>

Be careful with this one. The documentation for it makes it sound like
it's a good idea to set this, but setting $| to a true value turns
buffering OFF, not ON: the currently selected output handle (normally
STDOUT) is flushed after every print or write. Normally you leave this
alone, even for pipes and sockets; Perl does the right thing in almost
every case.

See:
perldoc perlvar (search for $|)
perldoc FileHandle

--

Just my 0.00000002 million dollars worth,
--- Shawn

"Probability is now one. Any problems that are left are your own."
SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_

The Ghost

Dec 2, 2005, 1:30:40 PM

to Perl Beginners

So far I did this:

#!/usr/bin/perl

use File::Find;
my $totalLines;
find(\&wanted, '@directories');
sub wanted {
unless ($_=~m/.html|.mas|.pl|.txt$/i) {return 0;} #filter the kinds of files you want
open FILE, "<$File::Find::name";
print "$_: ";
my @lines=<FILE>;
print "$#lines\n";
$totalLines+=$#lines; #wanted's value is ignored so we have to do this here.
return;}
print "$totalLines\n";

This only limits me by the size of the file, or no?

Thanks!

Shawn Corey

Dec 2, 2005, 4:03:43 PM

to Perl Beginners

The Ghost wrote:
> So far I did this:
>
> #!/usr/bin/perl
>
> use File::Find;
> my $totalLines;
> find(\&wanted, '@directories');
> sub wanted {
> unless ($_=~m/.html|.mas|.pl|.txt$/i) {return 0;} #filter the kinds of files you want
> open FILE, "<$File::Find::name";
> print "$_: ";
> my @lines=<FILE>;
> print "$#lines\n";
> $totalLines+=$#lines; #wanted's value is ignored so we have to do this here.

$#lines is the index of the last entry in @lines. scalar( @lines ) is
the number of items in the array. Normally, $#lines + 1 == scalar(
@lines ). I think you should use scalar( @lines ) here.
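A small illustration of the difference, using a hard-coded array in place of my @lines=<FILE>:

```perl
use strict;
use warnings;

# Three lines, as if they had been read with  my @lines = <FILE>;
my @lines = ("first\n", "second\n", "third\n");

my $last_index = $#lines;           # index of the last element: 2
my $count      = scalar(@lines);    # number of elements: 3
my $total      = 0;
$total += @lines;                   # an array in numeric context is its size

print "last index: $last_index, count: $count, total: $total\n";
```

Adding $#lines for every file therefore undercounts the total by one per file.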

> return;}
> print "$totalLines\n";
>
> This only limits me by the size of the file, or no?

Yes. If you have big files, replace the slurp with a loop.
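One way to replace the slurp, sketched as a standalone sub (the name count_lines is hypothetical): read and discard one line at a time and let Perl's $. variable do the counting. Note that this counts lines, so a final line without a trailing newline is counted, unlike a pure newline-character count.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of lines in $path without
# holding more than one line in memory.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    # Read one line at a time and throw it away; Perl updates $. for us.
    # A lexical $line avoids clobbering $_, which File::Find relies on.
    while (defined(my $line = <$fh>)) { }
    my $lines = $.;    # line number of the last line read from $fh
    close $fh;         # close() resets $., so copy it first
    return $lines;
}
```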

Purl Gurl

Dec 2, 2005, 7:40:30 PM

ghost wrote:

(snipped)

> So far I did this:

> #!/usr/bin/perl

> use File::Find;

My personal advice is to not use a module unless
you have good reason, such as speed of a good
module or inability to write code equal to a module.

Use a module when doing so is beneficial. Many
modules are worth using. Most are not.

Research and read about Perl's $. variable (the current input line number).

Below you will find a method that is exceptionally efficient and
easy to configure. It presents an alternative for you to study.

Purl Gurl

#!perl

$internal_path = "c:/your/path/directory"; # your path here

chdir($internal_path);

@Directory_Parent = $internal_path;

while (@Directory_Parent)
{
$directory = shift (@Directory_Parent);
opendir(DIR, $directory) || next;
while (defined($child = readdir(DIR)))
{
if (-d "$directory/$child" && $child ne "." && $child ne "..")
{ push(@Directory_Parent, "$directory/$child"); }

if (-f "$directory/$child")
{ push (@File_List, "$directory/$child"); }
}
closedir(DIR);
}

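for $file (@File_List)
{
open (COUNT, '<', $file) || die "Can't open $file: $!";
while (<COUNT>) { }   # read every line; Perl counts them in $.
$total_lines += $.;   # add this file's count before close() resets $.
close (COUNT);
}

print "Total Lines: $total_lines\n";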

# optional

# {
# local ($") = "\n";
# print "Files Checked:\n@File_List";
# }

John W. Krahn

Dec 2, 2005, 11:44:32 PM

to Perl Beginners

The Ghost wrote:
> So far I did this:
>
> #!/usr/bin/perl

That should be followed by these two lines:

use warnings;
use strict;

> use File::Find;
> my $totalLines;
> find(\&wanted, '@directories');

Do you actually have a directory in the current directory named '@directories'?

> sub wanted {
> unless ($_=~m/.html|.mas|.pl|.txt$/i) {return 0;} #filter the

Your regular expression says to match any character (.) followed by the string
'html' anywhere in $_ OR any character followed by the string 'mas' anywhere
in $_ OR any character followed by the string 'pl' anywhere in $_ OR any
character followed by the string 'txt' only at the end of $_. What you
probably want is:

return unless /\.(?:html|mas|pl|txt)$/i;
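A quick way to see the difference between the two patterns; the file names are made up for illustration:

```perl
use strict;
use warnings;

# The original filter: unescaped dots, and only the last alternative anchored.
my $loose  = qr/.html|.mas|.pl|.txt$/i;
# The suggested filter: a literal dot, with the whole group anchored at the end.
my $strict = qr/\.(?:html|mas|pl|txt)$/i;

for my $name (qw(index.html notes.TXT xhtml_notes.doc replay.log report.txt.bak)) {
    printf "%-18s loose:%-3s strict:%s\n", $name,
        ($name =~ $loose  ? 'yes' : 'no'),
        ($name =~ $strict ? 'yes' : 'no');
}
```

Names such as xhtml_notes.doc ("x" followed by "html") and replay.log ("e" followed by "pl") slip through the loose pattern but are rejected by the anchored one.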

> kinds of files you want
> open FILE, "<$File::Find::name";
> print "$_: ";
> my @lines=<FILE>;
> print "$#lines\n";
> $totalLines+=$#lines; #wanted's value is ignored so we have to

$#lines is one less than the number of lines in the file so your total will
not be accurate. That should be:

$totalLines += @lines;

But you don't really need to store all the lines in an array, you can do it
more simply as:

() = <FILE>;
print "$.\n";
$totalLines += $.;

Or use the example in the FAQ:

perldoc -q "How do I count the number of lines in a file"

> do this here.
> return;}
> print "$totalLines\n";
>
> This only limits me by the size of the file, or no?

John
--
use Perl;
program
fulfillment

John Doe

Dec 3, 2005, 6:38:41 AM

to begi...@perl.org

The Ghost on Friday, 2 December 2005 19:30:

Hi,

In addition to John W. Krahn's good advice:

> So far I did this:
>
> #!/usr/bin/perl
>
> use File::Find;
> my $totalLines;
> find(\&wanted, '@directories');
> sub wanted {
> unless ($_=~m/.html|.mas|.pl|.txt$/i) {return 0;} #filter the kinds of files you want
> open FILE, "<$File::Find::name";

Always check if operations succeeded:

open (FILE, '<', $File::Find::name)
or die "couldn't open $File::Find::name: $!";

> print "$_: ";
> my @lines=<FILE>;

and close opened files:

close FILE or die "couldn't close $File::Find::name: $!";

John W. Krahn

Dec 3, 2005, 8:06:26 AM

to Perl Beginners

John Doe wrote:
> The Ghost on Friday, 2 December 2005 19:30:
>

>> open FILE, "<$File::Find::name";
>
> Always check if operations succeeded:
>
> open (FILE, '<', $File::Find::name)
> or die "couldn't open $File::Find::name: $!";

Thanks, don't know how I missed that. :-)

John W. Krahn

Dec 3, 2005, 9:27:27 AM

to Perl Beginners

John Doe wrote:
> The Ghost on Friday, 2 December 2005 19:30:
>

>> print "$_: ";
>> my @lines=<FILE>;
>
> and close opened files:
>
> close FILE or die "couldn't close $File::Find::name: $!";
>
>> print "$#lines\n";
>> $totalLines+=$#lines; #wanted's value is ignored so we have to do this here.
>> return;}
>> print "$totalLines\n";

You bring up an interesting point about closing the filehandle because
normally you don't have to worry about that as perl will do the right thing.
However in the example I posted using the $. variable:

sub wanted {

...

() = <FILE>;
print "$.\n";
$totalLines += $.;
}

print "$totalLines\n";

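Produces an incorrect value for $totalLines unless you close the filehandle,
because reopening FILE without an intervening close() does not reset $., so
$. keeps counting across files. But if you don't close the filehandle, then
you can do this: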

sub wanted {

...

() = <FILE>;
}
print "$.\n";
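A self-contained sketch reproducing the behaviour described above, based on perlvar's note that $. is reset when a filehandle is closed but not when an open filehandle is reopened without an intervening close(). The temporary files (2 and 3 lines) are created with File::Temp purely for the demonstration, and make_file is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to a temp file and return its name.
sub make_file {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close $fh;
    return $name;
}
my @files = (make_file("a\nb\n"), make_file("c\nd\ne\n"));

# Reopen the same bareword handle without close(): $. keeps counting.
my @without_close;
for my $file (@files) {
    open FILE, '<', $file or die "Can't open $file: $!";
    () = <FILE>;
    push @without_close, $.;
}
close FILE;

# Close after each file: $. is reset, so it is the per-file count.
my @with_close;
for my $file (@files) {
    open FILE, '<', $file or die "Can't open $file: $!";
    () = <FILE>;
    push @with_close, $.;
    close FILE;
}

print "without close: @without_close\n";    # running totals: 2 5
print "with close:    @with_close\n";       # per-file counts: 2 3
```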

John

John Doe

Dec 3, 2005, 10:13:50 AM

to begi...@perl.org

John W. Krahn on Saturday, 3 December 2005 15:27:

> John Doe wrote:
> > The Ghost on Friday, 2 December 2005 19:30:
> >> print "$_: ";
> >> my @lines=<FILE>;
> >
> > and close opened files:
> >
> > close FILE or die "couldn't close $File::Find::name: $!";
> >
> >> print "$#lines\n";
> >> $totalLines+=$#lines; #wanted's value is ignored so we have to do this here.
> >> return;}
> >> print "$totalLines\n";
>
> You bring up an interesting point about closing the filehandle because
> normally you don't have to worry about that as perl will do the right
> thing.

Not sure if I understand you correctly:
Do you suggest *not* to close filehandles, because it's done by perl "doing
the right thing"?
Or should one decide in every case whether to close explicitly or not?

My thought was: when I always close filehandles, I don't have to think about
closing or not closing them (comparable to always signaling when driving,
even if nobody else is on the road).

For example, perldoc -f open states:

"[...] You don't have to close FILEHANDLE if you are immediately
going to do another "open" on it [...]"

In the normal case, there is always a point at which a filehandle is no
longer reopened: after the last reopening. So would this case have to be
checked in a loop (pseudocode: close FH if finished reopening)?
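One way to sidestep that question, offered as a sketch rather than a rule from the thread: with a lexical filehandle (open my $fh, ...), each file gets a brand-new handle with its own $. counter, and the handle is closed automatically when $fh goes out of scope. An explicit close is still the place to check for errors, especially on handles you write to. The sub name count_lines_per_file is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line count of each file, in order.
sub count_lines_per_file {
    my @counts;
    for my $file (@_) {
        # A fresh lexical handle per file: its line counter starts at 0.
        open my $fh, '<', $file or die "Can't open $file: $!";
        () = <$fh>;          # read (and discard) every line in list context
        push @counts, $.;    # $. refers to $fh, the handle last read from
    }                        # $fh goes out of scope here, closing the file
    return @counts;
}
```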

> However in the example I posted using the $. variable:
>
> sub wanted {
> ...
> () = <FILE>;
> print "$.\n";
> $totalLines += $.;
> }
> print "$totalLines\n";
>
> Produces an incorrect value for $totalLines unless you close the filehandle

I did not see this point.

> but if you don't close the filehandle then you can do this:
>
> sub wanted {
> ...
> () = <FILE>;
> }
> print "$.\n";

Sorry if this is just a misunderstanding on my part and I'm wasting your time!

joe