Why Does This Script Run So Slow?

Eric Robinson

Nov 18, 2009, 10:43:58 AM
I have a directory with 2 million files. This script runs pretty fast...

#!/bin/bash
j=0
for i in *
do
let j+=1
echo "$j: $i"
done

The next script runs a little slower....

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file=$i
echo "$j: $the_file"
done

But THE NEXT script runs ridiculously slow...

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file=`echo $i`
echo "$j: $the_file"
done

And the FINAL script (which represents the functionality I need) runs so
slow that it is completely unusable. Why the big difference?

#!/bin/bash
j=0
for i in *
do
let j+=1
the_file_lower_case=`echo $i | tr [:upper:] [:lower:]`
echo "$j: $the_file_lower_case"
done


Robert Newson

Nov 18, 2009, 6:02:27 PM
Eric Robinson wrote:
...

> But THE NEXT script runs ridiculously slow...
...
> the_file=`echo $i`
...

> And the FINAL script (which represents the functionality I need) runs so
> slow that it is completely unusable. Why the big difference?
...

> the_file_lower_case=`echo $i | tr [:upper:] [:lower:]`
...
At a guess:


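The `command` (backtick) bits. Each of those runs in a separate process:
the kernel has to fork() a child for every substitution, and exec() as
well for any external command such as tr (echo is a bash builtin, so it
only costs the fork).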

In looping over the directory of 2+ million entries, the ridiculously
slow script creates 2+ million child processes (one "echo <filename>"
per file); the unusable script creates 4+ million (for each filename,
one process does echo <filename> and pipes its output into another
running tr). Creating child processes isn't free, and creating millions
of them is certainly going to have some impact, as is starting tr 2+
million times.
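
If the bash in question is 4.0 or later (an assumption; check with
bash --version), the lower-casing can stay inside the shell, so the loop
forks nothing at all. A minimal sketch, where lower_names is just a name
made up for illustration:

```shell
#!/bin/bash
# Sketch: lower-case each name with bash's own ${var,,} expansion,
# so no child process is created per file.
# Assumes bash 4.0 or later; lower_names is a hypothetical helper name.

lower_names() {
    local j=0 name
    for name in "$@"; do
        j=$((j + 1))
        # ${name,,} converts the whole value to lower case inside the shell
        printf '%s: %s\n' "$j" "${name,,}"
    done
}

# Usage: run it over everything in the current directory.
# lower_names *
```

Calling a shell function does not exec anything, so passing it 2+
million names should not hit the kernel's argument-length limit the way
an external command would.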

You may be much better off using awk, eg:

$ /bin/ls | awk '{print "mv -n \"" $0 "\" \"" tolower($0) "\""}' | /bin/sh

[I'm no awk expert, just worked that out from the man page.]

which uses 3 processes: the first lists the files, the second creates a
command to rename each file to lower case (using the mv command), and
the third executes the created commands. The remaining process cost is
that /bin/mv still runs as a child process 2+ million times.
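
Before piping anything into /bin/sh, it may be worth looking at what the
awk stage actually emits. A small sketch using two made-up file names
(gen_mv_commands is a hypothetical helper name, not part of any tool):

```shell
#!/bin/bash
# Sketch: preview the mv commands the awk stage generates, without running them.
# gen_mv_commands and the sample file names are made up for illustration.

gen_mv_commands() {
    # Reads one file name per line on stdin, writes one mv command per line.
    awk '{print "mv -n \"" $0 "\" \"" tolower($0) "\""}'
}

printf '%s\n' 'README.TXT' 'My File.Doc' | gen_mv_commands
# prints:
#   mv -n "README.TXT" "readme.txt"
#   mv -n "My File.Doc" "my file.doc"
```

Note that the generated commands only wrap each name in double quotes,
so a file name containing a double quote, $, or a backtick would be
misread by /bin/sh. Checking the output first is the cautious route.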

The only way I can think of to rename the files FAST is to use a C
program, to either

a) replace the shell part of the above pipeline (along with a modified
awk print command) by reading 2 filenames from stdin and using rename(2)
to do it; or

b) have the C program do the whole lot: scan the directory, convert to
lower case, rename as appropriate - then it'd only be one process
regardless of the number of files
