
bash 'while read' VERY slow, any alternatives??


Larry D'Cunha

Jul 2, 2002, 7:44:52 PM
Does anyone know of another FASTER way to read in the lines of a
file? I am using bash in Cygwin on Windows and the 'while read' or
'cat' is horribly slow.

cat myfile.txt | while read ENTRY; do
(do something)
done

--
Larry

William Park

Jul 2, 2002, 9:38:41 PM

Well, no further optimization is possible from what you've written. If you
need a loop, you need a loop. What is "do something"?

--
William Park, Open Geometry Consulting, <openge...@yahoo.ca>
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin

Nithyanandham M

Jul 2, 2002, 10:37:40 PM

William Park wrote:

> In comp.unix.shell Larry D'Cunha <lpdc...@engmail.uwaterloo.ca> wrote:
> > Does anyone know of another FASTER way to read in the lines of a
> > file? I am using bash in Cygwin on Windows and the 'while read' or
> > 'cat' is horribly slow.
> >
> > cat myfile.txt | while read ENTRY; do
> > (do something)
> > done
>
> Well, no further optimization is possible from what you've written.

No.


while read ENTRY; do
(do something)
done < myfile.txt

looks faster to me.


--

Nithyanand.
Siemens, Bangalore, India.
(Opinions expressed are my own and do not reflect the opinions of my employer,
SIEMENS)

Greg Andrews

Jul 2, 2002, 11:01:17 PM

What kind of testing have you done that tells you the
slowness is coming from the read rather than from the
cat or the "do something" commands?

You've cross-posted this to comp.lang.awk. Does that
mean awk is somehow involved?
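
One quick way to check is to time the skeleton by itself. A rough sketch
(myfile.txt standing in for the real file, nothing else assumed):

time cat myfile.txt | while read ENTRY; do :; done   # pipe plus read, empty body
time cat myfile.txt > /dev/null                      # cat alone

Whatever the full loop takes beyond these belongs to the "do something"
commands.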

-Greg
--
::::::::::::: Greg Andrews ::::: ge...@panix.com :::::::::::::
I have a map of the United States that's actual size.
-- Steven Wright

Dan Haygood

Jul 2, 2002, 11:55:12 PM
The awk program is simple. The whole command line looks like this:
awk "function do_something(){}{ do_something() }" myfile.txt
This should be really fast, because, indeed, do_something() doesn't.

But whether or not this is appropriate depends on what you are trying to do.
If you are running commands based on the line, maybe a shell script is
better.
If you are doing calculations on data from the line and generating output,
maybe awk is better.
But both can do either. You knew that.
- Dan

"Larry D'Cunha" <lpdc...@engmail.uwaterloo.ca> wrote in message
news:456de2c6.02070...@posting.google.com...

The awk program is simple. The whole command line looks like this:
awk "{ do_something }" myfile.txt
But referencing the uninitialized variable do_something, evaluating it as 0,
and throwing the 0 away, is not very useful. Consider instead:
awk "function do_something(){}{ do_something() }" myfile.txt
This should be pretty fast, too, because, indeed, do_something() doesn't.

But whether or not this is appropriate depends on what you are trying to do.
If you are running commands based on the lines read, maybe a shell script is
better.
If you are doing calculations on data from the line and generating output,
maybe awk is better.
But both can do either. You knew that.
- Dan

Dan Haygood

Jul 2, 2002, 11:58:11 PM
Oops, ignore the last top-post, the full post is on the bottom!
Don't ignore this top post.
- Dan

"Dan Haygood" <reply...@you.read.it> wrote in message
news:AAuU8.4862$P%6.49...@news2.west.cox.net...

William Park

Jul 3, 2002, 12:18:49 AM
In comp.lang.awk Nithyanandham M <M.Nithy...@blr.spcnl.co.in> wrote:
>> > cat myfile.txt | while read ENTRY; do
>> > (do something)
>> > done
>>
>> Well, no further optimization is possible from what you've written.
>
> No.
> while read ENTRY; do
> (do something)
> done < myfile.txt
>
> looks faster to me.

Both are the same -- one redirects output, and the other redirects input.
Chances are that "do something" is responsible for "slowness" rather than
'cat' or 'while'.

Ian Wild

Jul 3, 2002, 4:14:46 AM
Larry D'Cunha wrote:
>
> Does anyone know of another FASTER way to read in the lines of a
> file? I am using bash in Cygwin on Windows and the 'while read' or
> 'cat' is horribly slow.
>

On the Cygwin site there's a FAQ along the lines of

Q: Why is everything so.o.o.o slow?
A: Because your PATH is asking for infinite network look-ups. Fix it.

Certainly worked for me.
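
A rough sketch of the kind of check that FAQ is pointing at (the directory
list below is only an example, not a recommendation):

echo "$PATH" | tr ':' '\n'                  # look for network/UNC entries
export PATH=/usr/local/bin:/usr/bin:/bin    # trim to local directories for this session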

Dan Mercer

Jul 3, 2002, 7:56:06 AM
In article <afttv9$hejfi$2...@id-99293.news.dfncis.de>,

William Park <openge...@NOSPAM.yahoo.ca> writes:
> In comp.lang.awk Nithyanandham M <M.Nithy...@blr.spcnl.co.in> wrote:
>>> > cat myfile.txt | while read ENTRY; do
>>> > (do something)
>>> > done
>>>
>>> Well, no further optimization is possible from what you've written.
>>
>> No.
>> while read ENTRY; do
>> (do something)
>> done < myfile.txt
>>
>> looks faster to me.
>
> Both are the same -- one redirects output, and the other redirects input.
> Chances are that "do something" is responsible for "slowness" rather than
> 'cat' or 'while'.
>

Not under Cygwin, where process creation is extremely expensive - for
one thing, Cygwin writes the name of executing processes to the titlebar.

This is truly a UUOC (useless use of cat).

--
Dan Mercer
dame...@mmm.com

Dan Mercer

Jul 3, 2002, 7:54:29 AM
In article <456de2c6.02070...@posting.google.com>,

lpdc...@engmail.uwaterloo.ca (Larry D'Cunha) writes:

This is a useless use of cat (UUOC). Under Cygwin and bash, it
is especially useless. All segments of pipelines in bash are
executed in separate processes from the main shell process. So,
counting the original bash process, you have four processes going
when only one is needed. Starting processes under Cygwin is
extremely expensive - for one thing, every process started gets
reflected to the Cygwin window titlebar. If you put up Xwindows
under Cygwin and use xterm, pipelines might actually run faster.

while read entry
do
:
done < myfile.txt

runs 17 times faster by my calculations. Cygwin is one environment
where avoiding process creation by using bash builtins is of
paramount importance.
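
The comparison is easy to reproduce; a sketch (the exact figures will vary
with the machine and the size of myfile.txt):

time cat myfile.txt | while read entry; do :; done
time while read entry; do :; done < myfile.txt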

--
Dan Mercer
dame...@mmm.com

Larry D'Cunha

Jul 3, 2002, 1:39:57 PM
Ok thanks for all your suggestions. It turns out my loop was slow
because of the way I used awk inside the loop, not because of the cat
myfile/while read.

cat myfile.txt | while read ENTRY; do

...(some other stuff)
num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
done;

In the loop code I need to extract the 1st and 2nd columns of the
current line and do some things with it, so I use awk but it's so slow
in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
through and process each line one at a time?

Thanks,
Larry

lpdc...@engmail.uwaterloo.ca (Larry D'Cunha) wrote in message news:<456de2c6.02070...@posting.google.com>...

Barry Margolin

Jul 3, 2002, 1:40:37 PM
In article <afttv9$hejfi$2...@ID-99293.news.dfncis.de>,

William Park <openge...@NOSPAM.yahoo.ca> wrote:
>In comp.lang.awk Nithyanandham M <M.Nithy...@blr.spcnl.co.in> wrote:
>>> > cat myfile.txt | while read ENTRY; do
>>> > (do something)
>>> > done
>>>
>>> Well, no further optimization is possible from what you've written.
>>
>> No.
>> while read ENTRY; do
>> (do something)
>> done < myfile.txt
>>
>> looks faster to me.
>
>Both are the same -- one redirects output, and the other redirects input.

Huh? One redirects input from a pipe, the other redirects input from a file.
If pipes are slow in Cygwin, that could make a difference.

>Chances are that "do something" is responsible for "slowness" rather than
>'cat' or 'while'.

That's easy enough to tell by replacing the "do something" with nothing.

It would be nice if all the Cygwin questions could be sent to Windows
groups rather than Unix groups. I don't want to sound like Rev. Don, but a
Windows system running Cygwin is not really a Unix system; it's just a
facade, and many of the questions and answers appropriate for Unix (and
workalike systems like *BSD and Linux) often don't apply to Cygwin.

In particular, if something you know works on typical Unix systems doesn't
work under Cygwin, that should be a strong signal that the question isn't
appropriate for a Unix newsgroup.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Volker Hetzer

Jul 3, 2002, 2:05:35 PM

"Larry D'Cunha" <lpdc...@engmail.uwaterloo.ca> wrote in message news:456de2c6.02070...@posting.google.com...
> Ok thanks for all your suggestions. It turns out my loop was slow
> because of the way I used awk inside the loop, not because of the cat
> myfile/while read.
>
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
> done;
>
> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?
function one () { echo $1; }

num=`one $ENTRY` maybe?
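
Or skip the function and the command substitution entirely and let set split
the line (a sketch, with the same assumption that $ENTRY is whitespace
separated):

set -- $ENTRY    # note: this overwrites the script's positional parameters
num=$1
other=$2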

Greetings!
Volker


William Park

Jul 3, 2002, 2:07:32 PM
In comp.lang.awk Larry D'Cunha <lpdc...@engmail.uwaterloo.ca> wrote:
> Ok thanks for all your suggestions. It turns out my loop was slow
> because of the way I used awk inside the loop, not because of the cat
> myfile/while read.
>
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
> done;
>
> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?

Well, "process each line" is the key insight here. I mean, process how?
As for looping, try
cat myfile.txt | while read a b rest; do
# do something with 'a' and 'b'...
done
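
For instance, a sketch in which the "do something" is plain shell arithmetic
on the two fields (assuming, which the original post doesn't say, that they
are integers):

while read a b rest; do
echo "$a + $b = $((a + b))"
done < myfile.txt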

Chris F.A. Johnson

Jul 3, 2002, 2:17:18 PM
In article <456de2c6.02070...@posting.google.com>, Larry D'Cunha wrote:
> Ok thanks for all your suggestions. It turns out my loop was slow
> because of the way I used awk inside the loop, not because of the cat
> myfile/while read.
>
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)

You're calling awk once for each line of your file; no wonder it's
slow.

> done;
>
> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?

Isn't that what I suggested in my previous reply? If you want to
use awk, the above script can be written as:

awk '{print $1}' myfile.txt

Depending on the format of the file, you could probably use cut (I
believe it comes with Cygwin), which is a much smaller utility:

cut -d " " -f1 myfile.txt

The character in quotation marks is the character that delimits
the first field of the file (space or tab).
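
If the file happens to be tab-separated, the -d option can be dropped
altogether, since tab is cut's default delimiter:

cut -f1 myfile.txt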

You could also do this with just the shell, not calling any
external command:

while read ENTRY junk
do
echo $ENTRY
done < myfile.txt

--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2002, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License

Adam Price

Jul 3, 2002, 2:24:28 PM
I think I read somewhere that Larry D'Cunha
<lpdc...@engmail.uwaterloo.ca> wrote ... :

> Ok thanks for all your suggestions. It turns out my loop was slow
> because of the way I used awk inside the loop, not because of the cat
> myfile/while read.
>
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
> done;
>

This can be replaced with

while read num rest ; do
ENTRY="${num} ${rest}"
...(some other stuff)
done < myfile.txt

> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?
>

Why do you need awk at all? If you want the first two fields instead
of just the first one, change the first line of what I wrote to

while read num other rest ; do

Then you get the fields you want in $num and $other and the rest of the
line in $rest.

If you need a different field separator then set IFS before you call read.
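
For example, a sketch assuming colon-separated fields (the real separator
isn't given in the thread):

while IFS=: read num other rest ; do
echo "$num $other"
done < myfile.txt

Putting the IFS assignment on the read command itself keeps the change from
leaking into the rest of the script.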

HTH
Adam

Kevin Rodgers

Jul 3, 2002, 2:56:26 PM
Larry D'Cunha wrote:

> Ok thanks for all your suggestions. It turns out my loop was slow
> because of the way I used awk inside the loop, not because of the cat
> myfile/while read.
>
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
> done;
>
> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?


awk '{print $1, $2}' myfile.txt |
while read column_1 column_2; do
# do something with $column_1 and $column_2
done

--
Kevin Rodgers <kev...@ihs.com>

Chris F.A. Johnson

Jul 3, 2002, 3:03:42 PM

How does that differ from:

while read column_1 column_2 junk; do
# do something with $column_1 and $column_2
done < myfile.txt

...besides using awk as an expensive (and unnecessary) form of cat?

Stepan Kasal

Jul 8, 2002, 3:10:42 AM
Hello,

On 3 Jul 2002 10:39:57 -0700, Larry D'Cunha wrote:
> cat myfile.txt | while read ENTRY; do
> ...(some other stuff)
> num=`echo $ENTRY| awk '{print $1}'` (this line is slooooow)
> done;
>
> In the loop code I need to extract the 1st and 2nd columns of the
> current line and do some things with it, so I use awk but it's so slow
> in my bash script on WinNT-Cygwin. Is there a way to use awk to loop
> through and process each line one at a time?

awk '{print $1}' myfile.txt | while read num; do
...(some other stuff)
done;

or, if you need the ENTRY:

awk '{print $1, $0}' myfile.txt | while read num ENTRY; do
...(some other stuff)
done;
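
Or, if the "(some other stuff)" can itself be written in awk, the whole loop
collapses into a single awk process, which is the cheapest option of all
under Cygwin. A sketch, assuming the per-line work is something awk can
express:

awk '{ num = $1; other = $2; print "processing", num, other }' myfile.txt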

Until you send a more complete example, no one can really help you.

Stepan
