awk file size limit?

claudegps

Jul 23, 2008, 12:32:13 PM

Hi all,
I'm trying to learn awk and use it for file manipulation.
I'm trying a simple:
awk '{gsub("\t","    "); print >FILENAME}' *.py

This should give me all the Python scripts with tabs replaced by 4
spaces.
It works, but the output files are always truncated: to 4k on Ubuntu,
to 80k on Cygwin.

Any suggestion?
Thanks in advance

Claudio

Kenny McCormack

Jul 23, 2008, 12:50:57 PM

In article <8f3d2375-1950-48a0...@w7g2000hsa.googlegroups.com>,

Short summary: It's not what you think.

You can't write back to the same file you're reading from (simultaneously).

You either have to arrange to do the usual "write to temp file, delete
original file, rename temp file to original file" dance, *or*, if you're
careful, in AWK, you can build up the new file in an array, then after
the original file is fully read and has been closed, then write out the
array to the original filename.

claudegps

Jul 23, 2008, 1:05:37 PM

On 23 Jul, 18:50, gaze...@xmission.xmission.com (Kenny McCormack)
wrote:
> In article <8f3d2375-1950-48a0-9068-3da253f84...@w7g2000hsa.googlegroups.com>,

>
> claudegps <claude...@gmail.com> wrote:
> >Hi all,
> >I'm trying to learn awk and use it for file manipulation.
> >I'm trying a simple:
> >awk '{gsub("\t","    "); print >FILENAME}' *.py
>
> >This should give me all the Python scripts with tabs replaced by 4
> >spaces.
> >It works, but the output files are always truncated: to 4k on Ubuntu,
> >to 80k on Cygwin.
>
> >Any suggestion?
> >Thanks in advance
>
> >Claudio
>
> Short summary: It's not what you think.
>
> You can't write back to the same file you're reading from (simultaneously).

I see...

> You either have to arrange to do the usual "write to temp file, delete
> original file, rename temp file to original file" dance, *or*, if you're

Sure, this works!

> careful, in AWK, you can build up the new file in an array, then after
> the original file is fully read and has been closed, then write out the
> array to the original filename.

OK, I'll study this and give it a try.
I think I should forget the "single line does all the work I need" :)
Thanks for your help!

Claudio

Ed Morton

Jul 23, 2008, 11:38:31 PM

OK, but then forget it again and never, ever do it :-). Seriously - it's one of
those things you CAN do for an exercise, e.g. if you just have one file:

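awk '
# Buffer each output line in an array instead of printing it.
function printout(_str) { _out[++_nr] = _str }
# Once all input has been read, write the buffered lines back over the file.
function flushout(   _i) {
    close(FILENAME)
    for (_i = 1; _i <= _nr; _i++)
        print _out[_i] > FILENAME
}
{ gsub("\t", "    "); printout($0) }
END { flushout() }' file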

but it's just obscure and complicated compared to a simple tmp file, e.g. in UNIX:

awk '{gsub("\t","    ")}1' file > tmp && mv tmp file

> I think I should forget the "single line does all the work I need" :)

You can do a LOT with a single line. Of course, it somewhat depends on how long
a line you think is reasonable.

Ed.

Kenny McCormack

Jul 24, 2008, 7:39:30 AM

In article <4887F937...@lsupcaemnt.com>,
Ed Morton <mor...@lsupcaemnt.com> wrote:
...
>... compared to a simple tmp file, e.g. in UNIX:
>
>awk '{gsub("\t","    ")}1' file > tmp && mv tmp file

The problem is that this doesn't easily generalize to handling multiple
files (in a single AWK program). You end up doing a shell loop, and
that is, of course, 1) OT here and 2) Not generalizable to other
(non-Unix) OSs.
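
For reference, the shell loop alluded to here might look roughly like the sketch below. It is only an illustration, not something posted in the thread: the scratch directory, the sample files a.py and b.py, and the ".tmp.$$" suffix are made-up names, and a POSIX sh is assumed.

```shell
#!/bin/sh
# Sketch only: workdir, a.py, b.py and the .tmp.$$ suffix are hypothetical.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Two small sample files containing tab indentation.
printf 'def f():\n\treturn 1\n' > a.py
printf 'if x:\n\tif y:\n\t\tpass\n' > b.py

for f in *.py; do
    # Keep the temp file in the same directory so that mv is a plain rename.
    tmp=$f.tmp.$$
    # Replace the original only if awk succeeded; otherwise drop the temp file.
    awk '{ gsub(/\t/, "    ") } 1' "$f" > "$tmp" && mv "$tmp" "$f" || rm -f "$tmp"
done
```

Because each original is replaced only after awk has finished with it, an error or interruption partway through leaves the remaining files untouched.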

OB CYA: Yes, of course there are ways to do it, but the point is that it
doesn't flow naturally in AWK. This is an area where Perl/sed's -i
option is actually a nice piece of syntactic sugar.
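
One of the "ways to do it" inside a single AWK program could be sketched as follows: extend the array idea to several files by writing the buffer back whenever FNR drops to 1 (a new input file has started) and once more in END. This is an illustrative sketch rather than code from the thread; the writeback() function, the out/n/prev variables and the sample files are hypothetical names.

```shell
#!/bin/sh
# Sketch only: workdir, a.py, b.py, writeback, out, n and prev are hypothetical.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'def f():\n\treturn 1\n' > a.py
printf 'if x:\n\tif y:\n\t\tpass\n' > b.py

awk '
# Write the buffered lines back over the file they were read from.
function writeback(name,    i) {
    for (i = 1; i <= n; i++)
        print out[i] > name
    close(name)
    n = 0
}
# First record of a new file: the previous file has been read to the end.
FNR == 1 && prev != "" { writeback(prev) }
{ prev = FILENAME; gsub(/\t/, "    "); out[++n] = $0 }
# The last file is written back once all input is exhausted.
END { if (prev != "") writeback(prev) }
' *.py
```

Each file is held in memory in full, and a file is rewritten in place rather than renamed into position, so an interruption during writeback() can leave it incomplete. That trade-off is part of why the temp file approach is usually preferred.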