
AWK-Editing files in "place" and a --write switch


rubi...@gmail.com

Sep 14, 2008, 11:37:36 AM
Trying to construct a convenient awk option similar to sed -i, I’m
using the following code, packaged as functions, to edit the file(s)
in "place":


awk ' c-FNR >= 0 { close(temp); print rename | "sh" }
{ c=FNR ; rename=("mv "(temp=(f=FILENAME)".tmp")" "f);
## close("sh")
sub(/A/, "B"); print > temp }
END { print rename | "sh" ; close("sh") } ' files*

or


awk ' NR-FNR != c { c=NR-FNR; close(temp); print rename | "sh" }
{ rename=("mv "(temp=(f=FILENAME)".tmp")" "f);
## close("sh")
sub(/A/, "B"); print > temp}
END { print rename | "sh"; close("sh") } ' files*

The idea is simply the well-known approach of writing to a temporary
file first, then renaming it to the original file name by printing an
mv command with the respective file names to the shell. The nested
assignment in "rename" is used only to keep the code compact. Empty
files are silently skipped; they can be processed either separately or
in the same code's END block. Because it slows the code down
significantly, close("sh") is used in the main body only if there is a
fatal error of too many open pipes, or if there's not enough disk
space. For these very reasons, I have another function,
awk -f write-safe ... , where close("sh") is used. To get unique
variable and temp file names, they can be made fancier than the ones
given here.
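
For readers who find the compact one-liners hard to follow, the same temp-file-and-rename idea can be sketched more verbosely. This is only an illustration under assumptions, not the code posted above: it calls system() once per file instead of piping to "sh", the names finish, orig and temp are made up for the example, and it assumes file names without spaces or shell metacharacters. Like the original, it silently skips empty files.

```shell
# Scratch directory with two sample files (names chosen for the example).
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'A1\nA2\n' > files1
printf 'xAx\n'    > files2

awk '
function finish(   cmd) {          # close the temp file, then rename it over the original
    close(temp)
    cmd = "mv " temp " " orig      # assumes names need no shell quoting
    system(cmd)
}
FNR == 1 {                         # first record of each input file
    if (temp != "") finish()       # finalize the previous file, if any
    orig = FILENAME
    temp = FILENAME ".tmp"
}
{ sub(/A/, "B"); print > temp }    # the actual edit
END { if (temp != "") finish() }   # finalize the last file
' files*
```

Running one mv per file through system() is slower than a single long-lived "sh" pipe, which is the trade-off the paragraph above describes for close("sh").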

The code works fine in other awks too, say HP-UX awk or Solaris nawk,
and with a small modification in /usr/xpg4/bin/awk (the variables temp
and f are set separately, not nested).
It's interesting to see that, while sed -i '/../' is faster for small
files, the awk code performs better for big files (MBs in size),
especially in Cygwin and Ubuntu.


I'm currently running the code as a function file, "write.awk", which
I copied into gawk's search path (AWKPATH):
.:/usr/share/awk


gawk -f write.awk --source '{ sub(/A/,"B"); print > temp }' files* ## gawk only.


$ cat write.awk

NR-FNR != c { c=NR-FNR; close(temp); print rename | "sh" }
{ rename=("mv "(temp=(f=FILENAME)".tmp")" "f)
## close("sh")
}
END { print rename | "sh"; close("sh") }

Other awks :

awk -f /path/to/write.awk -f /path/to/main_program.awk files*


Because of the inevitable issues related to the use of functions
(their names must be unique, long full paths are needed, the name of
the temporary file "temp" must be stated explicitly in the second
program, ...), I'd like to know whether the awk source code could be
modified to implement the above concept, and what steps would be
needed to get a convenient and silent awk --write ... switch, so that
writing to the temp files and renaming them takes place quietly in the
background. Preferably it would look something like:

awk --write '{ sub(/A/,"B"); print }' files*

similar to: sed -i 's/A/B/' files* ?


OS: Linux, Cygwin
awk: GNU awk 3.1.6

Thanks.

Kenny McCormack

Sep 14, 2008, 11:51:42 AM
In article <00e23435-3bae-49f4...@2g2000hsn.googlegroups.com>,

<rubi...@gmail.com> wrote:
>Trying to construct a convenient awk option similar to sed -i, I’m
>using the following code(s) used as
>functions, to edit the file(s) in "place":

If I understand you correctly, your "reason for posting" is that you've
hacked up a "script kludge" way of doing what you want, and are now asking
the maintainers if they would consider implementing this functionality
in the "core" (i.e., in the C source code). Is that correct?

The previous paragraph is _my_ "reason for posting", but I might as well
add a comment or two about the general idea:
1) As far as I'm concerned, the "right way" to do this (*) is to
   store up the results in an array, then, in the END block,
   dump the array out to the file:
       close(FILENAME)
       for (i=1; i<=n; i++) print myarray[i] > FILENAME
2) Regarding getting it changed in the source: My guess is that
   you'll have better luck going ahead and doing it yourself
   (it shouldn't be very difficult) than in getting the
   maintainers to do it.

(*) I do this rarely, but in some isolated situations, it has seemed
warranted.
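
The array approach in point 1 handles a single file. A minimal sketch of how it might be extended to several files, flushing the buffer each time a new input file starts, could look like the following. The names flush_file, buf, n and name are invented for this example, and the "edit" (prefixing each line with its per-file line number) is only a placeholder. Note that the whole file is held in memory, and that an interruption during the rewrite would leave the file truncated.

```shell
# Scratch directory with two sample files (names chosen for the example).
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'one\ntwo\n' > files1
printf 'three\n'    > files2

awk '
function flush_file(   i) {        # rewrite the file that has just been read in full
    for (i = 1; i <= n; i++) print buf[i] > name
    close(name)
    n = 0                          # reuse the buffer for the next file
}
FNR == 1 { if (name != "") flush_file(); name = FILENAME }
{ buf[++n] = FNR " " $0 }          # placeholder edit: prefix the per-file line number
END { if (name != "") flush_file() }
' files*
```

The first print to each file truncates it; later prints append because the stream stays open until close(name).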

Ed Morton

Sep 14, 2008, 1:06:42 PM
On 9/14/2008 10:37 AM, rubi...@gmail.com wrote:
> Trying to construct a convenient awk option similar to sed -i, I’m
> using the following code(s) used as
> functions, to edit the file(s) in "place":
>
>
> awk ' c-FNR >= 0 { close(temp); print rename | "sh" }
> { c=FNR ; rename=("mv "(temp=(f=FILENAME)".tmp")"
> "f);
> ## close("sh")
> sub(/A/, "B"); print > temp }
> END { print rename | "sh" ; close("sh") } ' files*
>
> or
>
>
> awk ' NR-FNR != c { c=NR-FNR; close(temp); print rename | "sh" }
> { rename=("mv "(temp=(f=FILENAME)".tmp")" "f);
> ## close("sh")
> sub(/A/, "B"); print > temp}
> END { print rename | "sh"; close("sh") } ' files*
<snip>

FWIW, if I really HAD to do this I'd use:

awk '
function printout(_str) { _out[++_nr] = _str }
function flushout( _i) { close(FILENAME);
for (_i=1; _i<=_nr;_i++)
print _out[_i] > FILENAME
}
{ printout( NR " " $0 ) }
END { flushout() }'

but in reality using a tmp file is much better in every way:

awk '{ print NR,$0 }' file > tmp && mv tmp file

Regards,

Ed.

rubi...@gmail.com

Sep 14, 2008, 5:25:26 PM
> If I understand you correctly, your "reason for posting" is that you've
> hacked up a "script kludge" way of doing what you want, and are now asking
> the maintainers if they would consider implementing this functionality
> in the "core" (i.e., in the C source code). Is that correct?

Nope. Not in the way you stated it. I don't see anything wrong with
asking the forum, or even the maintainers, for a missing option that
sed has implemented for quite some time and that many people find
useful and keep asking for. That reason is clearly stated in the last
part of my post.
AFAIK the same "kludge" concept is silently used in sed -i, and awk
can surely do more than sed.

> 1) As far as I'm concerned, the "right way" to do this (*) is to
> store up the results in an array, then, in the END block,
> dump the array out to the file:
> close(FILENAME)

(*)
You've answered your & my concern.

> 2) Regarding getting it changed in the source: My guess is that
> you'll have better luck going ahead and doing it yourself
> (it shouldn't be very difficult) than in getting the
> maintainers to do it.

If it's easy, that makes things look more optimistic. I haven't got to
C yet. If I knew it, I'd have posted that solution, and so _really_
contributed to the thread.
I'd be glad to see functional, working lines of C code posted here
(?).

> but in reality using a tmp file is much better in every way:

> awk '{ print NR,$0 }' file > tmp && mv tmp file

> Regards,

Yes, that's my preferred way, and that's what the code does: writing
to a temp file and renaming it. But my main point was to use the code
on _many_ files at a time, not only one, thus avoiding an external
for/while loop that renames _many_ temp files one by one.


Thanks.

Ed Morton

Sep 15, 2008, 4:32:30 PM

There aren't _many_ temp files, just one, and it makes more sense to just use
the external loop:

for f in files*; do
awk '{print NR,$0}' "$f" > tmp && mv tmp "$f"
done

so you only get "OS->awk" rather than trying to stick awk in the middle of things:

awk ' c-FNR >= 0 { close(temp); print rename | "sh" }
{ c=FNR ; rename=("mv "(temp=(f=FILENAME)".tmp")" "f);
## close("sh")
sub(/A/, "B"); print > temp }
END { print rename | "sh" ; close("sh") } ' files*

and get "OS->awk->OS". It'd be different if it was in some way decoupling your
script from the OS, but it's actually coupling it much tighter.
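
The external loop can also be wrapped in a small shell function to get something close to the requested "awk --write" call. The sketch below is one possible way to do that, not an existing awk option: awk_write is a hypothetical name, and it assumes a mktemp command that accepts a template argument. Later gawk releases (4.1 and newer) also ship an "inplace" extension, invoked as gawk -i inplace, which did not exist in the gawk 3.1.6 mentioned in this thread.

```shell
# awk_write PROGRAM FILE...   (hypothetical helper, not part of awk)
# Runs the awk PROGRAM on each FILE and replaces the file with the output.
awk_write() {
    prog=$1; shift
    for f in "$@"; do
        # Unique temp file next to the original, so mv stays on one filesystem.
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if awk "$prog" "$f" > "$tmp"; then
            mv "$tmp" "$f" || return 1   # note: the file takes on the temp file's mode
        else
            rm -f "$tmp"                 # leave the original untouched on failure
            return 1
        fi
    done
}

# Example run in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'A1\n' > files1
printf 'A2\n' > "files 2"
awk_write '{ sub(/A/, "B"); print }' files*
```

Because the shell passes each name as a quoted argument, file names with spaces work here, which is not the case when an mv command line is built as a string inside awk.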

You could argue that there are times when you need to produce some output in the
END section that's the result of processing all the files, e.g.:

awk '{...} END{print NR}' file*

but IMHO it's cleaner to just work around that with a tmp file and a second awk
invocation if necessary, e.g.:

for f in files*; do
awk '{print NR,$0} END{print NR >> "nr"}' "$f" > tmp && mv tmp "$f"
done
awk '{nr+=$0} END{print nr}' nr

Regards,

Ed.

Kenny McCormack

Sep 15, 2008, 4:40:30 PM
In article <b77c23e4-0039-46f0...@34g2000hsh.googlegroups.com>,

<rubi...@gmail.com> wrote:
>> If I understand you correctly, your "reason for posting" is that you've
>> hacked up a "script kludge" way of doing what you want, and are now asking
>> the maintainers if they would consider implementing this functionality
>> in the "core" (i.e., in the C source code). Is that correct?
>
>Nope. (really?) Not in the way you stated it. (really?) I don't see

>anything wrong asking the forum or even maintainers for a missing
>option, when sed has implemented it for quite some time, and an option
>that many people find it useful and asking for. That reason is clearly
>posted in the last part of the thread.

How is this different from what I said?
