
changing a field without recompiling the record


hpt

Mar 15, 2009, 9:34:48 PM
> Someone recently asked how to do this (all spaces in "file" are tabs):
>
> $ cat file
> aaa bbb cXcXc ddd
> $ awk 'BEGIN{FS=OFS="\t"}gsub(/X/,"+",$3)' file
> aaa bbb c+c+c ddd
>
> i.e. change all occurrences of a character in a field without changing
> the spacing between fields. The above only "works" because the field
> separator is a specific character and we can set the OFS to the same
> character so that though the record gets recompiled it looks the same
> after as before wrt field spacing. If the FS had been a space
> character (or any RE), however, we couldn't use that same trick (all
> spaces in "file" are chains of blank chars):
>
> $ cat file
> aaa bbb cXcXc ddd
> $ awk 'BEGIN{FS=OFS=" "}gsub(/X/,"+",$3)' file
> aaa bbb c+c+c ddd
> $ awk 'BEGIN{FS=OFS=" +"}gsub(/X/,"+",$3)' file
> aaa +bbb +c+c+c +ddd
>
> This problem of unwanted recompilation of the record comes up so
> often, I wonder if anyone has a suggestion on a simple way to work
> around it in general. Unless you use GNU awk's gensub(), the best I can

Could you please tell me how to use gensub() to achieve this?

Ed Morton

Mar 17, 2009, 7:40:28 AM

Here's how you'd do it with the default FS:

gawk --re-interval 'BEGIN{n=3}
# \1 = leading blanks, \2 = fields 1..n-1 with their separators, \4 = rest
{f=$n; gsub(/X/,"+",f)
 $0=gensub("^([[:blank:]]*)(([^[:blank:]]+[[:blank:]]+){"(n-1)"})[^[:blank:]]+(.*)","\\1\\2"f"\\4",1)}1' file

and you can adapt that to any FS that's a single character or a run of
characters from a fixed set, since you can negate a set inside a
bracket expression ([^...]), but it's not obvious how to do it for a
general RE.
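
For the general-RE case, one possible workaround (not from the thread, and not using gensub() at all) is to walk the record with match() and substr(), which any POSIX awk has, and splice the new field text in between the untouched separators. The following is only a sketch: the names subst_field and setfield are made up for illustration, and it assumes FS is either the default or a regex that cannot match the empty string. A single-character FS that is a regex metacharacter (such as "|") would need escaping first, because awk treats it literally as FS but match() would treat it as a regex.

```shell
# Hypothetical helper (not from the thread): on each stdin line, replace every
# X with + in field N while leaving the separators untouched.
# usage: subst_field N [awk options, e.g. -F "regex"]
subst_field() {
    n=$1
    shift
    awk -v n="$n" "$@" '
    # Rebuild $0 as <text before field n> <new> <text after field n>.
    # Returns 0 and leaves the record alone if there is no field n.
    function setfield(n, new,    sep, rest, pre, i) {
        if (n < 1 || n > NF) return 0
        sep = (FS == " ") ? "[ \t\n]+" : FS    # default FS: runs of blanks
        rest = $0
        pre = ""
        if (FS == " " && match(rest, /^[ \t\n]+/)) {
            # with the default FS, leading blanks are not a separator
            pre = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
        for (i = 1; i < n; i++) {              # step over fields 1..n-1
            if (!match(rest, sep)) return 0
            pre = pre substr(rest, 1, RSTART + RLENGTH - 1)
            rest = substr(rest, RSTART + RLENGTH)
        }
        # keep everything from the next separator onward, if there is one
        if (match(rest, sep)) rest = substr(rest, RSTART)
        else rest = ""
        $0 = pre new rest
        return 1
    }
    { f = $n; gsub(/X/, "+", f); setfield(n, f) }
    1'
}
```

With that function defined in the shell, runs of blanks and a regex separator both survive the edit:

$ printf 'aaa   bbb  cXcXc    ddd\n' | subst_field 3
aaa   bbb  c+c+c    ddd
$ printf 'aXa , bXb,cXc\n' | subst_field 2 -F ' *, *'
aXa , b+b,cXc

Assigning to $0 re-splits the fields but never rebuilds the record with OFS, which is why the spacing is preserved.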

Ed.
