Is this a job for awk?

Fritz Wuehler

May 8, 2012, 11:06:35 AM

Hello guys I stumbled upon awk today and it's pretty neat. I am thinking
about an application and I don't know if it's better to write it in C or
awk. Basically I want to scan source for certain commands and do a
substitution like the cpp on steroids. Doing the regexp and matching for
hits on the input text and doing the replacements is pretty easy so far but
I don't know how to copy lines that don't match to the output! Every line
will either be echoed or transformed. I don't get how to copy non matching
lines unless I process every line as $0 and do a bunch of if else and then
fall through to just writing the whole line if it doesn't match. Like I
would do it in C. But that seems to go against awk's pattern action
philosophy. Would that be too inefficient or is there an elegant way to
(gr)awk this?

Hermann Peifer

May 8, 2012, 11:37:08 AM

In short:
pattern1 { replacement action 1 }
pattern2 { replacement action 2 }
{ print }

Hermann

Luuk

May 8, 2012, 1:16:15 PM

to Hermann Peifer

or slightly longer ;)

~/tmp> cat test
aap
noot
mies

~/tmp> awk '/a/{ $0=$0 "X"; }/o/{ $0="X" $0; }{ print }' test
aapX
Xnoot
mies

~/tmp>

Anton Treuenfels

May 8, 2012, 11:10:48 PM

"Hermann Peifer" <pei...@gmx.eu> wrote in message
news:jobej5$h1v$1...@news.albasani.net...

> In short:                          # commented a bit
> pattern1 { replacement action 1 }  # if $0 ~ /pattern1/ change $0
> pattern2 { replacement action 2 }  # if $0 ~ /pattern2/ change $0
>          { print }                 # print $0 (whether it changed or not)
>
> Hermann

Fritz Wuehler

May 9, 2012, 1:01:18 AM

Nomen Nescio

May 9, 2012, 4:58:55 AM

Hello guys, sorry for the duplicate emails and thanks to everyone who replied.

I guess I didn't explain very well. I want to either do some processing on
the text or echo it as is. If I do the processing I don't want the line
printed. That is what I can't figure out how to do. How do I suppress
printing the lines I process, while still printing the lines I don't
process? Thanks.

Janis Papanagnou

May 9, 2012, 6:17:22 AM

/pattern1/  { gsub(/xyz/,"ABC") }    # process & continue with next pattern
/pattern2/  { print }                # print unprocessed and continue
/pattern3/  { sub(/X/,"U"); next }   # process & skip to next input line
!/pattern1/ { do_whatever }          # do or don't do something if not pattern1

Be aware that
* print - will print the line, if you have an action without print (or
printf) only the action will be performed without printing
* next - will skip further evaluation of the subsequent patterns and
continue with the first pattern for the next data input line
* Use negated patterns where you want to trigger on not matching lines
And of course
* read the manual
* read "How to post smart questions" (google for phrases like that)
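
A minimal runnable sketch of the difference between falling through and using next. The function name fall_through_demo, the patterns, and the sample input are made up for illustration:

```shell
# Hypothetical demo: fall-through versus "next".
fall_through_demo() {
  awk '
    /alpha/ { gsub(/xyz/, "ABC") }    # change $0, then keep testing later rules
    /gamma/ { sub(/X/, "U"); next }   # change $0, then skip all remaining rules
            { print }                 # default rule for every line that gets here
  '
}

printf 'alpha xyz\nbeta\ngamma X\n' | fall_through_demo
# prints:
#   alpha ABC
#   beta
```

The "gamma" line is processed but never reaches the final print rule, because next ends rule evaluation for that line.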

Janis

Ed Morton

May 9, 2012, 9:06:08 AM

On 5/9/2012 3:58 AM, Nomen Nescio wrote:
<snip>

> I guess I didn't explain very well. I want to either do some processing on
> the text or echo it as is. If I do the processing I don't want the line
> printed. That is what I can't figure out how to do. How do I suppress
> printing the lines I process, while still printing the lines I don't
> process? Thanks.

/pattern/ {
    <...do some processing...>
    next
}
{ print }

Regards,

Ed.

Kaz Kylheku

May 9, 2012, 12:33:22 PM

On 2012-05-08, Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:
> Hello guys I stumbled upon awk today and it's pretty neat. I am thinking
> about an application and I don't know if it's better to write it in C or
> awk. Basically I want to scan source for certain commands and do a
> substitution like the cpp on steroids. Doing the regexp and matching for

A regex replacement hack of this sort will invariably start more like a cpp
on weed, and in all likelihood will stay stoned.

Firstly CPP recognizes macro calls of this form

stuff stuff stuff MACRO(
ARG,
ARG ) more stuff

so you need proper lexical analysis that spans across lines.

Secondly, macro calls can nest, because macro arguments can themselves
contain arbitrary material, including macro calls.

stuff stuff stuff MACRO(
stuff MAC2(arg, mac3,
mac4, xyz()) junk, // comment: xyz is a function
ARG ) more stuff

Sure, that *can* all be done by iterating with regular expressions,
but your program will no longer be very ... awky. More like awkward.
Still, perhaps better than the C program to do the same thing.

You have to treat the entire file as a string and look for macro calls
which are primary: an identifier followed by parentheses which do not
have a parenthesis between them. Expand those first and then iterate.
This is not so simple because some things will not expand, like xyz()
above which isn't a macro. What you can do is translate it anyway, to some
special encoding for unexpanded calls which hides the parentheses (so
you can regex over it), and which is then decoded in a final pass over the
buffer back to the original notation.

One possible encoding for unexpanded calls is some notation which replaces
xyz() with, say '@123'. This 123 is just a key in a hash table which maps
to the string "xyz()". Of course if you see '@' in the original input, you
escape it to "@@" so it doesn't confuse your final pass.

So given MAC1(a,b,MAC2(xyz(MAC3(c)),d,MAC4())), we make a first pass
in which our regex finds

MAC1(a,b,MAC2(xyz(MAC3(c)),d,MAC4()))
                  ^^^^^^^    ^^^^^^

These are macros, and so they get expanded. (Oh, and by the way, the C
preprocessor automatically kills macro recursion. If a macro expansion
produces a macro call that was already expanded during that expansion, it
is not expanded again.)

So now we iterate, looking again for a macro which contains no parentheses.

MAC1(a,b,MAC2(xyz(MAC5(4),FOO),d,MAC 4 REP))
                  ^^^^^^^

Let's say MAC5(4) goes to the replacement text M54. Next:

MAC1(a,b,MAC2(xyz(M54,FOO),d,MAC 4 REP))
              ^^^^^^^^^^^^

Oops, this is not a macro, so we have to leave it alone. But of course
M54 and FOO could be object-like macros. We have to expand those first
to obtain a fully expanded version of xyz(M54,FOO) which could look like
something else, say xyz(1,2,3,4). We "freeze dry that" into @1 (the first
entry in our freeze dry hash table). This lets us continue:

MAC1(a,b,MAC2(@1,d,MAC 4 REP))
         ^^^^^^^^^^^^^^^^^^^^

We expand MAC2, etc.
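
The freeze-dry idea above can be sketched roughly as follows. This is an illustration under several assumptions, not a cpp replacement: the names expand_calls, macro, frozen, thaw and subst are hypothetical, the only macro is a made-up one-argument SQ whose body uses "#1" as the argument placeholder, and input is handled one line at a time. Tokens are written "@N;" (with a trailing semicolon, unlike the '@123' notation above) so that a token followed by a digit stays unambiguous. Plain parenthesized groups are frozen as well, so parentheses in a macro body do not hide the enclosing call. There is no recursion protection beyond an iteration cap.

```shell
# Sketch only: all names below are illustrative.
expand_calls() {
  awk '
    BEGIN { macro["SQ"] = "((#1)*(#1))" }   # hypothetical one-argument macro

    # Replace every "#1" in body with the raw argument text.
    function subst(body, arg,   i, out) {
      out = ""
      while ((i = index(body, "#1")) > 0) {
        out = out substr(body, 1, i - 1) arg
        body = substr(body, i + 2)
      }
      return out body
    }

    # Decode left to right: "@@" is a literal "@", "@N;" is frozen entry N.
    function thaw(s,   out, n) {
      out = ""
      while (s != "") {
        if (substr(s, 1, 2) == "@@") {
          out = out "@"; s = substr(s, 3)
        } else if (match(s, /^@[0-9]+;/)) {
          n = RLENGTH                       # save before the recursive call
          out = out thaw(frozen[substr(s, 2, n - 2)])
          s = substr(s, n + 1)
        } else if (match(s, /^[^@]+/)) {
          out = out substr(s, 1, RLENGTH); s = substr(s, RLENGTH + 1)
        } else {
          out = out substr(s, 1, 1); s = substr(s, 2)
        }
      }
      return out
    }

    function expand(s,   call, name, arg, p, repl, guard) {
      gsub(/@/, "@@", s)                    # escape literal "@" first
      # Innermost group: optional identifier, then (...) with no parens inside.
      while (match(s, /([A-Za-z_][A-Za-z0-9_]*)?\([^()]*\)/) && ++guard <= 1000) {
        call = substr(s, RSTART, RLENGTH)
        p    = index(call, "(")
        name = substr(call, 1, p - 1)
        arg  = substr(call, p + 1, length(call) - p - 1)
        if (name in macro) {
          repl = subst(macro[name], arg)    # expand; the result is rescanned
        } else {
          frozen[++nfrozen] = call          # freeze-dry anything else
          repl = "@" nfrozen ";"
        }
        s = substr(s, 1, RSTART - 1) repl substr(s, RSTART + RLENGTH)
      }
      return thaw(s)                        # final pass: restore frozen text
    }

    { print expand($0) }
  '
}

printf '%s\n' 'a@b = SQ(f(x)) + g(SQ(2), h())' | expand_calls
# prints: a@b = ((f(x))*(f(x))) + g(((2)*(2)), h())
```

Note how the whole thing lives in functions called from a single rule, which is the "no longer very awky" shape the post warns about.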

Also, you have to keep in mind that there are macro defining constructs which
can appear anywhere. If you assume that they have to be set off by a #define at
the beginning of the line, this simplifies things.

You have to break the file into regions in between directives like #define,
and do the macro expansion between those regions.

awk record separation might be useful for this, giving you strings that either
begin with '#' and are preprocessing directives, or else giving you strings
that are multi-line text in between preprocessing directives (or at the
beginning or end of a file).
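
A small sketch of that split, under the assumption that directives always start with '#' in column one. It swaps the record-separator approach for plain line accumulation, since a multi-character or regular-expression RS is not guaranteed by POSIX awk. The names split_regions and emit_text and the TEXT[...]/DIRECTIVE[...] output format are invented for the example:

```shell
# Sketch only: separates directive lines from the text regions between them.
split_regions() {
  awk '
    # Print the accumulated text region, if any, then reset it.
    function emit_text() {
      if (n) { printf "TEXT[%s]\n", buf; n = 0; buf = "" }
    }
    /^#/ { emit_text(); printf "DIRECTIVE[%s]\n", $0; next }
         { buf = (n++ ? buf "\n" : "") $0 }   # collect non-directive lines
    END  { emit_text() }
  '
}

printf 'int a;\nint b;\n#define X 1\nX\n' | split_regions
# prints:
#   TEXT[int a;
#   int b;]
#   DIRECTIVE[#define X 1]
#   TEXT[X]
```

In a real tool, emit_text would be the place to run macro expansion over the region, and the directive rule would update the macro table.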

Good luck.

Kenny McCormack

May 9, 2012, 3:58:20 PM

In article <201205090...@kylheku.com>,
Kaz Kylheku <k...@kylheku.com> wrote:
...

>A regex replacement hack of this sort will invariably start more like a cpp
>on weed, and in all likelihood will stay stoned.

I think the point of your post (with which I agree) is that you can write a
language parser in AWK (since by definition, subject to the usual
assumptions, you can write anything in any language), but it isn't pretty.
And, more specifically, you can't do it while retaining AWK's strength,
which is the simple pattern/action, pattern/action, pattern/action, etc
model. You end up writing regular procedural code; in the end case, it all
ends up in a BEGIN block.

I've been through this once, trying to write a language parser in AWK. I
kept getting to about 90%, but couldn't quite get it to work. Eventually, I
gave up and re-did it in Txl (www.txl.ca). Txl is a great language
(although really hard to "get" and figure out, takes a total re-wire of your
brain to get used to it) admirably suited to this sort of work.

--

Some of the more common characteristics of Asperger syndrome include:

* Inability to think in abstract ways (eg: puns, jokes, sarcasm, etc)
* Difficulties in empathising with others
* Problems with understanding another person's point of view
* Hampered conversational ability
* Problems with controlling feelings such as anger, depression
and anxiety
* Adherence to routines and schedules, and stress if expected routine
is disrupted
* Inability to manage appropriate social conduct
* Delayed understanding of sexual codes of conduct
* A narrow field of interests. For example a person with Asperger
syndrome may focus on learning all there is to know about
baseball statistics, politics or television shows.
* Anger and aggression when things do not happen as they want
* Sensitivity to criticism
* Eccentricity
* Behaviour varies from mildly unusual to quite aggressive
and difficult

Nomen Nescio

May 10, 2012, 5:44:03 AM

gaz...@shell.xmission.com (Kenny McCormack) wrote:

> I've been through this once, trying to write a language parser in AWK. I
> kept getting to about 90%, but couldn't quite get it to work. Eventually, I
> gave up and re-did it in Txl (www.txl.ca). Txl is a great language
> (although really hard to "get" and figure out, takes a total re-wire of your
> brain to get used to it) admirably suited to this sort of work.

At this point I am looking for tools that come with every UNIX like OS. I've
been through m4 and it looked promising but became too ugly very fast. awk
can probably do what I want. I'll know soon. If not back to C I go. Thanks
to all.

> Some of the more common characteristics of Asperger syndrome include:
>
> * Inability to think in abstract ways (eg: puns, jokes, sarcasm, etc)
> * Difficulties in empathising with others
> * Problems with understanding another person's point of view
> * Hampered conversational ability
> * Problems with controlling feelings such as anger, depression
> and anxiety
> * Adherence to routines and schedules, and stress if expected routine
> is disrupted
> * Inability to manage appropriate social conduct
> * Delayed understanding of sexual codes of conduct
> * A narrow field of interests. For example a person with Asperger
> syndrome may focus on learning all there is to know about
> baseball statistics, politics or television shows.
> * Anger and aggression when things do not happen as they want
> * Sensitivity to criticism
> * Eccentricity
> * Behaviour varies from mildly unusual to quite aggressive
> and difficult

So all of usenet is afflicted with Asperger syndrome?

Fritz Wuehler

May 10, 2012, 7:42:11 AM

Thank you, Ed. I tried that after reading Janis's post/insult and it worked
fine. Looks like awk is going to be a winner for this application.

Kenny McCormack

May 10, 2012, 8:08:03 AM

In article <a3a96300e9edf98a...@msgid.frell.theremailer.net>,
Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:
...

>I tried that after reading Janis's post/insult and it worked

Heh heh. If you think Janis's post was an insult, then I really don't think
Usenet is for you.

(
I assume you are referring to:

>And of course
>* read the manual
>* read "How to post smart questions" (google for phrases like that)

)

--
"The anti-regulation business ethos is based on the charmingly naive notion
that people will not do unspeakable things for money." - Dana Carpender

Quoted by Paul Ciszek (pciszek at panix dot com). But what I want to know
is why is this diet/low-carb food author making pithy political/economic
statements?

Nevertheless, the above quote is dead-on, because, the thing is - business
in one breath tells us they don't need to be regulated (which is to say:
that they can morally self-regulate), then in the next breath tells us that
corporations are amoral entities which have no obligations to anyone except
their officers and shareholders, then in the next breath they tell us they
don't need to be regulated (that they can morally self-regulate) ...

Janis Papanagnou

May 10, 2012, 9:59:06 AM

On 10.05.2012 13:42, Fritz Wuehler wrote:
>
> Thank you, Ed. I tried that after reading Janis's post/insult [...]

Your reply here I consider to be an offensive and insulting imputation;
specifically after I gave you examples and explanations in addition to
hints how to improve your requests here.

Mind to explain where you §$/%*! think that *I* insulted you, newbie?!

Heck!

>

Kenny McCormack

May 10, 2012, 10:10:18 AM

In article <joghj7$rbe$1...@speranza.aioe.org>,

Of course, the newbie will point out that, regardless of whether or not you
did before, you certainly have done so now!

(Something about arguing with fools - and wrestling with pigs - comes to mind)

--
Modern Christian: Someone who can take time out from
complaining about "welfare mothers popping out babies we
have to feed" to complain about welfare mothers getting
abortions that PREVENT more babies to be raised at public
expense.

Janis Papanagnou

May 10, 2012, 10:39:30 AM

On 10.05.2012 16:10, Kenny McCormack wrote:
> In article<joghj7$rbe$1...@speranza.aioe.org>,
> Janis Papanagnou<janis_pa...@hotmail.com> wrote:
>>
>> Mind to explain where you §$/%*! think that *I* insulted you, newbie?!
>

> Of course, the newbie will point out that, regardless of whether or not you
> did before, you certainly have done so now!

Really? And I thought that "§$/%*!" was cryptic enough. What a mishap!

Janis

Nomen Nescio

May 10, 2012, 11:06:11 AM

Hello Kaz Kylheku

> On 2012-05-08, Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:
> > Hello guys I stumbled upon awk today and it's pretty neat. I am thinking
> > about an application and I don't know if it's better to write it in C or
> > awk. Basically I want to scan source for certain commands and do a
> > substitution like the cpp on steroids. Doing the regexp and matching for
>
> A regex replacement hack of this sort will invariably start more like a cpp
> on weed, and in all likelihood will stay stoned.

I guess I could understand that better if I knew what it meant.

> Firstly CPP recognizes macro calls of this form
>
> stuff stuff stuff MACRO(
> ARG,
> ARG ) more stuff
>
> so you need proper lexical analysis that spans across lines.

I don't want to replace cpp, I have some ideas on something I want to accomplish.
The comparison was just for the sake of explaining somewhat what direction I
am going in so I didn't have to give too much background. Thanks for your
post there is lots of good info there. I enjoy reading your posts on
comp.lang.c as well.

Fritz Wuehler

May 10, 2012, 1:14:55 PM

Thanks but this still isn't what I want to do.

Do I have to specify a ! action for every pattern? That doesn't sound very
good.

I want to do something different for each pattern. Then if none of those
patterns match I want to print the line unchanged.

It is somewhat like a CASE statement but many languages don't let you do
SWITCH or CASE except on integers. I want printing to be the OTHER or
fall through case.

>
> Be aware that
> * print - will print the line, if you have an action without print (or
> printf) only the action will be performed without printing

Understood

> * next - will skip further evaluation of the subsequent patterns and
> continue with the first pattern for the next data input line

Thanks. Maybe a better way to do what I want would be to put next at the end
of my processing for each pattern?

> * Use negated patterns where you want to trigger on not matching lines

This becomes cumbersome and error prone after more than a couple of patterns.

> And of course
> * read the manual

Did that

> * read "How to post smart questions" (google for phrases like that)

Maybe there should be another article explaining how to answer questions. My
question is understandable as I asked it:

"I don't know how to copy lines that don't match to the output!"

The answers all copied every line to the output whether it matched or
not. Technically that may satisfy the question but it's obviously not very
useful. I appreciated those answers since at least they tried to help
without being rude about it.

>
> Janis

Manuel Collado

May 10, 2012, 6:22:36 PM

On 10/05/2012 19:14, Fritz Wuehler wrote:
>
> Thanks but this still isn't what I want to do.
>
> Do I have to specify a ! action for every pattern? That doesn't sound very
> good.
>
> I want to do something different for each pattern. Then if none of those
> patterns match I want to print the line unchanged.
>
> It is somewhat like a CASE statement but many languages don't let you do
> SWITCH or CASE except on integers. I want printing to be the OTHER or
> fall through case.

Given a SWITCH schema:

switch (x) {
    case A: actionA; break;
    case B: actionB; break;
    case C: actionC; break;
    default: actionD;
}

The awkish way to implement it is:

A { actionA; next }
B { actionB; next }
C { actionC; next }
  { actionD }
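
A concrete instance of this mapping that can be run as is. The function name switch_demo, the patterns /a/ and /b/, and the sample input are placeholders chosen for the example:

```shell
# Hypothetical demo: first matching rule wins, the last rule is the default.
switch_demo() {
  awk '
    /a/ { print "A:", $0; next }   # case A ... break
    /b/ { print "B:", $0; next }   # case B ... break
        { print }                  # default: echo the line unchanged
  '
}

printf 'ab\nb\nc\n' | switch_demo
# prints:
#   A: ab
#   B: b
#   c
```

The line "ab" matches both patterns but is handled only by the first rule.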

Please note the awk "next" statement, which plays a role similar to
"break" in the SWITCH schema.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Hermann Peifer

May 11, 2012, 2:52:16 PM

Thanks for taking the time and adding comments. However, I had a more
generic definition of patterns in mind, covering also these sample
patterns which I use quite often: 'NF > 3', 'length($0) != 72',
'!a[$1]++', ...
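
To illustrate, here is a small sketch that uses two such expression patterns as rule selectors. The function name pattern_demo, the array name seen, the output labels, and the sample input are invented for the example:

```shell
# Hypothetical demo: patterns can be any expression, not just /regex/.
pattern_demo() {
  awk '
    NF > 3      { print "wide:", $0;  next }   # more than three fields
    !seen[$1]++ { print "first:", $0; next }   # first time this $1 shows up
                { print "other:", $0 }         # everything else
  '
}

printf 'x 1 2 3 4\ny 1\nx 9\ny 2\n' | pattern_demo
# prints:
#   wide: x 1 2 3 4
#   first: y 1
#   first: x 9
#   other: y 2
```

The third line still counts as a first occurrence of "x", because next in the first rule skipped the seen[] test for the first line.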

I should have added a link to [1] in my earlier mail.

Hermann

[1] http://www.gnu.org/software/gawk/manual/gawk.html#Pattern-Overview

Eric Pement

May 12, 2012, 12:18:06 AM

For Fritz Wuehler:

I have tried to follow this thread from the beginning, and I have a few comments to make on a few points:

> > >>> On 08/05/2012 17:06, Fritz Wuehler wrote:
> > >>>> Hello guys I stumbled upon awk today and it's pretty neat.

I always thought so too. I first became interested in programming through awk.

[ ... deletion ... ]

> > > I guess I didn't explain very well. I want to either do some processing on
> > > the text or echo it as is. If I do the processing I don't want the line
> > > printed. That is what I can't figure out how to do. How do I suppress
> > > printing the lines I process, while still printing the lines I don't
> > > process? Thanks.
> >
> > /pattern1/ { gsub(/xyz/,"ABC") } # process & continue with next pattern
> > /pattern2/ { print } # print unprocessed and continue
> > /pattern3/ { sub(/X/,"U"); next } # process & skip to next input line
> > !/pattern1/ { do_whatever } # do or don't do something if not pattern1
>
> Thanks but this still isn't what I want to do.
>
> Do I have to specify a ! action for every pattern? That doesn't sound very
> good.
>
> I want to do something different for each pattern. Then if none of those
> patterns match I want to print the line unchanged.
>
> It is somewhat like a CASE statement but many languages don't let you do
> SWITCH or CASE except on integers. I want printing to be the OTHER or
> fall through case.

Moving from the bottom up, GNU awk does have a "switch" statement, so that option is available to you if you want it or are free to install it.

Second, most replies began simply with the fundamental awk syntax of

/pattern1/ { action; action; }
/pattern2/ { some other action; }

But I can see that you expect to have a lot of patterns and are understandably reluctant to have to insert negated patterns like !/this/ .

I can think of two workarounds. One was mentioned earlier, but needs to be reemphasized. The "next" command will cease processing the current line ($0), avoiding the need to negate the pattern that selected it.

The other workaround looks something like this:

{   # select every line
    if ( /pattern1/ ) {
        process1
        process2
        # no printing gets done here . . .
    } else if ( /pattern2/ || /pattern3/ ) {
        process3
        process4
        # nor any here . . .
    } else {
        print    # everything else
    }
}

This all assumes that GNU awk with "switch" is not available or not wanted.

[ ... deletion ... ]

> "I don't know how to copy lines that don't match to the output!"
>
> The answers all copied every line to the output whether it matched or
> not. Technically that may satisfy the question but it's obviously not very
> useful.

I believe the answers I sketched above will work.

Finally, I work quite a bit in both Windows and Unix, and I also have an interest in short, succinct one-line scripts, so the following list may be useful to you. It's a few years old, but hey . . . so is awk (smile).

http://www.pement.org/awk/awk1line.txt

Hope this helps.
--
Eric P.

presidentbyamendment

May 18, 2012, 4:22:57 PM

On May 8, 11:06 am, Fritz Wuehler

Awk is OK for that, but you might want ed. Or sed.