
I would like an eval function for awk

Steve Calfee

Nov 8, 2004, 4:20:58 PM
Hi, Guys,

First let me say that this is the friendliest and most helpful of the
newsgroups I ever go to. I am a big fan of awk and have used it in
various flavors for years. Yes, I even used tawk on dos. I especially
liked its compiler. (On Linux the compiler is not needed since scripts
can be executable.)

Anyway, I sometimes try to do too much in awk. I wrote a complete
macro assembler in it, but I do not like it, it is too complicated.

Someone here suggested using M4 and I have been trying it. It
handles the assembler more easily than awk, but forces a weird M4 syntax
on the original source (i.e. MOV(src,dest) instead of the normal
assembler syntax: MOV src,dest etc). I find its quote- and
parenthesis-laden syntax very hard to read and debug.

In short I like AWK better, but having to write a recursive descent
parser is so cumbersome. If instead there were an eval(x) function, it
would yield a more normal AWK-like program instead of one that is so C-like.

What I would like to see for eval(x) is to have awk push its entire
internal state ($0, NF, and all builtin variables). Then I would like
to see $0 replaced by x and then the main pattern match loop executed
(NOT BEGIN and END). After that eval(x) line, the state would be
popped and execution of the original $0 continued right after the
eval(x) statement. Obviously I would like eval to be recursive to some
reasonable depth, but it would be up to me using global variables to
maintain a global application state.
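
A rough sketch of how this push/replace/pop behaviour could be
approximated in portable awk today, assuming the main rules are moved
into a function that can call itself. Everything named here is invented
for illustration: the process() function, the macro[] table, the DEF
directive, the "|" statement separator and the expand_macros shell
wrapper. It is not a substitute for a real built-in.

```sh
expand_macros() {
    awk '
    # process() stands in for the main pattern-action loop so it can be re-entered.
    function process(line, depth,    saved, name, part, n, i) {
        if (depth > 20) {                 # arbitrary recursion limit
            print "error: recursion too deep" > "/dev/stderr"
            return
        }
        saved = $0                        # "push" the current record
        $0 = line                         # assigning $0 re-splits fields, resets NF
        if ($1 == "DEF") {                # DEF name body...
            name = $2
            sub(/^[ \t]*DEF[ \t]+[^ \t]+[ \t]*/, "")
            macro[name] = $0
        } else if ($1 in macro) {         # macro call: re-run the rules on its body
            n = split(macro[$1], part, /[ \t]*\|[ \t]*/)
            for (i = 1; i <= n; i++) process(part[i], depth + 1)
        } else if (NF > 0) {
            print
        }
        $0 = saved                        # "pop": the caller sees its own record again
    }
    { process($0, 0) }
    ' "$@"
}
```

Fed the four lines "DEF PUSH2 PUSH a | PUSH b", "DEF CALL2 PUSH2 | CALL f",
"MOV a,b" and "CALL2", this prints MOV a,b, PUSH a, PUSH b and CALL f on
separate lines. The cost is that the rules become an if/else chain
inside a function rather than top-level patterns.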

I don't like cluttering a simple, clean language like awk with
features (like perl), especially features that can be done as
functions or included source files. If there is a crossplatform
alternative to eval I would like to hear about it.

Any ideas? Regards, Steve


Jürgen Kahrs

Nov 8, 2004, 5:56:58 PM
Steve Calfee wrote:

> I don't like cluttering a simple, clean language like awk with
> features (like perl), especially features that can be done as
> functions or included source files. If there is a crossplatform
> alternative to eval I would like to hear about it.

On the 28th of October, Arnold Robbins already
commented on this idea:

>>But, yes, Perl has eval, AWK does not.
>
> Right. I don't anticipate adding eval either. It opens up many cans
> of worms, both implementation wise and in terms of how you might
> accidentally affect the state of your awk program.
>
>>Consider it an exercise to write an expression evaluator in AWK.
>>It's not that difficult.
>
> See the Aho, Kernighan and Weinberger book on Awk, if I remember
> correctly. There's one in it.

He means calc3 in this file:

http://cm.bell-labs.com/cm/cs/who/bwk/awkcode.txt

In January 2000, Kenny McCormack and Alan Linton
posted extended versions:

# calc3 - infix calculator - derived from calc3 in TAPL, chapter 6.
# by Kenny McCormack, Mon 3 Jan 2000
# modified by Alan Linton, $Date: 2000/01/06 21:37:36 $, $Revision: 1.16 $

BEGIN { eval("x=86") ; eval("y=99") }

{
    printf "%20s = %15s\n", $0, eval($0)
}

# The rest is functions...
function eval(s,    e) {
    _S_expr = s
    gsub(/[ \t]+/, "", _S_expr)    # strip all whitespace
    if (length(_S_expr) == 0) return 0
    _f = 1                         # _f is the current scan position
    e = _expr()
    if (_f <= length(_S_expr))
        printf("An error occurred at %s\n", substr(_S_expr, _f))
    else return e
}

function _expr(    var, e) { # term | term [+-] term
    if (match(substr(_S_expr, _f), /^[A-Za-z_][A-Za-z0-9_]*=/)) {
        var = _advance()
        sub(/=$/, "", var)
        return _vars[var] = _expr()
    }
    e = _term()
    while (substr(_S_expr, _f, 1) ~ /[+-]/)
        e = substr(_S_expr, _f++, 1) == "+" ? e + _term() : e - _term()
    return e
}

function _term(    e, op) { # factor | factor [*/%] factor
    e = _factor()
    while (substr(_S_expr, _f, 1) ~ /[*\/%]/) {
        op = substr(_S_expr, _f++, 1)
        # Accumulate into e instead of returning, so chains like 2*3*4 parse fully.
        if (op == "*") e = e * _factor()
        else if (op == "/") e = e / _factor()
        else e = e % _factor()
    }
    return e
}

function _factor(    e) { # factor2 | factor2^factor
    e = _factor2()
    if (substr(_S_expr, _f, 1) != "^") return e
    _f++
    return e^_factor()
}

function _factor2(    e) { # [+-]?factor3 | !*factor2
    e = substr(_S_expr, _f)
    if (e ~ /^[-+!]/) { # unary operators [+-!]
        _f++
        if (e ~ /^\+/) return +_factor3() # only one unary + allowed
        if (e ~ /^-/) return -_factor3() # only one unary - allowed
        if (e ~ /^!/) return !(_factor2() + 0) # unary ! may repeat
    }
    return _factor3()
}

function _factor3(    e, fun, e2) { # number | varname | (expr) | function(...)
    e = substr(_S_expr, _f)

    # number
    if (match(e, /^([0-9]+[.]?[0-9]*|[.][0-9]+)([Ee][+-]?[0-9]+)?/)) {
        return _advance()
    }

    # function()
    if (match(e, /^([A-Za-z_][A-Za-z0-9_]+)?\(\)/)) {
        fun = _advance()
        # Compare as strings: an unescaped "()" in a regex is an empty group.
        if (fun == "srand()") return srand()
        if (fun == "rand()") return rand()
        printf("error: unknown function %s\n", fun)
        return 0
    }

    # (expr) | function(expr) | function(expr,expr)
    if (match(e, /^([A-Za-z_][A-Za-z0-9_]+)?\(/)) {
        fun = _advance()
        if (fun ~ /^((cos)|(exp)|(int)|(log)|(sin)|(sqrt)|(srand))?\(/) {
            e = _expr()
            e = _calcfun(fun, e)
        }
        else if (fun ~ /^atan2\(/) {
            e = _expr()
            if (substr(_S_expr, _f, 1) != ",") {
                printf("error: missing , at %s\n", substr(_S_expr, _f))
                return 0
            }
            _f++
            e2 = _expr()
            e = atan2(e, e2)
        }
        else {
            printf("error: unknown function %s\n", fun)
            return 0
        }
        # Test for ")" before advancing so the message shows the offending text.
        if (substr(_S_expr, _f, 1) != ")") {
            printf("error: missing ) at %s\n", substr(_S_expr, _f))
            return 0
        }
        _f++
        return e
    }

    # variable name
    if (match(e, /^[A-Za-z_][A-Za-z0-9_]*/)) {
        return _vars[_advance()]
    }

    # error
    printf("error in factor: expected number or ( at %s\n", substr(_S_expr, _f))
    return 0
}

function _calcfun(fun, e) { # built-in functions of one variable
    if (fun == "(") return e
    if (fun == "cos(") return cos(e)
    if (fun == "exp(") return exp(e)
    if (fun == "int(") return int(e)
    if (fun == "log(") return log(e)
    if (fun == "sin(") return sin(e)
    if (fun == "sqrt(") return sqrt(e)
    if (fun == "srand(") return srand(e)
}

function _advance(    tmp) { # consume the text matched by the last match() call
    tmp = substr(_S_expr, _f, RLENGTH)
    _f += RLENGTH
    return tmp
}

Steve Calfee

Nov 8, 2004, 8:45:06 PM
On Mon, 08 Nov 2004 23:56:58 +0100, Jürgen Kahrs
<Juergen.Kah...@vr-web.de> wrote:

>Steve Calfee wrote:
>
>> I don't like cluttering a simple, clean language like awk with
>> features (like perl), especially features that can be done as
>> functions or included source files. If there is a crossplatform
>> alternative to eval I would like to hear about it.
>
>On 28th of October, Arnold Robbins already
>commented on this idea:
>
>>>But, yes, Perl has eval, AWK does not.
>>
>> Right. I don't anticipate adding eval either. It opens up many cans
>> of worms, both implementation wise and in terms of how you might
>> accidentally affect the state of your awk program.
>>
>>>Consider it an exercise to write an expression evaluator in AWK.
>>>It's not that difficult.
>>
>> See the Aho, Kernighan and Weinberger book on Awk, if I remember
>> correctly. There's one in it.
>
>He means calc3 in this file:
>
> http://cm.bell-labs.com/cm/cs/who/bwk/awkcode.txt
>
>In January 2000, Kenny McCormack and Alan Linton
>posted extended versions:
>
># calc3 - infix calculator - derived from calc3 in TAPL, chapter 6.
># by Kenny McCormack, Mon 3 Jan 2000
># modified by Alan Linton, $Date: 2000/01/06 21:37:36 $, $Revision: 1.16 $
>

snip....
Thanks for your suggestion. I did implement a form of that function. I
also added hexadecimal in both the 0xHH and the $HH forms and then
symbol table lookups and then forward references for symbols etc.
However, I consider that code ugly C like stuff. It takes no advantage
of the awk power of pattern/match.
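
For reference, a minimal sketch of how the two hexadecimal forms
mentioned above could be converted in portable awk (no strtonum()).
The function name hexval and the hex2dec shell wrapper are hypothetical;
this is not the code from the assembler described here.

```sh
hex2dec() {
    awk -v lit="$1" '
    # Convert "0xHH" or "$HH" to decimal; anything else is returned as a number.
    function hexval(s,    i, d, n) {
        if (s !~ /^(0[xX]|[$])[0-9A-Fa-f]+$/) return s + 0
        sub(/^(0[xX]|[$])/, "", s)        # drop the prefix
        s = toupper(s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789ABCDEF", substr(s, i, 1)) - 1
            n = n * 16 + d
        }
        return n
    }
    BEGIN { print hexval(lit) }
    '
}
```

For example, hex2dec 0xFF prints 255 and hex2dec '$1A' prints 26. In
calc3 such a test would presumably have to come before the plain-number
branch of _factor3, since "0" on its own already matches as a number.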

It seems to me that anything that takes more than a "few" lines of awk
code is meandering into the C ballpark. Unfortunately, C is rotten at
dealing with strings. But something that is processing strings into
other strings should be awk's specialty. M4 should have no advantages
over awk, except m4 has that built in recursive eval capability
(different from the m4 eval() function).

I saw Arnold Robbins' post earlier. I am willing to work with
documented side effects of an eval function. I guess it is up to him
if it is too difficult to implement.

Regards, Steve


Aharon Robbins

Nov 9, 2004, 1:33:38 AM
In article <mg70p0512bmicino7...@4ax.com>,

Steve Calfee <steve...@hotmail.com> wrote:
>It seems to me that anything that takes more than a "few" lines of awk
>code is meandering into the C ballpark.

Structuring an awk program with functions can mitigate this, somewhat.

>I saw Arnold Robbins' post earlier. I am willing to work with
>documented side effects of an eval function. I guess it is up to him
>if it is too difficult to implement.

As you proposed it, it is indeed too difficult to implement. Calling
it eval is also misleading, since what you want is to reinvoke the
current program on a different $0, when usually an eval would be
to construct some awk language code in a string and then evaluate it.

Although the latter has considerable precedent in m4, the shell and
perl, it would be painful to do in gawk, and I feel like gawk already
has too many features.

Of course, as with all Free Software, You Have The Source, and are
welcome to make any changes you see fit, in order to try out your
ideas.

Sorry,

Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

Jürgen Kahrs

Nov 9, 2004, 12:48:14 PM
Steve Calfee wrote:

> However, I consider that code ugly C like stuff. It takes no advantage
> of the awk power of pattern/match.

Let's assume you had such a feature built into
gawk. What would the eval script look like?
Would it really be so much shorter and easier than
the current solution? I doubt it would be shorter.
It would only be shorter if the functions you want
to implement in eval already exist in gawk (with
exactly the same semantics).

William Park

Nov 9, 2004, 4:19:31 PM
Steve Calfee <steve...@hotmail.com> wrote:
> It seems to me that anything that takes more than a "few" lines of awk
> code is meandering into the C ballpark. Unfortunately, C is rotten at
> dealing with strings.

You mean regex stuff, don't you? Because <string.h>, <ctype.h>, and
<stdio.h> are C's strength. In any case, have you tried it in Bash
shell?

Kenny McCormack

Nov 9, 2004, 4:38:55 PM
In article <2vcqj3F...@uni-berlin.de>,

Actually, C's string handling *is* primitive, compared to most real HLLs (*).
Having to do your own memory management/garbage collection, all the
various ways that you can, as they say in clc, "invoke UB", the simple fact
that you can't say: a = b (where a & b are strings - not to mention the
fact that there is no string type to begin with), etc, etc.

(*) I consider C to be a 2.5GL language...

William Park

Nov 9, 2004, 5:30:17 PM

But, when you type 'a = b', exactly what do you think is happening
underneath? Awk, Sed, Bash, Python, Perl, all are written in C.

Kenny McCormack

Nov 9, 2004, 5:44:31 PM
In article <2vcunoF...@uni-berlin.de>,

Actually, I would imagine that what is happening "underneath" is something
like: REPZ MOVSB
(if I remember my x86 assembly right).

The point is that (standard) C doesn't have a string type - it is, as they
say, high level assembler. I.e., "strcpy(a,b)" is just a thin wrapper
around "REPZ MOVSB".

Not that any of this should be taken as disparaging of C. The only reason
I am posting about this is that I understand the OP's distaste for AWK code
that "looks like C".

Steve Calfee

Nov 9, 2004, 8:43:34 PM
On 9 Nov 2004 08:33:38 +0200, arn...@skeeve.com (Aharon Robbins)
wrote:

>In article <mg70p0512bmicino7...@4ax.com>,
>Steve Calfee <steve...@hotmail.com> wrote:
>>It seems to me that anything that takes more than a "few" lines of awk
>>code is meandering into the C ballpark.
>
>Structuring an awk program with functions can mitigate this, somewhat.
>
>>I saw Arnold Robbins' post earlier. I am willing to work with
>>documented side effects of an eval function. I guess it is up to him
>>if it is too difficult to implement.
>
>As you proposed it, it is indeed too difficult to implement. Calling
>it eval is also misleading, since what you want is to reinvoke the
>current program on a different $0, when usually an eval would be
>to construct some awk language code in a string and then evaluate it.
>

You are right. I really want two functions. One is eval(x), which
would take the code in x and execute it. This allows "a=(b+7)*2", just as
I can write in an awk program. The other is recurse(x), which would
process x as $0 in my awk main loop. This stuff takes some thought. The
challenge from the other poster, to show a "more elegant" expression
evaluator for my assembler given recursive $0 processing, is
indeed a gedanken experiment.

I would like to process a line like:
label+generatedlabel: operation expr1,expr2....exprn ;comment
where everything is optional on any source line.
This involves symbol tables, forward references and arbitrarily
complex expressions. exprx should be evaluated by the base awk
interpreter.
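
As an illustration only, here is one way the fixed parts of such a line
could be peeled apart with match(), sub() and split() before any
expression is evaluated. The grammar is guessed from the single example
line above (a label may contain "+", comments start at the first ";"),
and the names split_asm, label, op and arg are made up for this sketch.
Symbol tables and forward references are not handled.

```sh
split_asm() {
    awk '
    {
        line = $0
        comment = label = op = ""
        nargs = 0
        # Peel the comment off first (naive: ignores ";" inside string literals).
        if ((i = index(line, ";")) > 0) {
            comment = substr(line, i + 1)
            line = substr(line, 1, i - 1)
        }
        # Optional label ending in ":" at the start of the line.
        if (match(line, /^[ \t]*[A-Za-z_.][A-Za-z0-9_.+]*:/)) {
            label = substr(line, RSTART, RLENGTH - 1)
            gsub(/[ \t]/, "", label)
            line = substr(line, RSTART + RLENGTH)
        }
        # First remaining word is the operation; the rest is the operand list.
        sub(/^[ \t]+/, "", line)
        sub(/[ \t]+$/, "", line)
        if (match(line, /^[^ \t]+/)) {
            op = substr(line, 1, RLENGTH)
            line = substr(line, RLENGTH + 1)
            gsub(/[ \t]/, "", line)
            nargs = split(line, arg, ",")
        }
        printf "label=[%s] op=[%s]", label, op
        for (i = 1; i <= nargs; i++) printf " arg%d=[%s]", i, arg[i]
        printf " comment=[%s]\n", comment
    }
    ' "$@"
}
```

Given "loop+1: MOV a, b+2 ;copy" it prints
label=[loop+1] op=[MOV] arg1=[a] arg2=[b+2] comment=[copy]. Each arg
string would then still need an expression evaluator such as calc3.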

I do not expect you to rush off and implement this, but I would like
to discuss alternatives and elegant extensions to awk.


>Although the latter has considerable precedent in m4, the shell and
>perl, it would be painful to do in gawk, and I feel like gawk already
>has too many features.
>
>Of course, as with all Free Software, You Have The Source, and are
>welcome to make any changes you see fit, in order to try out your
>ideas.
>

Yes, this is the Chinese curse/blessing: "may you live in
interesting times"

Regards, Steve


William Park

Nov 9, 2004, 11:36:01 PM
Kenny McCormack <gaz...@yin.interaccess.com> wrote:
> Not that any of this should be taken as disparaging of C. The only
> reason I am posting about this is that I understand the OP's distaste
> for AWK code that "looks like C".

I also found the OP's remark rather strange, because if you go past the
simplistic
/pattern/ {print}
and do some "programming" in Awk, it's practically the spitting image of C.

Doug McClure

Nov 10, 2004, 2:06:32 AM
Probably relatively few readers of c.l.a. use awk to write awk
programs, but it can be easily done. The expression that you want to
pass to an eval() function can just be written to the output.awk
script that you create and later execute. Let the awk interpreter do
the work for you.

Would this solve the problem for you?
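
A minimal sketch of that idea, with some assumptions: the generated
program is passed to a second awk on its command line and read back
with getline, rather than being written to an output.awk file, and only
plain arithmetic characters are accepted because the string is executed
as code. The name awk_eval is hypothetical.

```sh
awk_eval() {
    awk -v expr="$1" '
    BEGIN {
        # Refuse anything but digits and operators: the string is run as code.
        if (expr !~ /^[0-9 .+*\/%^()-]+$/) {
            print "error: unsupported expression" > "/dev/stderr"
            exit 1
        }
        # \047 is a single quote; build: awk (quote)BEGIN { print (expr) }(quote)
        cmd = "awk \047BEGIN { print (" expr ") }\047"
        if ((cmd | getline result) > 0) print result
        close(cmd)
    }
    '
}
```

For example, awk_eval '(3+7)*2' prints 20. Each call starts a new awk
process, so variables do not carry over between evaluations; keeping
state would need them written into the generated program as well.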

DKM


To contact me directly, send EMAIL to (single letters all)
DEE_KAY_EMM AT EarthLink.net. [For example X_...@EarthLink.net.]

Kenny McCormack

Nov 10, 2004, 8:18:04 AM
In article <2vdk5gF...@uni-berlin.de>,

I think you misunderstand. I *agree* with OP. AWK should not (*) look
like C, regardless of how complex the programming.

(*) And, when done by skilled hands, it does not.

Ed Morton

Nov 10, 2004, 8:21:12 AM


Whenever I find I've written an awk script that looks like C I revisit
it to see what I've done wrong. I suspect for large jobs, though, it
can't help ending up looking a tad C-ish.

Ed.

Paul Boekholt

Nov 11, 2004, 6:58:11 AM
On Mon, 08 Nov 2004 13:20:58 -0800, Steve Calfee <steve...@hotmail.com> said:
> What I would like to see for eval(x) is to have awk push its entire
> internal state ($0, NF, and all builtin variables). Then I would like
> to see $0 replaced by x and then the main pattern match loop executed
> (NOT BEGIN and END). After that eval(x) line, the state would be
> popped and execution of the original $0 continued right after the
> eval(x) statement.
Part of the GNU enscript package is an awk-like language called States.
From the manpage:

DESCRIPTION
States is an awk-alike text processing tool with some
state machine extensions. It is designed for program
source code highlighting and to similar tasks where state
information helps input processing.

At a single point of time, States is in one state, each
quite similar to awk's work environment, they have regular
expressions which are matched from the input and actions
which are executed when a match is found. From the action
blocks, states can perform state transitions; it can move
to another state from which the processing is continued.
State transitions are recorded so states can return to the
calling state once the current state has finished.

I've tried to do some scripting with it, but didn't have much luck -
maybe because States is not line-oriented. If you try it, let us know
how you fare.
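
For comparison, the "return to the calling state" part can be imitated
in plain awk with a state variable and a small stack. This is only a
sketch of the idea, not States syntax: the state names, the line-based
delimiters (/* and */ on their own lines, <<doc and doc>>) and the
helpers call(), ret() and track_states are all invented here.

```sh
track_states() {
    awk '
    function call(s) { stack[++sp] = state; state = s }   # enter s, remember caller
    function ret()   { if (sp > 0) state = stack[sp--] }  # go back to calling state
    BEGIN { state = "code" }
    state == "code"    && /^\/\*/  { call("comment"); next }
    state == "comment" && /\*\/$/  { ret(); next }
    state == "code"    && /^<<doc/ { call("doc"); next }
    state == "doc"     && /^doc>>/ { ret(); next }
    state == "doc"     && /^\/\*/  { call("comment"); next }
    { print state ": " $0 }
    ' "$@"
}
```

Each rule is guarded by the current state, so the pattern/action style
is kept, and a comment opened inside a doc block returns to "doc" rather
than to "code" when it closes.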

--
http://jedmodes.sf.net/mode/awk
