Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Analysing an awk program

17 views
Skip to first unread message

Mark Katz

unread,
Apr 30, 1998, 3:00:00 AM4/30/98
to

Some 20+ years ago, I was a great fan of Fortran and one of the
nice options available from the 'H' compiler was a cross-reference
listing of all variables and which lines they were on.

Very important for large programs....

I'd like to see a similar facility for my awk programs
I set up a very crude version (in awk of course) but got bogged
down with implied assignments ++/-- and stuff inside quote "'s

Has anyone written something?

While on the subject of analysing awk programs has anyone written
a program that would either
- compress (multiple statements on a line subject to a max line width) or
- expand (1 statement on a line and nicely format 'blocks')
awk programs

Thanks
Mark
--
Mark Katz
ISPC, London - Innovation in data-delivery tools
Tel: (44) 181-455 4665, Fax (44) 181-458 9554
** Visit our website on http://www.efiche.com/efiche **


Phil Bewig

unread,
Apr 30, 1998, 3:00:00 AM4/30/98
to

XREF(AWK) Philip L. Bewig XREF(AWK)


NAME

xref(awk) - produce a cross reference listing of an awk program


SYNOPSIS

awk -f xref.awk [ file ... ]


DESCRIPTION

XREF(AWK) takes as input a valid awk program and produces as out-
put a cross-reference listing of all variables and function calls
which appear in the program.

For ordinary variables and array variables, a line of the form

count var(func) lines ...

is produced, where "count" is the number of times the variable is
used, "var" is the name of the variable, "func" is the function
name to which the variable is local (a null "func" indicates that
the variable is global), and "lines" is the number of each line
where the variable appears. Appearances of the variable in a
function's parameter list are ignored. The number of lines shown
may differ from "count" if the variable appears more than once on
the same line.

For functions, a line of the form

count func(define) lines ...

is produced, where "count" is the number of times the function is
called, "func" is the name of the function, "define" is the lime
number where the function is defined, and "lines" is the number of
each line where the function is called. As for variables, the
number of lines shown may differ from "count."

Output lines for variables and functions are intermixed and are
sorted by name. Though terse, the output is informative, easy to
read, and amenable to further processing.


EXAMPLE

The cross-reference listing produced by running xref.awk against
itself is shown below:

5 NR() 39 45 50 53 68
8 RLENGTH() 119 120 123 124 127 128 132 133
10 act() 31 34 36 40 51 54 56 63 65 67
1 arr(asplit) 90
2 arr(inarray) 93 94
1 asplit(89) 6
3 braces() 55 57 58
2 flines() 53 79
1 fs(asplit) 89
3 funcname() 42 52 61
16 i() 76 77 78 79 81 82 83 84 90
3 inarray(92) 37 43 48
6 j() 82 83 84
3 j(inarray) 93 94
3 keywords() 10 125 129
1 lex(97) 29
31 line() 103 104 107 108 109 110 112 113 114 115 116 117 118
119 120 122 123 124 126 127 128 131 132 133
6 lines() 39 45 50 83 84
4 local() 41 59 60 64
3 machine() 17 30 31
2 n(asplit) 89 90
15 names() 37 38 43 44 48 49 77 78 79 81 82 83 84
3 nextstate() 30 62 73
4 nnames() 38 44 49 76
4 state() 23 30 31 73
1 str(asplit) 89
3 symb() 29 30 31
2 temp() 59 60
2 temp_asplit() 89 90
31 tok() 37 38 39 41 42 43 44 45 47 48 49 50 52 53 64 101 105
113 115 117 119 123 125 127 129 132
1 val(inarray) 94
5 xnames() 39 45 50 77 82

For readability, some lines have been folded.


SOURCE CODE

# xref.awk - cross reference an awk program

BEGIN {

# create array of keywords to be ignored by lexer
asplit("BEGIN:END:atan2:break:close:continue:cos:delete:" \
"do:else:exit:exp:for:getline:gsub:if:in:index:int:" \
"length:log:match:next:print:printf:rand:return:sin:" \
"split:sprintf:sqrt:srand:sub:substr:system:while",
keywords,":")

# build the symbol-state table
split("00:00:00:00:00:00:00:00:00:00:" \
"20:10:10:12:12:11:07:00:00:00:" \
"08:08:08:08:08:33:08:00:00:00:" \
"08:44:08:36:08:08:08:00:00:00:" \
"08:44:45:42:42:41:08",machine,":")

# parse the input and store an intermediate representation
# of the cross-reference information

# set up the machine
state = 1

# run the machine
for (;;) {

# get next symbol
symb = lex()
nextstate = substr(machine[state symb],1,1)
act = substr(machine[state symb],2,1)

# perform required action
if ( act == "0" )
; # do nothing
else if ( act == "1" ) {
if ( ! inarray(tok,names) )
names[++nnames] = tok
lines[tok,++xnames[tok]] = NR }
else if ( act == "2" ) {
if ( tok in local ) {
tok = tok "(" funcname ")"
if ( ! inarray(tok,names) )
names[++nnames] = tok
lines[tok,++xnames[tok]] = NR }
else {
tok = tok "()"
if ( ! inarray(tok,names) )
names[++nnames] = tok
lines[tok,++xnames[tok]] = NR } }
else if ( act == "3" ) {
funcname = tok
flines[tok] = NR }
else if ( act == "4" )
braces++
else if ( act == "5" ) {
braces--
if ( braces == 0 ) {
for ( temp in local )
delete local[temp]
funcname = ""
nextstate = 1 } }
else if ( act == "6" ) {
local[tok] = 1 }
else if ( act == "7" )
break
else if ( act == "8" ) {
print "error: xref.awk: line " NR ": aborting" \
> "/dev/con"
exit 1 }

# finished with current token
state = nextstate }

# finished parsing, now ready to print output
for ( i = 1; i <= nnames; i++ ) {
printf "%d ", xnames[names[i]] |"sort +1"
if ( index(names[i],"(") == 0 )
printf "%s(%d)", names[i], flines[names[i]] |"sort +1"
else
printf "%s", names[i] |"sort +1"
for ( j = 1; j <= xnames[names[i]]; j++ )
if ( lines[names[i],j] != lines[names[i],j-1] )
printf " %d", lines[names[i],j] |"sort +1"
printf "\n" |"sort +1" }

} # END OF PROGRAM

function asplit(str,arr,fs, n) { n = split(str,temp_asplit,fs)
for ( i = 1; i <= n; i++ ) arr[temp_asplit[i]]++ }

function inarray(val,arr, j) {
for ( j in arr )
if ( arr[j] == val ) return j
return "" }

function lex() {

for (;;) {

if ( tok == "(eof)" ) return 7

while ( length(line) == 0 )
if ( getline line == 0 ) {
tok = "(eof)"; return 7 }

sub(/^[ \t]+/,"",line) # remove white space,
sub(/^"([^"]|\\")*"/,"",line) # quoted strings,
sub(/^\/([^\/]|\\\/)+\//,"",line) # regular expressions,
sub(/^#.*/,"",line) # and comments

if ( line ~ /^function/ ) {
tok = "function"; line = substr(line,9); return 1 }
else if ( line ~ /^{/ ) {
tok = "{"; line = substr(line,2); return 2 }
else if ( line ~ /^}/ ) {
tok = "}"; line = substr(line,2); return 3 }
else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\[/) ) {
tok = substr(line,1,RLENGTH-1)
line = substr(line,RLENGTH+1)
return 5 }
else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*\(/) ) {
tok = substr(line,1,RLENGTH-1)
line = substr(line,RLENGTH+1)
if ( ! ( tok in keywords ) ) return 6 }
else if ( match(line,/^[A-Za-z_][A-Za-z_0-9]*/) ) {
tok = substr(line,1,RLENGTH)
line = substr(line,RLENGTH+1)
if ( ! ( tok in keywords ) ) return 4 }
else {
match(line,/^[^A-Za-z_{}]/)
tok = substr(line,1,RLENGTH)
line = substr(line,RLENGTH+1) } } }


TECHNICAL DISCUSSION

Broadly, XREF(AWK) parses an awk program using a symbol-state
table, in much the same way as a yacc-generated parser. The
lexical analyzer recognizes seven distinct symbols: the word
"function", the left brace, the right brace, identifiers used
as variables, identifiers used as arrays, identifiers used as
functions, and end of file. The type of symbol is returned to
the parser as the value of the "lex" function, and the global
variable "tok" is set to the text of the current token.

The symbol-state table is stored in the "machine" array. The
table can be represented as follows:

symbol | 1 2 3 4 5 6 7
|
state | "function" { } var array func eof
-- -- -- -- -- -- -- -+- -- -- -- -- -- -- -- -- -- -- -- -- --
1 any | 20 10 10 12 12 11 07
2 "function" | 08 08 08 08 08 33 08
3 "function" name | 08 44 08 36 08 08 08
4 "function" name "{" | 08 44 45 42 42 41 08

where the first digit is the state to be entered after process-
ing the current token and the second digit is an action to be
performed. The actions are listed below:

1 found a function call
2 found a variable or array
3 found a function definition
4 found a left brace
5 found a right brace
6 found a local variable declaration
7 found end of file
8 found an error

Each of the first six actions causes some information about the
target program to be stored for later processing; the structures
used will be discussed below. The seventh action causes the
parser to exit. The eighth action causes errors to be reported
to standard error and the program to abort.

Before describing the intermediate data structures, we will
discuss some of the more interesting points in the action calls.
The "braces" variable keeps track of whether we are currently
within a functions; it is positive within a function and zero
without. When the right brace which causes the value of "braces"
to go from one to zero is found, the value of "nextstate" is
changed from four (scanning a function) to one (any) and the
names of local variables are forgotten. The "local" array is
accumulated from the variables found after the function name but
before the opening left brace of the function; action two care-
fully checks whether a variable is global or local before writing
to the intermediate data structure. The variable "funcname" is
the name of the current function when within a function and null
without.

The following arrays store an intermediate representation of the
variable and function identifiers of the target program:

names[1..nnames] = list of all identifiers, both variable and
function names; for variables, the name has the form
var(func), but for functions, there are no parentheses

xnames[names[i]] = number of times names[i] is used

lines[names[i],1..xnames[names[i]]] = list of line numbers
where names[i] is used

flines[names[i]] = line number where function names[i] is
defined

These arrays are created as the parser reads the input; when the
parser is finished, the arrays are output in user-readable form.


PORTABILITY

XREF(AWK) will work with any implementation of nawk. The MKS
ToolKit implementation requires the large-model version of awk.


HISTORY

Written by Phil Bewig on February 10, 1990. Inspired by
Exercise 3-16 of the book "The Awk Programming Language" by
Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger
(Addison-Wesley: 1988).


COPYRIGHT

This program is placed in the public domain. However, the
author requests credit when distributed.
--
Phil Bewig ... pbe...@netcom.COM

Peter J. Farley III

unread,
May 1, 1998, 3:00:00 AM5/1/98
to

pbe...@netcom.com (Phil Bewig) wrote:

>ma...@ispc001.demon.co.uk (Mark Katz) writes:
<Snipped>
>>Has anyone written something?
>
>Yes. See the companion posting to this one.

Phil,

That is a *very* nice piece of work. Thank you for posting it, and
for placing it in the public domain. An excellent example of
netizenship, IMHO.

----------------------------------------------------
Peter J. Farley III (pjfa...@dorsai.org)

0 new messages