Submitted-By: Jawaid Bazyar (baz...@netcom.com)
Posting-number: Volume 1, Source 94
Archive-Name: gno/util/awk.01
Architecture: 2gs,UNIX
Version-Number: 1.00
AWK is a powerful string processing language that is widely used in
the Unix world.
This particular version of AWK is a port of the actual AT&T AWK source
code, which AT&T has graciously made available to the general public.
The changes necessary to make AWK function on the 16-bit Apple //GS
will likely be of interest to anyone trying to port AWK to MS-DOS or
some other 16-bit machine.
This package requires GNO/ME 2.0 and ORCA/C 2.0.1.
Parts 1 through 4 contain the actual source; Parts 5 and 6 contain the
output of YACC and LEX so you can compile the source even if you do not
have these programs. The contents of Parts 5 and 6 are fully rebuildable
from the source.
Packed in AAF.
Enjoy.
********************************************************************************
=Manifest
-FIXES
-Manifest
-README
-README.gno
-awk.1
-awk.g.y
-awk.h
-awk.lx.l
-b.c
-lex.yy.c
-lib.c
-main.c
-makefile
-makefile.gno
-maketab.c
-parse.c
-proctab.c
-proto.h
-run.c
-tran.c
-y.tab.c
-y.tab.h
=README.gno
-
-AWK
-
-AWK is a powerful string processing language that is widely used in
-the Unix world. There has been a longstanding request for a GNO/ME
-version.
-
-This particular version of AWK is a port of the actual AT&T AWK source
-code, which AT&T has graciously made available to the general public.
-The changes necessary to make AWK function on the 16-bit Apple //GS
-will likely be of interest to anyone trying to port AWK to MS-DOS or
-some other 16-bit machine.
-
-I have done reasonable testing of AWK, but its extensive feature set
-makes exhaustive testing very difficult. Although I have not been able
-to give it a full-fledged test, I am confident enough that it works to
-release it.
-
-It is certainly conceivable that bugs were introduced by the
-porting process; if you find one, please let me know.
-
-This package requires GNO/ME 2.0 and ORCA/C 2.0.1.
-
-=========
-Compiling
-=========
-To build this package, you will need GNO/ME 2.0 and ORCA/C 2.0.1.
-An ORCA-compatible makefile is included.
-
-The AWK executable should be placed in your /usr/bin directory.
-The awk.1 manual page should be placed in /usr/man/man1/.
-
-==================================
-Changes To Make AWK Work Under GNO
-==================================
-There were numerous changes made to this version of AWK to allow it to
-be ported to the Apple IIGS. The vast majority of these changes replace
-the use of large stack-based arrays and data structures with a call to
-malloc() at the beginning of the function and a call to free() at the
-end. These changes will likely be of interest to anyone trying to port
-AWK to MS-DOS or some other 16-bit machine.
-
-The changes involved are:
- (1) Reduce the size of data structures where possible. In the case
- of the array of possible open file pointers, I reduced the system's
- MAX_OPEN from 32768 to 40 by redefining it (in "run.c").
- (2) Allocate all large local structures with malloc() and release
- them with free() (in "run.c").
- (3) Set the IIGS OMF load segment names, since AWK is bigger than
- 64K and the code must therefore be segmented.
- (4) In main.c, signal(SIGFPE, fpecatch) was removed, because the
- IIGS floating point libraries don't send signals on floating
- point exceptions.
-
-All changes are marked with:
- #ifdef __ORCAC__ (or #ifndef __ORCAC__)
- ...
- #else
- ...
- #endif
-
-The original code still remains in all cases.
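The malloc()/free() change in item (2), together with the #ifdef __ORCAC__ marking convention, can be sketched as follows. This is a hypothetical routine, not code taken from run.c. The function name count_words and the 4096-byte SCRATCH_SIZE are assumptions chosen for illustration. Under ORCA/C the scratch buffer comes from the heap. With any other compiler the original large automatic array is kept.

```c
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096   /* hypothetical size: large for a 16-bit stack frame */

/* Hypothetical routine (not from run.c): counts blank- or tab-separated
   words in line. strtok() modifies its input, so the line is first copied
   into a scratch buffer; longer lines are truncated to fit.
   Returns -1 only if the heap buffer cannot be allocated. */
int count_words(const char *line)
{
#ifdef __ORCAC__
    char *buf;                     /* IIGS: scratch buffer lives on the heap */
#else
    char buf[SCRATCH_SIZE];        /* original: large automatic array */
#endif
    char *tok;
    int n = 0;

#ifdef __ORCAC__
    if ((buf = malloc(SCRATCH_SIZE)) == NULL)
        return -1;
#endif
    strncpy(buf, line, SCRATCH_SIZE - 1);
    buf[SCRATCH_SIZE - 1] = '\0';
    for (tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t"))
        n++;
#ifdef __ORCAC__
    free(buf);                     /* matching free() before returning */
#endif
    return n;
}
```

The only early return is the allocation failure itself. That leaves a single free() to keep track of, which is what makes this transformation mechanical to apply across many functions.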
-
-==================
-Author Information
-==================
-Jawaid Bazyar
-baz...@netcom.com
-Procyon, Inc.
-P.O. Box 620334
-Littleton, CO 80162-0334
-303-781-3273
-
-Version 1.00
-December 1994
=README
-/****************************************************************
-Copyright (C) AT&T 1993
-All Rights Reserved
-
-Permission to use, copy, modify, and distribute this software and
-its documentation for any purpose and without fee is hereby
-granted, provided that the above copyright notice appear in all
-copies and that both that the copyright notice and this
-permission notice and warranty disclaimer appear in supporting
-documentation, and that the name of AT&T or any of its entities
-not be used in advertising or publicity pertaining to
-distribution of the software without specific, written prior
-permission.
-
-AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
-INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
-IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
-SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
-IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
-ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
-THIS SOFTWARE.
-****************************************************************/
-
-This is the version of awk described in "The AWK Programming Language",
-by A. V. Aho, B. W. Kernighan, and P. J. Weinberger
-(Addison-Wesley, 1988, ISBN 0-201-07981-X).
-Changes, mostly bug fixes, are listed in FIXES.
-If you distribute this code further, please please please
-distribute FIXES with it. If you find errors, please report
-them to b...@research.att.com. Thanks.
-
-The program itself is created by
- make
-which should produce a longish sequence of
-messages roughly like this:
-
- yacc -d awk.g.y
-
- conflicts: 43 shift/reduce, 85 reduce/reduce
- cc -g -c y.tab.c
- rm y.tab.c
- mv y.tab.o awk.g.o
- cmp -s y.tab.h prevy.tab.h || (cp y.tab.h prevy.tab.h; echo change maketab)
- prevy.tab.h: No such file or directory
- change maketab
- lex awk.lx.l
- cc -g -c lex.yy.c
- rm lex.yy.c
- mv lex.yy.o awk.lx.o
- cc -g -c b.c
- cc -g -c main.c
- cc -g -c parse.c
- cc maketab.c -o maketab
- ./maketab >proctab.c
- cc -g -c proctab.c
- cc -g -c tran.c
- cc -g -c lib.c
- cc -g -c run.c
- cc -g awk.g.o awk.lx.o b.o main.o parse.o proctab.o tran.o lib.o run.o -lm
-
-This produces an executable a.out; you will eventually
-want to move this to some place like /usr/bin/awk.
-
-The -g option (which generates symbol table information
-for debuggers) can be disabled by removing the line
- CFLAGS = -g
-from the makefile. Alternatively, you might replace
-it by
- CFLAGS = -O
-if your compiler does significant optimization.
-
-NOTE: This version uses ANSI C, as you should also.
=FIXES
-/****************************************************************
-Copyright (C) AT&T 1993
-All Rights Reserved
-
-Permission to use, copy, modify, and distribute this software and
-its documentation for any purpose and without fee is hereby
-granted, provided that the above copyright notice appear in all
-copies and that both that the copyright notice and this
-permission notice and warranty disclaimer appear in supporting
-documentation, and that the name of AT&T or any of its entities
-not be used in advertising or publicity pertaining to
-distribution of the software without specific, written prior
-permission.
-
-AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
-INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
-IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
-SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
-IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
-ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
-THIS SOFTWARE.
-****************************************************************/
-
-Sep 12, 1987:
- Very long printf strings caused core dump;
- fixed aprintf, asprintf, format to catch them.
- Can still get a core dump in printf itself.
-
-Sep 17, 1987:
- Error-message printer had printf(s) instead of
- printf("%s",s); got core dumps when the message
- included a %.
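The Sep 17, 1987 fix is the classic format-string bug. Below is a minimal sketch of the safe form. It assumes C99 snprintf() writing into a caller-supplied buffer, whereas awk's error printer writes to a stream. The name format_message is hypothetical.

```c
#include <stdio.h>

/* Copies msg into out verbatim, truncating to n-1 characters.
   Calling snprintf(out, n, msg) instead would treat any '%' in msg as a
   conversion and fetch arguments that were never passed: undefined
   behavior, which is how a message containing '%' produced core dumps. */
void format_message(char *out, size_t n, const char *msg)
{
    snprintf(out, n, "%s", msg);
}
```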
-
-Oct xx, 1987:
- Reluctantly added toupper and tolower functions.
- Subject to rescinding without notice.
-
-Dec 2, 1987:
- Newer C compilers apply a strict scope rule to extern
- declarations within functions. Two extern declarations in
- lib.c and tran.c have been moved to obviate this problem.
-
-Mar 25, 1988:
- main.c fixed to recognize -- as terminator of command-
- line options. Illegal options flagged.
- Error reporting slightly cleaned up.
-
-May 10, 1988:
- Fixed lib.c to permit _ in commandline variable names.
-
-May 22, 1988:
- Removed limit on depth of function calls.
-
-May 28, 1988:
- srand returns seed value it's using.
- see 1/18/90
-
-June 1, 1988:
- check error status on close
-
-July 2, 1988:
- performance bug in b.c/cgoto(): not freeing some sets of states.
- partial fix only right now, and the number of states increased
- to make it less obvious.
-
-July 2, 1988:
- flush stdout before opening file or pipe
-
-July 24, 1988:
- fixed egregious error in toupper/tolower functions.
- still subject to rescinding, however.
-
-Aug 23, 1988:
- setting FILENAME in BEGIN caused core dump, apparently
- because it was freeing space not allocated by malloc.
-
-Sep 30, 1988:
- Now guarantees to evaluate all arguments of built-in
- functions, as in C; the appearance is that arguments
- are evaluated before the function is called. Places
- affected are sub (gsub was ok), substr, printf, and
- all the built-in arithmetic functions in bltin().
- A warning is generated if a bltin() is called with
- the wrong number of arguments.
-
- This requires changing makeprof on p167 of the book.
-
-Oct 12, 1988:
- Fixed bug in call() that freed local arrays twice.
-
- Fixed to handle deletion of non-existent array right;
- complains about attempt to delete non-array element.
-
-Oct 20, 1988:
- Fixed %c: if expr is numeric, use numeric value;
- otherwise print 1st char of string value. still
- doesn't work if the value is 0 -- won't print \0.
-
- Added a few more checks for running out of malloc.
-
-Oct 30, 1988:
- Fixed bug in call() that failed to recover storage.
-
- A warning is now generated if there are more arguments
- in the call than in the definition (in lieu of fixing
- another storage leak).
-
-Nov 27, 1988:
- With fear and trembling, modified the grammar to permit
- multiple pattern-action statements on one line without
- an explicit separator. By definition, this capitulation
- to the ghost of ancient implementations remains undefined
- and thus subject to change without notice or apology.
- DO NOT COUNT ON IT.
-
-Dec 7, 1988:
- Added a bit of code to error printing to avoid printing nulls.
- (Not clear that it actually would.)
-
-Dec 17, 1988:
- Catches some more commandline errors in main.
- Removed redundant decl of modf in run.c (confuses some compilers).
- Warning: there's no single declaration of malloc, etc., in awk.h
- that seems to satisfy all compilers.
-
-Jan 9, 1989:
- Fixed bug that caused tempcell list to contain a duplicate.
- The fix is kludgy.
-
-Apr 9, 1989:
- Changed grammar to prohibit constants as 3rd arg of sub and gsub;
- prevents class of overwriting-a-constant errors. (Last one?)
- This invalidates the "banana" example on page 43 of the book.
-
- Added \a ("alert"), \v (vertical tab), \xhhh (hexadecimal),
- as in ANSI, for strings. Rescinded the sloppiness that permitted
- non-octal digits in \ooo. Warning: not all compilers and libraries
- will be able to deal with \x correctly.
-
-Apr 26, 1989:
- Debugging output now includes a version date,
- if one compiles it into the source each time.
-
-Apr 27, 1989:
- Line number now accumulated correctly for comment lines.
-
-Jun 4, 1989:
- ENVIRON array contains environment: if shell variable V=thing,
- ENVIRON["V"] is "thing"
-
- multiple -f arguments permitted. error reporting is naive.
- (they were permitted before, but only the last was used.)
-
- fixed a really stupid botch in the debugging macro dprintf
-
- fixed order of evaluation of commandline assignments to match
- what the book claims: an argument of the form x=e is evaluated
- at the time it would have been opened if it were a filename (p 63).
- this invalidates the suggested answer to ex 4-1 (p 195).
-
- removed some code that permitted -F (space) fieldseparator,
- since it didn't quite work right anyway. (restored aug 2)
-
-Jun 14, 1989:
- added some missing ansi printf conversion letters: %i %X %E %G.
- no sensible meaning for h or L, so they may not do what one expects.
-
- made %* conversions work.
-
- changed x^y so that if y is a positive integer, it's done
- by explicit multiplication, thus achieving maximum accuracy.
- (this should be done by pow() but it seems not to be locally.)
- done to x ^= y as well.
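The Jun 14, 1989 entry computes x^y by explicit multiplication when y is a positive integer. One way to do that is repeated squaring, sketched below. The name ipow and the handling of zero and negative exponents are assumptions made for illustration. A caller would use it only after checking that y is integral, and fall back to pow() otherwise.

```c
/* x^n for integer n by repeated squaring: O(log |n|) multiplications,
   exact whenever every intermediate product is representable.
   A negative n returns the reciprocal of x^|n|. */
double ipow(double x, int n)
{
    double result = 1.0, base = x;
    /* |n| computed in unsigned arithmetic so n == INT_MIN is safe */
    unsigned int k = n < 0 ? -(unsigned int) n : (unsigned int) n;

    while (k > 0) {
        if (k & 1u)
            result *= base;   /* fold in this binary digit of |n| */
        base *= base;         /* x^1, x^2, x^4, ... */
        k >>= 1;
    }
    return n < 0 ? 1.0 / result : result;
}
```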
-
-Jun 23, 1989:
- add newline to usage message.
-
-Jul 10, 1989:
- fixed ref-thru-zero bug in environment code in tran.c
-
-Jul 30, 1989:
- added -v x=1 y=2 ... for immediate commandline variable assignment;
- done before the BEGIN block for sure. they have to precede the
- program if the program is on the commandline.
- Modified Aug 2 to require a separate -v for each assignment.
-
-Aug 2, 1989:
- restored -F (space) separator
-
-Aug 11, 1989:
- fixed bug: commandline variable assignment has to look like
- var=something. (consider the man page for =, in file =.1)
-
- changed number of arguments to functions to static arrays
- to avoid repeated malloc calls.
-
-Aug 24, 1989:
- removed redundant relational tests against nullnode if parse
- tree already had a relational at that point.
-
-Oct 11, 1989:
- FILENAME is now defined in the BEGIN block -- too many old
- programs broke.
-
- "-" means stdin in getline as well as on the commandline.
-
- added a bunch of casts to the code to tell the truth about
- char * vs. unsigned char *, a right royal pain. added a
- setlocale call to the front of main, though probably no one
- has it usefully implemented yet.
-
-Oct 18, 1989:
- another try to get the max number of open files set with
- relatively machine-independent code.
-
- small fix to input() in case of multiple reads after EOF.
-
-Jan 5, 1990:
- fix potential problem in tran.c -- something was freed,
- then used in freesymtab.
-
-Jan 18, 1990:
- srand now returns previous seed value (0 to start).
-
-Feb 9, 1990:
- fixed null pointer dereference bug in main.c: -F[nothing]. sigh.
-
- restored srand behavior: it returns the current seed.
-
-May 6, 1990:
- AVA fixed the grammar so that ! is uniformly of the same precedence as
- unary + and -. This renders illegal some constructs like !x=y, which
- now has to be parenthesized as !(x=y), and makes others work properly:
- !x+y is (!x)+y, and x!y is x !y, not two pattern-action statements.
- (These problems were pointed out by Bob Lenk of Posix.)
-
- Added \x to regular expressions (already in strings).
- Limited octal to octal digits; \8 and \9 are not octal.
- Centralized the code for parsing escapes in regular expressions.
- Added a bunch of tests to T.re and T.sub to verify some of this.
-
-Jun 26, 1990:
- changed struct rrow (awk.h) to use long instead of int for lval,
- since cfoll() stores a pointer in it. now works better when ints
- are smaller than pointers!
-
-Aug 24, 1990:
- changed NCHARS to 256 to handle 8-bit characters in strings
- presented to match(), etc.
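A byte-indexed table shows why NCHARS must be 256 for 8-bit data. It also shows why the char * versus unsigned char * casts of Oct 11, 1989 matter. This is a hypothetical helper, not code from b.c.

```c
#include <stddef.h>

#define NCHARS 256   /* one slot per possible 8-bit byte value */

/* Counts occurrences of each byte value in s. The cast through
   unsigned char matters: where plain char is signed, a byte such as
   0xE9 would otherwise become a negative array index. */
void count_bytes(const char *s, unsigned long counts[NCHARS])
{
    size_t i;

    for (i = 0; i < NCHARS; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char) *s]++;
}
```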
-
-Oct 8, 1990:
- fixed horrible bug: types and values were not preserved in
- some kinds of self-assignment. (in assign().)
-
-Oct 14, 1990:
- fixed the bug on p. 198 in which it couldn't deduce that an
- argument was an array in some contexts. replaced the error
- message in intest() by code that damn well makes it an array.
-
-Oct 29, 1990:
- fixed sleazy buggy code in lib.c that looked (incorrectly) for
- too long input lines.
-
-Nov 2, 1990:
- fixed sleazy test for integrality in getsval; use modf.
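The integrality test of Nov 2, 1990 decides how a number is converted to a string. Integral values are printed as integers; others go through the output format (OFMT, later CONVFMT per the Jul 23, 1993 entry). The sketch below rests on assumptions. The function name, the use of snprintf(), and the "%.30g" integer format are illustrative choices, not quoted from getsval.

```c
#include <math.h>
#include <stdio.h>

/* Converts x to a string in out. If x has no fractional part it is
   printed as an integer; otherwise the conversion format convfmt
   (e.g. "%.6g") is used. modf() splits x into integer and fractional
   parts, so the test is exact and cannot overflow, unlike casting x
   to an integer type. */
void num_to_str(char *out, size_t n, double x, const char *convfmt)
{
    double ipart;

    if (modf(x, &ipart) == 0.0)
        snprintf(out, n, "%.30g", x);
    else
        snprintf(out, n, convfmt, x);
}
```

The 1e20 case in particular would overflow a 32- or 64-bit integer cast but is handled correctly here.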
-
-Jan 11, 1991:
- failed to set numeric state on $0 in cmd|getline context in run.c.
-
-Jan 28, 1991:
- awk -f - reads the program from stdin.
-
-Feb 10, 1991:
- check error status on all writes, to avoid banging on full disks.
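Checking for write errors, as in this entry and the June 1, 1988 one, means checking the buffered flush, not only the write call. Below is a minimal sketch; the name checked_write is hypothetical.

```c
#include <stdio.h>

/* Writes s to fp and reports failure instead of silently losing output,
   e.g. on a full disk. Buffered stdio may not see the error until the
   data is actually flushed, so the flush and the stream's error flag
   are checked too. Returns 0 on success, -1 on any error. */
int checked_write(FILE *fp, const char *s)
{
    if (fputs(s, fp) == EOF)
        return -1;
    if (fflush(fp) == EOF || ferror(fp))
        return -1;
    return 0;
}
```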
-
-May 6, 1991:
- fixed silly bug in hex parsing in hexstr().
- removed an apparently unnecessary test in isnumber().
- warn about weird printf conversions.
- fixed unchecked array overwrite in relex().
-
- changed for (i in array) to access elements in sorted order.
- then unchanged it -- it really does run slower in too many cases.
- left the code in place, commented out.
-
-May 13, 1991:
- removed extra arg on gettemp, tempfree. minor error message rewording.
-
-Jun 2, 1991:
- better defense against very long printf strings.
- made break and continue illegal outside of loops.
-
-Jun 30, 1991:
- better test for detecting too-long output record.
-
-Jul 21, 1991:
- fixed so that in self-assignment like $1=$1, side effects
- like recomputing $0 take place. (this is getting subtle.)
-
-Jul 27, 1991:
- allow newline after ; in for statements.
-
-Aug 18, 1991:
- enforce variable name syntax for commandline variables: has to
- start with letter or _.
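The command-line assignment rule of Aug 11, 1989 and this entry can be sketched as a small predicate. An argument counts as an assignment only if it looks like var=something and the name starts with a letter or _. The name is_assignment is hypothetical.

```c
#include <ctype.h>

/* Returns 1 if arg has the form name=value, where name starts with a
   letter or '_' and continues with letters, digits or '_'; such an
   argument is an assignment, anything else is treated as a filename. */
int is_assignment(const char *arg)
{
    const unsigned char *p = (const unsigned char *) arg;

    if (!(isalpha(*p) || *p == '_'))
        return 0;
    for (p++; isalnum(*p) || *p == '_'; p++)
        ;
    return *p == '=';
}
```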
-
-Sep 24, 1991:
- increased buffer in gsub. a very crude fix to a general problem.
- and again on Sep 26.
-
-Nov 12, 1991:
- cranked up some fixed-size arrays in b.c, and added a test for
- overflow in penter. thanks to mark larsen.
-
-Nov 19, 1991:
- use RAND_MAX instead of literal in builtin().
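The Nov 19, 1991 change replaces a literal with RAND_MAX when scaling the C library's random numbers. The sketch below assumes the goal is a value in [0, 1), the range awk's rand() returns. The name rand01 is hypothetical.

```c
#include <stdlib.h>

/* Maps rand() onto [0, 1). Dividing by RAND_MAX + 1.0 rather than a
   literal such as 32767 keeps the result below 1 whatever range the
   library uses; doing the + 1 in double avoids int overflow when
   RAND_MAX == INT_MAX. */
double rand01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}
```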
-
-Nov 30, 1991:
- fixed storage leak in freefa, failing to recover [N]CCL.
- thanks to Bill Jones (jo...@skorpio.usask.ca)
-
-Dec 2, 1991:
- die-casting time: converted to ansi C, installed that.
-
-Feb 20, 1992:
- recompile after abortive changes; should be unchanged.
-
-Apr 12, 1992:
- added explicit check for /dev/std(in,out,err) in redirection.
- unlike gawk, no /dev/fd/n yet.
-
- added fflush(file/pipe) builtin. hard to test satisfactorily.
- not posix.
-
-Apr 24, 1992:
- remove redundant close of stdin when using -f -.
-
- got rid of core dump with -d; awk -d just prints date.
-
-May 31, 1992:
- added -mr N and -mf N options: more record and fields.
- these really ought to adjust automatically.
-
- cleaned up some error messages; "out of space" now means
- malloc returned NULL in all cases.
-
- changed rehash so that if it runs out, it just returns;
- things will continue to run slow, but maybe a bit longer.
-
-Nov 28, 1992:
- deleted yyunput and yyoutput from proto.h;
- different versions of lex give these different declarations.
-
-Jul 23, 1993:
- cosmetic changes: increased sizes of some arrays,
- reworded some error messages.
-
- added CONVFMT as in posix (just replaced OFMT in getsval)
-
- FILENAME is now "" until the first thing that causes a file
- to be opened.
=awk.g.y
-/****************************************************************
-Copyright (C) AT&T 1993
-All Rights Reserved
-
-Permission to use, copy, modify, and distribute this software and
-its documentation for any purpose and without fee is hereby
-granted, provided that the above copyright notice appear in all
-copies and that both that the copyright notice and this
-permission notice and warranty disclaimer appear in supporting
-documentation, and that the name of AT&T or any of its entities
-not be used in advertising or publicity pertaining to
-distribution of the software without specific, written prior
-permission.
-
-AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
-INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
-IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
-SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
-IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
-ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
-THIS SOFTWARE.
-****************************************************************/
-
-%{
-#include <stdio.h>
-#include "awk.h"
-yywrap(void) { return(1); }
-
-Node *beginloc = 0;
-Node *endloc = 0;
-int infunc = 0; /* = 1 if in arglist or body of func */
-int inloop = 0; /* = 1 if in while, for, do */
-uchar *curfname = 0; /* current function name */
-Node *arglist = 0; /* list of args for current function */
-%}
-
-%union {
- Node *p;
- Cell *cp;
- int i;
- uchar *s;
-}
-
-%token <i> FIRSTTOKEN /* must be first */
-%token <p> PROGRAM PASTAT PASTAT2 XBEGIN XEND
-%token <i> NL ',' '{' '(' '|' ';' '/' ')' '}' '[' ']'
-%token <i> ARRAY
-%token <i> MATCH NOTMATCH MATCHOP
-%token <i> FINAL DOT ALL CCL NCCL CHAR OR STAR QUEST PLUS
-%token <i> AND BOR APPEND EQ GE GT LE LT NE IN
-%token <i> ARG BLTIN BREAK CLOSE CONTINUE DELETE DO EXIT FOR FUNC
-%token <i> SUB GSUB IF INDEX LSUBSTR MATCHFCN NEXT
-%token <i> ADD MINUS MULT DIVIDE MOD
-%token <i> ASSIGN ASGNOP ADDEQ SUBEQ MULTEQ DIVEQ MODEQ POWEQ
-%token <i> PRINT PRINTF SPRINTF
-%token <p> ELSE INTEST CONDEXPR
-%token <i> POSTINCR PREINCR POSTDECR PREDECR
-%token <cp> VAR IVAR VARNF CALL NUMBER STRING FIELD
-%token <s> REGEXPR
-
-%type <p> pas pattern ppattern plist pplist patlist prarg term re
-%type <p> pa_pat pa_stat pa_stats
-%type <s> reg_expr
-%type <p> simple_stmt opt_simple_stmt stmt stmtlist
-%type <p> var varname funcname varlist
-%type <p> for if while
-%type <i> pst opt_pst lbrace rparen comma nl opt_nl and bor
-%type <i> subop print
-
-%right ASGNOP
-%right '?'
-%right ':'
-%left BOR
-%left AND
-%left GETLINE
-%nonassoc APPEND EQ GE GT LE LT NE MATCHOP IN '|'
-%left ARG BLTIN BREAK CALL CLOSE CONTINUE DELETE DO EXIT FOR FIELD FUNC
-%left GSUB IF INDEX LSUBSTR MATCHFCN NEXT NUMBER
-%left PRINT PRINTF RETURN SPLIT SPRINTF STRING SUB SUBSTR
-%left REGEXPR VAR VARNF IVAR WHILE '('
-%left CAT
-%left '+' '-'
-%left '*' '/' '%'
-%left NOT UMINUS
-%right POWER
-%right DECR INCR
-%left INDIRECT
-%token LASTTOKEN /* must be last */
-
-%%
-
-program:
- pas { if (errorflag==0)
- winner = (Node *)stat3(PROGRAM, beginloc, $1, endloc); }
- | error { yyclearin; bracecheck(); ERROR "bailing out" SYNTAX; }
- ;
-
-and:
- AND | and NL
- ;
-
-bor:
- BOR | bor NL
- ;
-
-comma:
- ',' | comma NL
- ;
-
-do:
- DO | do NL
- ;
-
-else:
- ELSE | else NL
- ;
-
-for:
- FOR '(' opt_simple_stmt ';' opt_nl pattern ';' opt_nl opt_simple_stmt rparen {inloop++;} stmt
- { --inloop; $$ = stat4(FOR, $3, notnull($6), $9, $12); }
- | FOR '(' opt_simple_stmt ';' ';' opt_nl opt_simple_stmt rparen {inloop++;} stmt
- { --inloop; $$ = stat4(FOR, $3, NIL, $7, $10); }
- | FOR '(' varname IN varname rparen {inloop++;} stmt
- { --inloop; $$ = stat3(IN, $3, makearr($5), $8); }
- ;
-
-funcname:
- VAR { setfname($1); }
- | CALL { setfname($1); }
- ;
-
-if:
- IF '(' pattern rparen { $$ = notnull($3); }
- ;
-
-lbrace:
- '{' | lbrace NL
- ;
-
-nl:
- NL | nl NL
- ;
-
-opt_nl:
- /* empty */ { $$ = 0; }
- | nl
- ;
-
-opt_pst:
- /* empty */ { $$ = 0; }
- | pst
- ;
-
-
-opt_simple_stmt:
- /* empty */ { $$ = 0; }
- | simple_stmt
- ;
-
-pas:
- opt_pst { $$ = 0; }
- | opt_pst pa_stats opt_pst { $$ = $2; }
- ;
-
-pa_pat:
- pattern { $$ = notnull($1); }
- ;
-
-pa_stat:
- pa_pat { $$ = stat2(PASTAT, $1, stat2(PRINT, rectonode(), NIL)); }
- | pa_pat lbrace stmtlist '}' { $$ = stat2(PASTAT, $1, $3); }
- | pa_pat ',' pa_pat { $$ = pa2stat($1, $3, stat2(PRINT, rectonode(), NIL)); }
- | pa_pat ',' pa_pat lbrace stmtlist '}' { $$ = pa2stat($1, $3, $5); }
- | lbrace stmtlist '}' { $$ = stat2(PASTAT, NIL, $2); }
- | XBEGIN lbrace stmtlist '}'
- { beginloc = linkum(beginloc, $3); $$ = 0; }
- | XEND lbrace stmtlist '}'
- { endloc = linkum(endloc, $3); $$ = 0; }
- | FUNC funcname '(' varlist rparen {infunc++;} lbrace stmtlist '}'
- { infunc--; curfname=0; defn((Cell *)$2, $4, $8); $$ = 0; }
- ;
-
-pa_stats:
- pa_stat
- | pa_stats opt_pst pa_stat { $$ = linkum($1, $3); }
- ;
-
-patlist:
- pattern
- | patlist comma pattern { $$ = linkum($1, $3); }
- ;
-
-ppattern:
- var ASGNOP ppattern { $$ = op2($2, $1, $3); }
- | ppattern '?' ppattern ':' ppattern %prec '?'
- { $$ = op3(CONDEXPR, notnull($1), $3, $5); }
- | ppattern bor ppattern %prec BOR
- { $$ = op2(BOR, notnull($1), notnull($3)); }
- | ppattern and ppattern %prec AND
- { $$ = op2(AND, notnull($1), notnull($3)); }
- | ppattern MATCHOP reg_expr { $$ = op3($2, NIL, $1, (Node*)makedfa($3, 0)); }
- | ppattern MATCHOP ppattern
- { if (constnode($3))
- $$ = op3($2, NIL, $1, (Node*)makedfa(strnode($3), 0));
- else
- $$ = op3($2, (Node *)1, $1, $3); }
- | ppattern IN varname { $$ = op2(INTEST, $1, makearr($3)); }
- | '(' plist ')' IN varname { $$ = op2(INTEST, $2, makearr($5)); }
- | ppattern term %prec CAT { $$ = op2(CAT, $1, $2); }
- | re
- | term
- ;
-
-pattern:
- var ASGNOP pattern { $$ = op2($2, $1, $3); }
- | pattern '?' pattern ':' pattern %prec '?'
- { $$ = op3(CONDEXPR, notnull($1), $3, $5); }
- | pattern bor pattern %prec BOR
- { $$ = op2(BOR, notnull($1), notnull($3)); }
- | pattern and pattern %prec AND
- { $$ = op2(AND, notnull($1), notnull($3)); }
- | pattern EQ pattern { $$ = op2($2, $1, $3); }
- | pattern GE pattern { $$ = op2($2, $1, $3); }
- | pattern GT pattern { $$ = op2($2, $1, $3); }
- | pattern LE pattern { $$ = op2($2, $1, $3); }
- | pattern LT pattern { $$ = op2($2, $1, $3); }
- | pattern NE pattern { $$ = op2($2, $1, $3); }
- | pattern MATCHOP reg_expr { $$ = op3($2, NIL, $1, (Node*)makedfa($3, 0)); }
- | pattern MATCHOP pattern
- { if (constnode($3))
- $$ = op3($2, NIL, $1, (Node*)makedfa(strnode($3), 0));
- else
- $$ = op3($2, (Node *)1, $1, $3); }
- | pattern IN varname { $$ = op2(INTEST, $1, makearr($3)); }
- | '(' plist ')' IN varname { $$ = op2(INTEST, $2, makearr($5)); }
- | pattern '|' GETLINE var { $$ = op3(GETLINE, $4, (Node*)$2, $1); }
- | pattern '|' GETLINE { $$ = op3(GETLINE, (Node*)0, (Node*)$2, $1); }
- | pattern term %prec CAT { $$ = op2(CAT, $1, $2); }
- | re
- | term
- ;
-
-plist:
- pattern comma pattern { $$ = linkum($1, $3); }
- | plist comma pattern { $$ = linkum($1, $3); }
- ;
-
-pplist:
- ppattern
- | pplist comma ppattern { $$ = linkum($1, $3); }
- ;
-
-prarg:
- /* empty */ { $$ = rectonode(); }
- | pplist
- | '(' plist ')' { $$ = $2; }
- ;
-
-print:
- PRINT | PRINTF
- ;
-
-pst:
- NL | ';' | pst NL | pst ';'
- ;
-
-rbrace:
- '}' | rbrace NL
- ;
-
-re:
- reg_expr
- { $$ = op3(MATCH, NIL, rectonode(), (Node*)makedfa($1, 0)); }
- | NOT re { $$ = op1(NOT, notnull($2)); }
- ;
-
-reg_expr:
- '/' {startreg();} REGEXPR '/' { $$ = $3; }
- ;
-
-rparen:
- ')' | rparen NL
- ;
-
-simple_stmt:
- print prarg '|' term { $$ = stat3($1, $2, (Node *) $3, $4); }
- | print prarg APPEND term { $$ = stat3($1, $2, (Node *) $3, $4); }
- | print prarg GT term { $$ = stat3($1, $2, (Node *) $3, $4); }
- | print prarg { $$ = stat3($1, $2, NIL, NIL); }
- | DELETE varname '[' patlist ']' { $$ = stat2(DELETE, makearr($2), $4); }
- | DELETE varname { yyclearin; ERROR "you can only delete array[element]" SYNTAX; $$ = stat1(DELETE, $2); }
- | pattern { $$ = exptostat($1); }
- | error { yyclearin; ERROR "illegal statement" SYNTAX; }
- ;
-
-st:
- nl | ';' opt_nl
- ;
-
-stmt:
- BREAK st { if (!inloop) ERROR "break illegal outside of loops" SYNTAX;
- $$ = stat1(BREAK, NIL); }
- | CLOSE pattern st { $$ = stat1(CLOSE, $2); }
- | CONTINUE st { if (!inloop) ERROR "continue illegal outside of loops" SYNTAX;
- $$ = stat1(CONTINUE, NIL); }
- | do {inloop++;} stmt {--inloop;} WHILE '(' pattern ')' st
- { $$ = stat2(DO, $3, notnull($7)); }
- | EXIT pattern st { $$ = stat1(EXIT, $2); }
- | EXIT st { $$ = stat1(EXIT, NIL); }
- | for
- | if stmt else stmt { $$ = stat3(IF, $1, $2, $4); }
- | if stmt { $$ = stat3(IF, $1, $2, NIL); }
- | lbrace stmtlist rbrace { $$ = $2; }
- | NEXT st { if (infunc)
- ERROR "next is illegal inside a function" SYNTAX;
- $$ = stat1(NEXT, NIL); }
- | RETURN pattern st { $$ = stat1(RETURN, $2); }
- | RETURN st { $$ = stat1(RETURN, NIL); }
- | simple_stmt st
- | while {inloop++;} stmt { --inloop; $$ = stat2(WHILE, $1, $3); }
- | ';' opt_nl { $$ = 0; }
- ;
-
-stmtlist:
- stmt
- | stmtlist stmt { $$ = linkum($1, $2); }
- ;
-
-subop:
- SUB | GSUB
- ;
-
-term:
- term '+' term { $$ = op2(ADD, $1, $3); }
- | term '-' term { $$ = op2(MINUS, $1, $3); }
- | term '*' term { $$ = op2(MULT, $1, $3); }
- | term '/' term { $$ = op2(DIVIDE, $1, $3); }
- | term '%' term { $$ = op2(MOD, $1, $3); }
- | term POWER term { $$ = op2(POWER, $1, $3); }
- | '-' term %prec UMINUS { $$ = op1(UMINUS, $2); }
- | '+' term %prec UMINUS { $$ = $2; }
- | NOT term %prec UMINUS { $$ = op1(NOT, notnull($2)); }
- | BLTIN '(' ')' { $$ = op2(BLTIN, (Node *) $1, rectonode()); }
- | BLTIN '(' patlist ')' { $$ = op2(BLTIN, (Node *) $1, $3); }
- | BLTIN { $$ = op2(BLTIN, (Node *) $1, rectonode()); }
- | CALL '(' ')' { $$ = op2(CALL, valtonode($1,CVAR), NIL); }
- | CALL '(' patlist ')' { $$ = op2(CALL, valtonode($1,CVAR), $3); }
- | DECR var { $$ = op1(PREDECR, $2); }
- | INCR var { $$ = op1(PREINCR, $2); }
- | var DECR { $$ = op1(POSTDECR, $1); }
- | var INCR { $$ = op1(POSTINCR, $1); }
- | GETLINE var LT term { $$ = op3(GETLINE, $2, (Node *)$3, $4); }
- | GETLINE LT term { $$ = op3(GETLINE, NIL, (Node *)$2, $3); }
- | GETLINE var { $$ = op3(GETLINE, $2, NIL, NIL); }
- | GETLINE { $$ = op3(GETLINE, NIL, NIL, NIL); }
- | INDEX '(' pattern comma pattern ')'
- { $$ = op2(INDEX, $3, $5); }
- | INDEX '(' pattern comma reg_expr ')'
- { ERROR "index() doesn't permit regular expressions" SYNTAX;
- $$ = op2(INDEX, $3, (Node*)$5); }
- | '(' pattern ')' { $$ = $2; }
- | MATCHFCN '(' pattern comma reg_expr ')'
- { $$ = op3(MATCHFCN, NIL, $3, (Node*)makedfa($5, 1)); }
- | MATCHFCN '(' pattern comma pattern ')'
- { if (constnode($5))
- $$ = op3(MATCHFCN, NIL, $3, (Node*)makedfa(strnode($5), 1));
- else
- $$ = op3(MATCHFCN, (Node *)1, $3, $5); }
- | NUMBER { $$ = valtonode($1, CCON); }
- | SPLIT '(' pattern comma varname comma pattern ')' /* string */
- { $$ = op4(SPLIT, $3, makearr($5), $7, (Node*)STRING); }
- | SPLIT '(' pattern comma varname comma reg_expr ')' /* const /regexp/ */
- { $$ = op4(SPLIT, $3, makearr($5), (Node*)makedfa($7, 1), (Node *)REGEXPR); }
- | SPLIT '(' pattern comma varname ')'
- { $$ = op4(SPLIT, $3, makearr($5), NIL, (Node*)STRING); } /* default */
- | SPRINTF '(' patlist ')' { $$ = op1($1, $3); }
- | STRING { $$ = valtonode($1, CCON); }
- | subop '(' reg_expr comma pattern ')'
- { $$ = op4($1, NIL, (Node*)makedfa($3, 1), $5, rectonode()); }
- | subop '(' pattern comma pattern ')'
- { if (constnode($3))
- $$ = op4($1, NIL, (Node*)makedfa(strnode($3), 1), $5, rectonode());
- else
- $$ = op4($1, (Node *)1, $3, $5, rectonode()); }
- | subop '(' reg_expr comma pattern comma var ')'
- { $$ = op4($1, NIL, (Node*)makedfa($3, 1), $5, $7); }
- | subop '(' pattern comma pattern comma var ')'
- { if (constnode($3))
- $$ = op4($1, NIL, (Node*)makedfa(strnode($3), 1), $5, $7);
- else
- $$ = op4($1, (Node *)1, $3, $5, $7); }
- | SUBSTR '(' pattern comma pattern comma pattern ')'
- { $$ = op3(SUBSTR, $3, $5, $7); }
- | SUBSTR '(' pattern comma pattern ')'
- { $$ = op3(SUBSTR, $3, $5, NIL); }
- | var
- ;
-
-var:
- varname
- | varname '[' patlist ']' { $$ = op2(ARRAY, makearr($1), $3); }
- | FIELD { $$ = valtonode($1, CFLD); }
- | IVAR { $$ = op1(INDIRECT, valtonode($1, CVAR)); }
- | INDIRECT term { $$ = op1(INDIRECT, $2); }
- ;
-
-varlist:
- /* nothing */ { arglist = $$ = 0; }
- | VAR { arglist = $$ = valtonode($1,CVAR); }
- | varlist comma VAR { arglist = $$ = linkum($1,valtonode($3,CVAR)); }
- ;
-
-varname:
- VAR { $$ = valtonode($1, CVAR); }
- | ARG { $$ = op1(ARG, (Node *) $1); }
- | VARNF { $$ = op1(VARNF, (Node *) $1); }
- ;
-
-
-while:
- WHILE '(' pattern rparen { $$ = notnull($3); }
- ;
-
-%%
-
-void setfname(Cell *p)
-{
- if (isarr(p))
- ERROR "%s is an array, not a function", p->nval SYNTAX;
- else if (isfunc(p))
- ERROR "you can't define function %s more than once", p->nval SYNTAX;
- curfname = p->nval;
-}
-
-constnode(Node *p)
-{
- return isvalue(p) && ((Cell *) (p->narg[0]))->csub == CCON;
-}
-
-uchar *strnode(Node *p)
-{
- return ((Cell *)(p->narg[0]))->sval;
-}
-
-Node *notnull(Node *n)
-{
- switch (n->nobj) {
- case LE: case LT: case EQ: case NE: case GT: case GE:
- case BOR: case AND: case NOT:
- return n;
- default:
- return op2(NE, n, nullnode);
- }
-}
=awk.1
-.TH AWK 1
-.CT 1 files prog_other
-.SH NAME
-awk \- pattern-directed scanning and processing language
-.SH SYNOPSIS
-.B awk
-[
-.BI -F
-.I fs
-]
-[
-.BI -v
-.I var=value
-]
-[
-.I 'prog'
-|
-.BI -f
-.I progfile
-]
-[
-.I file ...
-]
-.SH DESCRIPTION
-.I Awk
-scans each input
-.I file
-for lines that match any of a set of patterns specified literally in
-.IR prog
-or in one or more files
-specified as
-.B -f
-.IR progfile .
-With each pattern
-there can be an associated action that will be performed
-when a line of a
-.I file
-matches the pattern.
-Each line is matched against the
-pattern portion of every pattern-action statement;
-the associated action is performed for each matched pattern.
-The file name
-.L -
-means the standard input.
-Any
-.IR file
-of the form
-.I var=value
-is treated as an assignment, not a filename,
-and is executed at the time it would have been opened if it were a filename.
-The option
-.B -v
-followed by
-.I var=value
-is an assignment to be done before
-.I prog
-is executed;
-any number of
-.B -v
-options may be present.
-The
-.B -F
-.IR fs
-option defines the input field separator to be the regular expression
-.IR fs .
-.PP
-An input line is normally made up of fields separated by white space.
-(This default can be changed by using the FS built-in variable or the
-.B -F
-.IR fs
-option.)
-The fields are denoted
-.BR $1 ,
-.BR $2 ,
-\&..., while
-.B $0
-refers to the entire line.
-.PP
-A pattern-action statement has the form
-.IP
-.IB pattern " { " action " }
-.PP
-A missing
-.BI { " action " }
-means print the line;
-a missing pattern always matches.
-Pattern-action statements are separated by newlines or semicolons.
-.PP
-An action is a sequence of statements.
-A statement can be one of the following:
-.PP
-.EX
-.ta \w'\f(CWdelete array[expression]'u
-.RS
-.nf
-if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
-while(\fI expression \fP)\fI statement\fP
-for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
-for(\fI var \fPin\fI array \fP)\fI statement\fP
-do\fI statement \fPwhile(\fI expression \fP)
-break
-continue
-{\fR [\fP\fI statement ... \fP\fR] \fP}
-\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
-print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
-printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
-return\fR [ \fP\fIexpression \fP\fR]\fP
-next #\fR skip remaining patterns on this input line\fP
-delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
-exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
-.fi
-.RE
-.EE
-.DT
-.PP
-Statements are terminated by
-semicolons, newlines or right braces.
-An empty
-.I expression-list
-stands for
-.BR $0 .
-String constants are quoted \&\f(CW"\ "\fR,
-with the usual C escapes recognized within.
-Expressions take on string or numeric values as appropriate,
-and are built using the operators
-.B + - * / % ^
-(exponentiation), and concatenation (indicated by a blank).
-The operators
-.B
-! ++ -- += -= *= /= %= ^= > >= < <= == != ?:
-are also available in expressions.
-Variables may be scalars, array elements
-(denoted
-.IB x [ i ] )
-or fields.
-Variables are initialized to the null string.
-Array subscripts may be any string,
-not necessarily numeric;
-this allows for a form of associative memory.
-Multiple subscripts such as
-.B [i,j,k]
-are permitted; the constituents are concatenated,
-separated by the value of
-.BR SUBSEP .
-.PP
-The
-.B print
-statement prints its arguments on the standard output
-(or on a file if
-.BI > file
-or
-.BI >> file
-is present or on a pipe if
-.BI | cmd
-is present), separated by the current output field separator,
-and terminated by the output record separator.
-.I file
-and
-.I cmd
-may be literal names or parenthesized expressions;
-identical string values in different statements denote
-the same open file.
-The
-.B printf
-statement formats its expression list according to the format
-(see
-.IR printf (3)) .
-The built-in function
-.BI close( expr )
-closes the file or pipe
-.IR expr .
-.PP
-The mathematical functions
-.BR exp ,
-.BR log ,
-.BR sqrt ,
-.BR sin ,
-.BR cos ,
-and
-.BR atan2
-are built in.
-Other built-in functions:
-.TF length
-.TP
-.B length
-the length of its argument
-taken as a string,
-or of
-.B $0
-if no argument.
-.TP
-.B rand
-random number on (0,1)
-.TP
-.B srand
-sets seed for
-.B rand
-and returns the previous seed.
-.TP
-.B int
-truncates to an integer value
-.TP
-.BI substr( s , " m" , " n\fB)
-the
-.IR n -character
-substring of
-.I s
-that begins at position
-.IR m ,
-counted from 1.
-If
-.I n
-is omitted, the substring extends to the end of
-.IR s .
-.TP
-.BI index( s , " t" )
-the position in
-.I s
-where the string
-.I t
-first occurs, or 0 if it does not.
-.TP
-.BI match( s , " r" )
-the position in
-.I s
-where the regular expression
-.I r
-first occurs, or 0 if it does not.
-The variables
-.B RSTART
-and
-.B RLENGTH
-are set to the position and length of the matched string.
-.TP
-.BI split( s , " a" , " fs\fB)
-splits the string
-.I s
-into array elements
-.IB a [1] ,
-.IB a [2] ,
-\&...,
-.IB a [ n ] ,
-and returns
-.IR n .
-The separation is done with the regular expression
-.I fs
-or with the field separator
-.B FS
-if
-.I fs
-is not given.
-.TP
-.BI sub( r , " t" , " s\fB)
-substitutes
-.I t
-for the first occurrence of the regular expression
-.I r
-in the string
-.IR s .
-If
-.I s
-is not given,
-.B $0
-is used.
-.TP
-.B gsub
-same as
-.B sub
-except that all occurrences of the regular expression
-are replaced;
-.B sub
-and
-.B gsub
-return the number of replacements.
-.TP
-.BI sprintf( fmt , " expr" , " ...\fB )
-the string resulting from formatting
-.I expr ...
-according to the
-.IR printf (3)
-format
-.I fmt
-.TP
-.BI system( cmd )
-executes
-.I cmd
-and returns its exit status
-.PD
-.PP
-The ``function''
-.B getline
-sets
-.B $0
-to the next input record from the current input file;
-.B getline
-.BI < file
-sets
-.B $0
-to the next record from
-.IR file .
-.B getline
-.I x
-sets variable
-.I x
-instead.
-Finally,
-.IB cmd " | getline
-pipes the output of
-.I cmd
-into
-.BR getline ;
-each call of
-.B getline
-returns the next line of output from
-.IR cmd .
-In all cases,
-.B getline
-returns 1 for a successful input,
-0 for end of file, and \-1 for an error.
-.PP
-Patterns are arbitrary Boolean combinations
-(with
-.BR "! || &&" )
-of regular expressions and
-relational expressions.
-Regular expressions are as in
-.IR egrep ;
-see
-.IR grep (1).
-Isolated regular expressions
-in a pattern apply to the entire line.
-Regular expressions may also occur in
-relational expressions, using the operators
-.BR ~
-and
-.BR !~ .
-.BI / re /
-is a constant regular expression;
-any string (constant or variable) may be used
-as a regular expression, except in the position of an isolated regular expression
-in a pattern.
-.PP
-A pattern may consist of two patterns separated by a comma;
-in this case, the action is performed for all lines
-from an occurrence of the first pattern
-through an occurrence of the second.
-.PP
-A relational expression is one of the following:
-.IP
-.I expression matchop regular-expression
-.br
-.I expression relop expression
-.br
-.IB expression " in " array-name
-.br
-.BI ( expr , expr,... ") in " array-name
-.PP
-where a relop is any of the six relational operators in C,
-and a matchop is either
-.B ~
-(matches)
-or
-.B !~
-(does not match).
-A conditional is an arithmetic expression,
-a relational expression,
-or a Boolean combination
-of these.
-.PP
-The special patterns
-.B BEGIN
-and
-.B END
-may be used to capture control before the first input line is read
-and after the last.
-.B BEGIN
-and
-.B END
-do not combine with other patterns.
-.PP
-Variable names with special meanings:
-.TF FILENAME
-.TP
-.B FS
-regular expression used to separate fields; also settable
-by option
-.BI -F fs .
-.TP
-.BR NF
-number of fields in the current record
-.TP
-.B NR
-ordinal number of the current record
-.TP
-.B FNR
-ordinal number of the current record in the current file
-.TP
-.B FILENAME
-the name of the current input file
-.TP
-.B RS
-input record separator (default newline)
-.TP
-.B OFS
-output field separator (default blank)
-.TP
-.B ORS
-output record separator (default newline)
-.TP
-.B OFMT
-output format for numbers (default
-.BR "%.6g" )
-.TP
-.B SUBSEP
-separates multiple subscripts (default octal 034)
-.TP
-.B ARGC
-argument count, assignable
-.TP
-.B ARGV
-argument array, assignable;
-non-null members are taken as filenames
-.TP
-.B ENVIRON
-array of environment variables; subscripts are names.
-.PD
-.PP
-Functions may be defined (at the position of a pattern-action statement) thus:
-.IP
-.L
-function foo(a, b, c) { ...; return x }
-.PP
-Parameters are passed by value if scalar and by reference if array name;
-functions may be called recursively.
-Parameters are local to the function; all other variables are global.
-Thus local variables may be created by providing excess parameters in
-the function definition.
-.SH EXAMPLES
-.TP
-.L
-length > 72
-Print lines longer than 72 characters.
-.TP
-.L
-{ print $2, $1 }
-Print first two fields in opposite order.
-.PP
-.EX
-BEGIN { FS = ",[ \et]*|[ \et]+" }
- { print $2, $1 }
-.EE
-.ns
-.IP
-Same, with input fields separated by comma and/or blanks and tabs.
-.PP
-.EX
-.nf
- { s += $1 }
-END { print "sum is", s, "average is", s/NR }
-.fi
-.EE
-.ns
-.IP
-Add up first column, print sum and average.
-.TP
-.L
-/start/, /stop/
-Print all lines between start/stop pairs.
-.PP
-.EX
-.nf
-BEGIN { # Simulate echo(1)
- for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
- printf "\en"
- exit }
-.fi
-.EE
-.SH SEE ALSO
-.IR lex (1),
-.IR sed (1)
-.br
-A. V. Aho, B. W. Kernighan, P. J. Weinberger,
-.I
-The AWK Programming Language,
-Addison-Wesley, 1988.
-.SH BUGS
-There are no explicit conversions between numbers and strings.
-To force an expression to be treated as a number add 0 to it;
-to force it to be treated as a string concatenate
-\&\f(CW""\fP to it.
-.br
-The scope rules for variables in functions are a botch;
-the syntax is worse.
-
+ END OF ARCHIVE