"Ed Morton" <morto...@gmail.com> wrote in message
news:k52o4b$u52$1...@dont-email.me...
> In every situation I can think of off the top of my head (but I admittedly
> haven't spent a lot of time thinking about it), doing that would make your
> code unnecessarily cryptic. Could you give an example of when it would be
> desirable?
As part of an assembler I wrote in TAWK, I did this (assigning regular
expression literals to variables) to define what label variants should look like:
======================================================
# source language: Thompson AWK 4.0
# first created: 03/08/03
# last revision: 06/09/12
# public function prefix: "SYM"
# ----------------------------
# public constants
# for ease of consistent definition, all symbol/label patterns
# are defined here (even if not used in this file)
# base label form (default; most others add prefixes/suffixes to this)
global SYMglobal = /^[_A-Z](\.?[_A-Z0-9])*$/i
# label types differ based on their initial and final characters
global SYMlocLabel = /^@/ # local label
local varLabel = /^]/ # variable label
local glbLabel = /^[_A-Z]/ # global label
local autLabel = /^[bfl]/ # auto label (internal form)
# "user label" - all the forms available to the user (in label field)
local userLabel = /^[]@]?[_A-Z](\.?[_A-Z0-9])*\$?:?$/i
# branch target "auto-labels"
# - automatically replaced by assembler-generated labels
# branch target (in label field)
local branchLabel = /^-\+?$|^\+-?$/
# branch target references (in expression field)
local fwdtargetRef = /\++/
local baktargetRef = /-+/
# user labels recognized in expressions
# suffixes:
# - b = branch target (decorated)
# - g = global
# - l = local
# - Ub = branch target (undecorated = complete expression)
# - v = variable
global SYMnumLabel_glv = /^[]@]?[_A-Z](\.?[_A-Z0-9])*\:?/i
global SYMnumLabel_b = /^:(\++|-+)/
global SYMnumLabel_Ub = /^(\++|-+)$/
global SYMstrLabel_glv = /^[]@]?_?[A-Z]([_\.]?[A-Z0-9])*_?\$:?/i
# macro text formal argument name pattern
# - replaced by the text of its actual argument during macro expansion
# suffix:
# - e = embedded match (within longer string)
global SYMtxtArg = /^\?[_A-Z](\.?[_A-Z0-9])*$/i
global SYMtxtArg_e = /\?[_A-Z](\.?[_A-Z0-9])*/i
# macro label formal argument name
# - assigned the value of its actual argument during macro expansion
global SYMlblArg = /^[]@][_A-Z](\.?[_A-Z0-9])*\$?$/i
=================================================
I can change - and HAVE changed, several times - the REs defined here. Usually
the purpose was to make the patterns less restrictive, eliminating needless
limitations. The point is that I make each change once, here, and the effect
ripples out to wherever these variables are used. I don't have to worry about
missing some use of an RE literal in another file.
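For readers without TAWK: Python has no regex literals, but a precompiled `re.Pattern` bound to a module-level name plays the same role. The sketch below assumes Python 3.9+. The names `SYM_GLOBAL`, `SYM_LOC_LABEL` and `SYM_TXT_ARG_E` and the three helper functions are hypothetical stand-ins for `SYMglobal`, `SYMlocLabel` and `SYMtxtArg_e`. It shows the same arrangement: each pattern is defined once, and every caller goes through the name, so changing a definition changes every use.

```python
import re

# Central table of label patterns, defined once. This is a Python analog of
# the TAWK "public constants" above; all names here are hypothetical.
SYM_GLOBAL = re.compile(r"^[_A-Z](\.?[_A-Z0-9])*$", re.IGNORECASE)
SYM_LOC_LABEL = re.compile(r"^@")
SYM_TXT_ARG_E = re.compile(r"\?[_A-Z](\.?[_A-Z0-9])*", re.IGNORECASE)


def is_global_label(name: str) -> bool:
    """True if name has the base label form (letters, digits, _, single dots)."""
    return SYM_GLOBAL.match(name) is not None


def is_local_label(name: str) -> bool:
    """True if name starts with the local-label prefix '@'."""
    return SYM_LOC_LABEL.match(name) is not None


def macro_args_in(text: str) -> list[str]:
    """Return every embedded macro text argument (e.g. '?count') in text."""
    return [m.group(0) for m in SYM_TXT_ARG_E.finditer(text)]


print(is_global_label("loop.start"))              # True
print(is_global_label("loop..x"))                 # False: two dots in a row
print(macro_args_in("lda ?count+1, ?base.lo"))    # ['?count', '?base.lo']
```

Loosening a rule then means editing a single `re.compile` line. No caller needs to change.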
In point of fact, these are only the definitions in the current released
version of the assembler. The development version has slightly different
definitions yet again.
And from a different source file of that assembler:
==========================================================
# field separators
global CKfieldSep = "," # default (could be changed, though not so far)
local fsStopChar
# character, string and regular expression literal operand patterns
# - basic pattern is delimiter, (1) any single char except delimiter or
# backslash or (2) backslash plus next char as a pair, delimiter
global CKcharLitToken = /^'([^'\\]|\\.)+'/
global CKstrLitToken = /^"([^"\\]|\\.)*"/
global CKregexLitToken = /^\/([^/\\]|\\.)+\/i?/
# char and string literal escape codes
local allEscPat = /\\(.|(\$|0?X)[0-9A-F]+|([0-9]|0[A-F])[0-9A-F]*H)/i
{ ....and a bit later.... }
INIT {
# field split scan stop characters
fsStopChar[ "," ] = /[,"'\\\(\/]/
fsStopChar[ "=" ] = /[="'\\]/
fsStopChar[ ":" ] = /[:"'\\]/
}
{ ..okay, even I'm not daring enough to try to statically initialize an
array element :) ...}
{...and a bit later the above are used in this function (note especially the
use of 'stopch' to hold different REs): }
# split string into fields
# - but must not split where separator is escaped or within literal
# - or, if separator is ",", within possible function call
global function CKsplitfield(str, result, fsep) {
local i, j
local ch
local stopch, inparen
local fcount, fval
local field
# no split char -> no split processing
# - still, we make sure result is definitely a string ('0' is ambiguous)
if ( !index(str, fsep) ) {
result[ 1 ] = str ""
return( 1 )
}
# check if split necessary
i = j = 1
inparen = fcount = 0
stopch = fsStopChar[ fsep ]
while ( match(str, stopch, j) ) {
# assume we're going to restart just after this stop char
j = RSTART + 1
# field separator ?
if ( (ch = substr(str, RSTART, 1)) == fsep ) {
field[ ++fcount ] = substr( str, i, RSTART - i )
i = j
continue
}
# open parenthesis ?
# - removes field split char from stops, adds close parenthesis
if ( ch == "(" ) {
inparen++
stopch = /["'\\\(\/\)]/
continue
}
# close parenthesis ?
if ( ch == ")" ) {
if ( --inparen == 0 )
stopch = fsStopChar[ fsep ]
continue
}
# must be double quote, single quote, slash or escape char
# - if we can match a pattern, skip to its end
if ( ch == "\"" )
match( str, CKstrLitToken, RSTART )
else if ( ch == "'" )
match( str, CKcharLitToken, RSTART )
else if ( ch == "/" )
match( str, CKregexLitToken, RSTART )
else
match( str, allEscPat, RSTART )
# if found match to pattern, restart after it
# - if no match there's a good chance it's an error,
# but we'll restart just after the stop char anyway
if ( RSTART )
j = RSTART + RLENGTH
}
# unconditionally take everything left
# - if i == 1 here, then any split char was escaped or within literal
# - does this happen often enough to be worth checking for ?
# - if i > length(str), then the last char in str must have been
# a field separator
# - which is an error we want to catch anyway !
field[ ++fcount ] = substr( str, i )
# eliminate leading and trailing whitespace and save result
i = fcount
do {
fval = field[ i ]
if ( !match(fval, /[^ \t]/) ) {
UMerror( "BadField", "#" i "> <" str )
fcount = 0
}
else {
if ( RSTART > 1 )
fval = substr( fval, RSTART )
if ( match(fval, /[ \t]+$/) )
fval = substr( fval, 1, RSTART-1 )
result[ i ] = fval
}
} while ( --i )
return( fcount )
}
========================================
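For anyone who wants to try the splitting logic outside TAWK, here is a minimal Python port of `CKsplitfield`. It is a sketch under a few assumptions, and all names in it (`split_field`, `FS_STOP`, `IN_PAREN_STOP`, `LITERALS`) are hypothetical.

* It requires Python 3.9+.
* The leading `^` is dropped from the token patterns, because `Pattern.match(s, pos)` already anchors at `pos`.
* The escape-pattern alternatives are reordered. Python takes the first alternative that matches, whereas POSIX awk regexes take the longest match.
* Where the original calls `UMerror` and returns 0, this sketch raises `ValueError`.

As in the original, the variable `stop` holds different compiled patterns depending on whether the scan is inside parentheses.

```python
import re

# Ports of CKcharLitToken, CKstrLitToken, CKregexLitToken and allEscPat.
CHAR_LIT = re.compile(r"'([^'\\]|\\.)+'")
STR_LIT = re.compile(r'"([^"\\]|\\.)*"')
REGEX_LIT = re.compile(r"/([^/\\]|\\.)+/i?")
# Longer escape forms come first, because Python takes the first matching alternative.
ESCAPE = re.compile(r"\\((\$|0?X)[0-9A-F]+|([0-9]|0[A-F])[0-9A-F]*H|.)", re.IGNORECASE)

# Stop characters per separator (port of fsStopChar). Only ',', '=' and ':'
# are supported. Inside parentheses the separator is dropped and ')' is added.
FS_STOP = {
    ",": re.compile(r"""[,"'\\(/]"""),
    "=": re.compile(r"""[="'\\]"""),
    ":": re.compile(r"""[:"'\\]"""),
}
IN_PAREN_STOP = re.compile(r"""["'\\(/)]""")

LITERALS = {'"': STR_LIT, "'": CHAR_LIT, "/": REGEX_LIT, "\\": ESCAPE}


def split_field(s: str, fsep: str) -> list[str]:
    """Split s on fsep, but not inside literals, after escapes, or in parens."""
    if fsep not in s:
        return [s]                        # no separator: no split, no trimming
    fields, start, pos, depth = [], 0, 0, 0
    stop = FS_STOP[fsep]
    while (m := stop.search(s, pos)):
        at = m.start()
        ch = s[at]
        pos = at + 1                      # default: resume just after the stop char
        if ch == fsep:
            fields.append(s[start:at])
            start = pos
        elif ch == "(":
            depth += 1
            stop = IN_PAREN_STOP          # separator no longer splits
        elif ch == ")":
            depth -= 1
            if depth == 0:
                stop = FS_STOP[fsep]
        else:
            lit = LITERALS[ch].match(s, at)   # skip a whole literal if one starts here
            if lit:
                pos = lit.end()
    fields.append(s[start:])              # unconditionally take everything left
    result = []
    for f in fields:
        f = f.strip(" \t")                # trim spaces and tabs, as the original does
        if not f:
            raise ValueError(f"empty field in {s!r}")
        result.append(f)
    return result


print(split_field('a, "x,y", c', ","))    # ['a', '"x,y"', 'c']
print(split_field("f(a,b), c", ","))      # ['f(a,b)', 'c']
```

A trailing separator (`"a,"`) yields an empty last field and therefore an error, matching the intent of the original's comment.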
So there are a couple of examples. HTH.
- Anton Treuenfels