On Sunday, March 14, 2021 at 7:52:12 PM UTC-5, Rock Brentwood wrote:
> The grammar specified by POSIX is actually a *regular* grammar that specifies a finite state transducer.
> Apologies if any of the columns spill over.
Correction: union UnionC { κ₁ } should read union '{' UnionC '}' { κ₁ }.
The list of tokens used by the spec:
token - "%token" or "%0"
left - "%left" or "%<"
right - "%right" or "%>"
binary - "%binary" or "%nonassoc" or "%2"
type - "%type"
prec - "%prec"
start - "%start"
union - "%union"
mark - "%%"
en - "%{"
de - "%}"
id - An alphanumeric identifier [a-zA-Z_.][a-zA-Z_.0-9]*
num - A numeral [0-9][0-9]*.
literal - A literal identifier: any valid C character enclosed in single quotes.
Multi-character literals may be allowed; I haven't read the POSIX standard closely on this point.
UnionC, CodeC, ProgramC, ActionC: C code snippets consisting of items in a format that would be accepted by the C preprocessor.
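As a rough sketch of how a scanner might map the spellings listed above onto token names: the enum values, T_NONE and the function names below are made up for illustration and are not taken from any yacc source. Letters are assumed to be contiguous, as in ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Token codes named after the list above; T_NONE is a hypothetical "no match". */
enum Token { T_NONE, T_TOKEN, T_LEFT, T_RIGHT, T_BINARY, T_TYPE, T_PREC,
             T_START, T_UNION, T_MARK, T_EN, T_DE };

/* Every spelling from the list, including the short aliases. */
static const struct { const char *text; enum Token tok; } directives[] = {
    { "%token", T_TOKEN },   { "%0", T_TOKEN },
    { "%left", T_LEFT },     { "%<", T_LEFT },
    { "%right", T_RIGHT },   { "%>", T_RIGHT },
    { "%binary", T_BINARY }, { "%nonassoc", T_BINARY }, { "%2", T_BINARY },
    { "%type", T_TYPE },     { "%prec", T_PREC },       { "%start", T_START },
    { "%union", T_UNION },   { "%%", T_MARK },
    { "%{", T_EN },          { "%}", T_DE },
};

/* Return the token for an exact directive spelling, or T_NONE. */
enum Token classify_directive(const char *s)
{
    for (size_t i = 0; i < sizeof directives / sizeof directives[0]; i++)
        if (strcmp(s, directives[i].text) == 0)
            return directives[i].tok;
    return T_NONE;
}

static int is_id_start(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '.';
}

static int is_digit(int c) { return c >= '0' && c <= '9'; }

/* id: [a-zA-Z_.][a-zA-Z_.0-9]*  Returns the match length at s, 0 if none. */
size_t match_id(const char *s)
{
    size_t n = 0;
    if (!is_id_start((unsigned char)s[0]))
        return 0;
    while (is_id_start((unsigned char)s[n]) || is_digit((unsigned char)s[n]))
        n++;
    return n;
}

/* num: [0-9][0-9]*  Returns the match length at s, 0 if none. */
size_t match_num(const char *s)
{
    size_t n = 0;
    while (is_digit((unsigned char)s[n]))
        n++;
    return n;
}
```

A table lookup keeps the aliases ("%0", "%<", "%>", "%2") next to their long forms, so adding or dropping a spelling is a one-line change.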
CodeC: has no "%}" in it, other than those appearing in comments or string literals or character literals.
ActionC, UnionC: cannot have unbalanced "{" and "}" characters since the curly brackets are being used to determine when the snippet starts and ends.
ProgramC: can have anything in it and consists of whatever appears after the "%%" mark up to the end of file.
ActionC: may contain the macros $$, $n, $<tag>$, $<tag>n, which are to be converted. Parentheses '(' and ')' cannot be unbalanced in ActionC.
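The ActionC rules can be sketched as a single copying pass: track brace depth to find the end of the snippet, copy string literals, character literals and /* */ comments through untouched, check parentheses, and rewrite the $ macros along the way. This is only a sketch under assumptions. The output names yyval and yyv[n] are placeholders rather than what any particular yacc emits, every function and struct name is made up, the $ forms handled are the POSIX spellings $$, $n, $<tag>$ and $<tag>n, and // comments are not handled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bounded output buffer (hypothetical helper type). */
struct Out { char *buf; size_t len, cap; };

static int put(struct Out *o, const char *s, size_t n)
{
    if (o->len + n >= o->cap) return 0;        /* keep room for the NUL */
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
    return 1;
}

/* Copy a string or character literal verbatim, starting at its quote. */
static const char *copy_quoted(const char *p, struct Out *o)
{
    char q = *p;
    const char *s = p++;
    while (*p && *p != q) {
        if (*p == '\\' && p[1]) p++;           /* skip the escaped character */
        p++;
    }
    if (*p != q) return NULL;                  /* unterminated literal */
    p++;
    return put(o, s, (size_t)(p - s)) ? p : NULL;
}

/* Copy a comment verbatim, starting at its opening delimiter. */
static const char *copy_comment(const char *p, struct Out *o)
{
    const char *end = strstr(p + 2, "*/");
    if (!end) return NULL;
    end += 2;
    return put(o, p, (size_t)(end - p)) ? end : NULL;
}

/* Translate $$, $n, $<tag>$, $<tag>n, starting at the '$'. */
static const char *copy_macro(const char *p, struct Out *o)
{
    const char *tag = NULL;
    size_t taglen = 0;
    p++;                                       /* skip '$' */
    if (*p == '<') {
        tag = ++p;
        while (*p && *p != '>') p++;
        if (*p != '>') return NULL;
        taglen = (size_t)(p - tag);
        p++;
    }
    if (*p == '$') {
        p++;
        if (!put(o, "yyval", 5)) return NULL;  /* placeholder name */
    } else if (*p >= '0' && *p <= '9') {
        char tmp[32];
        unsigned n = 0;
        while (*p >= '0' && *p <= '9') n = n * 10 + (unsigned)(*p++ - '0');
        int k = snprintf(tmp, sizeof tmp, "yyv[%u]", n);
        if (!put(o, tmp, (size_t)k)) return NULL;
    } else if (!tag) {
        return put(o, "$", 1) ? p : NULL;      /* lone '$': copy verbatim */
    } else {
        return NULL;                           /* "$<tag>" with nothing after it */
    }
    if (tag && !(put(o, ".", 1) && put(o, tag, taglen))) return NULL;
    return p;
}

/* src points just past the opening '{'. On success the translated body is
   in buf and the return value points just past the matching '}'. */
const char *copy_action(const char *src, char *buf, size_t cap)
{
    struct Out o = { buf, 0, cap };
    int braces = 1, parens = 0;
    const char *p = src;
    if (cap == 0) return NULL;
    buf[0] = '\0';
    while (*p) {
        if (*p == '"' || *p == '\'') {
            if (!(p = copy_quoted(p, &o))) return NULL;
            continue;
        }
        if (p[0] == '/' && p[1] == '*') {
            if (!(p = copy_comment(p, &o))) return NULL;
            continue;
        }
        if (*p == '$') {
            if (!(p = copy_macro(p, &o))) return NULL;
            continue;
        }
        if (*p == '{') braces++;
        else if (*p == '}' && --braces == 0) return parens == 0 ? p + 1 : NULL;
        else if (*p == '(') parens++;
        else if (*p == ')' && --parens < 0) return NULL;
        if (!put(&o, p, 1)) return NULL;
        p++;
    }
    return NULL;                               /* input ended before the closing '}' */
}
```

UnionC can be delimited by the same brace counting with the macro step left out. CodeC needs the same literal and comment skipping, but stops at the first "%}" instead of at a matching brace.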
In the Unix and BSD versions of Yacc, code snippets are stowed away in temporary files rather than in memory. These days this is no longer necessary: contemporary OSes *already* page excess memory out to secondary storage. In any case, it would be exceedingly rare for an actual yacc file to have code snippets large enough to need this - not even for a natural language grammar!
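Keeping a snippet in memory needs little more than a growable buffer. A minimal sketch, with hypothetical names (struct Snippet, snippet_append, snippet_free) and an arbitrary starting capacity of 64 bytes:

```c
#include <stdlib.h>
#include <string.h>

/* Growable, NUL-terminated text buffer. Start from { NULL, 0, 0 }. */
struct Snippet { char *text; size_t len, cap; };

/* Append n bytes, doubling the capacity as needed.
   Returns 1 on success, 0 if the allocation fails (buffer left unchanged). */
int snippet_append(struct Snippet *s, const char *data, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t cap = s->cap ? s->cap : 64;
        while (cap < s->len + n + 1)
            cap *= 2;
        char *t = realloc(s->text, cap);
        if (!t)
            return 0;
        s->text = t;
        s->cap = cap;
    }
    memcpy(s->text + s->len, data, n);
    s->len += n;
    s->text[s->len] = '\0';
    return 1;
}

/* Release the storage and reset the buffer to its empty state. */
void snippet_free(struct Snippet *s)
{
    free(s->text);
    s->text = NULL;
    s->len = s->cap = 0;
}
```

Doubling keeps the number of reallocations logarithmic in the snippet size, and the OS takes care of paging if the total ever grows large.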