It is called RULECOM, the Rule Compiler and uses FALIB, which stands
for Finite Automaton LIBrary (included). It accepts EBNF-style
grammars and then converts each rule into an NFA, which has lambda
loops and transitions removed, then is converted to a DFA for quick
parsing.
I made heavy use of DFA theory in the parser. Nonterminals in the
grammar are treated like regular symbols, although it makes sure there
are no conflicts. In the generated switch statement, the nonterminal
is expanded to a series of terminals that the nonterminal can start
with. If one of the nonterminal's symbols are accepted, the out edge
is taken only after a call is made to the nonterminal method, itself.
I used some boostrap code to get the process started. Now, the
compiler compiler (RULECOM) actually is partly self-generated, the
grammar it uses to parse grammars, is a grammar in the same language!
Willow
---
Here is the compiler compiler's grammar for the grammars it accepts:
// in_rulegram.c - grammar for rulecom grammars
// Copyright (C) 2009 Willow Schlanger
// *** After bootstrap, add a ';' to end of copyright & prefix.
copyright "Copyright (C) 2009 Willow Schlanger"
prefix "rc" // for rulecom
token IDENT LITCHAR LITSTRING ;
keyword "::=" ISDEFAS ;
start ::=
statement { statement } ;
statement ::=
"token" IDENT { IDENT } ';' |
name "::=" alternate { '|' alternate } ';' |
"keyword" LITSTRING IDENT ';' |
"copyright" LITSTRING |
"prefix" LITSTRING ;
name ::= IDENT | "start" ;
alternate ::= symbol { symbol } ;
symbol ::=
item |
'{' alternate '}' |
'[' alternate ']' ;
item ::=
LITCHAR |
LITSTRING |
IDENT ; // note: IDENT may start with 0-9 too