Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Introducing FALIB, a compiler compiler

3 views
Skip to first unread message

Willow

unread,
Dec 19, 2009, 6:54:22 AM12/19/09
to
I just finished the first release of my very own compiler compiler,
similar to Yacc or Bison.
You can find the C++ source code at the top of this page:
http://code.google.com/p/vm64dec/downloads/list

It is called RULECOM, the Rule Compiler and uses FALIB, which stands
for Finite Automaton LIBrary (included). It accepts EBNF-style
grammars and then converts each rule into an NFA, which has lambda
loops and transitions removed, then is converted to a DFA for quick
parsing.

I made heavy use of DFA theory in the parser. Nonterminals in the
grammar are treated like regular symbols, although it makes sure there
are no conflicts. In the generated switch statement, the nonterminal
is expanded to a series of terminals that the nonterminal can start
with. If one of the nonterminal's symbols are accepted, the out edge
is taken only after a call is made to the nonterminal method, itself.

I used some boostrap code to get the process started. Now, the
compiler compiler (RULECOM) actually is partly self-generated, the
grammar it uses to parse grammars, is a grammar in the same language!

Willow
---
Here is the compiler compiler's grammar for the grammars it accepts:

// in_rulegram.c - grammar for rulecom grammars
// Copyright (C) 2009 Willow Schlanger

// *** After bootstrap, add a ';' to end of copyright & prefix.
copyright "Copyright (C) 2009 Willow Schlanger"
prefix "rc" // for rulecom

token IDENT LITCHAR LITSTRING ;
keyword "::=" ISDEFAS ;

start ::=
statement { statement } ;

statement ::=
"token" IDENT { IDENT } ';' |
name "::=" alternate { '|' alternate } ';' |
"keyword" LITSTRING IDENT ';' |
"copyright" LITSTRING |
"prefix" LITSTRING ;

name ::= IDENT | "start" ;

alternate ::= symbol { symbol } ;

symbol ::=
item |
'{' alternate '}' |
'[' alternate ']' ;

item ::=
LITCHAR |
LITSTRING |
IDENT ; // note: IDENT may start with 0-9 too

0 new messages