Hi All,
I'm new to ANTLR and trying to use it to parse the config files of a FortiGate firewall. I found several quite messy implementations based on regexes that are difficult to maintain and to adapt to newer FortiGate versions. So I started a project to generate the parser with ANTLR, just to see if it's possible. After reading the first couple of chapters of the book I focused on the big structure first: I created a grammar that parses the mandatory top-level config sections. Once that works, I want to gradually zoom in on the more specific language constructs. It's stunning how well ANTLR already manages to parse the example config files into a parse tree.
The config file format in a nutshell:
- starts with a few comment lines containing version info
- continues with a "config section" listing the vdoms, then a global "config section", then a "config section" for each vdom
- contains a lot of nesting: config sections can be nested, and an edit section can be nested in a set statement section (and vice versa)
- has some weird inconsistencies: an edit section can be closed by an "end", or alternatively by a "next" in one of its outer sections
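The closing rules above can be prototyped outside ANTLR before encoding them in the grammar. A minimal Python sketch, assuming every statement sits on its own line and starts with config, edit, set, next or end; `Node` and `parse_blocks` are hypothetical names. A closer shuts the nearest matching open block and implicitly shuts anything opened inside it, which covers an edit that is terminated by "end" instead of "next". Lines it does not understand are kept rather than dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                      # "root", "config" or "edit"
    name: str                                      # e.g. "system interface" or '"port1"'
    settings: list = field(default_factory=list)   # (key, rest-of-line) pairs
    children: list = field(default_factory=list)   # nested config/edit nodes


def parse_blocks(text: str) -> Node:
    """Build a tree from config/edit/set/next/end lines (one statement per line).

    'next' closes the innermost open edit and 'end' the innermost open config;
    any block still open inside the one being closed is closed implicitly.
    """
    root = Node("root", "")
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                               # skip blank lines and comments
        keyword, _, rest = line.partition(" ")
        if keyword in ("config", "edit"):
            node = Node(keyword, rest)
            stack[-1].children.append(node)
            stack.append(node)                     # open a new block
        elif keyword in ("next", "end"):
            target = "edit" if keyword == "next" else "config"
            while len(stack) > 1:                  # never pop the root
                if stack.pop().kind == target:
                    break
        elif keyword == "set":
            key, _, value = rest.partition(" ")
            stack[-1].settings.append((key, value))
        else:
            # not understood yet: keep the raw line so nothing is silently lost
            stack[-1].settings.append((keyword, rest))
    return root
```

For a per-vdom block such as `config vdom` / `edit root` / ... / `end`, the final "end" closes both the open `edit root` and `config vdom`, so the next top-level section lands directly under the root again.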
The example.conf attachment contains a representative config file.
I'm stuck on some concepts:
- I need a way to match all config that doesn't belong to one of the defined parts yet; without it, every change to the grammar has unpredictable effects on the complete parse tree
- It's difficult, at least for me, to get the nested config sections right
- I'm struggling with questions like: should I include the mandatory newline before the end of a section or not? It seems a lot cleaner, and possible, to do without the \n
- Should I define every possible config section explicitly, e.g. "config router" instead of "config KEYWORD"?
- The book is very clear, but it's quite hard to find examples of grammars that parse config files
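For the catch-all question and the "config router" versus "config KEYWORD" question, one way to experiment is to split the file into top-level config blocks first, send only the section names that already have a specific rule to that rule, and keep everything else as a raw fallback block. A rough Python sketch; `KNOWN_SECTIONS` and `split_top_level` are hypothetical, and it assumes one statement per line and that only "config" and "end" change the nesting depth (edit/next do not, so an edit closed by "end" is still handled).

```python
# Hypothetical set of section names that already have a dedicated rule
KNOWN_SECTIONS = {"system global", "system interface"}


def split_top_level(text: str):
    """Return (known, fallback) lists of (section name, body lines) pairs."""
    known, fallback = [], []
    depth, name, body = 0, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                   # skip blank lines and comments
        if depth == 0:
            if line.startswith("config "):
                name, body, depth = line[len("config "):], [], 1
            else:
                fallback.append(("<stray>", [line]))   # text outside any section
            continue
        if line.startswith("config "):
            depth += 1                                 # nested config section
        elif line == "end":
            depth -= 1
        if depth == 0:
            # the top-level section just closed: route it by name
            (known if name in KNOWN_SECTIONS else fallback).append((name, body))
        else:
            body.append(line)
    if depth > 0:
        fallback.append((name, body))                  # unterminated block: keep what was read
    return known, fallback
```

Adding a name to `KNOWN_SECTIONS` only moves that one section out of the fallback; nothing else changes. In an ANTLR grammar the analogous setup would be the specific section rules listed as alternatives before a generic `'config' IDENTIFIER+ ...` rule; since ANTLR 4 resolves an ambiguity in favor of the alternative given first, the generic rule should only pick up what the specific ones do not claim.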
The grammar so far:
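```antlr
grammar FortiGate;

// top level: vdom list, global section, one section per vdom; EOF forces the whole file to be consumed
file          : config_vdom config_global config_vdom+ EOF ;
// multi-word keywords split into separate tokens, so any whitespace between them is accepted
config_global : 'config' 'global' config_system+ NEWLINE NEWLINE 'end' ;
config_system : 'config' 'system' IDENTIFIER NEWLINE statements+ NEWLINE 'end' ;
config_vdom   : 'config' 'vdom' statements* NEWLINE 'end' ;
edit_section  : 'edit' (STRING | IDENTIFIER) NEWLINE set_statement* 'next' ;
set_statement : 'set' IDENTIFIER (IDENTIFIER | STRING)* NEWLINE ;
/* a statement is an edit section, a set statement, or (as a catch-all) a run of identifiers */
statements    : edit_section
              | set_statement
              | IDENTIFIER+
              ;
//LEXER RULES
STRING     : '"' ~["\r\n]* '"' ;      // quoted name or value, e.g. "port1" or "WAN link"
IDENTIFIER : [a-zA-Z0-9=._:/\-]+ ;    // '-' escaped; '.', '_', ':', '/' added for IPs, MACs, etc.
NEWLINE    : '\r'? '\n' ;             // return newlines to parser: TODO; determine if we need this to determine END of section
WS         : [ \t]+ -> skip ;         // Define whitespace rule, toss it out
COMMENT    : '#' ~[\r\n]* ('\r'? '\n')? -> skip ; // skip the whole comment line, not just the '#'
```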
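To see what the parser actually receives from lexer rules like these, it can help to mimic them in a few lines of Python and look at the token stream for a snippet of example.conf. A rough sketch, not ANTLR itself: `TOKEN_SPEC` and `tokenize` are hypothetical names, Python's `re` alternation takes the first matching alternative rather than ANTLR's longest match, and keywords such as config or end come out as plain IDENTIFIER tokens here (in ANTLR, literals like 'config' used in parser rules get token types of their own). It also assumes a comment runs to the end of its line including the line break, a double-quoted value is one token, and consecutive line breaks (blank lines) collapse into a single NEWLINE.

```python
import re

# Each entry mirrors one lexer rule: (name, regex, skipped like "-> skip"?)
TOKEN_SPEC = [
    ("COMMENT",    r"#[^\r\n]*(?:\r?\n)?",  True),   # whole comment line incl. line break
    ("NEWLINE",    r"(?:\r?\n[ \t]*)+",     False),  # blank lines collapse into one token
    ("WS",         r"[ \t]+",               True),
    ("STRING",     r'"[^"\r\n]*"',          False),
    ("IDENTIFIER", r"[A-Za-z0-9=._:/\-]+",  False),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx, _ in TOKEN_SPEC))
SKIPPED = {name for name, _, skip in TOKEN_SPEC if skip}


def tokenize(text: str):
    """Return (token type, text) pairs, roughly what a parser would receive."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in SKIPPED:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

If collapsing blank lines turns out to be the right choice, the ANTLR counterpart would be something like `NEWLINE : ('\r'? '\n' [ \t]*)+ ;`, which lets the parser rules stop counting empty lines (config_global would then need a single NEWLINE before 'end').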
Any help is appreciated. Bear with me, I'm still reading the book :)
Kind Regards,
R. Dohmen