Use ANTLR to parse FortiGate config

Rene Dohmen

Oct 24, 2016, 6:20:09 AM
to antlr-discussion
Hi All,

I'm new to ANTLR and am trying to use it to parse the config files of a FortiGate firewall. I found several rather messy regex-based implementations that are difficult to maintain and to adapt to newer versions of FortiGate. So I started a project to generate the parser with ANTLR, just to see whether it's possible. After reading the first couple of chapters of the book, I focused on the overall structure first: I created a grammar that parses the mandatory top-level config sections. Once that works, I want to gradually zoom in on the more specific language constructs. It's stunning how well ANTLR already manages to parse the example config files into a parse tree.

The config file format in a nutshell:
- it starts with some comments containing version info
- it continues with a "config" section listing the vdoms, then a global "config" section, then a "config" section for each vdom
- it contains a lot of nesting: config sections can be nested, and an edit section can be nested in a set-statement section (and vice versa)
- it has some weird inconsistencies: an edit section can be closed by an "end", or alternatively by a "next" in one of its outer sections
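
The closing rules described above can be made concrete with a small stack-based checker. This is a minimal Python sketch, not ANTLR code, and `check_nesting` is a hypothetical helper. It assumes that `config` and `edit` open a section, that `next` closes the innermost open `edit` and `end` the innermost open `config`, and that either closer implicitly closes anything still open inside that section; these closing rules are an assumption about the format, not something taken from FortiGate documentation.

```python
def check_nesting(text):
    """Return the maximum nesting depth of a FortiGate-style config text.

    Hypothetical helper. Assumed closing rules (not from FortiGate docs):
    'config' and 'edit' open a section; 'next' closes the innermost open
    'edit' and 'end' the innermost open 'config', implicitly closing any
    sections still open inside it. Raises ValueError on a closer without a
    matching opener, or on sections left open at the end of the input.
    """
    stack = []
    max_depth = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank lines and comments carry no structure
        keyword = words[0]
        if keyword in ("config", "edit"):
            stack.append(keyword)
            max_depth = max(max_depth, len(stack))
        elif keyword in ("next", "end"):
            opener = "edit" if keyword == "next" else "config"
            if opener not in stack:
                raise ValueError(f"line {lineno}: '{keyword}' without an open '{opener}'")
            # Pop until the matching opener is removed; anything above it
            # is treated as implicitly closed.
            while stack.pop() != opener:
                pass
        # any other statement ('set', 'unset', ...) leaves the nesting unchanged
    if stack:
        raise ValueError(f"sections left open at end of input: {stack}")
    return max_depth


sample = """\
config firewall policy
    edit 1
        set name "allow web"
        config nested-example
            set x 1
        end
    next
end
"""
print(check_nesting(sample))  # 3
```

If the implicit closing turns out to be awkward to express in the grammar itself, one option is a preprocessing pass along these lines that writes out the implied closers explicitly before the text reaches the parser.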

The example.conf attachment contains a representative config file.

I'm stuck on some concepts:
- I need a way to match all config that isn't covered by one of the defined rules yet; without it, every change to the grammar has unpredictable effects on the complete parse tree
- It's difficult, at least for me, to get the nested config sections right
- I'm struggling with questions like whether to include the mandatory newline before the end of a section; it seems a lot cleaner, and possible, without the \n
- Should I define every possible config section, e.g. "config router" instead of "config KEYWORD"?
- The book is very clear, but it's quite hard to find examples of grammars that parse config files
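
On the first and fourth points (a catch-all for config that isn't modelled yet, and "config router" versus "config KEYWORD"), one option is to treat every `config <name>` generically and keep unrecognised lines instead of failing on them. The following is a minimal Python sketch of that idea outside ANTLR; `Section` and `parse` are hypothetical names. It assumes that `next` closes the innermost open `edit` and `end` the innermost open `config`, and that every statement fits on one line (quoted values spanning several lines would need extra handling).

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node of a generic config tree (hypothetical structure)."""
    kind: str                                     # "root", "config" or "edit"
    name: str                                     # e.g. "system interface" or "port1"
    settings: dict = field(default_factory=dict)  # from 'set <key> <values...>'
    children: list = field(default_factory=list)  # nested config/edit sections
    other: list = field(default_factory=list)     # catch-all: any other statement


def parse(text):
    root = Section("root", "")
    stack = [root]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        words = shlex.split(stripped)  # a "quoted value" becomes one word
        keyword, args = words[0], words[1:]
        current = stack[-1]
        if keyword in ("config", "edit"):
            # Any section name is accepted; no per-section rule is needed.
            section = Section(keyword, " ".join(args))
            current.children.append(section)
            stack.append(section)
        elif keyword == "set" and args:
            current.settings[args[0]] = args[1:]
        elif keyword in ("next", "end"):
            opener = "edit" if keyword == "next" else "config"
            if any(s.kind == opener for s in stack):
                # Assumed rule: close sections until the matching opener is closed.
                while stack.pop().kind != opener:
                    pass
            else:
                current.other.append(words)  # stray closer: keep it for inspection
        else:
            # Not modelled yet (e.g. 'unset'): keep it instead of failing.
            current.other.append(words)
    return root


sample = """\
config system interface
    edit "port1"
        set alias "wan link"
        unset vlanid
    next
end
"""
port1 = parse(sample).children[0].children[0]
print(port1.name, port1.settings, port1.other)
# port1 {'alias': ['wan link']} [['unset', 'vlanid']]
```

Because unknown statements land in `other` instead of raising an error, adding support for one more statement does not change how the rest of the file is read.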

The grammar so far:
grammar FortiGate;

file
: config_vdom config_global config_vdom+ EOF ;

config_global
: 'config global' config_system+ NEWLINE NEWLINE 'end' ;
config_system
: 'config system' IDENTIFIER NEWLINE statements+ NEWLINE 'end' ;
config_vdom
: 'config vdom' (statements)* NEWLINE 'end' ;

edit_section
: 'edit' '"'IDENTIFIER'"' NEWLINE set_statement* 'next' ;
set_statement
: 'set' IDENTIFIER+ NEWLINE ;

/* a statement is an edit section, a set statement, or (as a catch-all) a bare run of identifiers */
statements
: edit_section
| set_statement
| IDENTIFIER+
;

//LEXER RULES
IDENTIFIER
: [a-zA-Z0-9=\-]+ ;
NEWLINE
: '\r'? '\n' ; // newlines are passed to the parser; TODO: determine whether they are needed to detect the end of a section
WS
: [ \t]+ -> skip ; // Define whitespace rule, toss it out
COMMENT
: '#' ~[\r\n]* -> skip ; // skip the whole comment line, not just the '#'
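
To see which tokens the parser actually receives, the lexer rules above can be roughly emulated in Python. This is an approximation, not the ANTLR runtime: it applies ANTLR's conflict rules (the longest match wins; on a tie the rule defined first wins, and the implicit literals from the parser rules, such as 'edit' and 'config system', are defined before IDENTIFIER), and it assumes comments run to the end of the line. `tokenize` and token names like CONFIG_SYSTEM are hypothetical.

```python
import re

# Rough emulation of the grammar's lexer, NOT the ANTLR runtime.
# Names for the implicit literals are illustrative (ANTLR itself would call
# them T__0, T__1, ...). Literals come first because ANTLR defines implicit
# literal tokens before the explicit lexer rules.
TOKEN_RULES = [
    ("CONFIG_GLOBAL", r"config global"),
    ("CONFIG_SYSTEM", r"config system"),
    ("CONFIG_VDOM", r"config vdom"),
    ("EDIT", r"edit"),
    ("NEXT", r"next"),
    ("SET", r"set"),
    ("END", r"end"),
    ("QUOTE", r'"'),
    ("IDENTIFIER", r"[a-zA-Z0-9=\-]+"),
    ("NEWLINE", r"\r?\n"),
    ("WS", r"[ \t]+"),
    ("COMMENT", r"#[^\r\n]*"),  # assumption: a comment runs to end of line
]
SKIPPED = {"WS", "COMMENT"}  # rules marked '-> skip' in the grammar
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_RULES]


def tokenize(text):
    """Return (token_name, text) pairs, dropping skipped tokens.

    Like ANTLR, the longest match wins; on a tie the earlier rule wins.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize('edit "port1"\n'))
# [('EDIT', 'edit'), ('QUOTE', '"'), ('IDENTIFIER', 'port1'), ('QUOTE', '"'), ('NEWLINE', '\n')]
```

Two things stand out. NEWLINE tokens do reach the parser, so every rule has to account for them explicitly. And a dotted value such as the IP address 10.0.0.1 has no matching rule, because '.' is not in the IDENTIFIER character set: the sketch raises ValueError there, while ANTLR would report a token recognition error. Also, the literal 'config global' only matches with exactly one space between the two words.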

Source code of the complete project, including more example configs: https://github.com/acidjunk/fortigate-config-parser

Any help is appreciated. Bear with me; I'm still reading the book :)

Kind Regards,

R. Dohmen
example.conf

Rene Dohmen

Nov 1, 2016, 8:08:26 AM
to antlr-discussion
Hi,

After reading some other posts and answers on Stack Overflow, and examples from Batfish (which also uses ANTLR to parse config files), I decided to work on the lexer first.
FortiGate has around 140 different config sections, and I'm adding them all to the grammar: https://github.com/acidjunk/fortigate-config-parser/blob/develop/antlr_grammar/FortiGate.g4

I'll probably need some help when working on the parser, especially with the ambiguous config section endings (some sections end with "next", which closes two or more surrounding sections) and with the best way to support the nesting of sections.

Is this the correct place to ask for some help or should it all be on StackOverflow?

Kind Regards,

Rene