Is it possible to use yacc for writing context-aware languages? In
particular, I am looking for a way to parse Cisco IOS config files.
For example:
---------------
ip route 0.0.0.0
interface Serial3/0
 description blah
 shut
router bgp 1
 no synchronization
 network 1.2.3.4
access-list 10 permit 10.10.10.10 255.255.255.255
---------------
This example contains two indented blocks, each of which uses a
different grammar. Is it possible to support such a language with
yacc? Else I'd build my own parser on top of the lex module, but
perhaps there's an easier way?
-Samuel
With pure yacc, no.
> This example contains two indented blocks, each of which uses a
> different grammar. Is it possible to support such a language with
> yacc? Else I'd build my own parser on top of the lex module, but
> perhaps there's an easier way?
You can steal your solution from Python, which performs indent/dedent counting
and injects { and } -like brackets between the lexer and parser to give the
parser a happy context-free world.
Albert
How would that make it context-free? In my example, the "no
synchronization" is only valid syntax under the "router bgp 1" block.
Would it be found elsewhere, the parser should throw a syntax error.
Inserting brackets does not change that.
-Samuel
I think that the builtin lexer allows you to change states. You
should be able to change to a "BGP" state and then throw an error if
it hits a token that isn't valid for that state. Something along
those lines at least.
http://www.dabeaz.com/ply/ply.html#ply_nn21
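
To make the idea concrete, here is a minimal sketch of state-dependent keyword checking. It deliberately does not use PLY; PLY's own mechanism (a `states` declaration plus `lexer.begin()`) is described at the link above. The `ALLOWED` table and the `check_config` function are hypothetical names made up for this illustration, and the keyword sets cover only the example config, not real IOS.

```python
# Hypothetical keyword tables; a real IOS grammar is far larger.
ALLOWED = {
    "INITIAL": {"ip", "interface", "router", "access-list"},
    "interface": {"description", "shut"},
    "router": {"no", "network", "synchronization"},
}


def check_config(text):
    """Return (state, line) pairs; raise SyntaxError on a keyword that is
    not valid in the current state."""
    state = "INITIAL"
    seen = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        indented = raw[0] in " \t"
        if not indented:
            # An unindented line always ends the current block.
            state = "INITIAL"
        keyword = raw.split()[0]
        if keyword not in ALLOWED[state]:
            raise SyntaxError(
                f"line {lineno}: {keyword!r} is not valid in state {state!r}"
            )
        seen.append((state, raw.strip()))
        if not indented and keyword in ("interface", "router"):
            # Header line: switch state for the indented lines that follow.
            state = keyword
    return seen
```

With this, "no synchronization" is accepted under "router bgp 1" but raises a SyntaxError when it appears under an interface header.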
Christian
http://www.dowski.com
Ah, that's what I was looking for. Thanks a lot!
-Samuel
Context-free means that layout is not relevant, ie
interface Serial3/0
 description blah
is the same as
interface Serial3/0
description blah
and
interface Serial3/0 description blah
which is not true in your case. By inserting brackets you make the above true
again, ie
interface Serial3/0 {
 description blah
}
and
interface Serial3/0 { description blah }
can both be interpreted in the same way.
One way to get these brackets is to force the user to include them, another
solution is what Python did, counting indentation, and injecting them
automatically. See also http://docs.python.org/ref/indentation.html
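
A rough sketch of that indentation counting, in the spirit of Python's INDENT/DEDENT handling, might look like the following. The function name `inject_brackets` is made up for this example, and it assumes indentation uses spaces only (tabs are not handled) and works on whole lines rather than on lexer tokens.

```python
def inject_brackets(text):
    """Return the config lines with '{' and '}' markers inserted wherever
    the indentation level increases or decreases."""
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect block structure
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new one.
            stack.append(width)
            out.append("{")
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known level.
            stack.pop()
            out.append("}")
        if width != stack[-1]:
            raise IndentationError(f"inconsistent dedent: {raw!r}")
        out.append(raw.strip())
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        out.append("}")
    return out
```

In a real setup the same logic would sit between the lexer and the parser and emit bracket tokens instead of strings.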
> In my example, the "no
> synchronization" is only valid syntax under the "router bgp 1" block.
> Would it be found elsewhere, the parser should throw a syntax error.
With brackets, the parser can do this for you. Assume your top-level
non-terminal is 'config', then some rules like
config : entry
config : config entry
entry : IP_ROUTE IP_ADDR
entry : INTERFACE IDENTIFIER BR_OPEN if_entries BR_CLOSE
entry : ROUTER NAME BR_OPEN rt_entries BR_CLOSE
if_entries : if_entry
if_entries : if_entries if_entry
if_entry : DESCRIPTION NAME
if_entry : SHUT
rt_entries : rt_entry
rt_entries : rt_entries rt_entry
rt_entry : SYNCHRONIZATION
rt_entry : NO SYNCHRONIZATION
rt_entry : NETWORK IP_ADDR
Above, the rules/keywords you can use below a header-line are put inside two
brackets. Also, each set of rules has its own non-terminal so you have precise
control over what is allowed.
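
To show how these rules reject a misplaced keyword, here is a small hand-written recursive-descent sketch that mirrors them. It is not yacc-generated code; the `Parser` class, its method names, and the tuple-shaped result are all invented for this illustration. It assumes the input is already a list of (type, value) token pairs with BR_OPEN and BR_CLOSE in place.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)  # (type, value) pairs
        self.pos = 0

    def peek(self):
        # Type of the next token, or None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()} at token {self.pos}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def config(self):  # config : entry | config entry
        entries = [self.entry()]
        while self.peek() is not None:
            entries.append(self.entry())
        return entries

    def entry(self):
        kind = self.peek()
        if kind == "IP_ROUTE":
            self.expect("IP_ROUTE")
            return ("ip_route", self.expect("IP_ADDR"))
        if kind == "INTERFACE":
            self.expect("INTERFACE")
            name = self.expect("IDENTIFIER")
            return ("interface", name, self.block(self.if_entry))
        if kind == "ROUTER":
            self.expect("ROUTER")
            name = self.expect("NAME")
            return ("router", name, self.block(self.rt_entry))
        raise SyntaxError(f"unexpected {kind} at top level")

    def block(self, item):  # BR_OPEN item+ BR_CLOSE
        self.expect("BR_OPEN")
        items = [item()]
        while self.peek() != "BR_CLOSE":
            items.append(item())
        self.expect("BR_CLOSE")
        return items

    def if_entry(self):  # if_entry : DESCRIPTION NAME | SHUT
        if self.peek() == "SHUT":
            self.expect("SHUT")
            return ("shut",)
        self.expect("DESCRIPTION")
        return ("description", self.expect("NAME"))

    def rt_entry(self):  # rt_entry : [NO] SYNCHRONIZATION | NETWORK IP_ADDR
        kind = self.peek()
        if kind == "NO":
            self.expect("NO")
            self.expect("SYNCHRONIZATION")
            return ("no_synchronization",)
        if kind == "SYNCHRONIZATION":
            self.expect("SYNCHRONIZATION")
            return ("synchronization",)
        self.expect("NETWORK")
        return ("network", self.expect("IP_ADDR"))
```

Feeding it NO SYNCHRONIZATION inside an interface block raises a SyntaxError, because if_entry has no rule starting with NO.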
> Inserting brackets does not change that.
The brackets (BR_OPEN and BR_CLOSE) help the parser decide when a
'sub-section' starts and ends, so it knows when to stop looking for e.g. if_entries.
Sincerely,
Albert