Context-Aware Parsing (Cisco Config Files)

Samuel

Oct 24, 2007, 11:08:18 AM
to ply-hack
Hi,

Is it possible to use yacc for writing context-aware languages? In
particular, I am looking for a way to parse Cisco IOS config files.
For example:

---------------
ip route 0.0.0.0
interface Serial3/0
 description blah
 shut
router bgp 1
 no synchronization
 network 1.2.3.4
access-list 10 permit 10.10.10.10 255.255.255.255
---------------

This example contains two indented blocks, each of which uses a
different grammar. Is it possible to support such a language with
yacc? Else I'd build my own parser on top of the lex module, but
perhaps there's an easier way?

-Samuel

A.T.Hofkamp

Oct 24, 2007, 11:53:18 AM
to ply-...@googlegroups.com
Samuel wrote:
> Hi,
>
> Is it possible to use yacc for writing context-aware languages? In

With pure yacc, no.

> This example contains two indented blocks, each of which uses a
> different grammar. Is it possible to support such a language with
> yacc? Else I'd build my own parser on top of the lex module, but
> perhaps there's an easier way?

You can steal your solution from Python that performs indent/dedent counting,
and injects { and } -like brackets between the lexer and parser to give the
parser a happy context-free world.

Albert

Samuel

Oct 24, 2007, 1:19:01 PM
to ply-hack
On Oct 24, 5:53 pm, "A.T.Hofkamp" <a.t.hofk...@tue.nl> wrote:
> You can steal your solution from Python that performs indent/dedent counting,
> and injects { and } -like brackets between the lexer and parser to give the
> parser a happy context-free world.

How would that make it context-free? In my example, the "no
synchronization" is only valid syntax under the "router bgp 1" block.
Would it be found elsewhere, the parser should throw a syntax error.
Inserting brackets does not change that.

-Samuel

Christian Wyglendowski

Oct 24, 2007, 1:40:47 PM
to ply-...@googlegroups.com

I think that the builtin lexer allows you to change states. You
should be able to change to a "BGP" state and then throw an error if
it hits a token that isn't valid for that state. Something along
those lines at least.

http://www.dabeaz.com/ply/ply.html#ply_nn21
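
The state idea can be illustrated without PLY. The following is only a minimal sketch in plain Python, not PLY's API: the function name `check_config`, the `KEYWORDS` table, and the state names are hypothetical, and the keyword lists come from the sample config in this thread rather than from any real IOS grammar. It also assumes blocks are indented with spaces. In PLY itself the corresponding mechanism is the `states` declaration together with `lexer.begin()`, described at the link above.

```python
# Keywords accepted in each (hypothetical) state; taken from the sample
# config in this thread, not from a real IOS grammar.
KEYWORDS = {
    "INITIAL": {"ip", "interface", "router", "access-list"},
    "interface": {"description", "shut"},
    "router": {"no", "synchronization", "network"},
}


def check_config(text):
    """Return (state, line) pairs; raise SyntaxError for a keyword that is
    not valid in the state it appears in."""
    state = "INITIAL"
    result = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        if not raw.strip():
            continue  # blank lines do not affect the state
        if not raw.startswith(" "):
            state = "INITIAL"  # an unindented line ends the current block
        word = raw.split()[0]
        if word not in KEYWORDS[state]:
            raise SyntaxError(
                "line %d: %r is not allowed in state %s" % (lineno, word, state)
            )
        result.append((state, raw.strip()))
        # A block header switches the state for the lines that follow it.
        if state == "INITIAL" and word in ("interface", "router"):
            state = word
    return result
```

With this sketch, "no synchronization" under "router bgp 1" is accepted, while the same line under "interface Serial3/0" raises a SyntaxError.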

Christian
http://www.dowski.com

Samuel

Oct 24, 2007, 1:55:19 PM
to ply-hack
On Oct 24, 7:40 pm, "Christian Wyglendowski" <christ...@dowski.com>
wrote:

> I think that the builtin lexer allows you to change states. You
> should be able to change to a "BGP" state and then throw an error if
> it hits a token that isn't valid for that state. Something along
> those lines at least.
>
> http://www.dabeaz.com/ply/ply.html#ply_nn21

Ah, that's what I was looking for. Thanks a lot!

-Samuel

A.T.Hofkamp

Oct 25, 2007, 2:42:01 AM
to ply-...@googlegroups.com
Samuel wrote:
> On Oct 24, 5:53 pm, "A.T.Hofkamp" <a.t.hofk...@tue.nl> wrote:
>> You can steal your solution from Python that performs indent/dedent counting,
>> and injects { and } -like brackets between the lexer and parser to give the
>> parser a happy context-free world.
>
> How would that make it context-free? In my example, the "no

Context-free means that layout is not relevant, i.e.

interface Serial3/0
 description blah

is the same as

interface Serial3/0
description blah

and

interface Serial3/0 description blah

which is not true in your case. By inserting brackets you make the above true
again, i.e.

interface Serial3/0 {
description blah
}

and

interface Serial3/0 { description blah }

can both be interpreted in the same way.

One way to get these brackets is to force the user to type them; another is to
do what Python does: count indentation and inject them automatically. See also
http://docs.python.org/ref/indentation.html
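
The indentation counting described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not code from Python or PLY: the function name `inject_brackets` is made up, indentation is assumed to use spaces only, and the brackets are emitted as separate output lines where a real lexer wrapper would emit BR_OPEN and BR_CLOSE tokens.

```python
def inject_brackets(text, open_tok="{", close_tok="}"):
    """Return the non-blank lines of `text`, stripped, with bracket lines
    inserted wherever the indentation level goes up or down."""
    out = []
    stack = [0]  # indentation widths of the currently open blocks
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure
        indent = len(raw) - len(raw.lstrip(" "))
        if indent > stack[-1]:
            # Deeper than the enclosing block: the previous line opened a block.
            stack.append(indent)
            out.append(open_tok)
        while indent < stack[-1]:
            # Shallower: close every block this line has left.
            stack.pop()
            out.append(close_tok)
        if indent != stack[-1]:
            raise ValueError("inconsistent dedent: %r" % raw)
        out.append(raw.strip())
    # Close any blocks still open at the end of the input.
    out.extend(close_tok for _ in stack[1:])
    return out
```

Fed the indented form of the config, this yields "interface Serial3/0", "{", "description blah", "shut", "}" and so on, which a context-free grammar can then consume.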


> synchronization" is only valid syntax under the "router bgp 1" block.
> Would it be found elsewhere, the parser should throw a syntax error.

With brackets, the parser can do this for you. Assume your top-level
non-terminal is 'config'; then you could use rules like

config : entry
config : config entry

entry : IP_ROUTE IP_ADDR
entry : INTERFACE IDENTIFIER BR_OPEN if_entries BR_CLOSE
entry : ROUTER NAME BR_OPEN rt_entries BR_CLOSE

if_entries : if_entry
if_entries : if_entries if_entry

if_entry : DESCRIPTION NAME
if_entry : SHUT

rt_entries : rt_entry
rt_entries : rt_entries rt_entry

rt_entry : SYNCHRONIZATION
rt_entry : NO SYNCHRONIZATION
rt_entry : NETWORK IP_ADDR

In these rules, the keywords allowed below a header line sit between a pair of
brackets, and each block type has its own non-terminal, so you have precise
control over what is allowed where.
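
To show how such per-block rules reject a misplaced keyword, here is a small hand-written check in plain Python that mirrors the rules above. It is a stand-in for a yacc-generated parser, not yacc or PLY code: the function names and the `IF_ENTRY`/`RT_ENTRY` tables are invented for this sketch, and the input is assumed to be a list of token type names such as a lexer would produce.

```python
# Alternatives allowed inside each block, transcribed from the rules above.
IF_ENTRY = [("DESCRIPTION", "NAME"), ("SHUT",)]
RT_ENTRY = [("NO", "SYNCHRONIZATION"), ("SYNCHRONIZATION",), ("NETWORK", "IP_ADDR")]


def match_one(tokens, pos, alternatives):
    """Return the position after the first alternative that matches at
    `pos`, or None if no alternative matches."""
    for alt in alternatives:
        if tuple(tokens[pos:pos + len(alt)]) == alt:
            return pos + len(alt)
    return None


def parse_block(tokens, pos, alternatives):
    """Parse BR_OPEN, one or more entries, BR_CLOSE; return the new position."""
    if pos >= len(tokens) or tokens[pos] != "BR_OPEN":
        raise SyntaxError("expected BR_OPEN at token %d" % pos)
    pos += 1
    count = 0
    while True:
        nxt = match_one(tokens, pos, alternatives)
        if nxt is None:
            break
        pos, count = nxt, count + 1
    if count == 0 or pos >= len(tokens) or tokens[pos] != "BR_CLOSE":
        raise SyntaxError("unexpected token at %d" % pos)
    return pos + 1


def parse_config(tokens):
    """Parse one or more top-level entries; return how many were found."""
    pos = entries = 0
    while pos < len(tokens):
        head = tuple(tokens[pos:pos + 2])
        if head == ("IP_ROUTE", "IP_ADDR"):
            pos += 2
        elif head == ("INTERFACE", "IDENTIFIER"):
            pos = parse_block(tokens, pos + 2, IF_ENTRY)
        elif head == ("ROUTER", "NAME"):
            pos = parse_block(tokens, pos + 2, RT_ENTRY)
        else:
            raise SyntaxError("unexpected token at %d" % pos)
        entries += 1
    if entries == 0:
        raise SyntaxError("empty config")
    return entries
```

Because the interface block only consults `IF_ENTRY`, the token pair NO SYNCHRONIZATION inside it is a syntax error, while the same pair inside a router block is accepted.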

> Inserting brackets does not change that.

The brackets (BR_OPEN and BR_CLOSE) help the parser decide when a
'sub-section' starts and ends, so it knows when to stop looking for e.g. if_entries.

Sincerely,
Albert
