A note on subgrammars in GraphViz2::Marpa

Ron Savage

Nov 25, 2014, 10:37:13 PM

to marpa-...@googlegroups.com

GraphViz2::Marpa V 2.00, just released, parses Graphviz's DOT files. Part of the DOT syntax is an HTML-like language.

I parse the latter by switching to a tiny grammar, devised by rns, specialized for double-quoted strings nested to arbitrary depth. I simply tweaked it to handle HTML.

To do that (and I'm using pauses), I had to call Marpa::R2::Scanless::R's last_completed_span() when an expected error (see below) is triggered. Anyone interested in this should consult the source of GraphViz2::Marpa, around line 1058, in _process_html().

This is not related to the Ruby Slippers technique. There, the aim is to fabricate input tokens and keep parsing what's assumed to be nothing but HTML.

For DOT, any error in the HTML is fatal. The expected error is triggered only after the HTML part has been correctly scanned, and it arises simply because these HTML-like strings are embedded in DOT files and so must always be followed by something in the input stream.

So, the error message 'Error in SLIF parse: Parse exhausted, but lexemes remain' is accurate but is not a real error in this context. I simply switch back to the original BNF for those remaining lexemes (i.e. the rest of the input stream). The only scary thing is that I (currently) check for that exact text. I do realize a recent patch to Marpa made parse exhaustion an event, but hey, release early and release often.
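For anyone who wants the shape of that switch without reading the Perl source, here is a minimal sketch in Python. It is not Marpa and not the code in GraphViz2::Marpa: the function names scan_html_like and split_dot are made up for illustration, and a simple bracket counter stands in for the HTML subgrammar. The only idea it tries to show is that the sub-scanner reports the span (start, length) of what it completed, and the main scanner resumes at the end of that span.

```python
def scan_html_like(text, start):
    """Hypothetical sub-scanner: consume one balanced <...> run at text[start].

    Returns (start, length) of the completed span. Raises ValueError when the
    brackets never balance, since any error in the HTML part is fatal.
    """
    if text[start] != "<":
        raise ValueError(f"expected '<' at offset {start}")
    depth = 0
    for pos in range(start, len(text)):
        ch = text[pos]
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth == 0:
                # The sub-scanner is finished here even though input remains.
                return start, pos - start + 1
    raise ValueError(f"unbalanced HTML-like string starting at offset {start}")


def split_dot(text):
    """Hypothetical main scanner: return ('dot', chunk) / ('html', chunk) pieces."""
    pieces = []
    pos = 0
    plain_start = 0
    while pos < len(text):
        if text[pos] == "<":
            if pos > plain_start:
                pieces.append(("dot", text[plain_start:pos]))
            # Switch to the sub-scanner, then resume after the span it reports.
            span_start, span_len = scan_html_like(text, pos)
            pieces.append(("html", text[span_start:span_start + span_len]))
            pos = span_start + span_len
            plain_start = pos
        else:
            pos += 1
    if plain_start < len(text):
        pieces.append(("dot", text[plain_start:]))
    return pieces


if __name__ == "__main__":
    for kind, chunk in split_dot("a [label=<<b>hi</b>>]; a -> b;"):
        print(kind, repr(chunk))
```

The sketch ignores quoted strings and comments, which a real DOT grammar has to handle, so treat it as a picture of the control flow only.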

Anyway, every time HTML is entered or left, a grammar switch takes place. And guess which fabulous parsing lib makes this all possible......

Now, thinking: if I had a (released) XML parser, my SVG parser could skip using XML::SAX::ParserFactory for the XML and Marpa for the tag attributes, and use Marpa for both. Calling all Jean-Damiens.....