Why is this string not recognized? Trouble with complex L0 rules and quotes, I think.


stefan.g...@gmail.com

Feb 19, 2018, 2:49:09 PM
to marpa parser
I'm scarcely more than a novice with Marpa, so please forgive me if I'm asking for too much or being naive.  

A sample of my legacy DSL looks like this:

sublevel: -only "-R[SUNLF]{0,1}\d+\s" -testargsmore -foo 
{ 
  < test { foo } >
}

The sublevel: token is supposed to be a keyword that introduces a kind of statement.

Following the keyword is a run of nearly arbitrary text (containing options and parameters), terminated by a {}-delimited body.  So the overall structure is this:

sublevel: <options>
{
    <more statements>
} 

So, the open curly signals the end of the <options> and the start of the body.

This legacy DSL allows curlies inside the <options> provided they are quoted (single or double) or escaped.  So, the curlies in {0,1} should not be interpreted as special, but taken verbatim.  
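
To make the quoting rule concrete, here is a minimal Python sketch (plain Python, not Marpa, and not part of the original post) that scans for the first curly that is neither quoted nor escaped. The name find_body_open is hypothetical, and treating a backslash as escaping whatever character follows it is a simplifying assumption rather than the DSL's exact rule.

```python
def find_body_open(text: str) -> int:
    """Return the index of the first '{' that is neither quoted nor escaped, or -1."""
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # assumption: a backslash escapes the next character
            continue
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch  # opening quote
        elif ch == "{":
            return i
        i += 1
    return -1


sample = 'sublevel: -only "-R[SUNLF]{0,1}\\d+\\s" -testargsmore -foo \n{\n  < test { foo } >\n}'
pos = find_body_open(sample)
options = sample[len("sublevel:"):pos]
print(repr(options))
```

With the sample from this post, the curlies in {0,1} are skipped because they sit inside double quotes, so the scan stops at the brace that opens the body.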

I studied the string grammar listed in https://gist.github.com/jddurand/8d3238c22731a85eb890 and used it as a guide for developing the grammar I list below.  In particular, that example taught me that L0 rules permit alternative productions and also allow sequences.  The portion of the grammar in ALL CAPS was derived from that example.

But, I get this "No lexeme" error when it hits the first dquote, and I cannot figure out why!

Setting trace_terminals option
Setting trace_values option
Discarded lexeme L1c1: whitespace
Accepted lexeme L2c1-9 e1: 'sublevel:'; value="sublevel:"
Accepted lexeme L2c1-9 e1: 'sublevel:'; value="sublevel:"
Accepted lexeme L2c10-16 e2: SUBLEVELOPTIONS; value=" -only "
****** FAILED TO PARSE ******
MSG:
Error in SLIF parse: No lexeme found at line 2, column 17
* String before error: \nsublevel: -only\s
* The error was at line 2, column 17, and at character 0x0022 '"', ...
* here: "-R[SUNLF]{0,1}\\d+\\s" -testargsmore -foo\n{\n  <
Marpa::R2 exception at ./marpa_bnf_1.pl line 31.

I would be grateful for any insights.  

I intended that my sample input would have been interpreted, at some depth of productions, as 

sublevel: SublevelOptions
{
  NamedBlockList
}

and I thought that the SublevelOptions

 -only "-R[SUNLF]{0,1}\d+\s" -testargsmore -foo

would decompose into 

STRING_UNQUOTED = ( -only )
STRING_DQUOTED = ("-R[SUNLF]{0,1}\d+\s")
STRING_UNQUOTED = ( -testargsmore -foo)

and I failed to see why it doesn't do so.  Instead, Marpa tells me it doesn't know what to do when it sees that dquote.  
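
The expected decomposition can be sketched with ordinary Python regexes that transliterate the three L0 string rules from the grammar below. This is only an illustration of the intended segmentation: it is not Marpa, it says nothing about how Marpa's lexer treats these rules, and the helper name segments is hypothetical.

```python
import re

# Regex transliterations of the L0 rules (an approximation, not Marpa syntax).
DQUOTED = r'"(?:[^"\\\n]|\\[^#])*"'
SQUOTED = r"'(?:[^'\\]|\\')*'"
UNQUOTED = r"""(?:[^"'{};\\\n]|\\[\\'"{};])+"""

SEGMENT = re.compile(
    f"(?P<STRING_DQUOTED>{DQUOTED})"
    f"|(?P<STRING_SQUOTED>{SQUOTED})"
    f"|(?P<STRING_UNQUOTED>{UNQUOTED})"
)


def segments(text: str):
    """Split text into (rule name, matched text) pairs; stop where nothing matches."""
    out, pos = [], 0
    while pos < len(text):
        m = SEGMENT.match(text, pos)
        if m is None:
            break  # e.g. an unquoted '{' or a newline
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out, pos


options = ' -only "-R[SUNLF]{0,1}\\d+\\s" -testargsmore -foo \n{'
parts, end = segments(options)
for name, value in parts:
    print(name, repr(value))
```

Under these assumptions the text splits into an unquoted segment, a double-quoted segment, and a second unquoted segment, and scanning stops at the newline before the opening curly.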

Below is my full grammar.  

:default ::= action => [name, start, length, values]
lexeme default = latm => 1

File ::= BodyStatements
File ::=

BodyStatements ::= BodyStatement+

BodyStatement ::=
    Sublevel
  | SingleTest


Sublevel ::= ('sublevel:') SublevelOptionsMaybe ('{') BodyStatements ('}')
Sublevel ::= ('sublevel:') SublevelOptionsMaybe ('{') ('}')
SublevelOptionsMaybe ::= SublevelOptions
SublevelOptionsMaybe ::=

SublevelOptions      ::=  SUBLEVELOPTIONS

SUBLEVELOPTIONS             ~ SUBLEVELOPTIONS_STRING+

SUBLEVELOPTIONS_STRING      ~ STRING_UNQUOTED
                            | STRING_SQUOTED
                            | STRING_DQUOTED

STRING_UNQUOTED             ~ CHAR_UNQUOTED+
CHAR_UNQUOTED               ~ [^"'\}\{;\\\n]
CHAR_UNQUOTED               ~ ES

STRING_SQUOTED              ~ SQUOTE STRING_INSIDE_SQUOTES SQUOTE
STRING_INSIDE_SQUOTES       ~ CHAR_INSIDE_SQUOTES*
CHAR_INSIDE_SQUOTES         ~ [^'\\]
CHAR_INSIDE_SQUOTES         ~ [\\] [']
SQUOTE                      ~ [']

STRING_DQUOTED              ~ DQUOTE STRING_INSIDE_DQUOTES DQUOTE
STRING_INSIDE_DQUOTES       ~ CHAR_INSIDE_DQUOTES*
CHAR_INSIDE_DQUOTES         ~ [^"\\\n]
CHAR_INSIDE_DQUOTES         ~ [\\] [^#]
DQUOTE                      ~ ["]

ES                          ~ [\\] [\\'"\{\};]

NamedBlockList ::= NamedBlock+
NamedBlock ::= ArgTag ('{') ArgBodyMaybe ('}')
ArgBodyMaybe ::= ArgBody
ArgBodyMaybe ::=
ArgBody ~ [^\{\}]+
ArgTag ~ [\w]+

SingleTest ::=
    SingleSimpleTest

SingleSimpleTest ::= ('<') NamedBlockList ('>')

# whitespace
:discard ~ whitespace
whitespace ~ [\s]+

 

Jeffrey Kegler

Feb 19, 2018, 4:34:13 PM
to Marpa Parser Mailing List
[ This is off the top of my head and untested. ]

SublevelOptions, despite its name, only allows for one sublevel option.  You perhaps want something more like:

SublevelOptionsMaybe ::= SublevelOptions
SublevelOptionsMaybe ::=
SublevelOptions ::= SublevelOption+

SublevelOption      ::=  SUBLEVELOPTIONS

There are alternative ways to write the above that are more elegant and probably better, but I think it conveys the idea.  Again, untested.

I hope this helps, jeffrey

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

stefan.g...@gmail.com

Feb 19, 2018, 10:18:34 PM
to marpa parser
I thought that 

SUBLEVELOPTIONS             ~ SUBLEVELOPTIONS_STRING+

would cause the input to be chopped up into a series of quoted and unquoted segments, where the quoted segments were allowed to contain spaces (see CHAR_INSIDE_SQUOTES and CHAR_INSIDE_DQUOTES), but the unquoted segments were not (see CHAR_UNQUOTED). 

Anyway, I will give your suggestion a try.

Thank you!

-stefan


stefan.g...@gmail.com

Feb 20, 2018, 3:09:21 PM
to marpa parser
Okay, I tried that, and it mostly worked!  But I don't understand why.  I think I may have to study this a bit more; my mental model of how Marpa handles the L0 grammar is probably not right.

-stefan

Jeffrey Kegler

Feb 20, 2018, 8:26:15 PM
to Marpa Parser Mailing List
The model for the relationship between G1 and L0 is the classic parser/lexer divide of yacc, bison and the 1970s textbooks.  A lexer divides the input into tokens and a higher-level parser parses the token stream.

Marpa adds a new wrinkle.  The classic lexer was "blind" -- it had no idea of the parsing context.  Marpa's L0 grammar only looks for tokens actually expected by the G1 parser.

If you look back at your grammar and the error message you got, it might give you some insight.  Your grammar did not allow for multiple sublevel options -- only one.  So a first sublevel option was read, but when what was intended as your second sublevel option was encountered, the G1 grammar was not expecting it, so the L0 lexer did not look for it.  What L0 encountered at that point did not, in fact, match anything it was looking for, so it reported "No lexeme".
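
The idea of a lexer that only looks for expected tokens can be illustrated with a toy Python model. This is a sketch under loud assumptions, not Marpa's implementation: the token patterns, the state tables ONE_OPTION and MANY_OPTIONS, and the function lex are all hypothetical, and the toy takes the first pattern that matches rather than the longest one.

```python
import re

# Hypothetical token patterns, loosely modelled on the DSL in this thread.
PATTERNS = {
    "KEYWORD": re.compile(r"sublevel:"),
    "OPTION": re.compile(r'"[^"\n]*"|[^"{}\n]+'),
    "LBRACE": re.compile(r"\s*\{"),
}


def lex(text, expected_after):
    """Tokenize text, trying only the tokens the 'parser' expects next."""
    state, pos, tokens = "START", 0, []
    while pos < len(text):
        for name in expected_after[state]:
            m = PATTERNS[name].match(text, pos)
            if m:
                tokens.append((name, m.group()))
                state, pos = name, m.end()
                break
        else:
            # None of the expected tokens matched at this position.
            raise ValueError(f"No lexeme found at offset {pos}: {text[pos]!r}")
    return tokens


# After one OPTION, only the opening curly is expected.
ONE_OPTION = {
    "START": ["KEYWORD"],
    "KEYWORD": ["OPTION", "LBRACE"],
    "OPTION": ["LBRACE"],
    "LBRACE": [],
}
# After an OPTION, another OPTION is also expected.
MANY_OPTIONS = {**ONE_OPTION, "OPTION": ["OPTION", "LBRACE"]}

text = 'sublevel: -only "-R[SUNLF]{0,1}\\d+\\s" -foo\n{'

try:
    lex(text, ONE_OPTION)
except ValueError as err:
    print("one option:", err)

print("many options:", [name for name, _ in lex(text, MANY_OPTIONS)])
```

In the one-option table the toy lexer stops at the double quote, because at that point only the opening curly is expected and the quoted-string pattern is never tried. Allowing OPTION to follow OPTION lets the same input tokenize to the end.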

I hope this helps, jeffrey