Re: Particular grammar transition from Parse::RecDescent

Jeffrey Kegler

Mar 15, 2013, 1:12:06 PM
to marpa-...@googlegroups.com
@rushpl: Unless I missed something, this is a straight translation, except for the C and C++-style comments.  I wrote some rules that handle C-style comments, and those are in Jean-Damien's c2ast.  The C++-style comments can be dealt with as two rules, one for the newline-terminated case, the other for the final-unterminated-line case.

-- jeffrey

rus...@gmail.com wrote:
Today I discovered Marpa::R2 and it looks interesting. I am particularly interested in its scanless interface. I have looked through
the JSON parser implemented with the scanless interface, but I am still unsure whether it would let me convert
my grammar, which is very permissive: almost any kind of string can be an expression, and \w needs to accept Unicode.
Expressions need to be separated by semicolons, but the semicolons are optional where possible.
text: /(?:\\\\|\\\)|\\\(|\w|[^()])+/
text3: /(?:\\\\|\\\[|\\}|\\{|\\;|\w|[^\[;{}])+/
expression: text3 (';'|)
else_block: 'else' instruction
if_block: 'if' '(' text ')'  instruction 
ifs_chain: if_block else_block(?)
comment: /\/\/[^\n]*(\n|$)/ {$item[1];} | /\/\*(\s|\S)+?\*\// 
instruction: comment | ifs_chain | expression
Simplified example input:

   Some ↑Unicode↑ in\;put; 
 // comment which needs to be parsed, not ignored
   Second expression;
   if(Something)
      First thing
   else
      Other thing;
   End

I would be glad to get advice, first of all, on the practicality of converting such a grammar. If it is feasible, I would welcome any tips you could give. :)

Best Regards,
Damian Kaczmarek 
--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Durand Jean-Damien

Mar 15, 2013, 6:15:55 PM
to marpa-...@googlegroups.com, rus...@gmail.com
Tentative answer at https://gist.github.com/jddurand/5173485
The output is in the attachment.
Regards, Jean-Damien.
transitiongg.txt

Jeffrey Kegler

Mar 15, 2013, 7:44:13 PM
to marpa-...@googlegroups.com
@Jean-Damien: Thanks for doing this.  Your handling of C++ comments improves on my suggestion.  -- jeffrey


Peter Stuifzand

Mar 16, 2013, 8:35:26 PM
to marpa-...@googlegroups.com
An easy way to make the AST more friendly is by using the 'bless' adverb. This will bless the array returned from the rule. You could write methods that return parts of the tree.

For example you could do something like this:

expr   ::=  expr '+' expr        bless => op

sub left {
    my $self = shift;
    return $self->[0];
}

sub right {
    my $self = shift;
    return $self->[2];
}

sub op {
    my $self = shift;
    return $self->[1];
}
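
The accessors above can be tried without running a parser. This is a minimal sketch that builds the node by hand; it assumes the grammar was created with bless_package => 'My_Nodes', so that a rule marked "bless => op" yields objects of class My_Nodes::op. The package name and the sample node are illustrative, not taken from the gist.

```perl
use strict;
use warnings;

# Assumption: the grammar was created with bless_package => 'My_Nodes',
# so a rule marked "bless => op" yields objects of class My_Nodes::op.
package My_Nodes::op;

sub left  { my $self = shift; return $self->[0]; }
sub op    { my $self = shift; return $self->[1]; }
sub right { my $self = shift; return $self->[2]; }

package main;

# Hand-built stand-in for the node a parse of "1 + 2" might return.
my $node = bless [ 1, '+', 2 ], 'My_Nodes::op';

my $summary = join ' ', $node->left, $node->op, $node->right;
print "$summary\n";    # prints "1 + 2"
```

Because the node is still an ordinary array reference underneath, code that indexes into it directly keeps working alongside the methods.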



--
Peter Stuifzand | peterstuifzand.nl | @pstuifzand


On Sun, Mar 17, 2013 at 12:42 AM, <rus...@gmail.com> wrote:
I am trying to make the AST a little more friendly for the code I already have for processing the format created by Parse::RecDescent, meaning:

$ast->{start}->{instructions}->[0]->{instruction}->{expression}->{text} .. you get the idea

Converting the format that Marpa produces would be simple enough, but it still requires a lot of manual work, so I am wondering whether there is a better or preferred way of creating such an AST structure.

Such a format would be especially nice if it could be converted to JSON.  Thanks a lot.
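
One possible way to get from a tree of blessed arrays to nested hashes and JSON is sketched below. It is a sketch under assumptions, not the layout Parse::RecDescent produces: each node becomes { rule_name => [ children ] }, the helper name to_hash and the My_Nodes:: package prefix are hypothetical, and the sample tree is built by hand rather than by a parser.

```perl
use strict;
use warnings;
use JSON::PP;
use Scalar::Util qw(blessed);

# Recursively convert a tree of (possibly blessed) array refs into
# nested hashes of the form { rule_name => [ children ] }.
sub to_hash {
    my ($node) = @_;
    return $node unless ref $node;           # leaf: plain string
    my $name = blessed($node) // 'list';     # unblessed arrays get a generic name
    $name =~ s/\A.*:://s;                    # strip the package prefix
    return { $name => [ map { to_hash($_) } @{$node} ] };
}

# Hand-built stand-in for a parse result (hypothetical class names).
my $tree = bless [
    bless( ['Some text'], 'My_Nodes::text3' ),
    ';',
], 'My_Nodes::expression';

my $json = JSON::PP->new->canonical->encode( to_hash($tree) );
print "$json\n";    # prints {"expression":[{"text3":["Some text"]},";"]}
```

Keeping the children in an array preserves their order, which matters when a rule has several children of the same kind.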

Durand Jean-Damien

Mar 18, 2013, 10:11:41 AM
to marpa-...@googlegroups.com, rus...@gmail.com
To debug the grammar, a useful option is trace_terminals => 1, passed in the arguments to Marpa::R2::Scanless::R->new.
It seems to me this happens because whitespace (\s) is also matched by text3_0.
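
The overlap can be checked in plain Perl, outside Marpa. This sketch tests single characters against the last alternative of text3_0 and against the character class used by ws, both copied from the grammar quoted in this thread; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Character classes taken from the grammar quoted in this thread:
# the last alternative of text3_0, and the class used by ws.
my $text3_char = qr/\A[^\[;{}]\z/;
my $ws_char    = qr/\A\s\z/;

my %seen;
for my $ch ( ' ', 'a', ';' ) {
    $seen{$ch} = {
        text3 => ( $ch =~ $text3_char ? 1 : 0 ),
        ws    => ( $ch =~ $ws_char    ? 1 : 0 ),
    };
    printf "'%s': text3_0=%d ws=%d\n", $ch, $seen{$ch}{text3}, $seen{$ch}{ws};
}
```

A space satisfies both classes, so the scanner has a valid text3_0 candidate at the position where the discard rule was expected to apply.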
Gist updated.
Regards, Jean-Damien.

2013/3/18 <rus...@gmail.com>
I have an issue with Marpa that I find hard to understand. Given this gist by Durand Jean-Damien, https://gist.github.com/jddurand/5173485, when I change the input to:

my $input  = "first; second; if(a) b; else c; d;";

then I seem to trigger an error:
        Error in Scanless read: G1 Parse exhausted
        * Error was at string position: 24
        * String before error:
        first; second; if(a) b;\s
        * String after error:
        else c; d;

This is weird, because I can't find anything wrong with the grammar definition. Particularly baffling is that, given the string "first; second;", it reads the second expression with its leading space, but the expression definition clearly says:
expression       ::= text3 ';'
                 |   text3
text3_0     ~ '\\' | '\[' | '\}' | '\{' | '\;' | [\w] | [^\[;{}]
text3       ::= text3_0+
ws ~ [\s]+
:discard ~ ws 
So after encountering the ';', the scanner should finish the text3 element for the parser and skip the whitespace that follows, but that doesn't happen here. Could anybody point me in the right direction? How do you debug what the scanner does?
Best Regards,
Damian Kaczmarek 

Durand Jean-Damien

Mar 18, 2013, 11:11:16 AM
to marpa-...@googlegroups.com, rus...@gmail.com
Remove the underscore: text3Rest instead of text3_rest

On Monday, March 18, 2013 at 16:01:02 UTC+1, rus...@gmail.com wrote:
Thanks for the interest. ;) I don't want to be picky, but the current update reads "Second expression;" (from the original example input) as "Secondexpression", dropping the whitespace in the middle. The right approach, I think, would be to treat the first character separately, accepting only non-whitespace, and then accept all characters until an unescaped semicolon is encountered. Unfortunately, the way I tried to do it yields an error in Marpa:

text3_lead_0  ~ '\\' | '\[' | '\}' | '\{' | '\;' | [\w] | [^\s\[;{}]
text3_rest_0     ~ '\\' | '\[' | '\}' | '\{' | '\;' | [\w] | [^\[;{}]
text3_rest     ::= text3_0+
text3_lead     ::= text3_lead_0
text3       ::= text3_lead text3_rest
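
The intended matching behaviour (a non-whitespace lead character, then anything up to an unescaped semicolon) can be sketched as a plain Perl regex, independent of Marpa. This is only an illustration of the idea. Unlike the rules above, it also excludes a lone backslash from the plain character classes so that escapes are always consumed as pairs; that is an assumption of the sketch, as is the sample input.

```perl
use strict;
use warnings;

# Lead character: an escape pair, or any character other than whitespace,
# '[', ';', '{', '}' or a lone backslash.
# Rest: the same, except that whitespace is allowed.
my $text3 = qr/
    (?: \\. | [^\s\[;{}\\] )      # first character must not be whitespace
    (?: \\. | [^\[;{}\\]   )*     # then anything up to an unescaped ';'
/xs;

my $input       = 'first; Second expression;in\;put';
my @expressions = $input =~ /($text3)/g;
print "[$_]\n" for @expressions;
# prints:
# [first]
# [Second expression]
# [in\;put]
```

The inner space of "Second expression" is kept, while the space after the first semicolon is skipped because it cannot start a match.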

The error:
     "::lhs" blessing only allowed if LHS is whitespace and alphanumerics
           LHS was <text3_rest>

What's the rule of thumb in such cases?
Thanks a lot,
Best Regards, Damian