Help on Marpa fails


Rk

Apr 29, 2022, 11:26:05 PM
to marpa parser
I started writing a simple parser using Marpa. I made some progress, but as soon as I added a new rule, my parser started failing.

use strict;
use warnings;
use warnings qw(FATAL utf8); # Fatalize encoding glitches.
use feature 'say' ;
#use Data::Dumper;
use Data::Dumper::Concise;
#use Log::Handler;
use Marpa::R2;
use PyGen;
#use Moo;

my @flt;
sub flattenArray {
        foreach (@_) {
          if (ref $_ eq "ARRAY") {
                flattenArray(@$_);
                }
          else {
                push @flt,$_;
                }
        }
}

sub PYGEN_Action::ifAction {
  my (undef, @a) = @_;   # @a slurps every child value; a second array here would always stay empty
  my $cndHsh = {
    'var'   => $a[0]->[1]->[0],
    'op'    => $a[0]->[1]->[1],
    'val'   => $a[0]->[1]->[2],
  };
  my $actHsh = {
    'act'   => $a[1]->[0]->[0],
    'var'   => $a[1]->[0]->[1],
    };
  print $$cndHsh{'var'};
  print "\n";
  my $pygen = PyGen->new();
  $pygen->write_if($cndHsh);
  $pygen->write_act($actHsh);
}


sub PYGEN_Action::verbAction {
  print "verbAction inputs\n";
  print @_;
}

sub PYGEN_Action::listAction {
  #unpack lol. output is global @flt;
  flattenArray(@_);
  my $pygen = PyGen->new();
  shift @flt; #discard empty hash.
  $pygen->write_list(\@flt);
}

my $g = Marpa::R2::Scanless::G->new({
        default_action => '::array',
        source         => \(<<'END_OF_SOURCE'),

:start    ::= language
#
# include begin
language          ::= if_statement action_statements action => ifAction
                      | action_statements action => verbAction
                      | ('List') variable ('is') comma_list  action => listAction

action_statements ::= action_statement*

if_statement      ::= if_type var_op_val
                      | if_type var_in_list
action_statement  ::= action_verb variable
                       | action_verb var_op_val
var_op_val        ::= variable operator value
                        | variable operator variable
var_in_list       ::= variable 'in' variable
comma_list        ::= str
                       | comma_list (',') str
#comma_list        ::= variable (',') variable
:discard        ~ ws
ws              ~ [\s]+

# include end
action_verb     ~  'remove' | 'Create' | 'Get'| 'Set' | 'CREATE' |'Revise'
if_type         ~ 'If'
variable        ~ [\w]+
operator        ~ 'has' | '>' | '<' | '='
value           ~ 'duplicate'
str             ~ [\w]+
END_OF_SOURCE
});
my $re = Marpa::R2::Scanless::R->new({ grammar => $g });
my $input = <<'INPUT';
 If documentTitle = A100
    remove Document
 List Author is Dante, Murakami, Ash, Harris
INPUT
print "Trying to parse:\n$input\n\n";
print "\n", $re->show_progress(0, -1);
$g->parse(\$input, 'PYGEN_Action');
=====================================
Output is
P0 @0-0 L0c0 language -> . if_statement action_statements
P1 @0-0 L0c0 language -> . action_statements
P2 @0-0 L0c0 language -> . 'List' variable 'is' comma_list
P3 @0-0 L0c0 action_statements -> . action_statement *
P4 @0-0 L0c0 if_statement -> . if_type var_op_val
P5 @0-0 L0c0 if_statement -> . if_type var_in_list
P6 @0-0 L0c0 action_statement -> . action_verb variable
P7 @0-0 L0c0 action_statement -> . action_verb var_op_val
P13 @0-0 L0c0 :start -> . language
Error in SLIF parse: No lexemes accepted at line 3, column 2
  Rejected lexeme #0: str; value="List"; length = 4
  Rejected lexeme #1: variable; value="List"; length = 4
  Rejected lexeme #2: 'List'; value="List"; length = 4
* String before error:  If documentTitle = A100\n\tremove Document\n\s
* The error was at line 3, column 2, and at character 0x004c 'L', ...
* here: List Author is Dante, Murakami, Ash, Harris\n
Marpa::R2 exception at gendeb.py line 99.
================================
I tested the rule ('List') variable ('is') comma_list  action => listAction
on its own in a separate parser program, and it worked fine.
Any insights into why this is happening?

Thanks,
Rk

Lukas Atkinson

Apr 30, 2022, 4:27:15 AM
to marpa-...@googlegroups.com
Your grammar does not allow the list in that location. Here is the
relevant excerpt:

```
:start    ::= language
#
# include begin
language          ::= if_statement action_statements action => ifAction
                      | action_statements action => verbAction
                      | ('List') variable ('is') comma_list  action => listAction
```

So you say that a document can
EITHER be an if-statement followed by any number of action statements,
OR any number of action statements,
OR a List statement.

This is why testing a List statement by itself works fine. But in your
input, you have an if-statement that includes one action statement,
followed by an unexpected List statement:

```
If documentTitle = A100
  remove Document
List Author is Dante, Murakami, Ash, Harris
```

Instead, you would likely want to change your grammar so that you can
have multiple statements of different type, for example:

```
language ::= statement*

statement ::= if_statement | action | list_statement
```

But this might become ambiguous. For example, consider this input:

```
If foo = bar
remove Doc1
remove Doc2
```

Should the `remove Doc2` action be within the if-statement, or should it
be an unconditional action? In your current grammar, both actions would
be part of the if-statement. In my suggested grammar, both are allowed
and the parse is ambiguous.
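
To see the ambiguity for yourself, here is a minimal sketch. It assumes a cut-down version of the suggested grammar (the rule and lexeme names below are simplified for illustration and are not your full grammar) and asks the recognizer how many parses it found via `ambiguity_metric()`:

```perl
use strict;
use warnings;
use Marpa::R2;

# Cut-down grammar for illustration only; names are assumptions.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
:default ::= action => ::array
:start ::= language

language          ::= statement*
statement         ::= if_statement | action_statement
if_statement      ::= ('If') condition action_statements
action_statements ::= action_statement*
condition         ::= variable operator variable
action_statement  ::= action_verb variable

:discard    ~ ws
ws          ~ [\s]+
action_verb ~ 'remove'
operator    ~ '='
variable    ~ [\w]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

my $input = <<'INPUT';
If foo = bar
remove Doc1
remove Doc2
INPUT

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
$recce->read(\$input);

# ambiguity_metric() is 1 for exactly one parse and 2 or more for several.
my $metric = $recce->ambiguity_metric();
print $metric > 1 ? "ambiguous\n" : "unambiguous\n";
```

With this input the sketch should report an ambiguous parse, because each `remove` line can belong either to the if-statement or to the top level. Note that the convenience method `$g->parse(...)` refuses ambiguous input, so using the recognizer directly is the easier way to inspect this.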

This problem is solved most easily if you can change your language to
have explicit delimiters, e.g. curly braces:

```
If foo = bar {
remove Doc1
}
remove Doc2
```

or other compound statement terminators, e.g. an `End` like in Ruby or Lua:

```
If foo = bar
remove Doc1
End
remove Doc2
```
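
As a rough sketch of the delimiter approach, assuming the same kind of cut-down grammar (simplified names, not your full grammar), the curly-brace variant could look like this. Swapping `('{')` and `('}')` for a single trailing `('End')` should give the `End` style instead:

```perl
use strict;
use warnings;
use Marpa::R2;

# Cut-down grammar for illustration only; names are assumptions.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
:default ::= action => ::array
:start ::= language

language          ::= statement*
statement         ::= if_statement | action_statement
if_statement      ::= ('If') condition ('{') action_statements ('}')
action_statements ::= action_statement*
condition         ::= variable operator variable
action_statement  ::= action_verb variable

:discard    ~ ws
ws          ~ [\s]+
action_verb ~ 'remove'
operator    ~ '='
variable    ~ [\w]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

my $input = <<'INPUT';
If foo = bar {
remove Doc1
}
remove Doc2
INPUT

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
$recce->read(\$input);

# The closing brace fixes where the if-block ends, so only one parse remains.
my $metric = $recce->ambiguity_metric();
print $metric == 1 ? "unambiguous\n" : "ambiguous or no parse\n";
```

Here `remove Doc1` can only be inside the block and `remove Doc2` can only be a top-level statement, so the sketch should report a single parse.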

Of course, many languages use the indentation level to indicate nesting,
for example Python or YAML. Marpa has no built-in features for
indentation-sensitive parsing. If you are trying to parse an existing
format or really want to use indentation to indicate the scope of the
conditional, you would have to use Marpa's event system to handle
indentation. You could use the same trick that Python uses: having
special/invisible INDENT and DEDENT tokens that mark where the indented
block begins and ends. These tokens are emitted whenever the indentation
changes. In the grammar, these tokens can then be used just like curly
braces in curly-brace languages. For example, you might then define your
if-statement like:

```
if_statement ::= ('If') condition (INDENT) action_statements (DEDENT)
```
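
The event-based version is more involved, so here is a simpler stand-in to show the idea: a plain Perl preprocessing pass (not Marpa's event system) that rewrites indentation changes as explicit `INDENT` and `DEDENT` lines before the text reaches the parser. The helper name `mark_indentation` is hypothetical, and the sketch assumes indentation uses spaces only:

```perl
use strict;
use warnings;

# Hypothetical helper: turn indentation changes into explicit INDENT/DEDENT lines.
# Assumes indentation uses spaces only (no tabs).
sub mark_indentation {
    my ($text) = @_;
    my @stack = (0);    # indentation widths of the currently open blocks
    my @out;
    for my $line (split /\n/, $text) {
        next if $line !~ /\S/;    # ignore blank lines
        my ($lead) = $line =~ /^( *)/;
        my $width = length $lead;
        if ($width > $stack[-1]) {
            # Deeper than the current block: open a new one.
            push @stack, $width;
            push @out, 'INDENT';
        }
        else {
            # Shallower: close blocks until the widths match again.
            while ($width < $stack[-1]) {
                pop @stack;
                push @out, 'DEDENT';
            }
            die "Inconsistent indentation: '$line'\n" if $width != $stack[-1];
        }
        push @out, substr($line, $width);
    }
    # Close any blocks still open at end of input.
    while (@stack > 1) {
        pop @stack;
        push @out, 'DEDENT';
    }
    return join("\n", @out) . "\n";
}

my $source = <<'SOURCE';
If foo = bar
    remove Doc1
remove Doc2
SOURCE

print mark_indentation($source);
```

For this input the marker lines end up around `remove Doc1` only, which is what the `if_statement` rule above expects. A real implementation would want marker text that cannot collide with ordinary identifiers, since `INDENT` also matches a `[\w]+` lexeme.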

Jeffrey Kegler

Apr 30, 2022, 11:45:37 AM
to marpa-...@googlegroups.com


Good answer, Lukas, thanks! -- jeffrey



Rk

Apr 30, 2022, 10:43:04 PM
to marpa parser
Hi Lukas,

Thanks for the detailed analysis and insight. It was very helpful.

I rewrote the grammar based on your input and ran into the same issue. After some thinking I identified the cause: my grammar accepted and parsed only one statement!

I modified the grammar as follows. I agree this is not the best approach, but for a prototype it resolved my original issue.

language          ::= if_statements action_statements| action_statements
The modified code is now:
===========================================================
use strict;
use warnings;
use warnings qw(FATAL utf8); # Fatalize encoding glitches.
use feature 'say' ;
#use Data::Dumper;
use Data::Dumper::Concise;
#use Log::Handler;
use Marpa::R2;
use PyGen;
#use Moo;

my @flt;
sub flattenArray {
        foreach (@_) {
          if (ref $_ eq "ARRAY") {
                flattenArray(@$_);
                }
          else {
                push @flt,$_;
                }
        }
}

sub PYGEN_Action::ifAction {
  my (undef, @a) = @_;   # @a slurps every child value; a second array here would always stay empty
  print Dumper(@a);
  print "\n";

}


sub PYGEN_Action::verbAction {
  print "verbAction inputs\n";
  print Dumper(@_);

}

sub PYGEN_Action::listAction {
  #unpack lol. output is global @flt;
  flattenArray(@_);
  my $pygen = PyGen->new();
  shift @flt; #discard empty hash.
  $pygen->write_list(\@flt);
}

my $g = Marpa::R2::Scanless::G->new({
        default_action => '::array',
        source         => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start    ::= language
# include begin
language          ::= if_statements action_statements| action_statements
if_statements         ::= if_statement*
if_statement           ::= 'If' condition_statement action_statements 'End' action=>ifAction
condition_statement     ::= variable operator value
                | variable operator variable
action_statements     ::= action_statement*
action_statement     ::= action_verb variable action => verbAction
                     | variable ('List') ('is') comma_list action => listAction
comma_list         ::= variable
                    | comma_list (',') variable

:discard        ~ ws
ws              ~ [\s]+

# include end
action_verb     ~  'remove' | 'Create' | 'Get'| 'Set' | 'CREATE' |'Revise'
variable        ~ [\w]+
operator        ~ 'has' | '>' | '<' | '='
value           ~ 'duplicate'
#str             ~ [\w]+

END_OF_SOURCE
});
my $re = Marpa::R2::Scanless::R->new({ grammar => $g });
my $input = <<'INPUT';
 If DocumentTitle = A222
    Revise Document
    Create Author
 End
Revise Author
Author List is Wodehouse, Shelley, Hardy, Murakami

INPUT
print "Trying to parse:\n$input\n\n";
print "\n", $re->show_progress(0, -1);
$g->parse(\$input, 'PYGEN_Action');
==========================================================
The output now includes the List statement.
Output
======

verbAction inputs
{}
"Revise"
"Document"
verbAction inputs
{}
"Create"
"Author"
"If"
[
  "DocumentTitle",
  "=",
  "A222",
]
[
  1,
  1,
]
"End"

verbAction inputs
{}
"Revise"
"Author"
AuthorList=['Wodehouse','Shelley','Hardy','Murakami',]