Parsing files containing several lines of data


phi...@free.fr

Aug 16, 2015, 10:57:23 AM
to marpa parser
Hello,
I would like to parse a text file containing the following data, with MARPA:

EUR=089980
GBP=063886
AUD=135358
...

When I run my program (see below), it only displays the first exchange rate, namely "089980", although my DSL file says that Catx (i.e., the exchange rate file) should contain one or more Expressions ("Expression+").

Furthermore, why can't you specify tokens with specific lengths in MARPA DSLs, e.g.,

Label ~ \w{3}

In the Catx data file's case, [currency] labels are always 3 characters long.

Many thanks.

Best regards,

Philippe



Perl script:

-------------------
#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

my $data_file = '/Users/philippe/Desktop/MARPA/data.txt';
my $dsl_file = '/Users/philippe/Desktop/MARPA/catx.dsl';
my $input = slurp_file($data_file);
my $dsl = slurp_file($dsl_file);

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce = Marpa::R2::Scanless::R->new(
    { grammar => $grammar, semantics_package => 'My_Actions' } );
my $length_read = $recce->read ( \$input );
die "Read ended after $length_read of ", length $input, " characters"
    if $length_read != length $input;
   
if ( my $ambiguous_status = $recce->ambiguous() ) {
    chomp $ambiguous_status;
    die "Parse is ambiguous\n", $ambiguous_status;
}
my $value_ref = $recce->value;
print "$$value_ref\n";

sub slurp_file {
    my $file = shift;
    local $/ = undef;
    open my $fh, '<', $file or die "$!";
    my $data = <$fh>;
    close $fh;
    return $data;
}
sub My_Actions::do_extract_rate {
    my ( undef, $t1, undef, $t2 ) = @_;
    return $t2;
}
--------------------

DSL file:

-------------------

:default ::= action => [name,values]
lexeme default = latm => 1

Catx ::= Expression+ action => ::first
Expression ::= Label '=' Rate action => do_extract_rate
Label ~ [\w]+
Rate ~ [\d]+
:discard ~ whitespace
:discard ~ cr
whitespace ~ [\s]+
cr ~ [\n]+



--------------------




Jeffrey Kegler

Aug 16, 2015, 11:06:40 AM
to Marpa Parser Mailing List
As a hasty guess (I have not tested it) the problem may be here:


     Catx ::= Expression+ action => ::first

The "::first" causes only the first expression to be seen by the semantics.
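A minimal, self-contained sketch of that fix: the grammar from the original post is inlined as a heredoc, "::first" is dropped, and the sample data is inlined instead of being read from data.txt. Giving Catx "action => [values]" is an assumption of this sketch, so that the parse value is just the list of rates; the separate cr rule is also dropped because [\s]+ already matches newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;

# The grammar from the original post with "::first" removed from Catx.
# Assumption: "action => [values]" on Catx, so the value is just the
# list of rates. The separate cr rule is dropped because [\s]+ already
# matches newlines.
my $dsl = <<'END_OF_DSL';
:default ::= action => [name,values]
lexeme default = latm => 1

Catx ::= Expression+ action => [values]
Expression ::= Label '=' Rate action => do_extract_rate
Label ~ [\w]+
Rate ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

# Inline sample data instead of reading data.txt.
my $input = "EUR=089980\nGBP=063886\nAUD=135358\n";

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new(
    { grammar => $grammar, semantics_package => 'My_Actions' } );
my $length_read = $recce->read( \$input );
die "Read ended after $length_read of ", length $input, " characters"
    if $length_read != length $input;

# value() returns a reference to the parse value, or undef if there is no parse.
my $value_ref = $recce->value;
die "No parse\n" if not defined $value_ref;
my @rates = @{ ${$value_ref} };
print "$_\n" for @rates;

# Same action as in the original post: keep only the Rate.
sub My_Actions::do_extract_rate {
    my ( undef, $label, undef, $rate ) = @_;
    return $rate;
}
```

With this grammar the script should print all three rates, one per line, instead of only the first.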

Re no '\w{3}'  -- here 'the better became the enemy of the good'.  I developed a scheme for efficient binarization of arbitrary counts and sequences, but it won't come out in Marpa::R2, which is now frozen.  The next version is Kollos, into which the binarized implementation is already coded.  But I've heard that someone else has a layer on top of Marpa::R2, which may come out shortly, and might include that feature and a lot of other syntactic sugar.

Hope this helps, jeffrey




Ruslan Shvedov

Aug 16, 2015, 11:17:18 AM
to marpa-...@googlegroups.com
Hello,

On Sun, Aug 16, 2015 at 1:19 PM, <phi...@free.fr> wrote:
> Hello,
> I would like to parse a text file containing the following data, with MARPA:
>
> EUR=089980
> GBP=063886
> AUD=135358
> ...
>
> When I run my program (see below), it only displays the first exchange rate, namely "089980", although my DSL file says that Catx (i.e., the exchange rate file) should contain one or more Expressions ("Expression+").
The adverb "action => ::first" in "Catx ::= Expression+ action => ::first" means that the rule's value is limited to the first element of the Expression+ sequence. When I removed it, the program showed all three rates.
 
> Furthermore, why can't you specify tokens with specific lengths in MARPA DSLs, e.g.,
>
> Label ~ \w{3}
>
> In the Catx data file's case, [currency] labels are always 3 characters long.
You can write Label ~ [\w] [\w] [\w] to achieve the same effect.
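For example, a sketch of a grammar that accepts only three-character labels. parse_rates is a hypothetical helper that returns undef when read() dies (or there is no parse), so a four-letter label such as "EURO" is rejected; Expression uses "action => [values]" only to keep the example free of custom action subs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;

# Label is exactly three word characters: three character classes in a
# row, since the Marpa::R2 DSL has no \w{3} style quantifier.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1

Catx ::= Expression+ action => [values]
Expression ::= Label '=' Rate action => [values]
Label ~ [\w] [\w] [\w]
Rate ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: returns the parse value (an array ref), or undef
# if read() dies on input that cannot be lexed or there is no parse.
sub parse_rates {
    my ($input) = @_;
    my $value_ref = eval {
        my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
        $recce->read( \$input );
        $recce->value;
    };
    return defined $value_ref ? ${$value_ref} : undef;
}

for my $input ( "EUR=089980\nGBP=063886\n", "EURO=089980\n" ) {
    my $value = parse_rates($input);
    print defined $value ? "parsed\n" : "rejected\n";
}
```

In the second input, Label can only match "EUR", so the lexer then finds "O" where it expects "=" and read() dies.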
 
> Many thanks.
Hope this helps.
 

phi...@free.fr

Aug 16, 2015, 11:41:44 AM
to marpa parser
Removing ::first did the trick, thanks.

Is there a more idiomatic way in MARPA to store the currency labels and their respective rates than to store them in a hash in the closure?

my %rate_hash = ();
...

and then

sub My_Actions::do_extract_rate {
    my ( undef, $t1, undef, $t2 ) = @_;
    $rate_hash{$t1} = $t2;
}

Ruslan Shvedov

Aug 16, 2015, 12:02:01 PM
to marpa-...@googlegroups.com
You can do:

sub My_Actions::do_extract_rate {
    my ( undef, $t1, undef, $t2 ) = @_;
    return [ $t1, $t2 ];
}

This will return [ [ 'EUR', '089980' ], [ 'GBP', '063886' ], [ 'AUD', '135358' ] ] as the parse value.
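One way to go a step further without a file-level %rate_hash: keep the pair-returning action and add a Catx action that collects the pairs into a hash reference, so the hash becomes the parse value itself. The action name do_rate_hash and the DSL lines in the comment are hypothetical. As far as I understand the array descriptors, if Catx keeps no explicit action, the original ":default ::= action => [name,values]" would put the string 'Catx' ahead of the pairs. To keep the sketch runnable without Marpa::R2, the actions are called directly with the child values the parser would pass, with undef standing in for the per-parse object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical DSL lines these actions are meant for:
#   Catx       ::= Expression+ action => do_rate_hash
#   Expression ::= Label '=' Rate action => do_extract_rate

# Each Expression yields a [ label, rate ] pair, as in Ruslan's reply.
sub My_Actions::do_extract_rate {
    my ( undef, $label, undef, $rate ) = @_;
    return [ $label, $rate ];
}

# Catx receives one pair per Expression and collects them into a hash ref.
sub My_Actions::do_rate_hash {
    my ( undef, @pairs ) = @_;
    my %rates;
    for my $pair (@pairs) {
        my ( $label, $rate ) = @{$pair};
        $rates{$label} = $rate;
    }
    return \%rates;
}

# Stand-in for the parser: call the actions with the child values Marpa
# would pass, using undef for the unused per-parse object.
my @expressions = (
    [ 'EUR', '=', '089980' ],
    [ 'GBP', '=', '063886' ],
    [ 'AUD', '=', '135358' ],
);
my @pairs = map { My_Actions::do_extract_rate( undef, @{$_} ) } @expressions;
my $rates = My_Actions::do_rate_hash( undef, @pairs );
print "$_ => $rates->{$_}\n" for sort keys %{$rates};
```

In an actual parse with those DSL lines, ${ $recce->value } would then be this hash reference, and no state needs to live outside the semantics.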
