Benchmark of a simple arithmetic parser (vs Perl 5 regex)

59 views
Skip to first unread message

perl...@gmail.com

unread,
Jun 17, 2016, 11:36:37 PM6/17/16
to marpa parser
Hi guys,

I've been using Perl 5.010 regex to do some parsing, e.g. in Data::Csel [1] to parse CSS-selector-like expression, due to its relatively low startup overhead compared to Marpa or Regexp::Grammars. A couple of days ago, after reading about a topic in perl6 subreddit [2], I did a comparison benchmark [5] for a simple arithmetic parser using perl [3] vs Marpa [4]. My questions:

1) how do I improve the Marpa version's performance?
2) how to remove the "Earley item count (N) exceeds warning threshold"? This happens for 1+1+..+1 (100x) expression but not for the 20x or below.
3) how do I make right associativity work? The Marpa version still evaluates ** operator left to right.

regards,
perlancar

[1] https://metacpan.org/pod/Data::CSel
[2] https://www.reddit.com/r/perl6
[3] https://metacpan.org/pod/release/PERLANCAR/PERLANCAR-Parse-Arithmetic-0.001/lib/PERLANCAR/Parse/Arithmetic.pm
[4] https://metacpan.org/pod/release/PERLANCAR/PERLANCAR-Parse-Arithmetic-0.001/lib/PERLANCAR/Parse/Arithmetic/Marpa.pm
[5] https://metacpan.org/pod/release/PERLANCAR/Bencher-Scenarios-PERLANCARParseArithmetic-0.002/lib/Bencher/Scenario/PERLANCARParseArithmetic/parse_arithmetic.pm

Jeffrey Kegler

unread,
Jun 17, 2016, 11:49:16 PM6/17/16
to Marpa Parser Mailing LIst
Quick answers for 2 & 3

2.) See https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/R.pod#too_many_earley_items for how to raise this limit or eliminate it entirely.  Although, the fact you are hitting this limit is almost always a sign your grammar is too ambiguous.

3.) Look at the grammar in the synopsis of https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/DSL.pod.  You are breaking your arithmetic expression into separate statements and "assoc => right" is only effective within a single precedenced statement.  This means the way you wrote it, it has no effect.

Doing a precedenced statement for 3) may fix 1) and 2)  as well.  These answers are off the top of my head and untested.


--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

perl...@gmail.com

unread,
Jun 18, 2016, 7:03:30 AM6/18/16
to marpa parser
Thanks, my initial googling didn't point to the Scanless/R.pod document. Anyway I didn't raise the limit and simply modified the grammar to be like in Scanless/DSL.pod. I entirely don't understand why the previous grammar was considered too ambiguous by Marpa though, will it seems to be doing okay with perl regex or Regexp::Grammars or perl 6. The right associativity now works though.

The new version improves the speed, but not drastically [1]. As the number of terms in the benchmark expression (1+1+..+1) is increased to hundreds and up to a thousand, the Marpa version seems to be close to 3x slower than the regex version.

[1] https://metacpan.org/pod/release/PERLANCAR/Bencher-Scenarios-PERLANCARParseArithmetic-0.004/lib/Bencher/Scenario/PERLANCARParseArithmetic/parse_arithmetic.pm

Ruslan Shvedov

unread,
Jun 18, 2016, 9:51:15 AM6/18/16
to marpa-...@googlegroups.com
Don't know to what extent this is a time-waster, but 

my $h = shift;

in [1] seems to be unneeded. 

Another (off the top of my head and untested) way to speed up would be to let value() just build an abstract-syntax tree (AST) by specifying

:default ::= action => [ name, value ]
lexeme default = action => [ name, value] latm => 1

or

:default ::= action => [ name, value ]
lexeme default = action => [ value] latm => 1

or even (untried yet)

:default ::= action => [ name, value ]
lexeme default = action => ::first latm => 1

in the grammar and trying a non-recursive traversal on AST thus avoiding action sub calls that could arguably be faster.

Ruslan Shvedov

unread,
Jun 18, 2016, 9:54:56 AM6/18/16
to marpa-...@googlegroups.com
There is also this a blog post by Jeffrey -- http://blogs.perl.org/users/jeffrey_kegler/2012/08/marpa-v-perl-regexes-a-rematch.html -- which shows what to expect in Marpa vs. Perl regexes case.


On Sat, Jun 18, 2016 at 2:03 PM, <perl...@gmail.com> wrote:

Jeffrey Kegler

unread,
Jun 18, 2016, 12:22:19 PM6/18/16
to Marpa Parser Mailing LIst
The other parsers deal with ambiguity by throwing it away.  So, no, they would not have a problem with excessive ambiguity. :-)

An interesting question is if the only parse not thrown away is actually the one you want.  If I read the code correctly, you're not actually evaluating the expressions.  I think it would be important to do so, and to see if they all produce the same answer.

Also, I note that all test examples are of the form "1+1+1+ [...]".  As the blog post Ruslan mentioned details, Marpa::R2 has a higher constant than regular expressions, but is linear for more grammars than any other parser out there.  So the pattern is that regexes are a huge win on the simple ones, and a huge loss once regexes start going linear.

Jeffrey Kegler

unread,
Jun 18, 2016, 10:29:29 PM6/18/16
to Marpa Parser Mailing LIst
Re why Marpa considers your grammar ambiguous.  Marpa's big departure from yacc and PEG is that it does not leave you guessing why it does what it does.

If Marpa considers a parse ambiguous you can ask it why: http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/Scanless/R.pod#ambiguous%28%29

That just picks two of the parses and points out the difference, which will often give you the idea.  For more details, you can get all the parse results using multiple calls to $recce->value().

Btw, Marpa::R2's $grammar->parse() call uses $grammar->ambiguous internally -- it considers parse ambiguity an error and reports it as such automatically.  If this is in keeping with the philosophy of MarpaX::Simple, you may want to include it.

On Sat, Jun 18, 2016 at 4:03 AM, <perl...@gmail.com> wrote:

perl...@gmail.com

unread,
Jun 19, 2016, 4:25:43 AM6/19/16
to marpa parser
Skipping shifting the first argument saves some time, but less than 1% of total runtime according to my benchmark. I'll think about your other suggestion.

perl...@gmail.com

unread,
Jun 19, 2016, 4:27:00 AM6/19/16
to marpa parser
Yes, thanks, I remember reading the article but didn't pay too much attention to it at the time. I'll reread it again.

perl...@gmail.com

unread,
Jun 19, 2016, 4:31:58 AM6/19/16
to marpa parser
On Saturday, June 18, 2016 at 11:22:19 PM UTC+7, Jeffrey Kegler wrote:
An interesting question is if the only parse not thrown away is actually the one you want.  If I read the code correctly, you're not actually evaluating the expressions.  I think it would be important to do so, and to see if they all produce the same answer.

Did you mean not evaluating in the perl regex version? I think I do, with the embedded code zero-width assertions (?{ ... }). All the versions of the parser (regex, Marpa, Pegex) do return the result of the expression. For example:

% bencher -m PERLANCARParseArithmetic/parse_arithmetic --include-dataset-name '1+1+1+1+1' --show-items-results
#0 (participant=PERLANCAR::Parse::Arithmetic::parse_arithmetic):
5

#1 (participant=PERLANCAR::Parse::Arithmetic::Marpa::parse_arithmetic):
5

#2 (participant=PERLANCAR::Parse::Arithmetic::NoHash::parse_arithmetic):
5

#3 (participant=PERLANCAR::Parse::Arithmetic::Pegex::parse_arithmetic):
5


Also, I note that all test examples are of the form "1+1+1+ [...]".  As the blog post Ruslan mentioned details, Marpa::R2 has a higher constant than regular expressions, but is linear for more grammars than any other parser out there.  So the pattern is that regexes are a huge win on the simple ones, and a huge loss once regexes start going linear.

I do plan to exercise the parsers with other expressions.


perl...@gmail.com

unread,
Jun 19, 2016, 4:33:38 AM6/19/16
to marpa parser
Thanks for this information, I might add an interface for this to MarpaX::Simple when I develop my grammars further.

Jeffrey Kegler

unread,
Jun 19, 2016, 11:31:47 AM6/19/16
to Marpa Parser Mailing LIst
I only read the code hastily, too hastily to tell if results were evaluated and cross-checked, or not.

Reply all
Reply to author
Forward
0 new messages