New developer's release, with efficient Longest Acceptable Tokens Matching

49 views
Skip to first unread message

Jeffrey Kegler

unread,
Feb 14, 2014, 8:30:51 PM2/14/14
to Marpa Parser Mailing LIst
Just in time for Valentine's Day, I've uploaded a developer's release
-- Marpa-R2-2.079_015.  This one
has significant, visible value-added for the user -- efficient
Longest Acceptable Tokens Matching (LATM).  I sometimes call this
"cheap forgiveness", because the previous implementation relied on
"forgiving" rejected tokens.  And for that reason, it is implemented
via the "forgiving" adverb.

This is one of those nice improvements, that just "drop in" -- if you
using the forgiving adverb, you get the improvement automatically.
I encourage folks who've been wanting to use the forgiving adverb as
the default in their grammars to do so.  As a result of refactorings in
Marpa, using the forgiving adverb is now more efficient than not using it.
In other words, LATM is more efficient than LTM (Longest Tokens Matching).

LATM most important advantage is that it is more flexible than LTM.
With both LATM and LTM, an application has to make sure the desired
token is either the longest found, or else that it ties for longest.
The difference between LTM and LATM, is that LTM did not take into
account context, while LATM does.  With LTM, the right token must be
longer or as long as any other token, including all those tokens that
the G1 parser would reject.  With LATM, tokens that are not acceptable to
the G1 parser in the current context are not counted in determining the
"longest" tokens.

One way to think of it is that LATM is "smarter" than LTM.  This
"smartness" allows you to be more aggressive in designing your lexer --
it increases the likelihood that the parser will know "what you meant".
When context is used to help make the decision, the chance that two
of your tokens will treat the same input string in conflicting ways
is reduced.  With LATM, unacceptable tokens will not cause conflicts.

The LATM implementation is also more efficient.  Now the lexer only looks
for lexemes that might be accepted.  The resulting parser is usually
simpler, often much simpler.  Note that, as usual with efficiency
improvements at the C language level, this may not be measureable --
random fluctations in the Perl overhead tend to swamp any changes,
either way, in the efficiency of the C language implementation.

If I had things to do over, LATM would be the default.
Instead backward compatibility will triumph.  The backward
incompatibility is that some inputs which had failed for some
SLIF DSL's, with an error message, will now succeed.  Usually
unexpected success is a good thing, but this is *not* always true.

Once this change comes to an indexed release, I recommend that everyone,
as their preferred design choice, start all new scripts with

lexeme default = forgiving => 1

Also, you should also be able to convert
most old scripts to use LATM without a problem, making them faster and
easier to extend.

Marpa-R2-2.079_015 is a release candidate.

Ruslan Shvedov

unread,
Feb 15, 2014, 11:28:05 AM2/15/14
to marpa-...@googlegroups.com
Marpa-R2-2.079_015 installed ok for me on cygwin (5.14.2), windows xp (5.18.1, 5.14.2) and windows 7 (5.18.2). LATM by default is a great thing to see and work with.

I have a (probably pathological) test case that fails on win32 (winxp (5.18.1, 5.14.2) and win7 (5.18.2)) but passes on cygwin at https://github.com/rns/win32-fail-cygwin-pass that contains 2 scripts and their output files. 

I' be grateful if Jean-Damien or somebody else on a windows machine run them to confirm or refute.



--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ruslan Shvedov

unread,
Feb 15, 2014, 12:28:46 PM2/15/14
to marpa-...@googlegroups.com
Please ignore this message, this must be some cpan glitch (cpan croaked about the checksum [1], but I agreed) — once I downloaded and installed Marpa-R2-2.079_015.tar.gz manually (perl Build.pl, Build, Build test, Build install), the above tests pass on both win32 and cygwin. Also, install by cpan showed some msvc compiler warnings that was not the case with manual install. Strange thing indeed.

[1]
Fetching with LWP:

Warning: No checksum for Marpa-R2-2.077_015.tar.gz in C:\.cpan\sources\authors\id\J\JK\JKEGL\CHECKSUMS.

The cause for this may be that the file is very new and the checksum
has not yet been calculated, but it may also be that something is
going awry right now.
Proceed? [yes] 
CPAN: Compress::Zlib loaded ok (v2.06)


Ron Savage

unread,
Feb 15, 2014, 6:13:04 PM2/15/14
to marpa-...@googlegroups.com
Marpa::R2 V 2.079015
Counts: Tests: 542. Modules: 8. Passes: 8. Fails: 0
Duration: 1 minute and 37 seconds

Reply all
Reply to author
Forward
0 new messages