New developer's release, with efficient Longest Acceptable Tokens Matching

Jeffrey Kegler

unread,

Feb 14, 2014, 8:30:51 PM2/14/14

to Marpa Parser Mailing LIst

Just in time for Valentine's Day, I've uploaded a developer's release

-- Marpa-R2-2.079_015. This one

has significant, visible value-added for the user -- efficient

Longest Acceptable Tokens Matching (LATM). I sometimes call this

"cheap forgiveness", because the previous implementation relied on

"forgiving" rejected tokens. And for that reason, it is implemented

via the "forgiving" adverb.

This is one of those nice improvements, that just "drop in" -- if you

using the forgiving adverb, you get the improvement automatically.

I encourage folks who've been wanting to use the forgiving adverb as

the default in their grammars to do so. As a result of refactorings in

Marpa, using the forgiving adverb is now more efficient than not using it.

In other words, LATM is more efficient than LTM (Longest Tokens Matching).

LATM most important advantage is that it is more flexible than LTM.

With both LATM and LTM, an application has to make sure the desired

token is either the longest found, or else that it ties for longest.

The difference between LTM and LATM, is that LTM did not take into

account context, while LATM does. With LTM, the right token must be

longer or as long as any other token, including all those tokens that

the G1 parser would reject. With LATM, tokens that are not acceptable to

the G1 parser in the current context are not counted in determining the

"longest" tokens.

One way to think of it is that LATM is "smarter" than LTM. This

"smartness" allows you to be more aggressive in designing your lexer --

it increases the likelihood that the parser will know "what you meant".

When context is used to help make the decision, the chance that two

of your tokens will treat the same input string in conflicting ways

is reduced. With LATM, unacceptable tokens will not cause conflicts.

The LATM implementation is also more efficient. Now the lexer only looks

for lexemes that might be accepted. The resulting parser is usually

simpler, often much simpler. Note that, as usual with efficiency

improvements at the C language level, this may not be measureable --

random fluctations in the Perl overhead tend to swamp any changes,

either way, in the efficiency of the C language implementation.

If I had things to do over, LATM would be the default.

Instead backward compatibility will triumph. The backward

incompatibility is that some inputs which had failed for some

SLIF DSL's, with an error message, will now succeed. Usually

unexpected success is a good thing, but this is *not* always true.

Once this change comes to an indexed release, I recommend that everyone,

as their preferred design choice, start all new scripts with

lexeme default = forgiving => 1

Also, you should also be able to convert

most old scripts to use LATM without a problem, making them faster and

easier to extend.

Marpa-R2-2.079_015 is a release candidate.

Ruslan Shvedov

unread,

Feb 15, 2014, 11:28:05 AM2/15/14

to marpa-...@googlegroups.com

Marpa-R2-2.079_015 installed ok for me on cygwin (5.14.2), windows xp (5.18.1, 5.14.2) and windows 7 (5.18.2). LATM by default is a great thing to see and work with.

I have a (probably pathological) test case that fails on win32 (winxp (5.18.1, 5.14.2) and win7 (5.18.2)) but passes on cygwin at https://github.com/rns/win32-fail-cygwin-pass that contains 2 scripts and their output files.

I' be grateful if Jean-Damien or somebody else on a windows machine run them to confirm or refute.

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ruslan Shvedov

unread,

Feb 15, 2014, 12:28:46 PM2/15/14

to marpa-...@googlegroups.com

Please ignore this message, this must be some cpan glitch (cpan croaked about the checksum [1], but I agreed) — once I downloaded and installed Marpa-R2-2.079_015.tar.gz manually (perl Build.pl, Build, Build test, Build install), the above tests pass on both win32 and cygwin. Also, install by cpan showed some msvc compiler warnings that was not the case with manual install. Strange thing indeed.

[1]

Fetching with LWP:

http://www.cpan.org/authors/id/J/JK/JKEGL/CHECKSUMS

Warning: No checksum for Marpa-R2-2.077_015.tar.gz in C:\.cpan\sources\authors\id\J\JK\JKEGL\CHECKSUMS.

The cause for this may be that the file is very new and the checksum

has not yet been calculated, but it may also be that something is

going awry right now.

Proceed? [yes]

CPAN: Compress::Zlib loaded ok (v2.06)

Ron Savage

unread,

Feb 15, 2014, 6:13:04 PM2/15/14

to marpa-...@googlegroups.com

Marpa::R2 V 2.079015

Counts: Tests: 542. Modules: 8. Passes: 8. Fails: 0

Duration: 1 minute and 37 seconds

Reply all

Reply to author

Forward