-- Marpa-R2-2.079_015. This one
has significant, visible value-added for the user -- efficient
Longest Acceptable Tokens Matching (LATM). I sometimes call this
"cheap forgiveness", because the previous implementation relied on
"forgiving" rejected tokens. And for that reason, it is implemented
via the "forgiving" adverb.
This is one of those nice improvements that just "drop in" -- if you are
using the forgiving adverb, you get the improvement automatically.
I encourage folks who've been wanting to use the forgiving adverb as
the default in their grammars to do so. As a result of refactorings in
Marpa, using the forgiving adverb is now more efficient than not using it.
In other words, LATM is more efficient than LTM (Longest Tokens Matching).
LATM's most important advantage is that it is more flexible than LTM.
With both LATM and LTM, an application has to make sure the desired
token is either the longest found, or else that it ties for longest.
The difference between LTM and LATM is that LTM does not take context
into account, while LATM does. With LTM, the right token must be
at least as long as any other token, including all those tokens that
the G1 parser would reject. With LATM, tokens that are not acceptable to
the G1 parser in the current context are not counted in determining the
"longest" tokens.
One way to think of it is that LATM is "smarter" than LTM. This
"smartness" allows you to be more aggressive in designing your lexer --
it increases the likelihood that the parser will know "what you meant".
When context is used to help make the decision, the chance that two
of your tokens will treat the same input string in conflicting ways
is reduced. With LATM, unacceptable tokens will not cause conflicts.
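
To make the LTM/LATM difference concrete, here is a toy sketch in Python.
It is not Marpa's implementation. The lexeme names, the regexes, and the
"acceptable" set (standing in for whatever the G1 parser expects at the
current position) are all invented for illustration.

```python
import re

# Hypothetical lexeme table: names and patterns are invented for this sketch.
LEXEMES = {
    "word": re.compile(r"[a-z]+"),
    "hex": re.compile(r"[0-9a-f]+"),
    "number": re.compile(r"[0-9]+"),
}


def longest_tokens(text, pos, candidates):
    """Return (length, names) of the longest match(es) among candidates at pos."""
    best_len, best = 0, []
    for name in candidates:
        m = LEXEMES[name].match(text, pos)
        if m is None or m.end() == pos:
            continue  # no match, or an empty match, at this position
        length = m.end() - pos
        if length > best_len:
            best_len, best = length, [name]
        elif length == best_len:
            best.append(name)  # ties for longest are all kept
    return best_len, sorted(best)


def ltm(text, pos, acceptable):
    """LTM: every lexeme competes; rejected tokens are dropped afterwards."""
    length, names = longest_tokens(text, pos, list(LEXEMES))
    return length, [n for n in names if n in acceptable]


def latm(text, pos, acceptable):
    """LATM: only lexemes acceptable in the current context compete."""
    return longest_tokens(text, pos, [n for n in LEXEMES if n in acceptable])


if __name__ == "__main__":
    # Assume the parser only expects a "word" at this point.
    print(ltm("cafe42", 0, {"word"}))   # (6, [])
    print(latm("cafe42", 0, {"word"}))  # (4, ['word'])
```

In this sketch, under LTM the six-character "hex" match beats "word" and is
then rejected, so no usable token is left. Under LATM "hex" never competes,
and "word" is found as the longest acceptable token.
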
The LATM implementation is also more efficient. Now the lexer only looks
for lexemes that might be accepted. The resulting parser is usually
simpler, often much simpler. Note that, as usual with efficiency
improvements at the C language level, this may not be measurable --
random fluctuations in the Perl overhead tend to swamp any changes,
either way, in the efficiency of the C language implementation.
If I had things to do over, LATM would be the default.
Instead, backward compatibility will triumph. The backward
incompatibility is that some inputs which had failed, with an error
message, for some SLIF DSLs will now succeed. Usually
Once this change comes to an indexed release, I recommend that everyone,
as their preferred design choice, start all new scripts with

    lexeme default = forgiving => 1

You should also be able to convert
most old scripts to use LATM without a problem, making them faster and
easier to extend.
Marpa-R2-2.079_015 is a release candidate.