I've just put Marpa-R2-2.021_004
<https://metacpan.org/release/JKEGL/Marpa-R2-2.021_004/> (a developer's
release) on CPAN.
Its most important new feature is a configurable HTML parser.
Marpa::HTML already accepted user-defined tokens. It silently
accepted these in the input, giving them very liberal defaults for
where they could go and what they could contain. As of 2.021_004,
Marpa::R2::HTML allows the user to define the context and contents
of new tokens. It also allows the user to redefine these for existing ones.
The definition of the HTML is isolated in its own file, which
is self-contained -- any changes needed to redefine the HTML variant
are confined to Marpa::R2::HTML::dev::Configuration.pm. Substituting
different files could change Marpa::R2::HTML to parse loose or
strict HTML 4.01, for example; or to use a "lowest common denominator"
based on the set of browsers most important to your situation.
This file is less than 3 pages of code.
It could be shortened significantly by removing the definitions
for deprecated tags, whose behavior would then revert to the liberal defaults.
(In this configurable version, Marpa::R2::HTML allows tags not
mentioned in the configuration.)
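
To make the fallback behavior concrete, here is a minimal sketch in Python. It is
an illustration only and does not reproduce Marpa::R2::HTML's real configuration
format, which this post does not show. The names ElementRule, LIBERAL_DEFAULT,
CONFIG, rule_for and without are all hypothetical, and the context and contents
values are made up for the example.

```python
# Hypothetical illustration only: NOT Marpa::R2::HTML's configuration format.
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRule:
    context: str   # where the element may appear (made-up vocabulary)
    contents: str  # what the element may contain (made-up vocabulary)


# Assumed liberal fallback for any tag the table does not mention.
LIBERAL_DEFAULT = ElementRule(context="anywhere", contents="anything")

# A tiny, invented configuration table.
CONFIG = {
    "p": ElementRule(context="block", contents="inline"),
    "center": ElementRule(context="block", contents="flow"),  # deprecated tag
}


def rule_for(tag, config=CONFIG):
    """Return the configured rule for a tag, or the liberal default."""
    return config.get(tag.lower(), LIBERAL_DEFAULT)


def without(config, *tags):
    """Return a copy of the table with the given tags' definitions removed."""
    return {name: rule for name, rule in config.items() if name not in tags}
```

In this sketch, dropping the entry for a deprecated tag such as "center" shortens
the table, and a later lookup for that tag simply returns the liberal default
instead of failing. An unknown tag is treated the same way.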
Factoring out the definition of the HTML into this very concise
form makes Marpa::R2::HTML much easier to maintain than the
alternatives. In many HTML parsers, once you've determined which
3 pages of code contain the problem, you can consider the problem
"Configurable" here for the moment means "very easily forked." The
extension from isolating the HTML in its own file to allowing
the user to supply their own configuration file (or even to alter the
configuration via command-line options) is very direct. I am
undecided what priority to give this -- there are many directions
in which to go at this point.