Significant internal updates to PGE

Patrick R. Michaud
Oct 14, 2005, 3:10:44 PM
to perl6-i...@perl.org, perl6-c...@perl.org
I've just checked in (r9843) a new version of PGE (the grammar engine)
with some substantial changes to its internal calling sequences and
data structures. For those who are using PGE according to its
defined interfaces, things work largely the same; anyone who is
developing for PGE or making use of PGE's internals may see some
differences, as described below.

The biggest difference is that single-element captures in Match objects
are now internally represented with the same structure as seen by the
"outside world". For example, with an expression like

    rule = p6rule(":w (mv) [ (\w+)]*")

$/[0] (aka $0) ends up with a single Match object, while $/[1] (aka $1)
is an array of Match objects because of the "*" quantifier.

In previous versions of PGE, the PGE::Match class internally stored
all captures (quantified or not) in arrays, and used an "isarray"
property on the array to indicate if it was to act as a single
Match object or an array of Match objects.

In the version I've just checked in, the "isarray" property is gone,
and the $0, $1, $2, ... captures are stored internally as single
Match objects (unquantified) or arrays as specified by the rule.
In particular, this means one can now use the "get_array" and
"get_hash" methods on Match objects and get exactly the correct
structure.
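
As a rough picture of that change, here is a small Python model. It is
for illustration only and is not PGE code: the Match class, the dict
layout used for the old scheme, and the sample input "mv foo bar" are
all hypothetical stand-ins chosen to mirror the (mv) [ (\w+)]* example
above.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for a match object: text plus positional captures."""
    text: str
    captures: list = field(default_factory=list)

    def get_array(self):
        # In this model the stored list already has the externally visible shape.
        return self.captures


# Old scheme (modelled): every capture sits in a list, and a flag says
# whether to read it as a single match or as an array of matches.
old_captures = [
    {"isarray": False, "items": [Match("mv")]},
    {"isarray": True, "items": [Match("foo"), Match("bar")]},
]


def old_view(slot):
    """Translate an old-style slot into what the outside world sees."""
    return slot["items"] if slot["isarray"] else slot["items"][0]


# New scheme (modelled): each slot holds the outside-world shape directly,
# a single Match for (mv) and a list of Match objects for the quantified group.
new_match = Match("mv foo bar", [Match("mv"), [Match("foo"), Match("bar")]])

# Both schemes describe the same captures; the new one needs no translation.
assert [old_view(s) for s in old_captures] == new_match.get_array()
```

The point of the sketch is only that the stored form and the visible
form now coincide, so an accessor can hand back what is stored.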

Other key differences in this new version:

- PGE's internal rule calling conventions (e.g., to PIR-coded
subrules such as <alpha>, <upper>, etc.) are now consistent
with rules generated by PGE itself. Thus, if one wants to
call the <alpha> rule directly, it can be done with:

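    .local pmc alpha
    # look up the built-in <alpha> rule in the PGE::Rule namespace
    alpha = find_global "PGE::Rule", "alpha"
    # call it like any other sub; the result is a Match object
    $P0 = alpha("Some string")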

and $P0 will be a Match object for the "S". Note that many of
PGE's built-in rules tend to act as if the :p modifier is
set -- in this case anchoring the match to the beginning of
the string.

- The PIR code that PGE generates can now be stored externally
and directly included by other PIR modules. For example, when
a previous version of PGE was loaded, the initialization code
executed at load-time would dynamically compile and install
<ident> and <name> subrules, thus slowing down program
initialization. In this new version, the PIR code for
<ident> and <name> is generated as part of building PGE, so
that PGE.pbc already contains the bytecode for these precompiled
rules when it is loaded.

- PGE Match objects can now distinguish array keys from hash keys
that begin with a digit. Previously Match objects assumed that
any key starting with a digit was addressing solely the array
component of the Match object.

- A number of performance enhancements and code cleanups, especially
in the code that handles matching of quantified groups and
subrules.
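
The array-key versus hash-key distinction in the list above can be
pictured with another small Python model. This is a sketch under a
stated assumption, not a description of PGE's implementation: it
assumes the distinction is made on the type of the key (integer versus
string), and the MatchModel class and its sample contents are
hypothetical.

```python
class MatchModel:
    """Hypothetical stand-in with separate array and hash components."""

    def __init__(self, array, hash_):
        self.array = list(array)
        self.hash = dict(hash_)

    def __getitem__(self, key):
        # Assumed rule for this sketch: integer keys index the array part;
        # string keys index the hash part, even if they start with a digit.
        if isinstance(key, int):
            return self.array[key]
        return self.hash[key]


def old_routes_to_array(key):
    """Modelled earlier behaviour: a leading digit always meant 'array'."""
    return str(key)[:1].isdigit()


m = MatchModel(array=["mv"], hash_={"1st": "first-named", "name": "ident"})

assert m[0] == "mv"               # integer key: array component
assert m["1st"] == "first-named"  # digit-leading string key: hash component
assert old_routes_to_array("1st") # the earlier rule would have sent this to the array
```

Under the earlier rule as modelled here, a hash entry named "1st" would
be unreachable, which is the case the new version is said to handle.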

Questions, comments, feedback welcomed as always. My next area
of focus is on providing subrules that can match quoted and bracketed
constructs (similar to Text::Balanced), and on completing a
shift/reduce parser that integrates with PGE's rule matching
capabilities.

Pm