* Rules are Parrot subroutines that know how to match strings. To
compile a rule, one uses the "PGE::p6rule" function:
    .sub main
        .local pmc p6rule
        .local pmc rulesub
        .local pmc match
        load_bytecode "PGE.pbc"
        p6rule = find_global "PGE", "p6rule"
        rulesub = p6rule(":w (\w+) \:= (\S+)")
        match = rulesub("dog := spot")
    .end
* A rule subroutine returns a "match object" containing the
results of the match. In Perl 6, this object will be known as $/.
* A match object returned from a successful match has the following
characteristics:
    - true in boolean context
    - 1 in numeric context (this may change later with the :g modifier)
    - the matched string in string context
    - .from() and .to() return the offsets delimiting the portion of
      the string where the match was found
    - contains other match objects resulting from captured
      subpatterns or subrules in the match
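The characteristics listed above can be sketched in PIR as follows. This is a minimal illustration rather than a tested program: it assumes that assigning the match object to an integer or string register invokes the numeric and string contexts described above, and it prints no expected offsets because this document does not say whether .to() is inclusive or exclusive.

```pir
.sub main
    .local pmc p6rule
    .local pmc rulesub
    .local pmc match
    load_bytecode "PGE.pbc"
    p6rule = find_global "PGE", "p6rule"
    rulesub = p6rule("\w+")
    match = rulesub("  spot")
    unless match goto failed       # boolean context: true on success
    $I0 = match                    # numeric context
    $S0 = match                    # string context: the matched text
    $I1 = match."from"()           # offset where the match begins
    $I2 = match."to"()             # offset where the match ends
    print $I0
    print "\n"
    print $S0
    print "\n"
    print $I1
    print "\n"
    print $I2
    print "\n"
    goto done
  failed:
    print "no match\n"
  done:
.end
```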
* A rule containing capturing parens gets additional match objects
for each set of parens. Thus a rule like:
                         $1        $2
    rulesub = p6rule(":w(\w+) \:= (\S+)")
captures the word characters prior to the ":=" into $/[0], and
the non-space characters following the ":=" into $/[1]. (In Perl 6,
the $1, $2, ... variables will be aliases to $/[0], $/[1], ... . )
    rulesub = p6rule(":w(\w+) \:= (\S+)")
    match = rulesub(" let foo := 123 ")
    print match            # outputs "foo := 123"
    $P0 = match[0]         # first subpattern capture ($1)
    print $P0              # outputs "foo"
    $P0 = match[1]         # second subpattern capture ($2)
    print $P0              # outputs "123"
* If a capture is quantified with any of '+', '*', or '**{m..n}',
then it generates an array of match objects for the subpattern capture
instead of a single match object:
    rulesub = p6rule(":w(\w+) \:= (\S+ )*")
    match = rulesub(" foo := zip boom bah")
    print match            # outputs "foo := zip boom bah"
    $P0 = match[0]         # first subpattern capture ($1)
    print $P0              # outputs "foo"
    $P1 = match[1]         # second subpattern array ($2)
    $P2 = $P1[0]           # first repetition ($2[0])
    print $P2              # outputs "zip "
    $P2 = $P1[1]           # second repetition ($2[1])
    print $P2              # outputs "boom "
* Match objects for nested captures are nested into the surrounding
capture object. Thus, given
    rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
    match = rulesub("let foo := 123")
the outer match object contains two match objects ($/[0] and $/[1]),
and the second of these contains two match objects at
$/[1][0] and $/[1][1].
    print match            # outputs "let foo := 123"
    $P0 = match[0]         # first subcapture ($1)
    print $P0              # outputs "let"
    $P0 = match[1]         # second subcapture ($2)
    $P1 = $P0[0]           # first nested capture ($2[0])
    print $P1              # outputs "foo"
    $P1 = $P0[1]           # second nested capture ($2[1])
    print $P1              # outputs "123"
* Non-capturing subpatterns don't nest match objects:
    rulesub = p6rule(":w (let) [ (\w+) \:= (\S+) ]")
    match = rulesub("let foo := 123")
    print match            # outputs "let foo := 123"
    $P0 = match[0]         # first subcapture ($1)
    print $P0              # outputs "let"
    $P0 = match[1]         # second subcapture ($2)
    print $P0              # outputs "foo"
    $P0 = match[2]         # third subcapture ($3)
    print $P0              # outputs "123"
* To define a subrule, store its subroutine into a symbol table somewhere:
    rulesub = p6rule("int | double | float | char")
    store_global "type", rulesub
    rulesub = p6rule("\w+")
    store_global "ident", rulesub
* To match a subrule, put the name of the subrule in angle brackets:
    rulesub = p6rule(":w<type> <ident>")
    match = rulesub(" int argc ")
    print match            # outputs "int argc"
* Subrule captures become named keys in the resulting match object:
    rulesub = p6rule(":w<type> <ident>")
    match = rulesub(" int argc ")
    print match            # outputs "int argc"
    $P0 = match["type"]    # get type subrule ($/<type>)
    print $P0              # outputs "int"
    $P0 = match["ident"]   # get ident match ($/<ident>)
    print $P0              # outputs "argc"
* Quantified subrules produce an array of match objects:
    rulesub = p6rule(":w<type> <ident> [ , <ident>]*")
    (match) = rulesub(" float alpha, beta, gamma")
    $P0 = match["type"]    # get type subrule ($/<type>)
    print $P0              # outputs "float"
    $P0 = match["ident"]   # get ident subrule (array)
    $P1 = $P0[0]           # first ident ($/<ident>[0])
    print $P1              # outputs "alpha"
    $P1 = $P0[1]           # second ident ($/<ident>[1])
    print $P1              # outputs "beta"
* Captures can be given names using named aliases:
    rulesub = p6rule(":w $<key>:=[\w+] = $<val>:=[\S+]")
    (match) = rulesub(" abc = 123")
    $P0 = match["key"]     # get "key" capture
    print $P0              # outputs "abc"
    $P0 = match["val"]     # get "val" capture
    print $P0              # outputs "123"
* Or you can use numbered aliases:
    rulesub = p6rule(":w $3:=[\w+] = $1:=[\S+]")
    (match) = rulesub(" abc = 123")
    $P0 = match[0]         # get $1
    print $P0              # outputs "123"
    $P0 = match[2]         # get $3
    print $P0              # outputs "abc"
PGE provides the "dump" method for match objects to provide
a data dump of the results. Here's a long example for
parsing arithmetic expressions using the following grammar:
    rule factor { \w+ | \( <expr> \) }
    rule term   {:w <factor> [ (\*|/) <factor> ]* }
    rule expr   {:w <term> [ (\+|-) <term> ]* }
The PIR code below compiles these rules, additionally aliasing each
operator capture to $<op>:
    .sub _main
        .local pmc p6rule
        .local pmc match
        load_bytecode "../../runtime/parrot/library/PGE.pbc"
        p6rule = find_global "PGE", "p6rule"
        $P0 = p6rule("\w+ | \( <expr> \)")
        store_global "factor", $P0
        $P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
        store_global "term", $P0
        $P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
        store_global "expr", $P0
        $P0 = p6rule("<expr>")
        match = $P0("ab * (de + fg) - jk")
        match."dump"("$/")
    .end
When this code is executed, the match."dump" call produces the
following output, which displays the contents of the match object
in $/:
    $/: <ab * (de + fg) - jk @ 0> 1
    $/<expr>: <ab * (de + fg) - jk @ 0> 1
    $/<expr><term>[0]: <ab * (de + fg) @ 0> 1
    $/<expr><term>[0]<op>[0]: <* @ 3> 1
    $/<expr><term>[0]<factor>[0]: <ab @ 0> 1
    $/<expr><term>[0]<factor>[1]: <(de + fg) @ 5> 1
    $/<expr><term>[0]<factor>[1]<expr>: <de + fg @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]<factor>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]<factor>[0]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><op>[0]: <+ @ 9> 1
    $/<expr><term>[1]: <jk @ 17> 1
    $/<expr><term>[1]<factor>[0]: <jk @ 17> 1
    $/<expr><op>[0]: <- @ 15> 1
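Each path in the dump corresponds to a chain of keyed lookups on the match object. The sketch below repeats the rule setup from the example above and then walks two of the dumped paths; it is an untested illustration that assumes the same string-key and integer-index access shown earlier in this document applies at every level of nesting.

```pir
.sub _main
    .local pmc p6rule
    .local pmc match
    load_bytecode "../../runtime/parrot/library/PGE.pbc"
    p6rule = find_global "PGE", "p6rule"
    $P0 = p6rule("\w+ | \( <expr> \)")
    store_global "factor", $P0
    $P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
    store_global "term", $P0
    $P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
    store_global "expr", $P0
    $P0 = p6rule("<expr>")
    match = $P0("ab * (de + fg) - jk")
    $P1 = match["expr"]    # $/<expr>
    $P2 = $P1["term"]      # $/<expr><term> (array)
    $P3 = $P2[1]           # $/<expr><term>[1]
    print $P3              # the dump lists this entry as "jk"
    print "\n"
    $P2 = $P1["op"]        # $/<expr><op> (array)
    $P3 = $P2[0]           # $/<expr><op>[0]
    print $P3              # the dump lists this entry as "-"
    print "\n"
.end
```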