PGE features update (corrections)

Patrick R. Michaud

May 8, 2005, 1:43:18 PM

to perl6-i...@perl.org, perl6-c...@perl.org

[Correcting typographic errors noticed after sending original. Sorry
for the errors and duplicate posts. Also, in many of the double
quoted strings below the backslashes should probably be escaped
(i.e., "\\s" instead of "\s") but I'm leaving them alone for
readability. --Pm]

* Rules are Parrot subroutines that know how to match strings. To
compile a rule, one uses the "PGE::p6rule" function:

.sub main
.local pmc p6rule
.local pmc rulesub
.local pmc match
load_bytecode "PGE.pbc"
p6rule = find_global "PGE", "p6rule"

rulesub = p6rule(":w (\w+) \:= (\S+)")
match = rulesub("dog := spot")
.end

* A rule subroutine returns a "match object" containing the
results of the match. In Perl 6, this object will be known as C< $/ >.

* A match object returned from a successful match has the following
characteristics:
- true in boolean context
- 1 in a numeric context (may change later with :g modifier)
- the string matched in string context
- .from() and .to() are offsets delimiting the string where
the match was found
- contains other match objects resulting from captured
subpatterns or subrules in the match
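
For readers who think more easily in another language, the characteristics
listed above can be modeled with a small Python class. This is only a toy
stand-in, not PGE's implementation: the class name MatchSketch, the method
name from_ (Python reserves "from"), and the choice of an exclusive end
offset for .to() are all assumptions made for illustration.

```python
class MatchSketch:
    """Toy stand-in for a successful PGE match object (illustrative only)."""

    def __init__(self, target, start, end, captures=None):
        self.target = target            # the full string that was searched
        self.start = start              # offset where the match begins
        self.end = end                  # offset just past the match (assumed)
        self.captures = captures or []  # nested match objects, if any

    def __bool__(self):
        # A successful match is true in boolean context.
        return True

    def __int__(self):
        # Numeric context yields 1 for a successful match.
        return 1

    def __str__(self):
        # String context yields the matched substring.
        return self.target[self.start:self.end]

    def from_(self):
        # Named from_ because "from" is a reserved word in Python.
        return self.start

    def to(self):
        return self.end

    def __getitem__(self, index):
        # Captured subpatterns are themselves match objects.
        return self.captures[index]


target = "dog := spot"
match = MatchSketch(target, 0, 11, [
    MatchSketch(target, 0, 3),     # the (\w+) capture
    MatchSketch(target, 7, 11),    # the (\S+) capture
])
print(bool(match), int(match), str(match))
print(str(match[0]), str(match[1]))
```

The point of the model is that one object answers differently depending on
the context in which it is used, and that captures are reached by indexing
into it.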

* A rule containing capturing parens gets additional match objects
for each set of parens. Thus a rule like:

                    $1        $2
rulesub = p6rule(":w(\w+) \:= (\S+)")

captures the word characters prior to the ":=" into $/[0], and
the non-space characters following the ":=" into $/[1]. (In Perl 6,
the $1, $2, ... variables will be aliases to $/[0], $/[1], ... . )

rulesub = p6rule(":w(\w+) \:= (\S+)")
match = rulesub(" let foo := 123 ")
print match # outputs "foo := 123"
$P0 = match[0] # first subpattern capture ($1)
print $P0 # outputs "foo"
$P0 = match[1] # second subpattern capture ($2)
print $P0 # outputs "123"

* If a capture is quantified with any of '+', '*', or '**{m..n}',
then it generates an array of match objects for the subpattern capture
instead of a single match object:

rulesub = p6rule(":w(\w+) \:= (\S+ )*")
match = rulesub(" foo := zip boom bah")
print match # outputs "foo := zip boom bah"
$P0 = match[0] # first subpattern capture ($1)
print $P0 # outputs "foo"
$P1 = match[1] # second subpattern array ($2)
$P2 = $P1[0] # first repetition ($2[0])
print $P2 # outputs "zip "
$P2 = $P1[1] # second repetition ($2[1])
print $P2 # outputs "boom "
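
As a rough analogy, here is how the "one entry per repetition" behavior
could be imitated in Python. Python's re module keeps only the last
repetition of a quantified group, so this sketch collects each repetition by
hand. The two patterns are loose approximations of the rule above (the \s*
pieces stand in for :w), not a translation of PGE's semantics, and the names
head, repetition, and second_capture are made up for the example.

```python
import re

text = " foo := zip boom bah"

# Head of the rule: a word, then ":=", with optional whitespace around them
# (a loose stand-in for the :w modifier).
head = re.compile(r"\s*(\w+)\s*:=\s*")
# One repetition of the quantified group: non-space characters plus any
# whitespace that follows them.
repetition = re.compile(r"\S+\s*")

m = head.match(text)
first_capture = m.group(1)

second_capture = []          # plays the role of the $2 array
pos = m.end()
while True:
    rep = repetition.match(text, pos)
    if rep is None:          # no further repetition at this position
        break
    second_capture.append(rep.group(0))
    pos = rep.end()

print(first_capture)
print(second_capture)
```

The first two entries of second_capture correspond to $2[0] and $2[1] in the
PIR example. Any further entries come from this approximation's whitespace
handling and say nothing about what PGE itself returns.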

* Match objects for nested captures are nested into the surrounding
capture object. Thus, given

rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
match = rulesub("let foo := 123")

the outer match object contains two match objects ($/[0] and $/[1]),
and the second of these contains two match objects at
$/[1][0] and $/[1][1].

print match # outputs "let foo := 123"
$P0 = match[0] # first subcapture ($1)
print $P0 # outputs "let"
$P0 = match[1] # second subcapture ($2)
$P1 = $P0[0] # first nested capture ($2[0])
print $P1 # outputs "foo"
$P1 = $P0[1] # second nested capture ($2[1])
print $P1 # outputs "123"

* Non-capturing subpatterns don't nest match objects:

rulesub = p6rule(":w (let) [ (\w+) \:= (\S+) ]")
match = rulesub("let foo := 123")
print match # outputs "let foo := 123"
$P0 = match[0] # first subcapture ($1)
print $P0 # outputs "let"
$P0 = match[1] # second subcapture ($2)
print $P0 # outputs "foo"
$P0 = match[2] # third subcapture ($3)
print $P0 # outputs "123"
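
The difference between the last two rules is easiest to see as data. The
sketch below writes out the two capture trees by hand as Python dictionaries
and lists every capture with its $/ path. It is only a picture of the shapes
described above: the helper capture_paths is hypothetical, and the text shown
for the outer capture $/[1] ("foo := 123") is an assumption, since the
examples above never print it.

```python
def capture_paths(captures, prefix="$/"):
    """Return (path, text) pairs for every capture, depth first."""
    result = []
    for index, capture in enumerate(captures):
        path = f"{prefix}[{index}]"
        result.append((path, capture["text"]))
        result.extend(capture_paths(capture["captures"], path))
    return result


# Shape for ":w (let) ( (\w+) \:= (\S+) )": the second capture is a
# capturing group, so the inner captures nest inside it.
nested = [
    {"text": "let", "captures": []},
    {"text": "foo := 123", "captures": [       # text assumed, see above
        {"text": "foo", "captures": []},
        {"text": "123", "captures": []},
    ]},
]

# Shape for ":w (let) [ (\w+) \:= (\S+) ]": square brackets group without
# capturing, so all three captures sit at the top level.
flat = [
    {"text": "let", "captures": []},
    {"text": "foo", "captures": []},
    {"text": "123", "captures": []},
]

for path, text in capture_paths(nested):
    print("nested", path, repr(text))
for path, text in capture_paths(flat):
    print("flat  ", path, repr(text))
```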

* To define a subrule, store its subroutine into a symbol table somewhere:

rulesub = p6rule("int | double | float | char")
store_global "type", rulesub
rulesub = p6rule("\w+")
store_global "ident", rulesub

* To match a subrule, put the name of the subrule in angle brackets:

rulesub = p6rule(":w<type> <ident>")
match = rulesub(" int argc ")
print match # outputs "int argc"

* Subrule captures become named keys in the resulting match object:

rulesub = p6rule(":w<type> <ident>")
match = rulesub(" int argc ")
print match # outputs "int argc"
$P0 = match["type"] # get type subrule ($/<type>)
print $P0 # outputs "int"
$P0 = match["ident"] # get ident match ($/<ident>)
print $P0 # outputs "argc"
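
A loose Python analogy for subrules and their named captures: keep the
sub-patterns in a dictionary (standing in for the symbol table written by
store_global) and splice them into a larger pattern as named groups. The
dictionary subrules and the helper named are hypothetical names for this
sketch, and the \s pieces only approximate what :w does.

```python
import re

# A dictionary stands in for the symbol table that store_global writes to.
subrules = {
    "type": r"int|double|float|char",
    "ident": r"\w+",
}


def named(name):
    """Wrap a stored subrule in a named group, much like <name> in a rule."""
    return f"(?P<{name}>{subrules[name]})"


pattern = re.compile(r"\s*" + named("type") + r"\s+" + named("ident"))
m = pattern.match(" int argc ")

print(m.group("type"))
print(m.group("ident"))
```

As with match["type"] and match["ident"] above, the captures are looked up
by the name of the subrule that produced them rather than by position.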

* Quantified subrules produce an array of match objects:

rulesub = p6rule(":w<type> <ident> [ , <ident>]*")
(match) = rulesub(" float alpha, beta, gamma")
$P0 = match["type"] # get type subrule ($/<type>)
print $P0 # outputs "float"
$P0 = match["ident"] # get ident subrule (array)
$P1 = $P0[0] # first ident ($/<ident>[0])
print $P1 # outputs "alpha"
$P1 = $P0[1] # second ident ($/<ident>[1])
print $P1 # outputs "beta"

* Captures can be aliased via named aliases:

rulesub = p6rule(":w $<key>:=[\w+] = $<val>:=[\S+]")
(match) = rulesub(" abc = 123")
$P0 = match["key"] # get "key" capture
print $P0 # outputs "abc"
$P0 = match["val"] # get "val" capture
print $P0 # outputs "123"

* Or you can use numbered aliases:

rulesub = p6rule(":w $3:=[\w+] = $1:=[\S+]")
(match) = rulesub(" abc = 123")
$P0 = match[0] # get $1
print $P0 # outputs "123"
$P0 = match[2] # get $3
print $P0 # outputs "abc"

* PGE provides a "dump" method on match objects that prints a
data dump of the results. Here's a long example that parses
arithmetic expressions using the following grammar:

rule factor { \w+ | \( <expr> \) }
rule term {:w <factor> [ (\*|/) <factor> ]* }
rule expr {:w <term> [ (\+|-) <term> ]* }

The PIR code is

.sub _main
.local pmc p6rule
.local pmc match

load_bytecode "../../runtime/parrot/library/PGE.pbc"
p6rule = find_global "PGE", "p6rule"

$P0 = p6rule("\w+ | \( <expr> \)")
store_global "factor", $P0

$P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
store_global "term", $P0

$P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
store_global "expr", $P0

$P0 = p6rule("<expr>")
match = $P0("ab * (de + fg) - jk")
match."dump"("$/")
.end

When this code runs, the match."dump" call produces the
following output, showing the contents of the match object
in $/:

$/: <ab * (de + fg) - jk @ 0> 1
$/<expr>: <ab * (de + fg) - jk @ 0> 1
$/<expr><term>[0]: <ab * (de + fg) @ 0> 1
$/<expr><term>[0]<op>[0]: <* @ 3> 1
$/<expr><term>[0]<factor>[0]: <ab @ 0> 1
$/<expr><term>[0]<factor>[1]: <(de + fg) @ 5> 1
$/<expr><term>[0]<factor>[1]<expr>: <de + fg @ 6> 1
$/<expr><term>[0]<factor>[1]<expr><term>[0]: <de @ 6> 1
$/<expr><term>[0]<factor>[1]<expr><term>[0]<factor>[0]: <de @ 6> 1
$/<expr><term>[0]<factor>[1]<expr><term>[1]: <fg @ 11> 1
$/<expr><term>[0]<factor>[1]<expr><term>[1]<factor>[0]: <fg @ 11> 1
$/<expr><term>[0]<factor>[1]<expr><op>[0]: <+ @ 9> 1
$/<expr><term>[1]: <jk @ 17> 1
$/<expr><term>[1]<factor>[0]: <jk @ 17> 1
$/<expr><op>[0]: <- @ 15> 1
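
To make the layout of that output easier to read, here is a small Python
sketch of a recursive dump over a hand-built tree. It is not PGE's dump
method: the function names dump and leaf, the dictionary layout, and the
hardcoded trailing 1 (the numeric value of a successful match) are
assumptions for illustration. The tree copies the parenthesized
sub-expression "de + fg" from the output above, with the same offsets.

```python
def dump(node, path, out):
    """Append one line per match node, depth first, in the format above."""
    out.append(f"{path}: <{node['text']} @ {node['from']}> 1")
    for name, child in node.get("named", {}).items():
        if isinstance(child, list):          # quantified subrule: an array
            for index, item in enumerate(child):
                dump(item, f"{path}<{name}>[{index}]", out)
        else:                                # single subrule capture
            dump(child, f"{path}<{name}>", out)
    return out


def leaf(text, start):
    """A match node with no captures of its own."""
    return {"text": text, "from": start}


# Hand-built copy of the inner <expr> match from the output above.
inner_expr = {
    "text": "de + fg", "from": 6,
    "named": {
        "term": [
            {"text": "de", "from": 6, "named": {"factor": [leaf("de", 6)]}},
            {"text": "fg", "from": 11, "named": {"factor": [leaf("fg", 11)]}},
        ],
        "op": [leaf("+", 9)],
    },
}

lines = dump(inner_expr, "$/<expr><term>[0]<factor>[1]<expr>", [])
print("\n".join(lines))
```

Each line is a path of keys and indexes from $/ down to one match object,
followed by the matched text and its starting offset. The order of keys in
this sketch is simply the order in which the dictionary was written.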
