PGE features update (corrections)
Patrick R. Michaud  
Newsgroups: perl.perl6.compiler
From: pmich...@pobox.com (Patrick R. Michaud)
Date: Sun, 8 May 2005 12:43:18 -0500
Local: Sun, May 8 2005 1:43 pm
Subject: PGE features update (corrections)
[Correcting typographic errors noticed after sending original.  Sorry
for the errors and duplicate posts.  Also, in many of the double
quoted strings below the backslashes should probably be escaped
(i.e., "\\s" instead of "\s") but I'm leaving them alone for
readability.  --Pm]

* Rules are Parrot subroutines that know how to match strings.  To
  compile a rule, one uses the "PGE::p6rule" function:

    .sub main
        .local pmc p6rule
        .local pmc rulesub
        .local pmc match
        load_bytecode "PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        rulesub = p6rule(":w (\w+) \:= (\S+)")
        match = rulesub("dog := spot")
    .end

* A rule subroutine returns a "match object" containing the
  results of the match.  In Perl 6, this object will be known as C< $/ >.

* A match object returned from a successful match has the following
  characteristics:
  - true in boolean context
  - 1 in a numeric context (may change later with :g modifier)
  - the string matched in string context
  - .from() and .to() are offsets delimiting the string where
    the match was found
  - contains other match objects resulting from captured
    subpatterns or subrules in the match
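
  For readers who know Python better than PIR, here is a rough analogy
  using Python's "re" module.  It is not PGE; the pattern is a
  hand-translated stand-in for the rule above (with \s* playing the
  role of :w), and treating .start()/.end() as comparable to
  .from()/.to() is an assumption.

```python
import re

# Hypothetical Python stand-in for the rule ":w (\w+) \:= (\S+)".
# \s* approximates what the :w modifier does between tokens.
rule = re.compile(r"(\w+)\s*:=\s*(\S+)")

match = rule.search("dog := spot")

print(bool(match))                 # a successful match is true
print(match.group(0))              # the string that was matched
print(match.start(), match.end())  # offsets delimiting the match
print(match.groups())              # the captured subpatterns
```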

* A rule containing capturing parens gets an additional match object
  for each set of parens.  Thus a rule like:

                             $1        $2
        rulesub = p6rule(":w(\w+) \:= (\S+)")

  captures the word characters prior to the ":=" into $/[0], and
  the non-space characters following the ":=" into $/[1].  (In Perl 6,
  the $1, $2, ... variables will be aliases to $/[0], $/[1], ... . )

        rulesub = p6rule(":w(\w+) \:= (\S+)")
        match = rulesub(" let foo := 123 ")
        print match                        # outputs "foo := 123"
        $P0 = match[0]                     # first subpattern capture ($1)
        print $P0                          # outputs "foo"
        $P0 = match[1]                     # second subpattern capture ($2)
        print $P0                          # outputs "123"

* If a capture is quantified with any of '+', '*', or '**{m..n}',
  then it generates an array of match objects for the subpattern capture
  instead of a single match object:

        rulesub = p6rule(":w(\w+) \:= (\S+ )*")
        match = rulesub(" foo := zip boom bah")
        print match                        # outputs "foo := zip boom bah"
        $P0 = match[0]                     # first subpattern capture ($1)
        print $P0                          # outputs "foo"
        $P1 = match[1]                     # second subpattern array ($2)
        $P2 = $P1[0]                       # first repetition ($2[0])
        print $P2                          # outputs "zip "
        $P2 = $P1[1]                       # second repetition ($2[1])
        print $P2                          # outputs "boom "
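
  As a contrast, Python's "re" module keeps only the last repetition
  of a quantified group, so getting an array of repetitions takes a
  second pass.  The sketch below is an analogy only, not PGE; the
  function name match_assignment and the translated patterns are
  assumptions made for illustration.

```python
import re

def match_assignment(text):
    """Return (whole match, first capture, list of repetitions) or None."""
    # Capture the whole repeated tail as one group, since re would
    # otherwise remember only the final repetition.
    m = re.search(r"(\w+)\s*:=\s*((?:\S+\s*)*)", text)
    if m is None:
        return None
    # Second pass: split the tail into one entry per repetition.
    repetitions = re.findall(r"\S+\s*", m.group(2))
    return m.group(0), m.group(1), repetitions

whole, name, values = match_assignment(" foo := zip boom bah")
print(whole)
print(name)
print(values)
```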

* Match objects for nested captures are nested into the surrounding
  capture object.  Thus, given

        rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
        match = rulesub("let foo := 123")

  the outer match object contains two match objects ($/[0] and $/[1]),
  and the second of these contains two match objects at
  $/[1][0] and $/[1][1].

        print match                        # outputs "let foo := 123"
        $P0 = match[0]                     # first subcapture ($1)
        print $P0                          # outputs "let"
        $P0 = match[1]                     # second subcapture ($2)
        $P1 = $P0[0]                       # first nested capture ($2[0])
        print $P1                          # outputs "foo"
        $P1 = $P0[1]                       # second nested capture ($2[1])
        print $P1                          # outputs "123"
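
  To picture the nesting, here is a small Python sketch (again an
  analogy, not PGE).  Python's "re" numbers groups flat, in order of
  their opening parens, so the tree is rebuilt by hand; the Capture
  class and the capture helper are hypothetical names.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Capture:
    text: str
    start: int
    end: int
    subs: list = field(default_factory=list)  # nested captures

def capture(m, group, subs=()):
    """Wrap one regex group as a Capture holding the given children."""
    return Capture(m.group(group), m.start(group), m.end(group), list(subs))

# Stand-in for the rule ":w (let) ( (\w+) \:= (\S+) )".
m = re.search(r"(let)\s*((\w+)\s*:=\s*(\S+))", "let foo := 123")

# Groups 3 and 4 sit inside group 2, so they become its children.
tree = capture(m, 0, [
    capture(m, 1),
    capture(m, 2, [capture(m, 3), capture(m, 4)]),
])

print(tree.text)                  # whole match, like $/
print(tree.subs[0].text)          # like $/[0]
print(tree.subs[1].subs[0].text)  # like $/[1][0]
print(tree.subs[1].subs[1].text)  # like $/[1][1]
```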

* Non-capturing subpatterns don't nest match objects:

        rulesub = p6rule(":w (let) [ (\w+) \:= (\S+) ]")
        match = rulesub("let foo := 123")
        print match                        # outputs "let foo := 123"
        $P0 = match[0]                     # first subcapture ($1)
        print $P0                          # outputs "let"
        $P0 = match[1]                     # second subcapture ($2)
        print $P0                          # outputs "foo"
        $P0 = match[2]                     # third subcapture ($3)
        print $P0                          # outputs "123"

* To define a subrule, store its subroutine into a symbol table somewhere:

        rulesub = p6rule("int | double | float | char")
        store_global "type", rulesub
        rulesub = p6rule("\w+")
        store_global "ident", rulesub

* To match a subrule, put the name of the subrule in angle brackets:

        rulesub = p6rule(":w<type> <ident>")
        match = rulesub("   int argc ")
        print match                        # outputs "int argc"

* Subrule captures become named keys in the resulting match object:

        rulesub = p6rule(":w<type> <ident>")
        match = rulesub("   int argc ")
        print match                        # outputs "int argc"
        $P0 = match["type"]                # get type subrule  ($/<type>)
        print $P0                          # outputs "int"
        $P0 = match["ident"]               # get ident match ($/<ident>)
        print $P0                          # outputs "argc"
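
  The closest Python analogue to named subrule captures is a named
  group.  In this sketch (not PGE) the "rules" table and the subrule
  helper are hypothetical, and requiring at least one whitespace
  character between the two words is an assumption about what :w
  does there.

```python
import re

# Hypothetical table of subrules, keyed by name.
rules = {"type": r"int|double|float|char", "ident": r"\w+"}

def subrule(name):
    """Wrap a stored pattern in a named group, roughly like <name>."""
    return f"(?P<{name}>{rules[name]})"

# Stand-in for the rule ":w<type> <ident>".
decl = re.compile(subrule("type") + r"\s+" + subrule("ident"))

m = decl.search("   int argc ")
print(m.group(0))        # whole match
print(m.group("type"))   # like $/<type>
print(m.group("ident"))  # like $/<ident>
```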

* Quantified subrules produce an array of match objects:

        rulesub = p6rule(":w<type> <ident> [ , <ident>]*")
        (match) = rulesub("    float alpha, beta, gamma")
        $P0 = match["type"]                # get type subrule ($/<type>)
        print $P0                          # outputs "float"
        $P0 = match["ident"]               # get ident subrule (array)
        $P1 = $P0[0]                       # first ident ($/<ident>[0])
        print $P1                          # outputs "alpha"
        $P1 = $P0[1]                       # second ident ($/<ident>[1])
        print $P1                          # outputs "beta"

* Captures can be aliased via named aliases:

        rulesub = p6rule(":w $<key>:=[\w+] = $<val>:=[\S+]")
        (match) = rulesub("   abc = 123")
        $P0 = match["key"]                 # get "key" capture
        print $P0                          # outputs "abc"
        $P0 = match["val"]                 # get "val" capture
        print $P0                          # outputs "123"

* Or you can use numbered aliases:

        rulesub = p6rule(":w $3:=[\w+] = $1:=[\S+]")
        (match) = rulesub("   abc = 123")
        $P0 = match[0]                     # get $1
        print $P0                          # outputs "123"
        $P0 = match[2]                     # get $3
        print $P0                          # outputs "abc"

Match objects in PGE have a "dump" method that prints a data dump
of the results.  Here's a long example for parsing arithmetic
expressions using the following grammar:

    rule factor { \w+ | \( <expr> \) }
    rule term   {:w <factor> [ $<op>:=(\*|/) <factor> ]* }
    rule expr   {:w <term> [ $<op>:=(\+|-) <term> ]* }

The PIR code is

    .sub _main
        .local pmc p6rule
        .local pmc match

        load_bytecode "../../runtime/parrot/library/PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        $P0 = p6rule("\w+ | \( <expr> \)")
        store_global "factor", $P0

        $P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
        store_global "term", $P0

        $P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
        store_global "expr", $P0

        $P0 = p6rule("<expr>")
        match = $P0("ab * (de + fg) - jk")
        match."dump"("$/")
    .end

When executed, the match."dump" call produces the following
output, displaying the contents of the match object in $/:

    $/: <ab * (de + fg) - jk @ 0> 1
    $/<expr>: <ab * (de + fg) - jk @ 0> 1
    $/<expr><term>[0]: <ab * (de + fg)  @ 0> 1
    $/<expr><term>[0]<op>[0]: <* @ 3> 1
    $/<expr><term>[0]<factor>[0]: <ab @ 0> 1
    $/<expr><term>[0]<factor>[1]: <(de + fg) @ 5> 1
    $/<expr><term>[0]<factor>[1]<expr>: <de + fg @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]: <de  @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]<factor>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]<factor>[0]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><op>[0]: <+ @ 9> 1
    $/<expr><term>[1]: <jk @ 17> 1
    $/<expr><term>[1]<factor>[0]: <jk @ 17> 1
    $/<expr><op>[0]: <- @ 15> 1
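
To show how a tree of this shape can arise, here is a hand-written
recursive-descent sketch in Python.  It is not how PGE works
internally; the node layout, the function names, and the simplified
whitespace handling (trailing whitespace is not included in a match)
are all assumptions, so the listing differs from the one above in
key order and trailing spaces.

```python
import re

WORD = re.compile(r"\w+")
WS = re.compile(r"\s*")

def skip_ws(text, pos):
    """Return the position after any whitespace starting at pos."""
    return WS.match(text, pos).end()

def node(text, start, end, **subs):
    """A match node: matched text, offsets, and named sub-matches."""
    return {"text": text[start:end], "from": start, "to": end, "subs": subs}

def factor(text, pos):
    # factor: \w+ | \( <expr> \)
    m = WORD.match(text, pos)
    if m:
        return node(text, pos, m.end())
    if text.startswith("(", pos):
        inner = expr(text, pos + 1)
        if inner is not None:
            close = skip_ws(text, inner["to"])
            if text.startswith(")", close):
                return node(text, pos, close + 1, expr=inner)
    return None

def binary(text, pos, operand, name, ops):
    """Parse `operand [ op operand ]*` with whitespace between tokens."""
    first = operand(text, skip_ws(text, pos))
    if first is None:
        return None
    operands, operators, end = [first], [], first["to"]
    while True:
        op_pos = skip_ws(text, end)
        if op_pos >= len(text) or text[op_pos] not in ops:
            break
        nxt = operand(text, skip_ws(text, op_pos + 1))
        if nxt is None:
            break
        operators.append(node(text, op_pos, op_pos + 1))
        operands.append(nxt)
        end = nxt["to"]
    return node(text, first["from"], end, **{name: operands, "op": operators})

def term(text, pos):
    return binary(text, pos, factor, "factor", "*/")

def expr(text, pos):
    return binary(text, pos, term, "term", "+-")

def dump(n, path="$/", out=None):
    """Collect one line per node, walking the tree depth first."""
    out = [] if out is None else out
    out.append(f"{path}: <{n['text']} @ {n['from']}>")
    for key, sub in n["subs"].items():
        if isinstance(sub, list):
            for i, child in enumerate(sub):
                dump(child, f"{path}<{key}>[{i}]", out)
        else:
            dump(sub, f"{path}<{key}>", out)
    return out

source = "ab * (de + fg) - jk"
top = expr(source, 0)
# The top-level rule "<expr>" wraps the expr match under the key "expr".
lines = dump(node(source, top["from"], top["to"], expr=top))
print("\n".join(lines))
```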

