------
token start { ^<emptyline>*$ }
regex emptyline { ^^ $$ \n }
token ws { [<sp> | \t]* }
------
If I match this against a string of 7 newlines, it returns 7 <emptyline>
matches, and each match is a single newline. This is the behavior I want
for newlines.
I would like to add smart whitespace matching for spaces and tabs. But
if I change <emptyline> to a 'rule' and match it against the same string
of 7 newlines, it returns a single <emptyline> match whose matched
string is all 7 newlines. I've tried several variations on the <ws> rule,
but it seems to boil down to this: no matter what the <ws> rule matches,
if :sigspace is on, newlines are treated as ignorable whitespace.
Is this a bug or a feature?
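The difference can be reproduced outside PGE with the rough Python sketch
below. It assumes, as a simplification, that under :sigspace the space
after \n in the rule becomes a <ws> call, so the rule behaves like \n
followed by <ws>. The two patterns are stand-ins for the default and the
overridden <ws>, not what PGE actually compiles.

```python
import re

text = "\n" * 7

# Assumption: the default <ws> behaves roughly like \s*, which also
# consumes newlines, so the first match swallows the whole string.
with_default_ws = re.findall(r"\n\s*", text)

# Assumption: the overridden <ws> behaves like [ \t]*, so each match
# stops after its own newline.
with_custom_ws = re.findall(r"\n[ \t]*", text)

print(len(with_default_ws))  # 1
print(len(with_custom_ws))   # 7
```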
Thanks,
Allison
I have the latest checkout of Parrot (I'm not using Pugs).
It may not be a bug. The design question is: should <ws> match a newline
even when it's been overloaded to match only spaces and tabs? (I'm
thinking "No", but could be wrong.)
Allison
The above grammar doesn't have a "grammar" statement; as a result
the regexes are being installed into the '' namespace.
> If I match this against a string of 7 newlines, it returns 7 <emptyline>
> matches, and each match is a single newline. This is the behavior I want
> for newlines.
I tried it with a grammar statement and it seems to work:
----
$ cat ar.pg
grammar XYZ;
token start { ^<emptyline>*$ }
rule emptyline { ^^ $$ \n }
token ws { [<sp> | \t]* }
$ ./parrot compilers/pge/pgc.pir ar.pg >ar.pir
$ cat xyz.pir
.sub main :main
    load_bytecode 'PGE.pbc'
    load_bytecode 'ar.pir'
    load_bytecode 'dumper.pbc'
    load_bytecode 'PGE/Dumper.pbc'

    $P0 = find_global 'XYZ', 'start'
    $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')
    '_dumper'($P1)
.end
$ ./parrot xyz.pir
"VAR1" => PMC 'XYZ' => "\n\n\n\n\n\n\n" @ 0 {
    <emptyline> => ResizablePMCArray (size:7) [
        PMC 'XYZ' => "\n" @ 0,
        PMC 'XYZ' => "\n" @ 1,
        PMC 'XYZ' => "\n" @ 2,
        PMC 'XYZ' => "\n" @ 3,
        PMC 'XYZ' => "\n" @ 4,
        PMC 'XYZ' => "\n" @ 5,
        PMC 'XYZ' => "\n" @ 6
    ]
}
$
-----
Pm
Overloading <ws> and other builtins was fixed in Parrot and Pugs
close to midnight (hackathon time) on 2006-06-29. If your Parrot
and Pugs are both more recent than that, I'm not sure where the bug
is.
-kolibrie
The original did have a 'grammar' statement, I just didn't paste it into
the email.
> $ cat xyz.pir
> .sub main :main
> load_bytecode 'PGE.pbc'
> load_bytecode 'ar.pir'
> load_bytecode 'dumper.pbc'
> load_bytecode 'PGE/Dumper.pbc'
>
> $P0 = find_global 'XYZ', 'start'
> $P1 = $P0("\n\n\n\n\n\n\n", 'grammar' => 'XYZ')
What the original didn't have is the 'grammar' named argument when
calling the start rule. When I replace the previous line with:
$P1 = $P0("\n\n\n\n\n\n\n")
then your sample code exhibits the same problem. I assume this means
that overriding <ws> wasn't working because the match was calling
the default version of <ws> in the root namespace. But if it was
defaulting to the root namespace, how was it able to find any of the
rules? Shouldn't it have complained that it couldn't find <emptyline>?
Thanks,
Allison
At the moment (and this may be incorrect), PGE looks for named rules
via inheritance first, and if a rule is not found that way it falls back
to the available symbol tables using the find_name opcode.
So, the match was able to find the rules because they are in the
current namespace, but when it came time to find the rule for <?ws>
there was a "ws" method available (the default) and so that one
was used.
Again, this may not be the correct behavior; I've been using S12 as
the guide here, in that a method call first considers methods from
the class hierarchy and fails over to subroutine dispatch.
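A rough sketch of that lookup order, in Python rather than PIR. All of
the names here (BaseMatch, XYZ, NAMESPACE, find_rule) are hypothetical
stand-ins and not PGE's API; the sketch only assumes that the default
"ws" is a method on the base match class and that the compiled rules are
also visible as plain subs in a symbol table.

```python
def default_ws(text):
    """Stand-in for the built-in <ws>: skips spaces, tabs and newlines."""
    return len(text) - len(text.lstrip(" \t\n"))

def custom_ws(text):
    """Stand-in for the overridden <ws>: skips spaces and tabs only."""
    return len(text) - len(text.lstrip(" \t"))

def emptyline(text):
    """Stand-in for the <emptyline> rule; the body does not matter here."""
    return text.startswith("\n")

class BaseMatch:
    # The default rules are methods on the base class.
    ws = staticmethod(default_ws)

class XYZ(BaseMatch):
    # The grammar's own rules override the inherited ones.
    ws = staticmethod(custom_ws)
    emptyline = staticmethod(emptyline)

# Stand-in for the symbol table that find_name would search.
NAMESPACE = {"ws": custom_ws, "emptyline": emptyline}

def find_rule(match_class, name):
    """Try the class hierarchy first, then fall back to the symbol table."""
    method = getattr(match_class, name, None)
    if method is not None:
        return method
    return NAMESPACE[name]

# Without 'grammar' => 'XYZ' the match is a plain BaseMatch:
print(find_rule(BaseMatch, "ws").__name__)         # default_ws
print(find_rule(BaseMatch, "emptyline").__name__)  # emptyline
# With 'grammar' => 'XYZ' the override is found through inheritance:
print(find_rule(XYZ, "ws").__name__)               # custom_ws
```

Under these assumptions <emptyline> is still found through the fallback,
while <ws> never reaches the fallback because the inherited default
answers first.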
Pm