Having some trouble. First, understanding when :w means \s* and when it
means \s+
Also, these tests are failing when I use :: to separate the modifier
from the pattern. But they work when I do ':w blah' (separate with a
space). I'm not sure which ways are "right".
The actual failing tests:
my $targ = qq{ foobar
baz quux
zot\tfum};
p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');
-Dino
If I remember right, Larry's intent was that it depends on word
boundaries. The test is to put "\b" there instead of the "\s+" or
"\s*" and see if the match would still work. I.e., if "\b" would
succeed then use "\s*", and if "\b" would fail then use "\s+". Thus
both of your examples above should use "\s+", because you want to
preserve the separation between the words. If the rule were
':w::baz ###', then you'd want to use "\s*", as the white space is not
needed to keep them separate.
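The "\b" test described above can be sketched in a few lines. This is only a rough model in Python, not how the rules engine does it: the helper names (is_word_char, ws_for_gap, expand) are made up for illustration, and the pattern is assumed to be plain literals separated by spaces. It follows the "\b" test literally, so a gap between two non-word atoms also comes out as "\s+"; that case is not covered by the recollection above.

```python
import re

def is_word_char(ch: str) -> bool:
    # True if ch is a word character in the \w sense.
    return re.fullmatch(r"\w", ch) is not None

def ws_for_gap(left: str, right: str) -> str:
    # "\b" succeeds between the two atoms only when exactly one of the
    # neighbouring characters is a word character.
    boundary_would_match = is_word_char(left[-1]) != is_word_char(right[0])
    # \b succeeds -> whitespace is optional; \b fails -> it is required.
    return r"\s*" if boundary_would_match else r"\s+"

def expand(pattern: str) -> str:
    # Hypothetical helper: turn 'baz quux' into a plain regex string by
    # replacing each gap between literal atoms with \s+ or \s*.
    atoms = pattern.split()
    out = re.escape(atoms[0])
    for prev, nxt in zip(atoms, atoms[1:]):
        out += ws_for_gap(prev, nxt) + re.escape(nxt)
    return out

print(expand("baz quux"))            # gap between two words: \s+
print(ws_for_gap("baz", "###"))      # word next to punctuation: \s*
```

Under this model 'baz quux' will not match "bazquux", while 'baz ###' will match "baz###".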
On Thu, May 12, 2005 at 01:51:04AM -0400, Dino Morelli wrote:
> I'm working on more p6rules unit tests.
>
> Having some trouble. First, understanding when :w means \s* and when it
> means \s+
I'll do my best to explain. From A05, <ws> means \s+ whenever it's
between two identifiers (i.e., two sets of word characters) and \s*
between anything else. Furthermore, according to S05, <ws> decides
this based on the contents of the matched string, not the pattern
being matched.
Thus, a pattern like
rx :w /hello -?world/
becomes
rx /hello <?ws> -?world/
which matches any of
hello world
hello-world
hello -world
hello    world
hello\nworld
but not
helloworld
Thus, <ws> fails if it occurs between two word characters in
the target string, and it greedily consumes any whitespace at
that point in the match.
We might speculate that <ws> is equivalent to \b\s*, but \b fails
between pairs of non-word characters, whereas <ws> will succeed.
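As a rough illustration of the behaviour described above, here is a sketch that models <ws> with Python's re module. The WS fragment is my own approximation, not PGE's implementation: it takes one or more whitespace characters, or else succeeds with zero width as long as the position is not between two word characters. Unlike the description above, Python's \s+ may give whitespace back on backtracking, which does not matter for these examples.

```python
import re

# Approximation of <ws>: some whitespace, or a zero-width match at any
# position that is NOT between two word characters.
WS = r"(?:\s+|(?<!\w)|(?!\w))"

hello_world = re.compile(r"hello" + WS + r"-?world")

for target in ["hello world", "hello-world", "hello -world",
               "hello    world", "hello\nworld", "helloworld"]:
    print(repr(target), bool(hello_world.search(target)))

# The \b\s* speculation breaks down between two non-word characters:
plus_ws = re.compile(r"\+" + WS + r"\+")   # modelled <ws>
plus_b = re.compile(r"\+\b\s*\+")          # \b\s* instead
print(bool(plus_ws.search("+ +")), bool(plus_b.search("+ +")))
```

Only "helloworld" fails against the first pattern. For "+ +", the <ws> model matches but the \b\s* version does not, because there is no word boundary after the first "+".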
Followups on this question should probably go to p6c or p6l.
> Also, these tests are failing when I use :: to separate the modifier
> from the pattern. But they work when I do ':w blah' (separate with a
> space). I'm not sure which ways are "right".
>
> The actual failing tests:
>
> my $targ = qq{ foobar
> baz quux
> zot\tfum};
>
> p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
> p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');
Wow, this is a nice test. I can see why it's failing but I'm not
sure what the correct interpretation should be so I'll be sending
a message to perl6-language for clarification. I'll explain briefly
below, but for now the solution might be to test C< [:w::baz quux] >.
Briefly, the question has to do with unanchored pattern matches -- in
an unanchored match, there's an implicit C< .*? > at the start of
the match. So, should C< rx /::baz quux/ > act like
rx /^ .*? ::baz quux /
or
rx /^ .*? [::baz quux] /
In the first case, the :: ends up negating the .*?, forcing the
expression to match at the first character. In the second, the cut
is limited to the subpattern, so the .*? still has a chance to
shift the pattern across the target string.
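The practical difference between the two readings can be modelled in Python. This is only an analogy, under the assumption that :w turns the space in 'baz quux' into \s+: re.match stands in for the first reading (the cut discards the .*? alternatives, so only the first character position is tried), and re.search stands in for the second (the pattern may still slide across the target).

```python
import re

# Target string from the failing tests.
target = " foobar\nbaz quux\nzot\tfum"
body = re.compile(r"baz\s+quux")

# Reading 1: the cut commits the implicit .*? to its first (empty)
# choice, so the body is only ever tried at position 0.
anchored = body.match(target)

# Reading 2: the cut is confined to the bracketed group, so the
# implicit .*? can still move the start of the match forward.
scanned = body.search(target)

print(anchored)          # no match at position 0
print(scanned.group())   # the match found further into the target
```

Under the first reading the test cannot pass, because the target starts with " foobar" and not "baz". Under the second it passes, which is what C< [:w::baz quux] > asks for explicitly.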
At any rate, I'll bring this up on perl6-language, and followups
to this message should probably go there.
Pm