rules trouble

Dino Morelli

May 12, 2005, 1:51:04 AM

to Perl 6 Internals list

I'm working on more p6rules unit tests.

Having some trouble. First, understanding when :w means \s* and when it
means \s+

Also, these tests are failing when I use :: to separate the modifier
from the pattern. But they work when I do ':w blah' (separate with a
space). I'm not sure which ways are "right".

The actual failing tests:

my $targ = qq{ foobar
baz quux
zot\tfum};

p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');

-Dino

--
.~. Dino Morelli
/V\ email: dmor...@reactorweb.net
/( )\ weblog: http://categorically.net/d/blog/
^^-^^ preferred distro: Debian GNU/Linux http://www.debian.org

Mark A. Biggar

May 12, 2005, 2:29:20 AM

to Dino Morelli, Perl 6 Internals list

Dino Morelli wrote:
> I'm working on more p6rules unit tests.
>
> Having some trouble. First, understanding when :w means \s* and when it
> means \s+
>
>
> Also, these tests are failing when I use :: to separate the modifier
> from the pattern. But they work when I do ':w blah' (separate with a
> space). I'm not sure which ways are "right".
>
> The actual failing tests:
>
> my $targ = qq{ foobar
> baz quux
> zot\tfum};
>
> p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
> p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');

If I remember right, Larry's intent was that it depends on word
boundaries. The test is to put "\b" there instead of the "\s+" or "\s*"
and see whether the match would still work: if "\b" would succeed, use
"\s*"; if "\b" would fail, use "\s+". Thus both of your examples above
should use "\s+", because you want to preserve the separation between
the words. If the rule were ':w::baz ###', you'd want to use "\s*", as
the whitespace is not needed to keep them separate.
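
One way to read the heuristic above is as a pattern-side rewrite. The following is a rough Python sketch of that reading only, not of any actual p6rules code; the helper names ws_for and expand are made up for illustration, and the rule is assumed to consist of literal pieces separated by spaces.

```python
import re

def ws_for(left: str, right: str) -> str:
    """Choose the whitespace pattern between two literal pattern pieces."""
    # If both neighbouring characters are word characters, "\b" could not
    # match between them, so whitespace is required to keep them apart.
    both_word = (re.fullmatch(r"\w", left[-1]) is not None
                 and re.fullmatch(r"\w", right[0]) is not None)
    return r"\s+" if both_word else r"\s*"

def expand(rule: str) -> str:
    """Expand a space-separated literal rule into an ordinary regex (hypothetical)."""
    pieces = rule.split()
    out = re.escape(pieces[0])
    for prev, nxt in zip(pieces, pieces[1:]):
        out += ws_for(prev, nxt) + re.escape(nxt)
    return out

print(expand("baz quux"))                                   # whitespace is mandatory here
print(bool(re.fullmatch(expand("baz ###"), "baz###")))      # whitespace is optional here
```

Under this reading the choice depends only on the pattern text, never on the string being matched.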

--
ma...@biggar.org
mark.a...@comcast.net

Patrick R. Michaud

May 12, 2005, 9:34:39 AM

to Dino Morelli, Perl 6 Internals list

[My reply below likely belongs on either perl6-compiler or perl6-language,
but I didn't want to do a lot of cross-posting, so I'm replying
to perl6-internals for now (with apologies to p6i) and followups
should probably go to p6c or p6l. --Pm]

On Thu, May 12, 2005 at 01:51:04AM -0400, Dino Morelli wrote:
> I'm working on more p6rules unit tests.
>
> Having some trouble. First, understanding when :w means \s* and when it
> means \s+

I'll do my best to explain. From A05, <ws> means \s+ whenever it's
between two identifiers (i.e., two sets of word characters) and \s*
between anything else. Furthermore, according to S05, <ws> decides
this based on the contents of the matched string, not the pattern
being matched.

Thus, a pattern like

rx :w /hello -?world/

becomes

rx /hello <?ws> -?world/

which matches any of

hello world
hello-world
hello -world
hello     world
hello\nworld

but not

helloworld

Thus, <ws> fails if it occurs between two word characters in
the target string; otherwise it succeeds, greedily consuming any
whitespace at that point in the match.

We might speculate that <ws> is equivalent to \b\s*, but \b fails
between pairs of non-word characters, whereas <ws> will succeed.
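
To make the target-side behavior concrete, here is a small Python approximation of <ws> as described above. It is only a sketch: the WS fragment and the matches helper are invented for this illustration, Python's \w and \s stand in for the Perl 6 notions of word character and whitespace, and ordinary regex backtracking is assumed to be close enough for these inputs.

```python
import re

# Hypothetical stand-in for <ws>: refuse to match when the characters on
# both sides of the current position are word characters; otherwise
# consume whatever whitespace is there (possibly none).
WS = r"(?:(?<!\w)|(?!\w))\s*"

# rx :w /hello -?world/  is treated as  rx /hello <?ws> -?world/
pattern = re.compile("hello" + WS + "-?world")

def matches(text: str) -> bool:
    return pattern.fullmatch(text) is not None

for candidate in ["hello world", "hello-world", "hello -world",
                  "hello     world", "hello\nworld", "helloworld"]:
    print(repr(candidate), matches(candidate))

# The "\b\s*" speculation differs between two non-word characters:
print(re.fullmatch("-" + WS + "-", "- -") is not None)   # the <ws> stand-in
print(re.fullmatch(r"-\b\s*-", "- -") is not None)       # the \b version
```

Only "helloworld" is rejected by the first pattern, and the last two lines show the case where \b refuses to match but the <ws> stand-in does not.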

Followups on this question should probably go to p6c or p6l.

> Also, these tests are failing when I use :: to separate the modifier
> from the pattern. But they work when I do ':w blah' (separate with a
> space). I'm not sure which ways are "right".
>
> The actual failing tests:
>
> my $targ = qq{ foobar
> baz quux
> zot\tfum};
>
> p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
> p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');

Wow, this is a nice test. I can see why it's failing, but I'm not
sure what the correct interpretation should be, so I'll be sending
a message to perl6-language for clarification. I'll explain briefly
below, but for now the solution might be to test C< [:w::baz quux] >.

Briefly, the question has to do with unanchored pattern matches -- in
an unanchored match, there's an implicit C< .*? > at the start of
the match. So, should C< rx /::baz quux/ > act like

rx /^ .*? ::baz quux /

or

rx /^ .*? [::baz quux] /

In the first case, the :: ends up defeating the .*? (once the cut is
passed, the .*? can no longer be backtracked into to consume more),
forcing the expression to match at the first character. In the second,
the cut is limited to the subpattern, so the .*? still has a chance to
shift the pattern across the target string.
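
The observable difference between the two readings can be imitated in Python with re.match (which must succeed at the first character) versus re.search (which may start anywhere). This is only an analogy under assumptions: it does not model :: itself, \s+ is used for the :w whitespace, and the target string is assumed to contain plain newlines where the original test breaks lines.

```python
import re

# Assumed transcription of the target string from the failing tests.
targ = " foobar\nbaz quux\nzot\tfum"

# Reading 1: the cut stops the implicit .*? from advancing, so the rule
# has to match at the very first character of the target.
anchored = re.match(r"baz\s+quux", targ)

# Reading 2: the cut is confined to the bracketed subpattern, so the
# match is still free to start later in the target.
floating = re.search(r"baz\s+quux", targ)

print(anchored)            # no match at the first character
print(floating.group())    # the match found further along
```

Under the first reading both of the failing tests would be expected to fail, since neither "baz" nor "zot" starts the target; under the second they would pass.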

At any rate, I'll bring this up on perl6-language, and followups
to this message should probably go there.

Pm