special named assertions

David Brunton

Sep 27, 2006, 2:59:32 PM

to perl6-l...@perl.org

From an IRC conversation earlier today:

A quick scan of S05 reveals definitions for these seven special named assertions:
<before pattern>
<after pattern>
<sp>
<ws>
<null>
<'...'>
<at($pos)>

Twenty-four more are listed in docs/Perl6/Overview/Rule.pod (some of these are used in S05, but I don't think S05 defines them).
<"...">
<dot>
<lt>
<gt>
<prior>
<commit>
<cut>
<fail>
<null>
<ident>
<self>
<alnum>
<alpha>
<ascii>
<blank>
<cntrl>
<digit>
<graph>
<lower>
<print>
<space>
<upper>
<word>
<xdigit>
<!XXX> # not sure if this counts

Additionally, t/regex/from_perl6_rules/stdrules.t has one that I didn't notice elsewhere but that appears to be implemented in Pugs:
<punct>

As far as I can tell, this yields a total of 31 or 32 special named assertions, depending on whether <!XXX> counts. If I have missed any obvious ones, I'm sure someone will speak up. Some have passing tests, some have failing tests, and some have no tests.
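The tally above can be checked with a short Python sketch. The set names and the `tally` helper are hypothetical and introduced only for illustration. The entries are copied from the three lists in this message, and <null> is counted once even though it appears in two of them.

```python
# Assertion names as listed in the message above (angle brackets omitted).
S05 = {"before", "after", "sp", "ws", "null", "'...'", "at"}
OVERVIEW = {
    '"..."', "dot", "lt", "gt", "prior", "commit", "cut", "fail", "null",
    "ident", "self", "alnum", "alpha", "ascii", "blank", "cntrl", "digit",
    "graph", "lower", "print", "space", "upper", "word", "xdigit", "!XXX",
}
STDRULES = {"punct"}


def tally(include_negated_form):
    """Return (defined in S05, new in the overview, overall total)."""
    overview = set(OVERVIEW)
    if not include_negated_form:
        # Leave out <!XXX>, which may not count as a named assertion.
        overview.discard("!XXX")
    new_in_overview = overview - S05  # <null> is already in the S05 list
    total = len(S05 | overview | STDRULES)
    return len(S05), len(new_in_overview), total


print(tally(True))   # (7, 24, 32)
print(tally(False))  # (7, 23, 31)
```

Counting <!XXX> gives 7 + 24 + 1 = 32. Leaving it out gives 31.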

Does it make sense to have a single place in S05 where all the builtin special named assertions are defined? It would make it easier to link the tests, and to tell the difference between examples like <moose> and builtins like <ident>.

Last, but not least, should any of these be crossed off the list?

Best,
David.

Patrick R. Michaud

Sep 27, 2006, 4:15:53 PM

to David Brunton, perl6-l...@perl.org

On Wed, Sep 27, 2006 at 11:59:32AM -0700, David Brunton wrote:
> A quick scan of S05 reveals definitions for these seven special named assertions:

> [...]

I don't think that <'...'> or <"..."> are really "named assertions".

I think that <!xyz> (as well as <+xyz> and <-xyz>) are simply special forms
of the named assertion <xyz>.
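
One way to picture "special forms of the named assertion" is a parser that separates an optional prefix from the name, so that <!xyz>, <+xyz> and <-xyz> all resolve to the same underlying <xyz>. The Python sketch below is illustrative only. `split_assertion` is a hypothetical helper, not PGE's parser, and it handles only simple identifier names.

```python
import re

# Optional prefix (!, + or -) followed by a simple identifier.
_ASSERTION = re.compile(r"<([!+-]?)(\w+)>")


def split_assertion(src):
    """Split '<!xyz>' into its prefix ('!') and its name ('xyz')."""
    m = _ASSERTION.fullmatch(src)
    if m is None:
        raise ValueError(f"not a simple named assertion: {src!r}")
    prefix, name = m.groups()
    return prefix, name


for form in ("<xyz>", "<!xyz>", "<+xyz>", "<-xyz>"):
    print(form, split_assertion(form))
```

Under this view, only the name would need an entry in a list of builtins. The prefix is handled once, independently of which assertion it modifies.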

I should probably compare your list to what PGE has implemented and see if
there are any differences -- will do that later tonight.

Pm

Mark A Biggar

Sep 27, 2006, 5:12:02 PM

to Patrick R. Michaud, David Brunton, perl6-l...@perl.org

The documentation should distinguish between those that are just predefined character classes (e.g., <alpha> and <digit>) and those that are special builtins (e.g., <before ...> and <commit>). The former are things that you should be freely allowed to redefine in a derived grammar, while the latter may need to be treated as reserved, or the documentation should at least mention that redefining them may break things in surprising ways.

--
Mark Biggar
ma...@biggar.org
mark.a...@comcast.net
mbi...@paypal.com

Patrick R. Michaud

Sep 27, 2006, 5:23:00 PM

to mark.a...@comcast.net, David Brunton, perl6-l...@perl.org

On Wed, Sep 27, 2006 at 09:12:02PM +0000, mark.a...@comcast.net wrote:
> The documentation should distinguish between those that are just
> predefined character classes (e.g., <alpha> and <digit>) and
> those that are special builtins (e.g., <before ...> and <commit>).
> The former are things that you should be freely allowed to redefine
> in a derived grammar, while the latter may need to be treated as
> reserved, or the documentation should at least mention that
> redefining them may break things in surprising ways.

FWIW, thus far in development PGE doesn't treat <before ...>
and <commit> as "special built-ins" -- they're subrules, same
as <alpha> and <digit>, that can indeed be redefined by
derived grammars.

And I think that one could argue that redefining <alpha> or
<digit> could equally break things in surprising ways.

I'm not arguing against the idea of special builtins or saying it's
a bad idea -- designating some named assertions as "special/non-derivable"
could enable some really nice optimizations and implementation shortcuts
that until now I've avoided. I'm just indicating that I haven't
come across anything yet in the regex implementation that absolutely
requires that certain named assertions receive special treatment
in the engine.
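
As a rough analogy for the behaviour described above, the Python sketch below models every named assertion as an ordinary method looked up by name, so a derived grammar can override <before ...> just as easily as <alpha>. The class and method names are hypothetical. This is a toy model of the idea, not how PGE is implemented.

```python
class BaseGrammar:
    """Toy matcher: each named assertion is a plain, overridable method.

    A method returns the new position on success, or None on failure.
    """

    def alpha(self, text, pos):
        ok = pos < len(text) and text[pos].isalpha()
        return pos + 1 if ok else None

    def before(self, sub, text, pos):
        # Zero-width lookahead: succeed without consuming input.
        return pos if self.assertion(sub, text, pos) is not None else None

    def assertion(self, name, text, pos, *args):
        # One lookup path for every assertion; none gets special treatment.
        return getattr(self, name)(*args, text, pos)


class DerivedGrammar(BaseGrammar):
    def alpha(self, text, pos):
        # Redefinition: only the letters a, b and c count as "alpha".
        ok = pos < len(text) and text[pos] in "abc"
        return pos + 1 if ok else None

    def before(self, sub, text, pos):
        # Nothing in this toy engine forbids redefining <before ...> too.
        return None


base, derived = BaseGrammar(), DerivedGrammar()
print(base.assertion("alpha", "x1", 0))               # 1
print(derived.assertion("alpha", "x1", 0))            # None
print(base.assertion("before", "x1", 0, "alpha"))     # 0
print(derived.assertion("before", "x1", 0, "alpha"))  # None
```

Marking some names as non-derivable would amount to bypassing the dynamic lookup in `assertion` for those names, which is where the optimizations mentioned above would come from.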

Thanks,

Pm