Distributing traits / Rule-matching group properties

Austin Hastings

Feb 28, 2004, 2:42:47 AM

to Perl6 Language

Another hypothetical:

Suppose you have a browser (which understands "language" traits) or a word processor (which stores "style" and/or "font" information) that is storing some not-text-only string-like things via scalar strings+ or objectrefs.

You want to do something like "search for all occurrences of the word 'From:' in a heading style" or "Find all letters 'l' in french text".

How do you write, and how do you code, the rule(s) for that?

I think it could be a rule junction, as C< /all(<french>, 'l')/ > but that's not entirely satisfying since I don't imagine that rule junctions are going to be the most efficient constructs around. (But would a rule junction be a valid way of searching?)

Alternatively, there would need to be some way of inquiring about distributed traits. That is, a trait that wasn't actually applied to every single member of a list, but which was "inferred" by some magic accessor. (IOW, the "string" object defines a special version of the trait accessor method (.AUTOTRAIT anyone?) that knows how to query to see if there is a <french>...</french> tagset surrounding this text, or whatever.)

With that, you could define rules called "<french>" and "</french>" that cleverly look like XML but invoke the rules. Something like:

m« <french>l</french> »

This has the twin virtues of (1) looking cool; and (2) being really self-explanatory. But, how would you code a rule pair like french and /french?

If that's not doable, is there some other way, especially some variable way, of checking for "traits" at the same time you're matching patterns? (I.e., $language instead of <french>)

=Austin

Larry Wall

Feb 28, 2004, 3:32:37 PM

to Perl6 Language

On Sat, Feb 28, 2004 at 02:42:47AM -0500, Austin Hastings wrote:
: Another hypothetical:
:
: Suppose you have a browser (which understands "language" traits)
: or a word processor (which stores "style" and/or "font" information)
: that is storing some not-text-only string-like things via scalar
: strings+ or objectrefs.

Okay, I supposin'. But I'd rather not call them traits, since that
already means two other things right now. Properties is more like...

: You want to do something like "search for all occurrences of the word
: 'From:' in a heading style" or "Find all letters 'l' in french text".
:
: How do you write, and how do you code, the rule(s) for that?

Depends on how you think of the embedded objects.

: I think it could be a rule junction, as C< /all(<french>, 'l')/ >
: but that's not entirely satisfying since I don't imagine that rule
: junctions are going to be the most efficient constructs around. (But
: would a rule junction be a valid way of searching?)

Not written like that. At minimum you'd have to put a colon on the
front of that to make it :all. Except that :all is already taken...

I did get a request once for & to do the opposite of | though.
And one could make an argument that we should reserve :all, :any,
:one and :none for junctional utterances. In which case what we
currently call :all should probably be :every or :exhaustive or
something else guaranteed to be confused with :e. :-)

: Alternatively, there would need to be some way of inquiring about
: distributed traits. That is, a trait that wasn't actually applied to
: every single member of a list, but which was "inferred" by some magic
: accessor. (IOW, the "string" object defines a special version of the
: trait accessor method (.AUTOTRAIT anyone?) that knows how to query to
: see if there is a <french>...</french> tagset surrounding this text,
: or whatever.)
:
: With that, you could define rules called "<french>" and
: "</french>" that cleverly look like XML but invoke the rules. Something
: like:
:
: m« <french>l</french> »
:
: This has the twin virtues of (1) looking cool; and (2) being really
: self-explanatory. But, how would you code a rule pair like french
: and /french?

That...makes my head hurt. And will probably make Perl's head hurt too.

: If that's not doable, is there some other way, especially some
: variable way, of checking for "traits" at the same time you're matching
: patterns? (I.e., $language instead of <french>)

If embedded objects are just considered strange characters, and
characters are just considered strange objects, then the most
straightforward way to get object/character properties with set
operations is through the mechanisms that are already there.
For example, to find a french word using character property sets:

/<<alpha> & <french>>+/;
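
As a rough illustration of what that intersection selects, here is a Python sketch (Python only because the rule syntax above is still a proposal). It assumes a hypothetical representation in which every character carries a language tag in a parallel list; the name runs_with_property and the tag values are made up for the example and are not part of any design discussed here.

```python
def runs_with_property(chars, tags, wanted):
    """Return maximal runs of alphabetic characters whose tag equals `wanted`.

    `chars` is the text; `tags` is a parallel sequence holding one
    hypothetical language tag per character (an assumption of this sketch).
    """
    runs = []
    current = []
    for ch, tag in zip(chars, tags):
        if ch.isalpha() and tag == wanted:
            # The character satisfies both properties, so extend the run.
            current.append(ch)
        elif current:
            # Either property failed, so the current run ends here.
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs
```

With the text "le chat sat" tagged "fr" for its first eight characters and "en" for the rest, asking for "fr" gives the two runs "le" and "chat".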

Your specific example is a little more complicated. Though of course,
since "l" is one letter, one could in this particular case write:

/<[l] & <french>>+/;

The general solution, however, is:

/(From\:) <( $1 ~~ /^<headingchar>+$/ )>/

Which seems a bit suboptimal. The proposed & counterpart to | could
help here:

/ From\: & <headingchar>+ /

the point of & being that all its subpatterns have to start and stop
at the same spot, or it's not a match. In the way it was originally
posed to me, it was a bioinformatics problem where you want to say
something like:

/$startseq [ $seqA & $seqB ] $finalseq/

except that that's implying some scanning that the regex engine wouldn't
do by default. You'd have to say something like:

/$startseq [ .*? $seqA .*? & .*? $seqB .*? ] $finalseq/
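
To make the "start and stop at the same spot" requirement concrete, here is a deliberately naive Python sketch. It is not how a rule engine would implement &; conj_spans is a hypothetical helper that simply tries every span and keeps those that both patterns match in full.

```python
import re


def conj_spans(text, pat_a, pat_b):
    """Yield (start, end) spans that BOTH patterns match exactly.

    Brute force: every candidate span is tested against both patterns,
    which models the semantics of the proposed & but not its cost profile.
    """
    a = re.compile(pat_a)
    b = re.compile(pat_b)
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            # fullmatch with pos/endpos requires the match to cover
            # text[start:end] completely, for each pattern separately.
            if a.fullmatch(text, start, end) and b.fullmatch(text, start, end):
                yield (start, end)
```

For example, conj_spans("xxAAyyBBzz", r".*?AA.*?", r".*?BB.*?") includes the span (2, 8), which is "AAyyBB", but not (2, 7), because "AAyyB" satisfies only the first pattern. The two nested loops, each with its own rescans, also hint at why the wildcard version gets expensive.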

And now you can see how it would be very easy to abuse & badly in terms
of performance. The above could easily be O(n**4) unless the optimizer
was extremely cagey in factoring out the wildcards into something like:

/$startseq .*? [
[$seqA .*? & .*? $seqB ] |
[$seqB .*? & .*? $seqA ]
] .*? $finalseq/

That's still gonna stress the regex engine though. The efficient way
to solve this particular problem, assuming that $finalseq doesn't
match everywhere, is this:

/$startseq (.*?) $finalseq <( $1 ~~ /$seqA/ && $1 ~~ /$seqB/ )>/
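
The same ordering can be sketched in Python. This is an illustration under the assumption that all four sequences are literal strings; find_region is a hypothetical name. The inner loop plays the role of backtracking into (.*?) when the trailing assertion fails.

```python
def find_region(text, start_seq, final_seq, seq_a, seq_b):
    """Return (start, end) of the first region delimited by start_seq and
    final_seq whose middle contains both seq_a and seq_b, else None."""
    start = text.find(start_seq)
    while start != -1:
        mid_begin = start + len(start_seq)
        end = text.find(final_seq, mid_begin)
        while end != -1:
            middle = text[mid_begin:end]
            # Cheap delimiter search first; the containment checks run
            # only on the captured middle, like the <( ... )> assertion.
            if seq_a in middle and seq_b in middle:
                return (start, end + len(final_seq))
            # Assertion failed: try a later final_seq (a longer middle).
            end = text.find(final_seq, end + 1)
        # No final_seq worked for this start: try the next start_seq.
        start = text.find(start_seq, start + 1)
    return None
```

The order of seq_a and seq_b inside the middle does not matter here, which is what the two-branch alternation above has to spell out by hand.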

It's like ordering your expensive tests after your cheap tests in

if foo() and baz() and bar()

I suppose the regex compiler could guess that a pattern like

A [ B & C ] D

should be tested

if A and D and [ B & C ]

But that gets blown to smithereens if D relies on a backref to B or C.
So does any implementation that tries to turn [ B & C ] into a one-pass
state machine.

Still, just because a feature can be abused doesn't mean that it
shouldn't go in. There's a lot to be said for being able to write
things like:

[ <ident> & <ascii>+ ]

Now I'm supposing that & binds tighter than | as usual, so the
brackets wouldn't always be necessary:

<ident> & <french>+
|
<ident> & <swahili>+

Larry

Larry Wall

Feb 28, 2004, 3:57:14 PM

to Perl6 Language

On Sat, Feb 28, 2004 at 12:32:37PM -0800, Larry Wall wrote:
: Now I'm supposing that & binds tighter than | as usual, so the
: brackets wouldn't always be necessary:
:
: <ident> & <french>+
: |
: <ident> & <swahili>+

Although, of course, that should probably be written:

<ident> & [ <french>+ | <swahili>+ ]

or really, just

<ident> & <<french>|<swahili>>+

That last is likely to be the fastest, since a decent implementation
of character properties should cache swatches of the bitmap like Perl 5
does, or at least memoize something somewhere to keep from having
to recalculate what's french and what's swahili...
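
A small Python sketch of the memoizing idea, for illustration only. Python has no "french" or "swahili" character property, so this stand-in treats a property as a Unicode character-name prefix such as "LATIN" or "GREEK"; has_any_property and leading_run are hypothetical names, and functools.lru_cache is a plain memo table, not Perl 5's swatch mechanism.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def has_any_property(ch, properties):
    """True if the character's Unicode name starts with any given prefix.

    `properties` must be a tuple so the arguments are hashable; each
    distinct (character, properties) pair is computed once and then cached.
    """
    name = unicodedata.name(ch, "")
    return any(name.startswith(prefix) for prefix in properties)


def leading_run(text, properties):
    """Return the longest prefix of `text` whose characters all carry at
    least one of the properties: a union test applied per character."""
    count = 0
    for ch in text:
        if not has_any_property(ch, properties):
            break
        count += 1
    return text[:count]
```

Testing the union once per character, with the answer cached, is the reason the single-class form is likely to beat running two separate alternatives over the same text.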

Larry

Damian Conway

Mar 2, 2004, 12:58:38 AM

to perl6-l...@perl.org

Larry noted:

> There's a lot to be said for being able to write
> things like:
>
> [ <ident> & <ascii>+ ]
>
> Now I'm supposing that & binds tighter than | as usual, so the
> brackets wouldn't always be necessary:
>
> <ident> & <french>+
> |
> <ident> & <swahili>+

FWIW, I'm strongly in favour of adding & to rules.

Indeed, if Larry were to give the word, I'd be delighted to add support for it
to the Perl6::Rules module.

Damian

Larry Wall

Mar 2, 2004, 1:59:54 AM

to perl6-l...@perl.org

On Tue, Mar 02, 2004 at 04:58:38PM +1100, Damian Conway wrote:
: FWIW, I'm strongly in favour of adding & to rules.
:
: Indeed, if Larry were to give the word, I'd be delighted to add support for
: it to the Perl6::Rules module.

Execute! (I hope that's the right word...)

Larry

Damian Conway

Mar 2, 2004, 2:15:23 PM

to perl6-l...@perl.org

Larry wrote:

> : Indeed, if Larry were to give the word, I'd be delighted to add support for
> : it to the Perl6::Rules module.
>
> Execute! (I hope that's the right word...)

I believe, Captain, the correct word would be: "Engage!"

Data^H^Hmian