Re: Perl 6 Summary for 2004-10-01 through 2004-10-17

Austin Hastings

unread,

Oct 19, 2004, 1:41:03 PM10/19/04

to Michele Dondi, perl6-l...@perl.org

Michele Dondi wrote:

> On Sun, 17 Oct 2004, Matt Fowles wrote:
>
>> Google groups has nothing for Perl6.language between October 2 and 14.
>> Is this really the case? (I had not signed up until shortly before
>
>
> Yes: no traffic at all for quite a while...

Does this mean that we're done? :)

Matthew Walton

unread,

Oct 19, 2004, 4:35:27 PM10/19/04

to Austin_...@yahoo.com, perl6-l...@perl.org

No, it means Larry's about to stun us with something seemingly bizarre
and inexplicable which turns out to be a stroke of genius.

At least, I hope that's the case. Life's been so dull lately I've even
applied to do a PhD.

Peter Scott

unread,

Oct 20, 2004, 4:54:21 AM10/20/04

to perl6-l...@perl.org

In article <41757A8F...@alledora.co.uk>,
mat...@alledora.co.uk (Matthew Walton) writes:

>Austin Hastings wrote:
>> Does this mean that we're done? :)
>
>No, it means Larry's about to stun us with something seemingly bizarre
>and inexplicable which turns out to be a stroke of genius.

This conjured up an image of Larry whacking someone with
a coelacanth...

--
Peter Scott
http://www.perldebugged.com/
*** NEW *** http://www.perlmedic.com/

Michele Dondi

unread,

Oct 21, 2004, 9:15:05 AM10/21/04

to Austin Hastings, perl6-l...@perl.org

On Tue, 19 Oct 2004, Austin Hastings wrote:

>> Yes: no traffic at all for quite a while...
>
> Does this mean that we're done? :)

By any means... I don't think so! I wonder if this could in any way
support what occasionally trolls claim e.g. in clpmisc, i.e. that the Perl
community is devoted to the cult of Larry Wall... for having He put his
mind to rest has put our own ones to rest too!

Michele
--
DAX ODIA ANCORA
- Scritta su diversi muri milanesi

Larry Wall

unread,

Oct 25, 2004, 1:03:40 PM10/25/04

to perl6-l...@perl.org

On Tue, Oct 19, 2004 at 09:35:27PM +0100, Matthew Walton wrote:
: Austin Hastings wrote:
: >Does this mean that we're done? :)

:
: No, it means Larry's about to stun us with something seemingly bizarre
: and inexplicable which turns out to be a stroke of genius.

The only bizarre and inexplicable thing that has occurred to me in the
last week is that I fell into a canal in Venice. It was definitely
somewhat stunning, but I have yet to figure out how to view it as
a stroke of genius. I suppose if I were Archimedes I'd have climbed
back out and shouted "Eureka", but as far as I know Archimedes never
made it to Italy, so it didn't occur to me...

I did learn one important life lesson, however. When you notice that
the last step down to the canal looks slippery, it's a good idea to
consider whether the first step down might also be slippery. I don't
usually buy slippery slope arguments, but in this case, I should have.

Nevertheless, any resemblance between the Perl design process and
bouncing down stairs is entirely coincidental. (I hope.)

: At least, I hope that's the case. Life's been so dull lately I've even

: applied to do a PhD.

Obviously I'm in little danger of that.

Larry

Uri Guttman

unread,

Oct 25, 2004, 11:23:46 PM10/25/04

to perl6-l...@perl.org

>>>>> "LW" == Larry Wall <la...@wall.org> writes:

LW> The only bizarre and inexplicable thing that has occurred to me in
LW> the last week is that I fell into a canal in Venice. It was
LW> definitely somewhat stunning, but I have yet to figure out how to
LW> view it as a stroke of genius. I suppose if I were Archimedes I'd
LW> have climbed back out and shouted "Eureka", but as far as I know
LW> Archimedes never made it to Italy, so it didn't occur to me...

also it would be very hard to detect overflow of the canal/lagoon caused
by the displacement of water due to your body. and i do hope you don't
pollute perl6 with the canal water than seeped into ya! :)

and Archimedes was sicilian (syracuse IIRC) so does that count as italy?
it wasn't part of italy in those days.

now if you suddenly discovered the rest of the apocalypses in a hidden
chamber below the stairs in that canal, that would be fine cause for you
to strip and run around shouting eureka!

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Aldo Calpini

unread,

Oct 26, 2004, 4:52:43 AM10/26/04

to Larry Wall, perl6-l...@perl.org

Larry Wall wrote:
> I suppose if I were Archimedes I'd have climbed
> back out and shouted "Eureka", but as far as I know Archimedes never
> made it to Italy, so it didn't occur to me...

well, Archimedes *was* italian. for some meaning of italian, at least.
he was born in Syracuse (the one in Sicily, not the one in the state of
NY obviously :-). he lived and studied in Egypt, but most of his life
was spent in Syracuse. granted, Sicily at that time was more a Greek
region than an Italian one as it is today, but well, geographically
speaking, it is in Italy.

and of course, Venice was founded ~700 years after Archimedes death, so
he really had no chance of falling into your same canal.

anyway, Larry, I hope the water wasn't too cold :-)

cheers,
Aldo

Matthew Walton

unread,

Oct 26, 2004, 5:17:59 AM10/26/04

to Larry Wall, perl6-l...@perl.org

Larry Wall wrote:
> On Tue, Oct 19, 2004 at 09:35:27PM +0100, Matthew Walton wrote:
> : Austin Hastings wrote:
> : >Does this mean that we're done? :)
> :
> : No, it means Larry's about to stun us with something seemingly bizarre
> : and inexplicable which turns out to be a stroke of genius.
>
> The only bizarre and inexplicable thing that has occurred to me in the
> last week is that I fell into a canal in Venice. It was definitely
> somewhat stunning, but I have yet to figure out how to view it as
> a stroke of genius. I suppose if I were Archimedes I'd have climbed
> back out and shouted "Eureka", but as far as I know Archimedes never
> made it to Italy, so it didn't occur to me...

That's not what I was thinking of.

Also, climbing back out and shouting 'Eureka' would only really be
appropriate if you actually had experienced a moment of revelation about
something. I suspect you were too busy with the not drowning part for that.

Youch, anyway.

Michele Dondi

unread,

Oct 26, 2004, 6:47:53 AM10/26/04

to perl6-l...@perl.org

On Mon, 25 Oct 2004, Larry Wall wrote:

> : No, it means Larry's about to stun us with something seemingly bizarre
> : and inexplicable which turns out to be a stroke of genius.
>
> The only bizarre and inexplicable thing that has occurred to me in the
> last week is that I fell into a canal in Venice. It was definitely

Hope that bouncing downstairs apart you've had a good time staying here in
Italy...

Michele
--
> - if using perl on Unix isn't that simple as on PCs
It is not PC to use "PC" to mean "Windows box". :-)
- Brian McCauley on clpmisc, "Re: Using perl over network (NFS)"

Austin Hastings

unread,

Oct 26, 2004, 9:18:00 AM10/26/04

to Larry Wall, perl6-l...@perl.org

Every once in a while some fascist proposes installing TV cameras in all
public places, and I think, "Oh, God! Nothing good can come of this!"
and resist heartily.

Larry Wall wrote:

>The only bizarre and inexplicable thing that has occurred to me in the last week is that I fell into a canal in Venice. It was definitely somewhat stunning, but I have yet to figure out how to view it as a stroke of genius. I suppose if I were Archimedes I'd have climbed back out and shouted "Eureka", but as far as I know Archimedes never
>made it to Italy, so it didn't occur to me...
>
>I did learn one important life lesson, however. When you notice that the last step down to the canal looks slippery, it's a good idea to consider whether the first step down might also be slippery. I don't usually buy slippery slope arguments, but in this case, I should have.
>
>

And then I get a message like this, and have to reconsider. Venice could
pay for the maintenance on the cameras with the proceeds from their
"Venice's Funniest Public Videos" program.

I hope you were wearing a Hawaiian shirt, for maximum visibility (and
comic relief)?

=Austin

Aaron Sherman

unread,

Oct 26, 2004, 1:42:02 PM10/26/04

to Perl6 Language List

On Tue, 2004-10-26 at 05:17, Matthew Walton wrote:

> Also, climbing back out and shouting 'Eureka' would only really be
> appropriate if you actually had experienced a moment of revelation about
> something. I suspect you were too busy with the not drowning part for that.

Well, such moments of total-awareness triggered by danger are often the
origin of a revelation, but you never know how it's going to play out.
For my part, I can't recall what I was doing when this revelation
hit....

Larry, while you're feeling chatty, I have a question about Perl 6
regular expressions for you. You answered a question of mine, long ago
with a correction. I had said something like:

/ab(c|b){$1 eq 'c'}/

If I recall correctly you had said something like, "there is no plan
(yet) to allow embedded closures to affect matching directly, other than
failing outright." This implied (to me) that you were thinking along
those lines, but were not sure if the regex engine would support it.

Given as how I'm working on a prototype regex engine for Parrot, into
which I intend to plug a Perl 6 rx parser I wanted to ask what your
thoughts on the language are there.

My work so far, and discussions with friends suggest that my scheme for
building an FSA with attached continuations will work as long as I place
some heavy restrictions on the behavior and side-effects of the closure.
I have some code now, and should have a functional prototype in a couple
of weeks.

The way I see this working is that you build an FSA like this:

State 1:
match(token=>'a', target => 2),
match(token=>!'a', target => trap)
State 2:
match(token=>'b', target => 3),
match(token=>!'b', target => trap)
NOTE: Up this point might be optimized by a back-end to a match
against the multi-character string, "ab", but that's not the
FSA's concern.
State 3 (start subexp mark, end subexp mark):
match(token => 'c', target => 4),
match(token => 'b', target => 4),
match(token => !['c','b'], target => trap)
State 4 (closure {$1 eq 'c'}):
match(token => true, target => 5)
match(token => false, target => trap)
State 5 (terminal)

This representation is, of course, after minimization. Comparison of
states normally only takes into account if they are terminal or
non-terminal. In this case, there are 3 additional attributes of a
state:

start subexp mark: boolean: start saving with this token
end subexp mark: boolean: stop saving after this token
closure: continuation: sub(RXContext $cx) returns(RXToken)

For a state to be equivalent to another state, it must satisfy:

state1.terminal == state2.terminal
state1.start_mark == state2.start_mark
state1.end_mark == state2.end_mark
state1.closure == state2.closure

and, of course, the normal recursive comparison of all transitions and
the states they target.

This allows for transformation and minimization of the FSA as well as
arbitrary code execution. The obvious drawback is something that you've
pointed out several times: the code may be executed many times, not at
all or may be executed and then backtracked over (and then re-executed
or not). This makes side-effects dangerous and unpredictable, but I
think that's OK.

I do think that in Perl, we'll want to add a backtrack API though so
that you can do this:

rule newuser {
user:(\w+) ssid:(\w+)
{
db_exec("begin transaction");
let $uid = db_exec("create_user($1,$2)");
BACKTRACK {
db_exec("rollback transaction")
}
}
department:(\S+)
{
my $dpid = get_department($1) and
db_exec("assign_dept($uid, $3)") and
db_exec("commit transaction");
}
}

These examples only use a true/false return from the closure, which may
be all Perl allows for, but the FSA itself will allow a closure to
"match" any token (as if it had been in the input stream) by returning
it (the token is consumed, but is translated to the empty string for
purposes of sub-expressions and other "memory"). Its important to keep
this separate in concept from actually causing the FSA to transition to
a target state (which the closure cannot do). I can't think of a syntax
off the top of my head for taking advantage of that, but its certainly
something to bear in mind.

Larry et al., please let me know if this is in line with The Plan, as
I'd like the first pass to be sufficient to implement Perl 6 rxes, so
that even if the whole thing has to be re-written, at least we have a
framework and API.

--
☎ 781-324-3772
✉ a...@ajs.com aaronj...@gmail.com
☷ http://www.ajs.com/~ajs

Piers Cawley

unread,

Oct 26, 2004, 3:10:08 PM10/26/04

to Aldo Calpini, Larry Wall, perl6-l...@perl.org

Aldo Calpini <da...@perl.it> writes:

But the odds are good that at least one of the water molecules in said
canal passed some of its time in Archimedes body (or one of its
constituent atoms did).

Larry Wall

unread,

Oct 26, 2004, 8:16:24 PM10/26/04

to Perl6 Language List

On Tue, Oct 26, 2004 at 01:42:02PM -0400, Aaron Sherman wrote:
: Larry, while you're feeling chatty, I have a question about Perl 6

: regular expressions for you. You answered a question of mine, long ago
: with a correction. I had said something like:
:
: /ab(c|b){$1 eq 'c'}/
:
: If I recall correctly you had said something like, "there is no plan
: (yet) to allow embedded closures to affect matching directly, other than
: failing outright." This implied (to me) that you were thinking along
: those lines, but were not sure if the regex engine would support it.

I think you might have misunderstood me slightly. I was primarily
quibbling about your syntax, insofar as the above won't work because
the syntax of a bare closure indicates that the return value is to
be ignored, so it's normally used *only* for its side effects (in the
absence of a failure exception). The current syntax for what you're
trying to write is:

/ab(c|b) <($1 eq 'c')>/

which is equivalent to

/ab(c|b) {fail unless $1 eq 'c'}/

The current match state is also magically communicated to submatches
within the closure so that progress on an embedded match is progress
on the outer match. The $_ within the closure is therefore proxying
for the $0 of the outer match.

: Given as how I'm working on a prototype regex engine for Parrot, into

: which I intend to plug a Perl 6 rx parser I wanted to ask what your
: thoughts on the language are there.
:
: My work so far, and discussions with friends suggest that my scheme for
: building an FSA with attached continuations will work as long as I place
: some heavy restrictions on the behavior and side-effects of the closure.

I suppose the usefulness of that will depend on what you mean by
"heavy restrictions". But we have to have general side effects or
we can't build up our syntax tree.

: I have some code now, and should have a functional prototype in a couple

: of weeks.
:
: The way I see this working is that you build an FSA like this:
:
: State 1:
: match(token=>'a', target => 2),
: match(token=>!'a', target => trap)

Does this notation imply that you're matching against 'a' twice?

: State 2:

: match(token=>'b', target => 3),
: match(token=>!'b', target => trap)
: NOTE: Up this point might be optimized by a back-end to a match
: against the multi-character string, "ab", but that's not the
: FSA's concern.
: State 3 (start subexp mark, end subexp mark):
: match(token => 'c', target => 4),
: match(token => 'b', target => 4),
: match(token => !['c','b'], target => trap)

I find this notation confusing. In Perl 5 regex, the start and end are
considered separate states, and the end state would follow the matches,
or you wouldn't capture the "motion", and $1 would end up null. But
then, I think in NFA, not DFA...

: State 4 (closure {$1 eq 'c'}):

: match(token => true, target => 5)
: match(token => false, target => trap)

It's not the regex that would be evaluating for truth here. It only needs
to make sure it can trap a failure.

: State 5 (terminal)

:
: This representation is, of course, after minimization. Comparison of
: states normally only takes into account if they are terminal or
: non-terminal. In this case, there are 3 additional attributes of a
: state:
:
: start subexp mark: boolean: start saving with this token
: end subexp mark: boolean: stop saving after this token
: closure: continuation: sub(RXContext $cx) returns(RXToken)

We also have the <{ $subregex }> form that has to be able to evaluate
a rule returned by a closure.

: For a state to be equivalent to another state, it must satisfy:

:
: state1.terminal == state2.terminal
: state1.start_mark == state2.start_mark
: state1.end_mark == state2.end_mark
: state1.closure == state2.closure
:
: and, of course, the normal recursive comparison of all transitions and
: the states they target.

I think you also have to consider *which* start and end you're saving here.
Don't want to mix up your paren counts...

Also have to allow saves to named variables.

: This allows for transformation and minimization of the FSA as well as

: arbitrary code execution. The obvious drawback is something that you've
: pointed out several times: the code may be executed many times, not at
: all or may be executed and then backtracked over (and then re-executed
: or not). This makes side-effects dangerous and unpredictable, but I
: think that's OK.

What I have said is that the ordinary side-effect form (bare closure)
must run at the canonical time for its location in the regex, even
if that disables certain optimizations. The conditional form and
the regex returning form may be optimized away.

: I do think that in Perl, we'll want to add a backtrack API though so

: that you can do this:
:
: rule newuser {
: user:(\w+) ssid:(\w+)
: {
: db_exec("begin transaction");
: let $uid = db_exec("create_user($1,$2)");
: BACKTRACK {
: db_exec("rollback transaction")
: }
: }
: department:(\S+)
: {
: my $dpid = get_department($1) and
: db_exec("assign_dept($uid, $3)") and
: db_exec("commit transaction");

: }
: }

Um, that's what UNDO already is supposed to do. Closures embedded
in rules logically return when backtracked over, and UNDO is already
set up to roll back on failure.

: These examples only use a true/false return from the closure, which may

: be all Perl allows for,

That's one of the things Perl doesn't allow for. :-)

Perl allows you to return only

success (void context)
failure exception
subrule to match

(I'm assuming <(cond)> is translated by the parser to {(cond) or fail}.)

: but the FSA itself will allow a closure to

: "match" any token (as if it had been in the input stream) by returning
: it (the token is consumed, but is translated to the empty string for
: purposes of sub-expressions and other "memory").

Matching one token seems relatively useless to me. A closure typically
wants to match either 0 tokens or a sequence of tokens.

: Its important to keep

: this separate in concept from actually causing the FSA to transition to
: a target state (which the closure cannot do).

The closure must be able to change the current "pos" at least. I don't
think it necessarily has to be able to pick the next state. But it does
have to have access to the current $0 match state object.

: I can't think of a syntax

: off the top of my head for taking advantage of that, but its certainly
: something to bear in mind.
:
: Larry et al., please let me know if this is in line with The Plan, as
: I'd like the first pass to be sufficient to implement Perl 6 rxes, so
: that even if the whole thing has to be re-written, at least we have a
: framework and API.

Be sure to keep in touch with Patrick and Luke, who are coming at this
from the direction of the Perl 6 parser.

Larry

Aaron Sherman

unread,

Oct 26, 2004, 10:26:34 PM10/26/04

to Larry Wall, Perl6 Language List

On Tue, 2004-10-26 at 20:16, Larry Wall wrote:
> On Tue, Oct 26, 2004 at 01:42:02PM -0400, Aaron Sherman wrote:

> : /ab(c|b){$1 eq 'c'}/
> :
> : If I recall correctly you had said something like, "there is no plan
> : (yet) to allow embedded closures to affect matching directly, other than
> : failing outright." This implied (to me) that you were thinking along
> : those lines, but were not sure if the regex engine would support it.
>
> I think you might have misunderstood me slightly. I was primarily
> quibbling about your syntax, insofar as the above won't work because
> the syntax of a bare closure indicates that the return value is to
> be ignored, so it's normally used *only* for its side effects (in the
> absence of a failure exception).

Ok, that makes sense.

> The current syntax for what you're
> trying to write is:
>
> /ab(c|b) <($1 eq 'c')>/
>
> which is equivalent to
>
> /ab(c|b) {fail unless $1 eq 'c'}/

Now, what does "fail" mean? I can think of two definitions:

1. proceed to trap state (backtracking then happens)
2. exit (probably using an exception) the rule?

> : My work so far, and discussions with friends suggest that my scheme for
> : building an FSA with attached continuations will work as long as I place
> : some heavy restrictions on the behavior and side-effects of the closure.
>
> I suppose the usefulness of that will depend on what you mean by
> "heavy restrictions". But we have to have general side effects or
> we can't build up our syntax tree.

I'm sorry. By that I mean that you will not be allowed to change the
structure or behavior of the FSA in ways which are not consistent with
what was known before execution began. That is, you can't say "goto
state 10", you can only say, "pretend you just matched an 'a'". Now, if
you're thinking in terms of a simple regex that sounds strange, but if
you're thinking in terms of building an FSA that goes to state 10 on
matching a "zero-width 'a'", then you've just gotten the same value out
of the closure as if it *could* "goto state 10". Sort of the "all's fair
if you predeclare" approach to regular expressions and FSAs in general.

> : State 1:
> : match(token=>'a', target => 2),
> : match(token=>!'a', target => trap)
>
> Does this notation imply that you're matching against 'a' twice?

No, it implies that there are two comparisons to be done, one for the
set 'a' and one for the set !'a'. My formal set notation skills suck,
pardon the poor clarity.

Since the back-end is pluggable, you can't say how that will happen, but
the default back-end will test each set serially until it finds a match.

> : State 3 (start subexp mark, end subexp mark):
> : match(token => 'c', target => 4),
> : match(token => 'b', target => 4),
> : match(token => !['c','b'], target => trap)
>
> I find this notation confusing. In Perl 5 regex, the start and end are
> considered separate states, and the end state would follow the matches,
> or you wouldn't capture the "motion", and $1 would end up null. But
> then, I think in NFA, not DFA...

NDFAs (or NFAs as you say) should be the same as DFAs for this purpose.
I think it's just a difference in the way we look at the token-matching
process. I see it as part of the state, so to say "we both begin and end
subexpression storage here" is to say, "only the token matched here will
be stored as the subexpression." This restricts you, though, and I may
find that in order to say:

(a)((b)c)

I need too much hair and want to put the markers elsewhere.... I'll know
that by next week.

> : State 4 (closure {$1 eq 'c'}):
> : match(token => true, target => 5)
> : match(token => false, target => trap)
>
> It's not the regex that would be evaluating for truth here. It only needs
> to make sure it can trap a failure.

The way it will trap failure will be to check the return value. If that
means that the continuation that it stores is a standard wrapper that
invokes the Perl-level closure and catches exceptions or some other
mechanism, that's fine with me, and I consider that a compiler issue to
be addressed later.

I'm pretty sure I want the return value to be an artificial token, as
the input is the only state you can change without affecting the
integrity of the FSA. (this is where finite state purists come hunting
for me with pitchforks and torches)

> : start subexp mark: boolean: start saving with this token
> : end subexp mark: boolean: stop saving after this token
> : closure: continuation: sub(RXContext $cx) returns(RXToken)
>
> We also have the <{ $subregex }> form that has to be able to evaluate
> a rule returned by a closure.

I think evaluating a sub-rule will be a fairly simple matter, and in
that particular case, the continuation stored in with the state would
wrap the sub-rule invocation and manage it for you. I have not stopped
to think about that too much, so I might be missing something big. For
example, I've not tried to think in terms of what style of recursion p6
rules will have to follow, but I'm 90% convinced that no matter how I
think about it, a solution can be engineered in the compiler given the
tools that I'm providing.

Again, though, that's mostly a compiler issue, and I don't want to get
too far into that (I'll probably post a long bit to p6c once I get the
core engine in place).

> : For a state to be equivalent to another state, it must satisfy:
> :
> : state1.terminal == state2.terminal
> : state1.start_mark == state2.start_mark
> : state1.end_mark == state2.end_mark
> : state1.closure == state2.closure
> :
> : and, of course, the normal recursive comparison of all transitions and
> : the states they target.
>
> I think you also have to consider *which* start and end you're saving here.
> Don't want to mix up your paren counts...

I don't think I do. It *will* be wasteful to have empty states just to
open and/or close subexpressions, but only in terms of the FSA
representation, not the generated code (which presumably would optimize
away the empty states where needed).

By empty I mean:

State 1 (start subexp mark):
match(token => all, target => 2, consume => false)
State 2 (start subexp mark):
match(token => all, target => 3, consume => false)
and so-on...

> Also have to allow saves to named variables.

I think that will require some more thought.... I'm not sure yet how to
deal with that case, but it will have to be tightly tied to numbered
subexpressions somehow.... hmmm.

> : This allows for transformation and minimization of the FSA as well as
> : arbitrary code execution. The obvious drawback is something that you've
> : pointed out several times: the code may be executed many times, not at
> : all or may be executed and then backtracked over (and then re-executed
> : or not). This makes side-effects dangerous and unpredictable, but I
> : think that's OK.
>
> What I have said is that the ordinary side-effect form (bare closure)
> must run at the canonical time for its location in the regex, even
> if that disables certain optimizations. The conditional form and
> the regex returning form may be optimized away.

I may not have followed you there. Are you saying that:

"abc" ~~ rx/ab{print "Hello"}b | ab{print "World"}c/

would print "Hello", "World", or "HelloWorld"? Or are you just referring
to the fact that although the NFA view of the world tells us that it
could say, "WorldHelloWorldHelloWorld", it will not?

> Um, that's what UNDO already is supposed to do. Closures embedded
> in rules logically return when backtracked over, and UNDO is already
> set up to roll back on failure.

Great... so I just have to have some vehicle to allow for that... I'll
be right with you ;-)

> Perl allows you to return only
>
> success (void context)
> failure exception
> subrule to match

I'm going to think about this bit some more too... I think I'm not yet
fully following, but that's because I'm slow ;-)

> : but the FSA itself will allow a closure to
> : "match" any token (as if it had been in the input stream) by returning
> : it (the token is consumed, but is translated to the empty string for
> : purposes of sub-expressions and other "memory").
>
> Matching one token seems relatively useless to me. A closure typically
> wants to match either 0 tokens or a sequence of tokens.

Hmm... I think we're saying different things.

I'm talking about how the closure communicates its desires to the FSA,
not how it thinks about the input. I don't think this is useful to
Perl-like regular expression languages, but it's a superset of what is
needed for any sort of flow control from inside a closure without
destroying the nature of the FSA (e.g. that it has a fixed and finite
set of states and transitions). It also allows for minimal special-cases
for Perl 6. Ideally I'd like Perl 6 to simply be a special-case of the
generic finite-pattern system I'm building, not the other way around.

> : Its important to keep
> : this separate in concept from actually causing the FSA to transition to
> : a target state (which the closure cannot do).
>
> The closure must be able to change the current "pos" at least. I don't
> think it necessarily has to be able to pick the next state. But it does
> have to have access to the current $0 match state object.

Hrm... I'm not sure what changing the current "pos" would mean... I can
see forcing backtracking, but that's implicit in failure (failure is the
wrong way to think of it... it's really injecting a zero-width character
that the existing FSA just happens to transition to a trap state on).

> Be sure to keep in touch with Patrick and Luke, who are coming at this
> from the direction of the Perl 6 parser.

As soon as I have a reference implementation that just does my simple
regexp syntax and has a simple, if stupid, back-end then I think I can
really help out the various compiler camps. Right now, I have no idea if
I have anything to offer (or if, for example, Perl 6 would be better off
rolling its own)... all of which is why I'm writing to p6l right now. I
consider this all a theoretical discussion of the value of my approach
visa vis Perl 6's definition of a rule.

--
☎ 781-324-3772
✉ a...@ajs.com
☷ http://www.ajs.com/~ajs

Aaron Sherman

unread,

Oct 27, 2004, 5:56:03 PM10/27/04

to Luke Palmer, Perl6 Language List

On Wed, 2004-10-27 at 14:01, Luke Palmer wrote:
> Aaron Sherman writes:

> > > /ab(c|b) {fail unless $1 eq 'c'}/
> >
> > Now, what does "fail" mean? I can think of two definitions:
> >
> > 1. proceed to trap state (backtracking then happens)
> > 2. exit (probably using an exception) the rule?
>

> The first one.

Great. In that case, that's what I expected. Number 2 might require my
having more control over exceptions than I really want to.

> > NDFAs (or NFAs as you say)
>

> Last time I checked, "nondeterministic" was a word all by itself. :-p

Tomato, tomatoe.... Yeah, I know, but I grew up hearing NDFA at school,
so I'm stuck. :-)

> > should be the same as DFAs for this purpose. I think it's just a
> > difference in the way we look at the token-matching process. I see it
> > as part of the state, so to say "we both begin and end subexpression
> > storage here" is to say, "only the token matched here will be stored
> > as the subexpression." This restricts you, though, and I may find that
> > in order to say:
> >
> > (a)((b)c)
> >
> > I need too much hair and want to put the markers elsewhere.... I'll know
> > that by next week.
>

> I've given this a lot of thought in the past. Way too much, actually.
> I think you'll find your DFA approach to fail when you deal with the
> generic case of perl 6 rules.

Hmmm, actually I thought Perl 6 rules would be trivial because almost
all of the non-regular-expression parts are so perfectly impossible to
model in an FSA. There's no ambiguity as to how that's done that I can
see, it simply has to be a call to external code.

Its in places where I think I have to try to cram difficult concepts
into the FSA that it gets hard.

> DFAs can't adequately model context free grammars. I think it's best
> for your brain if you leave it at that.

Certainly. Let's just stop here if you thought I was saying anything
differently. I *am* talking about using the API for building an FSA to
also describe where the non-finite bits live, but that's just an API,
and I take no responsibility for deciding how you generate what code to
handle the rest, I'm just telling you what bag to drop it in once you
do. This, of course, branches off into compiler topics that I didn't
want to get into yet. Let me finish the FSA API and the trivial regex
parser plugin before we start talking about how a compiler will use this
too deeply (and we can have that chat on p6c... I really do have to
remember to subscribe to p6c).

> Perl 6 also allows conditionals within rules. So if you have an
> alternation:
>
> / [ <( cond_1() )> | <( cond_2() )> | <( cond_3() )> ] /
>
> You're aware that you need to make those conditions orthogonal, right?
> You can only proceed to one of those states if you're sure that it's the
> _only_ one that matches. It turns out that this is going to take as
> much time _all the time_ as the NFA takes in its worst case.

Well, most of that's a matter for the back-end to deal with, and the
relatively optimization-free back end I'm planning for a start will
output code which does exactly what you describe. Obviously if Perl
places restrictions on how optimizations can be applied to such cases,
then it will need its own back-end or a generic back-end which has some
sort of tagging that turns off optimizations, all of which is fine. None
of that touches how we store and represent the FSA before it's compiled
to Parrot.

> DFAs are a bad idea for Perl 6.

You seem to have taken a left turn somewhere. Let's get our assumptions
above cleared up before you start leaping to the conclusion that what
I'm in the middle of won't work.

> On the other hand, optimizing sufficiently simple parts of regular
> expressions to DFAs and embedding them in a more general system is a
> pretty good idea, IMO.

I think that's what I said.