
Rules and hypotheticals: continuations versus callbacks


Matthijs Van Duin

Mar 18, 2003, 2:55:15 PM
to perl6-l...@perl.org
A quick note before I begin: the stuff in this email isn't just an
implementation detail - it impacts the language too, which is why I'm
posting it here. Should I cross-post to perl6-internals? (I'm not
really familiar with that list yet)


I've recently spent some time thinking about backtracking into rules, and
the exact nature of hypothetical variables etc. The two systems for
backtracking into a subrule that are described below are the best I can
think of right now, but maybe I'm completely missing something and something
much simpler is possible - in which case I'd love to hear about it of course. :-)


My main questions are:

Is there a simpler system I'm overlooking?
Which of the two systems would you prefer if speed isn't the issue?
Which system is likely to run faster on parrot?

and maybe also:
What is the current plan?

although I got the impression earlier that there isn't one yet for invoking
subrules :-)


Anyway, I will use the following grammar for examples:
    rule foo  { a }
    rule bar  { a+ }
    rule quux { ab }
    rule test { [ <foo> | <bar> ] <quux> }


==== Mechanism 1 -- Continuations ====

Continuations can be used to reset the "state of the world" to the
previous backtracking point.

Various ways can be imagined to pass around the continuation. I picked
one that seems fairly clean to me and doesn't create any more
continuations than strictly necessary.

One thing I'll need to explain is what 'let' means in this system: it
makes sure the variable is restored if a continuation is invoked that was
created before the variable was hypothesized. This generalization means
you can hypothesize any variable you can temporize.
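(A rough sketch of these semantics in Python, with all names invented for
illustration - not a proposed API. 'let' pushes the old binding onto a
trail, Prolog-style, and invoking a continuation taken earlier unwinds the
trail back to that point; this models only the undo bookkeeping, not
re-entrant continuations themselves.)

    _MISSING = object()
    trail = []                        # one (dict, key, old value) per hypothesis

    def let(d, key, value):
        trail.append((d, key, d.get(key, _MISSING)))
        d[key] = value                # hypothesize the new binding

    def mark():
        return len(trail)             # remember where a continuation was taken

    def unwind(marker):               # what invoking &backtrack does to variables
        while len(trail) > marker:
            d, key, old = trail.pop()
            if old is _MISSING:
                del d[key]
            else:
                d[key] = old

    state = {}
    m = mark()
    let(state, "foo", "a")            # hypothesized during the match...
    unwind(m)                         # ...and undone when we backtrack past it
    print(state)                      # -> {}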


OK, let's look at how the rule 'test' could be implemented using
continuations. Note that I'm not paying attention to optimization at this
point.

method test (Continuation ?&backtrack is rw) {
    &backtrack or mark(&backtrack) or return;
    $_ = .new;
    if (mark(&backtrack)) {
        let $.{foo} = .foo(&backtrack);
    } elsif (mark(&backtrack)) {
        let $.{bar} = .bar(&backtrack);
    } else {
        backtrack;
    }
    let $.{quux} = .quux(&backtrack);
    return $_;
}

where mark is a utility sub defined as something like:

sub mark (Continuation &backtrack is rw) {
    my &cc = callcc { $_ } or return 0;
    let &backtrack = &cc.assuming(undef);
    return 1;
}


Let's see how this would match on 'aaab':

0. A new state object is created (Ignore the first line, it's for later)

1. The first mark hypothesizes &backtrack.

2. $.{foo} is hypothesized. foo matches 'a', and since it doesn't do any
backtracking, it leaves &backtrack alone.

3. $.{quux} is hypothesized. quux fails, it calls backtrack:
a. $.{quux} is de-hypothesized.
b. $.{foo} is de-hypothesized.
c. &backtrack is de-hypothesized.
d. inside the first mark undef is returned from callcc. mark returns false.

4. The second mark hypothesizes &backtrack.

5. $.{bar} is hypothesized. bar matches 'aaa' and hypothesizes &backtrack

6. $.{quux} is hypothesized. quux fails, it calls backtrack:
a. $.{quux} is de-hypothesized
b. inside bar, it backtracks to match 'aa'. bar returns again

7. $.{quux} is hypothesized. quux matches, leaves &backtrack alone.

Note that &backtrack remains hypothesized after test completes. Let's
say test is followed by <fail> and see what happens:

8. fail calls backtrack
a. $.{quux} is de-hypothesized
b. inside bar, it backtracks to match 'a'. bar returns again

9. $.{quux} is hypothesized. quux fails, it calls backtrack:
a. $.{quux} is de-hypothesized
b. inside bar, &backtrack is de-hypothesized. bar calls backtrack
c. $.{bar} is de-hypothesized
d. &backtrack is de-hypothesized
e. inside the second mark undef is returned from callcc. mark returns false.

10. backtrack is called, causing backtracking into whatever was before test.

This only leaves the issue of how to deal with the top-level, where the
continuation is omitted. The magical first line will in that case create
a continuation which will simply return from the rule if the match fails.
If the top-level match succeeds then the &backtrack variable disappears
into thin air, and with it all backtracking information (the continuations
and de-hypotheticalization info).

Note that the user can of course choose to retain the backtracking info,
and use it later to cause backtracking into the match after it has completed:

if Grammar.rule($string, my &backtrack) {
    ...
    if i_dont_like_this_match {
        backtrack;   # try another
    }
}
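(A rough Python analogue of this, with invented names, not the proposed
API: if the matcher is a generator, its backtracking state stays alive
after it has produced a match, so the caller can reject a result and
resume - i.e. backtrack into - the suspended matcher.)

    def match_a_plus(s):              # every way / ^ a+ / can match s, longest first
        n = 0
        while n < len(s) and s[n] == "a":
            n += 1
        while n > 0:
            yield s[:n]               # offer one match...
            n -= 1                    # ...resumed: the caller backtracked

    for found in match_a_plus("aaab"):
        if len(found) >= 2:           # i_dont_like_this_match
            continue                  # backtrack; # try another
        print("settled on:", found)   # prints: settled on: a
        break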


==== Mechanism 2 -- Callbacks ====

The second mechanism is a bit more mundane. The idea is that every rule
will get passed a closure that's called to match whatever comes next. If
the "whatever comes next" fails, the rule can backtrack and call the
closure again.

This time 'let' is exactly the same as 'temp' except hypothetical
variables are only allowed inside the dynamic scope of a subroutine with a
special trait (I've called it "hypothetical" here), and they're restored
only if that sub is left *normally*, not via an exception.

method test ($result is rw, Code &continue0) is hypothetical {
    given (let $result = .new) {
        my &continue1 = {
            .quux($.{quux}, &continue0);
        }
        my &continue2 = {
            .foo($.{foo}, &continue1);
            .bar($.{bar}, &continue1);
        }
        continue2;
    }
}

You might say this is exactly the opposite of using continuations: the passed
argument is used when the match is *successful*, and a return signals failure.
Other things become opposite as well: alternations can be put right under
each other, while it's concatenation that needs extra work by chaining
closures - in opposite order!
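(To make that shape concrete, here is a hedged Python rendering of the same
mechanism. The combinator names lit/plus/alt/seq are invented for
illustration, not proposed rule-compiler output: each rule takes the
string, a position, and a closure for "whatever comes next"; returning
None means failure, and when the closure fails the rule backtracks and
calls it again.)

    def lit(text):                        # a literal, e.g. rule quux { ab }
        def rule(s, pos, cont):
            if s.startswith(text, pos):
                return cont(pos + len(text))
            return None
        return rule

    def plus(ch):                         # greedy ch+, gives back one char at a time
        def rule(s, pos, cont):
            n = pos
            while n < len(s) and s[n] == ch:
                n += 1
            while n > pos:                # longest match first, then shorter
                r = cont(n)
                if r is not None:
                    return r
                n -= 1
            return None
        return rule

    def alt(*rules):                      # alternations sit right under each other
        def rule(s, pos, cont):
            for r in rules:
                out = r(s, pos, cont)
                if out is not None:
                    return out
            return None
        return rule

    def seq(a, b):                        # concatenation chains closures,
        def rule(s, pos, cont):           # in opposite order, as noted above
            return a(s, pos, lambda p: b(s, p, cont))
        return rule

    # rule test { [ <foo> | <bar> ] <quux> }
    test = seq(alt(lit("a"), plus("a")), lit("ab"))
    print(test("aaab", 0, lambda p: p))   # -> 4: foo's branch fails downstream,
                                          #    bar backtracks from 'aaa' to 'aa'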

Note that the optimizer might see that &continue2 isn't needed as a variable
in this case; however, it would still be needed if the [ <foo> | <bar> ] were
preceded by something, so I'm showing it here in non-optimized form.
(&continue1 can also be set to &.quux.assuming($.{quux}, &continue0), but I
don't know if that's faster)

Let's see how this would match on 'aaab':

1. enter hypothetical scope of test
2. hypothesize the result as a new state object
3. prepare the chain of closures
4. enter hypothetical scope of foo, hypothesize $.{foo}
5. foo: match failed, exit hypothetical scope (restore $.{foo})
6. enter hypothetical scope of bar, hypothesize $.{bar}
7. bar: match 'aaa', do callback to &continue1
8. enter hypothetical scope of quux, hypothesize $.{quux}
9. quux: match failed, exit hypothetical scope (restore $.{quux})
10. bar: match 'aa', do callback to &continue1
11. enter hypothetical scope of quux, hypothesize $.{quux}
12. quux: match 'ab', do callback to &continue0

Let's again look at what happens if <test> is followed by <fail>:

13. fail: match failed; &continue0 fails, so quux exits its hypothetical
    scope (restore $.{quux})
14. bar: &continue1 failed, match 'a', do callback to &continue1
15. enter hypothetical scope of quux, hypothesize $.{quux}
16. quux: match failed, exit hypothetical scope (restore $.{quux})
17. bar: &continue1 failed, exit hypothetical scope (restore $.{bar})
18. continue2 failed, exit hypothetical scope of test (restore $result)

But what if the match succeeds? The top-level requires much more work
here: the final &continue needs to throw some "match succeeded" exception,
which needs to be caught by the top-level match and cause it to return
the state object, with all hypothetical vars intact of course.
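(Continuing the Python sketch above: the top level would then look roughly
like this, with the final continuation throwing a "match succeeded"
exception that the wrapper catches. MatchSucceeded and toplevel are
invented names.)

    class MatchSucceeded(Exception):
        def __init__(self, result):
            self.result = result

    def toplevel(rule, s):
        def finish(pos):                  # the final &continue
            raise MatchSucceeded(pos)     # escape with hypotheticals intact
        try:
            rule(s, 0, finish)
            return None                   # fell off the end: no match
        except MatchSucceeded as e:
            return e.result

    print(toplevel(test, "aaab"))         # -> 4, with 'test' from the sketch above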

Having code that does the match cause explicit backtracking is possible
here too, but not as easy: it'll be necessary to become part of the call
chain, so effectively you'll need a temporary rule to do it. I certainly
can't think of any clean syntax to accomplish it.


==== Let's wrap it up ====

The solution using continuations has more inherent beauty: the rule-
methods have a logical structure that you can even explain without needing
to understand continuations. The inter-rule calling conventions are also
cleaner. The top-level handling is so trivial that a single statement at the
beginning of each rule can handle it, because of which { .subrule } works
inside rules without having to do anything ugly behind the scenes (which
*will* be necessary in the callback system).

The callback system requires the rule-methods to have a weird internal
structure, which makes the rule compiler harder to write, and also makes
rules harder to debug (considering how large and involved grammars can
be, I expect that "debugging a rule" in perl 6 will be a much more common
occurrence than "debugging a regex" is in perl 5). It also requires
special handling at the start and completion of the match.

The callback system also uses heaps of closures, which might negatively
affect speed.

Finally, the continuation system also gives 'let' interesting semantics
which may be useful outside of rules.


Basically, the continuation system has only one big drawback: it uses
continuations. I really have no idea how efficient those will be in
parrot. If using them makes rules significantly slower, then speed will
probably have to win over cleanness and the callback system should be used.

Or, as I mentioned at the top, maybe I'm just thinking in far too complex
terms and overlooking a simple and obvious system for backtracking into subrules.

So, comments? (see also the questions at the top of the email)

--
Matthijs van Duin -- May the Forth be with you!

Matthijs Van Duin

Mar 19, 2003, 4:05:44 AM
to perl6-l...@perl.org, Luke Palmer
On Tue, Mar 18, 2003 at 09:28:43PM -0700, Luke Palmer wrote:
>Plan 1: Pass each rule a I<success> continuation (rather than a
>backtrack one), and have it just return on failure. The big
>difference between this and your example is that C<let>s are now
>implemented just like C<temp>s. Too bad C<let> needs non-regex
>behavior, too.

That's mechanism #2, not #1

You probably don't mean the word "continuation" here though, since
a continuation never returns: once a rule invokes the success
"continuation" it can't regain control except via another continuation.

You probably simply mean a closure.


>Plan 2: Call subrules as plain old subs and have them throw a
>backtrack exception on failure (or just return a failure-reporting
>value... same difference, more or less).

But.. say you have:

<foo> <bar>

How would this be implemented? When bar fails, it needs to backtrack
into foo, which has already returned. Are you saying every rule will be
an explicit state machine?


>This has the advantage that C<let> behaves consistently with the
>rest of Perl

What do you mean?


>I looked around in Parrot a little, and it seems like continuations
>are done pretty efficiently.

Yes, I noticed that too

Leopold Toetsch

Mar 19, 2003, 4:38:54 AM
to Matthijs van Duin, perl6-l...@perl.org
Matthijs van Duin wrote:

> Which system is likely to run faster on parrot?


I would propose, estimate the ops you need and test it :)

E.g. call a continuation 1e6 times and communicate state with one global
(a lexical is probably the same speed, i.e. a hash lookup)

$ cat a.pasm
    new P5, .PerlInt
    set P5, 1000000
    store_global "foo", P5
    new P0, .Continuation
    set_addr I0, endcont
    set P0, I0
endcont:
    find_global P4, "foo"
    unless P4, done
    dec P4
    # store_global "foo", P4  -- no need to store, P4 is a reflike thingy
    invoke
done:
    print "done\n"
    end

$ time imcc -P a.pasm
done

real 0m0.881s


$ imcc -p a.pasm
done
OPERATION PROFILE

CODE OP FULL NAME CALLS TOTAL TIME AVG TIME
----- ----------------- ------- ---------- ----------
0 end 1 0.000001 0.000001
40 set_addr_i_ic 1 0.000001 0.000001
66 set_p_i 1 0.000001 0.000001
67 set_p_ic 1 0.000004 0.000004
226 unless_p_ic 1000001 0.595025 0.000001
276 dec_p 1000000 0.599946 0.000001
758 store_global_sc_p 1 0.000006 0.000006
760 find_global_p_sc 1000001 1.037922 0.000001
786 new_p_ic 2 0.000011 0.000005
819 invoke 1000000 0.914063 0.000001
883 print_sc 1 0.005299 0.005299
----- ----------------- ------- ---------- ----------
11 4000010 3.152280 0.000001


So you can estimate that the more "heavy" opcodes take about 1us, more
"light" vtable functions are ~double that speed with the slow, profiling
core. CGP (or JIT) is 3 - 4 times faster.

-O3 compiled parrot, Athlon 800, i386/linux

leo

Matthijs Van Duin

Mar 19, 2003, 7:01:28 AM
to perl6-l...@perl.org, Leopold Toetsch
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
>I would propose, estimate the ops you need and test it :)

Hmm, good point

Or even better.. I should just implement both examples and benchmark them;
they're simple enough and the ops are available.

I guess it's time to familiarize myself with pasm :)

Matthijs Van Duin

Mar 19, 2003, 7:19:24 AM
to perl6-l...@perl.org
On Wed, Mar 19, 2003 at 01:01:28PM +0100, Matthijs van Duin wrote:
>On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
>>I would propose, estimate the ops you need and test it :)
>
>Hmm, good point
>
>Or even better.. I should just implement both examples and benchmark them;
>they're simple enough and the ops are available.

except I forgot entirely about "let"

however the implementation of "let" will have an impact on the performance
of both systems.. oh well, I'll just have to estimate like you said :-)

Matthijs Van Duin

Mar 19, 2003, 8:31:06 AM
to perl6-l...@perl.org
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
>I would propose, estimate the ops you need and test it :)

I haven't completed testing yet; however, it's becoming clear to me that
this is likely to be a pointless effort.

There are so many variables that can affect performance here that the
results I may find in these tests are unlikely to have any relation to
the performance of rules in practice.

1. making continuations affects the performance of *other* code (COW)
2. the "let" operation is missing and all attempts to fake it are silly
3. to really test it, I'd need to make subrules full subroutines, but then
the performance difference will probably disappear in the overhead of all
the other stuff. To test properly I'd need large, realistic patterns; and
I'm certainly not in the mood to write PIR for them manually.

And it appears that on my machine continuations and garbage collection have
a quarrel, which also makes testing problematic.

I guess the only way to find out is to implement both systems and compare
them using a large test set of realistic grammars. Or of course just
implement it using continuations (system #1), since the speed difference
probably isn't gonna be huge anyway.


Here is my test program for continuations and the results on my machine:


# "aaab" ~~ / ^ [ a | a* ] ab <fail> /

    set I5, 1000
    sweepoff            # or bus error
    collectoff          # or segmentation fault

begin:
    set S0, "aaab"
    set I0, 0

    new P0, .Continuation
    set_addr I1, second
    set P0, I1

    rx_literal S0, I0, "a", backtrack
    branch third

second:
    new P0, .Continuation
    set_addr I1, fail
    set P0, I1

deeper:
    rx_literal S0, I0, "a", third
    save P0             # normally hypothesize
    new P0, .Continuation
    set_addr I1, unwind
    set P0, I1
    branch deeper
unwind:
    dec I0              # normally de-hypothesize
    restore P0          # normally de-hypothesize

third:
    rx_literal S0, I0, "ab", backtrack
    sub I0, 2           # normally de-hypothesize

backtrack:
    invoke

fail:
    dec I5
    if I5, begin
    end

OPERATION PROFILE

CODE OP FULL NAME CALLS TOTAL TIME AVG TIME

----- ------------ ------- ---------- ----------
0 end 1 0.000029 0.000029
40 set_addr_i_ic 5000 0.024928 0.000005
46 set_i_ic 1001 0.010573 0.000011
60 set_s_sc 1000 0.005717 0.000006
66 set_p_i 5000 0.016201 0.000003
213 if_i_ic 1000 0.002848 0.000003
274 dec_i 4000 0.011390 0.000003
370 sub_i_ic 1000 0.004227 0.000004
675 save_p 3000 0.192309 0.000064
682 restore_p 3000 0.246457 0.000082
719 branch_ic 4000 0.012216 0.000003
770 sweepoff 1 0.000014 0.000014
772 collectoff 1 0.000003 0.000003
786 new_p_ic 5000 0.179403 0.000036
819 invoke 5000 0.026285 0.000005
962 rx_literal_s_i_sc_ic 10000 0.054260 0.000005
----- ------------ ------- ---------- ----------
16 48004 0.786861 0.000016

iBook; PPC G3; 700 MHz

Dan Sugalski

Mar 19, 2003, 10:40:02 AM
to Matthijs van Duin, perl6-l...@perl.org, Luke Palmer
At 10:05 AM +0100 3/19/03, Matthijs van Duin wrote:
>But.. say you have:
>
><foo> <bar>
>
>How would this be implemented? When bar fails, it needs to
>backtrack into foo, which has already returned. Are you saying
>every rule will be an explicit state machine?

By compile-time interpolation. <foo> isn't so much a subroutine as a
macro. For this to work, if we had:

    foo: \w+?
    bar: [plugh]{2,5}

then what the regex engine *really* got to compile would be:

    (\w+?) ([plugh]{2,5})

with names attached to the two paren groups. Treating them as actual
subroutines leads to madness, continuations don't quite work, and
coroutines could pull it off if we could pass data back into a
coroutine on reinvocation, but...

We do, after all, want this fast, right?
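(As a concrete illustration of such interpolation - my example, using
Python's re module rather than the planned Parrot engine - the subrules
are simply pasted in as named groups, so nothing is dispatched at match
time:)

    import re

    rules = {"foo": r"\w+?", "bar": r"[plugh]{2,5}"}
    # rule test { <foo> <bar> } compiles, roughly, to:
    test = re.compile("(?P<foo>{foo})\\s*(?P<bar>{bar})".format(**rules))

    m = test.match("ab plugh")
    print(m.group("foo"), m.group("bar"))  # -> ab plugh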
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Matthijs Van Duin

Mar 19, 2003, 10:52:52 AM
to perl6-l...@perl.org
On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
>By compile-time interpolation. <foo> isn't so much a subroutine as a
>macro. For this to work, if we had:
>
> foo: \w+?
> bar: [plugh]{2,5}
>
>then what the regex engine *really* got to compile would be:
>
> (\w+?) ([plugh]{2,5})
>
>with names attached to the two paren groups. Treating them as actual
>subroutines leads to madness,

Ehm, Foo.test cannot inline Foo.foo since it may be overridden:

    grammar Foo {
        rule foo  { \w+? }
        rule bar  { [plugh]{2,5} }
        rule test { <foo> <bar> }
    }

    grammar Bar is Foo {
        rule foo { <alpha>+? }
    }

What you say is only allowed if I put "is inline" on foo.

>continuations don't quite work

Care to elaborate on that? I'd say they work fine

>We do, after all, want this fast, right?

Of course, and we should optimize as much as we can - but not optimize
*more* than we can. Rules need generic backtracking semantics, and that's
what I'm talking about. Optimizations to avoid the genericity of these
backtracking semantics are for later.

Dan Sugalski

Mar 19, 2003, 11:09:01 AM
to perl6-l...@perl.org
At 4:52 PM +0100 3/19/03, Matthijs van Duin wrote:
>On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
>>By compile-time interpolation. <foo> isn't so much a subroutine as
>>a macro. For this to work, if we had:
>>
>> foo: \w+?
>> bar: [plugh]{2,5}
>>
>>then what the regex engine *really* got to compile would be:
>>
>> (\w+?) ([plugh]{2,5})
>>
>>with names attached to the two paren groups. Treating them as
>>actual subroutines leads to madness,
>
>Ehm, Foo.test cannot inline Foo.foo since it may be overridden:
>
>grammar Foo {
> rule foo { \w+? }
> rule bar { [plugh]{2,5} }
> rule test { <foo> <bar> }
>}
>
>grammar Bar is Foo {
> rule foo { <alpha>+? }
>}
>
>What you say is only allowed if I put "is inline" on foo.

At the time I run the regex, I can inline things. There's nothing
that prevents it. Yes, at compile time it's potentially an issue,
since things can be overridden later, but that's going to be
relatively rare, and can be dealt with by selective recompilation.

By the time the regex is actually executed, it's fully specified. By
definition if nothing else--you aren't allowed to selectively
redefine rules in the middle of a regex that uses those rules. Or,
rather, you can but the update won't take effect until after the end
of the regex, the same way that you can't redefine a sub you're in
the middle of executing. (And yes, I'm aware that if you do that
you'll pick up the new version if you recursively call, but that
won't work with regexes)

>>continuations don't quite work
>
>Care to elaborate on that? I'd say they work fine

There's issues with hypothetical variables and continuations. (And
with coroutines as well) While this is a general issue, they come up
most with regexes.

>>We do, after all, want this fast, right?
>
>Of course, and we should optimize as much as we can - but not
>optimize *more* than we can. Rules need generic backtracking
>semantics, and that's what I'm talking about.

No. No, in fact they don't. Rules need very specific backtracking
semantics, since rules are fairly specific. We're talking about
backtracking in regular expressions, which is a fairly specific
generality. If you want to talk about a more general backtracking
that's fine, but it won't apply to how regexes backtrack.

Matthijs Van Duin

Mar 19, 2003, 11:38:02 AM
to perl6-l...@perl.org, Dan Sugalski
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>At the time I run the regex, I can inline things. There's nothing
>that prevents it. Yes, at compile time it's potentially an issue,
>since things can be overridden later,

OK, but that's not how you initially presented it :-)


>you aren't allowed to selectively redefine rules in the middle of a regex
>that uses those rules. Or, rather, you can but the update won't take
>effect until after the end

I don't recall having seen such a restriction mentioned in Apoc 5.

While I'm a big fan of optimization, especially for something like this,
I think we should be careful with introducing mandatory restrictions just
to aid optimization. ("is inline" will allow such optimizations of course)


>There's issues with hypothetical variables and continuations. (And
>with coroutines as well) While this is a general issue, they come up
>most with regexes.

I'm still curious what you're referring to exactly. I've outlined possible
semantics for hypothetical variables in earlier posts that should work.


>>>We do, after all, want this fast, right?
>>
>>Of course, and we should optimize as much as we can - but not
>>optimize *more* than we can. Rules need generic backtracking
>>semantics, and that's what I'm talking about.
>
>No. No, in fact they don't. Rules need very specific backtracking
>semantics, since rules are fairly specific. We're talking about
>backtracking in regular expressions, which is a fairly specific
>generality. If you want to talk about a more general backtracking
>that's fine, but it won't apply to how regexes backtrack.

My impression from A5 and A6 is that rules are methods. They're looked up
like methods, they can be invoked like methods, etc.

I certainly want to be able to write rules myself, manually, when I think
it's appropriate; and use these as subrules in other methods. Generic
backtracking semantics are needed for that, and should at least conceptually
also apply to normal rules.

When common sub-patterns are inlined, simple regexen will not use runtime
subrules at all, so the issue doesn't exist there - that covers everything
you would do with regexen in perl 5 for example.

When you do use real sub-rules, you're getting into the domain previously
held by Parse::RecDescent and the like. While these should of course still
be as fast as possible, a tiny bit of overhead on top of a regular regex is
understandable.

However, such overhead might not even be needed at all: whenever possible
optimizations should be applied, and rules are free to use special hacky
but fast calling semantics to subrules if they determine that's possible.
But I don't think a special optimization should be elevated to the official
semantics. I say, make generic semantics first, and then optimize the heck
out of it.

Leopold Toetsch

Mar 19, 2003, 11:35:19 AM
to Matthijs van Duin, perl6-l...@perl.org
Matthijs van Duin wrote:

> sweepoff # or bus error
> collectoff # or segmentation fault

Please try :

/* set this to 1 for tracing the system stack and processor registers */
#define TRACE_SYSTEM_AREAS 1

in dod.c (works for me).

Though I don't know if processor registers on PPC get traced by this
(it might not stand up to optimization if not).

The code hasn't been examined deeply enough yet for us to safely turn off
stack tracing.

leo

Jonathan Scott Duff

Mar 19, 2003, 11:41:55 AM
to Dan Sugalski, perl6-l...@perl.org
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
> By the time the regex is actually executed, it's fully specified. By
> definition if nothing else--you aren't allowed to selectively
> redefine rules in the middle of a regex that uses those rules. Or,
> rather, you can but the update won't take effect until after the end
> of the regex, the same way that you can't redefine a sub you're in
> the middle of executing. (And yes, I'm aware that if you do that
> you'll pick up the new version if you recursively call, but that
> won't work with regexes)

Are you implying that

    $fred = rx/fred/;
    $string ~~ m:w/ <$fred> { $fred = rx/barney/; } rubble /

won't match "barney rubble"?

-Scott
--
Jonathan Scott Duff
du...@cbi.tamucc.edu

Sean O'Rourke

Mar 19, 2003, 11:44:17 AM
to du...@pobox.com, Dan Sugalski, perl6-l...@perl.org
On Wed, 19 Mar 2003, Jonathan Scott Duff wrote:
> Are you implying that
>
> $fred = rx/fred/;
> $string ~~ m:w/ <$fred> { $fred = rx/barney/; } rubble /
>
> won't match "barney rubble"?

Or, worse, that

    $fred = rx/fred/;
    $string ~~ m:w/ { $fred = rx/barney/; } <$fred> rubble /

won't, either?

/s

Dan Sugalski

Mar 19, 2003, 12:22:14 PM
to du...@pobox.com, perl6-l...@perl.org

Potentially, no. What, then, should happen if you do:

    $barney = rx/barney/;
    $string = "barney rubble";
    $string ~~ m:w/ <$barney> { $barney = rx/fred/; } rubble /;

The regex shouldn't match, since you've invalidated part of the match
in the middle.

I can potentially see constructs of the form <$var> be taken as
indirect rule invocations and their dispatch left to runtime,
complete with the potential for bizarre after-the-fact invalidations,
but as regex rules in the regex stream rather than as generic code.

Dan Sugalski

Mar 19, 2003, 12:35:19 PM
to Matthijs van Duin, perl6-l...@perl.org
At 5:38 PM +0100 3/19/03, Matthijs van Duin wrote:
>On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>>At the time I run the regex, I can inline things. There's nothing
>>that prevents it. Yes, at compile time it's potentially an issue,
>>since things can be overridden later,
>
>OK, but that's not how you initially presented it :-)

Then I wasn't clear enough, sorry. This is perl -- the state of
something at compile time is just a suggestion as to how things
ultimately work. The state at the time of execution is the only thing
that really matters, and I shortcut.

>>you aren't allowed to selectively redefine rules in the middle of a
>>regex that uses those rules. Or, rather, you can but the update
>>won't take effect until after the end
>
>I don't recall having seen such a restriction mentioned in Apoc 5.

I'll nudge Larry to add it explicitly, but in general redefinitions of
code that you're in the middle of executing don't take effect
immediately, and it's not really any different for regex rules than
for subs.

>While I'm a big fan of optimization, especially for something like
>this, I think we should be careful with introducing mandatory
>restrictions just to aid optimization. ("is inline" will allow such
>optimizations ofcourse)

Actually, we should be extraordinarily liberal with the application
of restrictions at this phase. It's far easier to lift a restriction
later than to impose it later, and I very much want to stomp out any
constructs that will force slow code execution. Yes, I may lose, but
if I don't try...

My job, after all, is to make it go fast. If you want something
that'll require things to be slow then I don't want you to have it. :)

>>There's issues with hypothetical variables and continuations. (And
>>with coroutines as well) While this is a general issue, they come
>>up most with regexes.
>
>I'm still curious what you're referring to exactly. I've outlined
>possible semantics for hypothetical variables in earlier posts that
>should work.

The issue of hypotheticals is complex.

>>>>We do, after all, want this fast, right?
>>>
>>>Of course, and we should optimize as much as we can - but not
>>>optimize *more* than we can. Rules need generic backtracking
>>>semantics, and that's what I'm talking about.
>>
>>No. No, in fact they don't. Rules need very specific backtracking
>>semantics, since rules are fairly specific. We're talking about
>>backtracking in regular expressions, which is a fairly specific
>>generality. If you want to talk about a more general backtracking
>>that's fine, but it won't apply to how regexes backtrack.
>
>My impression from A5 and A6 is that rules are methods. They're
>looked up like methods, they can be invoked like methods, etc.

They aren't methods, though. They're not code in general, they're
regex constructions in specific. Because they live in the symbol
table and in some cases can be invoked as subs/methods doesn't make
them subs or methods, it makes them regex constructs with funky
wrappers if you want to use them in a non-regex manner.

>I certainly want to be able to write rules myself, manually, when I
>think it's appropriate; and use these as subrules in other methods.
>Generic backtracking semantics are needed for that, and should at
>least conceptually also apply to normal rules.

No, no it shouldn't. Rules are rules for regexes, they are *not* subs.
If you want generic backtracking to work, then there can't be any
difference between:

    rule foo { \w+ }

and

    sub foo { ... }

but there must be. With rules as regex constructs the semantics are
much simpler. If we allow rules to be arbitrary code not only do we
have to expose a fair amount of the internals of the regex engine to
the sub so it can actually work on the stream and note its position
(which is fine, I can do that) we also need to be able to pause foo
in the middle and jump back in while passing in parameters of some
sort. Neither continuations nor standard coroutines are sufficient in
this instance, since the reinvocation must *both* preserve the state
of the code at the time it exited *and* pass in an indication as
to what the sub should do. For example, if the foo sub was treated as
a rule and we backtrack, should it slurp more or less?

If rules are just plain regex rules and not potentially arbitrary
code, the required semantics are much simpler.
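(For comparison, a language with resumable generators can express the
reinvocation described here; a Python sketch with invented names, where a
rule yields a candidate end position and is resumed each time whatever
follows it fails. It sidesteps the "slurp more or less?" question by
fixing the policy - always shorter - inside the rule itself.)

    def bar(s, pos):                  # rule bar { a+ }, longest match first
        n = pos
        while n < len(s) and s[n] == "a":
            n += 1
        while n > pos:
            yield n                   # offer a match ending at n...
            n -= 1                    # ...resumed: the caller backtracked

    def test(s):                      # rule test { <bar> ab }
        for end in bar(s, 0):         # each loop iteration backtracks into bar
            if s.startswith("ab", end):
                return end + 2
        return None

    print(test("aaab"))               # -> 4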

Then there's the issue of being able to return continuations from
within arbitrary unnamed blocks, since the block in this:

$foo ~~ m:w/<alpha> {...} <number>/;

should be able to participate in the backtracking activities if we're
not drawing a distinction between rules and generic code. (Yeah, the
syntax is wrong, but you get the point)

Ultimately the question is "How do you backtrack into arbitrary code,
and how do we know that the arbitrary code can be backtracked into?"
My answer is we don't, but I'm not sure how popular that particular
answer is.

>When common sub-patterns are inlined, simple regexen will not use
>runtime subrules at all, so the issue doesn't exist there - that
>covers everything you would do with regexen in perl 5 for example.

All rules will essentially be inlined at regex invocation time. There
may be some indirect rule dispatch, but in a very simplistic form
compared to sub dispatch.

>I say, make generic semantics first, and then optimize the heck out of it.

That's fine. I disagree. :)

Simon Cozens

Mar 19, 2003, 12:47:20 PM
to perl6-l...@perl.org
d...@sidhe.org (Dan Sugalski) writes:
> you aren't allowed to selectively redefine
> rules in the middle of a regex that uses those rules.

This is precisely what a macro does.

--
"How should I know if it works? That's what beta testers are for. I only
coded it."
(Attributed to Linus Torvalds, somewhere in a posting)

Dan Sugalski

Mar 19, 2003, 12:52:13 PM
to perl6-l...@perl.org
At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
>d...@sidhe.org (Dan Sugalski) writes:
>> you aren't allowed to selectively redefine
>> rules in the middle of a regex that uses those rules.
>
>This is precisely what a macro does.

Not once execution starts, no.

Simon Cozens

Mar 19, 2003, 12:54:20 PM
to perl6-l...@perl.org
d...@sidhe.org (Dan Sugalski) writes:
> At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
> >d...@sidhe.org (Dan Sugalski) writes:
> >> you aren't allowed to selectively redefine
> >> rules in the middle of a regex that uses those rules.
> >
> >This is precisely what a macro does.
>
> Not once execution starts, no.

Compilation's just execution of a regex, albeit the Perl6::Grammar::program
regex, and that regex will need to be modified while it's in operation in
order to pick up macro "is parsed" definitions and apply them to the rest
of what it's parsing.

--
* DrForr digs around for a fresh IV drip bag and proceeds to hook up.
<dngor> Coffee port.
<DrForr> Firewalled, like everything else around here.

Dan Sugalski

Mar 19, 2003, 1:09:16 PM
to perl6-l...@perl.org
At 5:54 PM +0000 3/19/03, Simon Cozens wrote:
>d...@sidhe.org (Dan Sugalski) writes:
>> At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
>> >d...@sidhe.org (Dan Sugalski) writes:
>> >> you aren't allowed to selectively redefine
>> >> rules in the middle of a regex that uses those rules.
>> >
>> >This is precisely what a macro does.
>>
>> Not once execution starts, no.
>
>Compilation's just execution of a regex, albeit the Perl6::Grammar::program
>regex, and that regex will need to be modified while it's in operation in
>order to pick up macro "is parsed" definitions and apply them to the rest
>of what it's parsing.

Ah, damn, I wasn't thinking far enough out. I'm not sure it'll work
quite like that, with a single call to the regex engine that spits
out everything in one go. More likely it'll be a set of iterative
calls to the engine that terminate at natural sequence points,
potentially with recursive calls into the parsing regex.

Simon Cozens

Mar 19, 2003, 1:13:47 PM
to perl6-l...@perl.org
d...@sidhe.org (Dan Sugalski) writes:
> >Compilation's just execution of a regex, albeit the Perl6::Grammar::program
> >regex, and that regex will need to be modified while it's in operation in
> >order to pick up macro "is parsed" definitions and apply them to the rest
> >of what it's parsing.
>
> Ah, damn, I wasn't thinking far enough out.

This, you see, is precisely why some of us started work last year on a
regular expression engine which could handle having its expressions
rewritten during the match... ;)

--
Last week I forgot how to ride a bicycle. -- Steven Wright

Matthijs Van Duin

Mar 19, 2003, 2:04:15 PM
to perl6-l...@perl.org
On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote:
>Then I wasn't clear enough, sorry. This is perl -- the state of
>something at compile time is just a suggestion as to how things
>ultimately work.

Yes, hence my surprise about actually inlining stuff, luckily that was
just a misunderstanding :-)


>I'll nudge Larry to add it explicitly, but in general redefinitons of
>code that you're in the middle of executing don't take effect
>immediately, and it's not really any different for regex rules than
>for subs.

Ah, but we're not redefining the sub that's running, but the subs it's
about to call. That works for subs, and Simon Cozens already pointed out
we certainly also need it for rules :-)


>Actually, we should be extraordinarily liberal with the application
>of restrictions at this phase. It's far easier to lift a restriction
>later than to impose it later,

This is perl 6, we can add a new restriction next week

>and I very much want to stomp out any constructs that will force slow code
>execution. Yes, I may lose, but if I don't try...

You're absolutely right, and optimization is very important to me too. But
you can't *only* look at the speed of constructs, or we'll be coding in C
or assembly :-)

We'll need to meet in the middle..


>The issue of hypotheticals is complex.

Well, I'm a big boy, I'm sure I can handle it. Are you even talking about
semantics or implementation here? Because I already gave my insights on
semantics, and I have 'em in my head for implementation too but I should
probably take those to perl6-internals instead.

>Ultimately the question is "How do you backtrack into arbitrary code,
>and how do we know that the arbitrary code can be backtracked into?"
>My answer is we don't, but I'm not sure how popular that particular
>answer is.
>

>>I say, make generic semantics first, and then optimize the heck out of it.
>
>That's fine. I disagree. :)

Now that Simon Cozens has established that sub-rules need to be looked up
at runtime, I think we can both be happy:

As far as I can see, a rule will consist of two parts: the wrapper, which
handles things when the rule is invoked as a normal method (modifiers,
searches for unanchored matches, setting up the state, etc.); and the
actual body, which does a match at the current position.

Now, what you want is that subrule-invocation goes directly from body to
body, skipping the overhead of method invocation to the wrapper. I say,
when you look up the method for a subrule, check if it is a regular rule
and if so call its body directly, and otherwise use the generic mechanism.
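(A rough sketch of that dispatch in Python terms, all names invented for
illustration - in particular the "rule_body" tag stands in for whatever
marker the rule compiler would attach:)

    def invoke_subrule(grammar, name, s, pos, cont):
        meth = getattr(grammar, name)             # method lookup, so overriding works
        body = getattr(meth, "rule_body", None)   # tag set by the rule compiler
        if body is not None:                      # a plain rule: call its body directly
            return body(grammar, s, pos, cont)
        return meth(s, pos, cont)                 # hand-written rule-method: generic path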

I'll get my lovely generic semantics with the direct body->body calling
hidden away as an optimization detail, and I get the ability to write
rule-methods in perl code.

You still get your low-overhead body->body calls and therefore the speed
you desire (hopefully). Since you need to fetch the rule body anyway,
there should be no extra overhead: where you'd normally throw an error
(non-rule invoked as subrule) you'd switch to generic invocation instead.

Sounds like a good deal? :-)

Dan Sugalski

Mar 19, 2003, 2:31:58 PM
to perl6-l...@perl.org
At 8:04 PM +0100 3/19/03, Matthijs van Duin wrote:
>On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote:
>>I'll nudge Larry to add it explicitly, but in general redefinitons
>>of code that you're in the middle of executing don't take effect
>>immediately, and it's not really any different for regex rules than
>>for subs.
>
>Ah, but we're not redefining the sub that's running, but the subs
>it's about to call. That works for subs, and Simon Cozens already
>pointed out we certainly also need it for rules :-)

Well, I'm not 100% sure we need it for rules. Simon's point is
well-taken, but on further reflection what we're doing is subclassing
the existing grammar and reinvoking the regex engine on that
subclassed grammar, rather than redefining the grammar actually in
use. The former doesn't require runtime redefinitions, the latter
does, and I think we're going to use the former scheme.

>>Actually, we should be extraordinarily liberal with the application
>>of restrictions at this phase. It's far easier to lift a
>>restriction later than to impose it later,
>
>This is perl 6, we can add a new restriction next week

We can't add them once we hit betas. I'd as soon add them now, rather
than later.

>>and I very much want to stomp out any constructs that will force
>>slow code execution. Yes, I may lose, but if I don't try...
>
>You're absolutely right, and optimization is very important to me
>too. But you can't *only* look at the speed of constructs, or we'll
>be coding in C or assembly :-)
>
>We'll need to meet in the middle..

Well, not to be too cranky (I'm somewhat ill at the moment, so I'll
apologize in advance) but... no. No, we don't actually have to,
though if we could that'd be nice.

>>The issue of hypotheticals is complex.
>
>Well, I'm a big boy, I'm sure I can handle it. Are you even talking
>about semantics or implementation here? Because I already gave my
>insights on semantics, and I have 'em in my head for implementation
>too but I should probably take those to perl6-internals instead.

Semantics. Until Larry's nailed down what he wants, there are issues
of reestablishing hypotheticals on continuation reinvocation,
flushing those hypotheticals multiple times, what happens to
hypotheticals when you invoke a continuation with hypotheticals in
effect, what happens to hypotheticals inside of coroutines when you
establish them then yield out, and when hypotheticals are visible to
other threads.

I read through your proposal (I'm assuming it's the one that started
this thread) and it's not sufficient unless I missed something, which
I may have.

>>Ultimately the question is "How do you backtrack into arbitrary
>>code, and how do we know that the arbitrary code can be backtracked
>>into?" My answer is we don't, but I'm not sure how popular that
>>particular answer is.
>>
>>>I say, make generic semantics first, and then optimize the heck out of it.
>>
>>That's fine. I disagree. :)
>
>Now that Simon Cozens has established that sub-rules need to be
>looked up at runtime,

Well....

>Sounds like a good deal? :-)

At the moment, no. It seems like a potentially large amount of
overhead for no particular purpose, really. I don't see any win in
the regex case, and you're not generalizing it out to the point where
there's a win there. (I can see where it would be useful in the
general case, but we've come nowhere near touching that)

Matthijs Van Duin

Mar 19, 2003, 3:14:57 PM
to perl6-l...@perl.org, Dan Sugalski
On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote:
>Well, I'm not 100% sure we need it for rules. Simon's point is
>well-taken, but on further reflection what we're doing is subclassing
>the existing grammar and reinvoking the regex engine on that
>subclassed grammar, rather than redefining the grammar actually in
>use. The former doesn't require runtime redefinitions, the latter
>does, and I think we're going to use the former scheme.

That's not the impression I got from Simon

It would also be rather annoying.. think about balanced braces etc; take
this rather contrived, but valid example:

    $x ~~ m X {
        macro ... yada yada yada;
    } X;

It seems to me that you're really inside a grammar rule when that macro
is defined. Otherwise you'd have to keep a lot of state outside the
parser to keep track of such things, which is exactly what perl grammars
were supposed to avoid, I think.

>We can't add them once we hit betas. I'd as soon add them now, rather
>than later.

Well, I'd rather not add it at all :-)


>>We'll need to meet in the middle..
>
>Well, not to be too cranky (I'm somewhat ill at the moment, so I'll
>apologize in advance) but... no. No, we don't actually have to,
>though if we could that'd be nice.

OK, strictly speaking that's true, but I think we can


>Semantics. Until Larry's nailed down what he wants, there are issues
>of reestablishing hypotheticals on continuation reinvocation,

They should be, though: if a variable was hypothesized when the continuation
was taken, then it should be hypothesized when that continuation is invoked.

>flushing those hypotheticals multiple times,

No idea what you mean

>what happens to hypotheticals when you invoke a continuation with
>hypotheticals in effect,

Basically, de-hypothesize all current hypotheticals and re-hypothesize
the ones that were hypothesized when the continuation was taken. You can
of course optimize this by skipping the "common ancestry", if you know
what I mean.

>what happens to hypotheticals inside of coroutines when you
>establish them then yield out,

This follows directly from the implementation of coroutines: the first
yield is a normal return, so if you hypothesize $x before that it'll stay
hypothesized. If you then hypothesize $y outside the coroutine and call
the coroutine again, $y will be de-hypothesized. If the coroutine then
hypothesizes $z and yields out, $z will be de-hypothesized and $y
re-hypothesized. $x will be unaffected by all this.


>and when hypotheticals are visible to other threads.

I haven't thought of that, but to be honest I'm not a big fan of preemptive
threading anyway. Cooperative threading using continuations is probably
faster and has no synchronization issues. And the behavior of hypotheticals
follows naturally there (you can use 'let' or 'temp' to create thread-
local variables in that case)


>I read through your proposal (I'm assuming it's the one that started
>this thread) and it's not sufficient unless I missed something, which
>I may have.

Also look at Sean O'Rourke's reply and my reply to that; it contains
additional info.


>>Sounds like a good deal? :-)
>
>At the moment, no. It seems like a potentially large amount of
>overhead for no particular purpose, really.

I have to admit I don't know the details of how your system works, but
what I had in mind didn't have any extra overhead at all -- under the
(apparently still debatable) assumption that you need to look up subrules
at runtime anyway.

You do agree that if that is possible, it *is* a good deal?


>I don't see any win in the regex case, and you're not generalizing it out
>to the point where there's a win there. (I can see where it would be
>useful in the general case, but we've come nowhere near touching that)

We have come near it.. backtracking is easy using continuations, and we can
certainly have rules set the standard for the general case.

Dan Sugalski

Mar 19, 2003, 3:46:50 PM
to Matthijs van Duin, perl6-l...@perl.org
At 9:14 PM +0100 3/19/03, Matthijs van Duin wrote:
>On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote:
>>Well, I'm not 100% sure we need it for rules. Simon's point is
>>well-taken, but on further reflection what we're doing is
>>subclassing the existing grammar and reinvoking the regex engine on
>>that subclassed grammar, rather than redefining the grammar
>>actually in use. The former doesn't require runtime redefinitions,
>>the latter does, and I think we're going to use the former scheme.
>
>That's not the impression I got from Simon
>
>It would also be rather annoying.. think about balanced braces etc,
>take this rather contrived, but valid example:
>
>$x ~~ m X {
> macro ... yada yada yada;
> } X;
>
>It seems to me that you're really inside a grammar rule when that
>macro is defined.

Right. Macro definition ends, you subclass off the parser object,
then immediately call into it, and it eats until the end of the
regex, at which point it exits and so does the parent, for lack of
input, and the resulting parse tree is turned to bytecode and
executed.

> Otherwise you'd have to keep a lot of state outside the parser to
>keep track of such things, which is exactly what perl grammars were
>supposed to avoid I think.

You, as a user-level programmer, don't have to track the state. The
parser code will, but that's not a big deal.

>>>We'll need to meet in the middle..
>>
>>Well, not to be too cranky (I'm somewhat ill at the moment, so I'll
>>apologize in advance) but... no. No, we don't actually have to,
>>though if we could that'd be nice.
>
>OK, strictly speaking that's true, but I think we can
>
>Semantics. Until Larry's nailed down what he wants, there are issues
>of reestablishing hypotheticals on continuation reinvocation,
>
>They should be though, if a variable was hypothesized when the
>continuation was taken, then it should be hypothesized when that
>continuation is invoked.

Should they? Does hypotheticalization count as data modification (in
which case it shouldn't) or control modification (in which case it
should), and do you restore the hypothetical value at the time the
continuation was taken or just re-hypotheticalize the variables?
(Which makes continuations potentially more expensive as you need to
then save off more info so on invocation you can restore the
hypothetical state)

What about co-routines, then? And does a yield from a coroutine count
as normal or abnormal exit for pushing of hypothetical state outward,
or doesn't it count at all?

>>flushing those hypotheticals multiple times,
>
>Not idea what you mean

I hypotheticalize the variables. I then take a continuation. Flow
continues normally, exits off the end normally, hypothetical values
get pushed out. I invoke the continuation, flow continues, exits
normally. Do I push the values out again?

>
>what happens to hypotheticals when you invoke a continuation with
>hypotheticals in effect,
>
>Basically de-hypothesize all current hypotheticals,

How? Successfully or unsuccessfully? Does it even *count* as an exit
at all if there's a pending continuation that could potentially exit
the hypotheticalizing block later?

>>what happens to hypotheticals inside of coroutines when you
>>establish them then yield out,
>
>This follows directly from the implementation of coroutines: the
>first yield is a normal return, so if you hypothesize $x before that
>it'll stay hypothesized. if you then hypothesize $y outside the
>coroutine and call the coroutine again, $y will be de-hypothesized.

Why? That doesn't make much sense, really. If a variable is
hypotheticalized outside the coroutine when I invoke it, the
coroutine should see the hypothetical variable. But what about yields
from within a coroutine that's hypotheticalized a variable? That's
neither a normal nor an abnormal return, so what happens?

>If the coroutine then hypothesizes $z and yields out, $z will be
>de-hypothesized and $y
>re-hypothesized. $x will be unaffected by all this

Yech. I don't think that's the right thing to do.

>>and when hypotheticals are visible to other threads.
>
>I haven't thought of that, but to be honest I'm not a big fan of
>preemptive threading anyway.

Doesn't matter whether you like it or not, they're a fact that must
be dealt with. (And scare up a dual or better processor machine and
I'll blow the doors off a cooperative threading scheme,
synchronization overhead or not)

>>I read through your proposal (I'm assuming it's the one that started this

>>>Sounds like a good deal? :-)
>>
>>At the moment, no. It seems like a potentially large amount of
>>overhead for no particular purpose, really.
>
>I have to admit I don't know the details of how your system works,
>but what I had in mind didn't have any extra overhead at all --
>under the (apparently still debatable) assumption that you need to
>look up subrules at runtime anyway.
>
>You do agree that if that is possible, it *is* a good deal?

No. Honestly I still don't see the *point*, certainly not in regards
to regular expressions and rules. The hypothetical issues need
dealing with in general for threads, coroutines, and continuations,
but I don't see how any of this brings anything to rules for the
parsing engine.

The flow control semantics the regex/parser needs to deal with are
small and simple. I just don't see the point of trying to make it
more complex.

>>I don't see any win in the regex case, and you're not generalizing
>>it out to the point where there's a win there. (I can see where it
>>would be useful in the general case, but we've come nowhere near
>>touching that)
>
>We have come near it.. backtracking is easy using continuations, and
>we can certainly have rules set the standard for the general case.

We're not backtracking with continuations, though.

Matthijs Van Duin

Mar 19, 2003, 6:21:34 PM
to perl6-l...@perl.org, Dan Sugalski
On Wed, Mar 19, 2003 at 03:46:50PM -0500, Dan Sugalski wrote:
>Right. Macro definition ends, you subclass off the parser object,
>then immediately call into it
>...

>You, as a user-level programmer, don't have to track the state. The
>parser code will, but that's not a big deal.

OK, I suppose that works, although that still means you're moving the
complexity from the perl implementation to its usage: in this case, the
perl 6 parser which is written in perl 6 -- but I can well imagine other
people will want to do the same, and they'll have to do a similar hack.

I really don't like that, perl normally moves the complexity away from the
programmer and into perl.


>>They should be though, if a variable was hypothesized when the
>>continuation was taken, then it should be hypothesized when that
>>continuation is invoked.
>
>Should they? Does hypotheticalization count as data modification (in
>which case it shouldn't) or control modification (in which case it
>should),

Isn't that the whole point of hypotheses in perl 6? You talk about
"successful" and "unsuccessful" de-hypothesizing, and about abnormal exits
etc.. you seem to have a much more complex model of hypotheses than what's
in my head.

I could be entirely missing things of course, but I haven't seen any
evidence of that yet.

To me, what 'let' does is temporize a variable except it doesn't get
restored if you leave the scope, but only if you use a continuation to
go back to a point where it wasn't hypothesized yet.

When the last continuation taken *before* the hypothesis is gone, so is
the old version and thus the hypothesized variable becomes permanent.

The behavior regarding coroutines follows naturally from this, and so does
the behavior inside regexen if they use the "continuation" semantics for
backtracking -- which is what I'm suggesting.

This leaves only behavior regarding preemptive threads, which is actually
very easy to solve: disallow hypothesizing shared variables -- it simply
makes no sense to do that. Now that I think of it, temporizing shared
variables is equally bad news, so this isn't something new.


>(Which makes continuations potentially more expensive as you need to
>then save off more info so on invocation you can restore the
>hypothetical state)

Actually, I think 'let' can handle this.. it's only invocation of
continuations that will become more expensive because it needs to deal with
the hypothesized variables

>What about co-routines, then? And does a yield from a coroutine count
>as normal or abnormal exit for pushing of hypothetical state outward,
>or doesn't it count at all?

Your terminology gets rather foreign to me at this point. Assuming a
co-routine is implemented using continuations, its behavior follows
directly from the description above, and I think the resulting behavior
looks fine. I don't see why people would hypothesize variables inside a
co-routine anyway.


>I hypotheticalize the variables. I then take a continuation. Flow
>continues normally, exits off the end normally, hypothetical values
>get pushed out. I invoke the continuation, flow continues, exits
>normally. Do I push the values out again?

If it ends normally, the variable isn't de-hypothesized at all. Also, the
continuation was created *after* you hypothesized the variable, so when you
invoke it nothing will happen to the variable.


>How? Successfully or unsuccessfully? Does it even *count* as an exit
>at all if there's a pending continuation that could potentially exit
>the hypotheticalizing block later?

You're making 0% sense to me, apparently because your mental model of
hypothesizing differs radically from mine.

>Why? That doesn't make much sense, really.

Probably the same problem in opposite direction :-)

>(And scare up a dual or better processor machine and I'll blow the doors
>off a cooperative threading scheme, synchronization overhead or not)

Of course, for CPU-intensive applications that spread their computation over
multiple threads on a multi-processor machine, you'll certainly need
preemptive multithreading.

When exactly is the last time you wrote such an application in perl? :-)

Seriously though, I think in the common case cooperative threading is likely
to be superior: it has low overhead, it should have faster context-switch
times, you have no synchronization issues, and you can mostly avoid the need
for explicit yielding: in many applications threads will regularly block on
something anyway (which will yield to another thread)

But anyway, this is getting off-topic.. I'll save it for later. Regex first


>No. Honestly I still don't see the *point*, certainly not in regards
>to regular expressions and rules. The hypothetical issues need
>dealing with in general for threads, coroutines, and continuations,
>but I don't see how any of this brings anything to rules for the
>parsing engine.
>
>The flow control semantics the regex/parser needs to deal with are
>small and simple. I just don't see the point of trying to make it
>more complex.

More complex?!

What I'm suggesting is a simple definition of hypothetical variables which
makes backtracking easy to do using continuations in the general case, and
therefore automatically also in rules.

Rules can then probably be optimized to *avoid* explicit use of continuations
to the point where they have the speed you demand, while still keeping up
appearances of the simple continuation-based backtracking semantics.


>We're not backtracking with continuations, though.

I'm suggesting you do "officially", but optimize it away behind the scenes.
This leaves nice and simple semantics for backtracking in general, while
in fact their implementation inside rules is simple and efficient.

I have to admit I'm not 100% sure this is possible, but give me some time
to try to work out the details :-)

Simon Cozens

unread,
Mar 19, 2003, 7:00:05 PM3/19/03
to perl6-l...@perl.org
p...@nubz.org (Matthijs Van Duin) writes:
> OK, I suppose that works although that still means you're moving the
> complexity from the perl implementation to its usage: in this case,
> the perl 6 parser which is written in perl 6

No, I don't believe that's what's happening. My concern is that at some
point, there *will* need to be a bootstrapped parser which is written in
some low level language, outputting Parrot bytecode, and it *will* need
to be able to reconfigure itself mid-match.

I think. I can't remember why I'm so convinced of this, and I'm too tired
to think it through with examples right now, and I might be wrong anyway,
but at least I can be ready with a solution if it proves necessary. :)

--
"There is no safe investment. To love at all is to be vulnerable. ...
The only place outside Heaven where you can be perfectly safe from all the
dangers and perturbations of love is Hell."
-CS Lewis "The Four Loves"

Dan Sugalski

unread,
Mar 20, 2003, 9:24:33 AM3/20/03
to perl6-l...@perl.org
At 12:00 AM +0000 3/20/03, Simon Cozens wrote:
>p...@nubz.org (Matthijs Van Duin) writes:
>> OK, I suppose that works although that still means you're moving the
>> complexity from the perl implementation to its usage: in this case,
>> the perl 6 parser which is written in perl 6
>
>No, I don't believe that's what's happening. My concern is that at some
>point, there *will* need to be a bootstrapped parser which is written in
>some low level language, outputting Parrot bytecode, and it *will* need
>to be able to reconfigure itself mid-match.
>
>I think. I can't remember why I'm so convinced of this, and I'm too tired
>to think it through with examples right now, and I might be wrong anyway,
>but at least I can be ready with a solution if it proves necessary. :)

You may well be right--I don't think so, but I'm not at my clearest
either. I don't see that it'll be needed outside the initial
bootstrap parser if at all, so I'm not too worried. (And the
low-level language for it will probably be perl 5, since I'd far
rather build something with a Parse::RecDescent grammar than a
hand-nibbler in C)

Matthijs Van Duin

unread,
Mar 20, 2003, 12:24:16 PM3/20/03
to perl6-l...@perl.org, Austin Hastings
On Thu, Mar 20, 2003 at 08:49:28AM -0800, Austin Hastings wrote:
>--- Matthijs van Duin <p...@nubz.org> wrote:
>> you seem to have a much more complex model of hypotheses
>> than what's in my head.
>
>The complex model is right -- in other words, if hypotheses are to be a
>first-class part of the language then they must interoperate with all
>the other language features.
>
>(lots of explanation here)

You're simply expanding on the details of your complex model - not on the
need for it in the first place.

I'll see if I can write some details/examples of my model later, and show
how it interacts with various language features in a simple way.


>> This leaves only behavior regarding preemptive threads, which is
>> actually very easy to solve: disallow hypothesizing shared
>> variables -- it simply makes no sense to do that. Now that
>> I think of it, temporizing shared variables is equally bad news,
>> so this isn't something new.

I just realized there's another simple alternative: make it cause the
variable to become thread-local for that particular thread.
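
Something like this, say (a sketch; the 'is shared' trait and the threads
API are assumptions on my part):

    our $hits is shared = 0;

    sub worker {
        let $hits = 0;    # under this alternative, $hits silently becomes
                          # a thread-local copy for this thread only
        $hits++;          # no locking needed; other threads still see
                          # the shared value, untouched
    }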


>Hypothesize all the new values you wish, then pay once to get a mux,
>then keep all the data values while you've got the mux. Shrinks your
>critical region

You're introducing entirely new semantics here, and personally I think
you're abusing hypotheses - although, I admit, in an interesting and
potentially useful way. I'll spend some thought on that.


>My experience has been that when anyone says "I don't see why anyone
>would ...", Damian immediately posts an example of why.

No problem since it works fine in my model (I had already mentioned that
earlier) - I just said *I* don't see why anyone would.. :-)


>So, stop talking about rexen. When everyone groks how continuations
>should work, it'll fall out.

Rexen were the main issue: Dan was worried about performance.

>(And if you reimplement the rexengine using continuations and outperform
>Dan's version by 5x or better, then we'll have another Geek Cruise to
>Glacier Bay and strand Dan on an iceberg. :-)

I don't intend to outperform him.. I intend to get the same performance
with cleaner, simpler and more generic semantics.

But as I said in my previous post.. give me some time to work out the
details.. maybe I'll run into fatal problems making the whole issue moot :)

BTW, you say "reimplement"? Last time I checked, hypothetical variables
weren't implemented yet, let alone interacting properly with continuations.
Maybe it's just sitting in someone's local version, but until I have
something to examine, I can't really compare its performance to my system.

Austin Hastings

unread,
Mar 20, 2003, 11:49:28 AM3/20/03
to Matthijs van Duin, perl6-l...@perl.org, Dan Sugalski

--- Matthijs van Duin <p...@nubz.org> wrote:
> On Wed, Mar 19, 2003 at 03:46:50PM -0500, Dan Sugalski wrote:
>
> >>They should be though, if a variable was hypothesized when the
> >>continuation was taken, then it should be hypothesized when that
> >>continuation is invoked.
> >
> >Should they? Does hypotheticalization count as data modification (in
> >which case it shouldn't) or control modification (in which case it
> >should),
>
> Isn't that the whole point of hypotheses in perl 6? You talk about
> "successful" and "unsuccessful" de-hypothesizing, and about abormals
> exits etc.. you seem to have a much complexer model of hypotheses
> than what's in my head.

The complex model is right -- in other words, if hypotheses are to be a
first-class part of the language then they must interoperate with all
the other language features.

So there are several ways out of a sub. What does a hypo do in all these
cases? (A sketch pulling the verbs together follows the list.)

let $x - Declares a hypothesis. I think this should be a verb.
(Which is to say, a function instead of a storage
class.)
(This in turn suggests that primitive types can't be
hypothesized, although arrays thereof could be.)

fail - Pretty strongly suggests that the hypo is ignored.
Also suggests continuation/backtracking behavior. Do we
really know what fail does? See discard, below.

discard $x - (suggested) Obvious keyword for failing one single
hypothesis.

keep $x - Suggests the value is made permanent. This should be
a verb. C<keep> with no args should just keep every
hypo in the current <fillintheSCOPE>.

return - This is unclear. On the one hand, it's a transfer of
control and I think that C<let> and C<keep> are data,
not control, modifiers. On the other hand, it's one of
the normal ways to leave a block, and could be argued
either way.

(Also: remember the bad old days of C<my> vs. C<local>.
I propose that hypos fail by default - to keep the
distinction clear in the minds of newbies.)

throw - Exceptions unwind the call stack. If "let" is a
control action, it should undo. If let is a data
action, it should not. To me, C<let> is an explicit
action taken against a variable, and should not be
undone by this. (Of course, if unwinding the call stack
causes the variable to go out of scope, it's not an
issue.)

continuation: goto
- Again: continuations are transfers of control, not
data. If "let" is a control action, continuations
will have to know if they are transferring "back up
the stack" or if they are transferring to some new,
never-before-seen (on the stack) place. I think that
C<let> should be a data action, in which case this
doesn't affect the hypothesis.

generators: yield
- Same issues, although the argument can be made that
since you can resume a generator, the hypo should
be confined to the extant-but-inactive scope.
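
Pulling those together (a sketch; 'discard' is only the suggestion above,
and no-arg 'keep' works as described):

    sub withdraw ($acct, $amount) {
        let $acct.balance = $acct.balance - $amount;   # hypothesis 1
        let $acct.log     = "withdrew $amount";        # hypothesis 2

        if $acct.balance < 0 {
            discard $acct.balance;   # drop just this one hypothesis
            fail;                    # the rest get ignored on the way out
        }

        keep;    # accept every hypo in the current scope
    }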


> To me, what 'let' does is temporize a variable except it doesn't get
> restored if you leave the scope, but only if you use a continuation
> to go back to a point where it wasn't hypothesized yet.

Yes and no.

I agree it shouldn't get restored, see above. However, continuations
don't touch data. So a global variable that has been hypo'ed should
(IMO) remain so after the continuation.

Frankly, if you mix the two, it's YOUR job to understand the
ramifications.

> When the last continuation taken *before* the hypothesis is gone, so
> is the old version and thus the hypothesized variable becomes
> permanent.

I disagree. Proposal and acceptance should be explicit actions.

> The behavior regarding coroutines followed naturally from this, and
> so does the behavior inside regexen if they use the "continuation"
> semantics for backtracking -- which is what I'm suggesting.
>
> This leaves only behavior regarding preemptive threads, which is
> actually very easy to solve: disallow hypothesizing shared
> variables -- it simply makes no sense to do that. Now that
> I think of it, temporizing shared variables is equally bad news,
> so this isn't something new.

Why?

If you constrain hypotheses to the thread (making it a control action
instead of a data action) this could be a way to get cheap MUXing.


Hypothesize all the new values you wish, then pay once to get a mux,
then keep all the data values while you've got the mux. Shrinks your
critical region:

{: is synchronized($mux)
keep all;
}
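
Spelled out with the hypothesizing included (a sketch; same invented
syntax as above):

    let %config{'timeout'} = 30;    # build the hypothesis at leisure,
    let %config{'retries'} = 5;     # no lock held yet

    {: is synchronized($mux)        # pay for the mutex exactly once
        keep all;                   # commit everything while we hold it
    }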

OTOH, if you are really using threads well, then your app may construct
a hypothesis based on user input, and the math and visualizer and gui
threads may all need to work in that hypothetical space.

> >(Which makes continuations potentially more expensive as you need to
> >then save off more info so on invocation you can restore the
> >hypothetical state)
>
> Actually, I think 'let' can handle this.. it's only invocation of
> continuations that will become more expensive because it needs to
> deal with the hypothesized variables

Only true if hypo is a control action. See above - make it a data
action.

> >What about co-routines, then? And does a yield from a coroutine
> > count as normal or abnormal exit for pushing of hypothetical
> > state outward, or doesn't it count at all?
>
> Your terminology gets rather foreign to me at this point. Assuming a
> co-routine is implemented using continuations, their behavior follows
> directly from the description above, and I think the resulting
> behavior looks fine.

> I don't see why people would hypothesize
> variables inside a co-routine anyway.

My experience has been that when anyone says "I don't see why anyone
would ...", Damian immediately posts an example of why. Unless it's
Damian, in which case Simon, Larry, or Dan usually counterpost.

As such, I've become accustomed to substituting "I don't see why anyone
would want to." with "Sounds okay to me." -- I learn less, but my head
doesn't hurt as much. :-)

> >I hypotheticalize the variables. I then take a continuation. Flow
> >continues normally, exits off the end normally, hypothetical values
> >get pushed out. I invoke the continuation, flow continues, exits
> >normally. Do I push the values out again?
>
> If it ends normally, the variable isn't de-hypothesized at all.
> Also, the continuation was created *after* you hypothesized the
> variable, so when you invoke it nothing will happen to the variable.

I think the point is that the "hypothetical values get pushed out"
means that end-of-scope := acceptance-of-hypothesis. Regardless, the
hypos get pushed out. The continuation resumes *IN THE LIFESPAN* of the
hypothesis.

If "let" is a control action, then the hypothesis is re-engaged.
If "let" is a data action, it isn't.

I vote for isn't. (Which means you're writing questionable code. But
you didn't know, so I'll forgive you.)
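
The difference in one snippet (a sketch; the comments mark the two
readings):

    my $x = 'old';
    my &cc;
    my $second_pass = 0;

    sub demo {
        let $x = 'hypo';        # the hypothesis begins
        callcc { &cc = $_ };    # continuation taken *inside* its lifespan
        # falls off the end normally -- hypothetical value pushed out
    }

    demo();                     # $x is now 'hypo'
    unless $second_pass++ {
        &cc(undef);             # re-enter demo past the 'let':
    }                           # control action: the hypothesis re-engages
                                # data action:    $x stays put (my vote)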

Aside: FYI, the new ISO C ('99) standard allows dynamically declared
array sizing, but prohibits "goto" from crossing into or out of the
lifespan of such a variable (goto within, or goto outside, but no
crossing the lines). We could make such a rule for hypotheticals, but
frankly I think that would be lame.

> >How? Successfully or unsuccessfully? Does it even *count* as an exit
> >at all if there's a pending continuation that could potentially exit
> >the hypotheticalizing block later?
>
> You're making 0% sense to me, apparently because your mental model of
> hypothesizing differs radically from mine.

Matthijs: Think about closures.

Dan: Does it count as an exit when a closure exits? Continuations, as I
think you've pointed out a long time ago, translate into scratchpadding
the stack. (stackpads?)


> >No. Honestly I still don't see the *point*, certainly not in regards
> >to regular expressions and rules. The hypothetical issues need
> >dealing with in general for threads, coroutines, and continuations,
> >but I don't see how any of this brings anything to rules for the
> >parsing engine.
> >
> >The flow control semantics the regex/parser needs to deal with are
> >small and simple. I just don't see the point of trying to make it
> >more complex.
>
> More complex ?!
>
> What I'm suggesting is a simple definition of hypothetical variables
> which makes backtracking easy to do using continuation in the
> general case, and therefore automatically also by rules.
>
> Rules can then probably be optimized to *avoid* explicit use of
> continuations to the point where they have the speed you demand,
> while still keeping up appearances of the simple continuation-based
> backtracking semantics.

So, stop talking about rexen. When everyone groks how continuations
should work, it'll fall out. (And if you reimplement the rexengine
using continuations and outperform Dan's version by 5x or better, then
we'll have another Geek Cruise to Glacier Bay and strand Dan on an
iceberg. :-)


=Austin
