As the analysis of test errors of the new reigster allocator has shown, we have a problem WRT register allocation. This problem isn't new, but as the allocator is more efficiently reusing registers (or reusing them in a different way) it's exposed again.
0) The register allocator isn't to blame, that looks really fine.
We have two kinds of problems one with exceptions and one with continuations. As exceptions are continuations, we could also define it as the same problem. Anyway, the usage of exceptions is a bit different. Catch blocks are local labels only AFAIK in HLL. Parrot allows currently a Sub too, but that might be bogus.
1) Exceptions (see t/op/gc_14.imc)
We have a typical usage showing the problem like:
n = x ... # code that can reuse register of n newsub eh, .Exception_Handler, catch set_eh eh # code that migth through n = y clear_eh catch: print n
Now the register allocator just doesn't know that there exists a control flow graph edge from anywhere in the try block to the catch label. The register of variable n from the first statement is therefore not preserved, as it's reassigned in the try block. So the catch block can get the wrong value (neither x nor y) of n.
A simble solution could look like:
n = x new eh, .Exception_Handler set_eh eh, catch # code that might through n = y clear_eh catch:
with the changed opcode C<set_eh (in PMC, labelconst INT).
This would suffice to make a real branch target out of the catch label and the register allocator should do the right thing. You can test that by inserting "unless $P0, catch" after "set_eh" in the mentioned test. (there is some code in imcc that specially treats newsub, but the newsub ocpode isn't the branch - basically everything below C<set_eh> may branch to the catch label)
(Did I already mention that implict behavior like using a register or branching is bad)
2) Continuations (t/op/gc_13.imc [Hi Piers])
Again we have a hidden branch done by a Continuation, or better a loop. The control flow of the main program is basically represented by this conventional code fragment:
except that the gotos are performed by backtracking via the continuations. So we don't have these loop labels and the continuation continues at the next opcode after the invocation of choose() and not at the shown position above.
So again, the register allocator doesn't see that there is a branch, the variable's arr2 register is clobbered, in this case by the fail closure.
As we never know, if a subroutine captures the return continuation and creates such loops like above, we have in the absence of other syntax actually no chance to hold any variable in a register as far as I can see now. We'd have just to force using lexicals for all vars, except for leaf subroutines (that don't call other subs) or subs that just call one sub (they can't create loops).
Another idea is to create edges from all 1..n function calls in a sub to the previos 0..n-1 calls, so that basically all possible loops done via possible continuations are represented in the CFG.
> Again we have a hidden branch done by a Continuation, or better a > loop. The control flow of the main program is basically represented by > this conventional code fragment:
> arr1=[...]; arr2=[...] > loop1: x = choose(arr1)
> except that the gotos are performed by backtracking via the > continuations. So we don't have these loop labels and the continuation > continues at the next opcode after the invocation of choose() and not > at the shown position above.
> So again, the register allocator doesn't see that there is a branch, > the variable's arr2 register is clobbered, in this case by the fail > closure.
> As we never know, if a subroutine captures the return continuation and > creates such loops like above, we have in the absence of other syntax > actually no chance to hold any variable in a register as far as I can > see now. We'd have just to force using lexicals for all vars, except > for leaf subroutines (that don't call other subs) or subs that just > call one sub (they can't create loops).
> Another idea is to create edges from all 1..n function calls in a sub > to the previos 0..n-1 calls, so that basically all possible loops done > via possible continuations are represented in the CFG.
That analysis looks correct to me--any time you have a function call, the subsequent op might be a branch target, if there is a subsequent function call.
> We'd have just to force using lexicals for all vars
Having variable-size register sets would solve this, since you could have fixed assignments of variables to registers, and not have to suffer the overhead of moving data between registers and lexical pads over-and-over. Well, it doesn't really "solve" it--just makes it workable.
On Sat, 13 Nov 2004 10:52:38 -0800, Jeff Clites <jcli...@mac.com> wrote: > On Nov 13, 2004, at 8:53 AM, Leopold Toetsch wrote: > > We'd have just to force using lexicals for all vars
> Having variable-size register sets would solve this, since you could > have fixed assignments of variables to registers, and not have to > suffer the overhead of moving data between registers and lexical pads > over-and-over. Well, it doesn't really "solve" it--just makes it > workable.
I like the idea of mandating lexicals vars. This would also eliminate the need for spilling (I think), as the register allocator would only need to refetch the lexical rather than save it off somewhere to be restored later.
I get the feeling that this is equivalent to requiring exception handlers to be a locally defined closure, which is another way we could go about this.
Matt -- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
Matt Fowles <uberm...@gmail.com> wrote: > I like the idea of mandating lexicals vars. This would also eliminate > the need for spilling (I think), as the register allocator would only > need to refetch the lexical rather than save it off somewhere to be > restored later.
There are two issues: yes with refetch - no with store (probably). Lexicals and globals are references, so:
add Plex, Px, Py
stores already x+y in the variable lex. Well, that's a compiler issue and a reason to keep the current "destination exists" semantics of opcodes. (This usage doesn't cope with the generation of new values, though and morping "Undef" isn't a good solution)
> I get the feeling that this is equivalent to requiring exception > handlers to be a locally defined closure, which is another way we > could go about this.
Yes. That solves it. OTOH going all along with lexicals could be pretty inefficient.
On Sat, 13 Nov 2004 14:08:12 -0800, Jeff Clites <jcli...@mac.com> wrote: > That's oversimplifying a bit, but I feel like those are the core issues > (stemming from the observation of Leo's that continuations in effect > give all variables a lifetime that extends to their entire block, in > most cases).
This does not give all variables extended lifetimes. It only gives variables that are used in the exception handler such a lifetime. Thus temporaries used in calculations can be safely overwritten. Perhaps we should try adding the control flow arcs to the basic block analysis and see how the allocator handles it then...
Matt -- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
> On Sat, 13 Nov 2004 14:08:12 -0800, Jeff Clites <jcli...@mac.com> > wrote: >> That's oversimplifying a bit, but I feel like those are the core >> issues >> (stemming from the observation of Leo's that continuations in effect >> give all variables a lifetime that extends to their entire block, in >> most cases).
> This does not give all variables extended lifetimes. It only gives > variables that are used in the exception handler such a lifetime. > Thus temporaries used in calculations can be safely overwritten. > Perhaps we should try adding the control flow arcs to the basic block > analysis and see how the allocator handles it then...
Not all variables, but due to Leo's case (2), it should be all variables which are referenced after the first function call out of a subroutine, if there's a later function call; for instance, consider:
... foo(); a = 10; b = 12; ... use a and b bar(); ... never use a or b again c = 1; d = 2; ... use c and d baz(); ... never use c or d again zoo();
Normally, the lifetime of a and b would end at the call to bar, and that of c and d would end at the call to baz, but due to continuations, the call to zoo might resume at the op right after the call to foo (since the continuation created when calling foo may have been captured), so a,b,c,d have to persist at least until the last function call is made.
We could teach the allocator about these possibilities as you mentioned, and that would give us correct program behavior, but we know a priori that we'll have much higher register pressure that you might expect, so a different overall strategy might work better.
On Sat, 13 Nov 2004 17:35:02 -0800, Jeff Clites <jcli...@mac.com> wrote: > Not all variables, but due to Leo's case (2), it should be all > variables which are referenced after the first function call out of a > subroutine, if there's a later function call; for instance, consider:
> ... > foo(); > a = 10; > b = 12; > ... use a and b > bar(); > ... never use a or b again > c = 1; > d = 2; > ... use c and d > baz(); > ... never use c or d again > zoo();
> Normally, the lifetime of a and b would end at the call to bar, and > that of c and d would end at the call to baz, but due to continuations, > the call to zoo might resume at the op right after the call to foo > (since the continuation created when calling foo may have been > captured), so a,b,c,d have to persist at least until the last function > call is made.
Yes, but in the case of the continuation resuming after foo, the continuation should restore the frame to the point where it was taken. Thus all of the registers will be exactly as they were when the continuation was taken (i.e. in the correct place). Thus, it does not matter if we reuse a register later in the function, the continuation will restore the proper context for itself.
Exceptions handlers, on the other hand, are a different story, because anything that is used in the catch block must be kept in the correct place through out the entire try block. However, the allocator will figure this out for itself if we provide it the branch information. Another reasonable option is to mandate that the exception handler starts without any preconception about the register contents and fetches everything as needed. This means that only lexicals can be used in the handler, but it is a slightly softer requirement then only lexicals everywhere.
Matt -- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
Matt Fowles <uberm...@gmail.com> wrote: > Jeff~ > Yes, but in the case of the continuation resuming after foo, the > continuation should restore the frame to the point where it was taken. > Thus all of the registers will be exactly as they were when the > continuation was taken (i.e. in the correct place).
Yes, but Jeff's example wasn't really reflecting the problem.
The case I've shown looks like:
.local pmc arr1, arr2 arr1=[1,3,5] arr2=[1,5,9] x = choose(arr1) y = choose(arr2) # arr2 never used beyond $P0 = ...
At the last line the register allocator happily reuses the register that arr2 had for $P0. That's totally legal in the absence of continuations. So it doesn't suffice that the register frame is restored and that variables are in the same place.
The effect of the continuation is the creation of a loop in the CFG. Life time of variables and thus register allocation is different within loops.
> Exceptions handlers, on the other hand, are a different story, because > anything that is used in the catch block must be kept in the correct > place through out the entire try block.
No they aren't really different. The presence of an exception handler (and the code for installing such a handler) is more visible in PIR. That's the only difference.
But again up to the above continuation example. The scheme source of the relevant parts is this:
In that source it's obvious that the continuations of choose are captured in the local closures created by the lambda. So it's probably just a lack of the compiler (and a lack of PIR syntax) to express the relevant information that the call to choose has the possible side-effect of being resumed just after the created "invokecc" opcode.
So from a HLL point of view that's all visible and clear.
<cite author="Piers">And, damnit, making a full continuation isn't something a programmer should do lightly. </cite>
And of course an HLL compiler can't and doesn't emit some code that captures a continuation silently and w/o any reason.
So what to do:
1) Extending register frame size ad infinitum and never reuse a Parrot register will definitely blow caches.
2) Generating an edge for every call to every previous calls will blow the CFG and cause huge pressure on the register allocator. A lot of spilling will be the result.
3) Using lexicals all over is slower (but HLL compilers will very likely emit code that does exactly that anyway). So the problem may not be a real problem anyway. We just know that an optimizer can't remove the refetch of lexicals in most of the subroutines.
4) Having an explicit syntax construct "(call-with-current-continuation " that expresses verbatim, what's going on, like e.g. with a reserved word placed as a label:
>As the analysis of test errors of the new reigster allocator has >shown, we have a problem WRT register allocation. This problem isn't >new, but as the allocator is more efficiently reusing registers (or >reusing them in a different way) it's exposed again.
We don't really have that much of a problem. What we have is just something more simple -- the target of a continuation marks the start of a basic block. That means that we have to assume everything we don't get handed back from the function's dirty and should be refetched.
Or, alternately, if we declare that the top half of the register set is preserved on function call and return we can assume that the PMCs and values in there are acceptable for use, though any that map to lexicals or globals ought to be refetched, since we have the possibility that the names have been rebound.
I'm perfectly fine in declaring that this is *only* legitimate in mainline code, and that code generators don't have to deal with the possibility that vtable or MMD function code has rebound names. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
At 11:01 PM +0100 11/13/04, Leopold Toetsch wrote:
>Matt Fowles <uberm...@gmail.com> wrote: > > I get the feeling that this is equivalent to requiring exception >> handlers to be a locally defined closure, which is another way we >> could go about this.
>Yes. That solves it. OTOH going all along with lexicals could be pretty >inefficient.
We're dealing with languages that pretty much mandate inefficiency in many ways. Since, for example, it's completely reasonable (well, likely at least) for a called sub to rebind lexicals in its parent, refetching's pretty much required. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
> Since, for example, it's completely reasonable (well, likely at least) > for a called sub to rebind lexicals in its parent
What does that mean, exactly? It seems like that directly contradicts the meaning of "lexical". For instance, see Larry's comments from "Re: Why lexical pads" at September 25, 2004 10:01:42 PM PDT (the first of the 3 messages from him on that day).
On Sun, 14 Nov 2004 17:03:33 -0500, Dan Sugalski <d...@sidhe.org> wrote: > At 5:53 PM +0100 11/13/04, Leopold Toetsch wrote: > >As the analysis of test errors of the new reigster allocator has > >shown, we have a problem WRT register allocation. This problem isn't > >new, but as the allocator is more efficiently reusing registers (or > >reusing them in a different way) it's exposed again.
> We don't really have that much of a problem. What we have is just > something more simple -- the target of a continuation marks the start > of a basic block. That means that we have to assume everything we > don't get handed back from the function's dirty and should be > refetched.
I tend to agree that this is not such a problem. Basically, Parrot has various control flow possibilities that were not previously considered. (1) An exception can happen in a try block, so CFG arcs need to be added from statements within the try block to the catch. (2) Continuations, which I don't pretend to understand, can also flow in previously unconsidered directions. Those arcs need to be added to the CFG.
Alternately, for exceptions, it may be possible to circumvent that process, and just add symbolic interfenence to all vars in the try block with all vars in the catch block.
For continuations, however, it seems like those really are control flow arcs that need to be added. If analysis can trim a few of them, then that would be great, but if people are using continuations, maybe they shouldn't expect high performance, and extra register spilling may be another side effect.
>> Yes, but in the case of the continuation resuming after foo, the >> continuation should restore the frame to the point where it was taken. >> Thus all of the registers will be exactly as they were when the >> continuation was taken (i.e. in the correct place).
> Yes, but Jeff's example wasn't really reflecting the problem.
How come? (Not quibbling--just afraid I'm missing something.) It seems that even this function body shows the problem:
a = 1 foo() print a b = 10 return b
It would seem (w/o continuations) that b should be able to re-use a's register, but if it does then we'll print 10 instead of 1 "the second time".
> So what to do:
> 1) Extending register frame size ad infinitum and never reuse a Parrot > register will definitely blow caches.
> 2) Generating an edge for every call to every previous calls will blow > the CFG and cause huge pressure on the register allocator. A lot of > spilling will be the result.
> 3) Using lexicals all over is slower (but HLL compilers will very > likely > emit code that does exactly that anyway). So the problem may not be a > real problem anyway. We just know that an optimizer can't remove the > refetch of lexicals in most of the subroutines.
It seems that, in term of cache locality, the same problem is there with more registers v. spilling v. lexicals. That is, if you have 100 local variables whose lifetimes overlap (due to continuations), then you need 100 distinct memory locations to (repeatedly) access.
> 4) Having an explicit syntax construct "(call-with-current-continuation > " that expresses verbatim, what's going on, like e.g. with a reserved > word placed as a label:
> RESUMEABLE: x = choose(arr1)
I don't think that really helps either: If you have such a call, then all the frames higher up the stack also can "return multiple times", so they have the behavior, even w/o the label.
On the other hand, this Ruby code really bugs me (note: "$" variables in Ruby are globals):
def outer a = 0 strange() a = a + 1 print "a = ", a, "\n" a = "hello" print "a = ", a, "\n" end
outer()
$saved.call
% ruby continuation5.ruby a = 1 a = hello continuation5.ruby:8:in `+': failed to convert Fixnum into String (TypeError) from continuation5.ruby:8:in `outer' from continuation5.ruby:14
What bugs me is that "outer" gets an error when the continuation is invoked, because "the second time" strange() returns, "a" is a string and so you can't add 1 to it. But looking at the definition of "outer", you'd expect that you could never get such an error. (Without the line setting "a" to "hello", you get an infinite loop, printing increasing integers.)
> >>Yes, but in the case of the continuation resuming after foo, the > >>continuation should restore the frame to the point where it was taken. > >> Thus all of the registers will be exactly as they were when the > >>continuation was taken (i.e. in the correct place).
> >Yes, but Jeff's example wasn't really reflecting the problem.
> How come? (Not quibbling--just afraid I'm missing something.) It seems > that even this function body shows the problem:
> a = 1 > foo() > print a > b = 10 > return b
> It would seem (w/o continuations) that b should be able to re-use a's > register, but if it does then we'll print 10 instead of 1 "the second > time".
It can. You can't reuse the same PMC (if you're using PMCs), but you can reuse the register.
It all comes down to how the code is generated. I've done this in a project of mine, and it takes a little thought, but it works. When you take a continuation, you have to saveall before you take it, and restoreall at the point where the continuation is to resume.
This is the trick I used (modulo brain code rot--I haven't written PIR in a really long time):
saveall $P0 = new Continuation set_addr $P0, RESUME save $P0 restoreall restore $P0 # ... $P0 is your continuation
RESUME: restoreall # ...
On the other hand, this may be completely irrelavent, since I haven't been following the discussion.
Dan Sugalski <d...@sidhe.org> wrote: > At 5:53 PM +0100 11/13/04, Leopold Toetsch wrote: >>As the analysis of test errors of the new reigster allocator has >>shown, we have a problem WRT register allocation. This problem isn't >>new, but as the allocator is more efficiently reusing registers (or >>reusing them in a different way) it's exposed again. > We don't really have that much of a problem. What we have is just > something more simple -- the target of a continuation marks the start > of a basic block.
It is of course a new basic block. But setting the CFG correctly would need to create loops, that is an edge from all subcalls 1..n to previous subcalls 0..n-1. That could create big increases in CFG size.
> ... That means that we have to assume everything we > don't get handed back from the function's dirty and should be > refetched.
I'm fine with refetching lexicals, as - you say it - the may got rebound. But what about more static languages?
> Or, alternately, if we declare that the top half of the register set > is preserved on function call and return
These are preserved anyway - as well as the lower half. Please forget the upper/lower half notion coming from old savetop times. The failing gc_13 test is not failing because the register isn't preserved. It's failing because there is no indication that this register isn't reassignable due to the loop-effect of the continuation.
[ refetching lexicals/globals ]
> I'm perfectly fine in declaring that this is *only* legitimate in > mainline code, and that code generators don't have to deal with the > possibility that vtable or MMD function code has rebound names.
Yes, as said I'm fine too with that. Perl/Python will do that anyway. But what about other languages? Are we forcing these to be as dynamic as the primary HLL targets?
Bill Coffman <bill.coff...@gmail.com> wrote: > On Sun, 14 Nov 2004 17:03:33 -0500, Dan Sugalski <d...@sidhe.org> wrote: >> We don't really have that much of a problem. What we have is just >> something more simple -- the target of a continuation marks the start >> of a basic block. That means that we have to assume everything we >> don't get handed back from the function's dirty and should be >> refetched. > I tend to agree that this is not such a problem. Basically, Parrot > has various control flow possibilities that were not previously > considered. (1) An exception can happen in a try block, so CFG arcs > need to be added from statements within the try block to the catch.
I've already proposed a simple solution for the exception case:
The try block is starting exactly at this point, where the exception handler gets pushed onto the contol stack. By connecting the C<set_eh> opcode to the catch block with an edge in the CFG, we could easily prevent the reuse of registers located in front of the try block.
> (2) Continuations, which I don't pretend to understand, can also flow > in previously unconsidered directions. Those arcs need to be added to > the CFG.
As said: there are no invisible continuations taken. From an HLL point of view, it's very clear what is happening.
> For continuations, however, it seems like those really are control > flow arcs that need to be added. If analysis can trim a few of them, > then that would be great, but if people are using continuations, maybe > they shouldn't expect high performance, and extra register spilling > may be another side effect.
If you insert automatically these CFG edges you can't trim them down, there is no such analysis that could find points, where it isn't necessary. So we have the performance degradation for *all* subs, because w/o any compiler hints any subroutine invocation could act as a loop.
The backward branches are going to the opcode after the invoke. You'd have to connect sub 1..n to sub 0..n-1. This are 81 loops for calling 10 subroutines in a row. This kills for sure all efforts the register allocator might take, we'll end up with spilling a lot.
Luke Palmer <l...@luqui.org> wrote: > Jeff Clites writes: >> a = 1 >> foo() >> print a >> b = 10 >> return b
>> It would seem (w/o continuations) that b should be able to re-use a's >> register, but if it does then we'll print 10 instead of 1 "the second >> time". > It can. You can't reuse the same PMC (if you're using PMCs), but you > can reuse the register.
No. With the presence of a continuation the "print a" can be executed twice. If now C<b = 10> reuses a's register, it'll break.
> It all comes down to how the code is generated. I've done this in a > project of mine, and it takes a little thought, but it works. When you > take a continuation, you have to saveall before you take it, and > restoreall at the point where the continuation is to resume.
Please forget C<saveall>. This was the way to go before we had register frames.
> This is the trick I used (modulo brain code rot--I haven't written PIR > in a really long time): > saveall > $P0 = new Continuation > set_addr $P0, RESUME > save $P0 > restoreall > restore $P0
Sure. That's what we formerly used to do. Two memcpys á 640 bytes + 2 stack operations. The memcpys were killing performance.
This isn't needed anymore. A continuation does restore the register frame. In your code above you emit all possible code to do the rigth thing. I'm proposing a much simpler syntax:
RESUMABLE: foo() # code here might be executed more then once
Jeff Clites <jcli...@mac.com> wrote: > On Nov 14, 2004, at 3:03 AM, Leopold Toetsch wrote: >> Yes, but Jeff's example wasn't really reflecting the problem. > How come?
Your code didn't reuse C<a> after the call.
> ... It seems > that even this function body shows the problem:
Yes, that one is reflecting it.
> a = 1 > foo() > print a > b = 10 > return b > It would seem (w/o continuations) that b should be able to re-use a's > register, but if it does then we'll print 10 instead of 1 "the second > time".
Yep, if there is another call that uses the captured continuation of the foo() call and continues at "print a".
> It seems that, in term of cache locality, the same problem is there > with more registers v. spilling v. lexicals.
Not quite. We can't extend just one register set, we'd do that for all 4 kinds. You can not easily have 64 PMC registers and just 10 INTVALs. A lexical access is just an array lookup (at least if the compiler uses the indexed addressing of lexicals).
> ... That is, if you have 100 > local variables whose lifetimes overlap (due to continuations), then > you need 100 distinct memory locations to (repeatedly) access.
Sure. If the program is complex you need the storage anyway.
>> 4) Having an explicit syntax construct "(call-with-current-continuation >> " that expresses verbatim, what's going on, like e.g. with a reserved >> word placed as a label:
>> RESUMEABLE: x = choose(arr1) > I don't think that really helps either: If you have such a call, then > all the frames higher up the stack also can "return multiple times", so > they have the behavior, even w/o the label.
The RESUMABLE label is of course at the invocation that might resume, somehwere up the call chain. Again: the HLL compiler (or the programmer) knows the effect of the continuation. Just the PIR code is too dumb, i.e. is lacking this information.
> On the other hand, this Ruby code really bugs me (note: "$" variables > in Ruby are globals): > ... , you get an infinite loop, printing increasing > integers.)
Sure. With callcc you are calling the function again and again.
I think I may know a way around this problem. You will have to bear with me for a moment as I am not entirely used to speaking in terms of continuations (I find a similar difficulty in speaking about Math, but at least I have a commonly accepted lexicon for that ;-).
The view from 10,000 feet. Full continuations do not operate on their context in place, but operate on a copy of it when invoked.
The nitty gritty, consider (as provided by Jeff):
a = 1 foo() print a b = 10 return b
in the case where foo does not construct a full continuation an donly uses the return continuation, no extra copying is done and everything works. In the case where foo takes a full continuation and puts it in a global "evil", when foo upgrades its return continuation to a full one (which happens automatically if foo or one of its children creates a full continuation of its own), all of the continuations down the tree will be marked as full. When a full continuation is invoked it *copies* its contenxt into place (thus it can be invoked multiple times and it will always have its original context). This means that invoking full continuations will have a speed hit associated with it (although that is to be expected), creatings full continuations has a smaller speed hit of marking the chain (also reasonably expected), but invoking and creating return continuations would still be cheap.
Hopefully that made sense to someone other than me, Matt -- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
Matt Fowles <uberm...@gmail.com> wrote: > All~ > ... When a full continuation is invoked it > *copies* its contenxt into place (thus it can be invoked multiple > times and it will always have its original context).
If you mean by "context" C<struct Parrot_Context> then yes, this is what continuations are doing. But then nothing is solved.
If you mean context + registers then there is another problem: returning via continuation would restore everything, including e.g. a loop-counter that keeps track of how often you loop via the continuation.
A continuation isn't allowed to do that, you couldn't make any progress in that subroutine, when all register contents is restored.
>On Mon, Nov 15, 2004 at 09:51:58AM +0100, Leopold Toetsch wrote: >: Yes, as said I'm fine too with that. Perl/Python will do that anyway. >: But what about other languages? Are we forcing these to be as dynamic as >: the primary HLL targets?
>In Perl 6, any assumptions that cause inefficiency should be optional.
Okay, for this one if it's optional then we might as well not do it, since it'll work so rarely all it'll be is a massive source of bug reports. "Why did the sub that has no idea that I'm messing with its lexical bindings only spottily and irregularly notice that I rebound them?" I just don't want to go there.
Compilers can assume lexicals aren't going to get rebound outside of their view. They're still going to have to deal with global rebinding. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
On Nov 15, 2004, at 3:27 AM, Leopold Toetsch wrote:
> Jeff Clites <jcli...@mac.com> wrote: >> On Nov 14, 2004, at 3:03 AM, Leopold Toetsch wrote:
>>> Yes, but Jeff's example wasn't really reflecting the problem.
>> How come?
> Your code didn't reuse C<a> after the call.
Oops.
>> It seems that, in term of cache locality, the same problem is there >> with more registers v. spilling v. lexicals.
> Not quite. We can't extend just one register set, we'd do that for all > 4 > kinds. You can not easily have 64 PMC registers and just 10 INTVALs. > A lexical access is just an array lookup (at least if the compiler uses > the indexed addressing of lexicals).
Ah. What I don't like, though, is that in the in the lexical case you have things sitting in an array, from which you need to move things back-and-forth to registers to work on them. In the "more registers" case, they're sitting in a quite similar array, but you can work on them directly there. Adding 2 such INTVALs is one op, instead of 4 (2 loads, 1 add, 1 store). Since the problem exists for all register types (unless HLLs only use PMCs, which is possible), then we either need 4 lexical arrays for maximum locality (so that they can be sized independently), or we need to be able to size the register frames independently for the 4 types. (I realize that currently lexicals must be PMCs, but it seems we have the same issue with other reg. types.)
>> ... That is, if you have 100 >> local variables whose lifetimes overlap (due to continuations), then >> you need 100 distinct memory locations to (repeatedly) access.
> Sure. If the program is complex you need the storage anyway.
But I think the real problem is that it's only due to CPS that the lifetimes are overlapping so much--that's what's biting us. (By expanding my example code, you could come up with simple code which uses 100 locals be would only need 1 register w/o continuations, and needs 100 "spots" with them.) It's just a pretty unfortunate price we're paying, if the feature is not extensively used. Now we're just figuring out how to survive it.
>>> 4) Having an explicit syntax construct >>> "(call-with-current-continuation >>> " that expresses verbatim, what's going on, like e.g. with a reserved >>> word placed as a label:
>>> RESUMEABLE: x = choose(arr1)
>> I don't think that really helps either: If you have such a call, then >> all the frames higher up the stack also can "return multiple times", >> so >> they have the behavior, even w/o the label.
> The RESUMABLE label is of course at the invocation that might > resume, somehwere up the call chain. Again: the HLL compiler (or the > programmer) knows the effect of the continuation. Just the PIR code is > too dumb, i.e. is lacking this information.
Picture this call stack:
main --> A --> B --> C --> D --> E
The call from D to E uses the "RESUMEABLE" label, and E stores the resulting continuation in a global, and everything up to main returns. Then, main invokes the continuation, and we find ourselves back in D. Now, C, B, and A will return "again", without any compile-time hint.
>> On the other hand, this Ruby code really bugs me (note: "$" variables >> in Ruby are globals):
>> ... , you get an infinite loop, printing increasing >> integers.)
> Sure. With callcc you are calling the function again and again.
I know--the infinite loop was the "desired" behavior (just mentioned it so that people wouldn't have to run it). What bugs me is that you can get that error, though looking at the code it should be impossible. The author of outer() might have no clue that could happen, so it's not really his bug, and the person using a continuation needs really detailed knowledge of everything in the call stack, to know if it will work. I guess that's "just how it is", but it seems to mean that continuations have limited usefulness in languages with side-effects, except for very local usage (breaking out of a loop and such).
Jeff Clites <jcli...@mac.com> wrote: > Picture this call stack: > main --> A --> B --> C --> D --> E > The call from D to E uses the "RESUMEABLE" label, and E stores the > resulting continuation in a global, and everything up to main returns. > Then, main invokes the continuation, and we find ourselves back in D. > Now, C, B, and A will return "again", without any compile-time hint.
That's fine. The return is an expected operation. I don't think that's the problem. The error in gc_13 is a call path like:
The problem now is not per se the path in main from the two choose() calls down to fail is executed more then once (as it's the case with multiple returns). The problem is the loop in main. By denoting the loop from the call to fail() to the first choose() with some kind of syntax, the register allocator does the right thing.
On Mon, 15 Nov 2004 20:30:18 +0100, Leopold Toetsch <l...@toetsch.at> wrote: > Matt Fowles wrote:
> > Leo~
> > On Mon, 15 Nov 2004 17:36:20 +0100, Leopold Toetsch <l...@toetsch.at> wrote:
> >>Matt Fowles wrote:
> >>>I do mean context + registers and it can do that. If you keep your > >>>loop-counter in a PMC, then it will be incremented
> >>Ok, but having suddenly such a difference between value and reference > >>types would really cause weird behavior.
> > I disagree. This is exactly the sort of distinction that has always > > been annoying novice programmers with most every language.
> Yes of course. But continue that sentence above: weird... depending on > the presence of Continuation, which, by creating a loop from your code, > start preserving your scalar data types.
I am sorry, but I don't understand what you are trying to say here. Would you mind rewording it for me?
> I didn't mention the runtime impact yet. You said already that it's > costy. This would make continuations unusable for backtracking in the > rules/rx engine.
The runtime inpact would be comparable to the speed of our previous calls where memory had to be copied off and restored (accept that we only need one of the copies instead of two).
I am not saying that this is necessarily the correct solution, but it is a solution that would impose minimal breakage and would not put heavy constraints on the register allocator.
One thing that worries me is the need to copy the entire callstack worth of return continuations into full ones when a continuation is created/promoted. This could be optimized for the regex case by taking an initial continuation and thereafter using its copied stack to initialize any newly taken continuations.
Which gives me an evil idea. We could allow bytecode to specify that it wanted to start taking full continuations everywhere, but that these would never be used below it on the callstack. Thus the regex engine could do this and not suffer too much efficiency loss for taking new continuations for every backtrack point.
Matt -- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
On Mon, 15 Nov 2004 17:19:01 -0500, Matt Fowles <uberm...@gmail.com> wrote: > Which gives me an evil idea. We could allow bytecode to specify that > it wanted to start taking full continuations everywhere, but that > these would never be used below it on the callstack. Thus the regex > engine could do this and not suffer too much efficiency loss for > taking new continuations for every backtrack point.
This is not really evil - it's known as an "escape continuation" or "weak continuation", or - more commonly - as an "exception".