The example:
   .sub main :main
     $P0 = get_hll_global ['Foo'], 'load'
     $P0()
     $P0 = new 'Foo'
     push_eh catch
       $S0 = $P0
     clear_eh
     say "huh?"
     .return()
   catch:
     say "caught"
     .return()
   .end
.namespace ['Foo']
   .sub load
     $P0 = newclass 'Foo'
   .end
   .sub __get_string :method
     $P0 = new .Exception
     throw $P0
   .end
Running this gives:
   caught
   No exception to pop.
It should only say "caught". I'm guessing (a) that this is a result  
of throwing the exception in a vtable method and (b) that this is  
causing the "attempt to access code outside of code segment" errors  
in Tcl.
--
Matt Diephouse
PIR code running on behalf of a vtable (or MMD) function is implemented by 
entering a secondary runloop (see src/pmc/delegate.pmc). The C code and the 
extra runloop act as a Continuation barrier. There are also some notes 
about that in Dan's blog.
I experimented some time ago to get at least exceptions working by also 
rewinding runloops, but have failed so far.
There's no way to get full Continuations working around such C code barriers, 
except by *not* entering secondary runloops at all for these cases[1]. This 
could be achieved by (optionally) returning a new PC for all vtable/MMD 
functions, that is, by changing the internal (C) calling conventions of all 
the PMC code.
leo
[1] we can't avoid that for e.g. custom sort functions, but these are special 
enough that we could restrict these.
   On Thursday, 27 July 2006 19:44, Matt Diephouse wrote:
   > Running this gives:
   >
   >    caught
   >    No exception to pop.
   PIR code running on behalf of a vtable (or MMD) function is implemented by 
   entering a secondary runloop (see src/pmc/delegate.pmc). The C code and the 
   extra runloop is acting as a Continuation barrier . . .
Ouch.
   I've experimented some time ago to get at least exceptions working by 
   rewinding runloops also, but have failed so far.
IIUC, this is not even theoretically possible, at least not without
putting heavy constraints on the code that a secondary runloop can
run [2].  Last night I thought about capturing the identity of the
current runloop in each closure, e.g. by adding a C<jmp_buf> to C<struct
Parrot_cont>, so that invoking the closure would return to the runloop
in which it was created.  Then it occurred to me that, once the
secondary runloop exited, invoking the continuation would make Parrot
jump off into never-never land.
   Of course, this also affects calling actions (or dynamic-wind thunks,
or whatever we are calling them this week).  As a result, my "Partial
fix to make closures invoke actions" patch of Wednesday is clearly not
the right thing; please consider it withdrawn.
   There's no way to get full Continuations working around such C code barriers, 
   except by *not* entering secondary runloops at all for these cases[1]. This 
   could be achieved by (optionally) returning a new PC for all vtable/MMD 
   functions that is, by changing the internal (C) calling conventions of all 
   the PMC code.
leo
   [1] we can't avoid that for e.g. custom sort functions, but these are
   special enough that we could restrict these.
I see a solution for simpler cases, that might even work for custom sort
functions, though it's certainly not painless.  Here's what I would like
to do for calling actions:
   In order to call back into bytecode, the C code must set up a call
using the original runloop that invokes more C code on exit in order to
resume the stack rewinding.  This can be done with a C function hook in
the continuation.  Stack rewinding would therefore need to be split up
into a number of thunks that do the rewinding magic, running between
episodes of bytecode:
   T1.  The entrypoint (which could be Continuation:invoke) scans down
the dynamic environment looking for an action to run.  If it finds one,
it sets it up to run, calling T2 when it exits.  If not, call T3
directly.
   T2.  After running an action, we need to look for the next action.
If we find it, set it up to run and call T2 again on exit.  Otherwise,
call T3 directly.
   T3.  We have reached the bottom, and can start scanning up, if
necessary, using the same logic.  (Since at present we don't call
actions on the way up, this is a noop for now.)
   T4.  Having finally gotten to the destination environment, resume
execution from the bytecode pointed to by the sub.
   Keeping track of state between thunk calls would be pretty messy.
Unless the state is garbage-collected, it'd be hard not to leak memory
when the rewinding is aborted because the bytecode happens to call some
other continuation.
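
For illustration only, here is a minimal C sketch of the thunk idea above.
It is not taken from any Parrot patch: Env, Rewind, rewind_step and
demo_rewind are hypothetical names, and "running an action" is reduced to
recording its id.  The point is that the rewinder never runs bytecode
itself; it hands one action at a time back to the single runloop and is
called again afterwards, with its state kept in an explicit struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dynamic-environment entry; action_id 0 means "no action". */
typedef struct Env {
    struct Env *outer;
    int action_id;
} Env;

/* State that must survive between thunk calls (the "fake closure"). */
typedef struct {
    Env *cur;      /* where the scan has got to */
    Env *target;   /* environment of the continuation being invoked */
} Rewind;

/* T1/T2: scan down for the next action. Returns its id so the caller can
 * run it in the one and only runloop, or 0 once the target is reached
 * (T3/T4: nothing left to run, execution can resume at the destination). */
static int rewind_step(Rewind *rw) {
    while (rw->cur != NULL && rw->cur != rw->target) {
        int id = rw->cur->action_id;
        rw->cur = rw->cur->outer;
        if (id != 0)
            return id;
    }
    return 0;
}

/* Stand-in for the runloop: "runs" each action by appending its id to a
 * decimal trace, then calls back into the rewinder. */
int demo_rewind(void) {
    Env base = { NULL, 0 };
    Env e1   = { &base, 4 };
    Env e2   = { &e1, 0 };
    Env e3   = { &e2, 7 };
    Rewind rw = { &e3, &base };
    int trace = 0, id;

    while ((id = rewind_step(&rw)) != 0)
        trace = trace * 10 + id;

    assert(rw.cur == &base);
    return trace;
}
```

Rewinding from e3 to base yields actions 7 and then 4, so demo_rewind()
returns 74.  If an action invoked some other continuation, the Rewind
struct would simply be abandoned, which is the leak risk mentioned above
unless that state is garbage-collected.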
Does this sound sane?
   Beyond that, I have no clue how to rescue the delegate, ParrotClass,
and ParrotObject pmclasses.  In principle, this strategy could be
applied to the general case.  But I find it really hard to imagine
rewriting *every* call to a vtable method to allow for the possibility
that it might call into bytecode . . .
					-- Bob Rogers
					   http://rgrjr.dyndns.org/
[2]  It might be possible for Exception_Handler, being a restricted sort
     of continuation, but I assume that is no longer interesting, as the
     new PDD23 design doesn't use them.
   On Thursday, 27 July 2006 19:44, Matt Diephouse wrote:
   > Running this gives:
   >
   >    caught
   >    No exception to pop.
   PIR code running on behalf of a vtable (or MMD) function is implemented by 
   entering a secondary runloop (see src/pmc/delegate.pmc). The C code and the 
   extra runloop is acting as a Continuation barrier . . .
The attached patch detects cases where a continuation tries to enter a
runloop different from the one that is executing, and prints a warning
to stderr.  It shows that there is only one such case in the present
test suite, where t/src/extend.t:13 creates an exception handler in C,
and hence outside of any run loop.  But this particular idiom is not
problematic, so I tweaked this case to suppress the message.
   So the purpose of this patch is to give a heads-up to anyone
encountering this problem in the future; they should notice the message
right before their code starts behaving bizarrely.  Does anyone see a
reason why I should not commit this?
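
A rough sketch of the kind of check described, written as a standalone toy
rather than as the actual patch: Interp, Cont, enter_runloop and
invoke_checked are hypothetical names, and the real Parrot structures are
not shown.  Each runloop gets an id, a continuation remembers the id that
was current when it was taken, and invoking it from a different runloop
prints a warning to stderr.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature interpreter state, not Parrot's real struct. */
typedef struct {
    int current_runloop_id;  /* id of the runloop executing right now */
    int next_runloop_id;     /* counter used to hand out fresh ids */
    int warnings;            /* number of mismatches reported */
} Interp;

typedef struct {
    int runloop_id;          /* runloop that was active at capture time */
} Cont;

/* Enter a (possibly nested) runloop; returns the id to restore on exit. */
static int enter_runloop(Interp *in) {
    int saved = in->current_runloop_id;
    in->current_runloop_id = ++in->next_runloop_id;
    return saved;
}

static void leave_runloop(Interp *in, int saved) {
    in->current_runloop_id = saved;
}

static Cont capture(const Interp *in) {
    Cont c = { in->current_runloop_id };
    return c;
}

/* Returns 1 if the continuation belongs to the running runloop;
 * otherwise warns on stderr and returns 0. */
static int invoke_checked(Interp *in, const Cont *c) {
    if (c->runloop_id != in->current_runloop_id) {
        fprintf(stderr,
                "warning: continuation from runloop %d invoked in runloop %d\n",
                c->runloop_id, in->current_runloop_id);
        in->warnings++;
        return 0;
    }
    return 1;
}

/* Handler taken in the main runloop, then invoked from inside a vtable
 * method's inferior runloop, and once more from the main runloop. */
int demo_mismatch(void) {
    Interp in = { 0, 0, 0 };
    int outer_saved = enter_runloop(&in);          /* main runloop */
    Cont handler = capture(&in);                   /* like push_eh */
    int inner_saved = enter_runloop(&in);          /* inferior runloop */
    int ok_inner = invoke_checked(&in, &handler);  /* throw in the method */
    leave_runloop(&in, inner_saved);
    int ok_outer = invoke_checked(&in, &handler);  /* same runloop: fine */
    leave_runloop(&in, outer_saved);

    assert(ok_inner == 0 && ok_outer == 1);
    return in.warnings;
}
```

Only the invocation from the inferior runloop is flagged, so
demo_mismatch() returns 1.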
> +local $TODO = 'runloop shenanigans';
> +# stringification is handled by a vtable method, which runs in a second
> +# runloop. when an error in the method tries to go to a Error_Handler defined
> +# outside it, it winds up going to the inner runloop, giving strange results.
> +pir_output_is(<<'CODE', <<'OUTPUT', 'clear_eh out of context (2)');
You should be able to replace this with pir_output_is( ..., todo => '...' );
Migrating existing TODO tests might make a good cage cleaners task.
-- c
   You should be able to replace this with pir_output_is( ..., todo => '...' );
Good idea; thank you. (I had forgotten about that syntax.)
					-- Bob
Looks very sane and applicable to me.
Thanks,
leo
   Thanks,
   leo
Great; committed as r13715.
					-- Bob
Bob, could you briefly write up the problem and proposed solution as a
[PDD] ticket for the extending, embedding, and external C API PDDs
(10-12, and possibly 2 and 23)? How we handle exceptions and
control-flow across C/Parrot boundaries is an important question, and I
want to make sure we address it.
Thanks,
Allison
Bob Rogers wrote:
   >    From: Leopold Toetsch <l...@toetsch.at>
   >    Date: Thu, 27 Jul 2006 20:50:18 +0200
   >
   >    There's no way to get full Continuations working around such C code barriers, 
   >    except by *not* entering secondary runloops at all for these cases. This
   >    could be achieved by (optionally) returning a new PC for all vtable/MMD
   >    functions that is, by changing the internal (C) calling conventions of 
   >    all the PMC code.
   > 
   > I see a solution for simpler cases, that might even work for custom sort
   > functions, though it's certainly not painless.  Here's what I would like
   > to do for calling actions:
   Bob, could you briefly write up the problem and proposed solution as a
   [PDD] ticket for the extending, embedding, and external C API PDDs
   (10-12, and possibly 2 and 23)? How we handle exceptions and
   control-flow across C/Parrot boundaries is an important question, and I
   want to make sure we address it.
   Thanks,
   Allison
There are several possible variations on this theme, and all of them
involve a fair amount of pain, though applied in different places.  So I
think more discussion is necessary before picking a direction.
   I had hoped to produce a POC to seed the process, but didn't get
enough tuits.  I'll have some time this weekend, though, and will at the
very least post an analysis Sunday night, with or without POC.
Does that sound alright?
					-- Bob
    main runloop
        => main bytecode
            => op (e.g. set_s_p)
                => vtable method
                    => inferior runloop
                        => method bytecode
Calling a continuation requires restarting a runloop, but when the
method bytecode is running, there are two runloops.  In principle, if
the method bytecode calls a continuation, as when signalling an error,
the continuation can go to the "right" runloop, i.e. the one where it
was taken.  However, if the method bytecode creates a continuation which
is later called from the main bytecode, Parrot has no way of restarting
the abandoned inferior runloop.  This problem also affects coroutines,
which have an implementation similar to continuations.
   So the crux of the problem is that the C code that implements the op
has state that cannot be captured by a continuation.  Normally, the
entire state of the computation is captured in Parrot-accessible
datastructures when the outer runloop is between instructions.  Invoking
an inferior runloop violates this principle by running bytecode in the
middle of an instruction.  And having vtable methods that do so
transparently makes this problem affect a great deal of the code, since
vtable methods are ubiquitous, and are generally assumed to be
low-level.
There are two broad solutions:
   1.  Restrict continuations to move only "outward", which makes it
unnecessary to restart inferior runloops.  This may not be as bad as it
seems, as most of the languages I can think of can be implemented using
only outward (returning) continuations.  And I believe the rest
(including Scheme) can be implemented without Parrot continuations, by
using the CP transformation, which is a standard Scheme compiler
technique in any case.  However, we'd probably have to abandon
coroutines, since telling people that they can't use coroutines with OO
programming is really lame.
   2.  Eliminate inferior runloops.  This is much harder, as it involves
rewriting much code, though the attached POC (see patch; more on this
below) shows that it is possible.  The essential strategy is to split
the C code into two or more portions, the first containing the part that
sets up the bytecode call, and the rest deals with the result.
Ironically, this is just the CP transformation mentioned earlier; we are
turning the latter half of the C code into a continuation, and then
arrange to call it after the bytecode returns.  Unfortunately, this is a
pain in C, as we have to fake a closure in order to retain state between
the C functions.
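
To make the "split the C code and fake a closure" idea concrete, here is a
small standalone sketch.  It is not the attached POC: Interp, StoreState,
set_s_p_begin, store_result and bytecode_returned are hypothetical names
chosen for illustration.  The first half records what to do with the
result, and the second half runs only when the bytecode call has returned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Interp Interp;

/* A "C continuation": the second half of a split C function. */
typedef void (*CCont)(Interp *in, void *state, const char *result);

struct Interp {
    char  sreg[4][32];     /* toy string registers */
    CCont pending;         /* C code to run when the bytecode call returns */
    void *pending_state;   /* its hand-made closure */
};

/* The fake closure: state shared between the two halves. */
typedef struct {
    int dest_reg;
} StoreState;

/* Second half: deal with the result of the bytecode call. */
static void store_result(Interp *in, void *state, const char *result) {
    StoreState *st = state;
    snprintf(in->sreg[st->dest_reg], sizeof in->sreg[st->dest_reg],
             "%s", result);
}

/* First half: remember where the result goes and register the second
 * half. A real implementation would now set up the bytecode call and
 * return to the runloop instead of starting an inferior one. */
static void set_s_p_begin(Interp *in, StoreState *st, int dest_reg) {
    st->dest_reg = dest_reg;
    in->pending = store_result;
    in->pending_state = st;
}

/* Called by the runloop when the delegated method returns. */
static void bytecode_returned(Interp *in, const char *result) {
    CCont k = in->pending;
    void *st = in->pending_state;
    in->pending = NULL;
    in->pending_state = NULL;
    if (k != NULL)
        k(in, st, result);
}

int demo_split(void) {
    Interp in = { {{0}}, NULL, NULL };
    StoreState st;

    set_s_p_begin(&in, &st, 2);
    assert(in.sreg[2][0] == '\0');        /* nothing stored yet */
    bytecode_returned(&in, "hello");      /* method "returned" a string */
    return strcmp(in.sreg[2], "hello") == 0 && in.pending == NULL;
}
```

Here the closure lives on the caller's stack, which only works because the
demo never leaves that frame; in an interpreter the state would have to be
heap-allocated and owned by something the GC can see.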
The second solution is subject to a few variations:
   V1.  It may be possible to redesign delegation in a way that doesn't
involve vtable methods.  For instance, it's not clear that all uses of
VTABLE_get_string (there are ~250 of them) need to support delegation,
and some of them might be really hard to rewrite, being within deeply
nested C calls.  Of course, set_s_p would need to call a delegated
method, but that is easier to special-case (it is the subject of the
POC).  And that only covers some of the vtable-to-bytecode cases.
   V2.  Failing that, it may be possible to limit the number of vtable
methods that are subject to delegation.
   MMD doesn't worry me.  It has long been known that Parrot MMD needs
to be reimplemented more efficiently, and my own favorite candidate for
the job is the FSM-based dispatch that is standard technology for Common
Lisp systems.  The idea is that, instead of doing the type dispatch
directly, the MMD engine builds an FSM in PIR that does argument
discrimination, compiles it to bytecode, caches it for future use, then
calls it to handle the case at hand.  That neatly kills multiple birds
with one stone.
   Nor does embedding.  In embedding, it is normal to exit the main
runloop and later start another main runloop.  As long as continuations
don't care which runloop they restart (i.e. there is only one setjmp
address that gets reset every time), this is not a problem.
   I think V1 would be very, very desirable, but I'm not sure if it's
even possible.  If not, the rest are all long, hard roads; we should
be quite certain we are heading down the right one before we start.
What does everyone think?
					-- Bob Rogers
					   http://rgrjr.dyndns.org/
Notes on the POC:
   The patch changes set_s_p (around src/ops/set.ops:160) to call
Parrot_op_set_reg_from_vtable, which just does it if given a "normal"
get_string method, else it arranges to call the delegated method, saving
the result register type and number, and arranging to call
store_tail_result_into_register on continuation exit in order to do the
job.
   It doesn't quite work, apparently because set_retval gives up too
soon, and so set_s_p always sets the result to a null string.
Consequently, Parrot doesn't even completely build; it fails compiling
PGE.  In general, it breaks all non-error uses of set_s_p.  However, it
is sufficient to fix t/pmc/exception.t case 30, which tests handling of
an error within set_s_p.
   I started out by writing Parrot_op_set_reg_from_vtable to handle the
general case of storing the result from any vtable method into any
register, but wound up focusing on the "get_string => S register" path
in order to finish it in reasonable time.  Had I not tried to be
general, it could have been a lot simpler.  It could still be made
simpler by code sharing between the bytecode and non-bytecode cases, and
by undoing some of the C&P butchery.
[ a much more detailed answer will follow ]
>    The problem is that inferior runloops cannot be re-entered via
> continuation.  
C<get_string> is an excellent example for the POC. I've a first question 
though: assuming that we might want to eliminate inferior runloops, what can 
we do about usage of (e.g.) C<get_string> in:
a) print P0
b)  foo(P0)    # function call with Preg
    ...
    .sub foo
       .param string s  # string param
a) could be easy by making the get_string explicit in the opcodes:
   $S0 = P0    # imcc could handle this conversion 
   print $S0
but I see little chance to catch b) at compile time.  And b) is of course 
really nasty, as the C<get_string> is occurring during argument passing, which 
also has to be used for any replacement functionality.
leo
On Sunday, 6 August 2006 17:20, Bob Rogers wrote:
[ a much more detailed answer will follow ]
   >    The problem is that inferior runloops cannot be re-entered via
   > continuation.  
   C<get_string> is an excellent example for the POC. I've a first question 
   though: assuming that we might want to eliminate inferior runloops, what can 
   we do about usage of (e.g.) C<get_string> in:
a) print P0
   b)  foo(P0)    # function call with Preg
       ...
       .sub foo
	  .param string s  # string param
a) could be easy by making the get_string explicit in the opcodes:
      $S0 = P0    # imcc could handle this conversion 
      print $S0
Or print_p (which is not much more complicated than set_s_p) could be
reimplemented using a print_tail_result helper fn instead of
store_tail_result_into_register.  (Given the difficulty I've had with
set_retval, that would have made an easier POC, come to think of it.)
   but I see little chance to catch b) at compile time.  And b) is of
   course really nasty, as the C<get_string> is occurring during argument
   passing, which has also to be used for any replacement functionality.
leo
Yes, this is a good example of the kind of pain I meant.  If we insisted
on trying to optimize this away, we'd be limited to supporting languages
no more dynamic than Java or C# -- and that's pretty limited.  ;-}  [1]
   It can still be done, IIUC.  Since an arbitrary number of
get_{string,number,integer} calls might be required, the arg processing
loop would have to be rewritten as a tail-recursion, and then broken
into thunks so that each could be called as a C_continuation if
necessary.  But it may depend on how many places need to call
parrot_pass_args and its kin, and how difficult *those* are to rewrite.
   This may not be quite as bad as it seems -- the current algorithm
(as you well know!) already revolves around an explicit struct
call_state.  On the other hand, it's not a PMC, and we'd have to figure
out how to keep GC on during arg processing.
   Then there are :flat arrays.  What if some user passes a :flat arg
using an array class with an overloaded get_pmc_keyed_int method?
Methinks it would be good to draw a line here.
Your detailed reply is eagerly awaited,
-- Bob
[1]  Actually, I take that back:  For INS formals, IMCC could generate a
     prolog that is equivalent to this for each parameter <n>:
	C<n>:
		unless param_<n>_needs_conversion goto C<n+1>
		param_<n> = param_<n>_temp
	C<n+1>:
		. . .
     So if the value destined for param_<n> is a PMC, process_args
     stuffs it into param_<n>_temp (still a P register) and sets the
     param_<n>_needs_conversion flag.  The fact that coercion is
     out-of-order with respect to the assignment of other formals should
     not matter; it won't be visible to vtable methods [2].  The whole
     prolog would be unnecessary if all were P formals, and could be
     skipped at runtime if no coercion was required.
     It rather seems like a band-aid to me.  But it would be easier in
     the short term.
[2]  Unless we should someday provide a mechanism for formals to be bound
     dynamically.
Important to consider this option, but overall it's too restrictive and 
doesn't allow enough room for the future evolution of programming 
languages. (Yeah, we can't go out of our way to accommodate languages 
that don't exist yet. The YAGNI principle applies. But we can avoid 
making decisions along the lines of "No one could ever need more than 
64K of RAM.")
>    2.  Eliminate inferior runloops.  This is much harder, as it involves
> rewriting much code, though the attached POC (see patch; more on this
> below) shows that it is possible.  The essential strategy is to split
> the C code into two or more portions, the first containing the part that
> sets up the bytecode call, and the rest deals with the result.
> Ironically, this is just the CP transformation mentioned earlier; we are
> turning the latter half of the C code into a continuation, and then
> arrange to call it after the bytecode returns.  Unfortunately, this is a
> pain in C, as we have to fake a closure in order to retain state between
> the C functions.
In general, this is the right direction to head. But that's about like 
saying you can get to Chicago from Portland by heading east. It is true, 
but we'll have to work out a few more details before we get there.
>    The second solution is subject to a few variations:
> 
>    V1.  It may be possible to redesign delegation in a way that doesn't
> involve vtable methods.  For instance, it's not clear that all uses of
> VTABLE_get_string (there are ~250 of them) need to support delegation,
> and some of them might be really hard to rewrite, being within deeply
> nested C calls.  Of course, set_s_p would need to call a delegated
> method, but that is easier to special-case (it is the subject of the
> POC).  And that only covers some of the vtable-to-bytecode cases.
> 
>    V2.  Failing that, it may be possible to limit the number of vtable
> methods that are subject to delegation.
In particular, both of these solutions are too specific. These two 
variations extend Parrot with C code that calls back into bytecode, but 
the general principle applies to any C code used to extend Parrot.
That doesn't mean we need to support CPS and exceptions in any arbitrary 
C code called from within Parrot (the native call interface handles more 
general C libraries), but it does mean we need to be a bit more general 
than limiting or eliminating vtable methods from delegation.
We can put some restrictions on the C code that avoids inferior 
runloops. These fall into the general category of calling conventions: 
we can require set-up and tear-down code wrapped around the C code; we 
can require any calls out of the C code (to bytecode, other C code, or 
by throwing an exception) to set up certain information before the call; 
we can probably even go so far as limiting what C features you can use 
in extension code; we can certainly provide macros to ease the pain of 
whatever constraints we put on C extension code. (This isn't a solution, 
it's just the tools we can use to reach a solution.)
What are the necessary and essential aspects of the Parrot calling 
conventions that a C routine would need to replicate or emulate in order 
to act as if it was a Parrot subroutine/method/opcode?
- accepting a return continuation as an argument?
- returning from the C code by invoking the return continuation passed in?
- creating a return continuation for any calls out of the C code 
(perhaps a special return continuation that externally acts just like an 
ordinary one, but internally takes special actions or stores special 
information for interfacing with the C code)?
Other suggestions? Leo?
>    MMD doesn't worry me.  It has long been known that Parrot MMD needs
> to be reimplemented more efficiently, and my own favorite candidate for
> the job is the FSM-based dispatch that is standard technology for Common
> Lisp systems.  The idea is that, instead of doing the type dispatch
> directly, the MMD engine builds an FSM in PIR that does argument
> discrimination, compiles it to bytecode, caches it for future use, then
> calls it to handle the case at hand.  That neatly kills multiple birds
> with one stone.
I can't say I'm entirely enamored with the idea of handling MMD with a 
Lispy FSM. OTOH, I do suspect that ultimately MMD will have to be 
extensible, to allow for different ways to do MMD. But that's a 
rabbit-trail, so let's save it for later.
>    Nor does embedding.  In embedding, it is normal to exit the main
> runloop and later start another main runloop.  As long as continuations
> don't care which runloop they restart (i.e. there is only one setjmp
> address that gets reset every time), this is not a problem.
Though embedding and extending both use the same C API for Parrot, their 
concerns are quite different. While extension is concerned with getting 
C code to act like native Parrot code, embedding is concerned with 
getting Parrot code (of all sorts) to act as a native part of some other 
system. All that to say: resolving the continuation barrier isn't 
necessary for embedding, but would still be handy for embedding inasmuch 
as it makes Parrot act as a more consistent system.
Allison
Notes on the POC:
. . . It doesn't quite work, apparently because set_retval gives up
   too soon, and so set_s_p always sets the result to a null string.
I figured this out in the process of implementing print_p.  It still
gets plenty of errors, though, some of which are pretty strange.  (The
one that unexpectedly succeeded is Matt's test case for bug #39988.)
-- Bob
------------------------------------------------------------------------
Failed 11/239 test scripts, 95.40% okay. 124/5359 subtests failed, 97.69% okay.
Failed Test                        Stat Wstat Total Fail  Failed  List of Failed
--------------------------------------------------------------------------------
t/compilers/pge/03-optable.t         31  7936    35   31  88.57%  1-23 27 29-35
t/compilers/pge/p6regex/01-regex.t   80 20480   494   80  16.19%  287-301 305-
                                                                  327 330 355-
                                                                  357 359-390
                                                                  393-396 486-
                                                                  487
t/compilers/pge/p6regex/closure.t     3   768     6    3  50.00%  1 4-5
t/compilers/pge/p6regex/context.t     1   256    20    1   5.00%  8
t/compilers/pge/pge_examples.t        1   256     2    1  50.00%  2
t/op/calling.t                        1   256    93    1   1.08%  39
t/op/gc.t                             1   256    22    1   4.55%  11
t/pmc/delegate.t                      1   256     9    1  11.11%  9
t/pmc/mmd.t                           1   256    39    1   2.56%  30
t/pmc/object-meths.t                  2   512    34    2   5.88%  11 25
t/pmc/objects.t                       2   512    78    2   2.56%  33 59
 (1 subtest UNEXPECTEDLY SUCCEEDED), 10 tests and 459 subtests skipped.
make: *** [test] Error 255
The continuation barrier is only one nastiness of inferior runloops. The 
second problem with it is that it heavily influences the guts of garbage 
collection. Due to the involved C code with its auto-storage objects on the 
C-stack, we have to walk the C-stack for active objects. This is the reason 
that our GC system is termed 'conservative', but much worse, it effectively 
prevents timely destruction.
See "Method Overloading and GC Issues" in Cfunc.pod. The only way IMHO to 
avoid this problem is to run GC at "safe" points at the runloop level, so 
that we don't have to trace any system areas (HW registers and C stack). This 
can be achieved by a) avoiding inferior runloops and b) triggering GC within 
some opcodes like C<new>, if there is resource shortage but not inside C 
code. I.e. if an allocation inside C code needs more objects, it would just 
set a flag "need_GC", allocate more objects and proceed.
This would also have the advantage that no pointers (e.g. str->strstart) 
would move under C code.
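
A minimal sketch of the "need_GC" idea, under the assumption that the
collector can be reduced to a counter for illustration.  Interp,
alloc_in_c and safe_point are hypothetical names, not Parrot functions:
allocation inside C code only grows the pool and sets the flag, and the
collection itself happens at a safe point between opcodes.

```c
#include <assert.h>

/* Hypothetical interpreter state; the "heap" is just two counters. */
typedef struct {
    int live_objects;
    int pool_size;
    int need_gc;    /* set inside C code, acted on only by the runloop */
    int gc_runs;    /* how many collections actually happened */
} Interp;

/* Allocation called from inside C code: it never collects. On shortage
 * it grows the pool, flags the need for a GC run, and proceeds. */
static void alloc_in_c(Interp *in) {
    if (in->live_objects == in->pool_size) {
        in->pool_size *= 2;
        in->need_gc = 1;
    }
    in->live_objects++;
}

/* Safe point at runloop level, between two opcodes. No C frames hold
 * object pointers here, so no C stack or HW register scan is needed. */
static void safe_point(Interp *in) {
    if (in->need_gc) {
        in->gc_runs++;      /* a real collector would trace and free here */
        in->need_gc = 0;
    }
}

int demo_deferred_gc(void) {
    Interp in = { 0, 2, 0, 0 };

    /* One "opcode" whose C implementation allocates three objects. */
    alloc_in_c(&in);
    alloc_in_c(&in);
    alloc_in_c(&in);
    int runs_inside_c = in.gc_runs;

    safe_point(&in);        /* back in the runloop */

    assert(runs_inside_c == 0 && in.pool_size == 4);
    return in.gc_runs;
}
```

The third allocation exhausts the pool of two, so the flag is set inside
the C code, but the single collection only runs at the safe point.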
Back to continuations.
>    2.  Eliminate inferior runloops.  This is much harder, as it involves
> rewriting much code, though the attached POC (see patch; more on this
> below) shows that it is possible.  The essential strategy is to split
> the C code into two or more portions, the first containing the part that
> sets up the bytecode call, and the rest deals with the result.
Exactly. But the code splitting can be simplified vastly IMHO. Your POC is 
creating an extra layer between opcodes and C code, which is basically doing 
two things:
- manage to call user code on behalf of the C code and pass args to it:
  C<Parrot_op_set_reg_from_vtable> and C< C_continuation > stuff
- pass return results back to the opcode:
  C<store_tail_result_into_register>
The proposal below is dealing with these issues directly in the runloop. 
Basically all C code is called like this:
    if (info->op_func(interpreter, args)) {
        /* if it branched, goto new pc */
        pc = args[n - 1];
    }
where C<op_func> is any C function following this calling convention. (It 
might be reasonable to also pass an argument signature or argument count, but 
these are implementation details).
When this opcode function is overloaded, it would now be a stub that
- invokes the PASM/PIR subroutine, which returns the new C<pc>
  and creates a return continuation
- sets up current_args and current_results pointers
Then the runloop would just dispatch to the PASM/PIR user code and run it w/o 
any inferior runloop.
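
The calling convention above can be sketched as a toy runloop.  This is
not np2.c: the opcode set, the fixed instruction width, Interp and
demo_single_runloop are all assumptions made for the example, and a plain
saved pc stands in for the return continuation.  An overloaded op does not
run user code itself; it returns a new pc and the same runloop carries on
there.

```c
#include <assert.h>
#include <stddef.h>

#define OP_WIDTH 3   /* every toy instruction is: opcode, operand, operand */

enum { OP_END, OP_GET_VALUE, OP_SET_I, OP_RET };

typedef struct {
    long ireg[4];          /* integer registers */
    long ret_pc;           /* stand-in for a return continuation */
    long override_pc;      /* < 0: not overloaded; else pc of the user code */
    int  runloop_entries;  /* how many times a runloop was started */
} Interp;

/* Convention: return nonzero if the op branched; the new pc is then in
 * the last slot of args. */
typedef int (*OpFunc)(Interp *in, long *args);

static int op_get_value(Interp *in, long *args) {
    if (in->override_pc >= 0) {
        /* Overloaded: act as a stub that "calls" the user code. */
        in->ret_pc = args[2];       /* where to resume afterwards */
        args[2] = in->override_pc;  /* new pc for the runloop */
        return 1;
    }
    in->ireg[args[0]] = 0;          /* built-in behaviour */
    return 0;
}

static int op_set_i(Interp *in, long *args) {
    in->ireg[args[0]] = args[1];
    return 0;
}

static int op_ret(Interp *in, long *args) {
    args[2] = in->ret_pc;           /* invoke the "return continuation" */
    return 1;
}

static const OpFunc op_table[] = { NULL, op_get_value, op_set_i, op_ret };

static void runloop(Interp *in, const long *code) {
    long pc = 0;
    in->runloop_entries++;
    while (code[pc] != OP_END) {
        long args[3] = { code[pc + 1], code[pc + 2], pc + OP_WIDTH };
        long next = pc + OP_WIDTH;
        if (op_table[code[pc]](in, args))
            next = args[2];         /* it branched: go to the new pc */
        pc = next;
    }
}

int demo_single_runloop(void) {
    static const long code[] = {
        OP_GET_VALUE, 0, 0,    /* pc 0: overloaded op */
        OP_END,       0, 0,    /* pc 3 */
        OP_SET_I,     0, 42,   /* pc 6: body of the user override */
        OP_RET,       0, 0     /* pc 9 */
    };
    Interp in = { { 0, 0, 0, 0 }, -1, 6, 0 };

    runloop(&in, code);
    assert(in.runloop_entries == 1);
    return (int)in.ireg[0];
}
```

The override runs at pc 6 and returns to pc 3 without a second runloop
ever being started, and demo_single_runloop() returns 42.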
There's still the mentioned problem of calling user code deeply inside some 
called C function. E.g. calling C<get_string> from within C<parrot_pass_args> 
due to argument conversion. This can be handled by a combination of:
a) roll it into the runloop e.g.
   print_p  => set_s_p ; print_s
b) disallow or restrict some overloading
c) handle argument conversion at the runloop too
The POC runloop below (np2.c) basically splits the runloop into 2 parts:
a) fast inlined (switched) opcodes
b) Cfunc calls, with argument setup performed in the runloop
This argument setup could also be used for calling PASM/PIR code so that 
necessary argument conversions, which might trigger user code, are also 
executed in the runloop.
The attached Cfunc.pod gives plenty of other reasons why we might need to 
change C internals as well. The continuation barrier is just one of these.
Comments welcome,
leo
   The continuation barrier is only one nastiness of inferior runloops. The 
   second problem with it is that it heavily influences the guts of garbage 
collection . . .
   See "Method Overloading and GC Issues" in Cfunc.pod. The only way
   IMHO to avoid this problem is to run GC at "safe" points at the
runloop level . . .
Had you considered keeping track of object references from C variables
explicitly?  This is what Emacs does, and the overhead is surprisingly
low -- less than one line in 300.  There are occasional bugs introduced
due to failure to "GCPRO" the right things at the right times, but the
cost might be more acceptable than conservative GC.  (But, IIUC, your
Csub proposal should make this problem completely avoidable, so this is
just academic curiosity.)
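
For readers who have not seen the technique, a tiny sketch of explicit
root registration, in the spirit of GCPRO but not using Emacs's actual
macros.  RootSet, gcpro, ungcpro and mark_roots are hypothetical names:
C code registers the address of each local that holds an object, and the
collector marks only what is registered instead of scanning the C stack.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 8

typedef struct Obj {
    int marked;
} Obj;

/* Explicit root set: addresses of C variables that hold object pointers. */
typedef struct {
    Obj **roots[MAX_ROOTS];
    int   nroots;
} RootSet;

static void gcpro(RootSet *rs, Obj **var) {
    assert(rs->nroots < MAX_ROOTS);
    rs->roots[rs->nroots++] = var;
}

static void ungcpro(RootSet *rs) {
    assert(rs->nroots > 0);
    rs->nroots--;
}

/* Mark phase restricted to registered roots; returns how many were live. */
static int mark_roots(RootSet *rs) {
    int n = 0;
    for (int i = 0; i < rs->nroots; i++) {
        Obj *o = *rs->roots[i];
        if (o != NULL) {
            o->marked = 1;
            n++;
        }
    }
    return n;
}

int demo_gcpro(void) {
    RootSet rs = { { NULL }, 0 };
    Obj a = { 0 }, b = { 0 };
    Obj *pa = &a;    /* C local that was registered */
    Obj *pb = &b;    /* C local that was forgotten */

    gcpro(&rs, &pa);
    int marked = mark_roots(&rs);
    ungcpro(&rs);

    (void)pb;
    /* a survives, b would be collected: the classic missing-GCPRO bug. */
    return marked == 1 && a.marked == 1 && b.marked == 0;
}
```

The forgotten variable pb shows the failure mode mentioned above: nothing
marks b, so a real collector would be free to reclaim it.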
. . . But the code splitting can be simplified vastly IMHO. Your POC
   is creating an extra layer between opcodes and C code, which is
   basically doing two things:
   - manage to call user code on behalf of the C code and pass args to it:
     C<Parrot_op_set_reg_from_vtable> and C< C_continuation > stuff
   - pass return results back to the opcode:
     C<store_tail_result_into_register>
Yes.
   The proposal below is dealing with these issues directly in the runloop. 
   Basically all C code is called like this:
       if (info->op_func(interpreter, args)) {
	   /* if it branched, goto new pc */
	   pc = args[n - 1];
       }
   where C<op_func> is any C function following this calling convention
. . .
Yes; your proposal clearly goes farther, addressing more problems by
taking a larger view.  And, at least as importantly, it is much less
ugly than the code I wrote.
   When now this opcode function is overloaded, it would be a stub that
   - invokes the PASM/PIR subroutine, which returns the new C<pc>
     and creates a return continuation
   - sets up current_args and current_results pointers
   Then the runloop would just dispatch to the PASM/PIR user code and run it w/o 
   any inferior runloop.
   There's still the mentioned problem of calling user code deeply inside some 
   called C function. E.g. calling C<get_string> from within C<parrot_pass_args> 
   due to argument conversion. This can be handled by a combination of:
   a) roll it into the runloop e.g.
      print_p  => set_s_p ; print_s
   b) disallow or restrict some overloading
   c) handle argument conversion at the runloop too
We still need a mechanism to run C when the PASM/PIR sub returns,
though.  In the case of (e.g.) rewinding the stack, you need a hook to
tell Continuation:invoke to resume rewinding [1].
. . .
   The attached Cfunc.pod has plenty other reasons, why we might need to
   change C internals as well. The continuation barrier is just one of
   these.
   Comments welcome,
   leo
I have mostly questions, but these are about details; I'd rather see
others respond first.
   There is one pressing question, though:  I had intended to use the
continuation-tailcalling mechanism from the POC to eliminate inferior
runloops from stack rewinding, as the logical next step in my campaign
to clean up dynamic environment support.  Should I wait for a more
complete Cfunc.pod design, or should I proceed in the expectation that
the continuation-tailcalling mechanism isn't likely to change that much?
-- Bob
[1]  Of course, that can be done in terms of a Csub, but you still need
     the equivalent of C_closure state.
>    See "Method Overloading and GC Issues" in Cfunc.pod. The only way
>    IMHO to avoid this problem is to run GC at "safe" points at the
>    runloop level . . .
>
> Had you considered keeping track of object references from C variables
> explicitly?  
Ah, yep. I forgot about that. It was discussed (and discarded) some years ago 
and is also documented:
$ perldoc docs/dev/infant.pod
  Solution 3: Explicit root set augmentation
         Variant 1: Temporarily anchor objects
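
For readers who have not seen that variant, a toy C sketch of what
"temporarily anchor objects" means, assuming a trivial mark-and-sweep heap.
All names (Heap, anchor, unanchor, collect, anchor_demo) are invented for the
example and are not the Parrot GC interface; this only illustrates the
discarded option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy mark-and-sweep heap; all names are hypothetical. */
#define HEAP_SIZE   8
#define MAX_ANCHORS 8

typedef struct Obj {
    int in_use;
    int marked;
    struct Obj *child;          /* one outgoing reference is enough here */
} Obj;

typedef struct Heap {
    Obj  slots[HEAP_SIZE];
    Obj *roots[MAX_ANCHORS];    /* ordinary roots, e.g. registers */
    int  nroots;
    Obj *anchors[MAX_ANCHORS];  /* temporary roots held for C locals */
    int  nanchors;
} Heap;

static Obj *heap_alloc(Heap *h) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!h->slots[i].in_use) {
            h->slots[i] = (Obj){ 1, 0, NULL };
            return &h->slots[i];
        }
    }
    return NULL;                /* heap full */
}

/* Add or drop a temporary root for an object that only C code can see. */
static void anchor(Heap *h, Obj *o) {
    assert(h->nanchors < MAX_ANCHORS);
    h->anchors[h->nanchors++] = o;
}

static void unanchor(Heap *h) {
    assert(h->nanchors > 0);
    h->nanchors--;
}

static void mark(Obj *o) {
    for (; o && !o->marked; o = o->child)
        o->marked = 1;
}

/* Mark from roots and anchors, sweep the rest; returns objects freed. */
static int collect(Heap *h) {
    int freed = 0;
    for (int i = 0; i < h->nroots; i++)
        mark(h->roots[i]);
    for (int i = 0; i < h->nanchors; i++)
        mark(h->anchors[i]);
    for (int i = 0; i < HEAP_SIZE; i++) {
        Obj *o = &h->slots[i];
        if (o->in_use && !o->marked) {
            o->in_use = 0;
            freed++;
        }
        o->marked = 0;
    }
    return freed;
}

/* An "infant" object referenced only from a C local variable. */
void anchor_demo(int *freed_while_anchored, int *freed_after) {
    Heap h;
    memset(&h, 0, sizeof h);
    Obj *infant = heap_alloc(&h);
    anchor(&h, infant);
    *freed_while_anchored = collect(&h);  /* infant is kept alive */
    unanchor(&h);
    *freed_after = collect(&h);           /* now nothing references it */
}
```

The cost is that every C function holding a fresh object has to remember the
anchor/unanchor pair.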
>    There is one pressing question, though:  I had intended to use the
> continuation-tailcalling mechanism from the POC to eliminate inferior
> runloops from stack rewinding, as the logical next step in my campaign
> to clean up dynamic environment support.  Should I wait for a more
> complete Cfunc.pod design, or should I proceed in the expectation that
> the continuation-tailcalling mechanism isn't likely to change that much?
I dunno yet. A general "continuation-tailcalling mechanism" could still be 
needed for all stuff that just can't be (easily) forced into the runloop like 
a user-defined "sort" function. OTOH, if all is done properly, we might only 
have a few special cases, which could be handled as such.
> -- Bob
leo
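The issues that this thread was discussing appear to have been resolved,
but the most recent posting was a request for the development of a PDD.
Allison, Bob:  Is this going to happen?  If so, I think we should track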
it in a separate RT so that we close this one.  If not, then I will
close this ticket after this week's release.
Thank you very much.
kid51
   The issues that this thread was discussing appear to have been resolved,
   but the most recent posting was a request for the development of a PDD.
   Allison, Bob:  Is this going to happen?  If so, I think we should track
   it in a separate RT so that we close this one.  If not, then I will
   close this ticket after this week's release.
   Thank you very much.
   kid51
I think it ought to happen, though I think Allison just wanted a ticket
for updating existing PDDs, and not for a whole new PDD.  I asked
Allison for a clarification on 11-Mar in Will's "[oops; continuation
0xb6926320 of type 22 is trying to jump from runloop 15008 to runloop
1]" thread, and had been waiting for that.
   But I do agree that it ought to be a separate ticket.  The underlying
issue is still with us, but has outgrown the original ticket.
					-- Bob
> I think it ought to happen, though I think Allison just wanted a ticket
> for updating existing PDDs, and not for a whole new PDD.  I asked
> Allison for a clarification on 11-Mar in Will's "[oops; continuation
> 0xb6926320 of type 22 is trying to jump from runloop 15008 to runloop
> 1]" thread, and had been waiting for that.
> 
>    But I do agree that it ought to be a separate ticket.  The underlying
> issue is still with us, but has outgrown the original ticket.
> 
Okay.  I will now close this ticket.  For more granular tracking, I
recommend that a separate RT be opened for each existing PDD which needs
updating.
On Sun Mar 16 10:02:32 2008, rgrjr wrote:
   > I think it ought to happen, though I think Allison just wanted a ticket
   > for updating existing PDDs, and not for a whole new PDD.  I asked
   > Allison for a clarification on 11-Mar in Will's "[oops; continuation
   > 0xb6926320 of type 22 is trying to jump from runloop 15008 to runloop
   > 1]" thread, and had been waiting for that.
   > 
   >    But I do agree that it ought to be a separate ticket.  The underlying
   > issue is still with us, but has outgrown the original ticket.
   Okay. I will now close this ticket.
Fine by me.
   For more granular tracking, I recommend that a separate RT be opened
   for each existing PDD which needs updating.
   Thank you very much.
   kid51
I am not sure that separate tickets make sense for what ought to be a
single coordinated change.  But I will submit the first ticket so you
can look at it (though probably not for a while yet), and we can take it
from there.
					-- Bob
OK, I've finished the writeup for one or more new tickets; please advise
on whether this is on target, and how to file it.  My personal
recommendation is for a single design/documentation ticket, the
satisfaction of which will generate a number of implementation tickets.
   FWIW, the issue seems much less scary for having written it up.  It
probably also helps that I've let go of the notion that we can ever
remove all continuation barriers; it's a big job, and I still can't
think of a compelling use case for needing to do so.
-- Bob