Variables not alive around a call (temps) are allocated preferred in the
lower range first.
Seems to work fine and is not really specific to this register
allocator, nor to a specific ABI. Its just exploiting the fact that a
bunch of registers are preserved around a call.
Works fine *except* for the .flatten_arg directive. This directive takes
an argument array and expands the array contents to function arguments
in consecutive parrot registers. E.g.
.arg a => P5
.flatten_arg array => P6, P7, ...
The code emitted to achieve that runs in a loop and is using the Parrot
opcode C<setp_ind Ix, Py> which sets the xth Parrot register from Py.
Now this array is typically a temporary and not not used around the
call, so it gets allocated in the volatile register range, which then
collides with the generated code for function argument passing.
The register allocator doesn't know, that e.g. P6, P7 is effected by
this opcode.
see imcc/t/syn/pcc_20 - _25 for examples and ops/set.ops for usage
information of this opcode.
leo
[ setp_ind troubles ]
I've found a way to force allocation to R16..R31 in the presence of this
opcode.
leo
Interesting. I'd like to see it.
Regarding pdd03, I am still not clear how it should be interpreted.
The current pdd03, as well as the previous one, both seem to indicate
that registers 0-15 are likely to be overwritten, and anyone making a
call, should save those registers if they still want them. The issue
with PIR Code, is that the author won't know which of their symbols
are mapping to registers about to be killed. So, as previously
discussed, those registers will have to be hands off for the register
allocator. That is essentially how the old and new alloctor have been
working. But this doesn't have to always be the case.
* If the subroutine being allocated is a leaf (with no sub calls),
then all registers should be available.
* Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
* It seems like it would be simple enough to provide a "compiler
hint", to let the allocator know if the subs it calls are using the
parrot convention or not, or how many of the R5-R15 it will need.
From this hint, a bit mask saying which registers are available could
be constructed. This can then be used as part of a static analysis,
and can be incorporated into the unit data structure, or passed as a
separate parameter to imc_reg_alloc().
I wouldn't think this last idea would be considered a change to the
calling convention, but rather as an optional optimization prototype.
Not part of pasm. Dan, would something like this be allowed?
~Bill
> Interesting. I'd like to see it.
See below, you know where it's called from ;)
> Regarding pdd03, I am still not clear how it should be interpreted.
All registers are preserved, but some of these registers are used,
either by implict opcodes or as return values.
> * If the subroutine being allocated is a leaf (with no sub calls),
> then all registers should be available.
Yep.
> * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
I've included P3 (see below). If it's used it interfers.
> * It seems like it would be simple enough to provide a "compiler
> hint", to let the allocator know if the subs it calls are using the
> parrot convention or not, or how many of the R5-R15 it will need.
The register allocator is only supporting pdd03, nothing else. The
amount of needed R5 - R15 is unknown, as these are return results.
> ... This can then be used as part of a static analysis,
> and can be incorporated into the unit data structure, or passed as a
> separate parameter to imc_reg_alloc().
Yep and it's working.
> ~Bill
leo
/*
* find available color for register #x in available colors
*/
static int
ig_find_color(Interp* interpreter, IMC_Unit *unit, int x, char *avail)
{
int c, t;
SymReg *r;
static const char types[] = "ISPN";
static const char assignable[4][5] = {
/* 0 1 2 3 4 */
{ 0, 0, 0, 0, 0, }, /* I */
{ 0, 1, 1, 1, 1, }, /* S */
{ 0, 0, 0, 1, 1, }, /* P */
{ 1, 1, 1, 1, 1, }, /* N */
};
UNUSED(interpreter);
r = unit->reglist[x];
t = strchr(types, r->set) - types;
/* please note: c is starting at 1 for R0 */
if (!(r->usage & U_NON_VOLATILE)) {
/* 0) 5-15 volatile range */
for (c = 6; c <= 16; c++)
if (avail[c])
return c;
}
/* 1) try upper non-volatiles, 16...31 */
for (c = 17; c <= 32; c++)
if (avail[c])
return c;
/* some lower regs are preserved too 0...4 */
for (c = 1; c <= 5; c++)
if (avail[c] && assignable[t][c - 1])
return c;
/* no chance, force high range with possible spilling */
for (c = 33; ; c++)
if (avail[c])
return c;
assert(0);
return 0;
}
Erm, no. Unused registers in the 0-15 range are explicitly garbage:
Note that registers 16-31 of each of the four types are, for
security reasons, I<never> passed into the invoked subroutine,
method, or continuation. They are guaranteed to be garbage.
> > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
>
>I've included P3 (see below). If it's used it interfers.
Nope. It'll either be set if a call returns overflow parameters, or
unused and thus garbage.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
Yep. The indirect access ops will cause problems for the PIR register
allocation, since there's no way to know at compile time what's
happening. Their use probably ought to invalidate all the registers,
or the op restricted to pasm code.
>>All registers are preserved, but some of these registers are used,
>>either by implict opcodes or as return values.
> Erm, no. Unused registers in the 0-15 range are explicitly garbage:
It was about usabalitiy of registers for the allocator. So before I make
a function call, these are allocatable as temps. Return values are
garbage, if not set.
> Note that registers 16-31 of each of the four types are, for
> security reasons, I<never> passed into the invoked subroutine,
> method, or continuation. They are guaranteed to be garbage.
Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD
system how to mark garbage ;)
>> > * Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
>>
>>I've included P3 (see below). If it's used it interfers.
> Nope. It'll either be set if a call returns overflow parameters, or
> unused and thus garbage.
Ah, yep. Thanks. It is returned too, forgot that.
leo
As long as the allocator is set to assume that after a function call
that all the registers in the range 0-15 that don't have return
values are garbage. So if there are no string return values, string
registers 0-15 are toast.
> > Note that registers 16-31 of each of the four types are, for
>> security reasons, I<never> passed into the invoked subroutine,
>> method, or continuation. They are guaranteed to be garbage.
>
>Not quite. S and P regs have to be NULLed. Or you gonna tell the DOD
>system how to mark garbage ;)
No, the invoke op.
Nope -- this isn't the job of the register allocator. We aren't
leaving security issues up to bytecode except in a very few, limited
cases. (All involving subroutines with elevated security credentials
which the sub needs to drop after using things they allow)
>Other observations:
>
>* From new allocator bugs, and analysis, we've discovered that
>exceptions cause new control flow edges, not previously considerd.
>This case is being reworked by Leo? to provide missing CFG edges,
>through a minor change in the try block declaration. (thread
>"Continuations, basic blocks, loops and register allocation")
>* The case of continuations has not been solved with respect to
>register alloction. Leo's RESUMEABLE: label might provide help here.
>In any case, we can expect to see some additional edges being inserted
>though. (also thread "Continuations, basic blocks, loops and register
>allocation")
Exceptions and continuations should be the same problem -- the target
is the start of a basic block. (Well, more than that, as they're
places where calling conventions potentially kick in) This means the
instruction immediately after a sub call starts a new block, as does
the start of an exception handler. (And I've got some docs on
exceptions that should be out later tonight)
Okay, looks like I misread an earlier message of Dan's. The reason
that we cannot use R0-R15 through a sub, is that they are shredded.
The values are not preserved through the sub call.
Since I understand the item about allocating registers between sub
calls, I can probably implement that change, as I work through the
control flow/data flow analysis.
Sounds like everything else is okay. We're just missing a few CFG
arcs from the continuations stuff, which I'll let you all worry about.
:)
Bill
ps: I'm making progress on grokking the cfg and register renaming
stuff. Will let you know.
> Since I understand the item about allocating registers between sub
> calls, I can probably implement that change, as I work through the
> control flow/data flow analysis.
This is already implemented, parts of it are in CVS.
> Sounds like everything else is okay. We're just missing a few CFG
> arcs from the continuations stuff, which I'll let you all worry about.
> :)
Yep
> Bill
> ps: I'm making progress on grokking the cfg and register renaming
> stuff. Will let you know.
This needs an SSA graph of the data flow?
BTW, looking at analyse_life_block() it seems that this allocates
unconditionally a Life_range structure. As these are O2 in (n_symbols *
n_basic_blocks) we could safe huge amounts of memory, by defining that a
missing life block propagates the previous one. Dunno if its possible,
though.
leo
> Exceptions and continuations should be the same problem -- the target
> is the start of a basic block. (Well, more than that, as they're
> places where calling conventions potentially kick in) This means the
> instruction immediately after a sub call starts a new block, as does
> the start of an exception handler.
Dan, I've already said that there is of course a new basic block. The
problem arises by the silent generation of loops in the CFG. Within a
loop the same register can't be reallocated to a different variable.
There are two possible solutions (AFAIK):
1) statically mark the branch target of the loop. Proposed syntax
constructs:
1a) for exceptions:
set_eh handler, catch_label
This is just a small adaption of the sequence of installing an
exception handler.
It depends a bit, if exception handlers are inline or nested
closures or both.
1b) generally
RESUMABLE: func_that_might_loop_through_cc()
possibly accompanied with another markup of the function call that
loops back.
2) Fetch all from lexicals/globals after a function call.
leo
> Since I understand the item about allocating registers between sub
> calls, I can probably implement that change, as I work through the
> control flow/data flow analysis.
I've no resynced pending changes with the old allocator and committed
these changes.
Your code is slightly modified here in my tree (minor cleanup,
whitespace) and has the mentioned code to use either volatiles or
non-volatiles depending on register usage. Please drop me a note for a
copy.
leo
Exceptions are going to get redone later. Probably (hopefully) later today.
>1b) generally
>
> RESUMABLE: func_that_might_loop_through_cc()
Blech. No. The instruction after a function call's inherently a
destination (sort of, given how we're generally guaranteeing register
contents), as is any label and the target of a branch operation.
(Though most branch operations are going to labels so there's some
redundancy)
If the point is to direct the allocator to flush temps and start
fresh, then we should just do it:
.flushtemps
or
.fardestination
rather than playing label overloading games.
>2) Fetch all from lexicals/globals after a function call.
This should only be necessary for volatile I and N values for normal
(i.e. ones where parrot automatically generates the return
continuation) function and method calls, since the S and P registers
should be fine across the call.
> If the point is to direct the allocator to flush temps and start fresh,
> then we should just do it:
>
> .flushtemps
The point basically is to insert a loop into the CFG. When we have
foo()
...
bar()
which is a linear control flow through these basic blocks, we need:
invoke --> foo(), captures continuation
cont_label:
....
invoke_or_branch cont_label -- bar()
as the invocation of bar() can turn into a continuation resume at
cont_label.
>> 2) Fetch all from lexicals/globals after a function call.
>
>
> This should only be necessary for volatile I and N values for normal
No. During a linear control flow, the register allocator can reuse a
register. E.g.
$P0 = a + b [1]
foo() [2]
print $P0 [3]
$P1 = new .Schrott [4]
bar() [5]
print $P1
[1] $P0 get e.g. P16 - preserved over call to foo()
[2] inside foo() the continuation is captured and made available for bar
[3] all fine first, P16 is valid
[4] $P0 isn't in use anymore. If this is a linear control flow, as it
looks like, $P1 gets register P16, as it needs being preserved over the
call to bar
[5] the call to bar uses the passed continuation and re-enters after the
call in [2], it has created a loop.
Result: the print statements in [3] fails.
This does *not* depend on any register type, nor if it's a temp or a
lexical.
leo
Static Single-Assignment (I had to look it up). My approach is to
consider something you once said, which was to the effect of ensuring
the accuracy of the analysis. I've got some code to do a fairly
simple depth first search per symbol. It is slower (I presume), but
easier to understand. Later, I plan to (or someone else could)
implement the data flow stuff with bit vectors and SSA to determine
reachability, etc. In this way, I can compare the simple analysis
with the complex analysis, and convince myself, and anyone else that
the code is correct. All this will be done even before verifying the
program runs correctly (passing "make test").
"Advanced_Compiler_Design_&_Implementation" by Muchnick is my main
resource. Register renaming is easy enough when you have DU (def-use
("def" means value is written, "use" means value read) and UD chains,
and especifically "webs" (which connect defs to uses). Webs are
components of def-use chains (using graph theory language). They are
effectively different variables, and can be provided with separate
symbol names, which is actually very similar to what is done during a
spill. My depth first search algorithm finds these too. Like I said,
I wanted to start with something simple that is correct, and move
forward with the more optimized forms later.
Another thing I'd like to do, is throw in is a randomizer, to change
the way the allocator assigns registers. Considering all the cruft
that was uncovered when the algorithm was changed, it might be a good
idea to have a debug feature that selects registers differently each
time it runs (each allocation should be valid, of course). This could
help flesh out any other faulty assumptions in the parrot compiler.
> BTW, looking at analyse_life_block() it seems that this allocates
> unconditionally a Life_range structure. As these are O2 in (n_symbols *
> n_basic_blocks) we could safe huge amounts of memory, by defining that a
> missing life block propagates the previous one. Dunno if its possible,
> though.
That's definitely not very efficient. Plus it's mallocing each
Life_Range, which is usually at least another 8 bytes per chunk, and
then a lot of potential thrashing when freed. It seems I've seen a
lot of less than optimum code like this, and I suspect there's a lot
more like it.
Solving these things will bring me that much closer to the ultimate
goal of defeating Dan's evil code. ;)
> leo
>
Bill
That'd actually help quite a bit. I've run into a number of cases
where things worked only through sheer happenstance -- subs that were
used before they were actually fetched, but only because they'd been
fetched before and the temp registers mapped the same. Being able to
throw a random seed into the pir compiler'd be nice there to flush
those out.
>That's definitely not very efficient. Plus it's mallocing each
>Life_Range, which is usually at least another 8 bytes per chunk, and
>then a lot of potential thrashing when freed. It seems I've seen a
>lot of less than optimum code like this, and I suspect there's a lot
>more like it.
Yow. That'd also trigger pathologically degenerate code in some
versions of glibc's memory deallocation system.
>Solving these things will bring me that much closer to the ultimate
>goal of defeating Dan's evil code. ;)
Better save that code -- I'm working on some machine generated
Anti-Evil(tm) for it. If I'm lucky I won't be able to recreate it
without some CVS fiddling. :)
That can't work, because *any* function might loop back, unless you want
to analyse the entire logic flow of the called function, find and
compile all functions it calls, recursively, and see if any of them get
to put the return continuation somewhere where someone else may invoke
it. Even that can't work in the case of methods, 'cos which piece of
code gets called isn;t known until runtime.
> 2) Fetch all from lexicals/globals after a function call.
Can you not make that re-fetch conditional on being invoked from a full
(as opposed to a return) continuation, so it will only happen if
someone's messing about with continuations? You can also avoid
rrefetching if the allocator happens not to have reused that register
elsewhere in the function, which is known at compile time.
I agree (I think[1]) with you that the allocator should not try to keep
registers unique if that means spilling (this is pessimizing the common
case for the uncommon), but if possible it would be good for it to avoid
reusing registers if this *doesn't* require spilling. Also, for places
where people know they are going to be using continuations a lot such as
the p6 regex engine, it might make sense to have a pragma to get the
allocator to allocate all registers uniquely and damn the spilling...
but this is almost certainly premature optimization :).
Ben
[1] The 'I think' is about whether or not Leo actually thinks this, not
about whether or not I do... :)
--
All persons, living or dead, are entirely coincidental.
b...@morrow.me.uk Kurt Vonnegut
Yes. In general that is true. The proposed markup would need that the
HLL compiler spots all such places. I don't know if it's possible.
Codepaths due to continuation invocation aren't arbitrary. The compiler
should know affected subroutines and returns. This would work for
lexically nested closure as in t/op/gc_13.imc.
>>2) Fetch all from lexicals/globals after a function call.
>
> Can you not make that re-fetch conditional on being invoked from a full
> (as opposed to a return) continuation, so it will only happen if
> someone's messing about with continuations?
Unlikely, I think no.
> ... You can also avoid
> rrefetching if the allocator happens not to have reused that register
> elsewhere in the function, which is known at compile time.
The allocator uses the first free register. It e.g. starts alway at P16,
if that's avaiable. Having a fetch_lex instead changes the register
usage. The used temp register would be a volatile register then.
> I agree (I think[1]) with you that the allocator should not try to keep
> registers unique if that means spilling (this is pessimizing the common
> case for the uncommon), but if possible it would be good for it to avoid
> reusing registers if this *doesn't* require spilling.
This seems also hard to implement.
> Ben
leo
It's not: consider the case of a function calling a method that hasn't
even been written yet. That method, when written and compiled, could
stuff its passed-in RC into some global (promoting it to a full
continuation, of course) and some other random piece of code could
then invoke it later. This is pretty unpleasant behaviour, of course;
but we need to either define such cases carefully and outlaw them, or
make them work.
Effectively this means that all function calls need to be marked
potentially resumable, except when all functions that could possibly get
hold of our RC are compiled already and known not to do anything nasty
with it. We can't even say that certain languages never use
continutations so they don't need to pay the price, as cross-language
method calls will presumably be possible, as will XS-like methods
written in IMCC.
> Codepaths due to continuation invocation aren't arbitrary. The compiler
> should know affected subroutines and returns. This would work for
> lexically nested closure as in t/op/gc_13.imc.
Yes; it's possible to make it work in the case of a single compilation
unit which makes no calls outside itself, as all the required
information is available at compile time.
> >>2) Fetch all from lexicals/globals after a function call.
> >
> > Can you not make that re-fetch conditional on being invoked from a full
> > (as opposed to a return) continuation, so it will only happen if
> > someone's messing about with continuations?
>
> Unlikely, I think no.
>
<snippage>
>
> This seems also hard to implement.
Fair enough. It was just an idea.
Ben
--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [b...@morrow.me.uk]
* Registers R16-R31 are always available for the allocator.
* Registers R0-R15 are available between sub calls. That is, for any
symbol, whose life range does not cross a subroutine. (This implies
that all registers are available if no subs are called.) Since we
have no way to determine if a sub is using those or not, any sub call
will be assumed to possibly use R0-R15. Furthermore, even though we
know there are certain registers in that range, which are unused by
the calling convention, we will still not use them through a sub call
for security reasons.
* [NEW] If register 15 or below is used, it should be cleared out,
ZEROED, after it's last use and before the next sub call. This is for
security reasons. Obviously, these registers will not be the first
choice to use.
* Availability of these registers is subject to the rules for using
the Parrot opcode C<setp_ind Ix, Py>, which were (are being?) worked
through by Leo.
Other observations:
* Leo introduced a flag on the symbol, to indicate if it's volatile or
not. These will be eligible for R0-R15 (volitile registers?).
* From new allocator bugs, and analysis, we've discovered that
exceptions cause new control flow edges, not previously considerd.
This case is being reworked by Leo? to provide missing CFG edges,
through a minor change in the try block declaration. (thread
"Continuations, basic blocks, loops and register allocation")
* The case of continuations has not been solved with respect to
register alloction. Leo's RESUMEABLE: label might provide help here.
In any case, we can expect to see some additional edges being inserted
though. (also thread "Continuations, basic blocks, loops and register
allocation")
Did I miss anything?