
[COMMIT] Register allocator for the JIT


Daniel Grunblatt

Aug 4, 2002, 6:25:08 PM
to perl6-i...@perl.org
I have just committed the register allocator for the JIT. It is a very
early version to start from, not an optimal one.

I also changed the Alpha JIT a bit to use a constant pool.

I haven't touched the SUN port yet (I hope I didn't break it).
I have to change the ARM port too.
None of them are really using the allocator.

Daniel Grunblatt.


Nicholas Clark

Aug 4, 2002, 6:22:36 PM
to Daniel Grunblatt, perl6-i...@perl.org
On Sun, Aug 04, 2002 at 07:25:08PM -0300, Daniel Grunblatt wrote:
> I have just committed the register allocator for the JIT. It is a very
> early version to start from, not an optimal one.

But it's still arrived a lot sooner than I expected. (thanks)

> I also changed the Alpha JIT a bit to use a constant pool.
>
> I haven't touched the SUN port yet (I hope I didn't break it).
> I have to change the ARM port too.
> None of them are really using the allocator.

You have this in jit/arm/jit_emit.h

# define REQUIRES_CONSTANT_POOL 0
# define MAX_REGITERS_TO_MAP 10

char register_map[MAX_REGITERS_TO_MAP] =
{ r0, r1, r2, r3, r4, r5, r6, r7, r8, r12 };


I've not looked at the rest of the code, and won't get a chance now until
tomorrow night, but if I'm guessing correctly, then these are the registers
you're prepared to map between external function calls.

I *think* (others may correct me) that you can safely use r9 if you're
not generating shared library code, and I know that you can safely use r14
(the link register) if you don't call any external subroutines. And I *know*
you're not calling any, because you can't rely on r0-r3 or r12 being
preserved across any subroutine calls.

Plus I was using r4 as my interpreter pointer, and I suspect you don't want
to map that out.

So you may wish to make your code consider two sorts of registers - registers
which you can map in the jit's own code, and the subset which are preserved
across calls (r4-r8 and r9 (and r10 if you're not using a stack limit))
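As a rough illustration of the two-class split described above, here is a minimal C sketch. All names (`regmask_t`, `MAPPABLE_NO_CALLS`, `CALL_PRESERVED`, `reg_is_mappable`) are hypothetical and are not taken from the Parrot JIT. It assumes r4 is reserved as the interpreter pointer, and it conservatively leaves out r9, r10 and r14, whose availability depends on the conditions mentioned in the text.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per ARM core register, bit n standing for rn. */
typedef uint16_t regmask_t;

#define REG_BIT(n) ((regmask_t)(1u << (n)))

/* Registers usable inside a run of JIT code that makes no calls:
 * r0-r3, r5-r8 and r12. r4 is assumed to hold the interpreter pointer. */
#define MAPPABLE_NO_CALLS \
    (REG_BIT(0) | REG_BIT(1) | REG_BIT(2) | REG_BIT(3) | \
     REG_BIT(5) | REG_BIT(6) | REG_BIT(7) | REG_BIT(8) | REG_BIT(12))

/* Subset that is still intact after a call to a C function: r5-r8.
 * r9 and r10 are omitted here because their status is platform dependent. */
#define CALL_PRESERVED \
    (REG_BIT(5) | REG_BIT(6) | REG_BIT(7) | REG_BIT(8))

/* Returns non-zero if `reg` may hold a mapped value in the given kind
 * of section. */
static int reg_is_mappable(unsigned reg, int section_makes_calls)
{
    assert(reg < 16);
    regmask_t allowed = section_makes_calls ? CALL_PRESERVED
                                            : MAPPABLE_NO_CALLS;
    return (allowed & REG_BIT(reg)) != 0;
}
```

With this split, a value that must survive a non-JITted op would only ever be placed in a register from the second set.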


The other thing I was thinking about, but hadn't yet needed to consider was
that my entry code was only saving r4 (from that list) and restoring it on
exit. I was envisaging making a new fixup type (probably only one needed
both for the stm and the ldm) that is pushed onto the fixup chain to mark
the stm on entry, and the ldm in each end op (where the subroutine can
return). There would be an entry in some part of the jit info structure, and
as the mapper found it needed each extra register from r4-r10 it would record
this in the jit info structure, and then when the fixups run, they'd adjust
all the ldm and stm instructions to preserve exactly the minimum set of
registers needed.

Of course the simpler approach right now is to save all of r4-r10 and not be
bothered about absolute efficiency yet :-)

It might be easier if I let you get on with integrating the mapper into
the arm jit, and work on some perl5 stuff for a few evenings, rather than
fiddle with arm stuff that has merge conflicts with what you're doing.
[I could even do something non perl! :-)]

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Daniel Grunblatt

Aug 4, 2002, 11:41:45 PM
to Nicholas Clark, perl6-i...@perl.org
On Sun, 4 Aug 2002, Nicholas Clark wrote:

> > I also changed the Alpha JIT a bit to use a constant pool.
> >
> > I haven't touched the SUN port yet (I hope I didn't break it).
> > I have to change the ARM port too.
> > None of them are really using the allocator.
>
> You have this in jit/arm/jit_emit.h
>
> # define REQUIRES_CONSTANT_POOL 0
> # define MAX_REGITERS_TO_MAP 10
>
> char register_map[MAX_REGITERS_TO_MAP] =
> { r0, r1, r2, r3, r4, r5, r6, r7, r8, r12 };
>
>
> I've not looked at the rest of the code, and won't get a chance now until
> tomorrow night, but if I'm guessing correctly, then these are the registers
> you're prepared to map between external function calls.
>
> I *think* (others may correct me) that you can safely use r9 if you're
> not generating shared library code, and I know that you can safely use r14
> (the link register) if you don't call any external subroutines. And I *know*
> you're not calling any, because you can't rely on r0-r3 or r12 being
> preserved across any subroutine calls.

Yes, I'll add them.

>
> Plus I was using r4 as my interpreter pointer, and I suspect you don't want
> to map that out.

True, I won't use r0 either, as it's the register we need to use in the
opcodes that don't have their registers mapped.

>
> So you may wish to make your code consider two sorts of registers - registers
> which you can map in the jit's own code, and the subset which are preserved
> across calls (r4-r8 and r9 (and r10 if you're not using a stack limit))
>
>
> The other thing I was thinking about, but hadn't yet needed to
> consider was that my entry code was only saving r4 (from that list)
> and restoring it on exit. I was envisaging making a new fixup type
> (probably only one needed both for the stm and the ldm) that is pushed
> onto the fixup chain to mark the stm on entry, and the ldm in each end
> op (where the subroutine can return). There would be an entry in some
> part of the jit info structure, and
> as the mapper found it needed each extra register from r4-r10 it would record
> this in the jit info structure, and then when the fixups run, they'd adjust
> all the ldm and stm instructions to preserve exactly the minimum set of
> registers needed.
>
> Of course the simpler approach right now is to save all of r4-r10 and not be
> bothered about absolute efficiency yet :-)

The idea of Parrot_jit_(load|save)_registers is to load/save only the
registers that the current section uses/used.

>
> It might be easier if I let you get on with integrating the mapper into
> the arm jit, and work on some perl5 stuff for a few evenings, rather than
> fiddle with arm stuff that has merge conflicts with what you're doing.

I will, but no promises on when; I want to get some things done for the
debugger first.

> [I could even do something non perl! :-)]

But it won't be *THAT* fun, will it? ;)

Daniel Grunblatt.

Nicholas Clark

Aug 5, 2002, 5:59:44 PM
to Daniel Grunblatt, perl6-i...@perl.org
On Mon, Aug 05, 2002 at 12:41:45AM -0300, Daniel Grunblatt wrote:
> On Sun, 4 Aug 2002, Nicholas Clark wrote:

> > Plus I was using r4 as my interpreter pointer, and I suspect you don't want
> > to map that out.
>
> True, I won't use r0 either, as it's the register we need to use in the
> opcodes that don't have their registers mapped.

I'm not quite sure I follow you here.
I've looked a bit at the code, and I think what it's doing is looking for
sections between ops that branch to C functions.
Ops that branch are either ops that aren't JITted (so they call the regular
C implementation), or ops that don't happen to inline a C function call.

[er, I've not worked out what the code is doing about ops that branch
elsewhere in the subroutine the JIT is working on, or the targets of those
branches.]

If you're in a section between branches, then on ARM all the registers are
free for you to use. I suspect that other CPUs are similar - within your code
you can use most or all registers, but when you call or return registers must
be in a controlled state. On ARM, the non-JIT op before won't have anything
useful in r0, while the JIT op that did a branch may well do (as its output).
But inside the block of code being built up by the JIT, where the JIT knows
there are no branches in or out, then all the registers are its to use as it
sees fit. So r0 is no different from r1 or r2.

I'm not sure if your map code allows a JIT op to say "I need a temporary
register" (or more generally "I need n temporary registers", where n will
often be zero). I don't think I've yet needed to use more registers than were
being used for IN and OUT, but given more complex ops it will be necessary.
For example, set NUM, NUM would best be done by copying via the hardware
(integer) registers, ie avoid using floating point registers. I suspect other
CPUs will be similar, in that some ops will want to use one or more scratch
registers that would normally be available to the map code.
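One way such a request for n temporary registers could look is sketched below in C. This is only an illustration under assumed names (`regmask_t`, `take_scratch`); the map code in the JIT may handle this quite differently, or not at all.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per ARM core register, bit n standing for rn. */
typedef uint16_t regmask_t;

/* Picks `n` scratch registers out of `usable` that are not currently in
 * `mapped`, lowest register number first. On success stores the chosen
 * set in `*out` and returns 1; returns 0 and leaves `*out` untouched if
 * fewer than `n` registers are free. */
static int take_scratch(regmask_t usable, regmask_t mapped, unsigned n,
                        regmask_t *out)
{
    assert(out != 0);
    regmask_t free_regs = (regmask_t)(usable & ~mapped);
    regmask_t picked = 0;

    for (unsigned r = 0; r < 16 && n > 0; r++) {
        regmask_t bit = (regmask_t)(1u << r);
        if (free_regs & bit) {
            picked |= bit;
            n--;
        }
    }
    if (n != 0)
        return 0;   /* not enough free registers; caller must spill */
    *out = picked;
    return 1;
}
```

An op that needs no temporaries simply asks for zero and gets back an empty set.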

Also, across the section with branches, the ARM ABI says that r4-r10 are
preserved. So the map code could quite happily map intermediate values into
any of them (or at least r5-r9) and know that they will be preserved after
a non-JIT op, or (if it is written to conform with this) any JIT op that makes
an external branch. I presume that other CPUs will also have some sort of
split - some registers are preserved across a function call, others are not.
While inside a function you can use the preserved registers as you see fit
(providing you save and restore them at entry and exit)

This would suggest to me that the map code should be able to treat each op
as preserving a set of registers, and potentially corrupting all others.
The default for regular JIT ops would be that they preserve all registers
apart from those that are explicitly mapped as output; the default for ops
that call out to C functions would be that they preserve and corrupt as for
that platform's ABI.
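The per-op "preserves / corrupts" idea above can be sketched in C as follows. The `op_info` structure and the function names are hypothetical, and the clobber set for C calls (r0-r3, r12 and r14) is an assumption based on the registers this thread says are not preserved across calls.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per ARM core register, bit n standing for rn. */
typedef uint16_t regmask_t;

#define REG_BIT(n) ((regmask_t)(1u << (n)))

/* Registers a call to a C function may overwrite (assumed: r0-r3, r12, r14). */
#define ABI_CLOBBERED \
    (REG_BIT(0) | REG_BIT(1) | REG_BIT(2) | REG_BIT(3) | \
     REG_BIT(12) | REG_BIT(14))

/* Hypothetical per-op description used by the map code. */
typedef struct {
    int       calls_c;  /* non-zero if the op branches to a C function */
    regmask_t outputs;  /* registers explicitly mapped as outputs      */
} op_info;

/* Registers whose previous contents are lost after the op runs:
 * the ABI set for ops that call C, otherwise only the op's outputs. */
static regmask_t op_clobbers(const op_info *op)
{
    assert(op != 0);
    return (regmask_t)(op->calls_c ? ABI_CLOBBERED : op->outputs);
}

/* Mapped registers that still hold their values after the op. */
static regmask_t surviving(regmask_t live_before, const op_info *op)
{
    return (regmask_t)(live_before & ~op_clobbers(op));
}
```

Under this model r0 needs no special treatment: it is lost across a C call like r1 or r2, and is otherwise free for mapping.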

Maybe this is too complex to do right now, but the short version is that I
don't see why r0 needs to be special in the map code for ARM.

> > The other thing I was thinking about, but hadn't yet needed to
> > consider was that my entry code was only saving r4 (from that list)
> > and restoring it on exit. I was envisaging making a new fixup type
> > (probably only one needed both for the stm and the ldm) that is pushed
> > onto the fixup chain to mark the stm on entry, and the ldm in each end
> > op (where the subroutine can return). There would be an entry in some
> > part of the jit info structure, and
> > as the mapper found it needed each extra register from r4-r10 it would record
> > this in the jit info structure, and then when the fixups run, they'd adjust
> > all the ldm and stm instructions to preserve exactly the minimum set of
> > registers needed.
> >
> > Of course the simpler approach right now is to save all of r4-r10 and not be
> > bothered about absolute efficiency yet :-)
>
> The idea of Parrot_jit_(load|save)_registers is to load/save only the
> registers that the current section uses/used.

I wasn't directly meaning those two functions. I was thinking about the
function entry code, and the code generated by the end op, which returns from
the function generated by the JIT.

On ARM, you need to preserve r4-r10, which is normally done by pushing them
onto the stack at entry, and popping them at exit. However, the C compiler
knows which of r4-r10 it used, and generates the minimal code needed to only
store the ones it used. (as the save and restore of a register that is never
actually changed wastes time)

I was thinking that the way parrot works at the moment, it writes out the
function entry code before it starts processing parrot ops, so at that time it
has no idea how many registers it's going to use. Also, if a function has
several return points, it will write an end op before it reaches the end of
the function. And it may find that near the end of the function it encounters
code that requires it to start using lots of CPU registers. So the JIT won't
actually know which CPU registers it needs to save/restore until it finishes
the last parrot op of the function. Hence the JIT would need to

a: mark the entry point and each exit point with a fixup
b: track which registers it used in the map code
c: fix up the entry and exit points to preserve the registers it did use.
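Steps a to c could be sketched in C roughly as below. Everything here is hypothetical (`jit_sketch`, `mark_fixup`, `note_register_used`, `run_fixups`) and is not the Parrot JIT's fixup mechanism. The sketch also assumes the entry code pushes lr along with the used registers and each exit pops into pc, which the thread does not state, and it ignores stack alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FIXUPS        16
#define CALLEE_SAVED_MASK 0x07F0u      /* r4-r10 */
#define ARM_PUSH_BASE     0xE92D0000u  /* stmfd sp!, {<empty list>} */
#define ARM_POP_BASE      0xE8BD0000u  /* ldmfd sp!, {<empty list>} */
#define REG_LR            (1u << 14)
#define REG_PC            (1u << 15)

typedef struct {
    uint32_t *code;                /* buffer of emitted instruction words */
    size_t    where[MAX_FIXUPS];   /* word index of each stm/ldm to patch */
    int       is_exit[MAX_FIXUPS]; /* 0 = entry (stm), 1 = exit (ldm)     */
    size_t    n_fixups;
    uint32_t  used;                /* callee-saved registers handed out   */
} jit_sketch;

/* a: remember where an entry stm or exit ldm was emitted. */
static void mark_fixup(jit_sketch *j, size_t word_index, int is_exit)
{
    assert(j->n_fixups < MAX_FIXUPS);
    j->where[j->n_fixups] = word_index;
    j->is_exit[j->n_fixups] = is_exit;
    j->n_fixups++;
}

/* b: record a register the mapper used; only r4-r10 need saving. */
static void note_register_used(jit_sketch *j, unsigned reg)
{
    assert(reg < 16);
    j->used |= (1u << reg) & CALLEE_SAVED_MASK;
}

/* c: rewrite every marked stm/ldm with the final register list. */
static void run_fixups(jit_sketch *j)
{
    for (size_t i = 0; i < j->n_fixups; i++) {
        if (j->is_exit[i])
            j->code[j->where[i]] = ARM_POP_BASE | j->used | REG_PC;
        else
            j->code[j->where[i]] = ARM_PUSH_BASE | j->used | REG_LR;
    }
}
```

Because the register list is just the low 16 bits of the instruction word, one fixup type can serve both the stm and the ldm, as suggested earlier in the thread.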

> > It might be easier if I let you get on with integrating the mapper into
> > the arm jit, and work on some perl5 stuff for a few evenings, rather than
> > fiddle with arm stuff that has merge conflicts with what you're doing.
>
> I will, but no promises on when; I want to get some things done for the
> debugger first.
>
> > [I could even do something non perl! :-)]
>
> But it won't be *THAT* fun, will it? ;)

Well, with all this JIT stuff I did find a nicer ARM optimisation for !a than
gcc currently knows, and someone's told me exactly which bit of which file in
gcc is the relevant bit of the ARM peephole optimiser, so all I have to do is
free up enough disk space to build gcc...
[no, freeing up disk space isn't fun, so you're correct there :-(]
