
Of mops and microops


Dan Sugalski

Oct 28, 2002, 4:31:27 PM
to Leopold Toetsch, P6I
At 7:09 PM +0100 10/27/02, Leopold Toetsch wrote:
>So the I-register access is substituted by access to 3 global integers.
>
>Now, how would these globals be loaded? When are these »arg« OPs inserted?
>
>Currently the register optimizer in jit.c does something very
>similar: Setting up register access for the most used parrot
>registers in one execution block + load and store at block
>begin/end.

This is definitely a Clever Thing, and one I've pondered on and off.
It will definitely speed up some things, as there's less bytecode to
chew through, there's more of a chance for optimization by the C
compiler when parrot's being built, and generally more opportunity to
cheat.

I'm currently leaning against it only because it doesn't ultimately
help the JIT. What we have now is wildly cool and damn useful (and
has anyone heard from Daniel lately, BTW?) but there's room for more
optimizations.

Specifically, since the interpreter struct address is fixed, the
bytecode is fixed at JIT time, and the JIT is allowed to make
interpreter-private JITted versions of bytecode, it means that the
JIT already has license to take things a step beyond what you're
looking for--since the registers are at a fixed address and the
bytecode is fixed, the JIT can produce a stream of executable code
that directly addresses data, which is what the micro-ops are meant
to do.
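
To make that concrete, here is a rough C sketch of the difference; the
struct layout, field names and op function signature are invented for
illustration and are not Parrot's actual code:

    typedef long INTVAL;

    struct interp {
        INTVAL int_reg[32];            /* integer register file */
        /* ... rest of the interpreter struct ... */
    };

    /* Interpreted op: every access goes through the interpreter pointer
     * plus register indices fetched from the bytecode stream. */
    static INTVAL *
    op_add_i_i_i(INTVAL *pc, struct interp *interp)
    {
        interp->int_reg[pc[1]] = interp->int_reg[pc[2]] + interp->int_reg[pc[3]];
        return pc + 4;
    }

    /* What the JIT can emit instead: the interpreter struct, and so each
     * register slot, sits at a fixed address for this interpreter, and the
     * bytecode is fixed at JIT time, so the generated machine code for
     * "add I1, I2, I3" can address the operands directly.  In C terms the
     * JITted fragment collapses to roughly this, with no bytecode fetch
     * and no operand decode left at run time: */
    static void
    jitted_add_I1_I2_I3(struct interp *interp_at_fixed_address)
    {
        INTVAL *r = interp_at_fixed_address->int_reg;
        r[1] = r[2] + r[3];
    }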

Now, on the other hand it *does* speed up the interpreter, so it's
definitely not an idea to discard. But if we're going to (and I'd
still like to hold off) I think we're better off with a few special
versions of ops that target one or two registers directly, perhaps
register 0 and 1, rather than have a separate set of special-purpose
registers.
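
A sketch of what such a specialized op might look like, with invented
names: the register numbers are part of the opcode itself, so there are
no operands to decode and the C compiler sees fixed array slots it can
keep in hardware registers across the op body.

    typedef long INTVAL;
    struct interp { INTVAL int_reg[32]; /* ... */ };

    /* "add_i0_i1": always I0 = I0 + I1.  No operands follow the opcode,
     * so the program counter just steps past the opcode word itself. */
    static INTVAL *
    op_add_i0_i1(INTVAL *pc, struct interp *interp)
    {
        interp->int_reg[0] += interp->int_reg[1];
        return pc + 1;
    }

    /* "set_i0_ic": load a constant into I0; the constant is the only operand. */
    static INTVAL *
    op_set_i0_ic(INTVAL *pc, struct interp *interp)
    {
        interp->int_reg[0] = pc[1];
        return pc + 2;
    }
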
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

Oct 29, 2002, 2:26:00 AM
to Dan Sugalski, P6I
Dan Sugalski wrote:

> At 7:09 PM +0100 10/27/02, Leopold Toetsch wrote:
>
>> So the I-register access is substituted by access to 3 global integers.
>>
>> Now, how would these globals be loaded? When are these »arg« OPs
>> inserted?
>>
>> Currently the register optimizer in jit.c does something very similar:
>> Setting up register access for the most used parrot registers in one
>> execution block + load and store at block begin/end.
>
>
> This is definitely a Clever Thing, and one I've pondered on and off. It
> will definitely speed up some things, as there's less bytecode to chew
> through, there's more of a chance for optimization by the C compiler
> when parrot's being built, and generally more opportunity to cheat.


It would also solve the multi_keyed problem. The _get_keyed argument
preparation could fetch the PMC out of the aggregate.


> I'm currently leaning against it only because it doesn't ultimately help
> the JIT. What we have now is wildly cool and damn useful (and has anyone
> heard from Daniel lately, BTW?) but there's room for more optimizations.


Yes, that's correct. The JIT wouldn't profit currently. But with an
optimized stream of (micro-)ops, having optimized fetch/store opcodes at
finer than basic-block granularity, the JIT could profit too. Also, the
JIT optimizer pass now run at load time would be done at compile time,
so JIT startup time would be cut down.
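
Purely as an illustration of that finer granularity (the op names and
encoding are invented for the sketch), a macro-op sequence like
"add I1, I2, I3; mul I4, I1, I3" could be flattened into micro-ops with
explicit fetch/store, and the redundant fetches removed once, at compile
time:

    enum microop {
        UOP_FETCH_I,   /* local[a] <- I-register b        */
        UOP_STORE_I,   /* I-register a <- local[b]        */
        UOP_ADD,       /* local[a] <- local[b] + local[c] */
        UOP_MUL,       /* local[a] <- local[b] * local[c] */
        UOP_END
    };

    /* The duplicate fetches of I1 and I3 are already gone here; today the
     * JIT's register optimizer rediscovers the same thing at load time. */
    static const int stream[] = {
        UOP_FETCH_I, 0, 2,      /* l0 <- I2                       */
        UOP_FETCH_I, 1, 3,      /* l1 <- I3                       */
        UOP_ADD,     2, 0, 1,   /* l2 <- l0 + l1                  */
        UOP_STORE_I, 1, 2,      /* I1 <- l2                       */
        UOP_MUL,     3, 2, 1,   /* l3 <- l2 * l1  (no re-fetches) */
        UOP_STORE_I, 4, 3,      /* I4 <- l3                       */
        UOP_END
    };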

> Now, on the other hand it *does* speed up the interpreter, so it's
> definitely not an idea to discard. But if we're going to (and I'd still
> like to hold off) I think we're better off with a few special versions
> of ops that target one or two registers directly, perhaps register 0 and
> 1, rather than have a separate set of special-purpose registers.

My hack with the 3 globals obviously includes some cheating; globals are
a no-no when there are multiple interpreters. But nevertheless we could
produce an optimized PBC stream, where the 3*4 registers are treated as
"fast" registers, with loads/stores to the 32*4 slower registers only
when necessary. This would also fit neatly with my proposal WRT keyed
access.

I was also thinking of the various fixed-size integer ops for the JVM or
C#. The load/store ops would prepare integers of the needed size and do
sign extension when necessary.
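
A rough sketch of what such sized loads and stores could look like in C
(the names are invented, and INTVAL here just stands in for the native
integer type); the cast through the signed fixed-width type gives the
sign extension for free, and the arithmetic in between happens on the
full-width value:

    #include <stdint.h>

    typedef long INTVAL;

    /* Widen a stored value of the given size into a full-width INTVAL,
     * sign-extending on the way in. */
    static INTVAL load_i8 (const void *p) { return *(const int8_t  *)p; }
    static INTVAL load_i16(const void *p) { return *(const int16_t *)p; }
    static INTVAL load_i32(const void *p) { return *(const int32_t *)p; }

    /* Narrow a full-width INTVAL back down on the way out. */
    static void store_i8 (void *p, INTVAL v) { *(int8_t  *)p = (int8_t) v; }
    static void store_i16(void *p, INTVAL v) { *(int16_t *)p = (int16_t)v; }
    static void store_i32(void *p, INTVAL v) { *(int32_t *)p = (int32_t)v; }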

leo

Nicholas Clark

Oct 29, 2002, 4:59:30 AM
to Leopold Toetsch, Dan Sugalski, P6I
On Tue, Oct 29, 2002 at 08:26:00AM +0100, Leopold Toetsch wrote:
> Dan Sugalski wrote:

> > I'm currently leaning against it only because it doesn't ultimately help
> > the JIT. What we have now is wildly cool and damn useful (and has anyone
> > heard from Daniel lately, BTW?) but there's room for more optimizations.
>
>
> Yes, that's correct. The JIT wouldn't profit currently. But with an
> optimized stream of (micro-)ops, having optimized fetch/store opcodes at
> finer than basic-block granularity, the JIT could profit too. Also, the
> JIT optimizer pass now run at load time would be done at compile time,
> so JIT startup time would be cut down.

But then you end up with a messier two-level register spillage problem at
compile time, don't you? Which values to spill from fast to slow registers,
and which values to spill further from slow to stack? And is there much
literature on this sort of thing?

> My hack with the 3 globals obviously includes some cheating; globals are
> a no-no when there are multiple interpreters. But nevertheless we could
> produce an optimized PBC stream, where the 3*4 registers are treated as
> "fast" registers, with loads/stores to the 32*4 slower registers only
> when necessary. This would also fit neatly with my proposal WRT keyed
> access.

And the fast registers are going to be called ax, bx, cx and dx? :-)

> I was also thinking of the various fixed-size integer ops for the JVM or
> C#. The load/store ops would prepare integers of the needed size and do
> sign extension when necessary.

I've had 3 drafts at responding to this, and I conclude "my brain hurts".
I don't see an "obvious" clean solution to this, specifically 64-bit ops
that run correctly on 32-bit native systems but take advantage of 64-bit
native systems.
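
Just to make the 32-bit half of the problem concrete, here is one way
the fallback could look, carrying a 64-bit value as two 32-bit halves
and propagating the carry by hand (the type and names are invented for
the sketch); on a 64-bit build the same op would be a single native
addition:

    #include <stdint.h>

    typedef struct { uint32_t lo, hi; } emu64;   /* emulated 64-bit integer */

    static emu64 emu64_add(emu64 a, emu64 b)
    {
        emu64 r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
        return r;
    }

Addition is the easy case; multiplication, shifts and signed comparison
each need their own hand-rolled variants on 32-bit, while a 64-bit build
wants none of this, which is part of why no clean solution jumps out.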

Nicholas Clark

Leopold Toetsch

Oct 29, 2002, 5:56:29 AM
to Nicholas Clark, Dan Sugalski, P6I
Nicholas Clark wrote:

> On Tue, Oct 29, 2002 at 08:26:00AM +0100, Leopold Toetsch wrote:

> But then you end up with a messier two-level register spillage problem at
> compile time, don't you?


Yes.

> ...Which values to spill from fast to slow registers,
> and which values to spill further from slow to stack?


imcc already does spilling if more than 32 registers per type are used.
Adding another step to optimize for 3*4 usage wouldn't be much
different IMHO. It could probably be done in one step, when we define
I0-I3, S0-S2 ... as the "fast" registers.
Though I haven't thought much about this until now.
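
A toy sketch of that one step, with everything invented for
illustration: rank the virtual registers by use count, hand out the four
"fast" slots first, then the remaining slow slots, and spill whatever is
left.

    #include <stdlib.h>

    enum { FAST_REGS = 4, TOTAL_REGS = 32 };

    struct vreg {
        int id;     /* virtual register number from the compiler         */
        int uses;   /* how often it is touched (static or profiled count) */
        int slot;   /* assigned register 0..31, or -1 when spilled        */
    };

    static int by_uses_desc(const void *a, const void *b)
    {
        return ((const struct vreg *)b)->uses - ((const struct vreg *)a)->uses;
    }

    /* The most-used registers land in slots 0..3 (the fast I0-I3 style
     * slots), the next ones in 4..31, and anything beyond that spills;
     * exactly the ordering a single pass can produce. */
    static void allocate(struct vreg *v, int n)
    {
        int i;
        qsort(v, (size_t)n, sizeof *v, by_uses_desc);
        for (i = 0; i < n; i++)
            v[i].slot = (i < TOTAL_REGS) ? i : -1;
    }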

> ... And is there much
> literature on this sort of thing?


Dunno.


> And the fast registers are going to be called ax, bx, cx and dx? :-)


How did you know that ;-) No, actually I'm always thinking of 3*4 registers.


>>I was also thinking of the various fixed-size integer ops for the JVM or
>>C#. The load/store ops would prepare integers of the needed size and do
>>sign extension when necessary.

> I've had 3 drafts at responding to this, and I conclude "my brain hurts".
> I don't see an "obvious" clean solution to this, specifically 64-bit ops
> that run correctly on 32-bit native systems but take advantage of 64-bit
> native systems.


2 separate core.ops files, e.g. core32.ops emulating 64-bit ints and
core64.ops with native 64-bit ints, generated from core.ops?

> Nicholas Clark


leo

