This is definitely a Clever Thing, and one I've pondered on and off. 
It will definitely speed up some things, as there's less bytecode to 
chew through, there's more of a chance for optimization by the C 
compiler when parrot's being built, and generally more opportunity to 
cheat.
I'm currently leaning against it only because it doesn't ultimately 
help the JIT. What we have now is wildly cool and damn useful (and 
has anyone heard from Daniel lately, BTW?) but there's room for more 
optimizations.
Specifically, since the interpreter struct address is fixed, the 
bytecode is fixed at JIT time, and the JIT is allowed to make 
interpreter-private JITted versions of bytecode, it means that the 
JIT already has license to take things a step beyond what you're 
looking for--since the registers are at a fixed address and the 
bytecode is fixed, the JIT can produce a stream of executable code 
that directly addresses data, which is what the micro-ops are meant 
to do.
Now, on the other hand it *does* speed up the interpreter, so it's 
definitely not an idea to discard. But if we're going to (and I'd 
still like to hold off) I think we're better off with a few special 
versions of ops that target one or two registers directly, perhaps 
register 0 and 1, rather than have a separate set of special-purpose 
registers.
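
To make the distinction concrete, here is a minimal sketch of a generic 
three-operand add next to a special version that targets registers 0 and 
1 directly. The Interp struct, the opcode_t type and both function names 
are hypothetical, invented for this illustration, and are not taken from 
the Parrot sources.

```c
#include <stddef.h>

typedef long opcode_t;          /* hypothetical bytecode cell type */

typedef struct {
    long int_reg[32];           /* I0..I31, much simplified */
} Interp;                       /* hypothetical interpreter struct */

/* Generic form: add Ix, Iy, Iz.
 * The three register numbers are read from the bytecode stream,
 * so the op occupies four cells (opcode + three operands). */
static opcode_t *op_add_i_i_i(opcode_t *pc, Interp *interp)
{
    interp->int_reg[pc[1]] = interp->int_reg[pc[2]] + interp->int_reg[pc[3]];
    return pc + 4;
}

/* Special form: I0 = I0 + I1.
 * The registers are baked into the op, so there is nothing to decode
 * and the op occupies a single cell. */
static opcode_t *op_add_i0_i1(opcode_t *pc, Interp *interp)
{
    interp->int_reg[0] += interp->int_reg[1];
    return pc + 1;
}
```

The special form has less bytecode to chew through and gives the C 
compiler constant array indices to work with, while the register file 
itself stays exactly as it is.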
-- 
                                         Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk
> At 7:09 PM +0100 10/27/02, Leopold Toetsch wrote:
> 
>> So the I-register access is substituted by access to 3 global integers.
>>
>> Now, how would these globals be loaded? When are these »arg« OPs 
>> inserted?
>>
>> Currently the register optimizer in jit.c does something very similar: 
>> Setting up register access for the most used parrot registers in one 
>> execution block + load and store at block begin/end.
> 
> 
> This is definitely a Clever Thing, and one I've pondered on and off. It 
> will definitely speed up some things, as there's less bytecode to chew 
> through, there's more of a chance for optimization by the C compiler 
> when parrot's being built, and generally more opportunity to cheat.
It would also solve the multi_keyed problem. The _get_keyed argument 
preparation could fetch the PMC out of the aggregate.
> I'm currently leaning against it only because it doesn't ultimately help 
> the JIT. What we have now is wildly cool and damn useful (and has anyone 
> heard from Daniel lately, BTW?) but there's room for more optimizations.
Yes, that's correct. JIT wouldn't profit currently. But with an 
optimized stream of (micro-)Ops, having optimized fetch/store opcodes not 
in (basic-) block but finer granularity, JIT could profit too. Also the 
JIT-optimizer now run at load time would be done at compile time, so JIT 
startup time would be cut down.
> Now, on the other hand it *does* speed up the interpreter, so it's 
> definitely not an idea to discard. But if we're going to (and I'd still 
> like to hold off) I think we're better off with a few special versions 
> of ops that target one or two registers directly, perhaps register 0 and 
> 1, rather than have a separate set of special-purpose registers.
My hack with the 3 globals obviously includes some cheating, globals are 
a no-no when having multiple interpreters. But nevertheless we could 
produce an optimized PBC stream, where the 3*4 registers are treated as 
"fast" registers, with load/store to the 32*4 slower registers only when 
necessary. This would also fit neatly with my proposal WRT keyed access.
I was also thinking of the various fixed sized integer ops for JVM or 
C#. The load/store ops would prepare integers of needed size and do sign 
extension when necessary.
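
A rough sketch of what such load/store ops might look like for the 
integer registers. Everything here is an assumption made for 
illustration: the Regs struct, the function names, and the choice of a 
64 bit register cell are not existing Parrot code.

```c
#include <stdint.h>

typedef struct {
    int64_t slow[32];   /* the regular I registers */
    int64_t fast[3];    /* hypothetical "fast" integer registers */
} Regs;

/* Load slow[src] into fast[dst] as an integer of the given width
 * (intended for 8, 16, 32 or 64 bits): keep only the low `bits` bits,
 * then sign-extend them to the full register width. */
static void load_sext(Regs *r, int dst, int src, unsigned bits)
{
    uint64_t v = (uint64_t)r->slow[src];

    if (bits < 64) {
        uint64_t mask = (UINT64_C(1) << bits) - 1;
        uint64_t sign = UINT64_C(1) << (bits - 1);
        v &= mask;
        v = (v ^ sign) - sign;      /* sign-extend from `bits` wide */
    }
    r->fast[dst] = (int64_t)v;
}

/* Store a fast register back into a slow one, unchanged. */
static void store_back(Regs *r, int dst, int src)
{
    r->slow[dst] = r->fast[src];
}
```

With this shape the ops working on the fast registers never need to 
know the integer size, only the load has to.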
leo
> > I'm currently leaning against it only because it doesn't ultimately help 
> > the JIT. What we have now is wildly cool and damn useful (and has anyone 
> > heard from Daniel lately, BTW?) but there's room for more optimizations.
> 
> 
> Yes, that's correct. JIT wouldn't profit currently. But with an 
> optimized stream of (micro-)Ops, having optimized fetch/store opcodes not 
> in (basic-) block but finer granularity, JIT could profit too. Also the 
> JIT-optimizer now run at load time would be done at compile time, so JIT 
> startup time would be cut down.
But then you end up with a messier two level register spillage problem at
compile time, don't you? Which values to spill from fast to slow registers,
and which values to spill further from slow to stack? And is there much
literature on this sort of thing?
> My hack with the 3 globals obviously includes some cheating, globals are 
> a no-no when having multiple interpreters. But nevertheless we could 
> produce an optimized PBC stream, where the 3*4 registers are treated as 
> "fast" registers, with load/store to the 32*4 slower registers only when 
> necessary. This would also fit neatly with my proposal WRT keyed access.
And the fast registers are going to be called ax, bx, cx and dx? :-)
> I was also thinking of the various fixed sized integer ops for JVM or 
> C#. The load/store ops would prepare integers of needed size and do sign 
> extension when necessary.
I've had 3 drafts at responding to this, and I conclude "my brain hurts".
I don't see an "obvious" clean solution to this, specifically 64 bit ops
that run correctly on 32 bit native systems, but take advantage of 64 bit
native systems.
Nicholas Clark
> On Tue, Oct 29, 2002 at 08:26:00AM +0100, Leopold Toetsch wrote:
> But then you end up with a messier two level register spillage problem at
> compile time, don't you? 
Yes.
> ...Which values to spill from fast to slow registers,
> and which values to spill further from slow to stack? 
imcc already does spilling if more than 32 registers per type are used. 
Adding another step to optimize for 3 * 4 usage wouldn't be much 
different IMHO. It could probably be done in one step, when we define 
I0-I3, S0-S2 ... as the "fast" registers.
Though I haven't thought much about this until now.
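
A very rough sketch of the one-step idea, ranking symbolic registers by 
use count only. This is not how imcc works; a real allocator would look 
at live ranges, and the names and constants below (classify, FAST_REGS, 
ALL_REGS, the location enum) are made up for this illustration.

```c
#include <stddef.h>

enum { FAST_REGS = 4, ALL_REGS = 32 };   /* assumed: I0-I3 fast, 32 per type */

typedef enum { LOC_FAST, LOC_SLOW, LOC_SPILLED } location;

/* use_count[i] is the number of uses of symbolic register i.
 * out[i] receives where that register should live.  The rank of a
 * register is the number of registers used more often than it
 * (ties broken by lower index), computed the simple O(n^2) way. */
static void classify(const unsigned *use_count, size_t n, location *out)
{
    for (size_t i = 0; i < n; i++) {
        size_t rank = 0;

        for (size_t j = 0; j < n; j++) {
            if (use_count[j] > use_count[i] ||
                (use_count[j] == use_count[i] && j < i))
                rank++;
        }

        if (rank < FAST_REGS)
            out[i] = LOC_FAST;       /* most used: fast registers */
        else if (rank < ALL_REGS)
            out[i] = LOC_SLOW;       /* next: the regular registers */
        else
            out[i] = LOC_SPILLED;    /* the rest gets spilled */
    }
}
```

Both spill levels fall out of the same ranking, which is why it might 
not need a separate pass.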
> ... And is there much
> literature on this sort of thing?
Dunno.
> And the fast registers are going to be called ax, bx, cx and dx? :-)
How did you know that ;-) No actually I'm always thinking of 3*4 registers.
>>I was also thinking of the various fixed sized integer ops for JVM or 
>>C#. The load/store ops would prepare integers of needed size and do sign 
>>extension when necessary.
> I've had 3 drafts at responding to this, and I conclude "my brain hurts".
> I don't see an "obvious" clean solution to this, specifically 64 bit ops
> that run correctly on 32 bit native systems, but take advantage of 64 bit
> native systems.
2 separate core.ops files e.g. core32.ops emulating 64bit ints, and 
core64.ops with native 64bit ints, generated from core.ops?
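
As a sketch of the difference between the two variants, here is a 64 bit 
add done once with two 32 bit halves and once natively. The int64_emul 
type and both function names are hypothetical; nothing like core32.ops 
or core64.ops exists yet, so this only shows the kind of code each file 
would have to contain.

```c
#include <stdint.h>

/* Emulated 64 bit integer for a 32 bit system: two 32 bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64_emul;

/* What a "core32" style add would have to do: add the low halves,
 * detect the carry, and propagate it into the high halves. */
static int64_emul add64_emul(int64_emul a, int64_emul b)
{
    int64_emul r;

    r.lo = a.lo + b.lo;                  /* wraps modulo 2^32 */
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* r.lo < a.lo means a carry */
    return r;
}

/* What a "core64" style add can do: one native operation.
 * Done on unsigned values so overflow wraps the same way as above. */
static int64_t add64_native(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}
```

Both give the same two's complement result, so the bytecode would not 
need to know which variant it runs on.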
> Nicholas Clark
leo