Comment #16 on issue 11 by hlide.de...@gmail.com: Crash in
ControlFlowGraph::Build()
http://code.google.com/p/jitasm/issues/detail?id=11
There are several reasons:
1) Early on, whenever I tried to use symbolic registers, I got very
bad code (too many mov or xchg, probably because of register rotation) and
a lot of failed asserts as well. Maybe the buggy CFG was the culprit.
Several of my fellows told me the same thing: the code seems too bloated.
2) Context registers (registers mapped into a memory structure representing
mips registers) and symbolic registers (registers spilled onto the stack)
have radically different purposes. Symbolic registers are used as local
variables, whereas context registers are used as global and
persistent "variables". I need them to be read when entering the function
and saved back to the context at the end of the function, which is not how
a local variable works (a local variable is always written first
and doesn't need to be saved back to its spill slot when leaving the
function). When executing a "call rdi" (mips syscall emulation), it's very
important to get all registers back into the context (that's why I consider
that I_CALL always terminates a basic block).
3) The most important part is the ability to emit the fastest possible
direct or indirect branches/jumps across the program, not the efficiency of
the register allocation. Fast branching gives a much larger speedup, while
using allocated registers brings only a marginal speed gain and fewer bytes
for the generated superblock.
With a true interpreter, I have a frame period of 5.0-5.3 ms for a demo.
Using the recompiler in interpreter-like mode (each basic block holds one
instruction, or two if there is a delay slot), I have a frame period
of 1.1-1.2 ms. In full mode, I have a frame period of 1 ms with or without
register allocation.
4) The strategy I chose for register allocation is different too. While I
can have up to 12 registers (RAX, RDX, RCX, RBX, R8-R15), I want to
allocate one if and only if it would otherwise take more than 2
instructions to read/write it in the register context, and to reuse
registers when there are no overlaps. Besides, I also remap the allocated
registers so that the most referenced mips registers are associated with
the highest-priority registers in this order (RAX, RDX, RCX, RBX, R8, ...,
R15), since that helps reduce the generated bytes (fewer prefixes caused by
the use of R8-R15). I also plan to add constant propagation to reduce
register pressure and merge several instructions into one (ex: lui reg,
0xXXXX; ori reg, reg, 0xYYYY ---> mov(GPR(vreg), 0xXXXX0000);
or(GPR(vreg), 0x0000YYYY); ---> mov(GPR(vreg), 0xXXXXYYYY)); such
sequences are not infrequent in mips.
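
To make point 4 more concrete, here is a rough sketch of the selection and
remapping step, plus the lui/ori merge. It is not my actual recompiler code:
the names AssignHostRegisters, RefCounts and FoldLuiOri are made up for the
example, the "more than 2" threshold is read as "more than 2 accesses in the
superblock", and the reuse of registers with non-overlapping ranges is left
out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Host registers in the priority order from the comment. The first four are
// preferred because R8-R15 cost extra prefix bytes.
static const std::array<const char*, 12> kHostPriority = {{
    "RAX", "RDX", "RCX", "RBX", "R8", "R9",
    "R10", "R11", "R12", "R13", "R14", "R15"}};

// Hypothetical input: mips register number -> number of instructions in the
// superblock that read or write it.
typedef std::map<int, int> RefCounts;

// Returns mips register -> host register name, only for the registers that
// are worth allocating (assumption: more than 2 accesses).
std::map<int, std::string> AssignHostRegisters(const RefCounts& refs) {
    std::vector<std::pair<int, int> > candidates;  // (mips reg, access count)
    for (RefCounts::const_iterator it = refs.begin(); it != refs.end(); ++it) {
        assert(it->first >= 0 && it->first < 32);
        if (it->second > 2) candidates.push_back(*it);
    }
    // Most referenced first; ties keep the lower mips register number first.
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const std::pair<int, int>& a,
                        const std::pair<int, int>& b) {
                         return a.second > b.second;
                     });
    std::map<int, std::string> assigned;
    for (std::size_t i = 0;
         i < candidates.size() && i < kHostPriority.size(); ++i) {
        assigned[candidates[i].first] = kHostPriority[i];
    }
    return assigned;
}

// "lui reg, hi; ori reg, reg, lo" loads (hi << 16) | lo, so both
// instructions can be merged into one mov with a 32-bit immediate.
inline std::uint32_t FoldLuiOri(std::uint16_t hi, std::uint16_t lo) {
    return (static_cast<std::uint32_t>(hi) << 16) | lo;
}
```

With access counts {2: 9, 4: 5, 8: 2, 29: 3}, mips register 2 would get RAX,
4 would get RDX, 29 would get RCX, and 8 would stay in the context because
it is only accessed twice.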
While I do understand why you say the jitasm register allocator is more
efficient (it can generate intermediate blocks that help keep values in
registers inside a loop, with no need to save/restore context registers), it
tends to use too many registers and to issue too many mov or xchg
instructions, and it doesn't fit well with the purpose of a context
register (vs. a symbolic register). So it isn't at the top of my priority
list. But yeah, when a lot of other stuff is done, I may reconsider an
adapted version of the jitasm register allocator so I can avoid flushing
registers inside a loop.
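
To illustrate what I mean by a context register in point 2 (read from the
context on entry, flushed back at the end of the block or before a call),
here is a small model in plain C++. It only simulates the behaviour and
emits no code; MipsContext and ContextRegisterCache are names invented for
this sketch, not jitasm or recompiler types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for the memory structure that holds the mips registers.
struct MipsContext {
    std::uint32_t gpr[32] = {};
};

// Models context registers: a value is loaded from the context on first
// use, kept in a "host register" while the block runs, and written back
// when the block ends or right before a call.
class ContextRegisterCache {
public:
    explicit ContextRegisterCache(MipsContext& ctx) : ctx_(ctx) {}

    std::uint32_t Read(int reg) { return Slot(reg).value; }

    void Write(int reg, std::uint32_t value) {
        Entry& e = Slot(reg);
        e.value = value;
        e.dirty = true;
    }

    // To be called at the end of a basic block and before something like
    // "call rdi", so the callee sees every register in the context.
    void Flush() {
        for (std::map<int, Entry>::iterator it = cached_.begin();
             it != cached_.end(); ++it) {
            if (it->second.dirty) ctx_.gpr[it->first] = it->second.value;
        }
        cached_.clear();
    }

private:
    struct Entry {
        std::uint32_t value;
        bool dirty;
    };

    // Loads the register from the context the first time it is touched.
    Entry& Slot(int reg) {
        assert(reg >= 0 && reg < 32);
        std::map<int, Entry>::iterator it = cached_.find(reg);
        if (it == cached_.end()) {
            Entry fresh = {ctx_.gpr[reg], false};
            it = cached_.insert(std::make_pair(reg, fresh)).first;
        }
        return it->second;
    }

    MipsContext& ctx_;
    std::map<int, Entry> cached_;
};
```

A symbolic register would not need the load on first use nor the Flush()
at the end, which is the whole difference. Avoiding Flush() inside a loop is
what an adapted allocator would have to solve.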