It sorta worked, but yeah, preferably one doesn't design their FPU this way.
If compiling code as written, switching between Single and Double would
waste a lot of clock cycles. GCC seemed to take the approach, in some
cases (when mixing Single and Double), of simply converting everything
to Single and leaving it there (even though the C rules say one should
promote to Double and then convert the result back to Single).
When I designed the FPU for BJX2, I instead treated Double as the
default internal format, with scalar Float values operated on as
Double, in which case the format conversions for scalar operations are
essentially free. I had also initially eliminated any "modal" concepts
from the FPU (later re-adding them only in a limited form, as required
for C's "fenv_access" mechanism).
>>
>>
>> As for displacements:
>> ~ 9-bit (scaled) or 11/12-bit (unscaled) seems to be mostly sufficient
>> for Load/Store;
> <
> Works better for C than FORTRAN.
> <
These sizes can generally cover most structs and stack frames.
>> One can go smaller, but then one really needs a register-indexed mode,
>> or else it is going to suck.
> <
> It is reasons such as these that My 66000 ISA never uses instructions to
> paste constants together.
If all you have is a 5-bit displacement, and no indexed mode, the ISA is
kinda screwed, as:
Stack frames are frequently bigger than this;
Structs are also frequently bigger than this.
In SuperH, the options were:
If the value is a 32-bit type, you can have a 4-bit load/store displacement;
Else, you can load the displacement into a register (R0 was hard-wired
as an index register);
If your displacement doesn't fit in 8 bits (sign-extended), you need to
use a memory load to load the displacement;
Because the PC-relative load displacement was also pretty limited, one
would (typically) also need to branch over the constants one dumped
into the middle of the instruction stream in order to be able to
constant-load the memory displacements;
...
This... also sucked...
One of my extensions was an LDSH instruction:
  MOV   0xXX, R0
  LDSH  0xXX, R0   //R0=(R0<<8)|Imm8u
  MOV.L @(R4, R0), R2
This could replace the memory load of a 16-bit constant with a
2-instruction bit-paste, which was at least a little better.
>>
>> At these sizes, the number of (basic) displacements which fall outside
>> the range is pretty small (and those that fail, often do so by a much
>> larger amount).
>>
> Then think about 64-bit address space where .bss and .data may be
> placed more than 1GB away from each other and the code and at
> randomized addresses/offsets. Even if 64-bit address/displacements
> are only 1%-2% of all memory references, using LDs pollutes the D$,
> using instructions pollutes the I$. The only reasonable way out is to
> provide access means in the decode structure of the instructions
> themselves (ala My 66000).
In my current ABI, ".data" and ".bss" are accessed relative to GBR.
With Jumbo encodings, I can access a 4GB data/bss section via a
single instruction.
There is currently no good way to deal with a data/bss section larger
than 4GB, though (this case would require using ALU ops for the
address calculations).
>>
>>
>> For local branches (within a function), 8 bits can still be fairly
>> effective, though a fair number of functions will be larger than this
>> (so it is not sufficient "in general").
> <
> And then there are inner loops such as FPPPP (8K instructions).
> So while 8-10-bits is sufficient for a lot of things, it still leaves a few
> things dangling.
For these cases, 20-bit branch displacements are available in my ISA.
My compiler is generally able to figure out when to use a Disp8 or
Disp20 branch (16-bit or 32-bit instruction formats, e.g., 20dd or
F0dd_Cddd).
It would start using Disp33 if the program got too large for Disp20, but
this hasn't happened yet.