Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>I guess that depends on whether your glass is half empty or half full.
>>>There's no getting around register allocation being NP-complete, but
>>>IME approximate algorithms run in reasonable time on reasonably small
>>>programs.
>>
>> Sure, but that does nothing to address the issues that register
>> windows address. A function that calls another function and has, say,
>> 6 registers live across the call, needs to save 6 registers and
>> restore 6 registers somewhere (either save callee-saved registers on
>> entry and exit, or save caller-saved registers around the call).
>> Register windows reduce that overhead. In theory, one can use
>> interprocedural register allocation to reduce these overheads.
>
>Sure, but it's completely not free with register windows either. Of
>course if every call fits neatly in the register window that's a win,
>but if they don't that's a loss.
If a function needs more registers, it's certainly not worse off than
if the machine has no register windows: When it calls another
function, it shifts the window before the call and shifts it back
after, and presto, 16 registers are saved that would have required 16
stores before the call and 16 loads after. Apart from that, the
register allocator has to spill registers just as it would without
register windows, so the register windows certainly are a win here.
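To make that accounting concrete, here is a toy cost model in Python.
It is only a sketch under stated assumptions: the function names are
made up for illustration, the figure of 16 registers preserved per
window shift is the one used above, and window overflow/underflow
traps are ignored entirely.

```python
# Toy model: memory operations needed around ONE call.
# Assumption: a window shift preserves up to WINDOW_SAVED registers
# without any loads or stores (overflow/underflow traps are ignored).
WINDOW_SAVED = 16


def mem_ops_without_windows(live_across_call):
    """One store before the call and one load after it per live register."""
    return 2 * live_across_call


def mem_ops_with_windows(live_across_call, window_saved=WINDOW_SAVED):
    """Registers covered by the window shift cost nothing; the rest are
    saved and restored exactly as they would be without windows."""
    not_covered = max(0, live_across_call - window_saved)
    return 2 * not_covered


if __name__ == "__main__":
    for live in (6, 16, 20):
        print(live, mem_ops_without_windows(live), mem_ops_with_windows(live))
```

In this model the windowed machine never needs more memory operations
than the windowless one; a function with more live values than the
window covers just pays the same cost as before for the excess.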
>>>Well, sure: an optimizing compiler can't optimize beyond its
>>>"optimization horizon": the amount of code that it can see. That's
>>>obvious, surely. You can't optimize what you can't see.
>>
>> But register windows work even for such calls.
>
>Sure, up to their intrinsic limitations caused by the fixed-size
>windows.
What do you have in mind?
>>>The register stack isn't on the face of it such a terrible idea, but
>>>it's certainly questionable whether it's the best way to use the
>>>silicon. But the IA-64 does not, as you note, have fixed-size
>>>windows, which are the real Achilles' heel of the SPARC.
>>
>> I doubt that it's the Achilles' heel. Sure, variable shifts would
>> have made better use of the physical registers, but as long as the
>> number of overflows/underflows is small, that makes little difference.
>
>Indeed, but that's the big assumption.
Not at all, because the number of physical registers can be (and has
been) chosen such that overflows/underflows are rare.
Apart from that, if the fixed shift had been such a big problem, they
could have added variable shifts in later revisions of the
architecture (in particular in the move to 64 bits), but they chose
not to; so apparently it was not a big problem, much less an Achilles'
heel.
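How rare overflows and underflows are depends on how the call depth
moves relative to the number of windows that fit in the register file.
The following Python sketch counts them for a given call/return trace.
It is a simplified model, not a description of any real SPARC
implementation: `count_window_traps` and its parameters are
hypothetical names, each trap is assumed to spill or refill exactly
one window, and the trace is assumed to be balanced (no return from
depth 0).

```python
def count_window_traps(call_trace, num_windows):
    """Count window overflows and underflows for a call/return trace.

    call_trace  -- iterable of +1 (call) and -1 (return)
    num_windows -- number of windows usable for procedure frames
    Returns (overflows, underflows).
    """
    depth = 0         # current call depth
    resident_low = 0  # shallowest depth whose window is still in registers
    overflows = 0
    underflows = 0
    for step in call_trace:
        if step == +1:
            depth += 1
            if depth - resident_low >= num_windows:
                # No free window left: spill the oldest resident window.
                overflows += 1
                resident_low += 1
        else:
            depth -= 1
            if depth < resident_low:
                # The caller's window was spilled earlier: reload it.
                underflows += 1
                resident_low -= 1
    return overflows, underflows


if __name__ == "__main__":
    shallow = [+1] * 4 + [-1] * 4   # depth stays within 8 windows
    deep = [+1] * 10 + [-1] * 10    # depth exceeds 8 windows by 3
    print(count_window_traps(shallow, 8))
    print(count_window_traps(deep, 8))
```

A trace whose depth stays within a band of `num_windows` frames causes
no traps at all in this model; only excursions beyond that band do, and
each costs one spill on the way down and one refill on the way back.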
>Even with somewhat inferior
>manufacturing technology, if register windows are a substantial
>architectural advantage they should have won, but didn't.
Superior computer architecture always wins? Maybe on another world.
In this world, architectures with lots of software win.
Anyway, back to register windows:
Are they a substantial advantage for statically-linked SPEC CPU
benchmarks? Obviously not, otherwise more architectures would have
adopted them.
Are they a small advantage for real-world applications (dynamically
linked, with polymorphic calls)? I think so, and apparently the IA-64
architects thought so, too.
Do they justify the additional silicon? If it was a justifiable use
of silicon in 1986, it certainly is justifiable now.
Do they justify the additional design complexity (with possible
knock-on effects on clock rate and time-to-market)? The SPARC,
AMD29K, and IA-64 architects thought so, the others didn't. It's
probably not a clear-cut issue.
>> And the number of physical registers was selected such that the
>> number of overflows was small in practice.
>
>That's true iff you're prepared to dedicate a lot of silicon to
>register sets rather than something else. It's not free.
Somehow the CPU people ran out of things that they need silicon for
some years ago, so now they give us more than one core per CPU chip.
Interestingly, SPARC implementations are pretty far along on the
multi-core game: SPARC64 X (2012) has 16 cores (with 2 threads each)
on a chip, i.e., 32 instances of the register set; actually, already
the UltraSPARC T1 from 2005 has 8 cores with 4 threads each, again
needing 32 instances of the register set, and the SPARC T3 and T5 have
16 cores with 8 threads each, for 128 instances of the register set.
Would they have gone for more cores and more threads if they did not
have such big register sets? Maybe, but no other server CPUs with
more threads come to my mind, so probably not.
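The instance counts above are simply cores times threads per core. A
small Python check, using only the core and thread counts quoted in
this posting (the dictionary and function names are just for
illustration):

```python
# (cores, threads per core) as quoted in the text above
CHIPS = {
    "SPARC64 X (2012)": (16, 2),
    "UltraSPARC T1 (2005)": (8, 4),
    "SPARC T3 / T5": (16, 8),
}


def register_set_instances(cores, threads_per_core):
    """Each hardware thread needs its own copy of the register set."""
    return cores * threads_per_core


if __name__ == "__main__":
    for name, (cores, threads) in CHIPS.items():
        print(name, register_set_instances(cores, threads))
```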
If register windows were really as much of a problem as you make them out
to be, they could have been obsoleted over time: First, let compilers
produce code without window shifts, then reduce the number of physical
registers to the allowed minimum, and eventually remove register
windows support, especially in the 32-bit to 64-bit transition; look
at what ARM did with shifts and conditional instructions in the move
to 64 bits (or AMD with segments in the move to 64 bits).