Thinking about the architectures in Celio's talk
<https://www.youtube.com/watch?v=Ii_pEXKKYUg>
<https://arxiv.org/abs/1607.02318>
<https://arxiv.org/pdf/1607.02318.pdf> also made me think of what
CISCy problems AMD64 has.
Ok, instruction decoding is an obvious problem, but it is now pretty
well understood how to decode variable-length instructions quickly if
we throw enough transistors at it (and now even variants of RISC-V
have variable-length instructions), so I will skip this part here.
I'll focus on the central issue in RISC: AMD64 is not a load-store
architecture. And in the 1980s this made a big difference wrt. easy
and efficient implementation, but how are things now?
One problem in the VAX (especially wrt the number of bug-prone corner
cases) was reported to be the number of page translations needed by
one instruction. On a page fault, the instruction would be rerun
afterwards, but the system would have to ensure that at some point all
pages needed by the instruction are there.
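
To make the restart problem concrete, here is a toy model (a sketch
only, not how the VAX or any real OS works): the instruction is rerun
from the start after every page fault, and it completes only once all
of its pages are resident at the same time. The function name, the
FIFO replacement policy, and the fault cut-off are assumptions made
for illustration.

```python
def run_instruction(pages_needed, frames):
    """Rerun an instruction after each page fault until it completes.

    pages_needed: distinct pages the instruction touches, in access order.
    frames: number of physical frames (FIFO replacement is an assumption).
    Returns the number of page faults taken, or None if the instruction
    never completes within the cut-off (livelock).
    """
    if frames < 1:
        return None
    resident = []                              # FIFO queue of resident pages
    faults = 0
    max_faults = 10 * len(pages_needed) + 10   # arbitrary livelock cut-off
    while faults <= max_faults:
        for page in pages_needed:
            if page not in resident:
                # Page fault: bring the page in, evicting the oldest if full.
                if len(resident) >= frames:
                    resident.pop(0)
                resident.append(page)
                faults += 1
                break                          # restart the instruction
        else:
            return faults                      # every page was resident
    return None


print(run_instruction([1, 2, 3, 4], frames=4))  # 4 faults, then it completes
print(run_instruction([1, 2, 3, 4], frames=3))  # None: never all resident
```

In this model, an instruction that needs four pages makes no progress
with only three frames, which is the kind of corner case the system
has to rule out.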
The common instructions of AMD64 outside the load/store paradigm are:

* load-and-operate instructions, e.g., reg += mem. I don't see that
  these instructions cause any difficulty. Am I missing something?
  Neither A64 nor RISC-V has added such instructions.

* read-modify-write instructions, e.g., mem += reg. Here we can
  translate the address once, fault in the page(s) that contain the
  relevant memory, then do the reading and writing on mem. One issue
  is whether the cache line can migrate to a different core between
  reading and writing; AFAIK the architecture says implementations are
  allowed to do it, not sure if the implementations actually do it.
  Delaying the answer to a cache line request a little while the
  "modify" part runs appears to be a relatively cheap way to deal with
  the problem, but maybe I am missing something. In any case, this
  appears more problematic than load-and-operate. At least in the K8
  days AMD had a load-store microinstruction for implementing RMW
  instructions.

AMD64 also supports unaligned accesses, which means that the memory
reference in the instructions above may refer to bytes in two pages;
but the same is true about modern 64-bit RISCs.
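
As a small illustration of the page-straddling case, the following
sketch counts the pages covered by one access. The 4KB page size and
the helper name pages_touched are assumptions for the example, not
anything taken from an architecture manual.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages


def pages_touched(addr, size):
    """Number of pages covered by an access of `size` bytes at `addr`."""
    first = addr // PAGE_SIZE
    last = (addr + size - 1) // PAGE_SIZE
    return last - first + 1


print(pages_touched(0x1000, 8))  # aligned 8-byte access: 1 page
print(pages_touched(0x1FFC, 8))  # crosses the boundary at 0x2000: 2 pages
```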
Now for the not-so-common AMD64 instructions; among those that we have
inherited from the 8086, the most extreme seems to be MOVSW (and its
32-bit variant MOVSL/MOVSD, and its 64-bit variant MOVSQ): it loads
from one memory address and stores in a different memory address;
overall it can access 4 pages (if both memory accesses are misaligned
and straddling pages); can anybody name anything worse?
In recent years Intel has added the VGATHER and VSCATTER instructions
which (in their AVX512 form) can access up to 16 independent memory
locations in one instruction (if unaligned accesses are allowed, that
would be 32 pages). Makes me wonder if accessing many pages in one
instruction is no longer considered a problem.
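
The page counts above can be checked with a short sketch. It counts
data pages only (not the instruction fetch), assumes 4KB pages, and
assumes every access may start at the worst possible offset; the
function names are made up for the example.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages


def max_pages_per_access(size):
    """Most pages one access of `size` bytes can cover, starting at the
    last byte of a page (the worst-case offset)."""
    return (PAGE_SIZE + size - 2) // PAGE_SIZE + 1


def worst_case_pages(accesses, size):
    """Upper bound on data pages for `accesses` independent accesses."""
    return accesses * max_pages_per_access(size)


print(worst_case_pages(2, 8))    # MOVSQ: one load + one store, 8 bytes each
print(worst_case_pages(16, 4))   # AVX512 gather of 16 32-bit elements
```

This gives 4 for MOVSQ and 32 for the 16-element gather, matching the
numbers in the text.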
Apart from the memory accesses and the instruction encoding, are there
any other non-RISC properties of AMD64 that matter today?
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>