On 5/28/2018 10:47 AM, Paul A. Clayton wrote:
> On Monday, May 28, 2018 at 2:08:11 AM UTC-4, Terje Mathisen wrote:
> [snip]
>> <BG>
>>
>> I have been waiting to see this view offered, I do agree that x86 asm
>> was very pleasant indeed.
>
> I am not that familiar with x86, but it seems to me
> that the number of instructions is relatively high,
> that the operations are not especially orthogonal
As for x86 instruction counts being high:
Yep, pretty much. The x86 ISA has far more instruction forms than a
typical RISC or similar. Likewise, many opcodes are encoded through
layers of re-purposed prefixes.
The original ISA (8086):
Single-byte opcodes
Many with a Mod/RM byte
Many with a displacement
Many with an immediate
Some prefixes, like REP/REPNE, segment overrides, ...
The 286: Some bytes started being used to encode longer opcodes.
For example, "0F XX Mod/RM ..." vs just "XX Mod/RM", ...
The 386: Added a 32-bit mode, new Mod/RM scheme, and more prefix bytes.
Prefixes: address and data size overrides, FS/GS overrides, ...
By around the time MMX and SSE were being added, with little unused opcode space left, the existing prefix bytes started being tacked onto operations where they were previously not defined, giving those combinations new meanings. This causes many SSE operations to effectively have a soup of prefix bytes as part of their opcode field.
For x86-64, the less frequently used single-byte INC/DEC instruction forms (bytes 40-4F) were dropped (forcing their two-byte encodings to be used), and those bytes were reused as the REX prefixes (needed for QWORD operations and to access R8-R15).
By AVX, another pair of instructions (LDS/LES) was redefined so that encodings invalid in their original meaning are instead interpreted as a VEX prefix, with its payload bits encoding the equivalent of the whole chain of prefix bytes, including the REX bits, and potentially an additional register argument, ...
How many instruction-forms exist? Several thousand last I checked...
So, now it is sort of a hairy mess on this front, and implementing a CPU
for x86 would probably be fairly non-trivial. If I were to do something with x86 support, I would probably just implement a RISC-style core and use an emulator to run any x86 code. Likely the emulator would decode the ISA in software and then JIT-compile it into the native ISA (with maybe several MB or so reserved for translated instruction traces).
However, from the perspective of someone writing ASM code, it isn't
nearly so bad. The assembler can deal with most of the encoding details,
and most instructions present a fairly consistent interface.
Similarly, the decoding for most operations is basically the same once you get past the Mod/RM byte.
The situation is much less friendly on a typical RISC, where one might
battle with which instruction forms exist for which combinations of
parameters.
There might also be other issues, like needing to deal with delay slots and other funkiness. For example, a memory load might not take effect for several instructions, or a taken branch might not redirect execution until one or two instructions later (causing the instructions after the branch to be executed), ...
Similarly, many have a habit of requiring constants to be loaded from memory, with the constant pool needing to be placed within a certain distance of the code being executed (due to limited displacement ranges), ...
Some of this is a lot harder to gloss over with an assembler, so writing ASM code is a bit more painful compared with x86.
However, for the CPU it is easier, given the instruction format itself
is typically fixed-width and fairly regular.
Some partial exceptions exist, like Thumb, where the layout of the bits within the various instruction forms is a bit chaotic. Most other RISCs are a bit more regular here.
> (perhaps particularly with condition code results?;
> not being able to avoid setting the condition code
> seems a relatively useless constraint other than for
> code density), and the encoding (which can matter
> for density or alignment optimizations) is somewhat
> complex.
IMO: x86-style condition codes are needlessly inconvenient in some areas.
One alternative that is nicer IMO is simply having a single True/False status bit, with only a small subset of instructions affecting it. But in this case, one needs comparison instructions that each perform a specific comparison, and there are fewer possible conditions to branch on.
> The complexity might make the discovery of
> a clever use more satisfying and perhaps the
> complexity adds constraints that assist creation
> (perhaps similar to a jigsaw puzzle with more
> piece-shapes, where fit as well as image matching
> constrain placement?)
>
This puzzle aspect is probably more true of dealing with a lot of the
small RISC ISAs than when dealing with x86 IMO.
Bigger RISC ISAs (with 32-bit instruction words) are typically a little more regular here, but with 16-bit instruction words there is often a need to fight with the instruction encoding a bit more to make everything fit nicely.
There are tradeoffs here (code density, performance, complexity, ...).
> Your previous posts on the subject imply that the
> modest register count with semi-dedicated purposes
> facilitated mental tracking of availability and
> allocation.
>
> Humans seem to need some complexity to find
> intellectual enjoyment, but I suspect that much of
> the bookkeeping aspects of a register rich ISA
> could be handled with software assistance or by
> iterative refinement.
>
Having a lot of registers is not really an issue for writing ASM; if you don't need them, you don't use them.
It isn't exactly hard to write comments to say which variable is in
which register, more so if there are enough registers that the same
variable can be kept in the same register the whole lifetime of the
function.
Likewise, most non-x86 archs simply name them by number, and use names
only for registers with special defined meanings. Even on x86-64, it may
make sense in some cases (such as compilers or JITs) to mostly abandon
the use of symbolic names in favor of identifying them as R0-R15.
Ex: R0=RAX, R1=RCX, R2=RDX, R3=RBX, ...
The tradeoff in an ISA mostly comes down to how the encoding bits are spent, vs how much time will be spent loading/storing memory values, vs other issues.
But, as can be noted, 16 or 32 registers seem to be roughly optimal in most cases.