On 4/8/2023 1:34 PM, Bernd Linsel wrote:
> BGB wrote on Sat, 2023-04-08T18:49:12+0200:
>>> Where, the GCC output also seems to be apparently using a Load/Store
>>> strategy, whereas [...]
> Only in -O0 or -O1, from -O2 on it tries to hold all locals in registers
>
Load/Store in the sense that it loads values into registers, mostly
operates on them in registers, and stores the final results back to
memory when it is done with them.
Not in the sense that it loads/stores for every operation (as in "-O0").
Contrast with MSVC, which is making heavy use of x86-64's complex
addressing modes to perform operations directly against values held in
memory.
I am not saying that either is good or bad, and I have noted that GCC's
output does tend to outperform MSVC's output, with the executable code
also being smaller. So, it is also very well possible that Reg/Mem ops
are not really a win on x86-64, even if it does have them for most
instructions.
> [...]
>
>> I am not entirely sure what is going on here with GCC, as it seems to
>> often alter code sequences into unrecognizable forms (whereas MSVC seems
>> to keep the same general structure as the C source intact).
>
> The first thing that causes this effect is that with gcc, variables don't have a home register. You can clearly see the effects of its SSA analysis, where every generated value gets its own register; and the register allocation is performed more often.
>
Possibly true.
Both GCC and Clang use SSA.
I don't know what MSVC uses here.
BGBCC had attempted to use a sort of "pseudo SSA", but I was unable to
figure out how to make things like phi operators work, so it is
effectively still plain 3-address-code.
Between being C source and being 3-address code, it takes a detour
through a stack-machine representation.
Where, say, an expression like:
y=m*x+b;
Gets represented as, say:
LOAD m
LOAD x
MUL
LOAD b
ADD
STORE y
Which then needs to be translated back to 3AC form during load:
MUL:i t0, m, x
ADD:i y, t0, b
( This is a shorthand, the notation used by BGBCC for debug output is a
lot more verbose. )
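The stack -> 3AC translation can be sketched roughly like this (a minimal sketch in C, not BGBCC's actual code; all names here are made up): the stack holds variable names rather than values, a binary op pops two names and emits a 3AC op into a fresh temporary, and STORE simply retargets the destination of the op that produced the top-of-stack temporary:

```c
#include <stdio.h>
#include <string.h>

/* One 3AC instruction: "op d, a, b" */
typedef struct { char op[8], d[8], a[8], b[8]; } Ins;

static char stk[16][8]; static int sp;      /* stack of variable names */
static Ins  code[16];   static int n, ntmp; /* emitted 3AC ops */

static void ld(const char *v) { strcpy(stk[sp++], v); }

static void bin(const char *op) {           /* MUL, ADD, ... */
    Ins *i = &code[n++];
    strcpy(i->op, op);
    sprintf(i->d, "t%d", ntmp++);           /* fresh temporary */
    strcpy(i->a, stk[sp-2]);
    strcpy(i->b, stk[sp-1]);
    sp -= 2;
    strcpy(stk[sp++], i->d);                /* temp is now "on the stack" */
}

static void st(const char *v) {
    /* If top-of-stack is the temp the last op produced, retarget that
       op's destination rather than emitting a separate MOV. */
    if (n > 0 && strcmp(stk[sp-1], code[n-1].d) == 0)
        strcpy(code[n-1].d, v);
    sp--;
}

/* y = m*x + b, as the LOAD/MUL/LOAD/ADD/STORE sequence above */
static void demo(void) {
    ld("m"); ld("x"); bin("MUL");
    ld("b"); bin("ADD");
    st("y");
}
```

Running the y=m*x+b sequence through this yields exactly the two 3AC ops above ("MUL t0, m, x" and "ADD y, t0, b").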
Mostly by noting that the stack can be seen not as directly holding
values, but as a temporary holding space for variables (and, in cases
where it does need to "hold" a value, the stack item is mapped to a
temporary variable).
This makes the stack -> 3AC transformation stage relatively
straightforward, but does still put a lot of constraints on how the
front-end ASTs can be expressed in terms of the underlying IR (things
like the "z=c?x:y;" operator not mapping all that cleanly to the stack
machine model).
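For instance, something like "z=c?x:y;" ends up needing to be flattened into explicit branches, with both arms storing into a shared temporary (a hypothetical listing, not BGBCC's actual notation):

```
        LOAD  c
        JMP_Z L0       ;if c is false, take the else arm
        LOAD  x
        STORE t0
        JMP   L1
L0:     LOAD  y
        STORE t0
L1:     LOAD  t0
        STORE z
```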
Though, given that .NET CIL is also stack-based, it may not be too far
off from what MSVC uses. BGBCC's RPNIL is also (at least vaguely)
comparable to .NET CIL in many areas of its design.
I had at one point considered going more directly from ASTs to a 3AC
format; however, I would need to rewrite a big chunk of the compiler
(as well as redesign how I handle static libraries), so I have not
done so.
The newest form of my BGBScript VM (BS2VM, ~ 2015 era) had switched to a
hybrid stack + register model (1), however this happened long after
BGBCC and the BGBScript VM split into two different entities (in the
late 2000s).
1: Operations could either use the stack or a variable as inputs, or the
stack or a variable as outputs; with the stack serving primarily as
dynamically created temporary variables. Internally, the interpreter
itself used a 3AC model.
Say, hypothetically (for such a VM), one uses a few bits to encode the
type of operation:
000: S=S+S
001: S=S+R
010: S=S+I
011: S=R+R
100: R=S+S
101: R=R+S
110: R=R+I
111: R=R+R
More bits to encode the value base type and operator, all within a
logical 16-bit opcode space (with a few bytes following for operands).
0: Bare 16-bit stack op
1: Op + 8-bit ID.
2: Op + Immed (VLN).
3: Op + 2x 8-bit ID.
...
With a register space something like:
00: 'this'
01..N: Arg1..ArgN
N+1 .. L: Local Variables
L+1 .. FF: Stack/Temporaries
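The 3-bit form field above can be decoded with a small table (a hypothetical sketch; the `Kind`/`Form` names and the decode function are made up, only the eight combinations come from the list above):

```c
/* For each of dst/srcA/srcB: Stack slot (S), Register/variable (R),
   or Immediate (I). */
typedef enum { K_S, K_R, K_I } Kind;
typedef struct { Kind dst, a, b; } Form;

static Form decode_form(int bits) {
    static const Form tab[8] = {
        { K_S, K_S, K_S },  /* 000: S=S+S */
        { K_S, K_S, K_R },  /* 001: S=S+R */
        { K_S, K_S, K_I },  /* 010: S=S+I */
        { K_S, K_R, K_R },  /* 011: S=R+R */
        { K_R, K_S, K_S },  /* 100: R=S+S */
        { K_R, K_R, K_S },  /* 101: R=R+S */
        { K_R, K_R, K_I },  /* 110: R=R+I */
        { K_R, K_R, K_R },  /* 111: R=R+R */
    };
    return tab[bits & 7];
}
```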
But, almost easier to just forgo the stack entirely, and move entirely
to just using temporaries.
I had at least one attempt at a VM that went entirely to a 3AC model
(though, this was targeted using BGBCC, which still used a stack model
internally).
Big obvious screw-up in the design was trying to put locals,
temporaries, and function arguments into 3 different register spaces (VM
had 48 logical registers, as 3x 16 registers).
If I were doing it now, would just have a single register space probably
with 256 logical registers (local to each call-frame).
Hmm, operand suffix:
dddd-dsss-sstt-ttt0 //3R, 0..31
dddd-dddd-ssss-ssss tttt-tttt-0000-0001 //3R, 0..255
dddd-dddd-ssss-ssss iiii-iiii-iiii-i101 //3RI, 0..255, Imm13s
The '11' case would escape to "complex operands", such as global
variables and literal expressions.
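A decoder for these suffix forms could look something like the following (a sketch under assumptions: it assumes the word holding the low mode bits is the one examined first, that the d/s fields of the two-word forms live in the other word, and that the unused '01' case is reserved; none of this is pinned down above):

```c
#include <stdint.h>

typedef struct {
    int d, s, t;            /* register/variable IDs */
    int32_t imm;            /* sign-extended Imm13s, if present */
    int has_imm, is_escape;
} Ops;

/* m = the word carrying the mode bits, x = the extension word (if any) */
static Ops decode_ops(uint16_t m, uint16_t x) {
    Ops r = {0};
    if ((m & 1) == 0) {                 /* dddd-dsss-sstt-ttt0 */
        r.d = (m >> 11) & 31;
        r.s = (m >>  6) & 31;
        r.t = (m >>  1) & 31;
    } else switch ((m >> 1) & 3) {
    case 0:                             /* 3R, 0..255: t in mode word */
        r.d = (x >> 8) & 255; r.s = x & 255;
        r.t = (m >> 8) & 255;
        break;
    case 2:                             /* 3RI: Imm13s in bits 15..3 */
        r.d = (x >> 8) & 255; r.s = x & 255;
        r.imm = (int16_t)m >> 3;        /* arithmetic shift sign-extends */
        r.has_imm = 1;
        break;
    case 3:                             /* '11': complex operands */
        r.is_escape = 1;
        break;
    default:                            /* '01': assumed reserved */
        break;
    }
    return r;
}
```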
This would be preceded by an opcode word encoding operation+type (for
simple types); likely using a JVM-like type model (with simple types
being encoded using bits, complex types encoded as a reference to a
signature string).
So, primary types:
I, 32-bit Int
L, 64-bit Long
F, 32-bit Float
D, 64-bit Double
A, 64-bit Address
X, 128-bit Int128
With secondary types:
UI, 32-bit Unsigned Int
UL, 64-bit Unsigned Long
UF, -
UD, -
UA, -
UX, 128-bit Unsigned Int128
And, tertiary types (Array ops):
SB, UB: Byte
SW, UW: Short
H: Binary16 (half float)
...
Possibly, a 6-bit type field:
puyyyy: p=pointer, u=unsigned, yyyy=base.
yyyy:
0000, I, 32-bit Int
0001, L, 64-bit Long
0010, F, 32-bit Float (Array/Pointer)
0011, D, 64-bit Double
0100, A, 64-bit Address
0101, X, 128-bit Int128
0110, B, 8-bit Byte (Array/Pointer)
0111, W, 16-bit Word (Array/Pointer)
Some combinations could be reserved, with "Unsigned Address" as an
escape to a signature string (held in a string table). Some operators
would partially discard type distinction.
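Unpacking such a 6-bit type field is then trivial (a sketch; the struct and the signature-escape check just transcribe the assumptions above, with reserved yyyy codes mapped to 0):

```c
typedef struct {
    int  is_ptr, is_uns;
    int  is_sig;   /* escapes to a signature string */
    char base;     /* 'I','L','F','D','A','X','B','W', or 0 if reserved */
} Ty;

static Ty decode_type(int t6) {
    static const char btab[16] = {
        'I','L','F','D','A','X','B','W',   /* 0000..0111 */
        0,0,0,0,0,0,0,0                    /* 1000..1111 reserved */
    };
    Ty r = {0};
    r.is_ptr = (t6 >> 5) & 1;
    r.is_uns = (t6 >> 4) & 1;
    r.base   = btab[t6 & 15];
    /* assumed: plain "Unsigned Address" escapes to a signature string */
    r.is_sig = (!r.is_ptr && r.is_uns && r.base == 'A');
    return r;
}
```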
So, basic instructions could consist of a 16-bit opcode word (type +
operator), followed by a 16-bit or 32-bit operand word. String table or
global/constant references would add additional 16-bit words.
Another option would be making the effort to move over to SSA.
But, in any case, writing a new C compiler (more or less from the
ground up) would be a massive pain.
And, short of trying to write a backend for GCC or LLVM (which looks
like a massive pain, combined with their long build times), no existing
option seemingly offers much beyond what I already have with BGBCC.
Goal would be to have the front-end produce output in a "reasonably
generic" IR form, where IR/bytecode images will be loaded into the
compiler backend to produce the final binaries.
Ideally, would also want it to be possible to invoke the backend as an
AOT compiler; and have it so that the backend can still operate within a
constrained memory footprint. This means the image format needs to be
such that the compiler is not required to load all of the IR and
metadata at the same time.
May not be held in a WAD-like format.
Would also want the IR design to be usable with an interpreter without
being excessively slow (this would be a downside of both
implicitly-typed stack machines and SSA form).
...