On 7/25/2018 3:56 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> BGB <cr8...@hotmail.com> spake the secret code
> <pj9gc7$r12$1...@news.albasani.net> thusly:
>
>> I have my own custom C compiler (*) which can compile Quake and Doom in
>> about 5 seconds.
>
> Are your customizations around making the compilation process faster?
> If so, what exactly are you doing to get the speedup? Is this
> something that could be contributed to clang?
>
Nothing that seems particularly notable (beyond its somewhat "unusual"
architecture).
The original form of this compiler was written about 10 years ago (based
on a fork of an early form of my BGBScript language, which was at the
time a lazy JavaScript mockup with "pretty much awful" performance,
which I had hacked into being able to parse C). It is stand-alone and
written from the ground up in C. At the time, the original idea was
partly inspired by the scripting system used by Quake 3 Arena.
It has been used and maintained on and off for much of this time. It
spent many years ignored and/or used as a "parse C headers for metadata"
tool, before fairly recently being reborn as a full-fledged C compiler
for some of my recent projects.
I don't know Clang well enough to comment on how much is applicable to
LLVM/Clang, and its speed doesn't really seem to be due to anything in
particular (beyond maybe being fairly small and simple).
Speeds otherwise generally aren't drastically different from what I see
from MSVC (whereas compiling Quake with GCC in WSL takes closer to 35s
on my PC; though it seems a little faster when compiled under Linux
inside VirtualBox).
Vs a normal C compiler (such as GCC):
  The compilation process is essentially monolithic.
    A single binary does everything, rather than spawning new processes.
    There is currently a cache where headers are kept once read in.
  Otherwise, not much seems particularly notable on this front.
    Some minor lexer/parser tricks exist, but nothing major.
  AST nodes are essentially small key/value B-Trees.
    This is faster and uses less memory than CONS lists or an XML DOM.
    But many other compilers use tagged unions (probably faster).
  All the translation units use a shared context in the middle/backend.
    Internal renaming is used to isolate TU-local features.
      Ex: 'static' variables are renamed to hide them from other TUs.
      As-is, this abstraction has some leaks which can break stuff (*1).
    Only during the parsing/frontend stages are TUs independent.
  Code generation uses a few major stages.
    There is no separate assembler or linker stage.
      Codegen sends commands directly to an "instruction emitter".
        This resembles the inner part of an assembler, minus an ASM parser.
        I did later add an ASM parser, but mostly to support ASM code.
      It just relocates sections and spits out a binary as its output.
( *1: In particular, mismatched declarations between translation units
can break compilation, as can quietly having global variables and
functions with overlapping names (MSVC also rejects these, though GCC
seems to accept them in certain cases).
Doom stumbled on a lot of these edge cases, so I ended up modifying the
code so that either the declarations matched, or, in other cases, things
were manually renamed such that there was no longer a name clash. GCC
and MSVC just sort of quietly merge these cases union-style (unless
they were initialized with a value, in which case the linker would
typically complain).
Granted, a better solution would be to make these cases work, probably
along with a few other cases which aren't currently handled, like
initializer expressions performing arithmetic or other calculations on
the addresses of other variables (which can't be handled in the front-end).
An overall better solution (to the naming issue) may be to have a
two-level scheme, with a partially separate per-translation-unit
namespace, and a second-level "extern" space which bindings may be
promoted to if referenced from within another translation unit, but
which may be shadowed by bindings local to each TU. Uninitialized
variables may be merged across TUs if their names match and their types
can be reconciled (otherwise, "do something sane").
These sorts of edge cases weren't really much of an issue for Quake or
for most of my own code though.
)
Beyond the frontend stage, there is no real separation between
translation units in the middle-end or backends.
Things like the C library are handled by compiling to an intermediate
stack-based bytecode (vaguely similar to CLR bytecode, produced with
only partial type information), which is reloaded by the middle-end and
converted into a 3AC/SSA form used by the backend. No bytecode is used
at runtime; it exists purely as an IR stage (between the front-end and
middle-end). In the conversion to 3AC, stack locations get remapped to
temporaries.
The current stack bytecode is called "RIL3", which consists of blobs of
stack-based bytecode within a collection of begin/end tag structuring.
The format can also contain LZ-compressed data blobs (using an LZ4-like
format), which are used for various purposes (such as blobs of ASM code).
The backend basically works with everything in 3AC form using an
SSA-like scheme, but it differs from the one used in LLVM in that,
rather than giving each instance of a variable its own "name", each is
given a name with a revision number (and implicitly, a "phi" just sort
of merges everything which corresponds to the same base variable).
Though, in general, the backend is pretty naive (fancy optimizations
aren't really its strong area).
Eliminating a lot of the "infrastructural busywork" compared with
something like GCC does help with keeping the code smaller and simpler.
Namely, there is a whole lot of stuff in GCC which just doesn't need to
exist in this design.
As-is, compiler code size:
  Current total: ~157 kLOC;
    27 kLOC: BSR1 ISA backend;
    34 kLOC: BJX2 ISA backend;
    41 kLOC: SuperH/BJX1 ISA backend;
    10 kLOC: C parser;
    23 kLOC: front/middle-end (AST -> RIL -> 3AC);
    15 kLOC: MM/AST/util code;
    ~7 kLOC: misc stuff.
  The compiler binary is currently approx 2.5 MB.
  The compiler can also recompile itself within a few seconds.
There was a debatable possible feature to allow front-ends and back-ends
to exist as plug-in DLLs or SOs, but I have been on the fence, as this
would add complexity to the compiler.
ISA Support Overview:
BJX1 was a prior ISA of mine, developed as a superset of the Hitachi
SH-4 ISA. A later incarnation was a 64-bit ISA with 32 GPRs, where only
a superficially similar instruction set remained (code was not binary
compatible between the 32- and 64-bit ISAs). Much hair developed trying
to deal with this much ISA variation in a single backend.
BSR1 was similar to SH, but somewhat simplified (it left out features
that unnecessarily created pain when trying to implement the ISA in
Verilog, or in general). It was intended to be a small/simple ISA. Like
SH, it is a 32-bit ISA with fixed-width 16-bit instructions and 16 GPRs,
and was intended for smaller/cheaper FPGAs (ex: lower-cost 9 kLUT devices).
BJX2 was essentially a redesign of the BJX1 ISA based on BSR1. In many
regards it is a cleaner and simpler design, but expands to support
16/32/48 bit instruction-forms (the larger 48-bit forms being able to
support larger immediate values and displacements). Note that it is
otherwise a 64-bit RISC-style ISA with 32 GPRs.
I have also had a little more success getting a working Verilog
implementation, mostly due to design simplifications in many areas.
The latter two backends were mostly copy/paste/edit versions of the
original SH / BJX1 backend.
Note that it is a fairly minimal ISA if compared with something like
x86, as many things which are done natively in hardware in x86 need to
be faked in software in these ISAs (such as integer division). Granted,
it is intended to work on a "hobbyist grade" FPGA dev-board (currently
targeting an Arty S7 with a 50 kLUT Spartan-7).
There is not currently an x86/x86-64 compiler backend, mostly because I
typically use MSVC or GCC for these (and, for the most part, C is C).
Though, x86-64 support is still a possibility.
Canonically, the currently used output formats are either raw ROM images
or PE/COFF (the BJX2 Boot-ROM expects to load a PE/COFF image). For
BJX2, the PE/COFF format is based on the PE32+ format, but lacks
anything in the MZ stub.
PE/COFF is currently the format used both for boot and for program
launching, but program launching (unlike launching via the Boot-ROM)
will support base-relocatable binaries and DLLs.
Or such...