OK.
I had considered a fully custom format at one point; the original idea
would have been something vaguely resembling an AR or TAR file, say:
Program Header;
Segment Header;
LZ compressed segment;
Segment Header;
LZ compressed segment;
...
Where, say, each segment may contain one or more sections, specify a
load address, and may include additional commands for what to do with it
(such as interpreting it as base relocs rather than as part of the
final image).
Technically, this would have also been along vaguely similar lines to
the Mach-O format.
But I ended up opting for a modified PE/COFF, as it already had most of
what I wanted (and I was already reasonably familiar with the format).
My case differed slightly in that I didn't need to care about whether
Windows or existing tools could understand the binaries, and so "PEL4"
is sort of its own format in a way as well.
Technically, it still fit what I wanted to do better than ELF would
have, though I ended up tweaking some things to allow it to support
loading multiple binaries into the same address space:
All binaries include reloc tables;
The read-only and read/write sections are split up into two different
segments (using the Global Pointer data-directory entry to effectively
define the read/write section, with the Global Pointer pointed to the
start of this segment on program start-up).
I had looked into a few possible compression schemes, and LZ4 gave the
best properties for binaries.
I have a different (byte-oriented) compression scheme that tends to give
better ratios on general-purpose data, but LZ4 seemed to give better
results with executable code in this case.
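Say, as a rough illustration (the header fields here are made up, not
the actual PEL4 structures), unpacking one LZ4-compressed segment with
the reference LZ4 library could look like:

  #include <stdint.h>
  #include <lz4.h>   /* reference LZ4 library */

  /* Hypothetical segment header; not the actual PEL4 layout. */
  typedef struct {
      uint32_t vaddr;      /* load address of the segment            */
      uint32_t raw_size;   /* size of the LZ4-compressed payload     */
      uint32_t mem_size;   /* size of the segment once unpacked      */
      uint32_t flags;      /* e.g. "treat this segment as base relocs" */
  } seg_header_t;

  /* Unpack one segment into its target location; returns 0 on success. */
  static int load_segment(const seg_header_t *sh, const char *payload, char *dest)
  {
      int n = LZ4_decompress_safe(payload, dest,
                                  (int)sh->raw_size, (int)sh->mem_size);
      return (n >= 0 && (uint32_t)n == sh->mem_size) ? 0 : -1;
  }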
>
>>> Each DLL exports certain symbols such as the addresses of functions
>>> and variables. So no reason you can't access a variable exported from
>>> any DLL, unless perhaps multiple instances of the same DLL have to
>>> share the same static data, but that sounds very unlikely, as little
>>> would work.
>>>
>>
>> Much past roughly Win9x or so, it has been possible to use
>> "__declspec(dllimport)" on global variables in Windows (in an earlier
>> era, it was not possible to use the __declspec's, but instead
>> necessary to manage DLL import/exports by writing out lists in ".DEF"
>> files).
>>
>> It isn't entirely transparent, but yes, on actual Windows, it is very
>> much possible to share global variables across DLL boundaries.
>>
>>
>> Just, this feature is not (yet) supported by my compiler. Personally,
>> I don't see this as a huge loss (even if it did work; I personally see
>> it as "poor coding practice").
>
> This is a language issue. Or, in C, it is compiler related.
>
> I've never been quite sure how you tell a C compiler to export a certain
> symbol when creating a DLL. Sometimes it just works; I think it just
> exports everything that is not static (it may depend on a compiler
> option too).
>
> And some compilers may need this __declspec business, but I've never
> bothered with it.
>
Yeah. GCC seems to be "export everything".
MSVC needs either __declspec or ".DEF" files.
I am not sure exactly when __declspec started being used for this,
seemingly sometime between "Visual C++ 4.0" and "Visual Studio 2003";
it isn't documented well enough online to narrow it down much further.
In my case, I went with a similar approach to MSVC, namely explicit
export, where the normal "extern" storage class is shared between
translation units but does not cross DLL boundaries.
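For example, with the usual MSVC-style spellings (a sketch; whether my
compiler spells it exactly this way is beside the point here), the split
looks roughly like:

  /* --- foo.c, built into foo.dll: explicitly exported symbols --- */
  __declspec(dllexport) int foo_counter = 0;
  __declspec(dllexport) int foo_add(int x) { return foo_counter += x; }

  /* Plain extern: shared between translation units within the DLL,
     but not exported across the DLL boundary. */
  int foo_internal_state;

  /* --- bar.c, linked against foo.dll: explicit import --- */
  __declspec(dllimport) extern int foo_counter;
  __declspec(dllimport) int foo_add(int x);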
> Mine just exports all not-static names. So this program:
>
> int abc;
> static int def;
>
> void F(void) {}
> static void G(void) {}
>
> if compiled as: 'mcc -dll prog', produces a file prog.dll which, if I
> dump it, shows this export table:
>
> Export Directory
>
> 0 00000000 0 Fun F
> 1 00000000 0 Var abc
>
> (There's something in it that distinguishes functions from variables,
> but I can't remember the details.)
>
> In any case, in C it can be hit and miss. In my own language, it is more
> controlled: I used an 'export' prefix to export symbols from a program.
>
> (It also conventiently creates interface files to be able to use the DLL
> library from a program. The equivalent of prog.h for my example
> containing the API needed to use it. Rolling that out to C is not
> practical however as my 'export' applies also to things like types and
> enums.)
>
In my case, there are two major ways of invoking the compiler, say:
bgbcc /Fefoo.dll foo.c
Or:
bgbcc -o foo.dll foo.c
Where the compiler looks at the output file's extension; if it is DLL,
it assumes you want a DLL:
EXE: EXE file, "PBO ABI", fully relocatable.
DLL: DLL file, "PBO ABI", fully relocatable.
SYS: Bare-metal EXE, ABI more like traditional Win32 EXEs.
BIN: ROM image, no EXE headers, no relocs, ...
RIL: RIL Bytecode
OBJ: RIL Bytecode
O: Also RIL Bytecode
S: ASM output.
If no output file is given, it looks at whether it is trying to mimic
GCC-style command-line behavior:
No: Assume "foo.exe" as default output.
Yes: Assume "a.exe" as default output.
Where, say, for the GCC-like mode:
-o <name> Output file.
-c Compile only
-E Preprocess only
-S ASM only.
-I<path> Add include path
-L<path> Add library path
-S<path> Add source path (excludes '-S' by itself)
-l<name> Add library
-D<name>[=<value>] #define something
-W<opt> Warning option
-m<tgt> Specify target machine
-f<opt> Specify target option/flag
-O<opt> Specify optimizer option.
-Z<opt> Specify debug option.
-g<opt> Also specify debug option.
...
For libraries, it checks the library path, where for "-l<name>" it will
look for:
lib<name>.<arch>.ril
lib<name>.ril
Assuming static linking in this case (the handling for DLLs is a little
different).
Note that, sort of like with MSVC, any debug data is dumped into
external files, and is not held within the EXE itself. Thus far, it is
fairly limited, mostly a big ASCII text file with a structure vaguely
similar to "nm" output.
Note for -E and -S, if no output file is specified, output is dumped to
stdout; and a bare '-' option indicates to read the input from stdin.
This was also partly to mimic GCC behavior.
Technically, to mimic GCC, for Linux and similar, it also symlinks other
tool names back to the BGBCC binary:
bjx2-pel-cc
bjx2-pel-gcc
bjx2-pel-ld
bjx2-pel-as
bjx2-pel-ar
...
Where, if it is called with a name in this form, it assumes that it
needs to try to emulate the respective command-line interface and
behavior (but, this part is still fairly incomplete).
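Say, the argv[0] dispatch looks conceptually something like this (a
sketch; the mode names and helpers here are made up):

  #include <stdio.h>
  #include <string.h>

  /* Which traditional tool the binary should behave as. */
  enum tool_mode { MODE_CC, MODE_LD, MODE_AS, MODE_AR };

  /* Pick a mode from the name the binary was invoked as (argv[0]). */
  static enum tool_mode pick_mode(const char *argv0)
  {
      const char *name = strrchr(argv0, '/');   /* strip any path prefix */
      name = name ? name + 1 : argv0;

      if (strstr(name, "-ld")) return MODE_LD;
      if (strstr(name, "-as")) return MODE_AS;
      if (strstr(name, "-ar")) return MODE_AR;
      return MODE_CC;                           /* default: compiler driver */
  }

  int main(int argc, char **argv)
  {
      static const char *names[] = { "cc", "ld", "as", "ar" };
      printf("acting as: %s\n", names[pick_mode(argv[0])]);
      (void)argc;
      return 0;
  }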
Technically, this 'ar' is very nonstandard and currently can't entirely
emulate the standard behavior.
But, apart from things like 'ar -c libname.a objfile*', the 'ar' tool
doesn't see much use (so its inability to incrementally update archive
contents is mostly a non-issue; I could in theory add proper support
for '.a' files if it became one).
In this case, the main compiler binary functions like a sort of hydra
that takes over the roles of the entirety of "binutils".
>>> This [somes] my experience of software originating in Linux. This is
>>> why Windows had to acquire CYGWIN then MSYS then WSL. You can't build
>>> the simplest program without involving half of Linux.
>>
>> Yes, and it is really annoying sometimes.
>>
>>
>> For the most part, Linux software builds and works fairly well... if
>> one is using a relatively mainline and relatively up-to-date Linux
>> distro...
>>
>>
>> But, if one is not trying to build in or for a typical Linux style /
>> GNU based userland; it is straight up pain...
>>
>> Like, typically either the "./configure" script is going to go down in
>> a crap-storm of error messages (say, if the shell is not "bash", or
>> some commands it tries to use are absent or don't accept the same
>> command-line arguments, etc); or libraries are going to be missing; or
>> the build just ends up dying due to compiler errors (say, which
>> headers exist are different, or their contents are different, ...).
>
> ./configure is an abomination anyway; I've seen 30,000-line scripts
> which take forever to run, and test things like whether 'printf' is
> supported.
>
Yeah, and it is seemingly a bit of an uphill battle to try to make it
work in any environment that is not "GNU userland with GCC".
In the case of Clang, it seems to actively lie about its identity to try
to make configure and similar willing to accept it.
> But the biggest problem with them is when someone expects a Windows user
> to use that same build process. Of course, ./configure is a Bash script
> using Linux utilities.
>
> It's like someone providing a .BAT file and expecting Linux users to do
> something with it.
>
Yes.
For simple programs, one-liner ".bat" or ".sh" files are fairly effective.
And, then, "Makefile.tgt" or similar for more involved cases.
A lot of the more complex build systems are often either unnecessary, or
indicate a more fundamental problem with the program and its dependency
management (along with other annoyances, like indirectly making
"perl"/"python"/"nodejs"/etc effectively prerequisites to get the
program built).
>>
>> Within the code itself, it often doesn't take much looking to find one
>> of:
>> Pointer arithmetic on "void *";
>> Various GCC specific "__attribute__((whatever))" modifiers;
>> Blobs of GAS specific inline ASM;
>> ...
>>
>>
>> Whereas in more cross-platform code, one will usually find stuff like:
>> #ifdef __GNUC__
>> ... GCC specific stuff goes here ...
>> #endif
>> #ifdef _MSC_VER
>> ... MSVC specific stuff goes here ...
>> #endif
>> ...
>
> Those conditional blocks never list my compiler, funnily enough. (#ifdef
> __MCC__ will do it.)
>
>
I didn't list my own either, which uses __BGBCC__, ...
But, in practice, I can often partly overlap the __BGBCC__ and _MSC_VER
blocks, as a lot of the dialect-specific functionality is closer to MSVC
than GCC (but does support some GCC extensions as well).
There were some differences, like I ended up aligning with GCC and
making it so that "sizeof(long)==sizeof(void *)" rather than
"sizeof(long)==4" for 64-bit targets.
Summarized history:
~ 2001: (During high school) Wrote a Scheme interpreter.
~ 2003: Started writing the first BGBScript interpreter.
This was around the end of high school for me.
This interpreter used XML DOM for the ASTs
And AST walking for the interpreter.
It was dead slow...
The language design somewhat resembled JavaScript / ES3.
~ 2006: Rewrote BGBScript interpreter.
Reused much of the core of the Scheme interpreter as a base.
Started gluing on features from ActionScript.
Went over to a bytecode interpreter.
Started experimenting with JIT.
~ 2007:
First BGBCC was written, as a fork off the 2003 BGBScript.
Idea was to try to allow using C as a scripting language.
But, C was not a good scripting language...
BGBCC was repurposed as an FFI generator for BGBScript.
Still used XML-DOM based ASTs, with a Stack-Machine IR.
~ 2008-2013:
Wrote a 3D engine that was originally Doom3-like.
Was using some Half-Life based file formats (for maps/models/etc).
Was using dynamic Phong lighting and stencil shadows (like Doom3).
But then shifted to copying Minecraft (with a Doom3-style renderer).
Its performance and memory usage was "not good"...
~ 2014: Made BGBScript2 VM
This was a redesign of BGBScript made to more resemble Java and C#.
Simplified some stuff, and made it primarily static typed.
Used stack-machine bytecode
Translated into 3AC traces for interpretation.
This strategy was a lot faster than direct interpretation.
Architecturally, it was similar to the Java JVM.
~ 2015/2016: Made a 2nd Minecraft-like 3D engine
Was written in a mixture of C and BS2.
Core engine was C, most game code was BS2.
Was intended to be simpler/faster/lighter than its predecessor.
~ 2016: Started taking an interest in ISA design stuff.
BGBCC was revived, and was made to target SuperH / SH-4.
Ended up going with the WinCE PE/COFF variant for binaries.
Was also using GCC built for SH-4 / PE-COFF.
This mutated into my "BJX1" ISA, which was a modified SH-4.
Though, BJX1 turned into a horrid mess.
~ 2018:
The ISA design was rebooted into BJX2.
Basically, a new encoding scheme that was "less horrible".
The new ISA could mostly reuse the old ASM with minor tweaks.
The compiler backend was partly reworked for the new ISA encoding.
But, most of the compiler backend was copy/pasted from BJX1.
~ 2019-present:
The BJX2 effort had continued and expanded somewhat.
The ISA design has mutated a fair bit since it started.
My compiler's backend has, however, turned into a horrible mess.
Not much reason to target BGBCC at mainstream ISAs:
MSVC, GCC, Clang, etc., do well enough...
Some past small-scale experiments at generating native code on ARM
performed horribly. It seems like, unless I could fix some of the issues
that still plague code generation for my own ISA, there is basically no
hope of being able to compete with GCC on ARM (which seemed not
particularly forgiving of crappy code generation, at least on the A53
and A55).
In some ways, BGBCC is a little bit of a throwback vs my BS2VM
(BGBScript2 VM design).
BGBCC:
Originally XML-based ASTs:
Organized in linked-lists;
Using strings for node/attribute names;
...
Now, object-based ASTs faking the original XML-based ASTs.
No more string pointers for tag/attribute names.
The Bytecode was mostly unstructured.
Loading the bytecode is effectively a purely linear process.
You run the stack model, and the ops build all the 3AC and metadata.
BS2VM:
Object-based ASTs (conceptually JSON-like);
The bytecode uses a TLV based container format for bytecode.
Stuff is organized into sections and tables.
The metadata has an actual structure.
At present, in both cases, the ASTs use a similar structure internally:
Key-value pairs, with 16-bit keys and 64-bit values.
BGBCC uses type-tagged keys, BS2VM used a different tagging scheme.
Each node holds up to a fixed number of key/value pairs.
If this limit is exceeded, the nodes break up B-Tree style.
Currently, this limit is 8, chosen as a balance for memory use.
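A rough sketch of such a node (field names made up here; not the actual
BGBCC/BS2VM structures):

  #include <stdint.h>

  #define AST_NODE_MAXKV 8   /* per-node limit; exceeding it splits the node */

  typedef struct ast_node_s {
      uint16_t  key[AST_NODE_MAXKV];   /* 16-bit keys (type-tagged in BGBCC) */
      uint64_t  val[AST_NODE_MAXKV];   /* 64-bit values                      */
      int       nkv;                   /* pairs currently in use             */
  } ast_node_t;

  /* Fetch the value for a key within one node; returns 1 if found. */
  static int ast_node_get(const ast_node_t *n, uint16_t k, uint64_t *out)
  {
      for (int i = 0; i < n->nkv; i++)
          if (n->key[i] == k) { *out = n->val[i]; return 1; }
      return 0;
  }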
At one point I did make a 3rd Minecraft-like 3D engine, but mostly
because the prior engine was still too heavyweight to run on an FPGA
board (and I wanted "something" that could run).
Say, my 2nd 3D engine needed around 256MB of RAM to work.
But, the FPGA board I was using has 128MB of RAM, and realistically
going much over 48-64MB of memory use is "seriously pushing it".
So, there was a bunch of effort in trying to make a small, "basically
functional" Minecraft-like 3D engine fit into around 40MB or so of RAM.
Was mostly successful, at least assuming one doesn't go out far enough
that it is generating new chunks (which somewhat increases its RAM
requirements).
Had a major difference from the second engine in how it managed world
drawing:
Second engine:
Figure out potentially visible chunks (16x16x16 blocks);
Build a vertex array for every potentially visible chunk;
Draw all the visible chunks.
Third engine:
Do spherical raycasts from the camera position;
Build a list of block faces that a ray had hit;
Draw all of the block faces into a vertex array;
Draw the vertex array.
Both engines ended up still using 16x16x16 block chunks, however:
2nd engine had 16x16x16 chunk regions (256x256x256 blocks);
3rd engine had 8x8x8 chunk regions (128x128x128 blocks).
Both used a similar scheme for chunks:
Single block type: no per-block index data;
2-16 block types: 4 bits per block;
17-256 block types: 8 bits per block;
257+:
Raw 32-bit block entries (2nd engine)
Unsupported (3rd engine).
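A sketch of how a per-chunk palette like this might be decoded (names
made up; the actual in-memory layout differs):

  #include <stdint.h>

  /* One 16x16x16 chunk, stored with a small per-chunk block palette. */
  typedef struct {
      uint32_t palette[256];  /* 32-bit block entries (type/attr/light/...)      */
      int      npal;          /* 1 = uniform chunk, <=16 = 4-bit idx, else 8-bit */
      uint8_t  index[4096];   /* packed indices; 2 blocks per byte in 4-bit mode */
  } chunk_t;

  /* Fetch the 32-bit block entry at (x,y,z) within the chunk. */
  static uint32_t chunk_get_block(const chunk_t *c, int x, int y, int z)
  {
      int i = (z << 8) | (y << 4) | x;           /* 0..4095 */
      if (c->npal <= 1)
          return c->palette[0];                  /* single block type: no indices */
      if (c->npal <= 16) {
          uint8_t b = c->index[i >> 1];          /* 4 bits per block */
          return c->palette[(i & 1) ? (b >> 4) : (b & 0x0F)];
      }
      return c->palette[c->index[i]];            /* 8 bits per block */
  }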
With a block layout sorta like, say:
( 7: 0): Block Type
(11: 8): Block Attribute
(15:12): Sky Light (15 if direct view of the sky)
(19:16): Block Light Intensity
(23:20): Block Light Color
(31:24): Depends on engine (eg, block flags).
Most chunks have fewer than 16 unique blocks (the sky being purely air
at a constant sky-light=15; underground being mostly solid stone at
sky-light=0, ...). When rebuilding the vertex array, the per-block
sky-light level is multiplied by the current light level of the sky
(based on a day/night cycle), with the block-light intensity and color
added on, to give the final face vertex color.
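For instance, pulling the light fields out of a block entry and
combining them might look roughly like this (a sketch; the exact scaling
is illustrative):

  #include <stdint.h>

  /* Combine sky light (scaled by time of day) with block light to get a
     vertex brightness in 0..255; color handling omitted for brevity. */
  static int face_brightness(uint32_t block, int sky_level /* 0..255, day/night */)
  {
      int sky_light   = (block >> 12) & 0x0F;   /* 15 = direct view of the sky */
      int block_light = (block >> 16) & 0x0F;   /* emitted light intensity     */

      int sky  = (sky_light * sky_level) / 15;  /* modulate by day/night level */
      int emit = (block_light * 255) / 15;      /* block light at full scale   */

      int v = sky + emit;                       /* additive combine            */
      return v > 255 ? 255 : v;
  }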
Though, one big tradeoff is that the computational cost of the third
engine's strategy scales very poorly with draw distance.
And, unlike Wolfenstein3D or ROTT, the number of rays needed to fully
cover the screen with a ray sweep is impractical in 3D:
Wolf3D/ROTT: 320 rays per sweep, in 2D;
Minecraft-like: roughly 2000, if we disregard block faces smaller than
4 pixels (at 320x200).
Though, one can reduce the ray-sweep density by applying a random jitter
to the rays and discarding any faces that haven't been hit recently.
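Say, something along these lines for the jitter (a sketch, not the
engine's actual code):

  #include <stdlib.h>
  #include <math.h>

  /* Generate one jittered ray direction near a base (yaw, pitch) sample. */
  static void jitter_ray_dir(float yaw, float pitch, float jitter, float dir[3])
  {
      float y = yaw   + jitter * ((rand() / (float)RAND_MAX) - 0.5f);
      float p = pitch + jitter * ((rand() / (float)RAND_MAX) - 0.5f);
      dir[0] = cosf(p) * cosf(y);
      dir[1] = cosf(p) * sinf(y);
      dir[2] = sinf(p);
  }

  /* Faces track how many sweeps ago a ray last hit them; anything not hit
     within the last few sweeps gets dropped from the visible set. */
  #define FACE_MAX_AGE 4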
I had used a full spherical sweep rather than a frustum sweep: with an
asynchronous "run the ray sweep at 4 times per second or so" approach, a
frustum sweep results in big holes in the world whenever one turns the
camera. With a spherical sweep, everything is already there (so looking
around doesn't result in big ugly holes being visible), but this does
lessen the number of rays one can cast in the forward direction, along
with increasing the number of visible block faces (since faces are still
processed even when outside the area being looked at).
Partly to limit both the cost and the required ray density, a ray will
simply stop after a certain distance if it hasn't hit anything.
Note that if one sets a limit of, say, 16000 block faces, then this also
sets an upper limit on how much memory is needed for the vertex arrays
(which is also somewhat less than the memory required to build full
vertex arrays for every chunk within the current draw distance).
Though, with a slow periodic update and a small draw distance, it is
possible to outrun the visible part of the terrain (as the raycast and
vertex arrays lag behind the camera's current position), along with
temporary holes opening up whenever previously occluded areas come into
view. These would be less of an issue with a faster raycast update
(say, 10 or 15 Hz), but on a 50MHz CPU this is asking a lot.
Didn't really make any interesting game out of this, it was more of a
technical experiment than anything else.
...