Hey Paulo,
Unfortunately, we do not have an up-to-date porting guide for the compiler. Fortunately, porting Chez Scheme is relatively straightforward, and we have a couple of RISC examples (ARM 32-bit Little Endian and PPC 32-bit Big Endian) implemented, which should help out as a starting point for porting.
While the following is not a full porting guide, I've tried to walk through the compiler a bit and give at least a sketch of how to approach this. We've worked to keep machine- and operating-system-dependent implementation details modular, so that they can be replaced when porting to a new hardware platform or operating system.
# A quick overview of the compiler, libraries, and run time
The majority of the Chez Scheme compiler and libraries are implemented in Scheme and can be found in the 's' (for Scheme) subdirectory. The run time (including the garbage collector, support for interacting with the operating system, and some of the more complicated math library support) is implemented in C and can be found in the 'c' directory.
Porting to a new system requires both getting the C run time compiled on the new platform and updating the Scheme compiler to generate machine code for the platform. There are several places where the C run time and the code generated by the compiler need to work in harmony in order to get the system to run. For instance, the C run time needs to know the type tags, sizes, and field offsets of Scheme objects, so that the garbage collector in the C run time can do its job. This is handled by having the Scheme compiler generate a couple of C headers, scheme.h and equates.h, that contain the information about the Scheme compiler that the C run time needs.
Chez Scheme is a bootstrapped compiler, meaning you need a Chez Scheme compiler to build a Chez Scheme compiler. In the case of porting to a new platform, you'll need to work from an already supported host to cross-compile the boot files and produce the header files. These can then be moved to the target platform, where the C run time can be compiled with the generated header files. Once you have all of the pieces working together, you can run the Chez Scheme compiler on the new machine to produce native-built boot files and run the tests in the 'mats' directory.
# Porting to a new platform
Chez Scheme assigns a 'machine-type' name to each platform it runs on. The 'machine-type' currently carries three pieces of information:
1. is the system threaded? ('t' indicating it is, vs. nothing indicating it is not threaded);
2. the hardware platform: i3 for x86, a6 for x86_64, arm32 for 32-bit ARM, and ppc32 for 32-bit PPC; and
3. the operating system: le for Linux, nt for Windows, osx for macOS, etc.
For instance, ta6osx is threaded Chez Scheme for x86_64 machines running macOS. As a first step, you'll need to pick a machine type name for the new platform. The machine types are also stored in the cmacros.ss file, in the 'define-machine-types' list, and you'll need to add your new machine type to this list.
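To make the naming scheme concrete, here is a small Python sketch (not part of Chez Scheme) that splits a machine-type name into its three parts. The tables contain only the names listed above, and the function name is my own invention for illustration.

```python
# Toy decoder for machine-type names such as "ta6osx".
# Only the architectures and systems named above are listed; the real
# 'define-machine-types' list in cmacros.ss contains more entries.
ARCHES = {"i3": "x86", "a6": "x86_64", "arm32": "32-bit ARM", "ppc32": "32-bit PPC"}
SYSTEMS = {"le": "Linux", "nt": "Windows", "osx": "macOS"}


def parse_machine_type(name):
    """Return (threaded, arch, system) for a machine-type name."""
    candidates = [(False, name)]
    if name.startswith("t"):
        # A leading 't' marks the threaded variant of the same platform.
        candidates.append((True, name[1:]))
    for threaded, rest in candidates:
        for arch in ARCHES:
            system = rest[len(arch):]
            if rest.startswith(arch) and system in SYSTEMS:
                return threaded, arch, system
    raise ValueError(f"unrecognized machine type: {name!r}")
```

With these tables, "ta6osx" splits into threaded, 'a6', and 'osx', while a name that is not in the tables raises an error, which is roughly what happens if you forget to register a new machine type.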
## build support for new platforms
The 'workarea' script in the root of the Chez Scheme project is used to generate a subdirectory with the appropriate contents to build for a particular machine. This is the script the configure script runs when configuring the build, but you can also run the 'workarea' script on your own, supplying the machine type you'd like to build. As you walk through this file, you'll see machine type names sprinkled throughout, wherever it needs to do something specific to a machine type.
One thing you'll notice, if you look in the 's', 'c', and 'mats' directories is that they all contain files of the format 'Mf-[machine-type]'. These files are Makefiles that contain platform-specific settings for the various platforms, and you'll need to create a version of each of these Makefiles for your new platform. I'd recommend starting from one that is similar to the platform you are porting to. In the 'c' and 'mats' directories, these are mostly differences in settings for the host C compiler. The 's' directory 'Mf-[machine-type]' files record the machine type being compiled and target-specific source files to be included in the compiler.
You'll also notice in the 's' directory that there is a '[machine-type].def' file for each Chez Scheme machine type. This file contains machine-specific information, like the size of various integer types, the endianness of the machine, and the name of the machine architecture file. You'll need to add a version of this file for your new platform.
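As a checklist, the per-machine files described so far can be enumerated with a few lines of Python. This is only a sketch: the helper name is made up, and 'rv64le' in the usage note is a hypothetical machine type, not one that exists in the source tree.

```python
def files_for_new_port(machine_type):
    """List the per-machine files described above, relative to the source root."""
    # One Makefile per directory: s/Mf-<mt>, c/Mf-<mt>, mats/Mf-<mt>.
    makefiles = [f"{d}/Mf-{machine_type}" for d in ("s", "c", "mats")]
    # Plus the machine definition file in the 's' directory.
    return makefiles + [f"s/{machine_type}.def"]
```

For a hypothetical 'rv64le' port this yields 's/Mf-rv64le', 'c/Mf-rv64le', 'mats/Mf-rv64le', and 's/rv64le.def'.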
With this, you should be able to use the 'workarea' script to give you a new directory for your machine type. You may also want to create an architecture-specific file (like the 'x86.ss', 'x86_64.ss', 'ppc32.ss', or 'arm32.ss' files). One way to create this is to copy an existing one for a similar architecture (say 'arm32.ss') as a starting point. This can be helpful as you get started, because it allows you to build the C header files you need and get the C run time compiling before you've finished the full port of the backend.
### Other build support files
Once you've got things working, you might want to update the configure script so that you can configure for the new platform and the 'bintar' script which can be used to package up a tar-ball of the Chez Scheme binary.
# Using the Chez Scheme cross-compiler
In the 's' directory, you'll find a file called 'Mf-cross', which is the Makefile for invoking the cross compiler. Before you can use it, you'll need a scheme executable built for the host machine. By default, Mf-cross expects the built binary and boot files to be checked in to the root 'bin/<machine-type>' and 'boot/<machine-type>' directories for the machine. You can either use the 'checkin' script to check in the binaries from a built host-machine image (the result of running './configure' and 'make') or you can tell 'Mf-cross' where to look for the scheme binary. You'll also want to tell it the machine type for your target. The call looks like the following:
make -f Mf-cross m=<host-machine> xm=<target-machine> Scheme="../../<host-machine>/bin/<host-machine>/scheme -b ../../<host-machine>/boot/<host-machine>/petite.boot -b ../../<host-machine>/boot/<host-machine>/scheme.boot"
The 'Scheme' argument is unnecessary if you have checked in the host machine scheme binary and boot files. For instance, if I were building a 32-bit x86 Linux image from a 64-bit x86_64 macOS host, I would do the following:
make -f Mf-cross m=a6osx xm=i3le Scheme="../../a6osx/bin/a6osx/scheme -b ../../a6osx/boot/a6osx/petite.boot -b ../../a6osx/boot/a6osx/scheme.boot"
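If you end up running this often, the command line can be assembled by a small helper. The Python sketch below only rebuilds the invocation shown above; the function name and the 'workarea_root' parameter are my own, and it assumes the host build lives in a workarea directory named after the host machine type, as in the example.

```python
def mf_cross_command(host, target, workarea_root="../.."):
    """Build the argv list for the Mf-cross invocation shown above."""
    base = f"{workarea_root}/{host}"
    # Host scheme binary followed by the two boot files it should load.
    scheme = (
        f"{base}/bin/{host}/scheme "
        f"-b {base}/boot/{host}/petite.boot "
        f"-b {base}/boot/{host}/scheme.boot"
    )
    return ["make", "-f", "Mf-cross", f"m={host}", f"xm={target}", f"Scheme={scheme}"]
```

Because the result is an argument list rather than a shell string, it can be handed to subprocess.run (run from the 's' directory) without any extra quoting of the 'Scheme=' value.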
Once this is done, you should find boot and header files for the target machine in the <machine-type>/boot directory (../<machine-type>/boot from where you ran the make in the 's' directory). There should be four files here: 'petite.boot', 'scheme.boot', 'equates.h', and 'scheme.h'. These files will need to be transferred to your target machine (or provided to your C cross-compiler for the target machine) to build a 'scheme' binary for the target machine.
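A quick way to confirm the cross-compile produced everything is to check for the four files before copying them over. This is a minimal sketch; the helper name is hypothetical, and the directory is passed in rather than guessed.

```python
import os

# The four outputs named above.
EXPECTED_OUTPUTS = ("petite.boot", "scheme.boot", "equates.h", "scheme.h")


def missing_boot_outputs(boot_dir):
    """Return the expected cross-compile outputs that are absent from boot_dir."""
    return [
        name
        for name in EXPECTED_OUTPUTS
        if not os.path.isfile(os.path.join(boot_dir, name))
    ]
```

An empty list means all four files are present and ready to be moved to the target.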
The startup process for the 'scheme' binary checks that the integer sizes and the like provided by the 'machine.def' file match the sizes the C compiler expects. If something goes wrong, it will report that the size of one of these types doesn't match, which is usually an indication that there is something wrong in the 'machine.def' file.
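The idea behind that startup check can be sketched in Python using ctypes to ask the local C ABI for its sizes. This is not the run time's actual check; the key names ('int_bits' and so on) are illustrative labels of mine, not the names used in the '.def' files.

```python
import ctypes
import sys


def host_c_facts():
    """Sizes and byte order as seen by the C ABI that Python was built against."""
    return {
        "int_bits": ctypes.sizeof(ctypes.c_int) * 8,
        "long_bits": ctypes.sizeof(ctypes.c_long) * 8,
        "ptr_bits": ctypes.sizeof(ctypes.c_void_p) * 8,
        "endianness": sys.byteorder,  # "little" or "big"
    }


def check_declared(declared, actual=None):
    """Return one message per declared value that disagrees with the C side."""
    actual = host_c_facts() if actual is None else actual
    return [
        f"{key}: declared {value!r}, C compiler says {actual.get(key)!r}"
        for key, value in declared.items()
        if actual.get(key) != value
    ]
```

Feeding it the values you wrote into your definition file on the target machine gives an early hint about which entry is wrong, before you get as far as starting 'scheme'.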
Of course, we expect that 'scheme' is going to fail when it tries to load the boot files, because we've not yet done anything to replace the machine-specific back-end for the new architecture, and it is generating machine files for a different machine type.
# Replacing the machine-specific backend
As mentioned earlier, the machine-specific contents of the compiler are stored in separate files named for the target machine: 'x86.ss', 'x86_64.ss', 'arm32.ss', and 'ppc32.ss'. Each one of these files is broken up into three sections.
## Section One: Registers
Section one provides a definition of the registers available on the machine, along with a mapping into Chez Scheme's task-specific registers: %tc, %sfp, %ap, %trap, and potentially others. The %tc is the thread context and must get a register; it stores information about the running session and, in threaded versions, the context for the currently running thread. The %sfp is the Scheme frame pointer. Chez Scheme does not use the architectural stack for its calls, leaving it where it is for use by the foreign-function interface. The %trap register keeps track of when the current Scheme system should be paused to check for interrupts, requests for garbage collection, etc. that have come in since it was last checked. This register counts down towards zero as functions are executed. Additional registers include %esp, the end-of-stack pointer, and %eap, the end-of-allocation pointer. When these are not specified as real registers, they become 'virtual registers' in the data structure pointed to by the %tc. The detail here is less important, other than knowing that at least a handful of required registers must be set aside. The rest of the compiler should do some checking to make sure the set it was handed is sane. The C argument registers must also be specified so that the foreign-function interface knows what registers to save and restore around foreign function calls.
This register information, along with the 'machine.def' file, determines how many registers are used in the Scheme calling conventions and which ones will be used (in the allocable list). Other arguments are passed on the stack.
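The countdown behaviour of %trap described above can be modelled in a few lines of Python. This is a toy model, not how the compiler implements it: the class, the reset interval, and the event list are all assumptions made for illustration.

```python
class TrapCounter:
    """Toy model of a countdown that triggers a periodic event check."""

    def __init__(self, interval, pending):
        self.interval = interval  # assumed reset value after each check
        self.trap = interval      # stands in for the %trap register
        self.pending = pending    # events that arrived since the last check
        self.handled = []

    def on_call(self):
        """Called once per simulated procedure call."""
        self.trap -= 1
        if self.trap <= 0:
            # Counter hit zero: pause, service whatever has come in, and reset.
            self.handled.extend(self.pending)
            self.pending.clear()
            self.trap = self.interval
```

The point of the design is that the common path is a single decrement and test, so checking for interrupts or collection requests costs very little per call.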
## Section Two: Instructions
Section two provides a mapping from generic operations (like -, +, etc.) into machine-specific variations. This section is used by the np-select-instructions! pass to translate the primitive scheme operators, lowered in the np-expand-primitives pass, to machine specific representations. The job of this section is to make sure that all registers used as input to instructions, set as the result of instructions, etc. are made explicit to the machine-independent section of the compiler. This allows the register allocator to determine register locations for local variables without being aware of the target machine details.
On a platform like x86, there are many special cases where only certain registers can be used for certain instructions or where additional registers are killed during an instruction's use, so Chez Scheme has support for this, but hopefully you won't need much of that on RISC-V.
This section is written using the define-instruction DSL defined at the top of the file. This is mostly to make repeated operations (like "I found a memory argument where I needed a register argument") simple. We've defined this for each machine type, but this has mostly been done by copying and updating.
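To illustrate the "memory argument where I needed a register argument" case, here is a toy Python model of what instruction selection has to do on a load/store machine. It is not the define-instruction DSL; the operand tuples, instruction names, and helper names are invented for the sketch.

```python
import itertools


def make_temp_source():
    """Return a function that hands out fresh temporaries: t0, t1, ..."""
    counter = itertools.count()
    return lambda: ("reg", f"t{next(counter)}")


def select_add(dst, a, b, new_temp):
    """Lower a generic add so that the machine 'add' sees only register operands."""
    instructions = []
    operands = []
    for operand in (a, b):
        if operand[0] == "mem":
            # Memory operand: load it into a fresh temporary first, so the
            # register allocator sees an explicit register use.
            tmp = new_temp()
            instructions.append(("load", tmp, operand))
            operands.append(tmp)
        else:
            operands.append(operand)
    instructions.append(("add", dst, operands[0], operands[1]))
    return instructions
```

The temporaries introduced here are exactly the kind of thing that must be made explicit to the machine-independent part of the compiler, so that register allocation can place them without knowing the target's rules.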
Finally, this section produces expressions that contain a procedure slot that tells the assembler how to build the particular instruction. These procedures are defined in Section Three.
## Section Three: Assembler and FFI
The final section has both the assembler and foreign-function interface support built into it. This module is named the 'asm-module' and exports a number of functions. It is worth noting that Chez Scheme produces machine code directly, instead of relying on a system-provided assembler. The machine code is generated through the emit-code calls. These specify the bit layout of each instruction variation. In some cases several instructions have the same bit layout, differing only in the op-code, so you can generally share a few functions across several instructions to do the emit.
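As an illustration of several instructions sharing one bit layout, here is a Python sketch that encodes RISC-V R-type instructions. It is not the emit-code form used in the backend files; the function names are mine, and only the standard RV32I encodings of 'add' and 'sub' are used. In this particular layout the two instructions share the opcode field and differ in the funct7 field.

```python
import struct


def emit_r_type(opcode, funct3, funct7, rd, rs1, rs2):
    """Pack one RISC-V R-type instruction into 4 little-endian bytes."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    word = (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )
    return struct.pack("<I", word)


def emit_add(rd, rs1, rs2):
    # add: opcode 0x33, funct3 0, funct7 0x00
    return emit_r_type(0x33, 0x0, 0x00, rd, rs1, rs2)


def emit_sub(rd, rs1, rs2):
    # sub: same layout and opcode as add, funct7 0x20
    return emit_r_type(0x33, 0x0, 0x20, rd, rs1, rs2)
```

One shared packing function plus a thin wrapper per instruction is the same shape you'll see in the existing backends, just expressed there in Scheme.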
The asm-foreign-call and asm-foreign-callable functions generate the ABI-specified calling conventions to call foreign procedures and to allow Scheme procedures to be called from foreign functions.
You will notice in the Scheme calling conventions that there are spaces to allow for code to be rewritten by the linker. Chez Scheme uses its own linker, since garbage collection can move code, which then requires relinking, and because new code can be generated or loaded during a session, which also requires relinking.
# Updating the linker
The linker is largely embedded in the main compiler file, 'compile.ss'. You'll see in the function 'c-faslcode' a 'constant-case' over the architecture that generates slightly different code for the different machine-specific linking elements; for instance, the ARM has 'arm32-abs', 'arm32-call', and 'arm32-jump'. These all correspond to places where the linker may need to link in Scheme constants and calling information. Each one produces a relocation record (or 'reloc') that the linker uses to figure out where items need to be rewritten in the binary on load.
The reloc constants are defined in cmacros.ss. If you need to add additional machine-specific reloc constants (and you almost certainly will), these are added here in the 'define-reloc-constants' call. You will also need to add (or have already added by the time you get to this point) the new types to the 'define-machine-types' entry in this file.
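The role of a reloc can be shown with a toy Python model: a record that says where in a code object an address lives, and a relink step that rewrites those sites when the target moves. This is a deliberate simplification with invented names. Real relocs such as 'arm32-abs' patch machine-specific instruction sequences rather than a plain 32-bit word.

```python
import struct
from dataclasses import dataclass


@dataclass
class Reloc:
    offset: int  # byte offset of the patch site inside the code object
    target: str  # name of the object whose address belongs at that site


def relink(code, relocs, addresses):
    """Rewrite every patch site with the current address of its target."""
    for reloc in relocs:
        # Toy encoding: the site is a plain little-endian 32-bit word.
        struct.pack_into("<I", code, reloc.offset, addresses[reloc.target])
    return code
```

Calling relink again after the collector has moved an object (that is, with an updated address table) patches the same sites with the new addresses, which is why the reloc records have to be kept alongside the code.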
# Conclusion
This is a rough sketch, largely from memory and walking a bit through source code, so I may have missed a bit here and there. If you do decide to take a stab at this and run into problems, let me know. It would be great to get feedback on how this goes to improve this document as well.
Thanks and hope that helps,
-andy:)