Chez Scheme on RISC-V

Paulo Matos

Feb 3, 2018, 2:03:15 PM

to chez-scheme

Hi,

I would like to see Chez ported to riscv. How involved/complex is the process?
Is there any documentation hinting at how to start?

Kind regards,

Paulo Matos

Andy Keep

Feb 20, 2018, 9:12:55 PM

to chez-scheme, Paulo Matos

Hey Paulo,

Unfortunately, we do not have an up-to-date porting guide for the compiler. Fortunately, porting Chez Scheme is relatively straightforward, and we have a couple of RISC examples (ARM 32-bit Little Endian and PPC 32-bit Big Endian) implemented, which should help out as a starting point for porting.

While the following is not a full porting guide, I've tried to walk through the compiler a bit and give at least a sketch of how to approach this. We've worked to keep machine- and operating-system-dependent implementation details modular, so that they can be replaced when porting to a new hardware platform or operating system.

# A quick overview of the compiler, libraries, and run time

The majority of the Chez Scheme compiler and libraries are implemented in Scheme and can be found in the 's' (for Scheme) subdirectory. The run time (including the garbage collector, support for interacting with the operating system, and some of the more complicated math library support) is implemented in C and can be found in the 'c' directory.

Porting to a new system requires both getting the C run time compiled on the new platform and updating the Scheme compiler to generate machine code for the platform. There are several places where the C run time and code generated by the compiler need to work in harmony in order to get the system to run. For instance, the C run time needs to know the type tags, sizes, and field offsets into Scheme objects, so that the garbage collector in the C run time can do its job. This is handled by having the Scheme compiler generate a couple of C headers, scheme.h and equates.h, that contain the information about the Scheme compiler that the C run time needs to do its job.

Chez Scheme is a bootstrapped compiler, meaning you need a Chez Scheme compiler to build a Chez Scheme compiler. In the case of porting to a new platform, you'll need to work from an already supported host to cross-compile the boot files and produce the header files. These can then be moved to the target platform and the C run time can be compiled with the generated header files. Once you have all of the pieces working together, you can run the Chez Scheme compiler on the new machine to produce native-built boot files and run the tests in the 'mats' directory.

# Porting to a new platform

Chez Scheme assigns a 'machine-type' name to each platform it runs on. The 'machine-type' currently carries three pieces of information:

1. is the system threaded? ('t' indicating it is, vs. nothing indicating it is not threaded);

2. the hardware platform: i3 for x86, a6 for x86_64, arm32 for 32-bit ARM, and ppc32 for 32-bit PPC; and

3. the operating system: le for Linux, nt for Windows, osx for macOS, etc.

For instance, ta6osx is threaded Chez Scheme for x86_64 machines running macOS. As a first step, you'll need to pick a machine type name for the new platform. These machine types are also stored in the cmacros.ss file, in the 'define-machine-types' list, and you'll need to add your new machine type to this list.
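As a rough illustration of the naming scheme (not code from the Chez Scheme sources), a machine-type name can be split into its three parts like this. The tables only contain the codes listed above, and the function and class names are made up for the example:

```python
from typing import NamedTuple

# Codes taken from the list above; the real system knows more of each.
ARCHITECTURES = ("arm32", "ppc32", "i3", "a6")
OPERATING_SYSTEMS = ("osx", "le", "nt")


class MachineType(NamedTuple):
    threaded: bool
    arch: str
    os: str


def parse_machine_type(name: str) -> MachineType:
    """Split a machine-type name such as 'ta6osx' into its three parts."""
    # A leading 't' marks the threaded variant (no listed arch starts with 't').
    threaded = name.startswith("t")
    rest = name[1:] if threaded else name
    for arch in ARCHITECTURES:
        if rest.startswith(arch):
            os_code = rest[len(arch):]
            if os_code in OPERATING_SYSTEMS:
                return MachineType(threaded, arch, os_code)
    raise ValueError(f"unrecognized machine type: {name!r}")


print(parse_machine_type("ta6osx"))
print(parse_machine_type("arm32le"))
```

A new port would extend the architecture table with its own code before any name using it could be recognized.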

## build support for new platforms

The 'workarea' script in the root of the Chez Scheme project is used to generate a subdirectory with the appropriate contents to build for that particular machine. This is the script the configure script runs when configuring for a build, but you can also run the 'workarea' script on your own, supplying the machine type you'd like to build. As you walk through this file, you'll see machine type names sprinkled through it wherever it needs to do something specific to a machine type.

One thing you'll notice if you look in the 's', 'c', and 'mats' directories is that they all contain files named 'Mf-[machine-type]'. These files are Makefiles that contain platform-specific settings for the various platforms, and you'll need to create a version of each of these Makefiles for your new platform. I'd recommend starting from one that is similar to the platform you are porting to. In the 'c' and 'mats' directories, these are mostly differences in settings for the host C compiler. The 's' directory 'Mf-[machine-type]' files record the machine type being compiled and target-specific source files to be included in the compiler.

You'll also notice in the 's' directory that there is a '[machine-type].def' file for each Chez Scheme machine type. This file contains machine-specific information, like the size of various integer types, the endianness of the machine, and the name of the machine architecture file. You'll need to add a version of this file for your new platform.
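The per-platform files mentioned so far can be turned into a small checklist. This is only a sketch: the helper names are invented for the example, and 'rv64le' is just a candidate machine-type name for a RISC-V port.

```python
from pathlib import Path


def port_files(machine_type: str) -> list[str]:
    """Per-platform files the text above says a new machine type needs."""
    return [
        f"s/Mf-{machine_type}",
        f"c/Mf-{machine_type}",
        f"mats/Mf-{machine_type}",
        f"s/{machine_type}.def",
    ]


def missing_port_files(root: str, machine_type: str) -> list[str]:
    """Return the expected files that do not exist yet under `root`."""
    return [
        rel for rel in port_files(machine_type)
        if not (Path(root) / rel).is_file()
    ]


if __name__ == "__main__":
    # Run from the root of a source checkout.
    for rel in missing_port_files(".", "rv64le"):
        print("still to create:", rel)
```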

With this, you should be able to use the 'workarea' script to give you a new directory for your machine type. You may also want to create an architecture-specific file (like the 'x86.ss', 'x86_64.ss', 'ppc32.ss', or 'arm32.ss' files). One way to create this is to copy an existing one for a similar architecture (say 'arm32.ss') to use as a starting point. This can be helpful as you get started, because it allows you to build the C header files you need to compile the C run time and get that part compiling before you've finished the full port of the backend.

### Other build support files

Once you've got things working, you might want to update the configure script so that you can configure for the new platform and the 'bintar' script which can be used to package up a tar-ball of the Chez Scheme binary.

# Using the Chez Scheme cross-compiler

In the 's' directory, you'll find a file called 'Mf-cross', which is the Makefile for invoking the cross compiler. Before you can use it, you'll need to have a scheme executable built for the host machine. By default, Mf-cross expects the built binary to be checked in to the root 'bin/<machine-type>' and 'boot/<machine-type>' directories for the machine. You can either use the 'checkin' script to check in the binaries from a built host-machine image (the result of running './configure' and 'make') or you can tell 'Mf-cross' where to look for the scheme binary. You'll also want to tell it the machine type for your target. The call looks like the following:

make -f Mf-cross m=<host-machine> xm=<target-machine> Scheme="../../<host-machine>/bin/<host-machine>/scheme -b ../../<host-machine>/boot/<host-machine>/petite.boot -b ../../<host-machine>/boot/<host-machine>/scheme.boot"

The 'Scheme' argument is unnecessary if you have checked in the host machine scheme binary and boot files. For instance, if I were building a 32-bit x86 Linux image from a 64-bit x86_64 macOS host, I would do the following:

make -f Mf-cross m=a6osx xm=i3le Scheme="../../a6osx/bin/a6osx/scheme -b ../../a6osx/boot/a6osx/petite.boot -b ../../a6osx/boot/a6osx/scheme.boot"

Once this is done, you should find boot and header files for the target machine in the <machine-type>/boot directory (../<machine-type>/boot from where you ran the make in the 's' directory). There should be four files here: 'petite.boot', 'scheme.boot', 'equates.h', and 'scheme.h'. These files will need to be transferred to your target machine (or provided to your C cross-compiler for the target machine) to build a 'scheme' binary for the target machine.
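The invocation and its expected outputs can be sketched as a pair of helpers. The function names are invented for the example, and the output paths simply follow the description above rather than being checked against a real build:

```python
def cross_make_command(host: str, target: str, checked_in: bool = False) -> list[str]:
    """Build the argument list for the Mf-cross invocation shown above."""
    cmd = ["make", "-f", "Mf-cross", f"m={host}", f"xm={target}"]
    if not checked_in:
        # Point Mf-cross at the host scheme binary and boot files explicitly.
        base = f"../../{host}"
        scheme = (
            f"{base}/bin/{host}/scheme"
            f" -b {base}/boot/{host}/petite.boot"
            f" -b {base}/boot/{host}/scheme.boot"
        )
        cmd.append(f"Scheme={scheme}")
    return cmd


def expected_outputs(target: str) -> list[str]:
    """The four files described above, relative to the 's' directory."""
    names = ("petite.boot", "scheme.boot", "equates.h", "scheme.h")
    return [f"../{target}/boot/{name}" for name in names]


print(" ".join(cross_make_command("a6osx", "i3le", checked_in=True)))
print(expected_outputs("i3le"))
```

Passing the list to something like `subprocess.run(cmd, cwd="s")` avoids shell quoting problems with the long 'Scheme' argument.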

The startup process for the 'scheme' binary checks that the integer sizes and the like provided by the 'machine.def' file match the sizes the C compiler expects. If something goes wrong, it will report that the size of one of these types doesn't match, which is usually an indication that there is something wrong in the 'machine.def' file.

Of course, we expect that 'scheme' is going to fail when it tries to load the boot files, because we've not yet done anything to replace the machine-specific back-end for the new architecture, and it is generating machine code for a different machine type.
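The idea behind that startup check can be sketched in a few lines. This is not the run time's actual check: the dictionary keys are made up for the example, and ctypes reports the sizes of the machine the script runs on, so it is only meaningful when run on the target.

```python
import ctypes


def c_type_bits() -> dict[str, int]:
    """Bit widths of a few C types as seen by this machine's C ABI."""
    return {
        "int": ctypes.sizeof(ctypes.c_int) * 8,
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
        "pointer": ctypes.sizeof(ctypes.c_void_p) * 8,
    }


def size_mismatches(declared: dict[str, int]) -> list[str]:
    """Compare declared widths (as a .def file might give) with the C ABI."""
    actual = c_type_bits()
    return [
        f"{name}: declared {bits} bits, C ABI has {actual.get(name)}"
        for name, bits in declared.items()
        if actual.get(name) != bits
    ]


# Hypothetical declarations for a 64-bit little-endian Linux target.
for problem in size_mismatches({"int": 32, "long": 64, "pointer": 64}):
    print(problem)
```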

# Replacing the machine-specific backend

As mentioned earlier, the machine-specific contents of the compiler are stored in separate files named for the target machine: 'x86.ss', 'x86_64.ss', 'arm32.ss', and 'ppc32.ss'. Each one of these files is broken up into three sections.

## Section One: Registers

Section one provides a definition of the registers available on the machine, along with a mapping into Chez Scheme's task-specific registers: %tc, %sfp, %ap, %trap, and potentially others. The %tc is the thread context and must get a register; it stores information about the running session and, in threaded versions, the context for the currently running thread. The %sfp is the Scheme frame pointer. Chez Scheme does not use the architectural stack for its calls, leaving it where it is for use by the foreign-function interface. The %trap register keeps track of when the current Scheme system should be paused to check for interrupts, requests for garbage collection, etc. that have come in since it was last checked. This register counts down towards zero as functions are executed.

Additional registers include %esp, the end-of-stack pointer, and %eap, the end-of-allocation pointer. When these are not specified as real registers, they become 'virtual registers' in the data structure pointed to by the %tc. The detail here is less important, other than knowing that at least a handful of required registers must be set aside. The rest of the compiler should do some checking to make sure the set it was handed is sane. The C argument registers must also be specified so that the foreign function interface knows what registers to save and restore around foreign function calls.

This register information, along with the 'machine.def' file, determines how many registers are used in the Scheme calling conventions and which ones will be used (in the allocable list). Other arguments are passed on the stack.
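To make the %trap countdown described in this section a little more concrete, here is a toy model. It is a sketch under assumptions: the class name, the reset interval, and the callback are all invented, and the real system keeps the counter in a register rather than an object.

```python
class TrapCounter:
    """Counts down on each call; at zero, checks for events and resets."""

    def __init__(self, interval: int, check_events):
        self.interval = interval          # assumed reset value
        self.remaining = interval
        self.check_events = check_events  # e.g. poll interrupts, GC requests

    def on_call(self) -> None:
        self.remaining -= 1
        if self.remaining <= 0:
            self.check_events()
            self.remaining = self.interval


fired = []
counter = TrapCounter(3, lambda: fired.append("check"))
for _ in range(7):
    counter.on_call()
print(len(fired))
```

The point of the countdown is that the common path is a single decrement and test, so pending events are only examined occasionally.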

## Section Two: Instructions

Section two provides a mapping from generic operations (like -, +, etc.) into machine-specific variations. This section is used by the np-select-instructions! pass to translate the primitive Scheme operators, lowered in the np-expand-primitives pass, to machine-specific representations. The job of this section is to make sure that all registers used as input to instructions, set as the result of instructions, etc. are made explicit to the machine-independent section of the compiler. This allows the register allocator to determine register locations for local variables without being aware of the target machine details.

On a platform like x86, there are many special cases where only certain registers can be used for certain instructions or where additional registers are killed during an instruction's use, so Chez Scheme has support for this, but hopefully you won't need much of that on RISC-V.

This section is also written using the define-instruction DSL defined at the top of the file. This is mostly to make repeated operations (like "I found a memory argument where I needed a register argument") simple. We've defined this for each machine type, but this has mostly been done by copying and updating an existing one.

Finally, this section produces expressions that contain a procedure slot that tells the assembler how to build the particular instruction. These procedures are defined in Section three.

## Section Three: Assembler and FFI

The final section has both the assembler and foreign-function interface support built into it. This module is named the 'asm-module' and exports a number of functions. It is worth noting that Chez Scheme produces machine code directly, instead of relying on a system-provided assembler. The machine code is generated through the emit-code calls. These specify the bit layout of each instruction variation. In some cases several instructions have the same bit layout, differing only in the op-code, so you can generally share several of these with a few functions to do the emit.
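To illustrate what "same bit layout, differing only in the op-code fields" means on RISC-V, here is a standalone encoder for the R-type format from the RISC-V base ISA. It is not Chez Scheme's emit-code; the function names are invented for the example, and only the field layout and the four encodings come from the ISA specification.

```python
def encode_r_type(opcode: int, funct3: int, funct7: int,
                  rd: int, rs1: int, rs2: int) -> int:
    """Pack the fields of a RISC-V R-type instruction into a 32-bit word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


OP = 0b0110011  # major opcode shared by the register-register ALU ops

# mnemonic -> (funct3, funct7); one layout serves all four instructions
R_TYPE = {
    "add": (0b000, 0b0000000),
    "sub": (0b000, 0b0100000),
    "or":  (0b110, 0b0000000),
    "and": (0b111, 0b0000000),
}


def assemble(mnemonic: str, rd: int, rs1: int, rs2: int) -> bytes:
    """Encode one R-type instruction as 4 little-endian bytes."""
    funct3, funct7 = R_TYPE[mnemonic]
    word = encode_r_type(OP, funct3, funct7, rd, rs1, rs2)
    return word.to_bytes(4, "little")


print(assemble("add", 1, 2, 3).hex())  # add x1, x2, x3
print(assemble("sub", 1, 2, 3).hex())  # sub x1, x2, x3
```

One emit function per instruction format, parameterized by the opcode fields, is the kind of sharing the paragraph above describes.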

The asm-foreign-call and asm-foreign-callable functions generate the ABI-specified calling conventions to call foreign procedures and to allow Scheme procedures to be called from foreign functions.

You will notice in the Scheme calling conventions that there are spaces to allow for code to be rewritten by the linker. Chez Scheme uses its own linker, since garbage collection can lead to code moving and requiring relinking, and because new code can be generated or loaded during a session, which also requires relinking.

# Updating the linker

The linker is largely embedded in the main compiler file, 'compile.ss'. You'll see in the function 'c-faslcode' a 'constant-case' over the architecture that generates slightly different code for the different machine-specific linking elements; for instance, the arm has 'arm32-abs', 'arm32-call', and 'arm32-jump'. These all correspond to places where the linker may need to link in Scheme constants and calling information. Each one produces a relocation record (or 'reloc') that the linker uses to figure out where items need to be rewritten in the binary on load.

The reloc constants are defined in cmacros.ss. If you need to add additional machine-specific reloc constants (and you almost certainly will), these are added there in the 'define-reloc-constants' call. You will also need to add (or have already added by the time you get to this point) the new types to the 'define-machine-types' entry in this file.
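The general idea of a relocation record can be shown with a toy patcher. Everything here is hypothetical: the 'abs32' and 'abs64' kinds, the record fields, and the function name are invented for illustration and do not correspond to the reloc types Chez Scheme defines.

```python
import struct
from dataclasses import dataclass


@dataclass
class Reloc:
    kind: str    # hypothetical kinds: "abs32" or "abs64"
    offset: int  # byte offset of the slot inside the code object
    target: int  # address to write into the slot


def apply_relocs(code: bytearray, relocs: list[Reloc]) -> None:
    """Rewrite each reserved slot in `code` with its target address."""
    for r in relocs:
        if r.kind == "abs64":
            struct.pack_into("<Q", code, r.offset, r.target)
        elif r.kind == "abs32":
            struct.pack_into("<I", code, r.offset, r.target)
        else:
            raise ValueError(f"unknown reloc kind: {r.kind}")


code = bytearray(16)  # pretend machine code with an 8-byte slot at offset 8
relocs = [Reloc("abs64", 8, 0x1122334455667788)]
apply_relocs(code, relocs)
print(code.hex())

# If the referenced object moves, update the target and apply again.
relocs[0].target = 0x1122334455660000
apply_relocs(code, relocs)
print(code.hex())
```

Keeping the records around after loading is what makes it possible to patch the same slots again when objects move.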

# Conclusion

This is a rough sketch, largely from memory and walking a bit through source code, so I may have missed a bit here and there. If you do decide to take a stab at this and run into problems, let me know. It would be great to get feedback on how this goes to improve this document as well.

Thanks and hope that helps,

-andy:)

--
You received this message because you are subscribed to the Google Groups "chez-scheme" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chez-scheme...@googlegroups.com.
To post to this group, send email to chez-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chez-scheme/2b0ebd5f-c9f5-4ddb-8c84-d6f0962b1351%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paulo Matos

Feb 21, 2018, 4:48:44 AM

to Andy Keep, chez-scheme

On 21/02/18 03:12, Andy Keep wrote:

(*massive snip*)

> # Conclusion
>
> This is a rough sketch, largely from memory and walking a bit through
> source code, so I may have missed a bit here and there. If you do
> decide to take a stab at this and run into problems, let me know. It
> would be great to get feedback on how this goes to improve this document
> as well.
>

Wow, thanks Andy. I can't thank you enough for this description. I will
be looking at this in detail in the near future. Are you able to put
this somewhere (github wiki page)? Or do you give me permissions (you
wrote it so, you own it :)) to do so? That would make it easier for me
to create notes as I go along and possibly polish on some missing details.

What do you think?

Thanks,

Paulo Matos


--
Paulo Matos

Andy Keep

Feb 21, 2018, 9:53:54 AM

to chez-scheme, Paulo Matos

I'd like to run this past a few other people who've worked on the compiler, but then I would like to add this to the Chez Scheme GitHub repo as part of the documentation.

-andy:)

Paulo Matos

Feb 21, 2018, 10:22:44 AM

to Andy Keep, chez-scheme

On 21/02/18 15:53, Andy Keep wrote:
> I'd like to run this past a few other people who've worked on the
> compiler, but then I would like to add this to the Chez Scheme GitHub
> repo as part of the documentation.
>

Sounds great. Thanks.


Paulo Matos

Mar 22, 2018, 3:26:36 AM

to Andy Keep, chez-scheme

On 21/02/18 03:12, Andy Keep wrote:

> You'll also notice in the 's' directory that there is a
> '[machine-type].def' directory for each Chez Scheme machine type.
> This file contains machine-specific information, like the size of
> various integer types, the endianness of the machine, and the name of
> the machine architecture file. You'll need to add a version of this
> file for your new platform.
>

Hi,

As I move forward with this I was faced with two questions. What does it
mean for Chez that an arch is threaded?

I noticed that Chez has tarm32le and arm32le, however there's no
Mf-tarm32le or tarm32le.def, so I wonder if it's being used at all and
if it's useful to define a trv64le (for the case of riscv).

Another question is on the definition of a new rv64le.def:

What's the meaning of asm-arg-reg-max, asm-arg-reg-cnt, and
segment-table-levels? Also, I
assume but would like to confirm that integer-divide-instruction should
be true if there's an integer divide instruction? And software floating
point should be true if there's no hardware support for floating point.
Is this the case?

Regards,

--
Paulo Matos

Andy Keep

Apr 8, 2018, 12:50:47 AM

to chez-scheme, Paulo Matos

Hey Paulo,

Answers inline below:

> As I move forward with this I was faced with two questions. What does it
> mean for Chez that an arch is threaded?

Chez Scheme treats threaded and non-threaded targets slightly differently, partially to ensure that the non-threaded platform doesn’t need to pay for the additional overhead necessary for the threaded version of the compiler.

> I noticed that Chez has tarm32le and arm32le, however there's no
> Mf-tarm32le or tarm32le.def, so I wonder if it's being used at all and
> if it's useful to define a trv64le (for the case of riscv).

I don’t think any of the maintainers are currently building or using the arm port on a regular basis, although there are some users that have reported success in getting this built. I’m sure no one is building the threaded version, if we are missing the threaded make file. If you don’t have a need to support the pthread-enabled version of Chez Scheme, I’d probably recommend not bothering with creating the threaded version, especially since this would just be untested code in the compiler if you are not using it.

> Another question is on the definition of a new rv64le.def:
>
> What's the meaning of asm-arg-reg-max, asm-arg-reg-cnt, and
> segment-table-levels? Also, I
> assume but would like to confirm that integer-divide-instruction should
> be true if there's an integer divide instruction? And software floating
> point should be true if there's no hardware support for floating point.
> Is this the case?

asm-arg-reg-max is the maximum number of registers available for use when passing procedure arguments. It is expected to correspond to the number of “extra” allocable registers listed in the assembler's ``define-registers`` form. The allocable registers are expected to provide registers for %ac0, %xp, %ts, and %td, which are used for some of the “hand-coded” forms like the call/cc, continuation invoke, and rest-args handlers. Additionally, %ac1, %yp, %cp, and %ret can be specified. Any allocable register beyond the initial four to eight specific-purpose registers is considered eligible for use as an argument register. The specific registers are used as follows:

%ac0 is used for passing the argument count to functions. This is needed to support argument count checks and dispatch for calls where the function is not “known” (e.g. the function definition is not immediately visible to the caller).

%xp is used during allocation for the computed allocation spot (and in various places in the hand-coded functions).

%ts and %td are special temporaries used in the hand-coded functions.

Since some of the hand-coded functions (like the one that deals with rest arguments) are called at the start of a function, before the normal argument handling is done, these cannot be used for passing arguments.

%ac1 and %yp are both also used for hand-coded functions, but can be ‘auxiliary’ registers, which are memory locations referenced from the %tc register.

%cp is used for the closure pointer (again this is a memory location off the %tc if not specified).

%ret is the function return pointer (i.e. the pointer back to the caller’s next instruction). We’ve mostly stopped putting this in a register after experimenting with it; keeping it on the stack seems to work out pretty decently.

Anyway, any other allocable register is considered a potential arg.

asm-arg-reg-cnt is the actual number of registers used for arguments. Registers are used for the first asm-arg-reg-cnt arguments and additional arguments are passed on the stack.
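The split between register and stack arguments can be sketched as follows. This is only a model of the rule just described: the function name and the register names in the usage are placeholders, not names from any Chez Scheme backend.

```python
def place_arguments(args: list, arg_regs: list[str], arg_reg_cnt: int):
    """Put the first arg_reg_cnt arguments in registers, the rest on the stack.

    `arg_regs` plays the role of the allocable registers left over after the
    specific-purpose ones, so len(arg_regs) corresponds to asm-arg-reg-max.
    """
    if arg_reg_cnt > len(arg_regs):
        raise ValueError("asm-arg-reg-cnt cannot exceed asm-arg-reg-max")
    in_registers = dict(zip(arg_regs[:arg_reg_cnt], args))
    on_stack = list(args[arg_reg_cnt:])
    return in_registers, on_stack


# Placeholder register names; three are available but only two are used.
regs, stack = place_arguments(["a", "b", "c", "d"], ["%r1", "%r2", "%r3"], 2)
print(regs)
print(stack)
```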

segment-table-levels corresponds to the number of levels for the tables used to track the memory segments. For 32-bit platforms this can be 1 or 2; for 64-bit platforms this can be 2 or 3. The layout for the segment tables is defined in cmacros.ss. Lower numbers mean larger segment table entries but fewer steps to walk through the table; higher numbers mean smaller tables but more walking to get through them.

integer-divide-instruction does indeed indicate whether a divide instruction is available on the machine or whether a library routine must be used for the divide.
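The size-versus-steps trade-off can be seen in a small model of a multi-level table index. The bit widths and the even split between levels are assumptions made for the example; the real layout is whatever cmacros.ss defines.

```python
def level_widths(index_bits: int, levels: int) -> list[int]:
    """Split index_bits across levels as evenly as possible (an assumption)."""
    base, extra = divmod(index_bits, levels)
    return [base + (1 if i < extra else 0) for i in range(levels)]


def split_segment_index(segment: int, index_bits: int, levels: int) -> list[int]:
    """Return one table index per level, top level first."""
    indices = []
    shift = index_bits
    for width in level_widths(index_bits, levels):
        shift -= width
        indices.append((segment >> shift) & ((1 << width) - 1))
    return indices


def largest_table(index_bits: int, levels: int) -> int:
    """Number of entries in the biggest table at any level."""
    return 1 << max(level_widths(index_bits, levels))


# A made-up 16-bit segment number: one level means one big table and a single
# lookup; two levels mean smaller tables but two lookups.
for levels in (1, 2):
    print(levels, split_segment_index(0xABCD, 16, levels), largest_table(16, levels))
```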

software-floating-point is indeed used for specifying that software should be used for floating point because there is no hardware floating point or because there is a preference to use software floating point (perhaps because the host OS was compiled to use it). Currently, this only seems to be used in the ppc32.ss code.

-andy:)


maoif

May 10, 2022, 5:24:03 AM

to chez-scheme

Hi Paulo and Andy,

Currently I've succeeded in cross-compiling the boot file. I'm using a qemu-riscv environment running Debian.

The problem is an infinite-loop bug that I've been trying to fix for a long time: the boot file loading process loops forever after the 136th invocation of S_generic_invoke(), and it keeps using the FFI to call S_mul() over and over again.

I wonder if you could take some time to help debug it. The repo is here: https://github.com/maoif/ChezScheme. Main files added: s/riscv64.ss, s/rv64le.def, s/Mf-rv64le, linker support in c/fasl.c, and cpu cache management in c/rv64le.ss.

The following shell commands are needed:

Cross-compile (after configuring for, say, a6le):

make boot XM=rv64le -j N

Compile in the RISC-V VM:

./configure --installprefix=`pwd`/build --disable-x11 -m=rv64le

If you have any questions about my code, please ask.

Regards,

Mao Yifu

maoif

May 26, 2022, 10:39:31 PM

to chez-scheme

Hi,

ChezScheme has been ported to RISC-V; bootstrap succeeds. See https://github.com/maoif/ChezScheme. Configure and make:

./configure -m=rv64le

make

FFI support is limited.

On Sunday, February 4, 2018 at 3:03:15 AM UTC+8 Paulo Matos wrote:
