On Thu, Jun 05, 2025 at 10:26:14AM +0800, Qian Yun wrote:
> On 6/5/25 3:22 AM, Waldek Hebisch wrote:
> > Today we have 59872 lines of Lisp and Boot code in src/interp
> > (4100 lines of Lisp and 55772 lines of Boot). That is decrease
> > by 1522 compared to FriCAS 1.3.11. As I wrote eventually
> > I would like to limit Lisp to lowest level support code and
> > move other functionality to Spad code.
>
> 1. Do you think this development process could be done
> incrementally? (i.e. replace the compiler component by component)
Mostly yes. In case of compiler (and probably some other
components) we probably will need some period of parallel
development, that is having old component in place but
having capability to replace it by new one so that we can test
compatibility.
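
To make the "parallel development" idea concrete, here is a toy sketch in
Python (not FriCAS code; 'old_stage', 'new_stage' and 'run_stage' are
hypothetical names invented for illustration). It keeps the old
implementation in place, lets a flag select the new one, and can
cross-check both on the same input:

```python
# Toy sketch, not FriCAS code: all names here are hypothetical.

def old_stage(source):
    # Stand-in for the existing component: split input on whitespace.
    return source.split()

def new_stage(source):
    # Stand-in for the replacement. It works differently internally,
    # but is expected to produce an equivalent result.
    tokens, current = [], ""
    for ch in source:
        if ch.isspace():
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

def run_stage(source, use_new=False, check=True):
    """Run the selected implementation, optionally cross-checking the other."""
    primary, other = (new_stage, old_stage) if use_new else (old_stage, new_stage)
    result = primary(source)
    if check:
        reference = other(source)
        if reference != result:
            raise AssertionError(f"stage mismatch: {result!r} vs {reference!r}")
    return result
```

With 'check=True' every run doubles as a compatibility test; once the new
component is trusted the check can be switched off and the old code removed.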
More specifically, to compile Spad code we need working Spad
compiler. Replacement almost surely will use different data
structures, which means that part which we replace will be
sizeable. Currently Spad compiler has several stages. The
first 3 stages, that is reading files and include handling,
scanner and pile handling, are shared with the interpreter.
After that there is Spad specific part which changes symbols
and affects pile handling (scwrap2.boot). There is part
which mostly helps in algebra bootstrap but also contains
calls to compiler passes (ncomp.boot). There is the parser
(s-parser.boot). There are 2 transformation passes
(postpar.boot and parse.boot) run after parser. Biggest
part of Spad compiler takes (transformed) parse tree and
environment as arguments and produces Lisp via calls to
later stages. Entry point to the compiler is 'compTopLevel',
main "driver" is 'comp' ('compTopLevel' mostly sets up
things to properly print error messages and calls 'comp').
'comp' recursively handles various Spad constructs.
Definitions (constructors and functions) are handled in
'define.boot'. 'functor.boot' generates code to initialize
constructors (part of this is delegated to 'nruncomp.boot').
'iterator.boot' handles Spad looping constructs.
'apply.boot' handles function calls.
Environment keeps information about visible declarations,
at entry to the compiler it is empty, but in recursive calls
it contains info from upper stages. Environment handling is
partially shared with the interpreter. Part of environment
handling is in 'i-intern.boot'. However, 'modemap.boot'
is related as it puts a lot of information into environment.
Global information is kept in databases, several places in
compiler query databases and put slightly changed information
in the environment.
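
The general shape of a recursive driver that threads an environment
through nested constructs can be pictured with a toy sketch in Python.
This is not FriCAS code: the tiny expression language, the names
'comp_top_level' and 'comp' as used here, and the output format are all
assumptions made for illustration only.

```python
# Toy sketch, not FriCAS code: forms, names and output shape are hypothetical.

def comp_top_level(form):
    # Entry point: compilation starts with an empty environment.
    return comp(form, {})

def comp(form, env):
    """Return (lisp_like_output, type) for a tiny expression language."""
    if isinstance(form, int):
        return form, "Integer"
    if isinstance(form, str):
        # Variable reference: the environment supplies the declared type.
        if form not in env:
            raise NameError(f"undeclared variable: {form}")
        return form, env[form]
    op, *args = form
    if op == "let":
        # ("let", name, value, body): extend the environment for the body only.
        name, value, body = args
        value_code, value_type = comp(value, env)
        inner_env = {**env, name: value_type}
        body_code, body_type = comp(body, inner_env)
        return ["LET", [[name, value_code]], body_code], body_type
    if op == "+":
        compiled = [comp(arg, env) for arg in args]
        if any(t != "Integer" for _, t in compiled):
            raise TypeError("+ expects Integer arguments")
        return ["+", *(code for code, _ in compiled)], "Integer"
    raise ValueError(f"unknown construct: {op}")
```

For example, 'comp_top_level(("let", "x", 1, ("+", "x", 2)))' compiles the
body in an environment where "x" is already declared, while the outer
environment stays empty.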
Compiler uses runtime system, in particular categories in
compiler sometimes are represented by Lisp S-expression,
but frequently those S-expression are evaluated to get
runtime representation of a category. In particular this
is done to produce operation list for a domain/package
(compiler effectively produces fake category representing
type of domain and extracts operation list and some other
info from that category). Also, compiler needs to handle
conditions. To do this compiler tries various sources of
information like databases, but ultimately evaluates
categories to query runtime values of conditions (and
especially, presence of operations). Compiler plays
special tricks to avoid evaluating domains and packages
during compilation, but sometimes can not avoid this.
Compiler uses special representation for several constructs in
object code, this is changed by functions in 'g-boot.boot'
to Lisp that is output (or compiled in memory). For
constructors, interpreter functions and for internal
use by interpreter there is support for memoization
(in 'clam.boot' and 'slam.boot').
As you can see from the above there is a lot of interaction
between various parties involved in compilation and they
must keep consistent representations of needed data.
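
As an illustration of the memoization mentioned above, here is a toy
sketch in Python (not FriCAS code; 'instantiate' is a hypothetical name,
and how 'clam.boot' and 'slam.boot' actually key or evict their caches is
not described here). The point is only that asking twice for the same
constructor with the same arguments gives back the same object:

```python
# Toy sketch, not FriCAS code: names are hypothetical.
import functools

@functools.lru_cache(maxsize=None)
def instantiate(constructor, *args):
    """Build a stand-in for a domain; equal arguments reuse the cached object."""
    return {"constructor": constructor, "args": args}

first = instantiate("Polynomial", "Integer")
second = instantiate("Polynomial", "Integer")
other = instantiate("Polynomial", "Fraction")
```

Here 'first' and 'second' are the very same object, while 'other' is built
separately because its arguments differ.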
Early stages of compiler should be relatively easy to
replace (for example I have Spad parser in Spad), but
that requires implementing bootstrap infrastructure
and ATM I decided to handle bootstrap only later
(mainly to avoid reworking bootstrap later, because what
needs to be done for full bootstrap will be known only
when other parts are done). Typechecking should be
doable by separate pass. But for some time we will
have to live with old compiler and not fully working
new compiler.
> 2. Is it a good idea to make "src/interp" more modular?
> For example, making the dependency between files more clear,
> mark some files as "core" and let other files depend on them.
> Current situation feels like spaghetti.
There are some clever abuses and undesirable sharing of code.
Also, in IBM era new parts were developed as "patches" on
older part, that is they defined functions replacing at
runtime older functions. As one of first things in my work
on Axiom code I removed duplicate definitions, but I kept
functions mostly in the same place. So effectively logically
connected functions are in different files due to historic
development.
But there is also separation into larger (multi file)
modules and some attempts at layering. If you look at
older FriCAS sources you will notice that largish parts
of interpreter were dynamically loaded. In principle, if
you did not need specific part FriCAS could run without
loading it. One of those parts, that is Spad to Aldor
translator is completely removed. HyperDoc code is
now always included, but if somebody really wanted to
remove it, then removal would be relatively easy.
"Interpreter" proper, that is part which compiles input
files and handles user expressions is mostly separate
from other part. Of course, handling user expressions
is core functionality of FriCAS, so nobody tried to
remove it, but it should not be hard to create
version of FriCAS that say only contains Spad compiler
and is unable to perform normal user-oriented tasks.
Version of FriCAS containing only HyperDoc would be
harder to do, as HyperDoc takes advantage of interpreter
proper.
Some functionality in FriCAS is independent of runtime support
(that is currently 'buildom.boot', 'interop.boot',
'nruntime.boot', 'nrungo.boot', 'nrunfast.boot' + database
info needed there), but normal Spad code needs it so
in a sense it is the lowest layer. For bootstrap it
should be possible to generate Spad code that does not
need its own runtime support, so we should be able to
write most of the runtime in Spad. We could also try
to write and compile Spad compiler so that it does not
need normal runtime support to run. However, details
here are to be decided later, currently Spad compiler
re-uses runtime functions for type checking. This
reduces amount of code that we need and I would like
to preserve this. So, Spad compiler running without
any runtime support would have weaker typechecking
and probably would be unable to compile code needing
runtime support. Without actually coding this I do
not know whether the simpler bootstrap possible with a Spad
compiler independent from runtime is worth the effort
needed to create such a version of Spad compiler. Of
course we want to use full Spad in algebra so we
need full compiler. So, if created, Spad compiler
without runtime would be separate beast (hopefully
subsetting full compiler) needed only during bootstrap.
--
Waldek Hebisch