I am not sure what the goal is. If somebody wants a "single executable",
the technically simplest thing would be to bundle all FriCAS files into
a read-only filesystem, include that filesystem in the executable,
and modify file operations so that they first look in the included
filesystem and only after that look at the host filesystem.
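
To make the lookup order concrete, here is a rough sketch in Python (not FriCAS code). The table BUNDLED_FILES, the function open_overlay and the file names are made up for illustration; in a real build the bundled table would be embedded in the executable.

```python
import io
import os
import tempfile

# Hypothetical bundled, read-only filesystem: relative path -> file contents.
# The entry below is a placeholder, not an actual FriCAS file.
BUNDLED_FILES = {
    "algebra/example.daase": b"bundled copy\n",
}


def open_overlay(path, host_root):
    """Open `path` for reading: bundled filesystem first, then host filesystem."""
    data = BUNDLED_FILES.get(path)
    if data is not None:
        # Hand out a fresh in-memory stream, so the bundled copy is never modified.
        return io.BytesIO(data)
    # Not bundled: fall back to the host filesystem.
    return open(os.path.join(host_root, path), "rb")


def demo():
    """Read one bundled file and one host-only file through the same call."""
    with tempfile.TemporaryDirectory() as host_root:
        with open(os.path.join(host_root, "local.input"), "wb") as f:
            f.write(b"host copy\n")
        with open_overlay("algebra/example.daase", host_root) as f:
            bundled = f.read()
        with open_overlay("local.input", host_root) as f:
            host = f.read()
    return bundled, host
```

Only reads go through the overlay here; writes would still go to the host filesystem.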

The current situation is a compromise between build time, space
use during build, speed of the resulting installation, size of the
resulting installation and build complexity. We dump several
executables during build; that takes a lot of space but
simplifies the build process. We could marginally simplify the build
by dumping more executables, but that would use more space.
Including more things in the executable is likely to increase space
use during build. We could do this if there is a significant gain.

Having files in the filesystem, especially text files, makes debugging
easier. A binary representation could be much smaller, but
we would need a separate reader/viewer for our files, and even
with a viewer debugging would probably be harder.

If we want to optimize space use, then various "compressed"
representations are possible. For example, we use on the order
of 15000-20000 identifiers. We could have a string table
and represent each identifier by a 16-bit index into the string table.
That could dramatically reduce the space taken by symbols.
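
As an illustration of the string table idea, here is a minimal Python sketch (the function names and the sample identifiers are assumptions made for the example, not anything in the FriCAS sources). Since 20000 is well below 65536, every identifier fits in an unsigned 16-bit index, so each reference costs two bytes regardless of the length of the name.

```python
import struct


def build_string_table(identifiers):
    """Return (table, index): table[i] is the i-th distinct identifier."""
    table = sorted(set(identifiers))
    if len(table) > 0x10000:
        raise ValueError("more than 65536 identifiers do not fit a 16-bit index")
    index = {name: i for i, name in enumerate(table)}
    return table, index


def encode(identifiers, index):
    """Pack a sequence of identifiers as little-endian unsigned 16-bit indices."""
    return struct.pack("<%dH" % len(identifiers),
                       *(index[name] for name in identifiers))


def decode(packed, table):
    """Inverse of encode: turn packed 16-bit indices back into identifiers."""
    count = len(packed) // 2
    return [table[i] for i in struct.unpack("<%dH" % count, packed)]


def demo():
    """Compare packed size with the size of the plain names."""
    names = ["Integer", "Polynomial", "Integer", "Fraction", "Integer"]
    table, index = build_string_table(names)
    packed = encode(names, index)
    plain_size = sum(len(name) for name in names)
    return len(packed), plain_size, decode(packed, table)
```

The table itself still has to be stored once, so the gain comes from identifiers that are referenced many times.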

Probably 90% of FriCAS code could run as bytecode without
loss of speed. That could reduce the space taken by code by
about 60-70% or more. IIUC in the NAG era code took about 16Mb;
now, using sbcl, it is closer to 120Mb. We probably have more
code than in the NAG era (a lot of NAG stuff has been removed, but we
also have significant additions), but clearly a _very_ large
saving is possible.

If we want to increase speed, then it would be natural to
use C code. In principle we could add a C backend to the Spad
compiler so that the speed-critical low-level part would be
translated directly to C. Interfacing C code to Lisp
has its problems, but in principle using ECL or GCL
should be reasonably easy. In the NAG era CCL (Codemist Common
Lisp) could translate some Lisp files to C, and the rest
was compiled to bytecode. By compiling directly to C
we could do a better job: currently the main source of
inefficiency in ECL and GCL is that they
cannot make good use of the type information that we have
at Spad level. sbcl does time-consuming type inference
at Lisp level, and that recovers enough type information
to generate good code. But the sbcl code generator is not
as good as code generation in gcc.

Coming back to databases, I have made only limited changes to
the related code in the last several years. Basically,
I saw no opportunity for _substantial_ gain and
preferred to work on more fruitful things. In the
longer run the database part needs significant rework:
we need to store more information (for example,
it would be good to store names of arguments)
and probably organize it differently. To tell the
truth, it is not clear how much should be in the central
database and how much should be in "per constructor"
files (or maybe "object files" produced from the source
file).

Given the need for significant changes, I would prefer
to avoid complicating the database stuff.

One extra thing: when we read databases, the data is almost
immediately changed to a different form. So keeping the database
in memory could save time on disk operations and parsing,
but would lead to more space use.
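
The time versus space trade-off can be shown with a toy Python model (the class ConstructorDatabase, its record format and the sample record are invented for this sketch and do not reflect the actual database code). With keep_in_memory set, each record is parsed once and the converted form stays resident; without it, every lookup pays for reading and parsing again but nothing extra is retained.

```python
class ConstructorDatabase:
    """Toy database: raw text records, parsed on demand or kept parsed in memory."""

    def __init__(self, raw_records, keep_in_memory):
        self._raw = raw_records      # stands in for the on-disk database
        self._keep = keep_in_memory
        self._parsed = {}            # converted records retained in memory
        self.parse_count = 0         # how many times we had to read and parse

    def _parse(self, name):
        # Stand-in for disk read, parsing and conversion to the internal form.
        self.parse_count += 1
        return tuple(self._raw[name].split())

    def lookup(self, name):
        if name in self._parsed:
            return self._parsed[name]
        value = self._parse(name)
        if self._keep:
            self._parsed[name] = value
        return value


def demo():
    """Look up the same record twice with and without in-memory retention."""
    raw = {"Example": "category domain package"}
    resident = ConstructorDatabase(raw, keep_in_memory=True)
    on_demand = ConstructorDatabase(raw, keep_in_memory=False)
    for db in (resident, on_demand):
        db.lookup("Example")
        db.lookup("Example")
    return resident.parse_count, on_demand.parse_count
```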
--
Waldek Hebisch