in memory daase?


Qian Yun

Dec 7, 2023, 9:45:48 AM12/7/23
to fricas-devel
I did some experiments on putting daase into memory instead of disk.

For the 8.5MB interp.daase, loading it into memory takes 23MB:

(sb-ext:gc)
(room) ;; Dynamic space usage is: 19,602,224 bytes.
(setq l1 '())
(with-open-file (s "interp.daase")
  (do ((item (read s nil 'eof) (read s nil 'eof)))
      ((eq item 'eof))
    (setq l1 (cons item l1))))
(sb-ext:gc)
(room) ;; Dynamic space usage is: 42,911,520 bytes.

The number is reasonable: viewing l1 as a tree, it has 1.4 million
leaves (symbols or numbers), so it requires about 1.4 million cons
cells. Each cons cell takes two machine words of 8 bytes each on a
64-bit platform, so:
1.4 million x 2 x 8 bytes = 22.4 MB
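For reference, the back-of-envelope estimate can be checked mechanically. This is a sketch in Python; the two-words-per-cons figure is an assumption about SBCL's 64-bit cons layout, not something measured here:

```python
# Back-of-envelope check of the cons-cell estimate. Assumption: each
# cons cell is two machine words (car + cdr), 8 bytes per word on a
# 64-bit platform; symbols and small numbers add little on top.
LEAVES = 1_400_000        # observed leaf count of the interp.daase tree
WORDS_PER_CONS = 2        # car and cdr
BYTES_PER_WORD = 8        # 64-bit machine word

cons_bytes = LEAVES * WORDS_PER_CONS * BYTES_PER_WORD
print(cons_bytes)         # 22,400,000 bytes, i.e. 22.4 MB
```

That lands within 1 MB of the observed 23 MB growth in dynamic space, so conses indeed dominate.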

Of course, if we want all of the database in memory, why not put
it directly into the various hash tables? I have not tested how
much memory this would require; it should be much less than 23 MB.

Compressing interp.daase with bzip2 gives 504K; that is also
reasonable considering how many duplicated sub-expressions it contains.

So I wonder: does an in-memory daase (then dumped to a disk image)
have advantages over the current external text daase?

Or consider something crazier: load everything into memory (into the
image) and distribute FriCAS as a single executable! That could
actually be a good idea on Windows.

A rough test using fricas0 shows that loading the 1383 algebra
fasl files (39MB) into memory takes 26MB of RAM; the interp/ files
take 13MB.

- Qian

Hill Strong

Dec 7, 2023, 6:50:27 PM12/7/23
to fricas...@googlegroups.com
If the daase is so small that it only takes 23 Mbytes to load into memory, it would be worth it if it took 100 Mbytes to load into memory. How many of our machines would be constrained by this? My current machine is old and it has 7.7 Gbytes usable. It is nothing special - a stock standard HP laptop.

The question to ask is: why has this not been done sooner?

regards to all

Hill Strong.

--
You received this message because you are subscribed to the Google Groups "FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fricas-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fricas-devel/2e092244-2c7a-4f8c-b16b-d60c78c686fd%40gmail.com.

Qian Yun

Dec 7, 2023, 7:00:59 PM12/7/23
to fricas...@googlegroups.com
I think the current design comes from over 30 years ago, when a
system had only a few MB of memory in total.

- Qian

Waldek Hebisch

Dec 7, 2023, 7:20:12 PM12/7/23
to fricas...@googlegroups.com
I am not sure what the goal is. If somebody wants a "single
executable", the technically simplest thing would be to bundle all
FriCAS files into a read-only filesystem: include that filesystem
in the executable and modify file operations so that they first
look into the included filesystem and only after that look at the
host filesystem.
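A minimal sketch of that lookup order, in Python rather than Lisp (the table contents and names here are hypothetical, purely to illustrate the fallback, not anything in FriCAS):

```python
import io

# Hypothetical table generated at build time: path -> file contents.
EMBEDDED = {
    "algebra/INT.lsp": b"(defun int-stub () nil)\n",
}

def open_bundled(path):
    """Open path from the embedded read-only filesystem if present,
    otherwise fall back to the host filesystem."""
    data = EMBEDDED.get(path)
    if data is not None:
        return io.BytesIO(data)   # served from memory, read-only
    return open(path, "rb")       # host filesystem as a fallback

with open_bundled("algebra/INT.lsp") as f:
    print(f.read().decode())
```

Only the open/read path needs this shim; writes would go to the host filesystem as before.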

The current situation is a compromise between build time, space
use during build, speed of the resulting installation, size of the
resulting installation and build complexity. We dump several
executables during build; that takes a lot of space but
simplifies the build process. We could marginally simplify the
build by dumping more executables, but that would use more space.

Including more things in executable is likely to increase space
use during build. We could do this if there is significant gain.

Having files in filesystem, especially text files makes debugging
easier. Binary representation could be much smaller, but
we would need separate reader/viewer for our files and even
with a viewer debugging probably would be harder.

If we want to optimize space use, then various "compressed"
representations are possible. For example, we use on the order
of 15000-20000 identifiers. We could have a string table
and represent an identifier by a 16-bit index into the string table.
That could dramatically reduce the space taken by symbols.
Probably 90% of FriCAS code could run as bytecode without
loss of speed. That could reduce the space taken by code by
about 60-70% or more. IIUC in the NAG era the code took about 16Mb;
now using sbcl it is closer to 120Mb. We probably have more
code than in the NAG era (a lot of NAG stuff is removed, but we
also have significant additions), but clearly a _very_ large
saving is possible.
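The string-table idea can be sketched as follows; this is a toy illustration in Python under the stated assumption of at most 65536 distinct identifiers, and the class and names are invented for the example:

```python
import struct

class StringTable:
    """Intern identifier names; each occurrence is stored as a
    16-bit index instead of a full symbol."""
    def __init__(self):
        self.strings = []   # index -> name
        self.index = {}     # name -> index

    def intern(self, name):
        i = self.index.get(name)
        if i is None:
            i = len(self.strings)
            assert i < 65536, "16-bit index space exhausted"
            self.index[name] = i
            self.strings.append(name)
        return i

    def encode(self, names):
        """Encode a sequence of identifier occurrences as 2 bytes each."""
        ids = [self.intern(n) for n in names]
        return struct.pack("<%dH" % len(ids), *ids)

table = StringTable()
blob = table.encode(["Integer", "coerce", "Integer", "coerce"])
print(len(blob), table.strings)   # 8 bytes for 4 occurrences
```

Each repeated identifier costs 2 bytes instead of a full symbol object, which is where the dramatic saving for 15000-20000 identifiers would come from.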

If we want to increase speed, then it would be natural to
use C code. In principle we could add a C backend to the Spad
compiler so that the speed-critical low-level part would be
translated directly to C. Interfacing C code to Lisp
has its problems, but in principle using ECL or GCL
should be reasonably easy. In the NAG era CCL (Codemist Common
Lisp) could translate some Lisp files to C and the rest
was compiled to bytecode. By compiling directly to C
we could do a better job: currently the main source of
inefficiency in ECL and GCL is that they
cannot make good use of the type information that we have
at the Spad level. sbcl does time-consuming type inference
at the Lisp level and that recovers enough type information
to generate good code. But the sbcl code generator is not
as good as the code generation in gcc.

Coming back to databases, I have made limited changes to
the related code in the last several years. Basically,
I saw no opportunity for _substantial_ gain and
preferred to work on more fruitful things. In the
longer run the database part needs significant rework:
we need to store more information (for example
it would be good to store names of arguments)
and probably organize it differently. To tell the
truth, it is not clear how much should be in a central
database and how much should be in "per constructor"
files (or maybe "object files" produced from the source
file).

Given the need for significant changes I would prefer
to avoid complicating the database code.

An extra point: when we read the databases, the data is almost
immediately converted to a different form. So keeping the database
in memory could save time on disk operations and parsing,
but would lead to more space use.

--
Waldek Hebisch

Waldek Hebisch

Dec 7, 2023, 7:47:09 PM12/7/23
to fricas...@googlegroups.com
On Fri, Dec 08, 2023 at 10:50:13AM +1100, Hill Strong wrote:
> If the daase is so small that it only takes 23 Mbytes to load into memory,
> it would be worth it if it took 100 Mbytes to load into memory. How many of
> our machines would be constrained by this? My current machine is old and it
> has 7.7 Gbytes usable. It is nothing special - a stock standard HP laptop.
>
> The question to ask is: why has this not been done sooner?

Well, the memory is there, but that does not mean that we should waste it.
I mean, if we gain something valuable, then sure, use what is
needed. But using memory just because it is there is antisocial
and may lead to trouble. As an example look at GCL: it checks
how much memory the machine has and then feels free to use most of
it. So on a 64 Gb machine it feels that there is no need for garbage
collection before memory use hits something like 45Gb. One
trouble is that in default settings GCL cannot cope with that much
memory; it gives errors because some parts can only handle 2Gb.
Once this is resolved there is another trouble: the machine has 40
logical cores (20 physical) and could usefully run 40 copies of
GCL (each on its own logical core). But at 20 copies the machine
is thrashing, as the 20 copies try to use about 800 Gb and this is
much more than what the machine has.

To put it differently, programmers should exhibit common decency:
do not use megabytes when kilobytes would do, do not use gigabytes
when megabytes would do.

--
Waldek Hebisch

Hill Strong

Dec 7, 2023, 10:02:47 PM12/7/23
to fricas...@googlegroups.com
On Fri, Dec 8, 2023 at 11:47 AM Waldek Hebisch <de...@fricas.org> wrote:
> On Fri, Dec 08, 2023 at 10:50:13AM +1100, Hill Strong wrote:
> > If the daase is so small that it only takes 23 Mbytes to load into memory,
> > it would be worth it if it took 100 Mbytes to load into memory. How many of
> > our machines would be constrained by this? My current machine is old and it
> > has 7.7 Gbytes usable. It is nothing special - a stock standard HP laptop.
> >
> > The question to ask is: why has this not been done sooner?
>
> Well, memory is there but it does not mean that we should waste it.
> I mean, if we gain something valuable, then sure use what is

If the memory required is such a small quantity then by all means use it. Even an extra 100 Mbytes is relatively inconsequential compared to the usual amount of memory in these machines today. I/O is an extraordinarily slow process, and any activity that requires it but can be mitigated by using a small amount of memory achieves a significant speed-up with little effort. That was true back in the '80s when main memory was measured in Mbytes, and it is even more applicable today.

I think one of the bigger speed ups and more effective directions for change is to look at alternatives to using any form of Common Lisp. That however is a personal opinion.


> needed. But using memory just because it is there is antisocial

It is only antisocial on multiuser machines, or on machines with a single user where large-memory systems are concurrently in play. As far as FriCAS is concerned, how much memory is used should be user configurable.

> and may lead to troubles.  As an example look at GCL: it checks
> how much memory machine has and then feels free to use most of
> it.  So on 64 Gb machine it feels that there is no need for garbage
> collection before memory use hits something like 45Gb.  One

If GCL is so bad that it cannot garbage collect under the constraints put on it by the user, then why use it in the first place? This is an implementation issue and shows a failure on the part of the developers of GCL to handle user-defined constraints. That is something to take up with them. It is not an argument against using in-memory databases in FriCAS.


> trouble is that in default settings GCL cannot cope with that much
> memory; it gives errors because some parts can only handle 2Gb.
> Once this is resolved there is another trouble: the machine has 40
> logical cores (20 physical) and could usefully run 40 copies of
> GCL (each on its own logical core).  But at 20 copies the machine
> is thrashing, as the 20 copies try to use about 800 Gb and this is
> much more than what the machine has.

This is a known problem with a known fix, though few systems today use it: separation of executable code from user data. If multiple instances of some piece of code are required, there is only a single copy of the executable code in memory, which all processes share, while each process has its own data store.

> To put it differently, programmers should exhibit common decency:
> do not use megabytes when kilobytes would do, do not use gigabytes
> when megabytes would do.

Which programmers? The programmers who develop the compilers, the programmers who develop the operating systems, or the developers who are using these systems to solve the specific problems that they are writing code for? Or should the marketing and sales people at the companies producing the operating systems or compilers be the ones we get rid of first, by putting them up against a wall and beating the living daylights out of them with cricket bats? If you have had to deal with the last lot, you would understand just how much of a bane they are in this respect.

 

> --
>                               Waldek Hebisch



Tim Daly

Dec 8, 2023, 12:14:32 AM12/8/23
to FriCAS - computer algebra system
The original design of the daase files comes from the VMLisp version.
They are random-access files: you get a location from the index and
perform a disk seek. With electronic storage this is no longer
relevant, but it still exists.

The current design was my work, a result of the effort to put Scratchpad
on a 640k computer running DOS. Compress.daase was added to ensure
that the daase files could fit on individual floppy disks. This is also no
longer relevant. Unlike the other daase files, the compress.daase file
could be deleted with a single update.

I pushed out as much algebra as possible and created the 'getdatabase'
API to give uniform, single location access.

It would make sense to simply put all of the algebra inline and remove
the daase files. The decisions that created them are based on hardware
constraints that no longer exist. The whole idea of a 'database' for
algebra is not necessary. Maintaining such a structure is unnecessary
overhead.

Tim

Qian Yun

Dec 8, 2023, 6:23:40 AM12/8/23
to fricas...@googlegroups.com


On 12/8/23 08:20, Waldek Hebisch wrote:
>
> Coming back to databases, I did limited changes to
> related code in last several years. Basically,
> I saw no opportunity for _substantial_ gain and

Here's a non-substantial gain I discovered:

Binding *PRINT-RIGHT-MARGIN* to a larger value (say 160) avoids
some lines with a very long prefix of spaces, shrinking
the size of interp.daase by 10% (8.8MiB -> 8.0MiB).
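The effect can be illustrated with any width-aware pretty-printer; here is a small Python analogue, where pprint's width parameter plays the role of *PRINT-RIGHT-MARGIN* and the sample data is invented:

```python
import pprint

# Invented sample data standing in for nested daase expressions.
nested = [["has-category", ["List", "Integer"], ["OrderedSet"]]] * 6

narrow = pprint.pformat(nested, width=40)    # small right margin
wide = pprint.pformat(nested, width=160)     # margin bound to 160

# The wider margin wraps less, so there are fewer lines padded with
# long runs of leading spaces, hence the smaller file on disk.
print(narrow.count("\n"), wide.count("\n"))
```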

Is this worth a change?

- Qian

Waldek Hebisch

Dec 8, 2023, 9:36:28 AM12/8/23
to fricas...@googlegroups.com
I do not think so. For me longer lines significantly decrease
readability, so that is a large drawback for a small space gain.
--
Waldek Hebisch