one way to reduce build time further (with sbcl)

Qian Yun

unread,

Jul 24, 2022, 9:30:17 AM7/24/22

to fricas-devel

As you know, the majority time of building FriCAS is to compile
over 1000 SPAD files to LSP files and then to LISP_BIN file.
Luckily, this step can be paralleled.

So on a multi-core machine, to reduce build time, the focus should
be on the serial part of build process.

The building of algebra/*daase is the most important serial part.

I'm not entirely familiar with the process, but it seems there
are a few stages to generate daase, and each stage requires to
compile many (if not full) spad files, and the compiled result
is discarded. So I think in this step, it is wasteful to compile
these *lsp file to *fasl file, especially for SBCL with :SB-FASTEVAL.

So some simple testing:

Compile FriCAS (without X11) with 8 cores (2.2GHz) takes
2m53s real time and 11m CPU time.

(So a rough estimation is that parallel part takes around 1min10s and
serial part takes around 1m40s.)

The first stage to generate daase is "make stamp-db" in src/algebra.
This serial step alone takes 60s. Skipping the lsp->fasl compilation
reduce this step to 40s. This will make the whole compilation 11%
faster.

This is just a rough idea, still a long way to figure out all the
details.

- Qian

Waldek Hebisch

unread,

Jul 24, 2022, 2:47:41 PM7/24/22

to fricas...@googlegroups.com

On Sun, Jul 24, 2022 at 09:29:24PM +0800, Qian Yun wrote:
> As you know, the majority time of building FriCAS is to compile
> over 1000 SPAD files to LSP files and then to LISP_BIN file.
> Luckily, this step can be paralleled.
>
> So on a multi-core machine, to reduce build time, the focus should
> be on the serial part of build process.
>
> The building of algebra/*daase is the most important serial part.
>
> I'm not entirely familiar with the process, but it seems there
> are a few stages to generate daase, and each stage requires to
> compile many (if not full) spad files, and the compiled result
> is discarded.

More precisely, the result is replaced by result of subsequent
compilation. But we compile them because we need to load
some of them (at it is tricky to decide if we can skip some).

> So I think in this step, it is wasteful to compile
> these *lsp file to *fasl file, especially for SBCL with :SB-FASTEVAL.

Quite likely.

> So some simple testing:
>
> Compile FriCAS (without X11) with 8 cores (2.2GHz) takes
> 2m53s real time and 11m CPU time.
>
> (So a rough estimation is that parallel part takes around 1min10s and
> serial part takes around 1m40s.)

Build time is rather nonliner, for example CPU time grows with
growing number of jobs. For example on machine with 20
physical cores and 40 logical ones due to hypethreading
(all builds including X11 and generation of textual
HyperDoc pages):

-j 30
real 2m14.979s
user 15m0.109s
sys 1m46.336s

-j 15
real 2m30.543s
user 12m39.773s
sys 1m35.727s

-j 10
real 2m44.613s
user 11m35.911s
sys 1m25.861s

-j 5
real 3m51.130s
user 11m5.644s
sys 1m23.756s

-j 1
real 12m20.886s
user 10m16.380s
sys 1m22.216s

Still, roughly enough your estimate is probably OK.

> The first stage to generate daase is "make stamp-db" in src/algebra.
> This serial step alone takes 60s. Skipping the lsp->fasl compilation
> reduce this step to 40s. This will make the whole compilation 11%
> faster.
>
> This is just a rough idea, still a long way to figure out all the
> details.

--
Waldek Hebisch

Qian Yun

unread,

Jul 24, 2022, 6:59:47 PM7/24/22

to fricas...@googlegroups.com

On 7/25/22 02:47, Waldek Hebisch wrote:
> On Sun, Jul 24, 2022 at 09:29:24PM +0800, Qian Yun wrote:
>> As you know, the majority time of building FriCAS is to compile
>> over 1000 SPAD files to LSP files and then to LISP_BIN file.
>> Luckily, this step can be paralleled.
>>
>> So on a multi-core machine, to reduce build time, the focus should
>> be on the serial part of build process.
>>
>> The building of algebra/*daase is the most important serial part.
>>
>> I'm not entirely familiar with the process, but it seems there
>> are a few stages to generate daase, and each stage requires to
>> compile many (if not full) spad files, and the compiled result
>> is discarded.
>
> More precisely, the result is replaced by result of subsequent
> compilation. But we compile them because we need to load
> some of them (at it is tricky to decide if we can skip some).

Yes, the step "make stamp-oboo3" loads some files, but we can
use "compile-file before load" to just compile the needed files.

>> So I think in this step, it is wasteful to compile
>> these *lsp file to *fasl file, especially for SBCL with :SB-FASTEVAL.
>
> Quite likely.

Another motivation for this compile speed optimization is that,
when I was compiling use ECL, the total time is over 20 minutes,
and I guess the daase step takes around 10 minutes.

>> So some simple testing:
>>
>> Compile FriCAS (without X11) with 8 cores (2.2GHz) takes
>> 2m53s real time and 11m CPU time.
>>
>> (So a rough estimation is that parallel part takes around 1min10s and
>> serial part takes around 1m40s.)
>
> Build time is rather nonliner, for example CPU time grows with
> growing number of jobs. For example on machine with 20
> physical cores and 40 logical ones due to hypethreading
> (all builds including X11 and generation of textual
> HyperDoc pages):
>

Yeah, the Amdahl's law. So it is more important to shorten the
serial part.

The following is the diff I used during experiment.
(This is without the compile-before-load optimization.)

- Qian

==========
diff --git a/src/algebra/Makefile.in b/src/algebra/Makefile.in
index 9431e0c7..aad42787 100644
--- a/src/algebra/Makefile.in
+++ b/src/algebra/Makefile.in
@@ -462,10 +462,10 @@ stamp-oboo3: stamp-db
@if [ ! -f stamp-oboo3 ] ; then \
echo "Bootstrap object copy" ; \
for A in ${CATLIST} ${DOMLIST} ; do \
- cp $$A.NRLIB/$$A.$(FASLEXT) ${OUT}/$$A.$(FASLEXT) || exit
1 ; \
+ cp $$A.NRLIB/$$A.lsp ${OUT}/$$A.$(FASLEXT).lsp || exit 1 ; \
done; \
for A in ${CATDOMS} ; do \
- cp $${A}-.NRLIB/$${A}-.$(FASLEXT)
${OUT}/$${A}-.$(FASLEXT) || exit 1 ; \
+ cp $${A}-.NRLIB/$${A}-.lsp ${OUT}/$${A}-.$(FASLEXT).lsp
|| exit 1 ; \
done; \
rm -rf *.NRLIB \
echo "Stage 3 object bootstrap (normal mode)" ; \
@@ -477,7 +477,7 @@ stamp-oboo3: stamp-db
DAASE=./r7 ${INTERPSYS} ) || exit 1 ; \
echo "Stage 3 object copy" ; \
for A in ${DOMLIST} ; do \
- cp $$A.NRLIB/$$A.$(FASLEXT) ${OUT}/$$A.$(FASLEXT) || exit
1 ; \
+ cp $$A.NRLIB/$$A.lsp ${OUT}/$$A.$(FASLEXT).lsp || exit 1 ; \
done ; \
touch stamp-oboo3 ; \
fi
diff --git a/src/interp/nlib.lisp b/src/interp/nlib.lisp
index fa10d184..9ff1c2f3 100644
--- a/src/interp/nlib.lisp
+++ b/src/interp/nlib.lisp
@@ -220,7 +220,8 @@
(defun |compile_lib_file|(fn)
(if FRICAS-LISP::algebra-optimization
(proclaim (cons 'optimize FRICAS-LISP::algebra-optimization)))
- (compile-file fn))
+ ;;(compile-file fn)
+)

;; (RDROPITEMS filearg keys) don't delete, used in files.spad
diff --git a/src/lisp/fricas-lisp.lisp b/src/lisp/fricas-lisp.lisp
index aeaa3537..2aa8df23 100644
--- a/src/lisp/fricas-lisp.lisp
+++ b/src/lisp/fricas-lisp.lisp
@@ -355,7 +355,7 @@
;;; (format *error-output* "entred load_quietly ~&")
#-:GCL
(handler-bind ((warning #'muffle-warning))
- (load f))
+ (load (concatenate 'string f ".lsp")))
#+:GCL
(load f)
;;; (format *error-output* "finished load_quietly ~&")

Waldek Hebisch

unread,

Jul 25, 2022, 11:06:48 AM7/25/22

to fricas...@googlegroups.com

On Mon, Jul 25, 2022 at 06:58:54AM +0800, Qian Yun wrote:
>
>
> On 7/25/22 02:47, Waldek Hebisch wrote:
> >On Sun, Jul 24, 2022 at 09:29:24PM +0800, Qian Yun wrote:
> >>As you know, the majority time of building FriCAS is to compile
> >>over 1000 SPAD files to LSP files and then to LISP_BIN file.
> >>Luckily, this step can be paralleled.
> >>
> >>So on a multi-core machine, to reduce build time, the focus should
> >>be on the serial part of build process.
> >>
> >>The building of algebra/*daase is the most important serial part.
> >>
> >>I'm not entirely familiar with the process, but it seems there
> >>are a few stages to generate daase, and each stage requires to
> >>compile many (if not full) spad files, and the compiled result
> >>is discarded.
> >
> >More precisely, the result is replaced by result of subsequent
> >compilation. But we compile them because we need to load
> >some of them (at it is tricky to decide if we can skip some).
>
> Yes, the step "make stamp-oboo3" loads some files, but we can
> use "compile-file before load" to just compile the needed files.
>
> >> So I think in this step, it is wasteful to compile
> >>these *lsp file to *fasl file, especially for SBCL with :SB-FASTEVAL.
> >
> >Quite likely.
>
> Another motivation for this compile speed optimization is that,
> when I was compiling use ECL, the total time is over 20 minutes,
> and I guess the daase step takes around 10 minutes.

Current trunk, ECL 16.1.2 with -j 10 (on quad core machine with
hyperthreading) needs:
real 16m42.871s
user 61m26.912s
sys 2m54.504s

Of that 350 sec goes into boo_db stage.

> The following is the diff I used during experiment.
> (This is without the compile-before-load optimization.)

With that boo_db stage goes down to 118 s. However,
build fails at end of spad compile stage (before
generating HyperDoc pages) and the time is:

real 13m42.144s
user 74m20.816s
sys 2m32.908s

so real time may be lower, but there is increase in
CPU time. I do not know why, maybe loading compiled
files is cheaper than loading Lisp...

--
Waldek Hebisch

Qian Yun

unread,

Jul 26, 2022, 7:48:04 AM7/26/22

to fricas...@googlegroups.com

My idea is to just skip compile (some) lisp file during
database generation stage, similar to the "$SaveParseOnly"
variable.

For the following steps, compile lisp file is important,
otherwise the repeated loading of lisp file will take more time.

See my updated patch (still a hack) for the following
benchmark result. Roughly saves 9% of build time,
not as much as I originally thought.

- Qian

SBCL with hack, 8 cores at 2.2 GHz

real 2m42.395s
user 10m56.020s
sys 0m33.980s

SBCL without hack, 8 cores at 2.2 GHz

real 2m57.945s
user 11m15.622s
sys 0m33.902s

SBCL with hack, 16 threads and cpu boost

real 1m17.841s
user 8m11.300s
sys 0m32.597s

SBCL without hack, 16 threads and cpu boost

real 1m25.606s
user 8m19.650s
sys 0m29.466s

===============
diff --git a/src/algebra/Makefile.in b/src/algebra/Makefile.in
index 9431e0c7..87451c42 100644
--- a/src/algebra/Makefile.in
+++ b/src/algebra/Makefile.in
@@ -462,11 +462,12 @@ stamp-oboo3: stamp-db

@if [ ! -f stamp-oboo3 ] ; then \
echo "Bootstrap object copy" ; \
for A in ${CATLIST} ${DOMLIST} ; do \
- cp $$A.NRLIB/$$A.$(FASLEXT) ${OUT}/$$A.$(FASLEXT) || exit
1 ; \

+ cp $$A.NRLIB/$$A.lsp ${OUT}/$$A.lsp || exit 1 ; \

done; \
for A in ${CATDOMS} ; do \
- cp $${A}-.NRLIB/$${A}-.$(FASLEXT)
${OUT}/$${A}-.$(FASLEXT) || exit 1 ; \

+ cp $${A}-.NRLIB/$${A}-.lsp ${OUT}/$${A}-.lsp || exit 1 ; \
done; \
+ cp *.NRLIB/*.$(FASLEXT) ${OUT}/ ; \

rm -rf *.NRLIB \
echo "Stage 3 object bootstrap (normal mode)" ; \

echo > oboo3.input ; \
@@ -477,8 +478,9 @@ stamp-oboo3: stamp-db

DAASE=./r7 ${INTERPSYS} ) || exit 1 ; \
echo "Stage 3 object copy" ; \
for A in ${DOMLIST} ; do \
- cp $$A.NRLIB/$$A.$(FASLEXT) ${OUT}/$$A.$(FASLEXT) || exit
1 ; \

+ cp $$A.NRLIB/$$A.lsp ${OUT}/$$A.lsp || exit 1 ; \
done ; \
+ cp *.NRLIB/*.$(FASLEXT) ${OUT}/ ; \
touch stamp-oboo3 ; \
fi

diff --git a/src/algebra/boo_db.input b/src/algebra/boo_db.input
index f65c96fd..cd0e4c9a 100644
--- a/src/algebra/boo_db.input
+++ b/src/algebra/boo_db.input
@@ -1,3 +1,4 @@
+)boot $skipCompileLisp := true
)boot $SaveParseOnly := true
)read komp_all.input
)boot processGlobals()
diff --git a/src/interp/nlib.lisp b/src/interp/nlib.lisp
index fa10d184..2b966f7d 100644

--- a/src/interp/nlib.lisp
+++ b/src/interp/nlib.lisp
@@ -220,7 +220,8 @@
(defun |compile_lib_file|(fn)
(if FRICAS-LISP::algebra-optimization
(proclaim (cons 'optimize FRICAS-LISP::algebra-optimization)))
- (compile-file fn))

+ (if (not (boundp '|$skipCompileLisp|)) (compile-file fn))

+)

;; (RDROPITEMS filearg keys) don't delete, used in files.spad
diff --git a/src/lisp/fricas-lisp.lisp b/src/lisp/fricas-lisp.lisp

index aeaa3537..ffa56d25 100644
--- a/src/lisp/fricas-lisp.lisp
+++ b/src/lisp/fricas-lisp.lisp
@@ -355,7 +355,10 @@ with this hack and will try to convince the GCL
crowd to fix this.

;;; (format *error-output* "entred load_quietly ~&")
#-:GCL
(handler-bind ((warning #'muffle-warning))
- (load f))
+ (load

+ (cond ((probe-file f) f)
+ ((probe-file (concatenate 'string f ".fasl")) f)
+ (t (compile-file (concatenate 'string
(string-right-trim ".fasl" f) ".lsp"))))))

#+:GCL
(load f)
;;; (format *error-output* "finished load_quietly ~&")

===============

Waldek Hebisch

unread,

Jul 26, 2022, 2:58:18 PM7/26/22

to fricas...@googlegroups.com

On Tue, Jul 26, 2022 at 07:47:05PM +0800, Qian Yun wrote:
> My idea is to just skip compile (some) lisp file during
> database generation stage, similar to the "$SaveParseOnly"
> variable.

Well, AFAICS in principle there is no need to compile categories.
More precisely, Spad category has two part: proper category
and default package. Default package more or less should
be handled like other domains and packages. OTOH proper
categories are quite different from other constructors.
In particular currently categories are evaluated during
typechecking so we need runnable code for categories at
quite erarly stages. But we should be able to re-organize
compiler representation of types so that categories are
pure data. This date could be initialized in lazy way
which hopefully could get rid of one pass from stamp_db
stage. And code for categories calls eval and few other
things like JoinInner. My profile data indicates
that Lisp eval takes about 12% of stamp_db stage
(sb-fasteval does not make significant difference here).

So better handling of categories could save both time
compiling Lisp files and probably also lead to
savings during normal Spad compilations.

Also, currently there are two reasons to compile Spad
files before main compilation:
1) we need compiled code for inlining. For this
purpose actually all we need is data (Lisp forms).
Compiling is useful to speed up loading of this
info, but we do not need most of generated code
and in principle we could load data in different way.
2) When evaluating categories we need to evaluate some
types appearing is categories (I did not work out
exact rules). Keeping types as data in principle
could avoid need for early compilation. Again
we need to solve problem of efficiently loading
needed data during main compile.

--
Waldek Hebisch

Reply all

Reply to author

Forward