On Wed, Jun 24, 2026 at 10:37:26PM +0800, Qian Yun wrote:
> On 6/24/26 9:39 PM, Waldek Hebisch wrote:
> >
> > Some comments.
> >
> > First, I have applied your patch to my private tree. I intend to do
> > some heavier tests, but ATM in light testing I see no problems and
> > there is nice speedup on mapleok.input.
> >
> > I am a bit concerned with GC time. Your timings indicate that
> > it is significantly higher than before. Usualy savings on
> > arithmetic will be bigger than increase in GC time, but in
> > some cases (basically when operator names differ) we can
> > do comparisons without arithmetic and in some cases
> > artithmetic may be cheap. So there is risk of increased
> > runtime in such cases.
>
> With my recent changes to optimize =$EXPR, the price of
> kernel comparison is cheaper, so we can tolerate for
> larger kernel cache threshold, so fewer numbers of GCs,
> fewer time spent in GC.
Part of my worry is looking into future. On bigger machines
we probably can handle more kernels. Fixed threshold can
be quite good today, but is likely to be suboptimal in
changed use. Good adaptive strategy has chance of working
with over much wider range of parameters.
> > ATM it seems to me that triggering expansion in geometric
> > progression (like the old code did) is safer than linear
> > increments. IIUC you use high threshold (that is 300)
> > so you normally get just one increment. But with really
> > large number of active kernels there would be a lot of
> > increments and lot of GC-s.
>
> The old way is "cache_size := 2*cache_size + 10":
>
> 0 -> 10 -> 30 -> 70 -> 150 -> 310 -> 630 -> 1270 -> 2550
>
> If the gap grows to hundreds, or thousands, then it will
> be a long time to trigger next kernel recycle.
>
> I can test and compare both methods.
>
> Also, if we assume that in a long run of computation,
> latter computation does not depends on previous ones,
> then the the number of kernels should grows linearly,
> thus the recycle of kernels should also be done linearly.
Typical reasoning uses notion of "working set". We want
whole working set in the cache and we want to collect
kernels not in the working set. We do not know how
big is the working set, so we approvimate size by say
power of 2. If we get above our current estimate of
the size we collect. If after collection too many
kernels survive we increase the estimate, if much
smaller number survive we decrese the estimate.
Assuming that number of kernels is proportional to size
of all date cost of GC will be proportinal to number
of kernels. Using linear strategy to trigger GC
gives quagratic run time. OTOH increasing number of
kernels in cache by constant factor should only
modestly increase search time. That is when binary
search works it is just 1 or two extra comparisons.
When using linear search number of comparisons
increases by the same factor as cache size. So
bigger cache means modest increase in search time.
But it cuts number of garbage collections so that total
time spent on garbage collection remains linear.
> As I said in above, because of optimization in =$EXPR,
> current linear threshold could be 600 instead of 300.
>
> > Another question is tactic that you use, namely triggering
> > full GC to collect unneded kernels. IIUC this optimizes
> > speed in case when there is moderate number of kernels
> > and no pressure to collect unneded kernels. But IIUC
> > intended use of weak pointers is that GC happen naturally
> > and collect data available via weak pointers. So, normal
> > use would be to have weak pointers all the time and check
> > for dangling pointer on each access. That clearly would
> > need more code, so possibly increase time needed to access
> > kernels in normal case. But is should lower GC time. In
> > particular, full GC is expensive and in sbcl most GC-s are
> > incremental, so much cheaper. I do not know if such variant
> > is better, but it would be good to test.
> >
>
> If we do not use full gc, the default generational GC
> will not do a full recycle, causing the kernel cache to
> remain relatively big, thus the time saving is not great.
> (I tested that before. So instead of shrinking 3000 kernels
> to 300, it could only shrink to 2000 instead.)
But did you try approach suggested above? IIUC you have
weak pointers only when you decide if you need to expand
the cache. It is likely that there were several incremental
GC between time that kernel is created and time when
you try to GC kernels. If a piece of data sorvives few
(maybe only 2) incremental GC it is promoted to stable
pool and only collected by full GC. Note: I am writing
"incremental" and "full" as if there were only two
possibilities. In general generational GC may have
several generations, but this does not change the
principle: if you do not collect garbage prompty, then
you need to do more work to collect it.
Using weak pointers all the time means that kernels can by
collected during incremental GC.
--
Waldek Hebisch