triple stack compilation vs Peyton-Jones (S kernel vs, pre-S)

20 views
Skip to first unread message

dr.mt...@gmail.com

unread,
Dec 23, 2025, 3:10:45 PM (2 days ago) Dec 23
to Shen
This is interesting; the compilation of Shen to Kλ in TBoS.  The first using a method taken from Peyton-Jones and the second, the triple stack method, developed for the S series kernel.    There is a handwaving proof of their equivalence in TBoS, but I have developed, with the aid of ChatGPT, a nested inductive proof of their equivalence.  I will spare you all of that.  Proofs of that kind are about as interesting as other people's baby photos.   I have to check it.  It's obviously important in cementing the credentials of the method.

More accessible are the benchmark comparisons of the two wrt how much code they take up and how fast they are.   This was actually a bit of a challenge; I had to resurrect a pre-S kernel executable.   I had to go back to CLisp!    (The Shen/Chez combination is about an order of magnitude faster than the CLisp platform - the first platform on which Shen 1.2 ran).

The figures go towards explaining how the S-kernel turned out to be more capable and yet smaller than its predecessor.

Anyway nuff said - I've written a draft paper, and to cut out the boring, stuff here are the conclusions.

Mark

The Motivation for the Triple Stack Method

The motivation for the triple stack method is twofold:

A. Conceptual simplicity; the method is simpler than the Peyton-Jones method.
B.  Operational efficiency; the method is operationally more efficient than the Peyton-Jones method.

Conceptual Simplicity

The procedural triple-stack method is substantially simpler than the classical compilation route, both conceptually and in implementation. The Peyton-Jones approach requires the introduction of an intermediate language with pattern-matching abstractions, the construction of higher-order λ-terms containing patterns and an explicit failure object to represent match failure. By contrast, the triple-stack method dispenses entirely with intermediate λ-structures and failure values, realising pattern matching directly through a single procedural traversal involving stack manipulation, recogniser and selector lookup, and substitution into the right-hand side.

This simplification is reflected in the size of the implementations. In the Shen system, the Peyton-Jones–style compiler occupies approximately 419 lines of code, whereas the triple-stack compiler requires 276 lines.

Operational Efficiency

 All empirical measurements were conducted using CLisp, since this is the only platform for which executable versions of both the historical Peyton-Jones–based Shen compiler (2010) and the modern triple-stack compiler (2025) are available.  CLisp is an old and relatively slow Common Lisp implementation that has not been actively developed since approximately 2010.  In particular, Shen/Chez delivers substantially higher absolute performance. The measurements reported here are therefore not intended to characterise absolute compilation speed, but rather to provide a controlled relative comparison between the two compilation methods under identical conditions.

Empirical measurements on a small, pattern-heavy interpreter benchmark (118 LOC), with garbage collection forced prior to each run, show a modest but consistent improvement in compilation time for the triple-stack method. Across three runs, the median compilation time was 0.093s for the triple-stack compiler and 0.109s for the Peyton-Jones compiler, corresponding to an improvement of approximately 17%.   

Compiling an implementation of the SECD (23 LOC) machine showed a more significant speedup. Across three runs, the median compilation time was 0.015s for the triple-stack compiler and 0.031s for the Peyton-Jones compiler, corresponding to an improvement of approximately 2×.  On a larger program for conducting automated reasoning (460 LOC), the median compilation time for the triple-stack was 0.40s and the Peyton-Jones method 0.48s representing a 20% gain in operational efficiency.

The difference in observed speedup between the different benchmarks reflects the relative contribution of pattern compilation to overall compilation cost. In the interpreter and automated reasoning benchmarks, many clauses had simple patterns but comparatively large right-hand sides, so the cost of compiling expressions dominates the cost of compiling patterns. Reductions in pattern-compilation overhead therefore have a limited effect on total compilation time. By contrast, the SECD benchmark consists primarily of clauses with deep and overlapping patterns and relatively small right-hand sides. In this setting, pattern compilation constitutes a larger proportion of the total work, and improvements in the matching compiler translate directly into more substantial speedups.

Conclusion 

We have set out a traditional model for compiling a pattern directed functional programming language derived from Peyton-Jones, with the triple stack method used in Shen.  We have proved that they produce equivalent code but that the triple stack method is conceptually less complex and easier to code.  Moreover it operates with improved time efficiency compared to the older method.Top of Form

 

Bottom of Form

 

Reply all
Reply to author
Forward
0 new messages