Preview of 2.14 work. Parser replacement.

Brazil

Mar 7, 2026, 11:19:52 PM

to tinymux

The **brazil** branch makes a major architectural change relative to **master**: it separates parsing from evaluation and adds an LRU cache (1024 entries) for parsed ASTs. The pipeline is now Ragel-generated scanner → recursive-descent parser → AST construction → cached tree-walking evaluation. By contrast, **master** uses a single-pass, character-by-character scanner/parser/evaluator with no caching, so every evaluation re-parses the expression text from scratch.
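To make the stage split concrete, here is a minimal Python sketch of a scan → parse → evaluate pipeline for a toy function-call language. It is not TinyMUX code: the regex scanner merely stands in for the Ragel-generated one, and the node types `Literal` and `Call`, the `FUNCTIONS` table, and the variable `i` (a stand-in for `itext(0)`) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical AST node types for a tiny function-call language.
@dataclass(frozen=True)
class Literal:
    text: str

@dataclass(frozen=True)
class Call:
    name: str
    args: tuple

# Identifiers, integers, and the three punctuation tokens.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|-?\d+|[(),])")

def scan(source):
    """Scanner stage: split the source text into a flat list of tokens."""
    source = source.strip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Parser stage: recursive descent over the tokens, returning an AST."""
    node, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return node

def _parse_expr(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in {"(", ")", ","}:
        raise SyntaxError(f"unexpected {tok!r}")
    pos += 1
    # A token followed by "(" is a function call; anything else is a literal.
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1  # consume "("
        if tokens[pos:pos + 1] == [")"]:
            return Call(tok, ()), pos + 1
        args = []
        while True:
            arg, pos = _parse_expr(tokens, pos)
            args.append(arg)
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            if tokens[pos] == ")":
                return Call(tok, tuple(args)), pos + 1
            if tokens[pos] != ",":
                raise SyntaxError(f"expected ',' or ')', got {tokens[pos]!r}")
            pos += 1  # consume ","
    return Literal(tok), pos

# Illustrative built-ins only; the real function table is far larger.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "mod": lambda a, b: a % b,
}

def evaluate(node, env):
    """Evaluation stage: walk the AST; the source text is never touched here."""
    if isinstance(node, Literal):
        # Names found in env act as variables; everything else is an integer.
        return env[node.text] if node.text in env else int(node.text)
    args = [evaluate(arg, env) for arg in node.args]
    return FUNCTIONS[node.name](*args)

# Parse once, then evaluate the same tree with different inputs.
ast = parse(scan("add(mul(i,i),sub(i,1))"))
print(evaluate(ast, {"i": 7}))  # 55
```

The point of the split is that `ast` can be kept and re-walked: once scanning and parsing are separate from evaluation, caching the tree becomes possible at all.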

We ran a heavy smoke-test benchmark with five repeated expression loops (total 190,000 iterations across B1–B5, plus the full 395-test suite):

- B1: 50k × `add(mul(itext(0),itext(0)),sub(itext(0),1))`
- B2: 20k × `sha1(itext(0))`
- B3: 20k × `switch(mod(itext(0),5), …)`
- B4: 50k × `cat(setr(0,…),setr(1,…),add(%q0,%q1))`
- B5: 50k × `itext(0)` (baseline)

**Results**: brazil completed everything in **284 ms** wall time. Master was far slower: runs were manually killed after ~2.5 minutes, and one unattended run that was allowed to complete took **153 seconds**. That puts master more than 500× slower on these workloads (153 s / 0.284 s ≈ 540× for the completed run; the killed runs give only a lower bound).
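The ratio, plus a rough per-iteration cost, can be checked with a few lines of Python. Only the three input figures come from the numbers above; the per-iteration values are upper bounds, since both wall times also include the 395-test suite and any startup cost.

```python
BRAZIL_SECONDS = 0.284   # wall time reported for brazil
MASTER_SECONDS = 153.0   # the one master run that was allowed to complete
ITERATIONS = 50_000 + 20_000 + 20_000 + 50_000 + 50_000  # B1..B5

speedup = MASTER_SECONDS / BRAZIL_SECONDS
brazil_us = BRAZIL_SECONDS / ITERATIONS * 1e6  # microseconds per iteration
master_us = MASTER_SECONDS / ITERATIONS * 1e6

print(f"iterations: {ITERATIONS}")
print(f"speedup: {speedup:.0f}x")
print(f"per iteration: brazil <= {brazil_us:.2f} us, master <= {master_us:.0f} us")
```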

The speedup comes almost entirely from **AST caching**, not from the Ragel scanner or recursive descent per se. Master re-parses the same string on every iteration (20,000 to 50,000 times per benchmark) through a slow character-by-character state machine. Brazil parses each unique expression only once, then reuses the AST via cheap tree walks for the remaining iterations.
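A minimal sketch of the caching idea, in Python, assuming the cache is keyed by the raw expression text. The class name `AstCache`, the `parse_calls` counter, and the stand-in `toy_parse` are hypothetical; only the 1024-entry capacity and the LRU policy come from the description above.

```python
from collections import OrderedDict

class AstCache:
    """Minimal LRU cache keyed by expression text (a sketch, not TinyMUX's code)."""

    def __init__(self, parse, capacity=1024):
        self._parse = parse            # function: source text -> AST
        self._capacity = capacity      # 1024 matches the quoted entry count
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.parse_calls = 0           # counts cache misses, for illustration

    def get(self, source):
        if source in self._entries:
            self._entries.move_to_end(source)  # mark as most recently used
            return self._entries[source]
        self.parse_calls += 1
        ast = self._parse(source)
        self._entries[source] = ast
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return ast

def toy_parse(source):
    # Hypothetical stand-in for the real parser; returns an opaque "AST".
    return ("ast", source)

cache = AstCache(toy_parse)
for _ in range(50_000):
    tree = cache.get("add(mul(itext(0),itext(0)),sub(itext(0),1))")
print(cache.parse_calls)  # 1: parsed on the first iteration, reused afterwards
```

With this shape, a loop like B1 pays for one parse and 49,999 dictionary lookups, and the per-iteration work that remains is the tree walk itself.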

Correctness is unchanged: both branches produce identical results on all shared tests, and brazil’s expanded 395-test suite (47 more cases) passes completely. The change was introduced and validated incrementally over 44 commits with continuous smoke-test coverage.