Announcing beancompat


Timothy Jesionowski

May 3, 2026, 1:41:43 PM
to bean...@googlegroups.com
Hi all,

Following the spec process thread earlier this year, I've put together
a tool aimed at the "completeness" problem Martin identified: the
0.8→1.0 gap where lots of little details are easy to miss.

beancompat is a black-box property test suite for beancount implementations:
https://github.com/TJesionowski/beancompat

The idea: exercise the full external interface that downstream tooling
(Fava, plugins, extensions) relies on, and document where
implementations diverge. This is descriptive, not normative, but the
worst incompatibilities are built by accident.

What's there now:

- Hand-written fixtures covering parse, booking, interpolation, BQL,
metadata, tags/links, flags, prices, cost inventory, and more
- Hypothesis-based generative tests that find unknown divergences by
checking agreement across implementations
- A language-independent fixture format (JSON pairs of .beancount
snippet + expected output) so non-Python implementations can consume
them directly without a Python harness
- Adapters for beancount v3 (reference), limabean, and rustledger
- Full Fava-compat capability surface: CAP_HASH, CAP_PLUGINS,
CAP_SUMMARIZE, CAP_INGEST, plus fixtures covering all 31
BeancountOptions keys Fava reads
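
To make the fixture idea concrete, here is a minimal sketch of what one JSON pair and a consumer could look like. The field names (`name`, `input`, `expected`), the shape of `known_divergences`, the `check_fixture` helper, and the two toy adapters are all assumptions made up for illustration. They are not the actual beancompat schema, so check the repo for the real layout.

```python
import json

# Hypothetical fixture: a .beancount snippet paired with its expected output.
# Field names are illustrative assumptions, not the real beancompat schema.
FIXTURE_JSON = """
{
  "name": "open-with-currencies",
  "input": "2024-01-01 open Assets:Cash USD,EUR\\n",
  "expected": {
    "directives": [
      {"type": "open", "date": "2024-01-01",
       "account": "Assets:Cash", "currencies": ["USD", "EUR"]}
    ]
  },
  "known_divergences": {"impl-b": "currency order not preserved"}
}
"""


def check_fixture(fixture, impl_name, run_impl):
    """Classify one implementation's output as pass, known_divergence, or fail.

    run_impl is any callable mapping beancount source text to a JSON-like
    dict; in practice it would wrap an adapter for a real implementation.
    """
    actual = run_impl(fixture["input"])
    if actual == fixture["expected"]:
        return "pass"
    if impl_name in fixture.get("known_divergences", {}):
        return "known_divergence"
    return "fail"


def toy_reference(source):
    """Stand-in for a real adapter: handles only a single `open` line."""
    date, keyword, account, currencies = source.split()
    return {"directives": [{"type": keyword, "date": date,
                            "account": account,
                            "currencies": currencies.split(",")}]}


def toy_reordering(source):
    """Stand-in that sorts currencies, losing source order."""
    out = toy_reference(source)
    out["directives"][0]["currencies"].sort()
    return out


fixture = json.loads(FIXTURE_JSON)
print(check_fixture(fixture, "reference", toy_reference))
print(check_fixture(fixture, "impl-b", toy_reordering))
```

Because the fixture is plain JSON, the same comparison could be written in any language; only the adapter that turns an implementation's output into the expected shape is implementation-specific.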

What I've found so far:

A few concrete divergences are already documented against limabean and
rustledger:

1. Currency list ordering in `open` directives — limabean's currency
list is non-deterministic across runs. The allowed-currency list
appears to be iterated from a HashSet with randomized hashing rather
than preserved in source order. The generative test suite surfaced
this immediately.

2. `display_precision_by_currency` absent from options — rustledger's
DisplayContext has no public iterator over inferred precisions, so the
field is never populated. Parser-only adapters (beancount-parser-lima)
have the same gap for a different reason: no loader means dcontext is
never computed.

3. Options coverage gaps — against the 31-key fixture (all options
Fava reads): limabean rejects `plugin_processing_mode`,
`infer_tolerance_from_cost`, `render_commas`, and `booking_method` as
unknown (parse error on load); rustledger aliases
`tolerance_multiplier` and `inferred_tolerance_multiplier` so they
can't be set or reported independently.

These are documented as `known_divergences` in the fixture files.
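
For anyone wondering how a generative check can surface something like divergence 1, here is a minimal sketch. It uses the standard library's `random` in place of Hypothesis to stay dependency-free, and both "implementations" are toy stand-ins written for this example, not the real adapters. The `run_seed` parameter is a hypothetical device that simulates a per-process hash seed.

```python
import random

CURRENCIES = ["USD", "EUR", "GBP", "JPY", "CAD", "CHF"]


def gen_open_directive(rng):
    """Generate a random `open` line with 2-4 distinct constraint currencies."""
    chosen = rng.sample(CURRENCIES, rng.randint(2, 4))
    return "2024-01-01 open Assets:Test " + ",".join(chosen)


def ordered_impl(source, run_seed):
    """Toy implementation that preserves source order (run_seed is unused)."""
    return source.split()[3].split(",")


def hashed_impl(source, run_seed):
    """Toy implementation whose iteration order varies per run.

    The shuffle seeded by run_seed simulates a hash set with a per-process
    random hash seed; this is an assumption about the cause, for illustration.
    """
    currencies = source.split()[3].split(",")
    random.Random(run_seed).shuffle(currencies)
    return currencies


def find_nondeterminism(impl, cases=50, runs=8, seed=0):
    """Return the first generated input whose output differs across runs."""
    rng = random.Random(seed)
    for _ in range(cases):
        source = gen_open_directive(rng)
        outputs = {tuple(impl(source, run_seed)) for run_seed in range(runs)}
        if len(outputs) > 1:
            return source
    return None


print(find_nondeterminism(ordered_impl))  # None: output is stable across runs
print(find_nondeterminism(hashed_impl))   # a generated `open` line, if one is found
```

Note that the result depends on the comparison policy: the check above compares currency lists as ordered tuples, while comparing them as sets would make the two toy implementations agree.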

TurboBean is on the radar and is the highest-value target given the
vNext inventory divergence, but the adapter is blocked on upstream
shipping a structured-output command. I haven't had time to update for
Martin's recent changes, but that's also on the list.

Feedback and contributions welcome. I would love to get a few
collaborators to keep this thing relevant, so if you're interested
just let me know.

Tim

Justus Pendleton

May 4, 2026, 8:11:41 PM
to Beancount
This is fantastic work! What kinds of things -- besides the mentioned TurboBean work -- do you see as gaps or next steps?

Timothy Jesionowski

May 5, 2026, 11:29:15 AM
to bean...@googlegroups.com
The new Beancount stuff Martin is shipping, for one. And I kinda want to let the generative tests run for longer.

But really what I need to do is figure out the UX for implementors or anyone who isn't me. Like, I could just keep going and come up with formal methods specifications or whatever, but the real question is what's useful to the community.



Sincerely,
Timothy Jesionowski

--
To view this discussion visit https://groups.google.com/d/msgid/beancount/9c6eb1b0-42f2-435a-b51e-23e16feb5e04n%40googlegroups.com.

Chary Ev2geny

May 5, 2026, 12:59:15 PM
to Beancount
Hi, 

What you have done seems very interesting, but I am trying to understand how it works (sorry if my questions are stupid, I am not a professional SW engineer).

Suppose you test some tool (e.g. rustledger). As input you feed it different beancount files, correct?

What do you take as output from that tool? Or do you have to write an adapter for every tool, which converts its output (whatever it is) to some general format? What would this format be?

Simon Guest

May 5, 2026, 5:31:12 PM
to bean...@googlegroups.com
Hi Tim,

[limabean author here]

This is great! Thanks for spotting the need for such a tool and making this.

I am interested in making limabean interface to this a bit better. That surely means me writing a Clojure-to-JSON adapter, since limabean is undeniably a Clojure program. (The Rust parser and booking algorithm are implementation details, and not where the public interface is exposed.)

I will have a further look into this.

Is your preference to collaborate via issues and discussions in your GitHub repo?

I am also interested in progressing understanding around divergences. For example, you are right in noticing that limabean discards the order of currencies in open directives. I believe that to be a faithful implementation of the spec ("The comma-separated optional list of constraint currencies enforces that all changes posted to this account are in units of one of the declared currencies."), and the fact that other implementations have chosen to implement this as a list is an implementation detail. But this mailing list is surely not the best place for such discussion. Could that also be done in your repo?

cheers,
Simon