We used Hypothesis to create an experimental project to shorten the verification gap in AI-generated code


Lokhesh Ujhoodha

Feb 27, 2026, 11:35:30 AM
to Hypothesis users
Hey all,

I just wanted to share an experimental project I made to shorten the verification gap in AI-generated code using Hypothesis. It is meant to be a coding-agent-first library and skill: https://github.com/kurrent-io/poes/

We use Hypothesis for:
1. Strategy-based state generation — generates arbitrary aggregate states from user-declared field strategies, testing transitions from any valid state, not just reachable ones.
2. Property testing — for every (transition, invariant) pair, Hypothesis runs hundreds of random examples to verify that invariants and postconditions are preserved.
3. BFS parameter sampling — samples representative parameter combos to expand parametric transitions into concrete ones for exhaustive state exploration.
4. Custom FrozenMap.strategy() — built on st.dictionaries to generate hashable, immutable maps for map-based aggregate state.
5. RuleBasedStateMachine bridge — converts POES checks into Hypothesis stateful tests with @rule, @precondition, and @invariant decorators.

To test the product, go to your coding agent and say: "Use https://github.com/kurrent-io/poes/ to build and verify {prompt or spec} and then provide me with a proof-of-work (this is deterministic)"

This is especially useful in long-horizon tasks because it helps the coding agent anchor around important rules, and verification tells it whether it has broken any previously written rules.

The idea sits between formal verification and coding. We propose that the nature of event sourcing allows you to formulate code as pure functions, i.e. the same input to the same function over the same state should always produce the same output. Immutability and versioning of the state are important; without them, the above property wouldn't hold.
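As a hedged sketch of that purity property (names here are illustrative, not POES's API): a pure `apply_event` function over an immutable, versioned state means replaying the same event log must always rebuild the same aggregate, which Hypothesis can check directly:

```python
from dataclasses import dataclass, replace

from hypothesis import given, strategies as st


@dataclass(frozen=True)  # immutability: state is never mutated in place
class Account:
    balance: int
    version: int


def apply_event(state: Account, amount: int) -> Account:
    # Pure transition: the same (state, event) always yields the same new state.
    return replace(state, balance=state.balance + amount, version=state.version + 1)


def replay(events: list[int]) -> Account:
    state = Account(balance=0, version=0)
    for e in events:
        state = apply_event(state, e)
    return state


@given(st.lists(st.integers(min_value=-100, max_value=100)))
def test_replay_is_deterministic(events):
    # Replaying the same log twice must produce identical states.
    assert replay(events) == replay(events)
```

Calling `test_replay_is_deterministic()` directly (or via pytest) exercises the property over many generated event logs.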

When you extend that, you can use the framework to prove that every valid input produces a valid output under certain rules, and further to validate the state transitions themselves (bounded verification, and a weak proof for unbounded domains) by exploring all of them.
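A minimal sketch of what such bounded exploration looks like (my own toy domain and bounds, in the TLC spirit mentioned below): breadth-first search over every reachable state of a small aggregate, asserting the invariant in each one:

```python
from collections import deque

LIMIT = 10  # bound on the domain, keeping the state space finite


def transitions(value: int):
    # Every transition enabled in a given state: deposit or withdraw 1..3,
    # guarded so the invariant 0 <= value <= LIMIT is preserved.
    for amount in range(1, 4):
        if value + amount <= LIMIT:
            yield ("deposit", value + amount)
        if value - amount >= 0:
            yield ("withdraw", value - amount)


def explore(initial: int = 0) -> set[int]:
    """BFS over all reachable states, checking the invariant at each one."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        assert 0 <= state <= LIMIT, f"invariant violated in state {state}"
        for _name, nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


states = explore()
```

If any transition could produce an invalid state, the assertion fires with a concrete counterexample state; if the search completes, every reachable state within the bound is verified.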

The limitations:

- Exploring all the states is hard, so this can only prove smaller domains; otherwise you have to tell the agent to either break the state into small parts or sample the state exploration. We have taken inspiration from the way TLA+ specifications are verified by TLC.

- It doesn't test what you don't specify, but you can keep iterating: the coding agent knows how to keep adding and removing things using the framework.

It provides bounded guarantees rather than complete formal verification: it confirms that no rule violations occur within the tested domain, but cannot prove exhaustive correctness. Additionally, this approach only covers closed environments where inputs are fully controlled; for open environments (production) you can persist to KurrentDB using the framework.

If you are interested you can read more about it at: https://www.kurrent.io/blog/proof-oriented-event-sourcing/ 

And if you can, I would really like your feedback on the product too.

Cheers!
Lokhesh