to get full deterministic simulation testing in Go: use another goroutine

Jason E. Aten

Oct 12, 2025, 8:33:06 PM
to golang-nuts
It sounds like a paradox.

Did adding another goroutine really make my testing/synctest 
based network simulation fully deterministic, suitable for DST?

Yep. 

On the fifth rewrite, I finally discovered the fundamental
way to leverage the testing/synctest package and get 
a fully deterministic network simulation.

The trick, now implemented and available in the latest
release of my network package and its simulation engine


is to use one additional goroutine to accept and queue all channel operations
"in the background".

Don't try to interleave synctest.Wait with select 
and channel operations on the same goroutine. 

It's too much of a mess. More importantly, it didn't work.

It was incredibly hard to get determinism out of it. I tried four 
different ways that did not work. They would look like they were
going to work, but then under load testing I would get straggling
requests that missed their previous batch. This created 
non-determinism, aka a non-reproducible simulation. 

That's not good. We want the determinism of DST so that any bug 
we find in our distributed system is instantly reproducible. 
If DST is a new idea, this is a great motivating conversation[1].

Instead of mixing client requests over channels with sleep/synctest.Wait
logic directly, what you want to do is: buffer all client goroutine 
channel requests into a master event queue (MEQ) on a separate goroutine that runs 
completely independently of the main scheduler goroutine (the
one that will sleep and call synctest.Wait).

Let that background accumulator goroutine be the one 
with your big for/select loop to service client requests. 

Those requests that used to go directly to 
the scheduler goroutine now all get queued, and then handled in
one batch once the scheduling time quantum ends.

The scheduler simply sleeps for its time quantum, invokes the barrier synctest.Wait(),
and then locks and reads out the accumulated events from the MEQ. It then
unlocks the MEQ so the background goroutine will have access when
the scheduler restarts the clock (with its next sleep). 

The scheduler sorts the accumulated batch of events using deterministic sorting
criteria, dispatches them (matching sends and reads and firing timers
in the network), and then deterministically orders any newly available replies.

And voila: deterministic simulation testing (DST) of network operations in Go.

Enjoy.

Jason

[1] "FoundationDB: from idea to Apple acquisition"
Dave Scherer, CTO of FoundationDB and Antithesis, 
really motivates why they invented DST. In short, it's
crazy difficult to test distributed systems well in any other way.