It sounds like a paradox.
Did adding another goroutine really make my testing/synctest-based
network simulation fully deterministic, suitable for DST?
Yep.
On the fifth rewrite, I finally discovered the fundamental
way to leverage the testing/synctest package and get
a fully deterministic network simulation.
The trick, now implemented and available in the latest
release of my network package and its simulation engine,
is to use one additional goroutine to accept and queue all channel operations
"in the background".
Don't try to interleave synctest.Wait with select
and channel operations on the same goroutine.
It's too much of a mess. More importantly, it didn't work.
It was incredibly hard to get determinism out of it. I tried four
different ways that did not work. They would look like they were
going to work, but then under load testing I would get straggling
requests that arrived too late for the previous batch. This created
non-determinism, that is, a non-reproducible simulation.
That's not good. We want the determinism of DST so that any bug
we find in our distributed system is instantly reproducible.
If DST is a new idea to you, this is a great motivating conversation [1].
Instead of mixing client requests over channels with sleep/synctest.Wait
logic directly, what you want to do is: buffer all client goroutine
channel requests into a master event queue (MEQ) on a separate goroutine that runs
completely independently of the main scheduler goroutine (the
one that will sleep and call synctest.Wait).
Let that background accumulator goroutine be the one
with your big for/select loop to service client requests.
Those requests that used to go directly to
the scheduler goroutine now all get queued, and then handled in
one batch once the scheduling time quantum ends.
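
To make the accumulator idea concrete, here is a minimal sketch. It is not the
code from the network package; the names Event, MEQ, Accumulate and Drain are
hypothetical and chosen only for illustration. The point it shows is that the
background goroutine does nothing but queue requests under a mutex, and never
sleeps or calls synctest.Wait itself.

```go
package main

import "sync"

// Event is a hypothetical client request. The fields are illustrative only.
type Event struct {
	Client string // name of the requesting client goroutine
	Seq    int    // per-client sequence number
}

// MEQ is a mutex-protected master event queue (hypothetical layout).
type MEQ struct {
	mu     sync.Mutex
	events []Event
}

// Accumulate is the body of the background goroutine: a for/select loop
// that only appends incoming requests to the queue. It never sleeps and
// never calls synctest.Wait; that is left to the scheduler goroutine.
func (q *MEQ) Accumulate(requests <-chan Event, halt <-chan struct{}) {
	for {
		select {
		case ev := <-requests:
			q.mu.Lock()
			q.events = append(q.events, ev)
			q.mu.Unlock()
		case <-halt:
			return
		}
	}
}

// Drain locks the queue, takes everything accumulated so far, and unlocks.
// The scheduler would call this after its sleep and its synctest.Wait()
// barrier, when the accumulator is blocked in its select.
func (q *MEQ) Drain() []Event {
	q.mu.Lock()
	defer q.mu.Unlock()
	batch := q.events
	q.events = nil
	return batch
}
```

Client goroutines send on the requests channel exactly as they would have sent
to the scheduler before; only the receiver has changed.
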
The scheduler simply sleeps for its time quantum, invokes the barrier synctest.Wait(),
then locks the MEQ and reads out the accumulated events, and
then unlocks the MEQ so the background goroutine has access when
the scheduler restarts the clock (with its next sleep).
The scheduler sorts the accumulated batch of events using deterministic sorting
criteria, dispatches them (matching sends and reads and firing timers
in the network), and then deterministically orders any newly available replies.
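
The sorting step can be sketched as follows. This is an assumption about what
"deterministic sorting criteria" might look like, not the criteria the
simulation engine actually uses; the Op type, its fields, and SortBatch are
hypothetical. What matters is that the sort key is a total order that depends
only on the contents of each event, never on the order in which events
happened to arrive in the queue.

```go
package main

import "sort"

// Op is a hypothetical queued operation. The fields are illustrative only.
type Op struct {
	Due    int64  // simulated time (ns) at which the operation takes effect
	Client string // name of the originating client goroutine
	Seq    int    // per-client sequence number
}

// SortBatch orders a drained batch by (Due, Client, Seq). Assuming no two
// operations share all three fields, this is a total order, so the result
// is the same no matter how the goroutines happened to interleave while
// the batch was being accumulated.
func SortBatch(batch []Op) {
	sort.Slice(batch, func(i, j int) bool {
		a, b := batch[i], batch[j]
		if a.Due != b.Due {
			return a.Due < b.Due
		}
		if a.Client != b.Client {
			return a.Client < b.Client
		}
		return a.Seq < b.Seq
	})
}
```

Two runs that accumulate the same operations in different arrival orders come
out of SortBatch in the same order, which is what makes the later dispatch
step reproducible.
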
And voila: deterministic simulation testing (DST) of network operations in Go.
Enjoy.
Jason

[1] "FoundationDB: from idea to Apple acquisition"
Dave Scherer, CTO of FoundationDB and Antithesis,
really motivates why they invented DST. In short, it's
crazy difficult to test distributed systems well in any other way.