announcing gosim: simulation testing for Go

Jelle van den Hooff

Dec 10, 2024, 12:42:02 PM
to golan...@googlegroups.com
Hi golang-nuts,

I am excited to share Gosim: simulation testing for Go (https://github.com/jellevandenhooff/gosim). Gosim is a project I have been working on for quite a while that aims to make testing distributed systems easier. It implements simulation testing as popularized by FoundationDB (https://www.youtube.com/watch?v=4fFDFbi3toc).

Gosim runs mostly-standard Go code in its simulated environment. It supports standard packages like `os` and `net`, as well as gRPC, protobuf, and more; the largest real-world program I have successfully run is etcd. Inside the simulation, Gosim implements fake time, network, disks, and machines. Tests can manipulate the network to, e.g., partition a host, or restart a machine, and verify that code still behaves as it should -- all without needing to manage real VMs or containers.

Gosim works by source-translating Go code, replacing all references to concurrency primitives, the operating system, and non-deterministic code with calls to its own runtime. So `go foo()` becomes `gosimruntime.Go(foo)`, etc. Then, Gosim implements (a subset of) the Linux system call interface to simulate disk and network. More details on the design are in https://github.com/jellevandenhooff/gosim/blob/main/docs/design.md. Gosim's system call implementations are (currently) in https://github.com/jellevandenhooff/gosim/blob/main/internal/simulation/os_linux.go.
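To make the idea concrete, here is a toy sketch of why routing goroutine creation through a function call gives a runtime control over scheduling. Everything in it (`toyRuntime`, `order`) is hypothetical and is not Gosim's `gosimruntime` package: the tasks run to completion and never block, which a real simulation runtime has to handle.

```go
package main

import (
	"fmt"
	"math/rand"
)

// toyRuntime is a hypothetical stand-in for a simulation runtime. It is NOT
// Gosim's gosimruntime; it only shows that once `go f()` is rewritten into a
// call like rt.Go(f), the runtime decides what runs next.
type toyRuntime struct {
	rng   *rand.Rand
	ready []func()
}

func newToyRuntime(seed int64) *toyRuntime {
	return &toyRuntime{rng: rand.New(rand.NewSource(seed))}
}

// Go queues f instead of starting a real goroutine.
func (r *toyRuntime) Go(f func()) {
	r.ready = append(r.ready, f)
}

// Run executes queued tasks one at a time. The next task is picked with the
// seeded generator, so the order depends only on the seed.
func (r *toyRuntime) Run() {
	for len(r.ready) > 0 {
		i := r.rng.Intn(len(r.ready))
		f := r.ready[i]
		r.ready = append(r.ready[:i], r.ready[i+1:]...)
		f()
	}
}

// order runs three tasks under the given seed and reports the order in which
// they ran.
func order(seed int64) []string {
	rt := newToyRuntime(seed)
	var got []string
	for _, name := range []string{"a", "b", "c"} {
		name := name
		rt.Go(func() { got = append(got, name) })
	}
	rt.Run()
	return got
}

func main() {
	fmt.Println(order(1))
	fmt.Println(order(1)) // same seed, same order as the line above
	fmt.Println(order(2)) // a different seed may pick a different order
}
```

Running the same seed twice yields the same interleaving, which is the property that makes a failing run reproducible.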

To give you a taste of the kinds of tests you can write with Gosim, below is a snippet of a test running etcd (taken from https://github.com/jellevandenhooff/gosim/blob/main/examples/etcd/etcd_test.go). The test creates several Gosim machines that have their own network stack, disk, global variables, and more, and lets them run and communicate. From the point of view of the code, each etcd instance runs on its own machine and is its own independent process. The simulation, however, runs all machines in the same Go process so that you can easily debug what happens, the test is reproducible, and overhead is low.

I have tried to make Gosim easy to use. To get started you can run a test by replacing `go test ...` with `gosim test`. If Gosim might be useful for you, I would be happy to chat and prioritize future features. Some things I would certainly like to add are support for running main() functions; simulating clock drift; support for running different versions of code; and built-in simulation of common cloud APIs like S3.

Gosim is experimental, so it will change and break, and it only runs Go code. It can test systems that are written in Go, but it will not work with external dependencies. I have some ideas on using, e.g., Wazero to run SQLite or Postgres inside of the Go process, but those are, well, still ideas.

Jelle

// TestEtcd runs a 3 node etcd cluster, partitions the network between the
// nodes, and makes sure key-value puts and gets work.
func TestEtcd(t *testing.T) {
	gosim.SetSimulationTimeout(2 * time.Minute)

	// run machines:
	gosim.NewMachine(gosim.MachineConfig{
		Label: "etcd-1",
		Addr:  netip.MustParseAddr("10.0.0.1"),
		MainFunc: func() {
			runEtcdNode("etcd-1", "10.0.0.1")
		},
	})
	gosim.NewMachine(gosim.MachineConfig{
		Label: "etcd-2",
		Addr:  netip.MustParseAddr("10.0.0.2"),
		MainFunc: func() {
			time.Sleep(100 * time.Millisecond)
			runEtcdNode("etcd-2", "10.0.0.2")
		},
	})
	gosim.NewMachine(gosim.MachineConfig{
		Label: "etcd-3",
		Addr:  netip.MustParseAddr("10.0.0.3"),
		MainFunc: func() {
			time.Sleep(200 * time.Millisecond)
			runEtcdNode("etcd-3", "10.0.0.3")
		},
	})

	// mess with the network in the background
	go nemesis.Sequence(
		nemesis.Sleep{
			Duration: 10 * time.Second,
		},
		nemesis.PartitionMachines{
			Addresses: []string{
				"10.0.0.1",
				"10.0.0.2",
				"10.0.0.3",
			},
			Duration: 30 * time.Second,
		},
	).Run()
 

Jason E. Aten

Dec 10, 2024, 4:20:34 PM
to golang-nuts
Jelle, gosim looks awesome and very cool. Also, thanks for the link to the interesting talk.

How, in gosim, would you set a different seed to change the behavior of things driven by the pseudo-random number generator, such as map iteration, goroutine scheduling, and Go select statements? It seems that trying a bunch of different random seeds would be an essential part of a test run. This might be worth putting in the up-front examples.

Jason

roger peppe

Dec 10, 2024, 5:53:22 PM
to Jelle van den Hooff, golan...@googlegroups.com
Impressive stuff! Some potentially interesting overlap with the new "synctest" package. Do you have any thoughts on that?



Jelle van den Hooff

Dec 10, 2024, 6:41:13 PM
to golang-nuts
Hi Jason, thanks for your kind words and the helpful suggestion. You are right, not having seeds in the examples was a big omission! I've added them in https://github.com/jellevandenhooff/gosim/commit/6960c38670862405b71ee7f6c06851fa2d19566a:

go run github.com/jellevandenhooff/gosim/cmd/gosim test -v -seeds=1-3 -run=TestGosim .
=== RUN TestGosim (seed 1)
    1 main/4 14:10:03.000 INF examples/simple_test.go:12 > Are we in the Matrix? true method=t.Logf
    2 main/4 14:10:03.000 INF examples/simple_test.go:13 > Random: 811966193383742320 method=t.Logf
--- PASS: TestGosim (0.00s simulated)
=== RUN TestGosim (seed 2)
    1 main/4 14:10:03.000 INF examples/simple_test.go:12 > Are we in the Matrix? true method=t.Logf
    2 main/4 14:10:03.000 INF examples/simple_test.go:13 > Random: 5374891573232646577 method=t.Logf
--- PASS: TestGosim (0.00s simulated)
=== RUN TestGosim (seed 3)
    1 main/4 14:10:03.000 INF examples/simple_test.go:12 > Are we in the Matrix? true method=t.Logf
    2 main/4 14:10:03.000 INF examples/simple_test.go:13 > Random: 3226404213937589817 method=t.Logf
--- PASS: TestGosim (0.00s simulated)
ok translated/github.com/jellevandenhooff/gosim/examples 0.254s


On Tuesday, December 10, 2024 at 13:20:34 UTC-8, Jason E. Aten wrote:

Jelle van den Hooff

Dec 10, 2024, 6:41:15 PM
to golang-nuts
Hi Roger, thanks for the compliment.

Yes, there is quite a bit of overlap with the new "testing/synctest" package. I think the tests you can write with synctest you can also write with Gosim, because Gosim's scheduler does what synctest does: when all goroutines are blocked, synctest and Gosim both advance an internal clock, so tests that would take a long wall-clock time can be fast in both.
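As a rough illustration of that clock-advancing idea, here is a minimal sketch of a simulated clock that jumps straight to the next pending timer instead of waiting. The names (`simClock`, `After`, `simulate`) are made up for this example; it uses plain callbacks rather than blocked goroutines and does not reflect how synctest or Gosim are implemented.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// timer is a pending wake-up at a point in simulated time.
type timer struct {
	at   time.Duration
	seq  int // insertion order, used to break ties deterministically
	wake func()
}

// timerHeap is a min-heap of timers ordered by (at, seq).
type timerHeap []timer

func (h timerHeap) Len() int { return len(h) }
func (h timerHeap) Less(i, j int) bool {
	if h[i].at != h[j].at {
		return h[i].at < h[j].at
	}
	return h[i].seq < h[j].seq
}
func (h timerHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *timerHeap) Push(x any)   { *h = append(*h, x.(timer)) }
func (h *timerHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// simClock is a hypothetical simulated clock, not an API of synctest or Gosim.
type simClock struct {
	now    time.Duration
	seq    int
	timers timerHeap
}

// After schedules wake to run d after the current simulated time.
func (c *simClock) After(d time.Duration, wake func()) {
	c.seq++
	heap.Push(&c.timers, timer{at: c.now + d, seq: c.seq, wake: wake})
}

// Run models "nothing is runnable": it repeatedly jumps the clock to the
// earliest pending timer and fires it, without any real sleeping.
func (c *simClock) Run() {
	for c.timers.Len() > 0 {
		t := heap.Pop(&c.timers).(timer)
		c.now = t.at
		t.wake()
	}
}

// simulate schedules a few events, one of them an hour out, and returns the
// final simulated time and the order in which events fired.
func simulate() (time.Duration, []string) {
	c := &simClock{}
	var events []string
	c.After(30*time.Second, func() { events = append(events, "partition healed") })
	c.After(10*time.Second, func() {
		events = append(events, "partition started")
		c.After(time.Hour, func() { events = append(events, "lease expired") })
	})
	c.Run()
	return c.now, events
}

func main() {
	end, events := simulate()
	fmt.Println(end, events) // simulated time reaches 1h0m10s almost instantly
}
```

The hour-long timer costs no wall-clock time because the clock is only a number that the loop sets to the next deadline.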

I think synctest is an interesting point in the design space. In Go tests you can use interfaces to mock the OS, the network, etc., but time and scheduling are impossible to mock because you don't know when goroutines are blocked. Synctest fixes that, and once you have synctest, you can test almost all the same scenarios as in Gosim _if_ you mock all interactions with the OS and avoid using any shared global state.

The trade-off is where the complexity is: With mocks and synctest you do not need significant changes to the runtime, but none of your code (or your dependencies) can use standard OS calls. With Gosim, the program under test does not need to change, but you rely on a more complicated mocking and rewriting mechanism. Practically this means Gosim can test programs using Bolt (https://pkg.go.dev/go.etcd.io/bbolt, https://github.com/jellevandenhooff/gosim/blob/main/examples/bolt/bolt_test.go) and test how Bolt behaves when a machine restarts without having to change any of the code in Bolt.

You could perhaps reuse the underlying mocks (for a network that drops packets, etc.) between Gosim and synctest. However, Gosim currently integrates at the syscall layer, so the interface it exposes is quite different from the high-level mocks you would need to replace os.File, net.Conn, etc. In an earlier version of Gosim I tried mocking those higher-level interfaces, but I found it quite difficult: the API surface is broad and not nearly as well-defined as POSIX. Simulating that API accurately is important for testing error handlers that match error types returned by a net.Conn.
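A small sketch of the kind of error handler this refers to, assuming a handler that branches on the wrapped errno. The functions `classify`, `resetErr`, and `flatten` are hypothetical; the hand-built error only approximates the chain a real TCP read can return (`*net.OpError` wrapping `*os.SyscallError` wrapping a `syscall.Errno`).

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// classify is a hypothetical handler that decides what to do with an error
// from a net.Conn by inspecting the wrapped error chain.
func classify(err error) string {
	var netErr net.Error
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, syscall.ECONNRESET):
		return "reconnect"
	case errors.As(err, &netErr) && netErr.Timeout():
		return "retry"
	default:
		return "fail"
	}
}

// resetErr builds, by hand, an error with the same wrapping structure as a
// connection reset seen by a TCP read.
func resetErr() error {
	return &net.OpError{
		Op:  "read",
		Net: "tcp",
		Err: os.NewSyscallError("read", syscall.ECONNRESET),
	}
}

// flatten mimics an inaccurate mock that keeps only the error message and
// drops the wrapped types.
func flatten(err error) error {
	return errors.New(err.Error())
}

func main() {
	fmt.Println(classify(resetErr()))          // reconnect
	fmt.Println(classify(flatten(resetErr()))) // fail
}
```

Both errors print the same message, but only the first takes the reconnect path, which is why a simulation has to reproduce error types and not just error text.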

Gosim also adds determinism (running the same test twice results in the same output), which is helpful if you are trying to debug rare failures. You can imagine future Antithesis-like tricks to test behavior: run with the same seed up to an interesting simulated time, and then change the seed. I think adding that to synctest would be quite difficult.

This blog post, https://www.polarsignals.com/blog/posts/2024/05/28/mostly-dst-in-go, describes yet another approach: running Go with the -faketime flag (used on the Go playground) inside of wasm. That gives deterministic execution and standard OS calls by interposing at the wasm-syscall boundary, which means the program needs to build under wasm.

Jelle
On Tuesday, December 10, 2024 at 14:53:22 UTC-8, roger peppe wrote:

Jason E. Aten

Aug 27, 2025, 1:41:06 PM
to golang-nuts
Hi Jelle,

Gosim is all the more impressive now that I've tried my hand at writing 
tests with synctest. It is indeed very, very hard to get strict determinism out
of the Go runtime. The fact that Gosim emulates Linux at the system
call level is beyond impressive. I fully get now why Gosim has to
translate all Go source to use the Gosim deterministic runtime. 

Gosim is obviously the result of a lot of hard and painstaking work.
Thank you, and congratulations on getting it to this point.

I'm able to run gosim test when built under go1.23.5, but the later Go 1.24 and 1.25
releases seem to have difficulties. I think this is because of the linkname restrictions
(or changes?) that make some things inaccessible.

~/go/src/github.com/jellevandenhooff/gosim/examples/etcd (main) $ gosim test -v
2025/08/27 12:28:46 ERROR missing function body pkg=internal/runtime/sys name=GetCallerPC

Does gosim strictly need linkname magic? Is there some approach to fixing Gosim to work with either of the last two Go versions, given the new linkname restrictions and/or updates? Have you been able to make Gosim work with Go 1.25, for instance?

Thanks!

Jason

