Virtual time for testing


Christian Worm Mortensen

Jan 28, 2021, 4:15:50 PM
to golang-nuts
Hi!

Suppose I want to unit test this function:

func generator() <-chan int {
  ret := make(chan int)
  go func() {
    for i := 0; i < 10; i++ {
      ret <- i
      time.Sleep(time.Second)
    }
  }()
  return ret
}

What is a good way to do that? One way to do it is like this:

func testGenerator() {
  start := time.Now()
  g := generator()
  for i := 0; i < 10; i++ {
    v := <-g
    if v != i {
      panic("Wrong value")
    }
  }
  elapsed := time.Since(start)
  if elapsed < 9*time.Second || elapsed > 11*time.Second {
    panic("Wrong execution time")
  }
}

However there are several issues with this:

1) The unit test takes a long time to run - 10 seconds.
2) The unit test is fragile to fluctuations in CPU availability
3) The unit test is not very accurate

Of course, this is a simple example. But what if I want to test a complicated piece of code with many goroutines interacting in complicated ways and with long timeouts?

In other programming languages, I have been able to implement a form of virtual time which increases only when all threads are waiting for time to increase. This allows functions like generator above to be tested basically instantly and this has been extremely useful for me in many projects over the years.

Can I do something similar in Go? I would expect I would need to wrap time.Now, time.Sleep and time.After which I will be happy to do.

I can see that Go has a deadlock detector. If it were somehow possible to have Go start a new goroutine when a deadlock is detected, I think it would be pretty straightforward to implement virtual time as described. I could then do something like:

runtime.registerDeadlockCallback(func () {
  // Increase virtual time and by that:
  //  * Make one or more wrapped time.Sleep calls return or 
  //  * Write to one or more channels returned by wrapped time.After.
})

Obviously this would only be needed for test code, not production code.

Thanks,

Christian

Amnon

Jan 28, 2021, 4:33:33 PM
to golang-nuts
Try something like 
github.com/facebookgo/clock

Christian Worm Mortensen

Jan 28, 2021, 5:28:19 PM
to Amnon, golang-nuts
Hi Amnon,

Thank you for your suggestion. I have taken a look at the package, but it does not seem to really work. It seems to rely on runtime.Gosched() to suspend the current goroutine until all other goroutines are blocked. That is, it relies on runtime.Gosched() to provide functionality similar to what I wanted with the fictional runtime.registerDeadlockCallback function.

However, runtime.Gosched() is not documented to do that, and I confirmed with a test that it does not do that in practice either. Here is what I did, for anyone interested. I ran this program, based on the example from https://github.com/facebookarchive/clock:

package main

import (
  "fmt"
  "math/rand"
  "runtime"
  "time"

  "github.com/facebookarchive/clock"
)

func main() {
  mock := clock.NewMock()
  count := 0

  // Kick off a timer to increment every 1 mock second.
  go func() {
    ticker := mock.Ticker(1 * time.Second)
    for {
      <-ticker.C
      count++

      // New code inserted by me to use CPU:
      j := rand.Int()
      for i := 1; i < 10000000; i++ {
        j++
      }
      fmt.Printf("Value: %v\n", j)
    }
  }()
  runtime.Gosched()

  // Move the clock forward 10 seconds.
  mock.Add(10 * time.Second)

  // This prints 10.
  fmt.Println(count)
}

Now, with my modification the program no longer prints 10. For me it printed:

Value: 9999999
Value: 9999999
Value: 9999999
4

Best,

Christian

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/Y9Ccen0uMcs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/d58a4e2a-eead-4ec9-b9a9-c0a43a699e89n%40googlegroups.com.

Jesper Louis Andersen

Jan 29, 2021, 9:45:18 AM
to Christian Worm Mortensen, golang-nuts
On Thu, Jan 28, 2021 at 10:15 PM Christian Worm Mortensen <c...@epust.dk> wrote:
Suppose I want to unit test this function:

func generator() <-chan int {
  ret := make(chan int)
  go func() {
    for i := 0; i < 10; i++ {
      ret <- i
      time.Sleep(time.Second)
    }
  }()
  return ret
}

What is a good way to do that?


Abstract away time.Sleep(time.Second). Set up a channel in lieu of the sleep and use that in the code instead. Because you have `select { ... }`, you can even wait on several different events at once should that become necessary. Some rough notes:

In real code, you often want a `context.Context` inside your goroutines, mainly in order to be able to tear down goroutine networks once they aren't useful anymore. This sets up the code neatly for a channel based approach.

In real code, you often want something to progress based on an event trigger, rather than a sleeping pattern. This can often speed up processing by quite a lot, because you can do something as soon as you are ready to do it.

In real code, there is a chance you can arrange the above code such that your goroutine never sleeps, but the sleep is on the receiver side. This can improve parallelism and also open up the possibility of latency hiding in the communication.

As a general rule: you often want time to be an injected parameter in a system rather than something the system asks for when it's needed. The reason being that you now control time in the test scenario[0]. This often generalizes to communication.

[0] Aside: There's a clear similarity to linear logic here. In a two-agent LL system, a choice can be made either by the test harness or by the system under test. You usually want to move as much choice as possible onto the side of the test harness, so as to minimize the need for extensive mocking in your program. In LL, there are two "or-operators", corresponding to a disjunction for each party.

 

--
J.

Volker Dobler

Jan 29, 2021, 10:11:34 AM
to golang-nuts
One way to do this is to have an internal implementation like
func generatorImpl(sleep func(time.Duration)) <-chan int
and have func generator just call that one with time.Sleep.
Tests are done against generatorImpl, where you now have
detailed control of how much (typically none) time is
actually slept.
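Spelled out, that could look like the sketch below. The generatorImpl signature is the one given above; the fake sleep and the checkGenerator helper are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// generatorImpl is the generator with its sleep function injected.
func generatorImpl(sleep func(time.Duration)) <-chan int {
	ret := make(chan int)
	go func() {
		for i := 0; i < 10; i++ {
			ret <- i
			sleep(time.Second)
		}
	}()
	return ret
}

// generator is the production entry point; it sleeps for real.
func generator() <-chan int {
	return generatorImpl(time.Sleep)
}

// checkGenerator (hypothetical helper) drives generatorImpl with a fake
// sleep that records the requested durations instead of waiting.
func checkGenerator() error {
	slept := make(chan time.Duration, 10) // large enough that the fake never blocks
	g := generatorImpl(func(d time.Duration) { slept <- d })
	for i := 0; i < 10; i++ {
		if v := <-g; v != i {
			return fmt.Errorf("value %d: got %d", i, v)
		}
	}
	// Ten values mean ten sleep requests of one second each.
	for i := 0; i < 10; i++ {
		if d := <-slept; d != time.Second {
			return fmt.Errorf("sleep %d: got %v", i, d)
		}
	}
	return nil
}
```

The check verifies both the values and the requested sleep durations, which is more precise than the 9 to 11 second window of the wall-clock test and finishes immediately.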

Expiration of cookies is tested in that way, see e.g.
So while technically Jar.Cookies is never tested the
risk is basically nil.

V.

mspr...@us.ibm.com

Jan 29, 2021, 3:24:35 PM
to golang-nuts
Volker: injecting sleep is a nice idea, in the general vein that Jesper said of injecting time.  However, as soon as we zoom out a step and need to test both that generator and the goroutine(s) consuming and acting upon that channel activity, we get back to the essence of the original question: how to test when we have a bunch of goroutines doing stuff and the test needs to wait for them all to finish before advancing time?

FYI, in Kubernetes we have done something similar to the Facebook clock package --- but recently we have called out the narrower interface used by code that only reads time.  See PassiveClock in https://github.com/kubernetes/utils/blob/master/clock/clock.go and https://github.com/kubernetes/apimachinery/blob/master/pkg/util/clock/clock.go (yeah, we have two forked lines of development of this clock thing, sigh).

The pattern of using channel activity to coordinate asynchronous activity is inherently inimical to what the original poster asked for.  An alternative is to define clocks that run procedures rather than do channel sends.  See the EventClock in https://github.com/kubernetes/apiserver/blob/master/pkg/util/flowcontrol/fairqueuing/testing/clock/event_clock.go .  A mocked one of those could know when all the timed activities have completed --- if all the timed activities were synchronously contained in EventFuncs.  Sadly this is too restrictive a pattern for a lot of real code.  You will see in that package an additional idea: explicitly tracking (at "user level") when the goroutines in question block/unblock.  This is painful, but I see no better way (given the golang runtime interface as it is defined today).

Regards,
Mike

Christian Worm Mortensen

Jan 30, 2021, 3:13:17 PM
to mspr...@us.ibm.com, golang-nuts
Hi Mike,

Thank you for your consideration. I think you exactly got the essence of my question: How do I wait on all go routines to finish (or be blocked on one or more channels) before advancing time.

A key thing I would like from such a solution is that it does not require too heavy modifications to the code to be tested or put restrictions on how it can do things.

I think it may be possible to solve it with some explicit check in / check out as I think you also suggest. I guess in essence you will check out before you call select and check in again after select is done waiting. I think this will still not work if buffered channels are used. But maybe if buffered channels are mocked, it may be doable.
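The check-in / check-out bookkeeping could be sketched like this. It is a rough illustration with hypothetical names (tracker, recvTracked), not the Kubernetes code, and it shows only the counting. A real implementation would also have to close the window between a goroutine being woken and checking back in, and it does nothing for the buffered-channel case mentioned above.

```go
package main

import "sync"

// tracker is a hypothetical counter of tracked goroutines that are
// currently runnable (checked in) rather than blocked (checked out).
type tracker struct {
	mu     sync.Mutex
	cond   *sync.Cond
	active int
}

func newTracker() *tracker {
	t := &tracker{}
	t.cond = sync.NewCond(&t.mu)
	return t
}

// checkIn marks one goroutine as runnable.
func (t *tracker) checkIn() {
	t.mu.Lock()
	t.active++
	t.mu.Unlock()
}

// checkOut marks one goroutine as blocked and wakes waitIdle callers
// once nothing is runnable any more.
func (t *tracker) checkOut() {
	t.mu.Lock()
	t.active--
	if t.active == 0 {
		t.cond.Broadcast()
	}
	t.mu.Unlock()
}

// waitIdle blocks until no tracked goroutine is runnable. A virtual clock
// would call this before advancing time.
func (t *tracker) waitIdle() {
	t.mu.Lock()
	for t.active > 0 {
		t.cond.Wait()
	}
	t.mu.Unlock()
}

// recvTracked receives from ch, counting the caller as blocked while it waits.
func recvTracked(t *tracker, ch <-chan int) int {
	t.checkOut()
	v := <-ch
	t.checkIn()
	return v
}
```

Every blocking operation in the code under test would have to go through a wrapper like recvTracked, which is the kind of modification to the tested code that this message hopes to avoid.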

I think I may want to make a feature request on this. I see several options:

* Make a version of runtime.Gosched that only returns when no other goroutines can run
* Make it possible to read the number of goroutines that are ready to run. You could then make a loop where you call runtime.Gosched until that value is 0.
* Make it possible to start a special goroutine when the system is deadlocked.

One problem is what to do if the program is waiting on external IO such as the completion of an HTTP request. I guess in an ideal solution it would be possible for the program to decide if it will advance time in that situation or not.

Please let me know if you have any ideas of other things to put into the feature request.

Thanks,

Christian


Christian Worm Mortensen

Jan 31, 2021, 4:14:23 AM
to mspr...@us.ibm.com, golang-nuts
I ended up creating an issue on this: https://github.com/golang/go/issues/44026

roger peppe

Feb 1, 2021, 5:54:58 AM
to Christian Worm Mortensen, mspr...@us.ibm.com, golang-nuts
On Sat, 30 Jan 2021 at 20:12, Christian Worm Mortensen <c...@epust.dk> wrote:
Hi Mike,

Thank you for your consideration. I think you exactly got the essence of my question: How do I wait on all go routines to finish (or be blocked on one or more channels) before advancing time.

This is an interesting problem that's not currently solved.

In the past, I've made a lot of use of this package: https://pkg.go.dev/github.com/rogpeppe/clock
It was originally developed as part of Canonical's Juju project.

To wait for goroutines to finish, you can use the WaitAdvance method, which waits for at least n goroutines to block
on the clock before advancing time. This relies, of course, on all the code under test using the Clock interface,
but that's not usually that hard to arrange.

There are a couple of deeper problems with this particular approach though:

- In order to use WaitAdvance, you need to know the total number of goroutines involved, but this is implementation-dependent, so someone can break tests by making an apparently innocuous change that happens to change the goroutine count.

- It's still easy to make mistakes. It's easy to assume that when the goroutines are blocked, the state that you're trying to observe is static, but there may well be other goroutines still running that have previously been triggered. This means that one can end up polling state anyway if you're trying to test behaviour of an independent "agent" goroutine.

In the end, I've largely given up on this fake clock approach in favour of testing with real time (on the order of tens of milliseconds, not seconds) and polling to wait for externally visible changes. This approach isn't ideal either: if you make the time intervals too short, your tests will be flaky; too long and you're waiting too long for tests to run. But at least the tests aren't relying on details of the internal implementation.

I'd love to see a way of fixing this in the standard Go runtime, but it's not easy. Goroutines can be blocked in system calls (e.g. making an HTTP call),
while still making progress, so just "wait for everything to be blocked before advancing the clock" isn't a sufficient heuristic.
Also I'm not sure that a single clock is good enough because you might well want to be able to time out your tests even as you're faking out
the clock for the code being tested.

  cheers,
    rog.



Christian Worm Mortensen

Feb 2, 2021, 1:30:24 PM
to roger peppe, mspr...@us.ibm.com, golang-nuts
Hi Roger,

Thank you for sharing how you have solved the problem in the past and the problems you have had. As I see it, my proposal would solve your problem perfectly in many cases without the need to keep track of anything. If you like it, it may be helpful to express your support: https://github.com/golang/go/issues/44026

Thanks,

Christian