10ms delay before receiving message from closed buffered channel

242 views
Skip to first unread message

Yicheng Qin

unread,
Jun 24, 2015, 5:27:51 PM6/24/15
to golang-nuts, Xiang Li
Hi all,

We observe at least 10ms delay before receiving message from closed buffered channel. We met this when we run some code basically like this:

```
chs := make([]chan struct{}, 10000)
// init
for i := range chs {
  chs[i] = make(chan struct{}, 1)
}
// close
for _, ch := range chs {
  close(ch)
}
// receive
for _, ch := range chs {
  select {
  case <-ch:
  case <-time.After(10 * time.Millisecond):
    t.Fatalf("")
  }
}
```

The test environment is under --cpu 4 --race on travis(2 bursted cpu) when running `go test` for the whole test set. After I adjust the timeout to 1s, I didn't observe this behavior anymore. Is this behavior expected?

I check the spec, which says that `After calling close, and after any previously sent values have been received, receive operations will return the zero value for the channel's type without blocking.`

Brad Fitzpatrick

unread,
Jun 24, 2015, 10:01:47 PM6/24/15
to Yicheng Qin, golang-nuts, Xiang Li
OS? Arch? Go version?

I assume linux-amd64 from coreos.com and -race. But which Go version?

I wonder if you're just generating lots of garbage and causing the GC to run, making that time.After channel 10000 times.



--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yicheng Qin

unread,
Jun 25, 2015, 12:20:11 PM6/25/15
to Brad Fitzpatrick, golang-nuts, Xiang Li
`uname -a`: Linux testing-worker-linux-docker-545af3bd-3302-linux-3 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

`lsb release -a`: Description: Ubuntu 12.04.5 LTS

`go version`: go version go1.4 linux/amd64

the test is run under `--race --cpu 1,2,4`

I added the trace flag and got this:
```
2015/06/25 15:58:49 TestWaitTestStress start
gc3(1): 2+0+140+1 us, 0 -> 0 MB, 338 (392-54) objects, 6 goroutines, 32/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc4(1): 1+0+132+0 us, 0 -> 0 MB, 678 (877-199) objects, 6 goroutines, 41/23/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc5(1): 1+0+141+0 us, 0 -> 0 MB, 1135 (1583-448) objects, 6 goroutines, 42/34/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc6(1): 1+0+162+0 us, 0 -> 0 MB, 1786 (2593-807) objects, 6 goroutines, 49/34/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc7(1): 1+0+232+0 us, 0 -> 0 MB, 2548 (3880-1332) objects, 6 goroutines, 59/42/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc8(1): 1+1+365+0 us, 0 -> 0 MB, 5270 (7240-1970) objects, 6 goroutines, 88/52/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc9(1): 2+1+362+1 us, 0 -> 1 MB, 10159 (13796-3637) objects, 6 goroutines, 140/80/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc10(1): 2+1+568+0 us, 0 -> 1 MB, 12398 (19316-6918) objects, 6 goroutines, 168/133/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
--- FAIL: TestWaitTestStress (0.20s)
wait_time_test.go:67: cannot receive from ch as expected
```

If any channel failed to receive the response in 10ms, the test fails.

Russ Cox

unread,
Jun 25, 2015, 1:11:32 PM6/25/15
to Yicheng Qin, Brad Fitzpatrick, golang-nuts, Xiang Li
In this code:


  select {
  case <-ch:
  case <-time.After(10 * time.Millisecond):
    t.Fatalf("")
  }

If the second case triggers, it does not imply that the channel is not closed (obviously it is closed, since you closed it before executing the select). It means that in the time between time.After setting the timer for its channel and the time that select looked at the channel, the timer went off. As Brad speculated, it could be garbage collection. It could also be that your process was interrupted by the operating system so that some other process could run. There are many possible reasons. I would not worry about it happening once in a while. It might be more interesting to check how many times it happens out of the 10000.

Russ

Yicheng Qin

unread,
Jun 25, 2015, 1:34:44 PM6/25/15
to Russ Cox, Brad Fitzpatrick, golang-nuts, Xiang Li
Got it. So i will make it wait longer to pass every time.

The long delay starts to happen when we set --cpu > 1. It happens once per 100 tests, so the probability is 1 out of 1 million times.

It confused me most due to its order. When it comes to the select statement, the `ch` has been closed, and `time.After` is not called. But it goes into the `time.After` case finally. I guess its internal implementation is to 1. execute `time.After` 2. wait for two channels 3. select one from ready channels randomly.

Brad Fitzpatrick

unread,
Jun 25, 2015, 4:28:35 PM6/25/15
to Yicheng Qin, Russ Cox, golang-nuts, Xiang Li
The time.After _is_ evaluated each time.

From http://golang.org/ref/spec#Select_statements :

`
Execution of a "select" statement proceeds in several steps:

1) For all the cases in the statement, the channel operands of receive operations and the channel and right-hand-side expressions of send statements are evaluated exactly once, in source order, upon entering the "select" statement. 
...
`


Yicheng Qin

unread,
Jun 25, 2015, 6:13:26 PM6/25/15
to Brad Fitzpatrick, Russ Cox, golang-nuts, Xiang Li
Exactly what I need to know.
Thanks for the help to track down this!
Reply all
Reply to author
Forward
0 new messages