Converting internal runtime APIs to use deadlines instead of timeouts?

Matthew Dempsky


Sep 11, 2014, 1:19:13 PM

to golan...@googlegroups.com

Consider this sequence of function calls in package runtime:

- timerproc() calculates now := nanotime(), runs any expired timer callbacks, calculates delta := t.when - now for the earliest unexpired timer, and then calls notetsleepg(..., delta)

- notetsleepg() (actually notetsleep_internal()) calculates deadline = nanotime() + ns, then on each iteration calls semasleep(ns) or futexsleep(..., ns) and afterwards recalculates ns = deadline - nanotime()

- semasleep() on {openbsd,netbsd} must compute deadline = nanotime() + ns yet again and pass that deadline to the low-level sleep syscall, whereas other OSes seem to pass the relative timeout ns to the OS directly
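A minimal sketch of the drift in this call chain, assuming a fake clock in place of nanotime(). The names fakeClock, timerprocRelative, and sleepRelative are hypothetical and only model the arithmetic; they are not runtime code. Because delta is computed from the now read before the callbacks run, but notetsleep_internal() adds it to a fresh clock read, the sleep ends up targeting t.when plus the time spent in callbacks:

```go
package main

import "fmt"

// fakeClock is a hypothetical stand-in for runtime.nanotime(); it only
// advances when the simulation says time has passed.
type fakeClock struct{ now int64 }

func (c *fakeClock) nanotime() int64 { return c.now }

// sleepRelative models notetsleep_internal(): it receives a relative
// timeout and rebuilds a deadline from a fresh clock read. It returns
// the absolute time the sleep actually targets.
func sleepRelative(c *fakeClock, ns int64) int64 {
	return c.nanotime() + ns // deadline = nanotime() + ns
}

// timerprocRelative models timerproc(): read the clock, run expired
// callbacks (which take callbackNs), then compute delta from the now
// that was read before the callbacks ran.
func timerprocRelative(c *fakeClock, when, callbackNs int64) int64 {
	now := c.nanotime()
	c.now += callbackNs // time spent in expired timer callbacks
	delta := when - now // stale: ignores the callback time
	return sleepRelative(c, delta)
}

func main() {
	c := &fakeClock{now: 1000}
	const when = 5000 // the next timer's t.when
	target := timerprocRelative(c, when, 300)
	fmt.Println("timer due at", when, "sleep targets", target, "drift", target-when)
	// prints "timer due at 5000 sleep targets 5300 drift 300"
}
```

In this model the drift equals the callback time exactly; in the real runtime it would also include any scheduling delay between the two nanotime() reads.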

It seems that all OSes are potentially affected by some deadline drift (e.g., if running timer callbacks takes a non-trivial amount of time, since delta is computed from the earlier now but turned back into a deadline using a later nanotime()), and on openbsd and netbsd in particular most of the nanotime() calls are unnecessary.

Would it make sense to convert this chain of calls to use absolute deadlines throughout, and push the responsibility for calculating deadline - nanotime() (on OSes that need a relative timeout) down into {futex,sema}sleep() where it can also minimize deadline drift? If so, is that a change that's still appropriate for 1.4, or should I file an issue and wait for the next release cycle?
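One way the proposal might look, again as a hypothetical sketch with a counting fake clock rather than actual runtime code: timerproc passes t.when down unchanged, and only the lowest layer converts it to a relative timeout when the OS requires one. The names countingClock, sleepUntil, and timerprocDeadline are illustrative, and the osTakesDeadline flag is an assumption standing in for the openbsd/netbsd case, where the sleep syscall accepts a deadline.

```go
package main

import "fmt"

// countingClock is a hypothetical stand-in for runtime.nanotime() that
// counts how often the clock is read.
type countingClock struct {
	now   int64
	reads int
}

func (c *countingClock) nanotime() int64 {
	c.reads++
	return c.now
}

// sleepUntil models the proposed lowest layer ({futex,sema}sleep). It
// receives an absolute deadline. If the OS wants a relative timeout,
// the conversion happens here, as late as possible; if the OS takes a
// deadline natively, no clock read is needed. It returns the absolute
// wakeup target.
func sleepUntil(c *countingClock, deadline int64, osTakesDeadline bool) int64 {
	if osTakesDeadline {
		return deadline
	}
	now := c.nanotime()
	ns := deadline - now // relative timeout handed to the syscall
	if ns < 0 {
		ns = 0 // deadline already passed: do not sleep
	}
	return now + ns
}

// timerprocDeadline passes the next timer's t.when down unchanged, so
// time spent running expired callbacks cannot push the wakeup later.
func timerprocDeadline(c *countingClock, when, callbackNs int64, osTakesDeadline bool) int64 {
	_ = c.nanotime()    // still needed to find the expired timers
	c.now += callbackNs // time spent in expired timer callbacks
	return sleepUntil(c, when, osTakesDeadline)
}

func main() {
	for _, native := range []bool{false, true} {
		c := &countingClock{now: 1000}
		target := timerprocDeadline(c, 5000, 300, native)
		fmt.Printf("osTakesDeadline=%v target=%d clock reads=%d\n", native, target, c.reads)
	}
	// prints:
	// osTakesDeadline=false target=5000 clock reads=2
	// osTakesDeadline=true target=5000 clock reads=1
}
```

With the same 300-unit callback delay as before, both paths wake at t.when, and the deadline-native path needs only the one clock read that timerproc already performs.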

Russ Cox


Sep 11, 2014, 3:23:04 PM

to Matthew Dempsky, golang-dev

Changing the code carries risk. I don't see any benefit to offset that risk. This seems like a non-bug to me.

Russ

Matthew Dempsky


Sep 11, 2014, 4:10:37 PM

to Russ Cox, golang-dev

On Thu, Sep 11, 2014 at 12:22 PM, Russ Cox <r...@golang.org> wrote:

> Changing the code carries risk. I don't see any benefit to offset that risk. This seems like a non-bug to me.

The hypothesized benefit is that this could reduce sporadic delays in timer callback invocations, which I'm hoping could fix openbsd/386's timer flakiness (e.g., http://build.golang.org/log/bd19f8590db8c7f317b3f968ac1af5126f5449cf). But I appreciate your concern about risk, so I'll do some measurements to see if I can demonstrate whether timer delays are actually happening or not.

Matthew Dempsky


Sep 11, 2014, 6:05:22 PM

to Russ Cox, golang-dev

On Thu, Sep 11, 2014 at 1:10 PM, Matthew Dempsky <mdem...@google.com> wrote:

> I'll do some measurements to see if I can demonstrate whether timer delays are actually happening or not.

I modified package runtime to track the discrepancy between the nanotime() call in timerproc() and the subsequent one in notetsleep_internal(), and ran time.test -test.cpu=1,2,4 in a loop on both linux/amd64 (Z620 desktop) and openbsd/386 (4-core VMware VM).

After about an hour of running so far: linux has seen a single 10ms delay; both have had a couple of delays in the 2-4ms range; delays in the hundreds of microseconds show up roughly every 10 seconds on openbsd and every minute or so on linux; and the rest of the time the delay is even smaller.

This doesn't seem anywhere near the frequency or magnitude needed to exceed time_test's 200ms allowance, so I'll keep looking elsewhere for the root cause of openbsd/386's flakiness.
