Here are the semantics:
- A freshly created watchdog starts in the idle state and is activated
by a call to Reset.
- Every call to Reset will eventually cause a send on the Timeouts
channel unless a subsequent Reset follows.
- When the watchdog times out, it sends only a single bool on the
Timeouts channel.
- Once the timeout has occurred but before it has been handled, calling
Reset does not affect state.
- Once the timeout has been handled, the watchdog returns to the idle
state.
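A minimal usage sketch of these semantics (durations are in ns and
purely illustrative; the types are defined below):

wd := NewWatchdog()
wd.Reset(1e9) // arm: time out ~1 s from now
wd.Reset(2e9) // supersedes the first reset, so no send for it
<-wd.Timeouts // blocks until the 2 s deadline expires
// the watchdog is idle again; a new Reset re-arms it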
Comments and ideas welcome. As I posted this I noticed a bug that
sometimes causes timeouts to occur too late; see if you can spot it
too. If there is interest I could clean this up for inclusion in the
time package.
Kai
type Watchdog struct {
	resets   chan int64
	Timeouts chan bool
}
func NewWatchdog() *Watchdog {
	wd := &Watchdog{
		resets:   make(chan int64, 50),
		Timeouts: make(chan bool),
	}
	go wd.loop()
	return wd
}
func (wd *Watchdog) loop() {
	var t0, t1 int64
	var ok bool
idle:
	// Block until the first Reset arrives, then drain any queued
	// resets, keeping only the most recent timeout. (In the pre-Go 1
	// semantics used here, "t1, ok = <-wd.resets" is a non-blocking
	// receive; ok reports whether a value was ready.)
	t0 = <-wd.resets
	t1, ok = <-wd.resets
	for ok {
		t0 = t1
		t1, ok = <-wd.resets
	}
loop:
	time.Sleep(t0)
	t1, ok = <-wd.resets
	if !ok {
		// No Reset arrived while we slept: notify and go idle.
		wd.Timeouts <- true
		goto idle
	}
	// Resets arrived during the sleep: keep the newest and sleep again.
	for ok {
		t0 = t1
		t1, ok = <-wd.resets
	}
	goto loop
}
func (wd *Watchdog) Reset(timeoutNS int64) {
	wd.resets <- timeoutNS
}
--
Kai Backman, programmer
http://tinkercad.com - The unprofessional solid CAD
Kai
type watchdog struct {
	resets   chan int64
	timeouts chan bool
}
func newWatchdog() *watchdog {
	wd := &watchdog{
		resets:   make(chan int64, 200),
		timeouts: make(chan bool),
	}
	go wd.loop()
	return wd
}
// pump drains any queued resets without blocking and returns the
// last deadline seen that is later than t0.
func (wd *watchdog) pump(t0 int64) (t1 int64) {
	t1 = t0
	for {
		select {
		case t := <-wd.resets:
			if t > t0 {
				t1 = t
			}
		default:
			return
		}
	}
	panic("unreachable")
}
func (wd *watchdog) loop() {
	var t0 int64
idle:
	// Block for the first reset, then fold in any queued ones.
	t0 = <-wd.resets
	t0 = wd.pump(t0)
loop:
	// Deadlines are absolute times, so convert to a relative
	// duration only at the moment we sleep.
	time.Sleep(t0 - time.Nanoseconds())
	now := time.Nanoseconds()
	t0 = wd.pump(now)
	if t0 == now {
		// Nothing moved the deadline past now: notify and go idle.
		wd.timeouts <- true
		goto idle
	}
	goto loop
}
func (wd *watchdog) reset(timeoutNS int64) {
	wd.resets <- timeoutNS + time.Nanoseconds()
}
On Sat, Feb 5, 2011 at 12:14 AM, Kai Backman <kai.b...@gmail.com> wrote:
> I recently implemented the client part of our lockserver and needed to
> handle some tricky cache and lock expiration conditions. Nothing in
> the time package felt like a good fit, so I wrote a simple watchdog
> timer, code below.
--
> I recently implemented the client part of our lockserver and needed to
> handle some tricky cache and lock expiration conditions. Nothing in
> the time package felt like a good fit, so I wrote a simple watchdog
> timer, code below.
Have you played with Timer:
http://golang.org/pkg/time/#Timer
It looks very similar in essence.
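(For reference, the bare-bones Timer pattern, as I read the pre-Go 1
API linked above; the 5e9 duration is illustrative:)

t := time.NewTimer(5e9) // fires once, roughly 5 s from now
<-t.C                   // receive the wakeup; t.Stop() would cancel it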
--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter
On Mon, Feb 7, 2011 at 4:24 PM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
> Have you played with Timer:
>
> http://golang.org/pkg/time/#Timer
I was aware of Timer. I'm curious, could you give me an example of how
you would achieve identical semantics using Timer?
Kai
I can't because you haven't really described the problem you're trying
to solve, but the primitives you describe seem to be available in Timer.
If you can describe why Watchdog solves your problem when Timer
does not, maybe it'd be easier to figure out what the delta between
them is and whether there's a way to merge the two.
A watchdog has two parts to the API: a reset function and a channel
for timeouts. It contains an internal timer that is decrementing while
the watchdog is active. Each call to reset sets the timer to a new
value, and if the timer ever hits 0 we need to send a notification on
the channel. Once the timer has tripped but before the notification has
been acknowledged, reset should not cause the notification to be lost.
There is some similarity to how watchdog timers work in embedded
systems. I'm using the watchdog to identify when a lockserv client
should disable its cache and enter session jeopardy, and later when to
drop locks. Each successful KeepAlive from the master resets the
watchdog to the session lease time given by the master. There is a
more detailed explanation of the problem in the Chubby paper by Mike
Burrows, specifically in the section on client caching.
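(Sketchily, the client side looks something like this; KeepAlive is a
hypothetical stand-in for the real RPC, not code from above:)

go func() {
	for {
		lease, err := client.KeepAlive() // hypothetical RPC, returns lease in ns
		if err == nil {
			wd.Reset(lease)
		}
	}
}()
<-wd.Timeouts // lease expired: disable the cache, enter jeopardy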
Up to here it's pretty much a matter of, on each Reset, stopping the
prior Timer and then creating a new Timer.
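(A minimal sketch of that approach, in the pre-Go 1 API used in this
thread; it deliberately ignores concurrent Resets and the
acknowledgement issue discussed next:)

type timerWatchdog struct {
	timer    *time.Timer
	Timeouts chan bool
}

func newTimerWatchdog() *timerWatchdog {
	return &timerWatchdog{Timeouts: make(chan bool)}
}

func (w *timerWatchdog) Reset(timeoutNS int64) {
	if w.timer != nil {
		w.timer.Stop() // drops the pending notification, if any
	}
	w.timer = time.AfterFunc(timeoutNS, func() { w.Timeouts <- true })
}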
> Once the timer has tripped but before the notification has
> been acknowledged, reset should not cause the notification to be lost.
This sounds like the actual distinction. Timer won't work like this. A call
to Reset will cancel the notification delivery if it has not yet been sent.
I get a fuzzy feeling out of the idea that the logic is synchronizing solely
on the clock. Time will pass between the calls to Nanoseconds() and the
use of the values, so it feels like false determinism. I don't know enough
about your context to tell whether that's a problem or not, though.
it seems to me that your implementation doesn't quite
do this, because sends to the Timeouts channel only
block until the notification handler is ready to *start*
processing the notification, not until it has finished.
for instance, the following code will print
"notification received" twice, not once,
even though Reset is called while the first event
is being processed:
w := NewWatchdog()
go func() {
	for _ = range w.Timeouts {
		fmt.Println("notification received")
		time.Sleep(0.5e9)
	}
}()
w.Reset(0.2e9)
time.Sleep(0.3e9)
w.Reset(0.1e9)
time.Sleep(1e9)
to get around this, you could add a back channel to say
when processing is complete; or you could use
a function call instead of a channel.
func NewWatchdog(timeout func())
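(one way the back-channel variant might look from the handler's side;
Acks is a hypothetical second channel, not a field of the Watchdog
above:)

for _ = range w.Timeouts {
	handle()       // process the notification
	w.Acks <- true // tell the watchdog processing is complete
}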
i don't really know if Watchdog belongs in the time package.
it seems a little specialist to me.
here's a version that uses the existing time functionality.
it's a little shorter, but not as pretty as your version.
type Watchdog struct {
	mu       sync.Mutex
	handling bool
	deadline int64
	Timeouts chan bool
	timer    *time.Timer
}

func NewWatchdog() *Watchdog {
	return &Watchdog{Timeouts: make(chan bool)}
}

func (w *Watchdog) Reset(t int64) {
	t += time.Nanoseconds()
	w.mu.Lock()
	if t <= w.deadline {
		w.mu.Unlock()
		return
	}
	if w.timer != nil {
		w.timer.Stop()
	}
	w.timer = time.AfterFunc(t-time.Nanoseconds(), func() {
		// If the previous timeout is still being handled, then
		// ignore this timeout.
		w.mu.Lock()
		if w.handling {
			w.mu.Unlock()
			return
		}
		w.handling = true
		w.mu.Unlock()
		w.Timeouts <- true
		w.mu.Lock()
		w.handling = false
		w.mu.Unlock()
	})
	w.deadline = t
	w.mu.Unlock()
}
On 7 February 2011 14:14, Kai Backman <kai.b...@gmail.com> wrote:
> if t > t0 {
if t > t1?
> time.Sleep(t0 - time.Nanoseconds())
i find it interesting that any code dealing with time
in a non-trivial way ends up using absolute time
(using relative time being one source of bugs in
your original version).
that's true of the time package functions internally too.
every time one switches back and forth between
relative and absolute time, there's a risk of some slippage.
i wonder if it wouldn't be useful to have
absolute time versions of the functions in the
time package (time.SleepAbs, time.AfterAbs etc)
to help with this.
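(the hypothetical helpers would be thin wrappers, something like:)

// SleepAbs sleeps until the absolute time t, in ns.
// hypothetical: not part of the time package.
func SleepAbs(t int64) {
	if dt := t - time.Nanoseconds(); dt > 0 {
		time.Sleep(dt)
	}
}

in user code this still converts to a relative sleep, just at the last
possible moment; inside the package the conversion could be avoided
entirely.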
s/some/more/. It is there either way.
> i wonder if it wouldn't be useful to have
> absolute time versions of the functions in the
> time package (time.SleepAbs, time.AfterAbs etc)
> to help with this.
Why? It's not precise no matter what.
using absolute times means that we can get closer
to the theoretical optimum for the system.
on my system, each time you do:

	dt := t - time.Nanoseconds()
	t = time.Nanoseconds() + dt

t loses between 1.2 and 5 microseconds.
when Kai's code runs, that transition happens 2.5 times,
so that's *at least* an unnecessary 3 microseconds of inaccuracy
with every call.
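(a quick way to reproduce that measurement; the numbers will vary by
system:)

deadline := time.Nanoseconds() + 1e9
dt := deadline - time.Nanoseconds()  // absolute -> relative
deadline2 := time.Nanoseconds() + dt // relative -> absolute
fmt.Println(deadline2 - deadline)    // drift, in ns, per round trip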
>> i wonder if it wouldn't be useful to have
>> absolute time versions of the functions in the
>> time package (time.SleepAbs, time.AfterAbs etc)
>> to help with this.
>
> Why? It's not precise no matter what.
mostly as a matter of convenience.
i'd prefer time.SleepAbs(t) to time.Sleep(t - time.Nanoseconds())
it's less error-prone if you can work in absolute times throughout.
You wanna save 3µs on a call to sleep. That's as close to premature
optimization as I've seen.
to look at it another way, a call to syscall.Sleep takes about 6µs on my system.
in that context 3µs is not negligible.
nonetheless, i still think the strongest argument is the second one -
it's less error prone to work in absolute time throughout. it simplifies
the code.
If *sleeping* for 3µs more when a goroutine wakes up is not
negligible, Go is not suitable.
> nonetheless, i still think the strongest argument is the second one -
> it's less error prone to work in absolute time throughout. it simplifies
> the code.
Sure, that's a different argument. Pretty much every sleep I see is a
delta, but you can argue about different use cases of course.
In this particular context depending on the clock is acceptable. We do
not require different servers to have synchronized clocks, we only
require them to advance time at roughly the same rate. For the
jeopardy timeout both the server and client treat it conservatively,
so clocks could be advancing with a speed difference of almost 5% and
things would still work out correctly.
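(Illustratively: with a 10 s lease, a 5% rate difference shifts the
observed expiry by only about 0.5 s, which that conservative handling
on both sides absorbs.)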
On Tue, Feb 8, 2011 at 2:23 PM, roger peppe <rogp...@gmail.com> wrote:
> is the stipulation that notification events should not be sent
> when a notification is being handled.
> it seems to me that your implementation doesn't quite
> do this
This is correct. I also happen to know that new reset calls won't
happen while the notification is being processed, which makes the
implementation slightly simpler but also more domain-dependent.
> i don't really know if Watchdog belongs in the time package.
> it seems a little specialist to me.
I've come to the same conclusion over the course of this discussion.
There is clearly a need for more complicated timing constructs, but as
we dissect this one example it becomes obvious how domain-specific it
is. I'm assuming that other solutions would be equally domain-specific,
but in different ways.
> if t > t1?
fixed, thanks.
> i find it interesting that any code dealing with time
> in a non-trivial way ends up using absolute time
Agreed. I've learnt this several times over the years, but the
original delta-time version was still somehow temptingly "easier". And
buggy.
Thanks for the feedback, both of you.