making lifeguard more durable


David Kavanagh

Aug 8, 2008, 4:38:31 PM
to lifegu...@googlegroups.com
I always had in mind that certain information would be backed up in
some way. This would allow lifeguard to be restarted if it fails and to
pick up where it left off. The instance list is really the only thing
that needs to be saved. It could be flushed to a file at strategic
times (like a write-through cache). Another option is to use SimpleDB,
but I don't want to implement this with SimpleDB alone because that
service is still restricted. Probably the best option is to implement
an instance list backing store behind an interface. We'd still use the
in-memory list for fast response, but there would be a module to call
to flush dirty values out. I could provide a disk-based version and a
SimpleDB version. I imagine a MySQL version would be nice to have.
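
To make the backing store idea concrete, here is a rough sketch in Python of what it might look like. This is not lifeguard code: the names InstanceStore, DiskInstanceStore, and InstanceList are made up for illustration, and the JSON file format is an assumption. A SimpleDB or MySQL version would implement the same two methods.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class InstanceStore(ABC):
    """Hypothetical backing-store interface; not lifeguard's actual API."""

    @abstractmethod
    def save(self, instances: dict) -> None:
        """Persist the whole instance list."""

    @abstractmethod
    def load(self) -> dict:
        """Return the last saved instance list, or an empty one."""


class DiskInstanceStore(InstanceStore):
    """Disk-based version: the list is written as one JSON file (assumed format)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, instances: dict) -> None:
        # Write to a temp file and rename it into place, so a crash
        # mid-write never leaves a half-written instance list behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(instances, f)
        os.replace(tmp, self.path)

    def load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


class InstanceList:
    """In-memory list for fast response, with a dirty flag for flushes."""

    def __init__(self, store: InstanceStore) -> None:
        self.store = store
        self.instances = store.load()  # pick up where we left off
        self.dirty = False

    def put(self, instance_id: str, info: dict) -> None:
        self.instances[instance_id] = info
        self.dirty = True

    def remove(self, instance_id: str) -> None:
        if self.instances.pop(instance_id, None) is not None:
            self.dirty = True

    def flush(self) -> bool:
        """Write the list out if anything changed; return True if written."""
        if not self.dirty:
            return False
        self.store.save(self.instances)
        self.dirty = False
        return True
```

The pool manager would call flush() at the strategic times mentioned above (for example, after an instance is started or terminated) and would build a fresh InstanceList from the same store on restart.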

Taking this to the next level would be a hot backup lifeguard pool
manager (running on another server). This would be a lot easier to
implement if the instance list backing store were in SimpleDB or a
hosted MySQL server (i.e., outside of the lifeguard pool manager
instance itself).

David

Chris Liebman

Aug 8, 2008, 4:44:08 PM
to lifegu...@googlegroups.com
I'd also like to see a pluggable model for starting and stopping
instances (especially useful where instances need 20-60GB of data
loaded before they start processing work) and a pluggable way to
automatically add instances that are started outside lifeguard (same
long startup time: 2+ hours).
-- Chris

dkav...@gmail.com

Aug 28, 2008, 6:53:04 PM
to lifeguard-dev
I just committed code that improves service instance status
reporting. Essentially, there is a status thread that runs, sending
status every 30 seconds. It reports the duty cycle it keeps track of
and the current busy/idle state, so the pool manager has a much
better idea of the state of the instance. Status is sent whether the
instance is busy or idle for long periods of time or crunching
through work quickly. Leveraging this in the pool manager, I was able
to check the time of the last status report; if the instance hasn't
reported status recently enough (this is configurable as
"laggardLimit"), the instance is terminated and replaced with a new
one.
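
For anyone who wants the gist of the laggard check without reading the commit, here is a small Python sketch of the idea. It is not the committed code: InstanceStatus, find_laggards, and replace_laggards are hypothetical names, laggard_limit is assumed to be in seconds, and the terminate and launch callbacks stand in for whatever the pool manager really calls.

```python
class InstanceStatus:
    """Last status report received from one service instance."""

    def __init__(self, instance_id, now):
        self.instance_id = instance_id
        self.last_report = now  # time of the most recent status message
        self.busy = False
        self.duty_cycle = 0.0

    def report(self, busy, duty_cycle, now):
        # Called each time a status message arrives (every 30 seconds
        # in the post above).
        self.busy = busy
        self.duty_cycle = duty_cycle
        self.last_report = now


def find_laggards(statuses, laggard_limit, now):
    """Return ids of instances silent for longer than laggard_limit seconds."""
    return [s.instance_id for s in statuses
            if now - s.last_report > laggard_limit]


def replace_laggards(pool, laggard_limit, now, terminate, launch):
    """Terminate each laggard and start a replacement for it.

    pool maps instance id to InstanceStatus. terminate(instance_id) and
    launch() are hypothetical callbacks; launch() returns the new id.
    """
    replaced = find_laggards(pool.values(), laggard_limit, now)
    for instance_id in replaced:
        terminate(instance_id)
        del pool[instance_id]
        new_id = launch()
        # Start the new instance's clock now so it is not flagged at once.
        pool[new_id] = InstanceStatus(new_id, now)
    return replaced
```

Passing the current time in as an argument, rather than reading the clock inside the functions, keeps the check easy to test. The limit should be comfortably longer than the 30-second reporting interval so that one delayed message does not get a healthy instance terminated.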

David