server pool ramp down change

dkav...@gmail.com

Oct 1, 2009, 5:49:15 PM

to lifeguard-dev

I've been doing performance testing on a project at work that uses
lifeguard. I've plotted queue size vs # servers (and number of servers
reporting ready). I have a nice graph I can share.
One thing I noticed is that when the ramp down (after work is done)
happens, servers (by design) aren't killed until they reach the 55-minute
mark. This was put in to avoid killing servers, only to have to restart
them within the hour if more work comes in. The idea is that it saves
money by keeping each server up for more of its billed hour.
So, the problem with the current implementation is that the pool
manager doesn't track how many servers are slated to be terminated.
Once servers reach that 55-minute mark, it will start killing them.
The downside (I found from testing) is that some servers stay up into
a 2nd billable hour when they would normally have been killed.

Tomorrow, I'm going to implement code to track the # of servers slated
for termination. Then, as servers pass the 55-minute mark, it will check
whether we were going to shrink the pool and kill those servers as
necessary. The side effect is that if I had 12 servers spun up for
10 minutes of work, some or all of them might be killed at the
55-minute mark, depending on the ramp-down knobs.
Also, I need to figure out a good strategy for the case where more
work comes in later in the hour. How many of the servers that I was
going to terminate do I remove from the list? I hope to figure that
out by experimentation and some good old-fashioned reasoning.
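A minimal sketch of the bookkeeping described above, assuming a single 55-minute threshold and a pool manager that can see each server's total uptime. The class and method names are hypothetical, not lifeguard code. `cancelTerminationsFor` shows only one possible answer to the open question about new work arriving (cancel pending terminations before launching new servers); it is not a decided policy.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: instead of killing servers as soon as ramp-down fires,
 * count how many servers the pool should shed and terminate them only once
 * they reach the 55-minute mark of their current billed hour.
 */
public class PendingRampDown {
    // Threshold from the thread; assumed to be one configurable value.
    static final long TERMINATE_AT_MINUTE = 55;

    /** Hypothetical view of a running server: an id plus total uptime. */
    static final class Server {
        final String id;
        final long uptimeMinutes;

        Server(String id, long uptimeMinutes) {
            this.id = id;
            this.uptimeMinutes = uptimeMinutes;
        }
    }

    private int slatedForTermination = 0;

    /** Ramp-down knobs decided the pool should shrink by n servers. */
    void slateForTermination(int n) {
        slatedForTermination += n;
    }

    /**
     * New work arrived and the pool needs n more servers. Assumed policy:
     * cancel pending terminations first; return how many still need launching.
     */
    int cancelTerminationsFor(int n) {
        int cancelled = Math.min(n, slatedForTermination);
        slatedForTermination -= cancelled;
        return n - cancelled;
    }

    /** Called on each poll: servers that reached the mark, in list order. */
    List<Server> dueForTermination(List<Server> running) {
        List<Server> due = new ArrayList<>();
        for (Server s : running) {
            if (slatedForTermination == 0) {
                break; // nothing left that we intended to shed
            }
            // Position within the current billed hour, not total uptime.
            if (s.uptimeMinutes % 60 >= TERMINATE_AT_MINUTE) {
                due.add(s);
                slatedForTermination--;
            }
        }
        return due;
    }

    int pending() {
        return slatedForTermination;
    }
}
```

Keeping a pending count, rather than killing immediately, is what lets a server that is only 10 minutes into its hour keep running until it is close to the end of the time already paid for.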

David

shlomo....@gmail.com

Oct 1, 2009, 5:59:52 PM

to lifegu...@googlegroups.com

David,

I'd like to see that graph, please.

I submitted a patch back in March to do something similar to what
you're discussing; here is the thread:
http://groups.google.com/group/lifeguard-dev/browse_thread/thread/96d379966057cbf9#
It adds a
configuration parameter "minMinutesPerBillableHour". Terminating an
instance is suppressed until it has been running for at least
minMinutesPerBillableHour in its current billing hour.
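A sketch of the rule as described, assuming each billing hour starts at the instance's launch time. Only the parameter name `minMinutesPerBillableHour` comes from the thread; the class and methods are hypothetical and are not the code in the patch.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical guard for the minMinutesPerBillableHour rule. Assumes billing
 * hours are counted from launch time and that now is not before launch.
 */
public class BillableHourGuard {
    private final long minMinutesPerBillableHour;

    BillableHourGuard(long minMinutesPerBillableHour) {
        this.minMinutesPerBillableHour = minMinutesPerBillableHour;
    }

    /** Whole minutes elapsed in the instance's current billing hour (0-59). */
    static long minutesIntoBillingHour(Instant launchTime, Instant now) {
        long uptimeMinutes = Duration.between(launchTime, now).toMinutes();
        return uptimeMinutes % 60;
    }

    /** Termination is suppressed until the instance is far enough into its hour. */
    boolean mayTerminate(Instant launchTime, Instant now) {
        return minutesIntoBillingHour(launchTime, now) >= minMinutesPerBillableHour;
    }
}
```

With the parameter set to 55, an instance launched at 12:00 may be terminated at 12:56 but not at 13:07, because at 13:07 it is only 7 minutes into its second billing hour.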

My patch hasn't made it into the code base yet - but I've been using
it in production for half a year already with no problems.

.. Shlomo

--
Shlomo Swidler
Cloud Computing Developer and Amazon EC2 Expert
http://clouddevelopertips.blogspot.com

David Kavanagh

Oct 1, 2009, 6:34:19 PM

to lifegu...@googlegroups.com

OK, cool. I'll look at your code first, then. No need to reinvent the wheel.

David

David Kavanagh

Oct 2, 2009, 8:31:25 AM

to lifegu...@googlegroups.com

Here's the graph. If you have some graphs (perhaps from a presentation
you did), I'd love to add those also.
Another thing is that we're going to run a 24-hour work simulation
sometime in the next week or so, which should give a lot of
interesting results. Of course, your production scaling data might
also be of interest.

David

David Kavanagh

Oct 2, 2009, 8:31:32 AM

to lifegu...@googlegroups.com

http://code.google.com/p/lifeguard/wiki/ScalingResults

shlomo....@gmail.com

Oct 2, 2009, 8:36:15 AM

to lifegu...@googlegroups.com

David,

Thanks for sharing - I look forward to seeing the 24-hour graphs of
performance under varying load.

Unfortunately I haven't been keeping this data around for my
production servers. I'll see what I can do to persist it.

.. Shlomo

David Kavanagh

Oct 2, 2009, 8:50:06 AM

to lifegu...@googlegroups.com

I'm going to add another stat in the lifeguard logging. Currently it
logs the # servers it thinks are running. I also want to log the #
servers reporting ready. That'll make the data for the 3rd line in the
graph easier to gather.
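A small sketch of logging both counts on one line so each graph series can be extracted from the log. The log format and names are hypothetical and do not reflect lifeguard's actual logging code; `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.Locale;
import java.util.logging.Logger;

/**
 * Hypothetical pool-status log line with both counts. A fixed key=value
 * format keeps the line easy to grep and parse into graph series.
 */
public class PoolStatsLog {
    private static final Logger LOG = Logger.getLogger(PoolStatsLog.class.getName());

    /** Locale.ROOT keeps the number formatting stable for parsing. */
    static String statsLine(int serversRunning, int serversReady) {
        return String.format(Locale.ROOT, "pool stats: running=%d ready=%d",
                serversRunning, serversReady);
    }

    static void logPoolStats(int serversRunning, int serversReady) {
        LOG.info(statsLine(serversRunning, serversReady));
    }

    public static void main(String[] args) {
        logPoolStats(12, 9); // e.g. 12 servers believed running, 9 reporting ready
    }
}
```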

David

David

Oct 2, 2009, 11:08:30 AM

to lifeguard-dev

Shlomo,
I was looking over your patch. From what I can tell, it seems to give
terminating preference to instances that are "later" in their current
billable hour. I certainly think this is a good thing. Is there
anything else in the patch that I'm missing? I like to understand what
changes do before applying them. In this case, I have some logic I
wanted to put into the pool manager, and if I can simply incorporate
the functionality you added, that would accomplish the same goal and
then some.

Thanks,
David

shlomo....@gmail.com

Oct 2, 2009, 11:11:32 AM

to lifegu...@googlegroups.com

David,

It prevents servers from being terminated until
minMinutesPerBillableHour minutes have passed in their current billing
hour. When more than one server is eligible for termination, it favors
terminating the one further into its billable hour.
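A sketch of that selection order, assuming total uptime in minutes is known for each instance and billing hours are counted from launch. The names are hypothetical; this illustrates the behavior Shlomo describes, not the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical selection: only instances at least minMinutesPerBillableHour
 * into their current billing hour are eligible, furthest-into-hour first.
 */
public class TerminationOrder {
    static final class Instance {
        final String id;
        final long uptimeMinutes;

        Instance(String id, long uptimeMinutes) {
            this.id = id;
            this.uptimeMinutes = uptimeMinutes;
        }

        /** Minutes elapsed in the current billing hour (0-59). */
        long minutesIntoHour() {
            return uptimeMinutes % 60;
        }
    }

    static List<Instance> pickForTermination(List<Instance> running, int count,
                                             long minMinutesPerBillableHour) {
        List<Instance> eligible = new ArrayList<>();
        for (Instance i : running) {
            if (i.minutesIntoHour() >= minMinutesPerBillableHour) {
                eligible.add(i);
            }
        }
        // Descending by minutes into the hour: the least remaining paid time goes first.
        eligible.sort((a, b) -> Long.compare(b.minutesIntoHour(), a.minutesIntoHour()));
        return new ArrayList<>(eligible.subList(0, Math.min(count, eligible.size())));
    }
}
```

For example, with instances up for 56, 118, 30, and 115 minutes, a threshold of 55, and two to remove, the 118-minute instance (58 minutes into its hour) goes first, then the 56-minute one. The 30-minute instance is never eligible.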

.. Shlomo

David Kavanagh

Oct 2, 2009, 11:18:02 AM

to lifegu...@googlegroups.com

Yes. Looking at the logic that's in SVN, I noticed that the
calculation for min instance lifetime is flawed in that it doesn't
really do modulo 60 minutes. So, anything that has been up for 56
minutes is just as likely to be killed as something that has been up
for 67 minutes. That's not something I wanted, and I had planned on
addressing it.
I didn't really see a need for the 2 properties that get set to 55
(minutes), though, and would likely just stick with a single one.
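To make the 56 vs. 67 minute example concrete, here is a comparison of a total-uptime check (a simplified stand-in for the SVN logic, not the actual code) and a check on the position within the current hour. A single 55-minute threshold is assumed, and the names are hypothetical.

```java
/** Hypothetical comparison of the two eligibility checks. */
public class UptimeCheck {
    // Assumed single threshold property, in minutes.
    static final long THRESHOLD_MINUTES = 55;

    /** Simplified flawed form: looks only at total uptime. */
    static boolean eligibleByTotalUptime(long uptimeMinutes) {
        return uptimeMinutes >= THRESHOLD_MINUTES;
    }

    /** Modulo form: looks at the position within the current 60-minute hour. */
    static boolean eligibleByBillingHour(long uptimeMinutes) {
        return uptimeMinutes % 60 >= THRESHOLD_MINUTES;
    }

    public static void main(String[] args) {
        for (long uptime : new long[] {56, 67}) {
            System.out.println("uptime=" + uptime
                    + " total=" + eligibleByTotalUptime(uptime)
                    + " hour=" + eligibleByBillingHour(uptime));
        }
    }
}
```

A server that has been up for 56 minutes passes both checks. A server that has been up for 67 minutes passes only the total-uptime check, even though it is just 7 minutes into its second billed hour.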

David

shlomo....@gmail.com

Oct 3, 2009, 2:47:55 PM

to lifegu...@googlegroups.com

Cool, whatever code allows the behavior is fine with me - I'm not
attached to the particular patch I submitted. :-)

David Kavanagh

Oct 3, 2009, 4:08:51 PM

to lifegu...@googlegroups.com

I ended up taking the bulk of your code. I have some testing to do before checking it in.

David