We would like to have a Puppet Meetup in NYC on February 3rd, 2009,
at 6:30 PM. Place to be determined. Barring my wife giving birth by
then (which would prevent me from making it), how does this sound to
everyone else?
Brian G.,
Can you rally the troops and also invite people from other groups?
Also let's have some specific topics to discuss.
After talking with Larry, I proposed "Westside Brewery" as the location. I was going to reserve a table for 12 (which we could adjust depending on how many people want to come).
Does anyone have any objections to the venue?
On Jan 18, 2009 11:42 AM, "Brian Gupta" <brian...@gmail.com> wrote:
Looking at the calendar Feb. 3rd is doable. Let's try to quickly
narrow down a location, before spreading the word. Any requirements?
Assuming we want to meet somewhere that has food?
For now I will go ahead and put it on the "NYC User Groups" Google
Calendar with a location TBD for now. (6:30-8:30pm)
As far as topics go, I am pretty open, but would definitely like to
find out if anyone has experience with Enterprise Ruby. Also, I am
looking to explore: debgem vs dpkg-tools (Two tools to make deploying
gems via Debian packages easier) http://www.debgem.com/ and
http://reprocessed.org/tags/dpkg-tools
Also we are using puppet "in the cloud" so we could definitely talk to
Rinaldo about that.
Cheers,
-Brian
--
- Brian Gupta
New York City user groups calendar:
http://nyc.brandorr.com/
On Thu, Jan 15, 2009 at 10:10 AM, Larry Ludwig <larr...@gmail.com> wrote:
>
> Hi All,
>
> We are w...
-L
--
Larry Ludwig
Empowering Media
1-866-792-0489 x600
Managed and Unmanaged Xen VPSes
http://www.hostcube.com/
http://broadcast.oreilly.com/2008/12/why-i-dont-like-cloud-auto-scaling.html
I'm somewhat for not completely taking the man out of the loop.
Our control panel setup (DirectAdmin) takes over 1 hour to build
with Puppet because the control panel loves to compile so many things
:-(. We kept it that way since that's how it works by default. The
inertia is too great to change their (stupid) methodology.
It depends... if a no-op CPU command is billable to EC2 and they are
truly sitting idle does Amazon charge for it? Obviously a running
instance will have CPU cycles, but not much if idle.
Also the build process of a new instance I suspect has a decent amount
of CPU time (obviously less if you are using binaries)
I'm sure there is some point where it makes more sense to leave the
instance running for X amount of hours, instead of regenerating it
(meaning it costs more to build a new EC2 instance than to just leave
an existing one active). The other issues related to regenerating are
the new IP address and the time for EC2 to have your new instance
ready (i.e. 10 min).
Have you done any investigation on the cost benefits? If so what were
the results?
I'm obviously all for automating the build of a new instance, and
also automatically adding it to the pool of resources. I'm not so sure
about the next level of automation: auto-scaling without any human
interaction.
The issue I have with most auto-scaling is the very basic metrics used
to measure when to add/remove resources. I believe the issues are
very customer/app specific and while with one customer X metric might be
valid, with another you need X, Y and Z in some formula to determine
when to go to the next level. The other issue related to this is the
application itself. What may have allowed a customer app to scale to
one level, may require code changes to scale effectively at a much
higher level. This is something auto-scaling could never do. Vertical
scaling is ALWAYS easier than going horizontal.
Puppet is great at the automation of administration. To me auto-scaling
could be a different tool in which you write high-level rules on when to
scale and what part to scale. Do I hear another tool in the making? :-)
-L
> Our eventual goal has always been *being able* to autoscale.
> Realistically though, since unexpected load is fairly uncommon, we
> still want a human in the scaling feedback loop. (So basically we want
> autoscaling with a "Human! Press this button!" step in the autoscaling
> process.)
>
> Cheers,
> Brian
>
> Brian Gupta wrote:
>>
>> Mostly we want the ability to leverage the micro-accounting EC2
>> offers us. (To lower operational costs).
>>
>> Since EC2 bills by the hour, wouldn't it be prudent to be able to
>> spin down idling webservers during the evening, and spin up extra
>> ones when you know there are upcoming high traffic website events?
>> Currently we have this ability. For now it is still a manual
>> process, in that we have to say "please spin up X webserver nodes
>> now in the xyz environment", but they do autoconfigure themselves,
>> update DNS, and even adjust the load balancer config. We are still
>> working on the scheduling ability, but our puppet code and ec2 glue
>> are all complete, so it should be fairly straightforward to throw
>> something together. (We might even leverage the OS scheduler.)
>>
> Hmm this brings up an interesting discussion.
>
> It depends... if a no-op CPU command is billable to EC2 and they are
> truly sitting idle does Amazon charge for it? Obviously a running
> instance will have CPU cycles, but not much if idle.
>
What the CPU is doing isn't really relevant; the EC2 machines are
really just Xen instances, and you pay for every hour (or portion of
an hour) that the instance is booted, regardless of whether it is
working or not.
> Also the build process of a new instance I suspect has a decent amount
> of CPU time (obviously less if you are using binaries)
>
Not really, these are more like copy-on-write clones of an existing
filesystem, so you don't really have to build a new instance, you just
boot an instance from a filesystem that already exists.
> I'm sure there is some point where it makes more sense to leave the
> instance running for X amount of hours, instead of regenerating it
> (meaning it costs more to build a new EC2 instance than to just leave
> an existing one active). The other issues related to regenerating are
> the new IP address and the time for EC2 to have your new instance
> ready (i.e. 10 min).
>
It doesn't cost any more to start a new instance than it does to leave
it running; you still just pay by the hour. What would make sense
though is to figure out how long the instance has been running before
deciding whether to shut it down. For example, if I have an instance
that is idle but has been running for 6 hours and 5 minutes, I'm going
to pay for that seventh hour whether I shut the instance down now or
in 55 minutes, so it makes sense to wait another 45 minutes or so to
see if some more work does come up. The other part of this is that if
you shut them down too aggressively, you may end up paying more. For
example, if you shut an instance down after only 1 minute of idle
time, you may find that you had 4 instances start up, do 5 minutes'
worth of work each, and shut down only a few minutes apart. In that
case you paid for 4 hours' worth of EC2 time, but only got 20 minutes'
worth of work out of it.
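The billing arithmetic above can be sketched roughly as follows. This is only an illustration that assumes every started hour is billed in full, as described; the function names are made up for the example and are not part of any EC2 API.

```python
import math

def billed_hours(minutes_running):
    """Hours charged when every started hour is billed in full."""
    if minutes_running <= 0:
        return 0
    return math.ceil(minutes_running / 60)

def minutes_left_in_paid_hour(minutes_running):
    """Minutes already paid for that remain in the current billing hour."""
    return (60 - minutes_running % 60) % 60

# An idle instance up for 6 hours 5 minutes is already billed for 7 hours.
uptime = 6 * 60 + 5
print(billed_hours(uptime))               # 7
print(minutes_left_in_paid_hour(uptime))  # 55

# Four instances that each run for 5 minutes: 4 billed hours, 20 minutes of work.
runs = [5, 5, 5, 5]
print(sum(billed_hours(m) for m in runs), sum(runs))  # 4 20
```

Under that assumption, shutting an idle instance down before `minutes_left_in_paid_hour` gets close to zero saves nothing.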
In many applications, the new IP address is not an issue. For a web
farm, for example, I'll usually run something like Varnish on the
front-end machines with the public IP addresses and keep them running
all the time, then bring up or shut down backend hosts behind Varnish
as needed, with an init script that registers the new backends as
available as part of their boot-up.
> Have you done any investigation on the cost benefits? If so what were
> the results?
>
> I'm obviously all for automating the build of a new instance, and
> also automatically adding it to the pool of resources. I'm not so
> sure about the next level of automation: auto-scaling without any
> human interaction.
>
> The issue I have with most auto-scaling is the very basic metrics used
> to measure when to add/remove resources. I believe the issues are
> very customer/app specific and while with one customer X metric
> might be valid, with another you need X, Y and Z in some formula to
> determine when to go to the next level. The other issue related to
> this is the application itself. What may have allowed a customer app
> to scale to one level, may require code changes to scale effectively
> at a much higher level. This is something auto-scaling could never
> do. Vertical scaling is ALWAYS easier than going horizontal.
>
> Puppet is great at the automation of administration. To me
> auto-scaling could be a different tool in which you write high-level
> rules on when to scale and what part to scale. Do I hear another tool
> in the making? :-)
>
I've got several auto-scaling tools that I've written for different
applications in the past. Now that I find myself abruptly unemployed,
maybe it's time to package them up into something more generically
useful. :)
>> Our eventual goal has always been *being able* to autoscale.
>> Realistically though, since unexpected load is fairly uncommon, we
>> still want a human in the scaling feedback loop. (So basically we
>> want autoscaling with a "Human! Press this button!" step in the
>> autoscaling process.)
>>
I usually go with an "automated within reason" definition, something
along the lines of:
when ( load is greater than X for Y minutes ) {
    total = count_total_number_of_running_instances()
    recent = count_number_of_instances_started_in_the_last_hour()
    if ( total < MAX_TOTAL AND recent < MAX_RECENT ) {
        start_a_new_instance()
    } else {
        notify_administrator()
    }
}
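
For anyone who wants to play with the rule, here is a minimal Python sketch of the same decision. It only returns what should happen and does not start or notify anything. The function name, its parameters and the two limit values are placeholders invented for this sketch, not taken from any real tool.

```python
MAX_TOTAL = 20   # placeholder cap on running instances
MAX_RECENT = 4   # placeholder cap on instances started in the last hour

def scaling_action(load_high_minutes, threshold_minutes, total, recent,
                   max_total=MAX_TOTAL, max_recent=MAX_RECENT):
    """Return 'none', 'start' or 'notify' for one evaluation of the rule."""
    if load_high_minutes < threshold_minutes:
        return "none"      # load has not been high for long enough
    if total < max_total and recent < max_recent:
        return "start"     # within limits: start a new instance
    return "notify"        # limits reached: hand over to an administrator

print(scaling_action(10, 5, total=3, recent=1))   # start
print(scaling_action(10, 5, total=3, recent=4))   # notify
print(scaling_action(2, 5, total=3, recent=1))    # none
```

Keeping the decision separate from the side effects makes it easy to test the limits before wiring it to anything that costs money.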
--
Jason Kohles, RHCA RHCDS RHCE
em...@jasonkohles.com - http://www.jasonkohles.com/
"A witty saying proves nothing." -- Voltaire
Larry Ludwig wrote:
| Why not let Puppet install via Ruby's gems? I started down a similar
| path with CentOS/RH creating RPMs for gems and felt using the native
| 'gem install' was better, especially when using different
| architectures (i.e. i386 and x86_64). In addition, if you use
| Puppet's type, the install recipe is exactly the same for any
| platform or operating system.
One major reason not to use gems (or CPAN for Perl) is that they don't
play well with the OS packaging system. If an RPM (for example) has
installed a file, gems and CPAN will happily overwrite it, without
recording in the RPM database that the file is now owned by another
package. If you then install an updated RPM, it will happily overwrite
the file installed by gem/CPAN.
And of course if you find a file and don't know where it comes from,
you can't do 'rpm -q -f /path/to/file' to learn about it. Instead you
need to realize that you should look in another packaging system for that
information.
Having more than one package system on a machine sucks big time.
For CPAN, there's the cpan2rpm program, which can create an RPM from a
CPAN package, which you can then install using the rpm or yum commands,
and I believe there's a cpan2deb program for Debian/Ubuntu. That gives
me the proper interaction with the normal package system.
I see there's a gem2rpm command available also. I haven't tried using
that, though. If there's a gem2deb command, I'd suggest the OP try
using that.
/Thomas Bellman
>> What the CPU is doing isn't really relevant, the EC2 machines are
>> really just Xen instances, and you pay for every hour (or portion of
>> an hour) that the instance is booted, regardless of whether it is
>> working or not.
>
> So it's not based upon CPU/hour, but really instance/hour? Forgive me
> on this issue as I haven't researched this part of their pricing
> much.
>
> So $72 for the full month (small) if an instance is kept running the
> full time. That seems similar to what the CPU part of a dedicated
> server would cost.
>
And if you only have a handful of servers and keep them running all
the time, then it doesn't really buy you much to put it in EC2, but if
you need a lot of servers on an infrequent basis, this can save some
big bucks.
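As a rough illustration of that trade-off, the sketch below derives the hourly rate implied by the $72/month figure quoted above, assuming a 30-day month, and compares one always-on server with a short burst of many instances. The burst numbers (50 servers for 8 hours) are invented for the example.

```python
HOURS_PER_MONTH = 24 * 30             # assumes a 30-day month
HOURLY_RATE = 72 / HOURS_PER_MONTH    # rate implied by $72/month

def cost(instances, hours_each, rate=HOURLY_RATE):
    """Dollars for running `instances` for `hours_each` billed hours each."""
    return instances * hours_each * rate

always_on = cost(1, HOURS_PER_MONTH)  # one server, all month
burst = cost(50, 8)                   # hypothetical: 50 servers for 8 hours

print(round(HOURLY_RATE, 2))   # 0.1
print(round(always_on, 2))     # 72.0
print(round(burst, 2))         # 40.0
```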
>> when ( load is greater than X for Y minutes ) {
>>     total = count_total_number_of_running_instances()
>>     recent = count_number_of_instances_started_in_the_last_hour()
>>     if ( total < MAX_TOTAL AND recent < MAX_RECENT ) {
>>         start_a_new_instance()
>>     } else {
>>         notify_administrator()
>>     }
>> }
>>
>
> This might be a fine script for simple auto-scaling, but using load as
> a metric IMHO is a terrible method to determine scaling. Reasons:
Well, in this pseudo-code, I really meant load as 'an indication that
your application is requiring more resources than the current
instances can handle' rather than simply the system load average. In
my particular case most of my instances are processing requests from a
queue, and my definition of load encompasses primarily the length of
the queue backlog and the estimated time to process the backlog with
the currently available instances.
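
A minimal sketch of that kind of queue-based load metric might look like the following. The function names, the fixed per-job time and the 60-minute target are all assumptions made up for the illustration; a real estimate would probably use measured job times.

```python
def estimated_drain_minutes(backlog, seconds_per_job, instances):
    """Estimated minutes for the current instances to clear the backlog."""
    if instances <= 0:
        return float("inf")   # nothing is available to process the queue
    return backlog * seconds_per_job / instances / 60

def needs_more_capacity(backlog, seconds_per_job, instances, max_minutes):
    """True when the backlog would take longer than max_minutes to clear."""
    return estimated_drain_minutes(backlog, seconds_per_job, instances) > max_minutes

# Hypothetical numbers: 1200 queued jobs, 30 s per job, 60-minute target.
print(estimated_drain_minutes(1200, 30, 4))    # 150.0
print(needs_more_capacity(1200, 30, 4, 60))    # True
print(needs_more_capacity(1200, 30, 12, 60))   # False
```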