NYC Puppet Meetup

Larry Ludwig

Jan 15, 2009, 10:10:16 AM

to puppet-nyc, puppet...@googlegroups.com, brian...@gmail.com

Hi All,

We are planning to have a Puppet Meetup in NYC on February 3rd, 2009, at
6:30 PM. Place to be determined. Barring my wife giving birth by then
(which would prevent me from making it), how does this sound to
everyone else?

Brian G.,
Can you rally up the troops and also invite people from other groups?
Also let's have some specific topics to discuss.

windowsrefund

Jan 15, 2009, 1:54:20 PM

to Puppet Users

Larry,

I'm in the area and would love to attend. I'll be watching this thread
for details.

Adam

Tim Hartmann

Jan 15, 2009, 2:14:12 PM

to puppet...@googlegroups.com

Hey, that's a really great idea! Are there any Puppet users in Boston
who might be interested in a similar meetup?

-Tim

Brian Gupta

Jan 18, 2009, 11:42:54 AM

to puppe...@googlegroups.com, puppet...@googlegroups.com

Looking at the calendar, Feb. 3rd is doable. Let's try to quickly
narrow down a location before spreading the word. Any requirements?
I assume we want to meet somewhere that has food?

For now I will go ahead and put it on the "NYC User Groups" Google
Calendar with the location TBD (6:30-8:30pm).

As far as topics go, I am pretty open, but I would definitely like to
find out if anyone has experience with Enterprise Ruby. I am also
looking to explore debgem vs. dpkg-tools (two tools that make deploying
gems via Debian packages easier): http://www.debgem.com/ and
http://reprocessed.org/tags/dpkg-tools

Also, we are using Puppet "in the cloud", so we could definitely talk to
Rinaldo about that.

Cheers,
-Brian

--
- Brian Gupta

New York City user groups calendar:
http://nyc.brandorr.com/

Brian Gupta

Jan 18, 2009, 1:02:48 PM

to puppe...@googlegroups.com, puppet...@googlegroups.com

After talking with Larry, I proposed "Westside Brewery" as the location. I was going to reserve a table for 12 (which we could adjust depending on how many people want to come).

Anyone have any objections to the venue?


Larry Ludwig

Jan 18, 2009, 1:15:19 PM

to Puppet Users, Brian Gupta

> Also, I am
> looking to explore: debgem vs dpkg-tools (Two tools to make deploying
> gems via Debian packages easier) http://www.debgem.com/ and http://reprocessed.org/tags/dpkg-tools

Why not let Puppet install via Ruby's gems? I started down a similar
path with CentOS/RH, creating RPMs for gems, and felt using the native
'gem install' was better, especially when using different
architectures (i.e. i386 and x86_64). In addition, if you use
Puppet's package type (with the gem provider), the install recipe is
exactly the same for any platform or operating system.

Larry Ludwig

Jan 19, 2009, 8:15:06 AM

to Brian Gupta, Puppet Users

Brian Gupta wrote:

> Speed.
>
Speed? Speed to install or speed to run?

-L

--
Larry Ludwig
Empowering Media
1-866-792-0489 x600
Managed and Unmanaged Xen VPSes
http://www.hostcube.com/

Larry Ludwig

Jan 19, 2009, 8:48:54 AM

to Brian Gupta, Puppet Users

> Keep in mind we spin up and down virtual nodes regularly, so that
> anything we can do to optimize deployment speed of a full rails stack
> is helpful. (When not anything, we are avoiding baking the gems into
> our base OS install image).
>
>
Yeah, obviously binary form is the fastest. I didn't realize you
create and destroy a lot of instances; I would have thought a few
months was the minimum lifecycle for an instance in your case. I
assume you do it to quickly add capacity? Are you doing it on some
automated basis to add/remove resources?

http://broadcast.oreilly.com/2008/12/why-i-dont-like-cloud-auto-scaling.html

I'm somewhat in favor of not completely taking the human out of the loop.

Our control panel setup (DirectAdmin) takes over 1 hour to build
with Puppet because the control panel loves to compile so many things
:-(. We kept it that way since that's how it works by default; the
inertia is too great to change their (stupid) methodology.

Larry Ludwig

Jan 19, 2009, 10:07:02 AM

to Brian Gupta, Puppet Users

Brian Gupta wrote:
>
> Mostly we want the ability, to leverage the micro-accounting EC2
> offers us. (To lower operational costs).
>
> Since EC2 bills by the hour, wouldn't it be prudent to be able to spin
> down idling webservers during the evening, and spin up extra ones when
> you know there are upcoming high traffic website events? Currently we
> have this ability. For now it is still a manual process, in that we
> have to say "please spin up X webserver nodes now in the xyz
> environment", but they do autoconfigure themselves, update DNS, and
> even adjust the load balancer config. We are still working on the
> scheduling ability, but our puppet code and ec2 glue are all complete,
> if should be fairly straightforward to throw something together. (We
> might even leverage the OS scheduler.)
>
Hmm, this brings up an interesting discussion.

It depends... are no-op CPU cycles billable on EC2? If the instances
are truly sitting idle, does Amazon charge for them? Obviously a
running instance will use some CPU cycles, but not many if idle.

Also, I suspect the build process of a new instance takes a decent
amount of CPU time (obviously less if you are using binaries).

I'm sure there is some point where it makes more sense to leave an
instance running for X hours instead of regenerating it (meaning it
costs more to build a new EC2 instance than to just leave an existing
one active). The other issues with regenerating are the new IP address
and the time it takes EC2 to get your new instance ready (about 10 min).

Have you done any investigation into the cost benefits? If so, what
were the results?

I'm obviously all for automating the build of a new instance, and also
for automatically adding it to the pool of resources. I'm not so sure
about the next level of automation: auto-scaling without any human
interaction.

The issue I have with most auto-scaling is the very basic metrics used
to decide when to add or remove resources. I believe the issues are
very customer/app specific: with one customer metric X might be valid,
while with another you need X, Y and Z in some formula to determine
when to go to the next level. The other issue is the application
itself. An app that scaled fine to one level may need code changes to
scale effectively at a much higher level, and that is something
auto-scaling could never do. Vertical scaling is ALWAYS easier than
going horizontal.

Puppet is great at automating administration. To me, auto-scaling
could be a different tool where you write high-level rules on when to
scale and which part to scale. Do I hear another tool in the making? :-)

-L

> Our eventual goal has always been *being able* to autoscale.
> Realistically though, since unexpected load is fairly uncommon, we
> still want a human in the scaling feedback loop. (So basically we want
> autoscaling with a "Human! Press this button!" step in the autoscaling
> process.)
>
> Cheers,
> Brian

Jason Kohles

Jan 19, 2009, 10:41:33 AM

to puppet...@googlegroups.com

On Jan 19, 2009, at 10:07 AM, Larry Ludwig wrote:

>
> Brian Gupta wrote:
>>
>> Mostly we want the ability, to leverage the micro-accounting EC2
>> offers us. (To lower operational costs).
>>
>> Since EC2 bills by the hour, wouldn't it be prudent to be able to
>> spin
>> down idling webservers during the evening, and spin up extra ones
>> when
>> you know there are upcoming high traffic website events? Currently we
>> have this ability. For now it is still a manual process, in that we
>> have to say "please spin up X webserver nodes now in the xyz
>> environment", but they do autoconfigure themselves, update DNS, and
>> even adjust the load balancer config. We are still working on the
>> scheduling ability, but our puppet code and ec2 glue are all
>> complete,
>> if should be fairly straightforward to throw something together. (We
>> might even leverage the OS scheduler.)
>>
> Hmm this brings up an interesting discussion.
>
> It depends... if a no-op CPU command is billable to EC2 and they are
> truly sitting idle does Amazon charge for it? Obliviously a running
> instance will have CPU cycles, but not much if idle.
>

What the CPU is doing isn't really relevant: the EC2 machines are
really just Xen instances, and you pay for every hour (or portion of
an hour) that the instance is booted, regardless of whether it is
working or not.

> Also the build process of a new instance I suspect has a decent amount
> of CPU time (obviously less if you are using binaries)
>

Not really, these are more like copy-on-write clones of an existing
filesystem, so you don't really have to build a new instance, you just
boot an instance from a filesystem that already exists.

> I'm sure there is some point where it makes more sense to leave the
> instance running for X amount of hours, instead of regening it.
> (meaning
> it costs more to build a new EC2 instance, than just leave an existing
> one active) The other issues related to re-gening is new IP address
> and
> the time for EC2 to have your new instance ready (ie 10 min)
>

It doesn't cost any more to start a new instance than to leave one
running; you still just pay by the hour. What would make sense,
though, is to figure out how long the instance has been running before
deciding whether to shut it down. For example, if I have an instance
that is idle but has been running for 6 hours and 5 minutes, I'm going
to pay for that seventh hour whether I shut the instance down now or
in 55 minutes, so it makes sense to wait another 45 minutes or so to
see if some more work comes up. The other part of this is that if you
shut instances down too aggressively, you may end up paying more. If
you shut an instance down after only 1 minute of idle time, for
example, you may find that you had 4 instances start up, each do 5
minutes' worth of work, and shut down only a few minutes apart. In that
case you paid for 4 hours' worth of EC2 time but only got 20 minutes'
worth of work out of it.
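A minimal Python sketch of the billing arithmetic above, assuming (as described) that every started instance-hour is billed in full. The function names and the 10-minute `margin_minutes` threshold are hypothetical choices for illustration, not anything EC2 itself provides.

```python
import math

def billed_hours(runtime_minutes: float) -> int:
    """Instance-hours charged, assuming any started hour is billed in full."""
    return max(1, math.ceil(runtime_minutes / 60))

def should_shut_down(idle: bool, runtime_minutes: float, margin_minutes: float = 10) -> bool:
    """Stop an idle instance only when its already-paid hour is nearly used up."""
    if not idle:
        return False
    # Minutes already paid for but not yet used in the current billed hour.
    paid_but_unused = billed_hours(runtime_minutes) * 60 - runtime_minutes
    return paid_but_unused <= margin_minutes

# Idle at 6h05m: the seventh hour is already paid for, so keep waiting for work.
print(should_shut_down(idle=True, runtime_minutes=6 * 60 + 5))   # False (55 paid minutes left)
# Idle at 6h52m: only 8 paid minutes remain, so shutting down now wastes little.
print(should_shut_down(idle=True, runtime_minutes=6 * 60 + 52))  # True
# Four instances that each run 5 minutes are billed one hour apiece.
print(sum(billed_hours(5) for _ in range(4)))                    # 4
```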

In many applications the new IP address is not an issue. For a web
farm, for example, I'll usually run something like Varnish on the
front-end machines with the public IP addresses and keep them running
all the time, then bring backend hosts up or down behind Varnish as
needed, with an init script that registers each new backend as
available as part of its boot-up.
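One way to sketch the registration step in Python, assuming a hypothetical registry on the always-on front end that each backend's init script updates at boot (how the script reaches it, e.g. over SSH or HTTP, is not shown), with the VCL backend list regenerated from that registry. Only plain `backend` declarations are rendered; the syntax for grouping backends into a load-balancing director differs between Varnish versions, so it is omitted, as is reloading Varnish. Backend names must be valid VCL identifiers.

```python
# Hypothetical registry kept on the Varnish front end:
# backend name -> (private IP, port).
registry: dict[str, tuple[str, int]] = {}

def register(name: str, host: str, port: int = 80) -> None:
    """Record a backend, e.g. when its init script runs at boot."""
    registry[name] = (host, port)

def deregister(name: str) -> None:
    """Forget a backend that is being shut down (unknown names are ignored)."""
    registry.pop(name, None)

def render_backends(backends: dict[str, tuple[str, int]]) -> str:
    """Emit one VCL backend declaration per registered host, sorted by name."""
    blocks = []
    for name, (host, port) in sorted(backends.items()):
        blocks.append(f'backend {name} {{\n    .host = "{host}";\n    .port = "{port}";\n}}\n')
    return "\n".join(blocks)

register("web1", "10.0.0.5")
register("web2", "10.0.0.6")
deregister("web2")
print(render_backends(registry))
```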

> Have you done any investigation on the cost benefits? If so what were
> the results?
>
> I'm obviously all for the automating the build of a new instance, and
> also automatically adding it to the pool of resources. I'm not so
> sure
> the next level of automation. Auto-scaling without any human
> interaction.
>
> The issue I have with most auto-scaling is the very basic metrics used
> to measure when to add/remove resources. I believe the issues are
> very customer/app specific and while with one customer X metric
> might be
> valid, with another you need X, Y and Z in some formula to determine
> when to go to the next level. The other issue related to this is the
> application itself. What may have allowed a customer app to scale to
> one level, may require code changes to scale effectively at a much
> higher level. This is something auto-scaling could never do.
> Vertical
> scaling is ALWAYS easier than going horizontal.
>
> Puppet is great at the automation of administration. To me auto-
> scaling
> could be a different tool and write high level rules on when to scale
> and what part to scale. Do I hear another tool in the making? :-)
>

I've got several auto-scaling tools that I've written for different
applications in the past, now that I find myself abruptly unemployed
maybe it's time to package them up into something more generically
useful. :)

>> Our eventual goal has always been *being able* to autoscale.
>> Realistically though, since unexpected load is fairly uncommon, we
>> still want a human in the scaling feedback loop. (So basically we
>> want
>> autoscaling with a "Human! Press this button!" step in the
>> autoscaling
>> process.)
>>

I usually go with an "automated within reason" definition, something
along the lines of:

when ( load is greater than X for Y minutes ) {
    total=count_total_number_of_running_instances()
    recent=count_number_of_instances_started_in_the_last_hour()
    if ( total < MAX_TOTAL AND recent < MAX_RECENT ) {
        start_a_new_instance()
    } else {
        notify_administrator()
    }
}
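A minimal runnable rendering of the pseudocode above in Python. X, Y, MAX_TOTAL and MAX_RECENT become parameters. `start_a_new_instance()` and `notify_administrator()` are replaced by stubs that only record what would happen, since the real calls depend on your EC2 tooling. "Load" is any per-minute demand metric, and stopped instances are not modelled, so `launches` only grows.

```python
from dataclasses import dataclass, field

def sustained_above(samples: list[float], x: float, y: int) -> bool:
    """True if the last y one-minute samples are all greater than x (y >= 1)."""
    return len(samples) >= y and all(s > x for s in samples[-y:])

@dataclass
class ScalingGuard:
    max_total: int   # MAX_TOTAL in the pseudocode
    max_recent: int  # MAX_RECENT in the pseudocode
    launches: list[float] = field(default_factory=list)  # start times (seconds) of running instances
    notifications: int = 0

    def step(self, samples: list[float], x: float, y: int, now: float) -> str:
        if not sustained_above(samples, x, y):
            return "idle"
        total = len(self.launches)
        recent = sum(1 for t in self.launches if now - t < 3600)
        if total < self.max_total and recent < self.max_recent:
            self.launches.append(now)  # stub for start_a_new_instance()
            return "started"
        self.notifications += 1        # stub for notify_administrator()
        return "notified"

guard = ScalingGuard(max_total=10, max_recent=2)
busy = [5.0] * 10
print(guard.step(busy, x=4.0, y=5, now=0))     # started
print(guard.step(busy, x=4.0, y=5, now=60))    # started
print(guard.step(busy, x=4.0, y=5, now=120))   # notified (2 launches in the last hour)
print(guard.step(busy, x=4.0, y=5, now=3700))  # started (earlier launches are over an hour old)
```

The MAX_RECENT check is what keeps a runaway metric from launching instances in a tight loop: once the hourly budget is spent, the decision falls back to a human.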

--
Jason Kohles, RHCA RHCDS RHCE
em...@jasonkohles.com - http://www.jasonkohles.com/
"A witty saying proves nothing." -- Voltaire

Thomas Bellman

Jan 19, 2009, 11:11:23 AM

to puppet...@googlegroups.com

Larry Ludwig wrote:

| Why not let Puppet install via ruby's gems? I started down a similar
| path with CentOS/RH creating RPMs for gems and felt using the native
| 'gem install' was better, especially when using different
| architectures (ie i386 and x86_64). In addition, if you use the
| puppet's type the recipe install is the exactly the same for any
| platform or operating system.

One major reason not to use gems (or CPAN for Perl) is that they don't
play well with the OS packaging system. If an RPM (for example) has
installed a file, gems and CPAN will happily overwrite it without
recording in the RPM database that the file is now owned by another
package. And if you then install an updated RPM, it will happily
overwrite the gem/CPAN-installed file.

And of course, if you find a file and don't know where it came from,
you can't just do 'rpm -q -f /path/to/file' to learn about it. Instead
you need to realize that you should look in another packaging system
for that information.
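To illustrate the overlap, here is a small Python sketch that checks which paths a gem would write are already owned by an RPM. It assumes you already have the gem's file list (obtaining it is not shown). `rpm_owner()` wraps the `rpm -q -f` query mentioned above, and the example paths and package names are invented.

```python
import subprocess
from typing import Optional

def rpm_owner(path: str) -> Optional[str]:
    """Return the package owning `path` per `rpm -q -f`, or None if unowned or rpm is absent."""
    try:
        result = subprocess.run(["rpm", "-q", "-f", path], capture_output=True, text=True)
    except FileNotFoundError:  # rpm is not installed on this machine
        return None
    return result.stdout.strip() if result.returncode == 0 else None

def ownership_conflicts(os_owned: dict[str, str], gem_files: list[str]) -> dict[str, str]:
    """Map each path a gem would write to the OS package that already owns it."""
    return {path: os_owned[path] for path in gem_files if path in os_owned}

# Made-up data standing in for rpm_owner() results collected on a real host.
owned = {"/usr/bin/rake": "rake-0.8.3-1", "/usr/bin/ruby": "ruby-1.8.6-1"}
print(ownership_conflicts(owned, ["/usr/bin/rake", "/usr/bin/mongrel_rails"]))
# {'/usr/bin/rake': 'rake-0.8.3-1'}
```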

Having more than one package system on a machine sucks big time.

For CPAN, there's the cpan2rpm program, which can create an RPM from a
CPAN package, which you can then install using the rpm or yum commands,
and I believe there's a cpan2deb program for Debian/Ubuntu. That gives
me the proper interaction with the normal package system.

I see there's a gem2rpm command available too. I haven't tried using
it, though. If there's a gem2deb command, I'd suggest the OP try
using that.

/Thomas Bellman

Larry Ludwig

Jan 19, 2009, 11:42:55 AM

to Puppet Users

> One major reason to not use gems (or CPAN for Perl), is that it doesn't
> play together with the OS packaging system. If an RPM (for example) has
> installed a file, gems and CPAN will happily overwrite it, without recording
> in the RPM database that the file is now owned by another package. If
> you install an updated RPM, it will happily overwrite the gem/CPAN installed
> file.

Yes, this is true.

To me it's about being consistent. If you are going the rpm/deb
package route, keep using that for Perl/CPAN too. If not, do it all
with CPAN or gem package management; just don't mix two package managers.

> For CPAN, there's the cpan2rpm program, which can create an RPM from a
> CPAN package, which you can then install using the rpm or yum commands,
> and I believe there's a cpan2deb program for Debian/Ubuntu. That gives
> me the proper interaction with the normal package system.

> I see there's a gem2rpm command available also. I haven't tried using
> that, though. If there's a gem2deb command, I'd suggest the OP to try
> using that.

cpan2rpm works great for the most part, and I use it quite a bit.
With gem2rpm I personally had very mixed results (many things didn't
compile), so I opted to use gem directly (at least on CentOS/RH). On
CentOS/RH there are also very few pre-built gem RPMs out there, which
was another deciding factor. For CPAN, on the other hand, there are
many RPMs available (DAG for example - http://dag.wieers.com/rpm/).

-L

Larry Ludwig

Jan 19, 2009, 12:27:27 PM

to Puppet Users

> What the CPU is doing isn't really relevant, the EC2 machines are
> really just Xen instances, and you pay for every hour (or portion of
> an hour) that the instance is booted, regardless of whether it is
> working or not.

So it's not based on CPU/hour, but really instance/hour? Forgive me on
this issue, as I haven't researched this part of their pricing much.

So it's $72 for the full month (small instance) if an instance is kept
running the whole time. That seems similar to what the CPU part of a
dedicated server would cost.

So it would then depend on how many instances you have in the cloud
versus the developer time needed to "auto scale" before you actually
save some dough. If you are talking about a few instances, then no; if
you have hundreds, then you would see some real savings.
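The arithmetic behind these paragraphs, as a small Python sketch. The $72 figure implies $0.10 per instance-hour over a 30-day month (72 / 720). The fleet sizes, the 10 idle hours per night and the $3,000 one-off cost of building the automation are hypothetical numbers picked only to show the shape of the trade-off.

```python
HOURLY_RATE = 72 / 720  # USD per instance-hour implied by "$72 for the full month" (30 days x 24 h)

def monthly_savings(instances: int, hours_off_per_day: float, days: int = 30,
                    rate: float = HOURLY_RATE) -> float:
    """Dollars saved per month by stopping `instances` for `hours_off_per_day` each day."""
    return instances * hours_off_per_day * days * rate

def payback_months(dev_cost: float, savings_per_month: float) -> float:
    """Months until the savings cover a one-off development cost."""
    return float("inf") if savings_per_month <= 0 else dev_cost / savings_per_month

for n in (4, 200):
    saved = monthly_savings(n, hours_off_per_day=10)
    print(f"{n} instances: ${saved:.2f}/month saved, pays back in {payback_months(3000, saved):.1f} months")
# 4 instances: $120.00/month saved, pays back in 25.0 months
# 200 instances: $6000.00/month saved, pays back in 0.5 months
```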

>
> In many applications, the new IP address is not an issue. For a web
> farm for example, I'll usually run something like Varnish on the front-
> end machines with the public IP addresses, and keep them running all
> the time, then bring up or shut down backend hosts behind varnish as
> needed, and have an init script that registers the new backends as
> being available as part of their boot up.

Yes, agreed; it depends on what the instance is doing.

> when ( load is greater X for Y minutes ) {
> total=count_total_number_of_running_instances()
> recent=count_number_of_instances_started_in_the_last_hour()
> if ( total < MAX_TOTAL AND recent < MAX_RECENT ) {
> start_a_new_instance()
> } else {
> notify_administrator()
> }
>

This might be a fine script for simple auto-scaling, but IMHO using
load as a metric is a terrible way to determine scaling. Reasons:
- You don't know if your app is the cause of the load or some rogue/
foreign process (i.e. a hacker who's taken over your instances). Is
the load legit or not?
- Could your existing code be written better, since it became CPU and
I/O bound? (Remember that Unix load reflects processes waiting for the
CPU and, on Linux, for disk I/O.)
- Was this already a known trend, or was it an unusual spike in
activity?
- The load has already happened, so now you are reacting to the issue
instead of creating instances BEFORE they are needed. High load is a
trailing indicator, not a leading one. Ideally, auto-scaling should
create instances just before they are needed, not after.
- I think the metrics used should be much more app specific than just load.
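The leading-versus-trailing point can be sketched in Python with one simple (and deliberately naive) technique that was not proposed in the thread: fit a least-squares line to recent per-minute demand samples and scale if the line, projected forward by the time a new instance takes to become ready, crosses capacity. The sample values, the capacity of 180 and the 10-minute boot time are illustrative assumptions.

```python
def projected(samples: list[float], minutes_ahead: float) -> float:
    """Extrapolate one-minute samples with an ordinary least-squares line."""
    n = len(samples)
    if n < 2:
        return samples[-1] if samples else 0.0
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    # Value of the fitted line `minutes_ahead` minutes after the last sample.
    return mean_y + slope * ((n - 1 + minutes_ahead) - mean_x)

def scale_ahead(samples: list[float], capacity: float, boot_minutes: float) -> bool:
    """Scale now if demand is projected to exceed capacity before a new instance is ready."""
    return projected(samples, boot_minutes) > capacity

rising = [100.0, 110.0, 120.0, 130.0, 140.0]      # e.g. requests/sec, still below capacity
print(projected(rising, 10))                      # 240.0
print(scale_ahead(rising, capacity=180, boot_minutes=10))       # True
print(scale_ahead([150.0] * 5, capacity=180, boot_minutes=10))  # False
```

A straight line is easily fooled by noise or one-off spikes, which is exactly the app-specific judgment the last bullet calls for.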

-L

Jason Kohles

Jan 19, 2009, 1:23:28 PM

to puppet...@googlegroups.com

On Jan 19, 2009, at 12:27 PM, Larry Ludwig wrote:

>> What the CPU is doing isn't really relevant, the EC2 machines are
>> really just Xen instances, and you pay for every hour (or portion of
>> an hour) that the instance is booted, regardless of whether it is
>> working or not.
>
> So it's not based upon CPU/hour, really instance/hour? Forgive me on
> this issue as I haven't researched that much into this part of their
> pricing.
>
> So $72 for the full month (small) if an instance is kept running the
> full time. Seems similar pricing for what would the CPU part of a
> dedicated server would cost.
>

And if you only have a handful of servers and keep them running all
the time, then putting them in EC2 doesn't really buy you much, but if
you need a lot of servers on an infrequent basis, this can save some
big bucks.

>> when ( load is greater X for Y minutes ) {
>> total=count_total_number_of_running_instances()
>> recent=count_number_of_instances_started_in_the_last_hour()
>> if ( total < MAX_TOTAL AND recent < MAX_RECENT ) {
>> start_a_new_instance()
>> } else {
>> notify_administrator()
>> }
>>
>
> This might be a fine script for simple auto-scaling, but using load as
> a metric IMHO is a terrible method to determine scaling. Reasons:

Well, in this pseudo-code I really meant load as 'an indication that
your application requires more resources than the current instances
can handle', rather than simply the system load average. In my
particular case most of my instances process requests from a queue,
and my definition of load is primarily the length of the queue
backlog and the estimated time to process that backlog with the
currently available instances.
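A small Python sketch of the queue-based sizing described here, assuming a steady, known per-instance processing rate and a hypothetical target drain time of 15 minutes. The backlog and rate numbers are made up.

```python
import math

def estimated_drain_seconds(backlog: int, per_instance_rate: float, instances: int) -> float:
    """Seconds to clear `backlog` jobs if each instance handles `per_instance_rate` jobs/s."""
    if backlog <= 0:
        return 0.0
    return math.inf if instances <= 0 else backlog / (per_instance_rate * instances)

def instances_needed(backlog: int, per_instance_rate: float, target_seconds: float) -> int:
    """Smallest instance count that clears the backlog within `target_seconds`."""
    if backlog <= 0:
        return 0
    return math.ceil(backlog / (per_instance_rate * target_seconds))

backlog, rate, running = 12000, 2.0, 4
print(estimated_drain_seconds(backlog, rate, running))  # 1500.0 (25 minutes)
need = instances_needed(backlog, rate, target_seconds=900)
print(need, "instances needed;", max(0, need - running), "more to start")
# 7 instances needed; 3 more to start
```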

Ajai Khattri

Jan 19, 2009, 12:40:36 PM

to puppe...@googlegroups.com, puppet...@googlegroups.com, brian...@gmail.com

Yeah, let's have it on the day that half a dozen other NY user groups have their meetings, shall we? :-)

--
Aj.

Brian Gupta

Jan 19, 2009, 7:45:18 PM

to puppe...@googlegroups.com, puppet...@googlegroups.com

Actually, the only user group meeting on Feb 3rd is the NYLUG Python
Workshop (that I am aware of). Although I regret even this conflict,
please be aware that Larry suggested Feb 3rd because some folks are
visiting from out of town and leaving the next day. The only other
possible day was Monday, Feb 2nd, and that was unworkable for a number
of people.

Please don't expect "first Tuesday" to necessarily be a regular
occurrence. (I.e., I will try to find a day that has even fewer
conflicts.)

:)

Cheers,
Brian