Some thoughts about automated infrastructure building and maintenance

Clint Savage

Sep 3, 2009, 12:30:46 PM
to vel...@googlegroups.com
Hi all,

In advance of tomorrow's meeting, I've been thinking about automation
in a much larger way. I'm sure I'm not the first to come up with
these ideas, nor am I necessarily looking for an outright solution,
but I've been thinking about how you can better manage infrastructure
with Agile thoughts and components.

One thought that keeps coming to my mind is this idea of a central
place to store information about our nodes and how different services
can use them to make our infrastructure more flexible. Even to the
point of building and deploying different roles at the sign of
increased (or decreased) load, activity, users, etc.

As I start to think about this, I come back to this idea that there is
really a simple tool that can generically describe the machines
(nodes) on the network and the role they play in the infrastructure.
The thing is, each time I look at this issue, I come up with several
common tools that could use this sort of 'Classification Engine'.
With that in mind, I've looked around at several options, not finding
one that fits my needs exactly, but I have found a couple that *might*
work in some way.
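
To make the 'Classification Engine' idea a bit more concrete, here is a
minimal sketch in Python. It is not taken from any of the tools
discussed here; the class names, the role names and the attributes
(os, rack) are all hypothetical, and a real version would sit behind a
network API with persistent storage rather than an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine on the network; every field name here is hypothetical."""
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


class Classifier:
    """In-memory stand-in for a central classification store."""

    def __init__(self):
        self._nodes = {}

    def register(self, name, roles=(), **attributes):
        # Create the node on first sight, then merge in new roles and attributes.
        node = self._nodes.setdefault(name, Node(name))
        node.roles.update(roles)
        node.attributes.update(attributes)
        return node

    def nodes_with_role(self, role):
        # Sorted so callers get a stable order.
        return sorted(n.name for n in self._nodes.values() if role in n.roles)

    def lookup(self, name):
        return self._nodes[name]


classifier = Classifier()
classifier.register("web01", roles=["webserver"], os="centos5", rack="A3")
classifier.register("db01", roles=["database"], os="rhel4")
classifier.register("web02", roles=["webserver"], os="centos5")
print(classifier.nodes_with_role("webserver"))  # ['web01', 'web02']
```

The point of the sketch is only that roles and free-form attributes
live in one place, so that a monitoring or deployment tool could ask
"which nodes are webservers?" instead of keeping its own list.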

I'm going to first list the applications we work with right now at
Backcountry and then talk about the tool I envision a bit more, along
with short descriptions around the applications I've evaluated. I'm
doing this with a hope that one or many of you on this list can either
point me in the right direction with a good tool or at least give me
some pointers in areas where I am lacking.

Our environment:

Currently, Backcountry has somewhere around 125 Linux hosts, mostly
running CentOS/RHEL{4,5}. We currently have the following services
for deployment:

Cobbler - Builds out machines based on hardware (or VM) and installs a
generic set of packages.
Puppet - When the new machine is built, a Puppet role is assigned to
the node and registered with the puppetmaster. This molds the machine
into a particular role.
Koji - We build mostly RPMs for our infrastructure so we can deploy
them through Puppet.
ControlTier - ControlTier also builds RPMs. In contrast to Koji,
these RPMs are built for application deployment and managed
completely through its own management interface.
Nagios - We use Nagios to monitor the health of each machine and the
applications each one runs.
Cacti - Cacti provides the engineers with pretty graphs so they can
say 'Oh, that is bad/good'. It also helps us track traffic, among
other small bits and pieces.
RackMonkey - Currently in use by the hardware administrators to
identify where a node is located and what network components, RAM,
CPU, etc. it has.
VMware - This is our current platform for virtualization in
production, but we will likely be using Virtuozzo (OpenVZ) in our QA
environments.

As you can see above, part of the issue is that we have a diverse set
of applications that will either have to be modified or have some way
of sending updates to each other or to a centralized component.
My thoughts are specifically around a message queue, which can pass
messages back and forth with efficiency. I have seen one that is
currently being developed by the Fedora Project which might fit our
structure too.
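
As a rough illustration of the message-passing idea, the sketch below
uses Python's standard-library queue as a stand-in for a real message
broker. Nothing here reflects the Fedora Project's work or any
existing integration: the publish() hook, the message layout and the
fact names are all assumptions made up for the example.

```python
import queue

# A standard-library queue stands in for a real message broker here.
bus = queue.Queue()


def publish(source, node, **facts):
    """Hypothetical hook a tool would call when it learns something about a node."""
    bus.put({"source": source, "node": node, "facts": facts})


def drain(store):
    """Apply every pending message to the central store (a plain dict here)."""
    while True:
        try:
            msg = bus.get_nowait()
        except queue.Empty:
            break
        # Merge the new facts into whatever is already known about the node.
        store.setdefault(msg["node"], {}).update(msg["facts"])
    return store


publish("cobbler", "web01", profile="centos5-x86_64")
publish("puppet", "web01", role="webserver")
publish("rackmonkey", "web01", rack="A3")
store = drain({})
print(store["web01"])
```

Each tool only needs to know how to put a message on the queue; none
of them has to know about the others, which is the main attraction
over modifying every application to talk to every other one.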

Currently, our management of infrastructure is very manual. We do
have some automated components (Puppet and ControlTier, for instance),
but I'm after a wholly automated system.

I've been thinking of ways to 'eat the elephant' and create a flexible
way to manage information to and from each (most or all) of these
nodes. I am quite certain that there are several good approaches to
tackling this problem. However, I've taken a (not so) unique approach,
and started looking at applications that can be used for 'Classifying'
nodes, with the intent that each of the above systems can somehow read
/ write to this system to help make a more flexible and agile
infrastructure.

Each of the applications below is one I have looked at that seems
close to what I need. I'm sure there are some that I have not yet
investigated. Here's the list:

NVentory

It has some cool Ruby elements (like Facter) to help integrate with
certain systems. NVentory is extensible and different components can
be added to extend it beyond its intended purpose, which is mostly
managing inventory. It does support configurations and the ability to add
producers and consumers through a MessageQueue. The project is still
quite young, but holds lots of promise.

iClassify

I looked at iClassify for quite a while and really liked its flexible
schema for adding and removing attributes. The system could really be
used to hold any type of information.
It has an agent that lives on each node and updates the particular
component information on the server. As I looked closer, its biggest
flaw is that it has not been maintained by the original developers for
some time. I have heard that it's not as useful as it looks, but I'm
still interested here.

Puppet External Nodes

This feature of Puppet seems to have some great potential for handling
much of the low-level management of each node and role. But one of
the things I don't see is how it can easily manage the information
which would be pulled from / pushed to it by the other applications I
believe are crucial to making this system work. I will say that this
is the least tested solution I've looked at as I only had a day or so
to investigate, with no time to implement a test setup.
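
For reference, an external node script is just a program that Puppet
runs with the node's name as its first argument; it prints a small
YAML document listing classes and parameters, and exits non-zero if it
doesn't know the node. Below is a minimal sketch of such a script in
Python. The NODES table, the hostname and the class and parameter
names are hypothetical; in the setup I'm imagining, the lookup would
query the central classification store instead of a hard-coded dict.

```python
import sys

# Hypothetical lookup table; in practice this would query a central store.
NODES = {
    "web01.example.com": {
        "classes": ["base", "webserver"],
        "parameters": {"datacenter": "slc", "rack": "A3"},
    },
}


def render_enc_yaml(entry):
    """Render the classes/parameters YAML for one node (simple scalars only)."""
    lines = ["---", "classes:"]
    lines += ["  - %s" % name for name in entry["classes"]]
    lines.append("parameters:")
    lines += ["  %s: %s" % (k, v) for k, v in sorted(entry["parameters"].items())]
    return "\n".join(lines) + "\n"


def main(argv):
    # Puppet passes the node's name as the first argument.
    if len(argv) != 2 or argv[1] not in NODES:
        return 1  # non-zero exit status means "unknown node"
    sys.stdout.write(render_enc_yaml(NODES[argv[1]]))
    return 0


# A real script would end with sys.exit(main(sys.argv)); here it is
# called directly so the example runs without command-line arguments.
main(["enc", "web01.example.com"])
```

On the puppetmaster this would be wired in with the node_terminus =
exec and external_nodes settings. It covers the "read" side only;
how the other applications would push data back in is the part I
haven't worked out.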

RackMonkey

While it's great for managing locations and details about the racked
servers, it's not going to be good enough without some really heavy
modification to fit what I am envisioning. I don't think it fits
well enough to justify hacking in all the changes it would take to
make this sort of system.

There's a lot of information here. I'm still in the planning and
investigation stages of making this flexible classification system
idea a reality. I'm not even sure I'm sold on this idea, but I think
it's the best one so far. If you have ideas around any of this, bring
them with you to tomorrow's meeting and/or post them back to this
thread. I really hope this can be a good resource for helping me (and
others) identify and/or create a structure to better manage and
automate infrastructure.

I welcome your comments, suggestions and flames. I hope my thoughts
are clear and concise and I look forward to discussing this with all
of you tomorrow at the VeloSLC meeting as well as here on the list.

Thanks for your time.

Cheers,

Clint
