New wiki page: Criteria for "Fully Automated Provisioning"

Damon Edwards

Mar 22, 2010, 6:44:56 PM
to devops-toolchain
http://code.google.com/p/devops-toolchain/wiki/CriteriaForFullyAutomated

The original toolchain whitepaper had a list of criteria for what
"fully automated provisioning" actually meant. There wasn't a lot of
public discussion about the criteria at that time. The criteria came
mostly from a compilation of best practices that I saw working at
different places.

I dusted off the old list and posted it to the wiki. I posted some
thoughts as to why I think defining "fully automated provisioning" is
important:
http://dev2ops.org/blog/2010/3/22/criteria-for-fully-automated-provisioning.html

Any ideas? What's missing? What shouldn't be on the list?

Here's the list of criteria (but check the wiki page for latest
version):

1. Be able to automatically provision an entire environment -- from
"bare-metal" to running business services -- completely from
specification

Starting with bare metal (or stock virtual machine images), can you
provide a specification to your provisioning tools and have the tools
in turn automatically deploy, configure, and start up your entire system
and application stack? This means not leaving runtime decisions or
"hand-tweaking" for the operator. The specification may vary from
release to release or be broken down into individual parts provided to
specific tools, but the calls to the tools and the automation itself
should not vary from release to release (barring a significant
architectural change).
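
A minimal sketch of what "completely from specification" could look like, assuming a made-up in-memory model of a machine. The `Host` class, the `provision` function, and the package names and versions are all hypothetical and chosen only for illustration; a real tool would act on real machines. The point is that the call stays the same from release to release while only the specification changes.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for the current state of one machine."""
    packages: dict = field(default_factory=dict)  # package name -> version
    services: set = field(default_factory=set)    # names of running services


def provision(spec: dict, host: Host) -> list:
    """Drive `host` toward `spec` and return the actions that were needed."""
    actions = []
    for name, version in spec["packages"].items():
        if host.packages.get(name) != version:
            host.packages[name] = version
            actions.append(f"install {name}={version}")
    for name in spec["services"]:
        if name not in host.services:
            host.services.add(name)
            actions.append(f"start {name}")
    return actions


# Two illustrative release specifications; only the data differs.
release_1 = {"packages": {"httpd": "2.2.14", "myapp": "1.0"},
             "services": ["httpd", "myapp"]}
release_2 = {"packages": {"httpd": "2.2.14", "myapp": "1.1"},
             "services": ["httpd", "myapp"]}

host = Host()                          # "bare metal": nothing installed
first = provision(release_1, host)     # full build from an empty machine
second = provision(release_2, host)    # same call, new specification
```

In this sketch the second call only touches what differs between the two specifications, and running it again with the same specification does nothing.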

2. No direct management of individual boxes

This is as much a cultural change as it is a question of tooling.
Access to individual machines for purposes other than diagnostics or
performance analysis should be highly frowned upon and strictly
controlled. All deployments, updates, and fixes must be applied only
through specification-driven provisioning tools that in turn manage
each individual server to achieve the desired result.

3. Be able to revert to a "previously known good" state at any time

Many web operations lack the capability to roll back to a "previously
known good" state. Once an upgrade process has begun, they are forced
to push forward and firefight until they reach a functionally acceptable
state. With fully automated provisioning you should be able to supply
your provisioning system with a previously known good specification
that will automatically return your applications to a functionally
acceptable state. The most successful rollback strategy can be
described as "rolling forward to a previous version". Database issues
are generally the primary complication with any rollback strategy, but
it is rare to find a situation where a workable strategy can't be
achieved.
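
One way to picture "rolling forward to a previous version" is the sketch below, which assumes a specification is just a mapping of package names to versions. The `apply` function and the package names are hypothetical. Rollback is not a separate code path here: the previously known good specification is applied with the same function used for any other release. Database state, the main complication mentioned above, is deliberately left out.

```python
def apply(spec: dict, installed: dict) -> list:
    """Make `installed` match `spec` exactly and return the actions taken."""
    actions = []
    # Remove anything the specification does not mention.
    for name in sorted(set(installed) - set(spec)):
        del installed[name]
        actions.append(f"remove {name}")
    # Install or change versions so they match the specification.
    for name, version in sorted(spec.items()):
        if installed.get(name) != version:
            installed[name] = version
            actions.append(f"install {name}={version}")
    return actions


# Illustrative specifications (names and versions are made up).
known_good = {"myapp": "1.0", "libfoo": "3.2"}
bad_release = {"myapp": "1.1", "libfoo": "3.3", "newdep": "0.9"}

installed = {}
apply(known_good, installed)     # original deployment
apply(bad_release, installed)    # upgrade that turns out to be bad
rollback_actions = apply(known_good, installed)  # "roll forward" to the old spec
```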

4. It’s easier to re-provision than it is to repair

This is a litmus test. If your automation is implemented correctly,
you will find it is easier to re-provision your applications than it is
to attempt to repair them in place. “Re-provisioning” could simply
mean an automated cycle of validating and regenerating application and
system configurations or it could mean a full provisioning cycle from
the base OS up to running business applications.

5. Anyone on your team with minimal domain-specific knowledge can
deploy or update an environment

You don't always want your most junior staff to be handling
provisioning, but with a fully automated provisioning system they
should be able to do just that. Once your domain-specific experts
collaborate on the specification for that release, anyone familiar with
a few basic commands (and having the correct security permissions)
should be able to deploy that release to any integrated development,
test, or production environment.

Noah Campbell

Mar 22, 2010, 7:31:00 PM
to devops-t...@googlegroups.com
I think there is one criterion missing; it has to do with the tool
chain. If your tool chain is not flexible, then you're going to be
fighting the business because the cost of change is too high.

Another criterion is the recognition that any tool in the tool chain
can be removed and the underlying steps to accomplish the task are
relatively straightforward for a more senior or experienced SA to
apply. For example, if the sequence automation tool broke, the steps
for doing an application rollout manually would still be accessible.
This provides fault tolerance for emergency situations. It also
encourages those with junior skills but more experience to add value
by creating automation sequences.
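
As a rough illustration of keeping the manual steps accessible, the sketch below assumes a rollout sequence is stored as plain data: a list of described commands. The names `ROLLOUT_STEPS`, `runbook`, and `run_steps` are hypothetical, and the `echo` commands are placeholders for real ones. The same list can be executed by the automation or printed as instructions for an SA to follow by hand if the tool is unavailable.

```python
import subprocess

# Hypothetical rollout sequence. Each step is a plain command that an SA
# could also type by hand; the echo commands are placeholders.
ROLLOUT_STEPS = [
    ("stop the application", ["echo", "stop myapp"]),
    ("install the new package", ["echo", "install myapp-1.1"]),
    ("start the application", ["echo", "start myapp"]),
]


def runbook(steps):
    """Render the sequence as numbered instructions for manual use."""
    return [f"{i}. {desc}: {' '.join(cmd)}"
            for i, (desc, cmd) in enumerate(steps, 1)]


def run_steps(steps, runner=subprocess.run):
    """Execute each step in order; check=True stops at the first failure."""
    for _desc, cmd in steps:
        runner(cmd, check=True)
```

Calling `print("\n".join(runbook(ROLLOUT_STEPS)))` would give the manual procedure, while `run_steps(ROLLOUT_STEPS)` would run it.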

The above relies on each layer to abstract just enough to provide
value, but not try to do everything. For example, the anti-pattern
called out in other threads regarding the use of pre/post install
hooks is an example of a package manager trying to do too much.
Finding that balance is critical for a healthy tool chain. I would
rather see automation of a small number of steps than the automation
of hundreds of steps.

My 2c.

-Noah
