Command and Control


technikolor

unread,
Sep 24, 2010, 11:46:50 AM9/24/10
to devops-toolchain
I wanted to solicit opinions on the C&C layer.

As I see it, at the provisioning layer your options tend to be
generally constrained by the platform you're using, so the choice
is straightforward with only a handful of exceptions. As for
configuration, the world revolves around three options (CFEngine,
Puppet, and Chef), so picking one is fairly straightforward.

The C&C layer is the tough one, imho. There are a lot of options.
This layer is also perceived differently depending on your shop. In a
web shop this layer is primarily about deployment. In non-web shops
this layer is primarily about ad-hoc command execution. I am in the
latter camp.

So far I break the field down a bit like this. Mussh is at the most
simplistic end of the spectrum: it's written in Bash, and it's super
simple but effective. At the other extreme of the spectrum is
ControlTier, which I love, but it's just a monster of a tool, both in
terms of complexity and resource requirements. In between are dozens
of solutions, from pssh, to Capistrano, to mcollective, to ZooKeeper,
and on and on and on.

What makes it complex, I think, is that each tool has something
attractive to offer, but they all lean in some direction which may not
fit your needs. For instance, if you aren't into Java, ZooKeeper
probably isn't for you. If you're not into Ruby, Capistrano probably
isn't for you. And anything written in Python which uses the Python
SSL module is just a pita (imho).

So, the question really is, what are good solid options for
administrators who aren't developers by trade and are scared off by
some of these tools? Something you can plug scripts into, execute
scripts to audit systems, mass restart services, etc.

There isn't a right answer, but I want to re-invigorate the discussion
in hopes that I can settle on something.

tom cignarella

unread,
Sep 24, 2010, 2:29:51 PM9/24/10
to devops-t...@googlegroups.com
Here at Clickability we've just completed an implementation of Func - https://fedorahosted.org/func/
We are going to use it for the first time this weekend for a new release going out.

Greg Retkowski

unread,
Sep 24, 2010, 2:30:36 PM9/24/10
to devops-t...@googlegroups.com
There are lots of powerful tools out there - I find xapply to be the
tool I keep coming back to when I need to run a command across a set
of machines. It's a Swiss Army knife of parallel execution.

./a_command_that_provides_a_filtered_list_of_hostnames | xapply -P10 -f 'fping %1 && ssh %1 run-some-command' -

With my background I lean towards implementing something like
mcollective. But to answer the question of 'what would I use as a
non-dev sysadmin', xapply would do everything I needed.

Cheers,

-- Greg

Anthony Shortland

unread,
Sep 24, 2010, 2:54:46 PM9/24/10
to devops-t...@googlegroups.com
Looks like a cool tool-chain ... ControlTier's node-list and ctl-exec command (http://controltier.org/wiki/Ctl-exec) is "a_command_that_provides_a_filtered_list_of_hostnames", with the addition of including the xapply/ssh part of your pipeline too:

ctl-exec -I include_some_nodes -X exclude_some_nodes -C10 -- run-some-command

Anthony.

Adam

unread,
Sep 24, 2010, 4:07:37 PM9/24/10
to devops-t...@googlegroups.com
On the sysadmin side at Meebo, we wrap dsh
[http://www.netfort.gr.jp/~dancer/software/dsh.html.en] in some Python
that asks our machine inventory for a host list. This could, and
probably should, be split apart into a generic "get_my_hosts" command
and a "parallel execute on hosts from stdin" command.

The get_hosts sort of command right now is pretty dumbed down. It asks
the machine DB (a django app) for hosts satisfying many roles (web,
app1, app2, db, etc), sites, statuses, and such through a simple HTTP
call. I could add options for including and excluding other stuff like
certain racks in our colos. If it's in our machine DB, this script
could use it.
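
To make that concrete, here's a rough sketch of the get_hosts half. The machinedb.example.com URL, the /api/hosts endpoint, and its parameters are all made up for illustration; they are not our real machine DB API.

#!/usr/bin/env python
"""get_my_hosts sketch: print matching hostnames, one per line."""
import sys
import urllib
import urllib2

MACHINE_DB = "http://machinedb.example.com/api/hosts"  # hypothetical endpoint

def get_hosts(roles=None, site=None, status="live"):
    params = {"status": status}
    if roles:
        params["role"] = ",".join(roles)
    if site:
        params["site"] = site
    url = "%s?%s" % (MACHINE_DB, urllib.urlencode(params))
    # Assumes the endpoint returns one hostname per line.
    return urllib2.urlopen(url).read().split()

if __name__ == "__main__":
    # e.g.: ./get_my_hosts web app1 | <parallel-exec-on-stdin command>
    for host in get_hosts(roles=sys.argv[1:] or None):
        print host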

We're getting to the point though, where we need to include and
exclude hosts based on the classes they use in puppet. Since that
info's stored in puppet, we don't also store it in our machine DB. We
use an external nodes script for puppet that points to our DB, but
there are still many use cases where we need more visibility into
puppet in our parallel exec calls.

I believe the developers' deploy tools use a Python script somebody
wrote to call SSH and rsync in parallel rolling chunks (5 hosts at a
time, for example).

-adam

P.S. I'm pretty sure there are some RPMs for dsh/libdshconfig around,
but I can post ours if you'd like.

Joshua Timberman

unread,
Sep 25, 2010, 11:51:37 AM9/25/10
to devops-t...@googlegroups.com
There seems to be a lot of aversion in the devops community to running
tools that are basically SSH in a for loop, or even parallel SSH. They
want to avoid the ad-hoc connotations of using tools like that, and I
can understand the perspective.

However, at the end of the day sometimes you just need to SSH to a
system or three and run some commands.

We use Chef's command-line API tool, knife, for ad-hoc commands and
"command-and-control"-lite usage. Knife provides subcommands, and one
that we utilize for this is ssh.

knife ssh role:opscode-chef 'sudo chef-client' -a ec2.public_hostname

For example, this command will search the Chef server for all the
nodes that have the role 'opscode-chef' applied, and use the value of
the "ec2.public_hostname" attribute[0] as the hostname to connect to.
Knife uses net-ssh-multi to open connections to all the systems that
return from the search, and run the command 'sudo chef-client' on
them.

Aside from the specified command, one can use knife ssh to open
interactive, screen, tmux, macterm sessions for each of the systems
that return from the search. In an upcoming version of Chef, csshX
will be supported as well.

[0]: this is because the automatically detected hostname and IP
address of EC2 systems are by default the private values, instead of
the public ones.

Noah Campbell

unread,
Sep 25, 2010, 11:57:30 AM9/25/10
to devops-t...@googlegroups.com

On Sep 25, 2010, at 8:51 AM, Joshua Timberman wrote:

> There seems to be a lot of aversion in the devops community to running
> tools that are basically SSH in a for loop,

By devops community do you mean non-SA types? If so, is there a gap and what is the gap?

Joshua Timberman

unread,
Sep 25, 2010, 12:16:38 PM9/25/10
to devops-t...@googlegroups.com
Hello!

On Sat, Sep 25, 2010 at 9:57 AM, Noah Campbell <noahca...@gmail.com> wrote:
> By devops community do you mean non-SA types?  If so, is there a gap and what is the gap?

In my observation, it is sysadmin types.

R.I.Pienaar

unread,
Sep 25, 2010, 12:16:42 PM9/25/10
to devops-t...@googlegroups.com

The problem I have with doing this for a lot of things - say to drive your deploys - is that they are just a bunch of commands, possibly stuck in someone's head or outdated on a wiki.

Yes, you need to run one-off commands, but I always ask myself if I am doing a specific one often and, if so, whether it is something I need to improve.

If I need to improve it - say it's a step in my code deploy - then you need a more mature C&C than an SSH for loop: you need something that is versioned, in use in dev and staging, and tested well as part of your release testing. You need something that provides measurable rather than human-brain-parsed status, etc. Nested output from parallel SSH over 20 machines isn't clear as day and promotes user error.

Good C&C tools strike a balance between these two, but should enable the latter with the former being a small side addition.

Scott McCarty

unread,
Sep 25, 2010, 12:22:53 PM9/25/10
to devops-t...@googlegroups.com

The "outdated" argument is true whether commands in your head, wiki, or code. The only way to mitigate this problem is by understanding all of the interactions in a group of systems.

I have found up to date architecture drawing to be the best mitigation for this problem.

This is a standard problem of inductive reasoning, will the apple really fall next time?

Scott M

Noah Campbell

unread,
Sep 25, 2010, 12:31:54 PM9/25/10
to devops-t...@googlegroups.com
I'm curious if there are any preliminaries that can be done/are done to determine if that apple will fall from the tree.  

For example:

- ping the server
- check environment variables
- /etc/init.d/<service> status
- tail a log file

What else do folks do to ensure the "apple" falls?

-Noah

R.I.Pienaar

unread,
Sep 25, 2010, 12:37:14 PM9/25/10
to devops-t...@googlegroups.com

On 25 Sep 2010, at 09:22, Scott McCarty <scott....@gmail.com> wrote:

The "outdated" argument is true whether commands in your head, wiki, or code. The only way to mitigate this problem is by understanding all of the interactions in a group of systems.


If you are using your code to do your day-to-day ad-hoc tasks, and it works, then how is it outdated or undocumented? I am not sure I follow 100% what you mean.

When it is code that is used regularly to complete a task, then it's not outdated. When you update it via version control to match new requirements in your deploy, then it arrives via staging/QA along with the code it is deploying. They share a life cycle.

Scott McCarty

unread,
Sep 25, 2010, 12:39:11 PM9/25/10
to devops-t...@googlegroups.com

Good point. I agree testing is a critical part of command and control, but those tests are best created from knowledge of the overall system. This is where it can get hairy without architectural knowledge.

This is more an argument for agile-ish development of C&C (which I agree with), but how does one incorporate that easily into existing C&C without developing their own scripts?

This is tougher than it sounds; I do it all of the time.

Scott M

Scott McCarty

unread,
Sep 25, 2010, 12:42:36 PM9/25/10
to devops-t...@googlegroups.com

I thought you made the "outdated" argument, not me. It's all about architecture, testing, and version control. But somebody has to deploy the deployment app (button), and somebody has to update it.

Miles Fidelman

unread,
Sep 25, 2010, 12:43:43 PM9/25/10
to devops-t...@googlegroups.com
Noah Campbell wrote:
> I'm curious if there are any preliminaries that can be done/are done
> to determine if that apple will fall from the tree.
>
> For example:
>
> - ping the server
> - check environment variables
> - /etc/init.d/<service> status
> - tail a log file
>
> What else do folks do to ensure the "apple" falls
mostly application-specific commands - usually via SSH

It's sort of unfortunate that nobody has really developed an update to
SNMP as a common interface for command-and-control applications.

It seems like each tool (Nagios, Puppet, etc.) defines its own
protocol, and then requires that you install its own modules in every
managed machine - what a royal pain. Shell and Tcl scripts, and
judicious use of pipes, sed, sort, and whatnot very quickly prove to be
easier.

It sure would be nice if people started routinely including RESTful web
interfaces for monitoring and control into every piece of code.

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is. .... Yogi Berra


Scott McCarty

unread,
Sep 25, 2010, 12:44:52 PM9/25/10
to devops-t...@googlegroups.com

Essentially, I think the outdated argument is weak; version control, testing, and architecture are the stronger arguments for controlled deployment. They all prohibit weak or untested paths to change, not because code is inherently more up to date, but because it is a more testable path.

Scott M

Noah Campbell

unread,
Sep 25, 2010, 12:51:43 PM9/25/10
to devops-t...@googlegroups.com
Miles, you got me thinking...

I'm sure everyone has heard (some variation) of the following relating to bandwidth [1]:

Don't underestimate the bandwidth of a van full of hard drives.

I bring that up, because I think in the DevOps space the following holds true:

Don't underestimate the power of ssh and vi

The only drawback to ssh and vi is that accountability seems to be lost despite everyone's best intentions. Secondly, the quirky behavior of a system (to Scott's point about architecture) is unique enough that it becomes hard to generalize its properties.

[1] http://www.codinghorror.com/blog/2007/02/the-economics-of-bandwidth.html

Adam

unread,
Sep 25, 2010, 1:45:31 PM9/25/10
to devops-t...@googlegroups.com
There's a range of tasks for which I'm currently using dsh (ssh in a for loop):
- service stuff (reload, restart, update outside of puppet 'cause it
has to be in prod yesterday)
- one-off information gathering tasks (which versions are running
where? who has junk in ~/?)
- versioned production config applies that can't wait 15 minutes for
our configuration management to apply

I see the biggest need for more mature C&C in my job in the third sort
of task: time-sensitive cluster configuration changes and the
accompanying service fiddling. A good rule of thumb might be: anything
you use ctrl+r for to change clusters should be a script of sorts, and
thus version controlled. I'm having difficulty switching to this, I
think because of the ludicrous number of scripts it would require.
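
For what it's worth, here is a toy sketch of the kind of version-controllable script I mean; the hostnames, batch size, and service name are all made up, and in practice the host list would come from the machine DB:

#!/usr/bin/env python
"""Toy rolling-restart script: the sort of thing that replaces ctrl+r."""
import subprocess
import time

HOSTS = ["app1.example.com", "app2.example.com", "app3.example.com"]
BATCH = 2   # hosts to touch at once
PAUSE = 30  # seconds between batches

def run(host, command):
    # Still SSH in a loop, but at least it's written down and reviewable.
    return subprocess.call(["ssh", host, command])

def rolling_restart(service):
    for i in range(0, len(HOSTS), BATCH):
        for host in HOSTS[i:i + BATCH]:
            if run(host, "sudo /etc/init.d/%s restart" % service) != 0:
                raise SystemExit("restart failed on %s, stopping" % host)
        time.sleep(PAUSE)

if __name__ == "__main__":
    rolling_restart("myapp")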

-adam

Scott McCarty

unread,
Sep 25, 2010, 1:54:20 PM9/25/10
to devops-t...@googlegroups.com

I thought it was a station wagon? ;-)

I just realized what I was trying to say. To avoid the "outdated" problem/anti-pattern, automation is a condition which is necessary but not sufficient (can you tell my best friend is finishing his PhD in philosophy?). Automation cannot solve this problem alone. I hear this argument a lot; maybe it wasn't being made. If I jumped on it too quickly, I apologize.

The sufficient conditions are
* good understanding of architecture (or lack thereof ;-)
* automation (C&C code)
* tests

Only then can you feel "confident" that "few and far between" operations really will work the few times you run them. Anecdotally, I have an install script that I wrote for a very complex cluster install; I have tests and test runs which work with virtual machines, and I am still scared to run it the three times a year I do. But that could just be because I am a crappy programmer.

For operations that are performed all of the time, this gets mitigated because the code runs all of the time; clearly automation, even without tests, is more efficient in this case.

All of that aside, I still think there is a place for mass SSH, especially for trivial and emergency situations. There is also a soft spot in my heart for it.

Scott

On Sep 25, 2010 12:51 PM, "Noah Campbell" <noahca...@gmail.com> wrote:

Noah Campbell

unread,
Sep 25, 2010, 2:11:48 PM9/25/10
to devops-t...@googlegroups.com
Good one wrt ctrl+r.

Joshua Timberman

unread,
Sep 25, 2010, 3:17:39 PM9/25/10
to devops-t...@googlegroups.com
Hello!

On Sat, Sep 25, 2010 at 10:43 AM, Miles Fidelman
<mfid...@meetinghouse.net> wrote:
>
> It seems like each tool (Nagios, Puppet, etc.) - defines its own protocol,
> and then requires that you install its own modules in every managed machine
> - what a royal pain.  Shell and TCL scripts, and judicious use of pipes,
> sed, sort, and whatnot very quickly prove to be easier.

This is one of the reasons why Chef's knife ssh uses ssh, which most
people are already using anyway.

Another knife subcommand, 'bootstrap', was originally implemented to
prepare a base OS install to get Chef installed, but it can also be
used to run an arbitrary script via knife ssh.

> It sure would be nice if people started routinely including RESTful web
> interfaces for monitoring and control into every piece of code.

Chef has a RESTful web interface to the server, which exposes a lot of
data about the infrastructure it's already storing and indexing for
search. While Opscode doesn't provide (m)any prebuilt libraries or
interfaces to interact with it automatically, the code is there and
the libraries are available to use within your code. Unfortunately
most of the documentation is through reading the source.

But if someone develops something using this and wants to share how
they did it, please add a user guide to the Chef wiki :).

Joshua Timberman

unread,
Sep 25, 2010, 3:24:21 PM9/25/10
to devops-t...@googlegroups.com
On Sat, Sep 25, 2010 at 10:16 AM, R.I.Pienaar <r...@devco.net> wrote:
>
> The problem I have with doing this for a lot of things - say to drive your deploys - are that they are just a bunch of commands. Possibly stuck in someones head or outdated on a wiki

That's why I like to do my deploys as a resource in Chef :-).

The ad-hoc deployment happens simply by running Chef. It might happen
by updating a data bag item and letting Chef pick it up in ~30
minutes.

Miles Fidelman

unread,
Sep 25, 2010, 3:36:25 PM9/25/10
to devops-t...@googlegroups.com
Joshua Timberman wrote:
>> It sure would be nice if people started routinely including RESTful web
>> interfaces for monitoring and control into every piece of code.
>>
> Chef has a RESTful web interface to the server, which exposes a lot of
> data about the infrastructure it's already storing and indexing for
> search. While Opscode doesn't provide (m)any prebuilt libraries or
> interfaces to interact with it automatically, the code is there and
> the libraries are available to use within your code. Unfortunately
> most of the documentation is through reading the source.
>
I was actually thinking of RESTful interfaces on the other side - i.e.,
if the things being managed presented RESTful interfaces, then it would
be (relatively) easy to interrogate pretty much anything with a web
browser and/or curl, and to implement any kind of management interface
one wants as a web page with a little bit of JavaScript. Add HTTPS
support and one has a reasonable level of security.

R.I.Pienaar

unread,
Sep 25, 2010, 11:03:46 PM9/25/10
to devops-t...@googlegroups.com


Unfortunately a lot of deploys require some or all of:

- cross-machine awareness, especially when deploys are a series of events
- finely controlled deploys to named groups of machines in batches
- awareness of monitoring status before/after
- some human-level validation/decision to back out
- tight control of timings
- staggered restarts to avoid over-subscribing on cache refreshes against DBs

And lots more in this realm. Really, if you have these it points to a deploy that needs to be improved, but you often hit an 80/20 situation and the budget (time or money) just isn't there to do the last 20%.

Automating these is tough within a CM tool - at least today's - and so C&C tools matter a lot: they provide building-block API calls, essentially allowing for a way to script such deploys by combining lots of little pieces. With enough little APIs existing in your C&C platform you can write deployment scripts that address the above issues.

I think this thread has gone off a bit on a tangent with the whole 'outdated' bit; I really don't know what that's all about. All I am saying is that if it's not scripted - but meat-cloud driven - you have a massive documentation burden that is often not met.

--
R.I.Pienaar

Anthony Shortland

unread,
Sep 26, 2010, 1:04:42 AM9/26/10
to devops-t...@googlegroups.com
Replace "up to date architecture diagram" with "comprehensive resource model" and you've really got something useful!

On Sep 25, 2010, at 9:22 AM, Scott McCarty wrote:

John E. Vincent

unread,
Sep 27, 2010, 2:33:06 AM9/27/10
to devops-toolchain
On Sep 24, 11:46 am, technikolor <techniko...@gmail.com> wrote:
> <snipped>

(I'm coming in a bit late to the discussion but this is an area I feel
pretty strongly about so I figured I'd throw out my 2 cents.)

As others have pointed out, the biggest problem with things like SSH
loops or mux'd SSH sessions is that they simply aren't repeatable.
They cause us to trend away from best practices instead of towards
them. One of the biggest benefits of a devops toolchain/philosophy is
that it reduces the chance of human error.

I'm not a fan of general SSH "broadcasting" (for lack of a better
term). I strongly dislike check-by-ssh in Nagios because of the burden
that 23 different SSH-based host checks to a single host creates. It
directly affects the results of the host being monitored. Contrast
that with a more 'modern' solution like check_mk.

Mux'd SSH feels like an anti-pattern. Ask yourself these questions:

- What information am I trying to gather via SSH?
- How many times have I repeated or will I repeat this same operation?
- What part of my workflow dictates that I perform this manual
operation?
- How transient is the operation I'm performing or the data I'm
gathering?
- How much time have I spent trying to report on and parse the
information gleaned via an SSH loop?

In most cases, I think you'll find that it's not REALLY an ad-hoc
operation you're performing, that you can't reliably consume the output
of 20 SSH sessions at once ANYWAY, and that you probably didn't need to
do it in the first place. Actual interactive sessions with hosts, IMHO, can
and should be limited to individual distinct operations such as
troubleshooting an individual node (I don't know exactly what's wrong
with box X but I'm going to do some tcpdumps and see if anything
stands out). Otherwise you lose audit/accountability and risk
introducing inconsistent configuration. Checking log files isn't a
transient operation. Those should be going to a central log host. If
you have to do it more than once, you probably should do it
differently. I'm actually struggling to find any real reason that an
actual human needs to log on and operate interactively with a server
given best practices.

So the question really boils down to which C&C tool and how much do
you trust it. Without going into detail, our company has a serious
trust issue with Puppet. One of the SOPs for new releases is to
actually verify that the new code base got installed by running
'rpm -qa | grep appname' and 'rpm --verify' and doing manual inspection
of the installed app. Some of this stems from an early hiccup in the
original Puppet configuration and some of it from people having
accountability issues (no, you really did do something wrong; don't
blame Puppet). But some of it is legitimate, i.e. a puppet run not
actually failing/exiting with 1 when any step in the run fails. You
have to address the root issue.

On the flip side you have issues of NIH or 'we don't like the language
it's written in'. Those are fairly stupid reasons not to use a tool,
but I can understand an organization not wanting to use a tool that they
don't have the in-house resources to troubleshoot. Our Java developers
love ControlTier. Ruby developers like Chef because cookbooks are Ruby
code. In that same vein, we don't use Chef internally because everyone
seems to hate Ruby or doesn't want to learn Ruby (I'm the exception).
We would pick Fabric over Capistrano since our system-side stuff is
all Python.

There are some perfectly good tools in the devops toolchain that are
built around SSH. Capistrano is a good example but Capistrano alone
doesn't meet my requirement for auditability, reporting and the like.
Add Webistrano on top of it and you've got a fine combination. Knife
is another good example.

I'm also partial to not introducing another listening daemon on all my
hosts for security reasons, which is why Vogeler is built around a
message queue and why I would use it instead of Func.

technikolor

unread,
Sep 27, 2010, 10:18:53 AM9/27/10
to devops-toolchain
John, I think you nailed a lot of the paradoxical nature of C&C really
well:

* The language a C&C tool is written in shouldn't matter... but it
does.
* SSH'ing to all your nodes is an anti-pattern, but running n agents
on each node is a security issue.
* Monitoring and control/configuration shouldn't be a drain on a
system's resources
* NIH is an issue



I think, based on all the discussion here... that C&C is still
something that hasn't hit the sweet spot yet. Provisioning and CM
have, I think... or at least, we've got solid options and they are in
the refining phase.

C&C isn't there yet. The devops big-wigs blast ssh for loops, but
most C&C tools are just a means of outsourcing the loop. The
deployment tools today (Capistrano and ControlTier) are much closer to
the kinds of tools we need than the endless variety of dsh, pussh,
mussh, psh, etc., etc., tools.

Frankly, Nanite is the tool I am most interested in. Somewhere
between ControlTier, Nanite and mcollective is the future. They feel
modern, innovative and "right"... they are simply too complex today.

Devops as a movement needs to really push for increased development
and innovation in that direction. So long as the message is "ssh loops
are stupid" but we then confess that we all use them either directly
or by proxy, the devops toolchain just looks self-contradictory.

Miles Fidelman

unread,
Sep 27, 2010, 10:52:27 AM9/27/10
to devops-t...@googlegroups.com
technikolor wrote:
> John, I think you nailed a lot of the paradoxical nature of C&C really
> well:
>
> * The language a C&C tool is written in shouldn't matter... but it
> does.
> * SSH'ing to all your nodes is an anti-pattern, but running n agents
> on each node is a security issue.
> * Monitoring and control/configuration shouldn't be a drain on a
> systems resources
> * NIH is an issue
>
>
<snip>

> Frankly, Nanite is the tool I am most interested in. Somewhere
> between ControlTier, Nanite and mcollective is the future. They feel
> modern, innovative and "right"... they are simply too complex today.
>
> Devops as a movement needs to really push for increased development
> and innovation in that direction. So long as the message is "ssh
> loops are stupid", but then confess that we all use them either
> directly or by proxy, the devops toolchain just looks self-
> contradictory.
>
Personally, I think there's an underlying issue: we should be focusing
on protocols and interfaces, not tools.

Right now, there's a proliferation of tools, each with their own
interfaces and agents - the classic recipe for
an n-squared proliferation. SSH to a command line is used so commonly
precisely because it's the only tool
that doesn't require deploying a complicated set of tools and agents.
What we need is a better, more expansive monitoring & control protocol
than SNMP.

By analogy: what's important is SMTP; without it, discussing the merits
of sendmail, postfix, exim, et al. would be meaningless. With SMTP,
there are lots of functionally equivalent solutions to choose among.

Having said this, the devil, of course, is in the details. Developing a
new protocol is not easy, and usually involves multiple, funded efforts
that eventually shoot it out until a winner emerges. Right now, I'm not
sure where such work would be done, or who would fund it - but it sure
seems to be needed.

Miles Fidelman

John E. Vincent

unread,
Sep 27, 2010, 10:55:21 AM9/27/10
to devops-toolchain
I think the mcollective/nanite model is pretty much the best way to go.
I'm not going to go on a sales pitch about Vogeler but I actually had
those same concepts in mind when I started writing it:

- I don't want to deal with writing another daemon and managing
security implications
- Nagios hit the sweet spot in terms of flexibility
- Message queues + schemaless data stores are the best way to go at
this point

Vogeler has those principles in mind:
- The client watches a queue. The server watches a queue. An external
script drops a message on the queue for all or a subgroup of clients.
- The client executes a script based on the message, and drops the
results plus small metadata as JSON back on the queue.
- The server reads the message+metadata and dumps it to a database.

The flexibility aspect is that the "plugins" are simple text files
defining what command to run and what data structure the results are
in. By using a schemaless datastore, I can use the command as a key
and the results as the value and dump that into a single
document/bucket for a given host.

This gives you the flexibility of an SSH loop with the
accounting/auditing/persistence, and something of an ad-hoc nature.
Using a message queue architecture just makes sense in so many cases.
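
To sketch the client half of that (this is a simplified illustration, not Vogeler's actual code: the plugin mapping and the 'uptime' check are made up, and the message-queue plumbing is left out to keep it self-contained):

#!/usr/bin/env python
"""Sketch of a queue client: run the mapped command, return JSON keyed by command."""
import json
import socket
import subprocess

# "Plugins" are just a mapping of command name -> shell command to run.
PLUGINS = {
    "uptime": "uptime",
    "installed_packages": "rpm -qa",
}

def handle_message(raw_message):
    msg = json.loads(raw_message)
    command = msg["command"]
    if command not in PLUGINS:
        return None
    output = subprocess.Popen(
        PLUGINS[command], shell=True, stdout=subprocess.PIPE
    ).communicate()[0]
    # Command name as the key, output as the value, tagged with the host.
    return json.dumps({
        "host": socket.gethostname(),
        command: output.splitlines(),
    })

if __name__ == "__main__":
    # In the real thing this message would come off the queue and the
    # result would be dropped back on it for the server to persist.
    print handle_message('{"command": "uptime"}')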

John E. Vincent

unread,
Sep 27, 2010, 11:00:09 AM9/27/10
to devops-toolchain
On Sep 27, 10:52 am, Miles Fidelman <mfidel...@meetinghouse.net>
wrote:
>
> Personally, I think there's an underlying issue: we should be focusing
> on protocols and interfaces, not tools.
>
> Right now, there's a proliferation of tools, each with their own
> interfaces and agents - the classic recipe for
> an n-squared proliferation.  SSH to a command line is used so commonly
> precisely because it's the only tool
> that doesn't require deploying a complicated set of tools and agents.  
> What we need is a better, more expansive monitoring & control protocol
> than SNMP.
>
> By analogy: What's important is SMTP, without it, discussing the merits
> of sendmail, postfix, exim, et. al., would be meaningless.  With SMTP,
> there are lots of functionally equivalent solutions to chose among.
>
> Having said this, the devil, of course, is in the details.  Developing a
> new protocol is not easy, and usually involves multiple, funded efforts
> that eventually shoot it out until a winner emerges.  Right now, I'm not
> sure where such work would be done, or who would fund it - but it sure
> seems to be needed.
>
> Miles Fidelman
>

I think you're absolutely right. One thing that we've been discussing
around the office is the fact that a group of tools and people who
demand so much in terms of APIs have yet to provide any sort of
interoperability with each other.

I really think we're at the point where we could all agree on a
standard basic JSON message format for describing a resource, with a
vendor-specific field included. I can't believe I'm even suggesting
the vendor-specific field since it's open to so much abuse, but there
you go.
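
Strictly as a straw man, something like this; every field name here is made up:

import json

# Straw-man resource message; none of these field names are a real standard.
resource = {
    "format_version": 1,
    "type": "service",
    "name": "httpd",
    "state": "running",
    "host": "web01.example.com",
    "vendor": {                      # the vendor-specific escape hatch
        "puppet": {"class": "apache"},
    },
}
print json.dumps(resource, indent=2)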

R.I.Pienaar

unread,
Sep 27, 2010, 11:06:29 AM9/27/10
to devops-t...@googlegroups.com

----- "John E. Vincent" <lusi...@gmail.com> wrote:

> I think mcollective/nanite model are pretty much the best way to go.
> I'm not going to go on a sales pitch about Vogeler but I actually had
> those same concepts in mind when I started writing it:
>
> - I don't want to deal with writing another daemon and managing
> security implications
> - Nagios hit the sweet spot in terms of flexibility
> - Message queues + schemaless data stores are the best way to go at
> this point
>
> Vogeler has those principles in mind
> - The client watches a queue. The server watches a queue. External
> script drops message on queue for all or subgroup of clients
> - Client executes script based on message. Drops results + smallmeta
> data as JSON back on the queue
> - Server reads message+metadata and dumps to database


On the published roadmap for mcollective is something similar: you'll be able to have actions written in different languages, so say you have a package agent that on Red Hat uses Python - to talk with yum efficiently - but on other operating systems it could be shell/Ruby/whatever.

But it will retain the API transparency that we have today, so from a client perspective the result sets from the API calls won't change at all. You'll get the same fine-grained authorization, authentication and auditing that's already in mcollective for free with these external agents.

There is currently a registration system that uses mcollective as a transport for inventory/CMDB-style information; I populate MongoDB instances, for example. The nice thing with the mc model is that I can run 1 or 100 registration endpoints - one feeding Couch, one memcache, one a set of files, or whatever you fancy - and mc's infrastructure takes care of delivering the payload where needed. So building something like Vogeler's CMDB pushing isn't hard, and you can additionally run/feed/consume it in many different ways via plugins.


--
R.I.Pienaar

John E. Vincent

unread,
Sep 27, 2010, 11:11:00 AM9/27/10
to devops-toolchain
On Sep 27, 11:06 am, "R.I.Pienaar" <r...@devco.net> wrote:
> On the published roadmap for mcollective is something similar, you'll be able to have actions written in different languages, so say you have a package agent that on redhat uses python - to talk with yum efficiently - but on other operating systems it could be shell/ruby/whatever.
>
> But it will retain the API transparency that we have today, so from a client perspective the result sets from the api calls wont change at all.  You'll get the same fine grained authorization, authentication and auditing thats already in mcollective for free with these external agents.
>
> The registration system currently that uses mcollective as a transport for inventory/cmdb style information, I populate mongodb instances for example the nice thing with the mc model is that i can run 1 or 100 registration endpoints one feeding couch, one memcache, one a set of files or whatever you fancy and mc's infrastructure takes care of delivering the payload where needed.  So building something like Vogeler's CMDB pushing isnt hard but you can additionally run/feed/consume it in many different ways via plugins.
>
> --
> R.I.Pienaar

That's awesome. I wasn't attempting to disparage mcollective in any
way, mind you ;) I would have preferred that we implemented it here,
but as I said, there's something of a NIH/Ruby-sucks thing going on.

R.I.Pienaar

unread,
Sep 27, 2010, 11:11:49 AM9/27/10
to devops-t...@googlegroups.com

----- "Miles Fidelman" <mfid...@meetinghouse.net> wrote:

> >
> Personally, I think there's an underlying issue: we should be focusing
> on protocols and interfaces, not tools.

I agree with this 100%.

At the moment a lot of the tools - certainly mcollective - are at the point where we are still exploring ideas and finding ways to work, figuring out what doesn't work and what the best approach is toward building these tools. It's very early days.

I have, though, designed mcollective so that this eventual goal will be realized: there is a standardized structure for messages/replies, and there is code that at least proves you can talk to it from Perl, PHP and even Erlang. At the moment the managed-node side is still Ruby-only, but once I have 1.0 out the door - soonish - addressing that is on the roadmap.

peco

unread,
Sep 27, 2010, 11:43:04 AM9/27/10
to devops-toolchain
Hello fellow devopsers and toolchainers!

This is a very interesting discussion, so I decided to drop my
thoughts on this too, as I have implemented / researched systems
control over the last 8 years or so. I think all the comments are spot
on about the lack of facility around distributed systems control.

So I would like to extend the thread a bit on what we feel are the
desired properties / attributes of a good distributed systems C&C
framework. Ideally OS vendors will wake up and smell the roses and
provide adequate facility in this area, but I am not holding my breath.

Here, I will have a crack at it:
1) Guarantees we want from the universal distributed C&C
- Ordering of execution. Execution is guaranteed to happen in the
order it is specified. Example: machines 1, 2, and 3 must run first
and finish before machine 4 picks up an operation.
- Parallelism. I want to be able to specify which operations can run
in parallel and which operations have to execute in sequence.
Execution will then be carried out according to the specification.
- Shell abstraction. I want to be able to run commands and get output
without having to think about the particular remote shell environment
and its oddities.
- Secure channel. Commands are always carried out over a secure
channel without me having to do key or password management, setup,
etc. on the distributed nodes.
- Asynchronous execution. Long-running commands can be allowed to run
async with callbacks.
- I/O pipes. Input and output can be shared / piped between
operations.
- Consistent state. Executing distributed commands is based on
consistent state and the consistency is preserved. This consistent
state is always in sync with the actual machine and process states.
- Ergonomics. It should be clear whether commands succeeded or failed,
as a unit and as a composed group, when an admin executes distributed
commands. I should not have to log in to systems to verify whether
things worked or not.

On Sep 27, 10:11 am, "R.I.Pienaar" <r...@devco.net> wrote:

peco

unread,
Sep 27, 2010, 11:46:36 AM9/27/10
to devops-toolchain
Sorry for the spam; now in a more human-readable format:
2) Language bindings support
- Perl, Python, Ruby
- Java, C++, C
3) OS support
- Windows
- Linux
4) Extra goodies
- REST API. The framework can be accessed over a web API for interop
with monitoring, provisioning, etc.
- Two-phase commit. Should be able to run an operation and roll back
if an error occurred.
- Execution plan. Should be able to look ahead and understand the
execution plan of a distributed command.

Miles Fidelman

unread,
Sep 27, 2010, 12:00:57 PM9/27/10
to devops-t...@googlegroups.com
John E. Vincent wrote:
> On Sep 27, 10:52 am, Miles Fidelman<mfidel...@meetinghouse.net>
> wrote:
>
>> Personally, I think there's an underlying issue: we should be focusing
>> on protocols and interfaces, not tools.
>> <snip>

>> By analogy: What's important is SMTP, without it, discussing the merits
>> of sendmail, postfix, exim, et. al., would be meaningless. With SMTP,
>> there are lots of functionally equivalent solutions to chose among.
>>
> I think your absolutely right. One thing that we've been discussing
> around the office is the fact that a group of tools and people who
> demand so much in terms of APIs have yet to provide any sort of
> interoperability between each other.
>
> I really think we're at the point where we could all agree on a
> standard basic JSON message format for describing a resource with a
> vendor-specific field included. I can't believe I'm even suggesting
> the vendor specific-field since it's open to so much abuse but there
> you go.
>
I'd constrain this just a bit more:

- RESTful interface for anything that can be expressed in a relatively
simple (one-line) command (e.g., reading or setting one variable,
executing a restart command)

- multiple representations for responses - HTML and JSON as required

- REST+JSON for things that require more detail, though for command
sequences, particularly with dependencies, XML might be necessary

I might start by looking at SMTP and seeing how much of the current
model could be translated into a RESTful form, and then look at some of
the OASIS workflow stuff as the basis for defining things like
installation and configuration sequences.
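
As a toy sketch of the sort of interface I mean (the paths and the service name are made up; this is an illustration, not a protocol proposal):

#!/usr/bin/env python
"""Tiny RESTful control endpoint: read one value, execute one command."""
import json
import subprocess
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

class ControlHandler(BaseHTTPRequestHandler):
    def _send(self, code, payload):
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(payload))

    def do_GET(self):
        # GET /status -> read one piece of state
        if self.path == "/status":
            uptime = open("/proc/uptime").read().split()[0]
            self._send(200, {"uptime_seconds": float(uptime)})
        else:
            self._send(404, {"error": "unknown resource"})

    def do_POST(self):
        # POST /service/myapp/restart -> execute one command
        if self.path == "/service/myapp/restart":
            rc = subprocess.call(["/etc/init.d/myapp", "restart"])
            self._send(200, {"restarted": rc == 0})
        else:
            self._send(404, {"error": "unknown resource"})

if __name__ == "__main__":
    HTTPServer(("", 8080), ControlHandler).serve_forever()

Point curl or a browser at it and you get the interrogate-with-curl property; put it behind HTTPS and some authentication and it's at least in the right neighborhood.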

Miles

Barry Allard

unread,
Sep 27, 2010, 12:16:34 PM9/27/10
to devops-t...@googlegroups.com
A portable and agnostic RPC framework like Avro (which supports JSON) might be useful. RPC over SMTP would be useful for lossy, heavy-traffic datacenters and slow links, among other things.

And Thrift and Protocol Buffers are worth an eval also.

Barry Allard

Cc @puppetmasterd

John E. Vincent

unread,
Sep 27, 2010, 12:53:58 PM9/27/10
to devops-toolchain
On Sep 27, 12:16 pm, "Barry Allard" <barry.all...@gmail.com> wrote:
> A portable and agnostic RPC framework like Avro (which supports JSON) might be useful.  RPC to SMTP would be useful for lossy, heavy traffic datacenters and slow links among other things.
>
> And Thrift and Protocol Buffers are worth an eval also.
>
> Barry Allard
>
> Cc @puppetmasterd
>
Okay, that's interesting. I've always shied away from RPC in general
because it felt so proprietary (possibly due to experience with
specific implementations) and I didn't want to go down a route and end
up with RMI-style problems. I'll have to do some poking at Avro
because it's so self-contained.