./a_command_that_provides_a_filtered_list_of_hostnames | \
  xapply -P10 -f 'fping %1 && ssh %1 run-some-command' -
With my background I lean towards implementing something like
mcollective. But to answer the question of 'what would I use as a
non-dev sysadmin': xapply would do everything I needed.
Cheers,
-- Greg
ctl-exec -I include_some_nodes -X exclude_some_nodes -C10 -- run-some-command
The get_hosts sort of command right now is pretty dumbed down. It asks
the machine DB (a django app) for hosts satisfying many roles (web,
app1, app2, db, etc), sites, statuses, and such through a simple HTTP
call. I could add options for including and excluding other stuff like
certain racks in our colos. If it's in our machine DB, this script
could use it.
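To give a rough idea, it's just a filtered host query over HTTP, something along these lines (the hostname and parameter names here are invented for illustration, not our real API):

# illustrative only -- the real machine DB exposes roles, sites, statuses, etc.
curl -s 'http://machinedb.example.com/hosts?role=web&site=sfo&status=live'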
We're getting to the point, though, where we need to include and
exclude hosts based on the classes they use in puppet. Since that
info's stored in puppet, we don't also store it in our machine DB. We
use an external nodes script for puppet that points to our DB, but
there are still many use cases where we need more visibility into
puppet in our parallel exec calls.
I believe the developers' deploy tools use a python script somebody
wrote to call SSH and rsync in parallel rolling chunks (e.g., 5 hosts
at a time).
-adam
P.S. I'm pretty sure there are some RPMs for dsh/libdshconfig around,
but I can post ours if you'd like.
However, at the end of the day sometimes you just need to SSH to a
system or three and run some commands.
We use Chef's command-line API tool, Knife, for ad hoc commands and
"command-and-control"-lite usage. Knife provides subcommands, and one
that we utilize for this is ssh.
knife ssh role:opscode-chef 'sudo chef-client' -a ec2.public_hostname
For example, this command will search the Chef server for all the
nodes that have the role 'opscode-chef' applied, and use the value of
the "ec2.public_hostname" attribute[0] as the hostname to connect to.
Knife uses net-ssh-multi to open connections to all the systems that
return from the search, and run the command 'sudo chef-client' on
them.
Aside from running a specified command, one can use knife ssh to open
interactive, screen, tmux, or macterm sessions for each of the systems
that return from the search. In an upcoming version of Chef, csshX
will be supported as well.
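For example, something along these lines (assuming a Knife recent enough to have tmux support) opens a tmux session connected to each matching node:

knife ssh 'role:opscode-chef' tmux -a ec2.public_hostname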
[0]: this is because the automatically detected hostname and IP
address of EC2 systems are by default the private values, instead of
the public ones.
> There seems to be a lot of aversion in the devops community to running
> tools that are basically SSH in a for loop,
By devops community do you mean non-SA types? If so, is there a gap and what is the gap?
On Sat, Sep 25, 2010 at 9:57 AM, Noah Campbell <noahca...@gmail.com> wrote:
> By devops community do you mean non-SA types? If so, is there a gap and what is the gap?
In my observation, it is sysadmin types.
The problem I have with doing this for a lot of things - say to drive your deploys - is that they are just a bunch of commands, possibly stuck in someone's head or outdated on a wiki.
Yes, you need to run one-off commands, but I always ask myself whether I am running a specific one often and, if so, whether it is something I need to improve.
If I need to improve it - say it's a step in my code deploy - then you need a more mature c&c than an ssh for loop: you need something that is versioned, in use in dev and staging, and tested well as part of your release testing. You need something that provides measurable status rather than status parsed by a human brain. Nested output from parallel ssh over 20 machines isn't clear as day and promotes user error.
Good c&c tools strike a balance between these two, but should enable the latter, with the former being a small side addition.
The "outdated" argument is true whether commands in your head, wiki, or code. The only way to mitigate this problem is by understanding all of the interactions in a group of systems.
I have found up to date architecture drawing to be the best mitigation for this problem.
This is a standard problem of inductive reasoning, will the apple really fall next time?
Scott M
The "outdated" argument is true whether commands in your head, wiki, or code. The only way to mitigate this problem is by understanding all of the interactions in a group of systems.
Good point. I agree testing is a critical part of command and control, but those tests are best created from knowledge of the overall system. This is where it can get hairy w/o architectural knowledge.
This is more an argument for agile-ish development of c&c (which I agree with), but how does one incorporate that easily into existing c&c w/o developing their own scripts?
This is tougher than it sounds; I do it all of the time.
Scott M
I thought you made the "outdated" argument, not me. It's all about architecture, testing, and version control. But somebody has to deploy the deployment app (button), and somebody has to update it.
It's sort of unfortunate that nobody has really developed an update to
SNMP as a common interface for command and control applications.
It seems like each tool (Nagios, Puppet, etc.) defines its own
protocol, and then requires that you install its own modules on every
managed machine - what a royal pain. Shell and Tcl scripts, and
judicious use of pipes, sed, sort, and whatnot, very quickly prove to be
easier.
It sure would be nice if people started routinely including RESTful web
interfaces for monitoring and control into every piece of code.
Miles Fidelman
--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is. .... Yogi Berra
Essentially, I think the "outdated" argument is weak; version control, testing, and architecture are the stronger arguments for controlled deployment. They all prohibit weak or untested paths to change, not because code is inherently more up to date, but because it is a more testable path.
Scott M
I'm sure everyone has heard (some variation of) the following, relating to bandwidth [1]:
Don't underestimate the bandwidth of a van full of hard drives.
I bring that up, because I think in the DevOps space the following holds true:
Don't underestimate the power of ssh and vi
The only drawback to ssh and vi is that accountability seems to get lost, despite everyone's best intentions. Secondly, the quirky behavior of a system (to Scott's point about architecture) is so unique that it becomes hard to generalize its properties.
[1] http://www.codinghorror.com/blog/2007/02/the-economics-of-bandwidth.html
I see the biggest need for more mature c&c in my job in the third sort
of task: time-sensitive cluster configuration changes and the
accompanying service fiddling. A good rule of thumb might be that
anything you use ctrl+r for to change clusters should be a script of
sorts, and thus version controlled. I'm having difficulty switching to
this, I think because of the ludicrous number of scripts it would require.
-adam
I thought it was a station wagon? ;-)
I just realized what I was trying to say. To avoid the "outdated" problem/anti-pattern, automation is a condition which is necessary but not sufficient (can you tell my best friend is finishing his PhD in philosophy?). Automation cannot solve this problem alone. I hear this argument a lot; maybe it wasn't being made here. If I jumped on it too quickly, I apologize.
The sufficient conditions are:
* good understanding of architecture (or lack thereof ;-)
* automation (c&c code)
* tests
Only then can you feel "confident" that "few and far between" operations really will work the few times you run them. Anecdotally, I have an install script that I wrote for a very complex cluster install; I have tests and test runs that work with virtual machines, and I am still scared to run it the three times a year that I do. But that could just be because I am a crappy programmer.
For operations that are performed all of the time, this gets mitigated because the code runs all of the time; clearly automation, even w/o tests, is more efficient in this case.
All of that aside, I still think there is a place for mass ssh especially for trivial and emergency situations. There is also a soft spot in my heart for it.
Scott
On Sat, Sep 25, 2010 at 10:43 AM, Miles Fidelman
<mfid...@meetinghouse.net> wrote:
>
> It seems like each tool (Nagios, Puppet, etc.) defines its own protocol,
> and then requires that you install its own modules on every managed machine
> - what a royal pain. Shell and Tcl scripts, and judicious use of pipes,
> sed, sort, and whatnot, very quickly prove to be easier.
This is one of the reasons why Chef's knife ssh uses ssh, which most
people are already using anyway.
Another knife subcommand, 'bootstrap', was originally implemented to
prepare a base OS install to get Chef installed, but it can also be
used to run an arbitrary script via knife ssh.
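A minimal sketch, with a placeholder hostname and user (exact flags vary by Chef version):

knife bootstrap new-node.example.com -x ubuntu --sudo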
> It sure would be nice if people started routinely including RESTful web
> interfaces for monitoring and control into every piece of code.
Chef has a RESTful web interface to the server, which exposes a lot of
data about the infrastructure it's already storing and indexing for
search. While Opscode doesn't provide (m)any prebuilt libraries or
interfaces to interact with it automatically, the code is there and
the libraries are available to use within your code. Unfortunately
most of the documentation is through reading the source.
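If you just want to poke at that data without writing code, knife already wraps the search API; for example (exact flags vary between Chef versions):

knife search node 'role:opscode-chef' -a ec2.public_hostname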
But if someone develops something using this and wants to share how
they did it, please add a user guide to the Chef wiki :).
That's why I like to do my deploys as a resource in Chef :-).
The ad-hoc deployment happens simply by running Chef. It might happen
by updating a data bag item and letting Chef pick it up in ~30
minutes.
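In practice that can be as simple as the following (the data bag and role names here are made up for the example):

knife data bag edit deploys myapp           # bump the version/sha the recipe reads
knife ssh 'role:myapp' 'sudo chef-client'   # or just wait for the periodic run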
Unfortunately a lot of deploys require some or all of:
- cross machine awareness, especially when deploys are a series of events
- finely controlled deploys to named groups of machines in batches
- awareness of monitoring status before/post
- some human level validation/decision to backout
- tight control of timings
- staggered restarts to avoid oversubscribing on cache refreshes against dbs
and lots more in this realm. Really, if you have all of these it points to a deploy process that needs to be improved, but you often hit an 80/20 situation and the budget (time or money) just isn't there to do the last 20%.
Automating these is tough within a CM tool - at least today's - and so C&C tools matter a lot: they provide building-block API calls, essentially allowing for a way to script such deploys by combining lots of little pieces. With enough little APIs existing in your C&C platform you can write deployment scripts that address the above issues.
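Purely as a sketch of what I mean (the filter fact, service, and package names below are placeholders, not a real deploy agent), one batch step of such a script might look like:

mco rpc service stop service=myapp -W "cluster=web1"      # drain one batch
mco rpc package update package=myapp -W "cluster=web1"    # roll the new build onto it
mco rpc service start service=myapp -W "cluster=web1"     # bring it back, check monitoring, move on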
I think this thread has gone off a bit on a tangent with the whole 'outdated' bit; I really don't know what that's all about. All I am saying is that if it's not scripted - but meat-cloud driven - you have a massive documentation burden that is often not met.
--
R.I.Pienaar
Right now, there's a proliferation of tools, each with their own
interfaces and agents - the classic recipe for
an n-squared proliferation. SSH to a command line is used so commonly
precisely because it's the only tool
that doesn't require deploying a complicated set of tools and agents.
What we need is a better, more expansive monitoring & control protocol
than SNMP.
By analogy: what's important is SMTP; without it, discussing the merits
of sendmail, postfix, exim, et al. would be meaningless. With SMTP,
there are lots of functionally equivalent solutions to choose among.
Having said this, the devil, of course, is in the details. Developing a
new protocol is not easy, and usually involves multiple, funded efforts
that eventually shoot it out until a winner emerges. Right now, I'm not
sure where such work would be done, or who would fund it - but it sure
seems to be needed.
Miles Fidelman
> I think mcollective/nanite model are pretty much the best way to go.
> I'm not going to go on a sales pitch about Vogeler but I actually had
> those same concepts in mind when I started writing it:
>
> - I don't want to deal with writing another daemon and managing
> security implications
> - Nagios hit the sweet spot in terms of flexibility
> - Message queues + schemaless data stores are the best way to go at
> this point
>
> Vogeler has those principles in mind
> - The client watches a queue. The server watches a queue. External
> script drops message on queue for all or subgroup of clients
> - Client executes script based on message. Drops results + small
> metadata as JSON back on the queue
> - Server reads message+metadata and dumps to database
On the published roadmap for mcollective is something similar: you'll be able to have actions written in different languages. So say you have a package agent that on Red Hat uses Python - to talk with yum efficiently - but on other operating systems it could be shell/Ruby/whatever.
But it will retain the API transparency that we have today, so from a client perspective the result sets from the API calls won't change at all. You'll get the same fine-grained authorization, authentication, and auditing that's already in mcollective for free with these external agents.
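So from the client side a call stays the same regardless of what language the agent is written in; roughly (the package agent here is just an example):

mco rpc package status package=httpd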
The registration system currently uses mcollective as a transport for inventory/CMDB-style information; I populate mongodb instances, for example. The nice thing with the mc model is that I can run 1 or 100 registration endpoints - one feeding couch, one memcache, one a set of files, or whatever you fancy - and mc's infrastructure takes care of delivering the payload where needed. So building something like Vogeler's CMDB pushing isn't hard, but you can additionally run/feed/consume it in many different ways via plugins.
--
R.I.Pienaar
> Personally, I think there's an underlying issue: we should be focusing
> on protocols and interfaces, not tools.
I agree with this 100%.
At the moment a lot of the tools - certainly mcollective - are at the point where we are still exploring ideas and finding ways to work, figuring out what doesn't work and what the best approach is toward building these tools. It's very early days.
I have, though, designed mcollective so that this eventual goal will be realized: there is a standardized structure for messages/replies, and there is code that at least proves you can talk to it from Perl, PHP, and even Erlang. At the moment the managed-node side is still Ruby only, but once I have 1.0 out the door - soonish - addressing that is on the roadmap.
- RESTful interface for anything that can be expressed in a relatively
simple (one-line) command (e.g., reading or setting one variable,
executing a restart command - see the sketch after this list)
- multiple representations for responses - HTML and JSON as required
- REST+JSON for things that require more detail, though for command
sequences, particularly with dependencies, XML might be necessary
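A rough sketch of what that first case could look like (paths, port, and verbs are purely illustrative, not any existing tool's API):

curl http://app01:8080/status/queue_depth                  # read one variable
curl -X PUT -d '25' http://app01:8080/config/max_workers   # set one variable
curl -X POST http://app01:8080/actions/restart             # trigger a restart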
I might start by looking at SNMP and seeing how much of the current
model could be translated into a RESTful form, and then look at some of
the OASIS workflow stuff as the basis for defining things like
installation and configuration sequences.
Miles