Re: Reducing system load


jcbollinger

Jun 19, 2012, 9:38:28 AM
to puppet...@googlegroups.com


On Tuesday, June 19, 2012 5:23:42 AM UTC-5, Duncan wrote:
Hi folks, I'm scratching my head with a problem with system load.

When Puppet checks in every hour, runs through all our checks, then exits having confirmed that everything is indeed as expected, the vast majority of the time no changes are made.  But we still load our systems with this work every hour just to make sure.  Our current configuration isn't perhaps the most streamlined, taking 5 minutes for a run.

The nature of our system, however, is highly virtualised with hundreds of servers running on a handful of physical hosts.  It got me thinking about how to reduce the system load of Puppet runs as much as possible.  Especially when there may be a move to outsource to virtualisation hosts who charge per CPU usage (but that's a business decision, not mine).

Is there a prescribed method for reducing Puppet runs to only be done when necessary?  Running an md5sum comparison on a file every hour isn't much CPU work, but can it be configured so that Puppet runs are triggered by file changes?  I know inotify can help me here, but I was wondering if there's anything already built-in?

You seem to be asking whether there's a way to make the Puppet agent run to see whether it should run.  Both "no, obviously not" and "yes, it's automatic" can be construed as correct answers.  In a broader context, anything you run to perform the kind of monitoring you suggest will consume CPU.  You'd have to test to see whether there was a net improvement.

Consider also that although file checksumming is one of the more expensive operations Puppet performs, files are not the only resources managed in most Puppet setups.  You'll need to evaluate whether managing everything only when some file changes actually meets your needs.

There are things you can do to reduce Puppet's CPU usage, however.  Here are some of them (a rough sketch of the checksum and schedule options follows the list):
  • You can lengthen the interval between runs (more than you already have done).
  • You can apply a lighter-weight file checksum method (md5lite or even mtime).
  • You can employ schedules to reduce the frequency at which less important resources are managed.
  • You can minimize the number of resources managed on each node.
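
For illustration only, here is a rough and untested sketch of what the checksum and schedule options might look like in a manifest (the resource names, paths, and times are invented; the run interval itself is the runinterval setting in puppet.conf):

    # Cheaper change detection on a large managed tree: mtime only compares
    # timestamps, so content changes that preserve the mtime will be missed.
    file { '/opt/bigapp/static':
      ensure   => directory,
      recurse  => true,
      source   => 'puppet:///modules/bigapp/static',
      checksum => mtime,
    }

    # Manage low-priority resources only during a nightly window, so the
    # hourly runs skip them entirely.
    schedule { 'nightly':
      period => daily,
      range  => '02:00 - 04:00',
    }

    package { 'some-low-priority-tool':
      ensure   => installed,
      schedule => 'nightly',
    }
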
John

Brian Gallew

Jun 19, 2012, 9:59:51 AM
to puppet...@googlegroups.com
There actually is a way to do this, though you may find it somewhat painful to work with.

Imagine, if you will, two environments: production and maintenance.

The production environment is the one you're running right now, for production.  It fully manages everything and ensures that your systems are all up to spec.  A full run of this manifest takes about 5 minutes.

The maintenance environment, on the other hand, manages /etc/passwd, exported resources, and a couple of critical resources that change frequently.  It doesn't check package versions, update /etc/ssh/ssh_known_hosts, configure backup software, etc.  Its main purpose is to keep puppet running.

Once you have these two environments configured, you move the majority of your hosts from "production" to "maintenance", and your puppet runtime drops.  When you make actual changes to the manifests, you temporarily move those hosts back into the production environment so the changes get applied, and then revert them to maintenance.
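
As a rough, untested sketch (the class names here are invented; substitute whatever actually manages your accounts and keeps the agent itself alive), the maintenance environment's site.pp might be as small as:

    node default {
      include users          # e.g. /etc/passwd management
      include puppet::agent  # keeps the puppet agent itself running

      # still collect any exported resources you rely on
      Sshkey <<| |>>
    }

Switching a host between the two environments is then just a matter of changing its "environment" setting (in the agent's puppet.conf, or via your ENC) and letting the next run pick it up.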

Another possibility for reducing overall CPU usage is to reduce the number of times a day that Puppet runs.  If you cut it back to twice daily, your total CPU usage drops from 120 minutes/host/day to 10 minutes/host/day.  That is, in fact, how we run Puppet where I work, though we do it out of a "no changes during production" mindset rather than to save CPU cycles.

Finally, consider the actual reasons for your long run times.  If it's primarily that you are checksumming large file trees, you may want to consider alternatives.  While Puppet is fabulous for templated files, perhaps the bulk of those files could go into a bzr/svn/git/hg/whatever repository?  Then your manifest for that directory is reduced to an exec{} for creating it, and either an exec{} or perhaps a cron{} for running the appropriate update.
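
For instance (the repository URL, paths, and timing below are invented and untested):

    # Let a VCS own the big file tree instead of a recursive file resource.
    exec { 'checkout-bigtree':
      command => '/usr/bin/git clone git://git.example.com/bigtree.git /opt/bigtree',
      creates => '/opt/bigtree',
    }

    cron { 'update-bigtree':
      command => 'cd /opt/bigtree && /usr/bin/git pull --quiet',
      user    => 'root',
      minute  => 15,
      require => Exec['checkout-bigtree'],
    }

Puppet then spends almost no time on that tree; the work of detecting changes is pushed onto the VCS, which is rather good at it.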


Pablo Fernandez

Jun 20, 2012, 5:34:48 AM
to puppet...@googlegroups.com
Hi,

I really think the original question is very good: "why do you need to compile all manifests again and again when there is no change on the sources (files/ENC/whatever input)?"

Tricks like the proposed ones are clearly not the solution, and even if the language is not prepared for it today, this is probably something worth developing in the future. How often do you actually change the system? I believe 95% of compilations (probably closer to 99%) produce exactly the same output again and again.

BR/Pablo

Walter Heck

Jun 20, 2012, 5:49:19 AM
to puppet...@googlegroups.com
But the way I'm reading this, the OP's question is about reducing CPU
load on the agents, not on the master. Puppet is currently unable to tell
whether or not something changed on the machine since the last run without
actually checking. I guess there are a number of indicators you could use
depending on the OS, but I doubt you could make that 100% watertight.

Walter
--
Walter Heck

--
Check out my startup: Puppet training and consulting @ http://www.olindata.com
Follow @olindata on Twitter and/or 'Like' our Facebook page at
http://www.facebook.com/olindata

Brice Figureau

Jun 20, 2012, 8:18:17 AM
to puppet...@googlegroups.com
On Tue, 2012-06-19 at 03:23 -0700, Duncan wrote:
> Hi folks, I'm scratching my head with a problem with system load.
>
> When Puppet checks in every hour, runs through all our checks, then
> exits having confirmed that everything is indeed as expected, the vast
> majority of the time no changes are made. But we still load our
> systems with this work every hour just to make sure. Our current
> configuration isn't perhaps the most streamlined, taking 5 minutes for
> a run.
>
> The nature of our system, however, is highly virtualised with hundreds
> of servers running on a handful of physical hosts. It got me thinking
> about how to reduce the system load of Puppet runs as much as
> possible. Especially when there may be a move to outsource to
> virtualisation hosts who charge per CPU usage (but that's a business
> decision, not mine).
>
> Is there a prescribed method for reducing Puppet runs to only be done
> when necessary? Running an md5sum comparison on a file every hour
> isn't much CPU work, but can it be configured so that Puppet runs are
> triggered by file changes? I know inotify can help me here, but I was
> wondering if there's anything already built-in?

It depends on what you really want to achieve. Part of the CPU
consumption goes into making sure the configuration on the node is correct.

I see a possibility where:
* you don't care if there is some configuration drift on the agent (i.e. manual modifications), and
* you run the agent on demand when you need a change.

This can be done with something like MCollective, which lets you remotely
launch puppet runs as you see fit.

Now you need a way to map a manifest modification to the set of hosts
that need a puppet run (which might not be that trivial).
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

Mohit Chawla

Jun 20, 2012, 8:22:32 AM
to puppet...@googlegroups.com
Hello,

On Wed, Jun 20, 2012 at 5:48 PM, Brice Figureau
<brice-...@daysofwonder.com> wrote:

>
> Now you need to have a way to map from manifest modification to a set of
> hosts where you need a puppet run (which might not be that trivial).

One possible approach is described here:
http://www.devco.net/archives/2012/04/28/trigger-puppet-runs.php

jcbollinger

Jun 20, 2012, 9:39:09 AM
to puppet...@googlegroups.com


On Wednesday, June 20, 2012 4:34:48 AM UTC-5, pablo.f...@cscs.ch wrote:

I really think the original question is very good: "why do you need to compile all manifests again and again when there is no change on the sources (files/ENC/whatever input)?"

That's a fair question, but the original one was about agent runs, not about the master's behavior.  For what it's worth, the master needs to recompile manifests because the result depends not only on the manifest files, but also on node facts and external data provided by ENCs and / or query functions such as extlookup() and hiera(), and perhaps even on functions whose results inherently vary from run to run (e.g. "inline_template('<%=Time.now.inspect%>')").


Tricks like the proposed ones are clearly not the solution,

Of course not.  They are not proposed as solutions for the problem you are now talking about.
 
and even if the language is not prepared for it today, this is probably something worth developing in the future. How often do you actually change the system? I believe 95% of compilations (probably closer to 99%) produce exactly the same output again and again.

I'm sure you're right that at many sites, most compilations produce the same result as previous ones, but at some sites, new compilations never produce the same results as previous ones (see the inline_template example above).  It may be that the catalog compiler can be made smart enough to recognize which manifest + facts combinations produce catalogs that are safe to cache, and that might indeed be a big win in puppetmaster capacity for some, but I think it's a harder problem than you may have supposed.


John

Michael Baydoun

Jun 20, 2012, 2:50:38 PM
to puppet...@googlegroups.com
puppet kick from your master after you make a change

jcbollinger

Jun 21, 2012, 9:22:44 AM
to puppet...@googlegroups.com


On Wednesday, June 20, 2012 1:50:38 PM UTC-5, IndyMichaelB wrote:
puppet kick from your master after you make a change

Combining normal agent behavior with periodic kicks would increase the workload of all parties, so I suppose you are suggesting to configure agents to update only when kicked, or perhaps to combine listening for kicks with a much longer run interval.

At least half the reason for periodic agent runs is to maintain client nodes in the declared state, however, regardless of whether manifests have changed on the master.  Relying exclusively on kicks would give up that state maintenance, and lengthening the run interval would increase the window in which misconfiguration of managed resources can persist.  These may be acceptable tradeoffs for some, but given that the OP is already using an unusually short run interval, I would guess that they are not acceptable for him.


John

Len Rugen

Jun 21, 2012, 12:02:37 PM
to puppet...@googlegroups.com
Some thoughts from our similar environment:

  1. Puppet client runs are like bugs around a light: they tend to cluster together.  If some client runs are slow, other clients wait, and over time they all end up trying to run at the same time.  This was easily observed on the Foreman "run distribution over the last 30 minutes" graph.  We solved it by restarting the client puppet service weekly and using the splay option.
  2. Some resource types use more CPU than others; I've observed this with anything using yum ("ensure => latest", for example) and, of course, recursively managed directories.
  3. Foreman's "report metrics" view is a good tool.  I just looked at a few of my systems, and config_retrieval is about 50% of the elapsed time on the systems sampled.
  4. Are modifications happening?  Sometimes a class will re-apply the same change on every puppet run; those need to be fixed (see the sketch below).
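
A common culprit, roughly sketched below (the command and file names are invented): an exec with no guard.  Giving it "creates", "onlyif", "unless", or "refreshonly" makes it fire only when something actually needs doing.

    # Without the "creates" guard, this exec would run (and report a change)
    # on every single puppet run; with it, it only fires when the generated
    # file is actually missing.
    exec { 'rebuild-app-config':
      command => '/usr/local/bin/rebuild-config.sh',
      creates => '/etc/myapp/config.generated',
    }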



R.I.Pienaar

Jun 21, 2012, 12:07:42 PM
to puppet...@googlegroups.com


----- Original Message -----
> From: "Len Rugen" <lenr...@gmail.com>
> To: puppet...@googlegroups.com
> Sent: Thursday, June 21, 2012 5:02:37 PM
> Subject: Re: [Puppet Users] Re: Reducing system load
>
> Some thoughts from our similar environment:
>
>
> 1. Puppet client runs are like bugs and a light, they will tend
> to cluster together. If some client runs are slow, other clients
> wait, over time, they all end up trying to run at the same time.
> This was easily observed on the foreman "run distribution over
> the last 30 minutes" We solved by restarting the client puppet
> service weekly and using the splay option.

is this really still happening? I thought that got fixed ages ago

Zach

Jun 21, 2012, 1:38:12 PM
to puppet...@googlegroups.com
Hopefully this feature will be added in the future, so you can determine what is taking the longest: http://projects.puppetlabs.com/issues/2576

Len Rugen

Jun 21, 2012, 8:28:05 PM
to puppet...@googlegroups.com
Re: is this really still happening? I thought that got fixed ages ago

I can't say; we would mask the symptoms now.  Don't take the comment as a bug report :-)

Jeff McCune

Jun 22, 2012, 3:40:24 AM
to puppet...@googlegroups.com
On Thu, Jun 21, 2012 at 5:28 PM, Len Rugen <lenr...@gmail.com> wrote:
>
> Re: is this really still happening? I thought that got fixed ages ago
>
> I can't say, we would mask the symptoms now.  Don't take the comment as a bug report :-)

This is actually a pretty simple patch.  What would you like the
timestamp to look like?  This is what it looks like with the default
Ruby timestamp in green:
http://links.puppetlabs.com/jeff_debug_time.png

You can play around with it if you hack at this line:
https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/log/destinations.rb#L119

-Jeff

Felix Frank

Jun 22, 2012, 6:08:06 AM
to puppet...@googlegroups.com
On 06/21/2012 06:07 PM, R.I.Pienaar wrote:
>> 1. Puppet client runs are like bugs and a light, they will tend
>> > to cluster together. If some client runs are slow, other clients
>> > wait, over time, they all end up trying to run at the same time.
>> > This was easily observed on the foreman "run distribution over
>> > the last 30 minutes" We solved by restarting the client puppet
>> > service weekly and using the splay option.
> is this really still happening? I thought that got fixed ages ago

I definitely reproduced this on 2.6.14, until I switched from the agent
daemon to cron for good.
It may be gone in 2.7, of course.

Cheers,
Felix