Can't hosts already stagger their agent checkin times by using per-host runinterval settings?
At some point, sure, agents may not be the best path forward but I don't see when I'd reach that point.
There is another option: MCollective.
Using cron is somewhat manual, and you have to determine when to run Puppet to avoid a thundering herd effect.
MCollective lets you run:
mco puppet runall <concurrency>
where <concurrency> sets how many agents run at the same time.
It is as resilient as cron, and you also get a central point of Puppet agent management instead of (semi-)manually set cron entries.
Regards
(inline)
On Wed, Apr 30, 2014 at 08:21:15AM -0700, jcbollinger wrote:
> On Tuesday, April 29, 2014 10:15:35 AM UTC-5, Christopher Wood wrote:
>
> Can't hosts already stagger their agent checkin times by using per-host
> runinterval settings?
>
> No. Different agents with different runintervals will still all hit the
> server at nearly the same time when they are started together, and they
> will do so again periodically thereafter (just not every run). Moreover,
> it's nasty to use a policy knob such as runinterval to address a technical
> issue such as avoiding a thundering herd effect.
In theory the agent runs will intersect and kill the puppetmaster in the timespan when the least common multiple of all the runintervals comes around.
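To make that arithmetic concrete (a sketch of my own, not from the thread): two agents started at the same moment with fixed, different runintervals still check in simultaneously at every common multiple of their intervals.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple: the first time two fixed intervals realign."""
    return a * b // gcd(a, b)

# Two agents started simultaneously with runintervals of 1800s and 2400s
# hit the puppetmaster at the same moment every lcm(1800, 2400) seconds.
print(lcm(1800, 2400))  # 7200 -> a simultaneous check-in every two hours
```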
In practice if this ever happens to me (it hasn't so far) I will shrug and say to wait for the next agent run in less than an hour. Right now my runinterval defaults to (1800 + fqdn_rand(600)), implying that any LCM intersections aren't frequent, yet the agents still update fairly often.
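For illustration, managing that staggered runinterval from Puppet itself might look like the sketch below. The ini_setting type comes from the puppetlabs-inifile module, the config path is the 3.x-era default, and the arithmetic assumes a parser that supports it; all of these are assumptions that may differ on your systems.

```puppet
# Sketch only: stagger each host's runinterval by a deterministic
# per-host offset (fqdn_rand gives the same 0-599 value on every run).
# Assumes the puppetlabs-inifile module; path and parser support may vary.
ini_setting { 'agent runinterval':
  ensure  => present,
  path    => '/etc/puppet/puppet.conf',
  section => 'agent',
  setting => 'runinterval',
  value   => 1800 + fqdn_rand(600),
}
```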
Out of personal preference I don't distinguish between technical and policy issues; they are both components of the same service.
> Puppet does have the 'splay' and 'splaylimit' configuration settings as a
> possible solution, however. If you can accept some variation in the
> interval between one agent run and the next then those are pretty
> effective, albeit non-deterministic.
I abandoned splay use when it interfered with 'puppet kick'.
I probably wouldn't use it these days because it's not obvious how long the splay will wait, and I'm trying to move away from inferred values in favor of literal ones. I sometimes don't infer the same results other people do.
https://groups.google.com/forum/#!topic/puppet-users/EaoiHSd4eEM
http://projects.puppetlabs.com/issues/1100
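For reference, enabling splay is just two agent settings; this is a sketch of a puppet.conf fragment, where the 10-minute limit is my own assumed example value:

```
# Sketch: spread agent start times over a random window of up to 10 minutes.
[agent]
splay      = true
splaylimit = 10m
```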
>
>
> At some point, sure, agents may not be the best path forward but I don't
> see when I'd reach that point.
(agents as in puppet agent daemons/services)
> I'm uncertain whether by "agents" you mean running the agent as a service,
> or whether you mean using the agent at all (as opposed to using "puppet
> apply"). Garrett was not suggesting the latter; he was suggesting using
> cron to schedule runs of the agent in non-daemon mode. You can also
> schedule runs of "puppet apply" that way, but that's a whole different
> ball game.
This gets back to: how do I schedule my agent runs to avoid puppetmaster service issues due to load? There are only so many cron slots in a day; manually scheduling that many machines could get tedious, and doing it automatically could lead to the same herd issues if I get the algorithm wrong.
The answer sounds like it might be a control host kicking off an "mco puppet runall $limit" job via cron but I'm not there yet.
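As a sketch of that not-yet-built setup (the schedule and concurrency limit are assumptions, and it presumes the MCollective puppet agent plugin is deployed and reachable from the control host):

```
# Hypothetical crontab entry on a control host: every 30 minutes,
# walk the fleet with at most 5 concurrent puppet agent runs.
*/30 * * * * mco puppet runall 5 >> /var/log/mco-runall.log 2>&1
```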
> There is a lot to be said for scheduling the agent via cron. In addition
> to possible applications in load leveling, it can make Puppet more
> resilient. For example, I was recently working on the provider of a
> custom type, and I managed to let a broken version escape to some of my
> systems, where it crashed the agent daemon. I had to manually restart the
> daemons on those systems. If I were launching Puppet from cron then the
> Puppet runs would still have failed, but the next runs, when a fixed
> version of the provider was available, would have gone fine without any
> manual intervention.
This is where I plug how you should already be killing your own daemons so that you can build service resilience via some form of watcher, and doing dev->stage->prod incremental rollouts so that most of your problems don't happen in production. In my case monit would have brought up puppet on the affected dev hosts, and those hosts would have grabbed the new provider as soon as it was available on the puppetmasters. (Obviously the first time I tried this technique I learned the hard way about monit supervising sshd and init supervising monit.)
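By way of illustration, the monit side of that might look like the following stanza; the pidfile and init script paths are assumptions that vary by distro.

```
# Sketch: have monit restart the puppet agent daemon if it dies.
check process puppet with pidfile /var/run/puppet/agent.pid
  start program = "/etc/init.d/puppet start"
  stop program  = "/etc/init.d/puppet stop"
```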