Many agents connecting at the same time and 100+ nodes failed

Suresh P

Nov 6, 2014, 2:58:29 AM
to puppet...@googlegroups.com
Hi,

In my Puppet setup, I'm managing 1700+ nodes via Foreman. I have 3 puppet masters (one of which is the CA) behind a load balancer, and one Foreman server acting as ENC and report viewer.

Around 900+ nodes connect at the same time, and because of that about 150 of the 900 get the following error:
"Could not retrieve catalog from remote server: execution expired"

To fix this, I stopped the puppet agent on the 900+ nodes and started each agent at staggered intervals so that the runs would be spread across the minutes of the polling interval. But that helped for only one day.

Kindly refer to the attached image and suggest a solution.

Thanks & Regards,
Suresh.P


james.e...@fasthosts.com

Nov 6, 2014, 3:25:36 AM
to puppet...@googlegroups.com
Try using the splay config option on the agents.  It should help to distribute the agent runs.
https://docs.puppetlabs.com/references/latest/configuration.html#splay
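
For reference, a minimal sketch of what enabling splay on an agent might look like. The `splay` and `splaylimit` settings are the ones described at the link above; putting them in the `[agent]` section, and the helper name `splay_fragment`, are assumptions made for illustration only. The function just prints the fragment so it can be reviewed before being merged into puppet.conf.

```shell
#!/bin/sh
# Sketch only: print a puppet.conf fragment that turns on splay for the agent.
# splay_fragment is a hypothetical helper name, not part of Puppet.
splay_fragment() {
  cat <<'EOF'
[agent]
# Sleep for a pseudo-random amount of time before a run
splay = true
# splaylimit is left unset here; it defaults to the run interval
EOF
}

# Review the output, then merge it into the agent's puppet.conf by hand.
splay_fragment
```

With the default settings this should spread agents across the whole run interval rather than letting them all check in within the same minute.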

If that fails, you could try running the puppet agent from a cron job instead with randomised start times as per the below link.

http://mycfg.net/articles/random-start-times-for-cron-jobs-with-puppet.html
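
A rough sketch of the cron idea, not the method from the linked article: it derives a stable start minute from the hostname with the POSIX `cksum` utility (rather than Puppet's fqdn_rand) and prints a crontab line that runs the agent twice an hour. The helper names `host_minute` and `agent_cron_line` and the `/usr/bin/puppet` path are assumptions.

```shell
#!/bin/sh
# Sketch only: derive a stable per-host start minute from the hostname,
# then print a crontab line that runs the agent twice an hour.
# host_minute and agent_cron_line are hypothetical helper names.

# Map a hostname to a number in 0..(range-1) using the cksum CRC.
host_minute() {
  printf '%s' "$1" | cksum | awk -v r="$2" '{ print $1 % r }'
}

# Print a crontab entry: one run at the host's minute, one 30 minutes later.
# The puppet binary path is an assumption; adjust for the platform.
agent_cron_line() {
  m=$(host_minute "$1" 30)
  echo "$m,$(( m + 30 )) * * * * /usr/bin/puppet agent --onetime --no-daemonize"
}

agent_cron_line "$(uname -n)"
```

Because the minute is computed from the hostname, it stays the same across reboots and agent restarts, so the spread does not collapse when every node is restarted at once.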

Felix Frank

Nov 6, 2014, 3:29:13 AM
to puppet...@googlegroups.com
On 11/06/2014 09:25 AM, james.e...@fasthosts.com wrote:
>
> If that fails, you could try running the puppet agent from a cron job
> instead with randomised start times as per the below link.

+1

I have yet to see a disadvantage of cron vs. the background agent.

james.e...@fasthosts.com

Nov 6, 2014, 4:37:22 AM
to puppet...@googlegroups.com
I used to have issues with the agent leaking memory over time, going back to the 2.6 days.
I implemented a cron job back then to restart the agent every night and never removed the job (even though I'm now running 3.6), so I don't know if there are still memory issues with the agent daemon.

If you were aiming for a smaller resource footprint on the server, the cron route would likely be better as it's one less daemon running 24-7 on each node.

Christopher Wood

Nov 6, 2014, 12:22:20 PM
to puppet...@googlegroups.com
(Apparently I enjoy splitting tiny hairs in a thread branch. Possibly rhubarbing on, but here goes.)

Cultural...

In my experience, it is easier to tell people that any host where an agent is running is a puppetized host. I don't understand why it's more difficult to read /etc/motd or grep root's crontab but that's probably my view of things and not shared elsewhere. (This is more for the transition period as everything goes under puppet management.)

And then a manager might go and tell everybody that a host with an agent running is a puppetized host and it's hard to go and change puppet methods without requiring retraining. Your puzzlement at the difficulty level of changing from agent to cron likely matches mine.

Technological...

The fqdn_rand() function doesn't necessarily spread things as evenly as we might expect given randomness over large ranges. To illustrate, fictional example:

$ for x in `seq 1 10000`; do echo "notice(fqdn_rand(60, 'host${x}.cwood.com'))"; done >/tmp/xx.pp
$ puppet apply --color=false /tmp/xx.pp | grep Scope | awk '{print $3}' | sort | uniq -c | sort -rn >/tmp/yy
$ head -5 /tmp/yy
199 1
195 51
188 52
184 55
184 48
$ tail -5 /tmp/yy
151 9
150 46
145 5
144 2
141 19

Obviously using cron I might choose something different than run-on-a-random-minute, but I will still be a bit suspicious about this.
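
For scale, a small sketch that compares the extremes above with what a perfectly uniform spread would give, treating each bucket count as binomial. That model is an assumption, and `bucket_stats` is a made-up helper name. It takes the number of hosts, the number of buckets, and the lowest and highest observed counts (141 and 199 in the listing above).

```shell
#!/bin/sh
# Sketch only: compare observed bucket counts with a uniform spread of
# N hosts over K buckets. bucket_stats is a hypothetical helper name.
bucket_stats() {
  awk -v n="$1" -v k="$2" -v lo="$3" -v hi="$4" 'BEGIN {
    p = 1 / k                      # chance of landing in any one bucket
    mean = n * p                   # expected hosts per bucket
    sd = sqrt(n * p * (1 - p))     # binomial standard deviation
    printf "expected %.1f, sd %.1f, low %.1f sd, high %.1f sd\n", mean, sd, (lo - mean) / sd, (hi - mean) / sd
  }'
}

bucket_stats 10000 60 141 199
```

Under that assumption the expected count is about 167 per minute with a standard deviation of about 13, so 141 and 199 sit roughly 2 and 2.5 standard deviations out. That much unevenness may simply be what random assignment looks like across 60 buckets, which matters when sizing the masters for the busiest minute rather than the average one.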

Also, cron isn't necessarily good at running things at intervals that don't line up with the clock. If I'd like something to run every 47 minutes, just to pick a number, that's going to be a bit harder to express with cron.
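
To make the 47-minute point concrete: a step value in the minute field restarts at the top of every hour, so `*/47` matches minutes 0 and 47 and the gaps alternate between 47 and 13 minutes. A minimal sketch that lists the matching minutes (`cron_step_minutes` is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch only: list the minutes of the hour matched by a cron minute field "*/STEP".
# cron_step_minutes is a hypothetical helper name.
cron_step_minutes() {
  step=$1
  m=0
  out=""
  while [ "$m" -lt 60 ]; do
    out="${out:+$out }$m"   # append, space-separated
    m=$(( m + step ))
  done
  echo "$out"
}

cron_step_minutes 47   # minutes matched by "*/47 * * * *"
cron_step_minutes 30   # minutes matched by "*/30 * * * *"
```

Intervals that divide 60 evenly come out as expected; anything else needs a wrapper script or the agent daemon's own run interval.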



> --
> You received this message because you are subscribed to the Google Groups "Puppet Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/545B3130.90007%40alumni.tu-berlin.de.
> For more options, visit https://groups.google.com/d/optout.

Christopher Wood

Nov 11, 2014, 10:48:42 AM
to puppet...@googlegroups.com
Following up to myself since it turned out that running puppet agents via cron was better for us due to one important feature...


[cwood@client ~]$ mcod puppet runonce --environment=puppetupgrades --no-splay -F hostname=puppetmasterlab
running: mco puppet -c /home/cwood/.mcollective.dev runonce --environment=puppetupgrades --no-splay -F hostname=puppetmasterlab

* [ ============================================================> ] 1 / 1


puppetmasterlab.domain.com Request Aborted
Cannot specify any custom puppet options when the daemon is running


...I can do agent runs using other environments with mcollective.

It's interesting how things work out over time.

Suresh P

Jan 8, 2015, 7:40:05 AM
to puppet...@googlegroups.com, christop...@pobox.com
Hi All,

I have found one more issue.
When we install the puppet agent, it creates a logrotate configuration that kills the puppet agent and restarts it. We have configured logrotate on all the nodes to run in the first minute of every day (00:01). Because of that, every node's puppet agent gets reloaded at 00:01, so all the agents try to connect to the puppet masters at the same point in the polling interval.

Regards,
Suresh.

Christopher Wood

Jan 8, 2015, 2:27:31 PM
to puppet...@googlegroups.com
Yes it does, and that's something that you would configure to not happen at the same time (with fqdn_rand, or better, remote syslog).

https://docs.puppetlabs.com/references/latest/function.html#fqdnrand

I found some irritating hitches with running through a cut-down environment using mcollective (plugin re-sync, usual hiccoughs with mcollective in our environment), plus popular opinion here did not support adding to our lengthy list of cron jobs. Long story short, we are still running the puppet agent as a daemon. Which you would pick still depends on what criteria you are aiming for.

Felix Frank

Jan 9, 2015, 8:39:28 AM
to puppet...@googlegroups.com
Randomizing the time of logrotation as a workaround for this particular
issue seems drastic to me.

https://docs.puppetlabs.com/references/latest/configuration.html#splay

This may do the trick.

Cheers,
Felix

Christopher Wood

Jan 9, 2015, 9:20:07 AM
to puppet...@googlegroups.com
It's only drastic if you're worried about not having your machines' logs in the same log file at the same time (because you log locally). 100 machines is as good a time to start logging non-locally as any.

If anybody didn't want to get that drastic, they could always $rotateminute = 0 + fqdn_rand(10, "$::fqdn puppet log rotation") but of course that only scales so high.

Digging in further, the logrotate fragment sends a SIGUSR2 to a puppet master but then runs /etc/init.d/puppet reload for the agent, which sends a SIGHUP to the puppet process.

Rather than messing with splay time or logrotate time, perhaps it's better to modify the logrotate stanza to send the puppet agent a SIGUSR2 rather than a SIGHUP? Or is it better to restart the agent nightly regardless? Or break those two jobs out, considering the effect on a puppet master of a large fleet checking in? There's probably a stack of YMMV here.

https://docs.puppetlabs.com/references/latest/man/agent.html#DIAGNOSTICS
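
The man page section linked above lists SIGUSR2 as the signal that makes the agent close and reopen its log files, and SIGHUP as a restart. A minimal sketch of sending SIGUSR2 by hand, for example from a logrotate postrotate script: the helper name `reopen_agent_logs` is made up, and the pidfile path in the comment is an assumption that varies by platform and Puppet version.

```shell
#!/bin/sh
# Sketch only: ask a running puppet agent to reopen its log files (SIGUSR2)
# instead of restarting it (SIGHUP). reopen_agent_logs is a hypothetical name.
reopen_agent_logs() {
  pidfile=$1
  if [ ! -r "$pidfile" ]; then
    echo "pidfile not readable: $pidfile" >&2
    return 1
  fi
  kill -USR2 "$(cat "$pidfile")"
}

# Example call (the path is an assumption; "puppet agent --configprint pidfile"
# should show the real one on a given node):
#   reopen_agent_logs /var/run/puppet/agent.pid
```

Since the agent keeps running, its existing run schedule is preserved and the nightly rotation should no longer pull every node's next run onto the same minute.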