Re: Could not run Puppet configuration client: execution expired


jcbollinger

Jun 20, 2012, 9:18:17 AM
to puppet...@googlegroups.com


On Wednesday, June 20, 2012 5:35:39 AM UTC-5, Kmbu wrote:
Hello,

I'm running Puppet 2.7.6 and currently expanding the number of servers managed by Puppet. At around the 160-170 host mark (with a 5-minute run interval + splay), my puppetmaster server is starting to die. Is this normal?

If you're using the master's built-in WEBrick based server, then yes, it's normal.  In fact, you're getting pretty high throughput.  The default run interval is six times longer (30 minutes), and at that interval people sometimes report problems by the time they get to the number of hosts you are successfully supporting.

The next level of Puppet scaling is to run Puppet inside a 'real' web server, with Apache + passenger being the usual choice.  That could increase your capacity by several times, depending on the host hardware and the characteristics of your manifests.
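Roughly, the Apache vhost for a passengerized master looks something like the sketch below (the hostname and all paths are placeholders for your own site, not anything from this thread -- check them against your install):

```apache
# Sketch of an Apache vhost running puppetmasterd under Passenger.
# master.example.com and every path here are placeholders.
Listen 8140
<VirtualHost *:8140>
    SSLEngine on
    SSLCertificateFile    /var/lib/puppet/ssl/certs/master.example.com.pem
    SSLCertificateKeyFile /var/lib/puppet/ssl/private_keys/master.example.com.pem
    SSLCACertificateFile  /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLVerifyClient optional
    SSLOptions +StdEnvVars

    # Forward the client-cert verification result to the master
    RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
    RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e

    # Passenger picks up config.ru from the directory above 'public'
    DocumentRoot /etc/puppet/rack/public
    RackBaseURI /
</VirtualHost>
```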
 
I've already gone through a few threads mentioning this same error, but it's still not clear to me where the failure is happening exactly (I'm thinking as the agent tries to send the report to the puppetmaster?) or whether anything can be done about it. The puppetmasterd process seems to be using 90-100% of one CPU, but there is another one.

The puppetmaster is not natively multithreaded, but running it via passenger will allow multiple instances to operate in parallel.  If you're maxing out one of two available CPUs, then that would probably give you about a factor of two improvement, provided enough RAM is available.

Are there any parameters I can tweak on the puppetmaster server to help it handle the load better?

See above.  Quicker and easier, however, would be to lengthen the clients' run interval.  Five minutes is very short.
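For reference, that's a one-line change in the agents' puppet.conf:

```ini
[agent]
    # Seconds between catalog runs; 1800 (30 minutes) is the default.
    # A 5-minute interval is the equivalent of 300 here.
    runinterval = 1800
    # Keep splay on to spread run start times and avoid load spikes
    splay = true
```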


John

Kmbu

Jun 20, 2012, 10:52:21 AM
to puppet...@googlegroups.com

Thanks for your support. Please find my responses below.


On Wednesday, 20 June 2012 15:18:17 UTC+2, jcbollinger wrote:


On Wednesday, June 20, 2012 5:35:39 AM UTC-5, Kmbu wrote:
Hello,

I'm running Puppet 2.7.6 and currently expanding the number of servers managed by Puppet. At around the 160-170 host mark (with a 5-minute run interval + splay), my puppetmaster server is starting to die. Is this normal?

If you're using the master's built-in WEBrick based server, then yes, it's normal.  In fact, you're getting pretty high throughput.  The default run interval is six times longer (30 minutes), and at that interval people sometimes report problems by the time they get to the number of hosts you are successfully supporting.

The next level of Puppet scaling is to run Puppet inside a 'real' web server, with Apache + passenger being the usual choice.  That could increase your capacity by several times, depending on the host hardware and the characteristics of your manifests.

I'm already running apache/passenger. Is there anything else I can do?
 
 
I've already gone through a few threads mentioning this same error, but it's still not clear to me where the failure is happening exactly (I'm thinking as the agent tries to send the report to the puppetmaster?) or whether anything can be done about it. The puppetmasterd process seems to be using 90-100% of one CPU, but there is another one.

The puppetmaster is not natively multithreaded, but running it via passenger will allow multiple instances to operate in parallel.  If you're maxing out one of two available CPUs, then that would probably give you about a factor of two improvement, provided enough RAM is available.

Are there any parameters I can tweak on the puppetmaster server to help it handle the load better?

See above.  Quicker and easier, however, would be to lengthen the clients' run interval.  Five minutes is very short.

Yeah, I guess I can do that, but my number of servers is going to grow very quickly. I expect to have over 1000 servers within the next week or two. I hope it can hold up. I'm also looking into load-balancing options.
 


John

Jake - USPS

Jun 20, 2012, 11:39:00 AM
to puppet...@googlegroups.com
Check my reply in  https://groups.google.com/forum/?fromgroups#!searchin/puppet-users/USPS/puppet-users/q3bFvenGueI/hQExZ1X7pcwJ 

I'll add that we do load-balance across multiple puppetmasters.  At first we were using DNS round-robin to do it; now we use haproxy.  A good article on using it was written up not too long ago: http://blog.ronvalente.net/blog/2012/05/19/puppet/.

Regards,
Jake

Felix Frank

Jun 20, 2012, 11:44:19 AM
to puppet...@googlegroups.com
On 06/20/2012 05:39 PM, Jake - USPS wrote:
> I'll add that we do loadbalance across multiple puppetmasters. At first
> we were using DNS round-robin to do it, and now use haproxy which a good
> article on utilizing was written up not too long
> ago http://blog.ronvalente.net/blog/2012/05/19/puppet/.

Fascinating. What I don't get is: This describes an active/passive setup
(note how one server is configured as 'backup' in haproxy), yet speaks
of loadbalancing.

Would you elaborate on how you manage to go active/active (especially
seeing as DRBD is involved)?

Thanks,

Felix

Jake - USPS

Jun 20, 2012, 12:45:40 PM
to puppet...@googlegroups.com
Sorry ... I didn't actually follow that link for my setup ... I just thought it would be a good reference, since I don't have anything documented.  Also worth noting: that article uses NGINX.

I'm cheating and am not FULLY redundant.  We have a single CA PM that is not load-balanced or made redundant like the other 'workhorse' PMs.  And I don't worry about CRLs right now.  :(

It's something I want to get addressed, but not a priority atm.

As for doing active/active the way I am: my haproxy config looks similar to the one in the link, except no lines have 'backup' in them.
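In outline (hostnames invented for illustration), that means something like:

```
# haproxy.cfg sketch -- active/active, no 'backup' keyword,
# so both masters serve traffic at once
listen puppet
    bind *:8140
    mode tcp
    balance roundrobin
    server pm1 pm1.example.com:8140 check
    server pm2 pm2.example.com:8140 check
```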

Again, no DRBD; that is mainly for the cert stuff, and it is all handled by a single node (I set ca_server to my CA PM on all nodes).  But maybe you could use shared NFS/storage for the certificates to address it?
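The agent-side half of that arrangement is just two settings in puppet.conf (hostnames invented here):

```ini
[agent]
    # Catalog and file requests go through the load-balanced VIP...
    server    = puppet-vip.example.com
    # ...but all certificate traffic goes to the single CA master
    ca_server = ca-pm.example.com
```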

Regards,
Jake

Felix Frank

Jun 21, 2012, 5:22:38 AM
to puppet...@googlegroups.com
Hi,

I see. So I assume your manifests are in NFS (or some other shared
storage) as well?
Is there anything else I should be wary of when putting my masters
behind HAproxy this way?

Thanks,
Felix

jcbollinger

Jun 21, 2012, 9:54:59 AM
to puppet...@googlegroups.com


On Wednesday, June 20, 2012 9:52:21 AM UTC-5, Kmbu wrote:

[...]


I'm already running apache/passenger. Is there anything else I can do?

It is suspicious that you say you are using passenger, but the workload is not being spread over both CPUs.  Make sure your clients are accessing the puppetmaster via apache, and not some standalone puppetmaster.  One way to do that is to make sure no standalone puppetmaster is running in the first place (passenger will start its own puppetmaster instances as needed).  In any case, the logs should reveal what master process is servicing your clients.

If all your clients are indeed going through apache, then perhaps you have a configuration problem on the apache / passenger side.  You will find a lot of advice on it in this group and elsewhere.  Or maybe RAM is your limiting resource.  Throughput will really tank when the master(s) run out of physical RAM and start swapping to disk, and all that I/O could lead to CPU idle time that you wouldn't otherwise see.

Once you get both CPUs loaded up, the next level of scaling is higher-capacity hardware, load balancing across multiple masters, or both.  Just two cores is pretty wimpy these days, so I'd look first at moving to better hardware.  Is the master running in a VM?  In that case there might be some improvement available from running it directly on a physical machine, or else it should be easy to assign more cores and / or more RAM to it.

Alternatively, you can accommodate more clients by reducing the work required to support each one.  Some ways to do that are
  • lengthening the interval between agent runs
  • minimizing the number of managed resources
  • using a lighter-weight checksum method (md5lite, mtime, etc.) for managed File content
Whether any of those are viable depends on your requirements for nodes.
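On the last point, the checksum method is settable per File resource, or as a resource default for a whole scope, e.g.:

```puppet
# Resource default for this scope: compare mtimes instead of
# computing an MD5 of every managed file's content on each run
File {
  checksum => mtime,
}

file { '/etc/motd':
  ensure => file,
  source => 'puppet:///modules/base/motd',  # module path is illustrative
}
```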

John

Jake - USPS

Jun 21, 2012, 10:40:41 AM
to puppet...@googlegroups.com
I'm going to admit I don't have the best setup here!  :)  There are things I want to do to improve what I have now, just need to get room on the plate to do it.

Right now my 'CA PM' is also a PM for the other PMs ... :-\  So my PMs that are behind HAproxy have a puppet::master class assigned to them, have their modules directory as a managed resource, and are forced to have server = 'CA PM' in puppet.conf instead of the VIP I use for everything else.  Basically I deploy updates to a special dir on my 'CA PM', and all my other PMs then receive the updates from there.

This works for us, although like I said I want to make it better by doing what you assumed I am doing ... shared storage.  But since we can only make changes with a CHG ticket, I basically make the update and then force a puppet run on my PMs (remote execution), and everything is updated in about 5 minutes.  This is done during a time when the rest of the environment is not accessing the PMs.

But yes, the way I am doing it now could cause issues.  If I updated my CA PM and then didn't follow up on my other PMs, they could get out of sync.  An agent accessing the VIP might then hit an updated PM initially, and over the course of the run bounce between different nodes, some updated and some not.

The only other issue I've run into is that if apache on a PM restarts, or a PM itself restarts while agents are accessing it, I'll sometimes get failed runs.  Out of 4800+ systems this usually amounts to ~200 failures until the next batch of runs (every 30 minutes here), which clears it up (even if apache/the node is still down).  I'm not sure if this is a limitation of something I am doing, or just to be expected.  Before using haproxy I had a VIP in DNS that would round-robin between systems.  With that I would get ~1000 failures in such a situation, since DNS doesn't know when a node goes down, and that would continue until everything was back up.

So since what I have isn't bulletproof, I don't have anything documented ... but eventually ... :)

Regards,
Jake

Kmbu

Jun 21, 2012, 11:46:51 AM
to puppet...@googlegroups.com

Wow! I think you're right. I've set up Apache/Passenger but I use it for Dashboard, not the puppetmaster itself :-) Let me see if I can push my luck. Is there a quick guide to moving Puppet to Passenger when Apache/Passenger are already in place? Thanks a bunch, John.

jcbollinger

Jun 21, 2012, 5:25:29 PM
to puppet...@googlegroups.com


On Thursday, June 21, 2012 10:46:51 AM UTC-5, Kmbu wrote:

I've set up Apache/Passenger but I use it for Dashboard, not the puppetmaster itself :-) Let me see if I can push my luck. Is there a quick guide to moving Puppet to Passenger when Apache/Passenger are already in place?

As far as I know, there isn't much special to distinguish that case.  The first step in performing a Puppet / passenger installation is to get the puppetmaster running standalone, and the next is to get Apache / passenger installed.  It sounds like you're ready to move on from there.

Google can help you find some docs on the Puppetlabs site, but they're a bit dated.  As an alternative, this looks like a pretty good description of the whole process: http://www.tomhayman.co.uk/linux/install-puppet-modpassenger-mysql-stored-procs-centos-6-rhel6/ (don't overlook parts 2 and 3 of that series, where you'll find most of the details of interest to you at this point).


John

Felix Frank

Jun 22, 2012, 5:09:04 AM
to puppet...@googlegroups.com
On 06/21/2012 04:40 PM, Jake - USPS wrote:
>
> This works for us although like I said I want to make it better, doing
> what you assumed I am doing ... shared storage. But since we can only
> make changes with a CHG ticket I basically make the update and then
> force a puppet run on my PMs (remote execution) and everything is
> updated in like 5 minutes. This is done during a time when the rest of
> the environment is not accessing the PMs.

None of this sounds nearly as bad to me as it may feel to you right
now :-)

> The only other issue I've ran into is if apache on a PM restarts or a PM
> restarts while agents are accessing it sometimes I'll get failed runs.
> Out of 4800+ systems this usually amounts to like ~200 failures until
> the next batch of runs (every 30 minutes here) which clears it up (even
> if apache/node still down). I'm not sure if this is a limitation of
> something I am doing, or if its just to be expected. Before using

There are some HAproxy options you can look into that may help you:
- redispatch: should allow HAproxy to redirect a compilation request to
another master if the original target apache won't respond
- disable-on-404: build a health check into your apache, make it
generate 404 for a while before actually stopping the process. haproxy
stops opening new sessions with this apache instance.

There are more things that may improve this situation - HAproxy is
really quite powerful where HTTP is in use.
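As a rough sketch (server names invented), those two options slot into a backend like so:

```
backend puppetmasters
    option redispatch            # resend a failed connection to another server
    option httpchk GET /health   # poll a lightweight health URL on each apache
    http-check disable-on-404    # a 404 drains the server instead of hard-failing it
    server pm1 pm1.example.com:8140 check
    server pm2 pm2.example.com:8140 check
```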

Best,
Felix

Kmbu

Jun 22, 2012, 5:31:02 AM
to puppet...@googlegroups.com
Great reference. I ran into the next hurdle, however. It seems from the apache logs that Apache thinks Puppet is installed in the standard locations (/etc/puppet, /var/lib/puppet). This is not true for our environment. Where can I set this up? This is what I see in the apache error log:

[Fri Jun 22 11:13:23 2012] [notice] Graceful restart requested, doing restart
[Fri Jun 22 11:13:23 2012] [notice] Digest: generating secret for digest authentication ...
[Fri Jun 22 11:13:23 2012] [notice] Digest: done
[Fri Jun 22 11:13:24 2012] [notice] Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/1.0.0e DAV/2 Phusion_Passenger/3.0.7 configured -- resuming normal operations
Could not prepare for execution: Got 2 failure(s) while initializing: change from absent to directory failed: Could not set 'directory on ensure: Permission denied - /etc/puppet; change from absent to directory failed: Could not set 'directory on ensure: Permission denied - /var/lib/puppet
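Digging around, it looks like those paths come from the config.ru that Passenger launches the master from (the 2.7 sample ships as ext/rack/files/config.ru in the source tree). Its stock contents are roughly as below -- contents vary by version, so check your own copy -- and the hardcoded paths are what I'd presumably need to change for our layout:

```ruby
# Sample config.ru shipped with Puppet 2.7 (paraphrased -- check your copy)
$0 = "master"
ARGV << "--rack"
ARGV << "--confdir" << "/etc/puppet"      # replace with our confdir
ARGV << "--vardir"  << "/var/lib/puppet"  # replace with our vardir
require 'puppet/util/command_line'
run Puppet::Util::CommandLine.new.execute
```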
 

John
