12% of my puppet clients -- Could not retrieve catalog from remote server: execution expired


Tim Lank

May 8, 2012, 8:35:34 AM
to Puppet Users
How do I troubleshoot this error, which occurs on about 12% of my
Puppet clients (~70 out of ~550)?

Arnau Bria

May 8, 2012, 9:03:13 AM
to puppet...@googlegroups.com
Do they run as a daemon?
Is it always the same 70 hosts that fail?
Do they all run at the same time?

Cheers,
Arnau

Tim Lank

May 8, 2012, 10:59:55 AM
to puppet...@googlegroups.com
They do run as a daemon.
It's pretty much always the same 70, and they don't all run at the same
time. Many do, but not all.

Steve Shipway

May 9, 2012, 5:45:31 AM
to puppet...@googlegroups.com
Not sure if it is the same issue, but we had a lot of timeout errors on catalogue retrieval once we reached about 200 nodes/hour. We changed Puppet to run every 2 hours, and all was well until we reached 450 nodes (again, roughly 200/hr) and the problem resurfaced. I take it to be some limitation in the Puppet system.

Now we've just finished installing a fully distributed Puppet setup, with one frontend and four backend puppetmasters. This should be able to handle 800/hr if the previous tests were right, and we can expand horizontally indefinitely.

It could just be that you've reached the limit of your puppet infrastructure.

I also found that features such as storeconfigs greatly slow things down and reduce how many catalogues/hr can be served (thin storeconfigs is much better). We were advised of this limitation when we put it in, but I had to try it out myself and see...
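
To make the nodes/hour arithmetic above concrete, here is a rough sketch in Python. It assumes every node requests exactly one catalogue per run interval and that runs are spread evenly, which real agents only approximate. The function name catalogs_per_hour is made up for illustration and is not part of Puppet.

```python
def catalogs_per_hour(nodes: int, run_interval_minutes: float) -> float:
    """Average catalogue requests per hour, assuming one request per node per interval."""
    if run_interval_minutes <= 0:
        raise ValueError("run_interval_minutes must be positive")
    return nodes * 60.0 / run_interval_minutes


# 450 nodes checking in every 2 hours (the figures from this post).
print(catalogs_per_hour(450, 120))  # 225.0, i.e. roughly the 200/hr ceiling described

# Hypothetical: if one master tops out near 200/hr, four backends give about 800/hr.
per_master_limit = 200  # assumption taken from the observation in this post
print(4 * per_master_limit)
```

Comparing the computed rate with the rate at which timeouts first appeared gives a quick sanity check on whether a site is near the same ceiling.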

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
s.sh...@auckland.ac.nz
Ph: +64 9 373 7599 ext 86487



Jake - USPS

May 9, 2012, 8:55:31 AM
to puppet...@googlegroups.com
I was getting timeouts before as well. Usually it had to do with Apache MaxClients being reached (we run an Apache/Passenger setup), so I increased that when the system could handle some more load. Other times it was from too much load on our puppetmasters, so we needed to increase the number of CPUs and adjust 'PassengerMaxPoolSize' in the Apache config. Finally, we also ran into 'open file' limit issues with the number of connections/sockets, which caused problems with Passenger, so I had to bump that up as well (from the 1024 default to 2048).
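
As a quick way to look at the 'open file' limit mentioned above, here is a minimal Python sketch using the standard resource module (Unix only). It reports the limit of the process that runs the script, which is not necessarily the limit the Apache/Passenger processes run under, so treat it as a starting point. The helper names and the 2048 target are illustrative only.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_headroom(required: int) -> bool:
    """True if the soft open-file limit is unlimited or at least `required`."""
    soft, _hard = open_file_limits()
    return soft == resource.RLIM_INFINITY or soft >= required


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)

# 2048 is the value this post ended up using; adjust for your own site.
print("enough for 2048 descriptors:", has_headroom(2048))
```

If the soft limit is still 1024 on a busy master, that would be consistent with the socket exhaustion described above.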

We have ~4500 systems running every 30 minutes.  We use 4 systems with 16 cores each to support this.  The systems run with a load of around 30% right now, so really all we need is probably 2 of these systems ... but we want redundancy.

So we handle ~9000 runs/hr with this setup, to give you an idea of runs/hr versus horsepower.
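
The "we probably only need 2 of these systems" estimate can be checked with a back-of-the-envelope sketch like the one below. It assumes load scales linearly with the number of masters and is spread evenly across them, which is a simplification. The function name projected_load is hypothetical.

```python
def projected_load(current_hosts: int, current_load: float, new_hosts: int) -> float:
    """Estimate per-host load after changing the host count, assuming linear scaling."""
    if new_hosts <= 0:
        raise ValueError("new_hosts must be positive")
    return current_hosts * current_load / new_hosts


# Four masters at about 30% load each, consolidated onto two masters.
print(f"{projected_load(4, 0.30, 2):.0%}")

# With redundancy in mind: the load on the survivors if one of four masters fails.
print(f"{projected_load(4, 0.30, 3):.0%}")
```

Under these assumptions two masters would run at about 60% load, which leaves little room if one of them fails.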

Regards,
Jake
