execution expired with a puppetmaster with more then enough resources

758 views
Skip to first unread message

Philipp Dallig

unread,
Nov 1, 2014, 4:58:00 PM11/1/14
to puppet...@googlegroups.com
Hi,
Nodes with a filled catalog runs from time to time in: Could not retrieve catalog from remote server: execution expired
In my opinion i have a strong puppetmaster.
Some Facts:
max memory: 4 GB RAM (in average more then 2 GB free)
load average: 0,14
4 vCPU
visualization with lxc
OS: CentOS7
HostOS: CentOS 7
Puppetmaster Version: 3.7
Puppet Agents Version: 3.4+
Passenger-Version: 4.0.50
PuppetDB in the same Container

I have Foreman as an ENC, Foreman runs on the same Host in an other LXC Container.

Postgres 9.1 for Foreman und PuppetDB is on an other Host.

At the moment i have no more ideas, how i can solve this problem.

Maybe a passenger-memory-stats report help for a solution



---------- Apache processes ----------
PID    PPID  VMSize     Private  Name
--------------------------------------
1571   1     126.6 MB   4.1 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
15954  1571  2006.7 MB  6.8 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
15955  1571  2006.7 MB  6.9 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
15957  1571  2006.7 MB  6.8 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
15959  1571  2006.7 MB  6.8 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
15966  1571  2006.7 MB  6.8 MB   /usr/sbin/httpd -f /www/conf/httpd/puppetmaster.mycompany.com/httpd.conf -k start
### Processes: 6
### Total private dirty RSS: 38.18 MB


-------- Nginx processes --------

### Processes: 0
### Total private dirty RSS: 0.00 MB


----- Passenger processes -----
PID    VMSize    Private  Name
-------------------------------
15936  212.2 MB  0.3 MB   PassengerWatchdog
15939  879.7 MB  1.0 MB   PassengerHelperAgent
15945  215.4 MB  0.9 MB   PassengerLoggingAgent
16119  465.8 MB  79.8 MB  Passenger RackApp: /www/sites/puppetmaster.mycompany.com/puppetmasterd
16126  465.9 MB  78.4 MB  Passenger RackApp: /www/sites/puppetmaster.mycompany.com/puppetmasterd
16135  466.0 MB  82.9 MB  Passenger RackApp: /www/sites/puppetmaster.mycompany.com/puppetmasterd
16144  530.9 MB  96.9 MB  Passenger RackApp: /www/sites/puppetmaster.mycompany.com/puppetmasterd
### Processes: 7
### Total private dirty RSS: 340.24 MB



If someone need more informations, let me know.


Felix Frank

unread,
Nov 10, 2014, 3:59:49 AM11/10/14
to puppet...@googlegroups.com
Hi,

how long does it take to compile one catalog? Examine the puppetmaster
logfile.

How many agents are there and how frequently do they check in?

Depending on the scale of your agent network, 4 CPUs may actually be too
few. The master is almost always CPU bound.

Cheers,
Felix
> --
> You received this message because you are subscribed to the Google
> Groups "Puppet Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to puppet-users...@googlegroups.com
> <mailto:puppet-users...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-users/9748f970-59fd-4621-8944-84bae189072d%40googlegroups.com
> <https://groups.google.com/d/msgid/puppet-users/9748f970-59fd-4621-8944-84bae189072d%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Philipp Dallig

unread,
Nov 16, 2014, 3:48:27 PM11/16/14
to puppet...@googlegroups.com
Hi Felix,

Catalog compile time is around 10 seconds
One Node:
Nov 14 14:25:20 puppetmaster01 puppet-master[6884]: Compiled catalog for test-www01-xxx-com.xxxxx.net in environment test in 6.24 seconds

On my search i also find this curious behavior. Any ideas why?
Nov 15 22:08:00 puppetmaster01 puppet-master[27680]: Compiled catalog for rpm-repo.xxxxxx.com in environment production in 2.94 seconds
Nov 15 22:40:00 puppetmaster01 puppet-master[27680]: Compiled catalog for rpm-repo.xxxxxx.com in environment production in 122.44 seconds
Nov 15 23:07:59 puppetmaster01 puppet-master[27659]: Compiled catalog for rpm-repo.xxxxxx.com in environment production in 2.18 seconds

I also found this message in log file. Possible a PuppetDB performance problem?
Nov 16 00:41:58 puppetmaster01 puppet-master[27659]: Could not retrieve resources from the PuppetDB at puppetmaster01.xxxxxxx.com:8081: Net::ReadTimeout on node rpm-repo.xxxxxxxxx.com

CPU load average: 0,19, 0,18, 0,15 (values from top at the moment)

At the moment i have around 200 nodes (most nodes with empty catalog). Every node connect every 30 minutes to puppet master.

In last days i play a little bit with puppet-load.rb (found in /usr/share/puppet/ext). 35 nodes with 10 repeats give a good availability of the puppetmaster service.

greetings
Philipp

Felix Frank

unread,
Nov 22, 2014, 2:56:47 PM11/22/14
to puppet...@googlegroups.com
On 11/16/2014 09:48 PM, Philipp Dallig wrote:
>
> I also found this message in log file. Possible a PuppetDB performance
> problem?
> Nov 16 00:41:58 puppetmaster01 puppet-master[27659]: Could not
> retrieve resources from the PuppetDB at
> puppetmaster01.xxxxxxx.com:8081: Net::ReadTimeout on node
> rpm-repo.xxxxxxxxx.com

> At the moment i have around 200 nodes (most nodes with empty catalog).
> Every node connect every 30 minutes to puppet master.

Hi,

well with 200 nodes at a 30 minute interval, you should be fine. The
4*1800 CPU seconds that are available should be plenty for this node
count, as long as compile times don't rise to ~ 40 seconds.

Do the PuppetDB errors coincide with the overly long compiler runs?

FWIW, the read timeout does not feel like a performance issue to me, but
rather like a transient connectivity issue (packet loss?). This is just
a shot in the dark though - I have next to no experience operating a
PuppetDB instance.

Are the overlong compiler runs always exactly 120 seconds above the norm?

Thanks,
Felix

Philipp Dallig

unread,
Nov 23, 2014, 4:31:40 AM11/23/14
to puppet...@googlegroups.com
Hi Felix,

in PuppetDB log i found no Errors.

At last Friday i add a monitor server node. This node collects a lot of exported resources (Access to PuppetDB is necessary). This node has also lot of execution expired erros.

Parsing the Puppetmaster log i found this:
Nov 23 04:04:01 puppetmaster01 puppet-master[20815]: Compiled catalog for xymon.xxxxx.com in environment production in 1.16 seconds
Nov 23 04:36:01 puppetmaster01 puppet-master[20815]: Compiled catalog for xymon.xxxxx.com in environment production in 121.07 seconds
Nov 23 05:04:01 puppetmaster01 puppet-master[20815]: Compiled catalog for xymon.xxxxx.com in environment production in 0.83 seconds
Nov 23 05:38:01 puppetmaster01 puppet-master[32120]: Compiled catalog for xymon.xxxxx.com in environment production in 241.22 seconds
Nov 23 06:04:01 puppetmaster01 puppet-master[2223]: Compiled catalog for xymon.xxxxx.com in environment production in 1.01 seconds
Nov 23 06:34:01 puppetmaster01 puppet-master[32120]: Compiled catalog for xymon.xxxxx.com in environment production in 0.90 seconds
Nov 23 07:04:01 puppetmaster01 puppet-master[22144]: Compiled catalog for xymon.xxxxx.com in environment production in 0.89 seconds
Nov 23 07:36:01 puppetmaster01 puppet-master[21761]: Compiled catalog for xymon.xxxxx.com in environment production in 121.08 seconds
Nov 23 08:06:01 puppetmaster01 puppet-master[22144]: Compiled catalog for xymon.xxxxx.com in environment production in 121.11 seconds
Nov 23 08:34:01 puppetmaster01 puppet-master[31519]: Compiled catalog for xymon.xxxxx.com in environment production in 0.91 seconds
Nov 23 09:04:01 puppetmaster01 puppet-master[21761]: Compiled catalog for xymon.xxxxx.com in environment production in 1.03 seconds
Nov 23 09:34:02 puppetmaster01 puppet-master[21761]: Compiled catalog for xymon.xxxxx.com in environment production in 0.95 seconds
Nov 23 10:08:06 puppetmaster01 puppet-master[21761]: Compiled catalog for xymon.xxxxx.com in environment production in 241.31 seconds

If the seconds for building of the catalog bigger then 100, then i got an error message.

Paket loss? i don't think so,  but I will do some testing. I hope a postgreSQL Update will solve the problem.

Thanks for your help
Philipp

Felix Frank

unread,
Nov 23, 2014, 3:31:12 PM11/23/14
to puppet...@googlegroups.com
On 11/23/2014 10:31 AM, Philipp Dallig wrote:
> Nov 23 10:08:06 puppetmaster01 puppet-master[21761]: Compiled catalog
> for xymon.xxxxx.com in environment production in 241.31 seconds

There's some 120 second compiler runs, and some 240s. Multiples of 120
seconds? That certainly has a meaning, but I have no idea what that is.

Can you set up a webrick master on an alternate port? See if you can
reproduce with an especially problematic agent? It should then be fairly
easy to latch onto the master process with strace to get a detailed look
at just what the process is doing during those two minutes of delay.
Ruby profiling might also be a sensible option.

Good luck,
Felix

jcbollinger

unread,
Nov 24, 2014, 9:10:31 AM11/24/14
to puppet...@googlegroups.com


On Sunday, November 23, 2014 2:31:12 PM UTC-6, Felix.Frank wrote:
On 11/23/2014 10:31 AM, Philipp Dallig wrote:
> Nov 23 10:08:06 puppetmaster01 puppet-master[21761]: Compiled catalog
> for xymon.xxxxx.com in environment production in 241.31 seconds

There's some 120 second compiler runs, and some 240s. Multiples of 120
seconds? That certainly has a meaning, but I have no idea what that is.


Good call.  A distribution of compile times like that -- clustered around integer multiples of 120s -- is not natural.  It's shouting "timeout" to me, with the most likely thing to be timing out being name service lookups.  Check the master's name server configuration.  My guess would be that one of its 2-3 configured domain name servers is offline or malfunctioning.

You could test that hypothesis indirectly by performing a bunch of manual name service lookups, and seeing whether any of them suffer similar delays.  On the master, look up an affected client's name via 'nslookup' or 'dig' (or whatever client is appropriate if you're relying on a name service different from DNS).  You could test it indirectly by entering the name and address of one of the problematic clients in the master's 'hosts' file (and ensuring that it is checked first, see 'nsswitch.conf').  If that fixes the issue for that client, but others remain affected then you'll have pretty well tied it down to a name service issue.


John

Philipp Dallig

unread,
Mar 16, 2015, 5:40:35 AM3/16/15
to puppet...@googlegroups.com
Hi,
after a long time i found the mistake.

My PuppetDB was only reachable over a hardware firewall - that's all. After the change from network interfaces all works perfect. Thanks for your help.

Philipp
Reply all
Reply to author
Forward
0 new messages