We have about a dozen CentOS 6 servers being managed by puppet 3.7.4, using puppet apply with locally stored modules and node definition.
Recently, performance on one of them has collapsed to hopeless levels, and I really don't know how to diagnose the problem.
Puppet used to take 5 - 10 minutes to run, but processing essentially the same manifest is now taking around 40 minutes.
Often, though not always, many things are being (inappropriately) skipped because "trivial" operations are timing out. Specifically, this exec:
exec { $shell_exists: command => "/usr/bin/test -f '${shell}'" }
Error: Command exceeded timeout
Wrapped exception:
execution expired
Error: /Stage[main]/Users::Ii::Virtual/Users::Developer[robert]/Users::Account[robert]/Exec[check /bin/bash exists for robert]/returns: change from notrun to 0 failed: Command exceeded timeout
even though ${shell} (/bin/bash) exists, and other invocations of exactly the same command in the same run of puppet are confirming this.
In struggling to diagnose this, I have tried cutting down the content of the node by commenting things out until I arrived at the empty node definition...
node 'hostname' {
# Lines
# Commented
# Out
}
The server is currently a standby machine, so apart from Puppet (and me running "top", "tail" and the like) the server is either completely idle, or possibly receiving an rsync or running an occasional (short) cron job. Yet "top" reports CPU utilisation at close to 100%, and puppet is the only process it shows doing any significant processing. Throughout the puppet run, it responds like a heavily loaded machine, and there is nothing in system logs.
Can anyone suggest how I can go about diagnosing what is going wrong?
Robert.
Info: Loading facts
Info: Loading facts
Notice: Compiled catalog for standby1.interactive.co.uk in environment production in 6.71 seconds
Info: Applying configuration version '1426787754'
Info: /Schedule[daily]: Starting to evaluate the resource
Info: /Schedule[daily]: Evaluated in 0.00 seconds
Info: /Schedule[monthly]: Starting to evaluate the resource
Info: /Schedule[monthly]: Evaluated in 0.00 seconds
Info: /Schedule[hourly]: Starting to evaluate the resource
Info: /Schedule[hourly]: Evaluated in 0.00 seconds
Info: Stage[main]: Starting to evaluate the resource
Info: Stage[main]: Evaluated in 0.00 seconds
Info: Class[Main]: Starting to evaluate the resource
Info: Class[Main]: Evaluated in 0.00 seconds
Info: Node[standby1]: Starting to evaluate the resource
Info: Node[standby1]: Evaluated in 0.00 seconds
Info: Class[Settings]: Starting to evaluate the resource
Info: Class[Settings]: Evaluated in 0.00 seconds
Info: Class[Settings]: Starting to evaluate the resource
Info: Class[Settings]: Evaluated in 0.00 seconds
Info: Node[standby1]: Starting to evaluate the resource
Info: Node[standby1]: Evaluated in 0.00 seconds
Info: /Schedule[never]: Starting to evaluate the resource
Info: /Schedule[never]: Evaluated in 0.00 seconds
Info: /Filebucket[puppet]: Starting to evaluate the resource
Info: /Filebucket[puppet]: Evaluated in 0.00 seconds
Info: /Schedule[weekly]: Starting to evaluate the resource
Info: /Schedule[weekly]: Evaluated in 0.00 seconds
Info: /Schedule[puppet]: Starting to evaluate the resource
Info: /Schedule[puppet]: Evaluated in 0.00 seconds
Info: Class[Main]: Starting to evaluate the resource
Info: Class[Main]: Evaluated in 0.00 seconds
Info: Stage[main]: Starting to evaluate the resource
Info: Stage[main]: Evaluated in 0.00 seconds
Notice: Finished catalog run in 1119.24 seconds
Try running puppet with "--debug" and "--evaltrace" to see where it's taking the time.
I'd be looking at DNS as that is often the culprit for unexplained things.
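As a rough sketch of how the trace could be used once captured: the helper below ranks resources by their reported evaluation time. The function name, the manifest path and the log path are placeholders invented for illustration, and the parsing simply assumes the "Evaluated in N seconds" line format shown in the output above (running with --color=false keeps colour codes out of the log).

```shell
# Rank resources by evaluation time from a saved --evaltrace log.
# Usage: slowest_resources LOGFILE [COUNT]   (COUNT defaults to 10)
# slowest_resources is a hypothetical helper, not part of puppet.
slowest_resources() {
    # Matching lines look like:
    #   Info: /Schedule[daily]: Evaluated in 0.00 seconds
    awk '/: Evaluated in [0-9.]+ seconds$/ {
        secs = $(NF-1)                                 # the number before "seconds"
        sub(/^Info: /, "")                             # drop the log-level prefix
        sub(/: Evaluated in [0-9.]+ seconds$/, "")     # keep only the resource name
        print secs, $0
    }' "$1" | sort -rn | head -n "${2:-10}"
}

# Hypothetical invocation (both paths are assumptions):
#   puppet apply --debug --evaltrace --color=false /etc/puppet/manifests/site.pp 2>&1 \
#       | tee /tmp/puppet-trace.log
#   slowest_resources /tmp/puppet-trace.log 5
```

In the trace posted above every resource reports 0.00 seconds while the whole run takes 1119.24 seconds, so a ranking like this would mostly confirm that the time is going somewhere other than resource evaluation, which is where the --debug output becomes the more useful half.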
On Sat, Mar 21, 2015 at 12:08:07PM -0700, Robert Inder wrote:
> This is not quite the behaviour I want.
>
> But how can I stop it?
In your place I'd probably just purge stale sessions with a cron job. Any particular reason why puppet's tidy is better than find->rm?
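For what it's worth, the find->rm approach could look roughly like the sketch below. The function name, the session directory and the seven-day threshold are all made up for illustration, since nothing here says where the sessions live or how old "stale" is.

```shell
# Delete regular files under DIR whose last modification is more than DAYS
# whole days old. find rounds file age down to whole days, so "+7" matches
# files at least 8 days old.
# Usage: purge_stale DIR DAYS   (purge_stale is a hypothetical helper)
purge_stale() {
    find "$1" -type f -mtime +"$2" -delete
}

# Hypothetical crontab entry running the same find nightly at 03:00;
# /var/lib/php/session is an assumed path:
#   0 3 * * * find /var/lib/php/session -type f -mtime +7 -delete
```

Running the find without -delete first prints what would be removed, which is a cheap check before putting it in cron.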