When a node shuts down, the puppet daemon is one of the first services to be stopped. This makes sense, since puppet ensures that other services are running. However, the daemon forks a separate puppet agent process to apply the catalog, and the initscript stops only the parent daemon. Because the daemon starts the agent periodically, it can happen that during system shutdown the daemon is stopped while the child continues to apply the catalog.
It's easy to reproduce by terminating the service while a catalog is being applied.
Oct 27 13:01:32 node2 puppet-agent[30677]: Starting Puppet client version 3.3.2
Oct 27 13:01:38 node2 puppet-agent[30677]: Caught TERM; calling stop
Oct 27 13:02:03 node2 puppet-agent[30680]: Finished catalog run in 16.43 seconds
So while services are being shut down, puppet brings them up again, only for init to kill them ungracefully afterwards. This has major implications when clustering software is running: services that have been properly de-registered from the cluster are restarted by puppet, and the cluster then marks them as being in a faulty state.
A quick fix could be to update the initscript to also terminate the child process.
diff -Nur a/puppet b/puppet
--- a/puppet 2015-10-27 10:35:54.011661982 +0000
+++ b/puppet 2015-10-27 10:37:05.601287405 +0000
@@ -50,10 +50,21 @@
 
 stop() {
     echo -n $"Stopping puppet agent: "
+    # Get the daemon pid if exists
+    [[ -f $pidfile ]] && daemonpid=$(cat $pidfile)
     killproc $pidopts $puppetd
     RETVAL=$?
     echo
-    [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
+    if [ $RETVAL = 0 ]; then
+        # Daemon is dead, clean up child processes and lock files
+        if [[ -n $daemonpid ]]; then
+            pkill -TERM -P $daemonpid || :
+            sleep 1 # grace period
+            pkill -KILL -P $daemonpid || :
+        fi
+        rm -f ${lockfile} ${pidfile}
+    fi
+
 }
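
One caveat with matching on the parent PID: once the daemon has exited, its children are re-parented (typically to init), so a later "pkill -P $daemonpid" may no longer find them. A possible variation is to record the child PIDs before the daemon is stopped and signal those PIDs directly. The following is only a minimal sketch of that idea. The function name stop_with_children and its arguments are hypothetical, it uses plain kill instead of killproc, and it has not been tested against the real initscript.

```shell
#!/bin/bash
# Hypothetical helper, not part of the shipped initscript.
# Usage: stop_with_children <parent-pid> [grace-seconds]
stop_with_children() {
    local parent=$1 grace=${2:-1} children pid

    # Record the children first: after the parent exits they are
    # re-parented and can no longer be matched by parent PID.
    children=$(pgrep -P "$parent" || :)

    # Stop the parent first so it cannot fork a new child.
    kill -TERM "$parent" 2>/dev/null || :

    # Ask the recorded children to terminate.
    for pid in $children; do
        kill -TERM "$pid" 2>/dev/null || :
    done

    sleep "$grace"   # grace period

    # Force-kill any recorded child that is still around.
    for pid in $children; do
        kill -KILL "$pid" 2>/dev/null || :
    done
    return 0
}
```

In stop() this could be called with the PID read from the pidfile, for example stop_with_children "$daemonpid" 1, with the lock file and pidfile cleanup kept as in the patch above. Only direct children present at the time of the call are covered, and a PID could in principle be reused during the grace period, so this is best treated as a starting point.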