Jira (PUP-5431) Puppet restarts services at shutdown

Ioannis (JIRA)

Oct 27, 2015, 9:23:05 AM
to puppe...@googlegroups.com
Ioannis created an issue
 
Puppet / Bug PUP-5431
Puppet restarts services at shutdown
Issue Type: Bug
Assignee: Kylo Ginsberg
Components: Client
Created: 2015/10/27 6:22 AM
Environment:

$ uname -a
Linux node1 2.6.32-504.el6.x86_64 #1 SMP Tue Sep 16 01:56:35 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version:	:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 6.6 (Santiago)
Release:	6.6
Codename:	Santiago
$ puppet --version
3.3.2

Priority: Major
Reporter: Ioannis

When the node is shutting down, the puppet daemon is one of the first services to be stopped. This makes sense, since puppet ensures other services are running. But the daemon process forks another puppet agent process to apply the catalog, and the initscript only stops the parent daemon. Because the daemon starts the agent periodically, it can happen that during system shutdown the daemon is stopped but the child continues to apply the catalog.

It's easy to reproduce by terminating the service while a catalog is being applied.

Oct 27 13:01:32 node2 puppet-agent[30677]: Starting Puppet client version 3.3.2
Oct 27 13:01:38 node2 puppet-agent[30677]: Caught TERM; calling stop
Oct 27 13:02:03 node2 puppet-agent[30680]: Finished catalog run in 16.43 seconds
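The same behaviour can be sketched without puppet at all. The script below is only an illustration under stated assumptions: a `bash -c` process stands in for the daemon, a `sleep` stands in for the forked catalog run, and the variable names are made up for the demo. It shows that sending TERM to the parent alone leaves the forked child running.

```shell
#!/bin/bash
# Stand-in for the puppet daemon: a shell that forks a worker and waits on it.
bash -c 'sleep 30 & wait' &
daemon_pid=$!
sleep 1                                  # give the stand-in time to fork

worker_pid=$(pgrep -P "$daemon_pid")     # the forked "catalog run"
kill -TERM "$daemon_pid"                 # signal only the parent, as the initscript does
wait "$daemon_pid" 2>/dev/null           # reap the stand-in daemon

if kill -0 "$worker_pid" 2>/dev/null; then
    survived=yes
    echo "worker $worker_pid outlived daemon $daemon_pid"
else
    survived=no
    echo "worker exited together with the daemon"
fi

kill "$worker_pid" 2>/dev/null           # clean up the demo worker
```

The worker keeps running because TERM is delivered to the parent PID only; nothing forwards it to the child.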

So while services are being shut down, puppet brings them up again, only for them to be killed ungracefully by init. This has major implications when clustering software is running: although services have been properly de-registered from the cluster, puppet restarts them, and the cluster then marks them as faulty.

A quick fix could be to update the initscript to also terminate the child process.

diff -Nur a/puppet b/puppet
--- a/puppet	2015-10-27 10:35:54.011661982 +0000
+++ b/puppet	2015-10-27 10:37:05.601287405 +0000
@@ -50,10 +50,21 @@
 
 stop() {
     echo -n $"Stopping puppet agent: "
+    # Get the daemon pid if exists
+    [[ -f $pidfile ]] && daemonpid=$(cat $pidfile)
     killproc $pidopts $puppetd
     RETVAL=$?
     echo
-    [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
+    if [ $RETVAL = 0 ]; then
+        # Daemon is dead, clean up child processes and lock files
+	if [[ -n $daemonpid ]]; then
+            pkill -TERM -P $daemonpid || :
+            sleep 1 # grace period
+            pkill -KILL -P $daemonpid || :
+        fi
+        rm -f ${lockfile} ${pidfile}
+    fi
+
 }
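One detail worth keeping in mind with this kind of cleanup: once the parent has exited, its children are re-parented, so a lookup by parent PID (`pkill -P $daemonpid`) no longer matches them. The sketch below is a hypothetical standalone helper, not the actual initscript; `stop_tree` is a made-up name and plain `kill` stands in for `killproc`. It records the children before stopping the parent and then applies the same TERM, grace period, KILL escalation.

```shell
#!/bin/bash
# stop_tree PIDFILE: stop a daemon and any children it forked.
# Hypothetical helper for illustration; not part of the puppet initscript.
stop_tree() {
    local pidfile=$1 daemonpid children
    [[ -f $pidfile ]] || return 0
    daemonpid=$(cat "$pidfile")

    # Record the children while the parent is still alive; after it exits
    # they are re-parented and can no longer be found via its PID.
    children=$(pgrep -P "$daemonpid")

    kill -TERM "$daemonpid" 2>/dev/null   # stand-in for killproc
    if [[ -n $children ]]; then
        kill -TERM $children 2>/dev/null  # unquoted on purpose: one PID per word
        sleep 1                           # grace period
        kill -KILL $children 2>/dev/null  # force anything still running
    fi
    rm -f "$pidfile"
    return 0
}
```

In an initscript this would be called as `stop_tree "$pidfile"` from `stop()`, assuming `$pidfile` is set as in the script being patched.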

This message was sent by Atlassian JIRA (v6.4.11#64026-sha1:78f6ec4)

Ioannis (JIRA)

Oct 27, 2015, 10:44:04 AM
to puppe...@googlegroups.com
Ioannis updated an issue
Change By: Ioannis

Maybe a better approach would be for the daemon to terminate its child processes.

Ioannis (JIRA)

Oct 29, 2015, 6:18:03 AM
to puppe...@googlegroups.com
Ioannis updated an issue
A quick fix could be to update the initscript to also terminate the child process.
{code}
diff -Nur a/puppet b/puppet
--- a/puppet 2015-10-27 10:35:54.011661982 +0000
+++ b/puppet 2015-10-29 10:15:59.129899473 +0000
@@ -50,6 +50,9 @@
 
 stop() {
     echo -n $"Stopping puppet agent: "
+    # Get the daemon pid if exists and signal any child process
+    [[ -f $pidfile ]] && daemonpid=$(cat $pidfile)
+    [[ -n $daemonpid ]] && pkill -P $daemonpid || :
     killproc $pidopts $puppetd
     RETVAL=$?
     echo
{code}

Maybe a better approach would be for the daemon to terminate its child processes.

Ioannis (JIRA)

Oct 29, 2015, 6:29:03 AM
to puppe...@googlegroups.com
Ioannis updated an issue
When the node is shutting down, the puppet daemon is one of the first services to be stopped. This makes sense, since puppet ensures other services are running. But the daemon process forks another puppet agent process to apply the catalog, and the initscript only stops the parent daemon. Because the daemon starts the agent periodically, it can happen that during system shutdown the daemon is stopped but the child continues to apply the catalog.

It's easy to reproduce by terminating the service while a catalog is being applied.
{code}
Oct 27 13:01:32 node2 puppet-agent[30677]: Starting Puppet client version 3.3.2
Oct 27 13:01:38 node2 puppet-agent[30677]: Caught TERM; calling stop
Oct 27 13:02:03 node2 puppet-agent[30680]: Finished catalog run in 16.43 seconds
{code}

So while services are being shut down, puppet brings them up again, only for them to be killed ungracefully by init. This has major implications when clustering software is running. While the HA manager has properly deregistered services from the cluster, puppet restarts the HA manager, which then re-registers services as they are being forcefully killed by init, causing them in turn to be marked as faulty in the cluster.

A quick fix could be to update the initscript to attempt to terminate any existing child process.

{code}
diff -Nur a/puppet b/puppet
--- a/puppet 2015-10-27 10:35:54.011661982 +0000
+++ b/puppet 2015-10-29 10:15:59.129899473 +0000
@@ -50,6 +50,9 @@
 
 stop() {
     echo -n $"Stopping puppet agent: "
+    # Get the daemon pid if exists and signal any child process
+    [[ -f $pidfile ]] && daemonpid=$(cat $pidfile)
+    [[ -n $daemonpid ]] && pkill -P $daemonpid || :
     killproc $pidopts $puppetd
     RETVAL=$?
     echo
{code}

Perhaps a better approach would be for the daemon itself to signal its own child processes and wait for them to terminate instead of leaving them orphaned.
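As a rough illustration of that idea (a bash stand-in only, not how the puppet daemon is actually implemented): the parent traps TERM, forwards it to its worker, and waits for the worker to exit before exiting itself. The `daemon` function and the `sleep` worker are assumptions made for the demo.

```shell
#!/bin/bash
# Stand-in daemon that forwards TERM to its worker and waits for it to exit.
daemon() {
    sleep 30 &                           # stand-in for the forked catalog run
    worker=$!
    # On TERM: signal the worker, wait until it is gone, then exit cleanly.
    trap 'kill -TERM "$worker" 2>/dev/null; wait "$worker"; exit 0' TERM
    wait "$worker"
}

daemon &                                 # run the stand-in daemon in a subshell
daemon_pid=$!
sleep 1                                  # let it fork the worker and set the trap

worker_pid=$(pgrep -P "$daemon_pid")
kill -TERM "$daemon_pid"                 # what the initscript would send
wait "$daemon_pid"
status=$?

if kill -0 "$worker_pid" 2>/dev/null; then
    echo "worker $worker_pid is still running"
else
    echo "daemon exited with status $status and took its worker with it"
fi
```

With this arrangement the initscript would only need to signal the parent, since the parent does not exit until its child is gone.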

Branan Riley (JIRA)

May 15, 2017, 4:01:03 PM
to puppe...@googlegroups.com
Branan Riley assigned an issue to Unassigned
Change By: Branan Riley
Assignee: Unassigned (previously Kylo Ginsberg)
This message was sent by Atlassian JIRA (v6.4.14#64029-sha1:ae256fe)

Branan Riley (JIRA)

May 15, 2017, 4:02:02 PM
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Labels: needs_repro triaged

Branan Riley (JIRA)

May 15, 2017, 4:03:02 PM
to puppe...@googlegroups.com
Branan Riley commented on Bug PUP-5431
 
Re: Puppet restarts services at shutdown

I believe we fixed an issue with how the daemon process terminates the application process within the past couple of years (or at least, it sounds vaguely familiar to me).

Can you confirm if this is still an issue in the latest puppet-agent release?

Jacob Helwig (JIRA)

Dec 5, 2017, 11:47:03 AM
to puppe...@googlegroups.com
Jacob Helwig updated an issue
 
Change By: Jacob Helwig
Sub-team: Coremunity
This message was sent by Atlassian JIRA (v7.0.2#70111-sha1:88534db)

Josh Cooper (Jira)

Jun 10, 2020, 12:08:03 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-5431
 
Re: Puppet restarts services at shutdown

This is still an issue, because puppet forks a child to apply the catalog. Note that this issue conflicts with PUP-3931: if we killed all child processes, it would not be possible to use puppet to upgrade itself. This is why we set KillMode=process, see https://github.com/puppetlabs/puppet/commit/6609c9a65ffbe94e5e3b5f184da1092d95a54eb9. I'm going to close this as won't fix.

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)