zombie child process

Elias Abacioglu

Mar 2, 2012, 1:50:49 PM
to puppet...@googlegroups.com
Hi,

This is the third or fourth time this has happened: puppetd gets a zombie shell child process and then never finishes the run.

/opt/tc-puppet/bin/ruby /opt/tc-puppet/sbin/puppetd
 \_ [sh] <defunct>

Where do I start looking for what might be wrong?
This error has appeared on both 2.7.6 and 2.7.11.

Regards,
Elias

Raboo

Mar 3, 2012, 10:09:08 AM
to Puppet Users
> Where do I start looking for what might be wrong?
> This error has appeared on both 2.7.6 and 2.7.11.
OK, a follow-up question on my own thread.
Is there a way to set a timeout both on how long a single class/module
can run and on how long puppet can spend applying its whole catalog?
With a per-class timeout I would actually find out which module gets
stuck.

Russell Van Tassell

Mar 3, 2012, 4:59:55 PM
to puppet...@googlegroups.com

My guess ... Just look in your puppet log to see what manifest ran last. At one of my previous engagements, we routinely had zombied processes off of puppet ... Eventually found an unclosed loop in a child process that was causing it.

--
Russell M. Van Tassell
russ...@gmail.com

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet...@googlegroups.com.
To unsubscribe from this group, send email to puppet-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.

Raboo

Mar 3, 2012, 5:04:39 PM
to Puppet Users
On 3 mar, 17:59, Russell Van Tassell <russel...@gmail.com> wrote:
> My guess ... Just look in your puppet log to see what manifest ran last. At
> one of my previous engagements, we routinely had zombied processes off of
> puppet ... Eventually found an unclosed loop in a child process that was
> causing it.
How do I see which manifest ran last?

Dominik Zyla

Mar 4, 2012, 4:04:59 PM
to puppet...@googlegroups.com
Hi,

Just take a look into your logs. You'll see something like:

(/Stage[main]/Mcollective::Server/Service[mcollective])

where `Mcollective::Server' refers to the `mcollective::server' class in the catalog. Look for entries in this format and you'll know which manifest puppet-agent was processing when it stopped.
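
To make that concrete, here is a minimal Ruby sketch (not part of Puppet) that scans log lines for resource paths in that format and reports the last one seen. The regular expression, the `last_resource` helper, and the sample lines are assumptions for illustration; real log lines may differ, and debug logging may be needed before any resource paths show up at all.

```ruby
# Matches a resource path such as
#   (/Stage[main]/Mcollective::Server/Service[mcollective])
# Assumption: the class name is the first path segment after the stage.
RESOURCE_PATH = %r{\(/Stage\[(?<stage>[^\]]+)\]/(?<klass>[^/]+)/(?<resource>[^)]+)\)}

# Returns a hash describing the last resource path found in the given
# lines, or nil if no line contains one.
def last_resource(lines)
  last = nil
  lines.each do |line|
    m = RESOURCE_PATH.match(line)
    last = { stage: m[:stage], klass: m[:klass], resource: m[:resource] } if m
  end
  last
end

# Hypothetical sample lines, made up for this example.
SAMPLE_LINES = [
  "puppet-agent[1398]: (/Stage[main]/Ntp/Service[ntp]) ensure changed 'stopped' to 'running'",
  "puppet-agent[1398]: (/Stage[main]/Mcollective::Server/Service[mcollective]) Triggered 'refresh' from 1 events"
].freeze

p last_resource(SAMPLE_LINES)
```

To run it against a real log, something like `last_resource(File.readlines(path))` should work, where `path` is wherever your syslog writes the puppet-agent messages.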

Best,

--
Dominik Zyla

Elias Abacioglu

Mar 4, 2012, 7:49:00 PM
to puppet...@googlegroups.com
Dominik Zyla wrote on 2012-03-04 17:04:
> Hi,
>
> Just take a look into your logs. You'll see something like:
>
> (/Stage[main]/Mcollective::Server/Service[mcollective])
>
> where `Mcollective::Server' refers to the `mcollective::server' class in the catalog. Look for entries in this format and you'll know which manifest puppet-agent was processing when it stopped.
>
Mar 2 07:54:17 d01ar1sut002 puppet-agent[1398]: Finished catalog run in 3.01 seconds
Mar 2 08:24:33 d01ar1sut002 puppet-agent[1398]: Finished catalog run in 3.63 seconds
Mar 2 16:46:00 d01ar1sut002 puppet-agent[14508]: Reopening log files
Mar 2 16:46:00 d01ar1sut002 puppet-agent[14508]: Starting Puppet client version 2.7.11

At 16:46 I restarted puppet. There is nothing in the log because I
haven't made any changes. Should I turn on debug?

Dominik Zyla

Mar 4, 2012, 8:15:57 PM
to puppet...@googlegroups.com
Well, it's always good to turn on debugging for troubleshooting.

--
Dominik Zyla

Raboo

Mar 5, 2012, 11:14:56 AM
to puppet...@googlegroups.com
OK, it fails early in the run.

Mar  5 03:57:33 srzarnsas007 puppet-agent[22690]: Retrieving plugin
Mar  5 03:57:35 srzarnsas007 puppet-agent[22690]: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Mar  5 03:57:35 srzarnsas007 puppet-agent[22690]: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Mar  5 03:57:35 srzarnsas007 puppet-agent[22690]: Loading facts in /var/lib/puppet/lib/facter/operatingsystemmajor.rb
Mar  5 03:57:35 srzarnsas007 puppet-agent[22690]: Loading facts in /var/lib/puppet/lib/facter/hpov.rb
Mar  5 03:57:40 srzarnsas007 puppet-agent[22690]: Caching catalog for srzarnsas007.fqdn.com
Mar  5 03:57:41 srzarnsas007 puppet-agent[22690]: Applying configuration version '1330900136'
Mar  5 04:00:02 srzarnsas007 puppet-agent[22690]: Finished catalog run in 141.02 seconds
Mar  5 04:30:06 srzarnsas007 puppet-agent[22690]: Retrieving plugin

So /bin/sh goes defunct at "Retrieving plugin". What's my next step?

Krzysztof Wilczynski

Mar 5, 2012, 11:23:52 AM
to puppet...@googlegroups.com
Hi,

> So /bin/sh goes defunct at "Retrieving plugin". What's my next step?

I would imagine that adding code to ignore SIGCHLD, collect the child's exit status, or use Process.detach could help :-)
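
For illustration only, here is a minimal Ruby sketch of two of those options. It assumes the defunct `sh` is a child that some Ruby code (for example a custom fact or plugin) spawned and never waited on. This is not Puppet's own code, and `run_and_reap` and `run_detached` are hypothetical helper names.

```ruby
# Option A: wait for the child and collect its exit status.
# Once the parent has waited on it, the child cannot linger as <defunct>.
def run_and_reap(*cmd)
  pid = Process.spawn(*cmd)
  _, status = Process.wait2(pid)   # blocks until the child exits, then reaps it
  status.exitstatus
end

# Option B: do not block; let a background thread reap the child.
# Process.detach returns a Thread whose value is the child's Process::Status.
def run_detached(*cmd)
  pid = Process.spawn(*cmd)
  Process.detach(pid)
end

# Option C (shown only as a comment): Signal.trap("CHLD", "IGNORE").
# On Linux this usually lets the kernel reap children automatically, but
# Process.wait then typically fails with Errno::ECHILD, so any code that
# needs exit statuses stops working.

puts run_and_reap("sh", "-c", "exit 3")
reaper = run_detached("sh", "-c", "exit 0")
puts reaper.value.success?
```

A zombie only means the parent has not yet collected the child's status, so this may treat the symptom rather than whatever makes the run hang in the first place.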

KW

Raboo

Mar 5, 2012, 12:03:12 PM
to puppet...@googlegroups.com


On Monday, March 5, 2012 12:23:52 PM UTC+1, Krzysztof Wilczynski wrote:

> I would imagine that adding code to ignore SIGCHLD, collect the child's exit status, or use Process.detach could help :-)

How and what in the what what now?