Puppet agent process gets stuck

yannig rousseau

Jul 17, 2013, 1:00:13 PM
to puppet...@googlegroups.com
Hi,

Three days ago, we had a production issue where the puppetmaster became unreachable for 20 minutes.
All of the Puppet clients that tried to connect during that window are now failing with: "Run of Puppet configuration client already in progress; skipping"

Further investigation shows that the puppet agent process is still running on all of these clients 3 days later. I tried killing the process on one machine, and the puppet service returned to normal on that machine.

Is there a way to kill the process on all of the clients?
Is there a way to specify a timeout on the process? This would allow the stuck process to be discarded without human intervention.

Regards
Yannig

Configuration :
 - Puppet master => RHEL5, Puppet 3.2.2
 - Puppet client => RHEL5, Puppet 3.1.1 & 3.2.2

jcbollinger

Jul 18, 2013, 9:11:46 AM
to puppet...@googlegroups.com


On Wednesday, July 17, 2013 12:00:13 PM UTC-5, yannig rousseau wrote:
Hi,

Three days ago, we had a production issue where the puppetmaster became unreachable for 20 minutes.
All of the Puppet clients that tried to connect during that window are now failing with: "Run of Puppet configuration client already in progress; skipping"

Further investigation shows that the puppet agent process is still running on all of these clients 3 days later. I tried killing the process on one machine, and the puppet service returned to normal on that machine.

Is there a way to kill the process on all of the clients?


Puppet does not provide a built-in mechanism for this.  A variety of external tools could do it, but you would need to have set them up beforehand.

 
Is there a way to specify a timeout on the process? This would allow the stuck process to be discarded without human intervention.


Again, Puppet does not provide this as a built-in feature, but it should be possible to add external instrumentation to make it happen.  That would probably be facilitated by launching the agent periodically via a scheduler (such as cron) instead of running it in daemon mode.
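
As a rough illustration of that external instrumentation, here is a minimal sketch of a wrapper that a scheduler such as cron could call in place of the daemon. It is an assumption-laden example rather than anything Puppet ships: the function name run_with_timeout and the exit code 124 are made up for illustration, and it needs Python 3.5 or later for subprocess.run, which a stock RHEL5 box will not have.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in the foreground.

    Returns the command's exit code, or None if it was still running
    after timeout_s seconds (subprocess.run kills the child in that case).
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: wrapper.py <timeout-seconds> <command> [args...]
    code = run_with_timeout(sys.argv[2:], int(sys.argv[1]))
    # 124 is an arbitrary "timed out" code chosen for this sketch.
    sys.exit(124 if code is None else code)
```

A cron entry could then invoke it with something like "wrapper.py 1800 puppet agent --onetime --no-daemonize", where the script name and the 30-minute limit are placeholders. Note that only the direct child is killed; anything the agent itself spawned would need separate handling.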


John

yannig rousseau

Jul 18, 2013, 10:29:42 AM
to puppet...@googlegroups.com
I managed to kill all of the stuck processes with a simple ssh loop in a script. Still, that is not a very nice way to proceed...
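
For anyone in the same situation, such a loop might look roughly like the sketch below. It is not the exact script used here: the host names are hypothetical, it assumes key-based ssh access to every client, and it assumes the stuck process shows "puppet agent" in its command line.

```python
import subprocess

# Hypothetical client list; in practice this would come from an inventory.
HOSTS = ["client01.example.com", "client02.example.com"]


def kill_command(host):
    """Build the ssh command that signals the puppet agent on one host."""
    # The [p] bracket keeps pkill -f from matching the remote shell that
    # carries this very pattern in its own command line.
    remote = "pkill -f '[p]uppet agent'"
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            host, remote]


def kill_stuck_agents(hosts, runner=subprocess.call):
    """Run the kill command on each host and collect the exit codes.

    With pkill, 0 means a process was signalled and 1 means none matched;
    ssh itself returns 255 when the connection fails.
    """
    return {host: runner(kill_command(host)) for host in hosts}
```

Calling kill_stuck_agents(HOSTS) returns a dict of exit codes per host, which makes it easy to see which clients were unreachable. The runner parameter exists only so the loop can be exercised without real ssh connections.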

Don't you think it would be a good idea to implement a timeout in the puppet agent? If the connection or the catalog application takes too long (the threshold could be set with a dedicated argument), the puppet agent would abandon the run and query the puppet master again...
Or is that too complicated to implement safely?

Yannig