puppet agent periodically not running after 3.0 upgrade

76 views
Skip to first unread message

David Mesler

unread,
Dec 14, 2012, 8:15:25 PM12/14/12
to puppet...@googlegroups.com
I've recently upgraded from 2.6.9 to 3.0.1 and have noticed an oddity. Our puppet agents are configured with a runinterval of 900 and a splaylimit of 450. Since upgrading I've noticed that once or twice a day our puppet agents simply won't run for about an hour or so. Has anyone else experienced anything like this?

--david

David Mesler

unread,
Dec 19, 2012, 12:00:42 AM12/19/12
to puppet...@googlegroups.com
I've noticed when I strace a puppet agent that has failed to run after its 900 second runinterval, it's blocking on a really long select call. Like:
select(1, [], [], [], {1032, 350000} <unfinished ...>

When that's done it finally re-reads pupet.conf and stars a catalog run. I have no idea where that long select call comes from.

David Mesler

unread,
Dec 20, 2012, 9:09:09 PM12/20/12
to puppet...@googlegroups.com

I’ve added some debugging messages to run_event_loop and figured out what was going on. I’m going to reference line numbers in https://github.com/puppetlabs/puppet/blob/master/lib/puppet/daemon.rb to make it easier on me to explain. I found that occasionally the “if” statement on line 180 was failing because the value of “now” was one second behind “next_reparse”. I believe this is because line 168 uses “to_i” when it should use “ceil”. I’m deploying a patched version of puppet with this change to my servers now.

But I think a bigger issue is line 175 where next_event is set to the current time plus one hour. That’s a pretty arbitrary and unpleasant value. Why not set next_event to the lower vale of :runinterval or :filetimeout? The way it is now, if line 180 fails, you’re stuck with an hour long wait even though your :runinterval may be far less.

Also, line 197 is backwards. It should be “next_agent_run += new_run_interval – agent_run_interval”.

David Mesler

unread,
Dec 21, 2012, 4:48:53 PM12/21/12
to puppet...@googlegroups.com
My ceil theory didn't pan out. But here's a working fix:

--- puppet-3.0.1.orig/lib/puppet/daemon.rb      2012-10-20 00:57:50.000000000 -0400                                               
+++ puppet-3.0.1/lib/puppet/daemon.rb   2012-12-21 08:27:17.000000000 -0500                                                       
@@ -164,6 +164,8 @@                                                                                                               
     next_reparse = 0                                                                                                             
     reparse_interval = Puppet[:filetimeout]                                                                                      
                                                                                                                                  
+    next_event = 0                                                                                                               
+                                                                                                                                 
     loop do                                                                                                                      
       now = Time.now.to_i                                                                                                        
                                                                                                                                  
@@ -172,7 +174,9 @@                                                                                                               
       # the math messes up or something; it should be inexpensive enough to                                                      
       # wake once an hour, then go back to sleep after doing nothing, if                                                         
       # someone only wants listen mode.                                                                                          
-      next_event = now + 60 * 60                                                                                                 
+      if now >= next_event || next_event == 0                                                                                    
+        next_event = now + 60 * 60                                                                                               
+      end                                                                                                                        
                                                                                                                                  
       # Handle reparsing of configuration files, if desired and required.                                                        
       # `reparse` will just check if the action is required, and would be                                                        
@@ -194,7 +198,7 @@                                                                                                               
         # next time it is scheduled to run, just in case.  In the event that                                                     
         # we made no change the result will be a zero second adjustment.                                                         
         new_run_interval    = [Puppet[:runinterval], 0].max                                                                      
-        next_agent_run     += agent_run_interval - new_run_interval                                                              
+        next_agent_run     += new_run_interval - agent_run_interval                                                              
         agent_run_interval  = new_run_interval                                                                                   
       end
Reply all
Reply to author
Forward
0 new messages