I’ve added some debugging messages to run_event_loop and figured out what was going on. I’m going to reference line numbers in https://github.com/puppetlabs/puppet/blob/master/lib/puppet/daemon.rb to make it easier on me to explain. I found that occasionally the “if” statement on line 180 was failing because the value of “now” was one second behind “next_reparse”. I believe this is because line 168 uses “to_i” when it should use “ceil”. I’m deploying a patched version of puppet with this change to my servers now.
But I think a bigger issue is line 175 where next_event is set to the current time plus one hour. That’s a pretty arbitrary and unpleasant value. Why not set next_event to the lower vale of :runinterval or :filetimeout? The way it is now, if line 180 fails, you’re stuck with an hour long wait even though your :runinterval may be far less.
Also, line 197 is backwards. It should be “next_agent_run += new_run_interval – agent_run_interval”.
--- puppet-3.0.1.orig/lib/puppet/daemon.rb 2012-10-20 00:57:50.000000000 -0400
+++ puppet-3.0.1/lib/puppet/daemon.rb 2012-12-21 08:27:17.000000000 -0500
@@ -164,6 +164,8 @@
next_reparse = 0
reparse_interval = Puppet[:filetimeout]
+ next_event = 0
+
loop do
now = Time.now.to_i
@@ -172,7 +174,9 @@
# the math messes up or something; it should be inexpensive enough to
# wake once an hour, then go back to sleep after doing nothing, if
# someone only wants listen mode.
- next_event = now + 60 * 60
+ if now >= next_event || next_event == 0
+ next_event = now + 60 * 60
+ end
# Handle reparsing of configuration files, if desired and required.
# `reparse` will just check if the action is required, and would be
@@ -194,7 +198,7 @@
# next time it is scheduled to run, just in case. In the event that
# we made no change the result will be a zero second adjustment.
new_run_interval = [Puppet[:runinterval], 0].max
- next_agent_run += agent_run_interval - new_run_interval
+ next_agent_run += new_run_interval - agent_run_interval
agent_run_interval = new_run_interval
end