One failure mode of CFEngine is a cf-agent process pile-up: stuck agents accumulate and bog down the system, and the problem gets worse the longer it persists.
Here is a policy to detect and kill cf-agent processes more than 10 minutes old.
Feedback welcome.

bundle agent detect_cf_agent_pileup
{
  processes:

      "cf-agent"
        comment => "Detect and kill cf-agent processes more than 10 minutes old.
                    CFEngine won't exit until all external commands complete,
                    unless commands exec_timeout is set and succeeds. This
                    promise ensures any cf-agent processes more than 10 minutes
                    old are terminated and killed, so we never end up with hundreds
                    or thousands of cf-agent processes on a system",
        process_count => pileup_check,
        process_select => proc_finder,
        classes => if_repaired("agents_purged"),
        signals => { "term", "kill" };

  reports:

    process_pile_up::
      "cf-agent process pile-up detected!!";

    agents_purged::
      "cf-agents purged";
}

body process_select proc_finder
{
      # (Anchored) regular expression matching the command/cmd field of a process
      command => "^.*cf-agent.*";

      # List of regexes matching the user of a process
      process_owner => { "root" };

      # Matches processes started within the last 10 minutes
      stime_range => irange(ago(0,0,0,0,10,0), now);

      # Negate stime so that only processes started more than 10 minutes ago are selected
      process_result => "(!stime)&command&process_owner";
}

body process_count pileup_check
{
      # Integer range for the acceptable number of matches for this process
      match_range => "0,2";

      # List of classes to define if the number of matches is out of range
      out_of_range_define => { "process_pile_up" };
}
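
To try the bundle on its own, a small wrapper policy along the following lines should work. This is only a sketch under a few assumptions: the policy above is saved as detect_cf_agent_pileup.cf (a hypothetical file name) next to the wrapper, and the standard library, which provides if_repaired, is available as $(sys.libdir)/stdlib.cf. On older installations the library may instead be called cfengine_stdlib.cf, so adjust the path to match your setup.

```cfengine
# test_pileup.cf -- hypothetical wrapper for testing the bundle in isolation
body common control
{
      # stdlib.cf is assumed to provide the if_repaired classes body;
      # detect_cf_agent_pileup.cf is an assumed name for the policy file.
      inputs => { "$(sys.libdir)/stdlib.cf", "detect_cf_agent_pileup.cf" };
      bundlesequence => { "detect_cf_agent_pileup" };
}
```

It can then be run by hand with something like "cf-agent -KI -f ./test_pileup.cf" (-K ignores locks, -I reports changes made). Be careful on a production host, since the promise really does signal any matching cf-agent processes.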