On 17/06/16 16:52, Aleksey Tsalolikhin wrote:
> Where do you get 6 from, Bronto? I mean, why 6 and not 5 or 7? Just
> curious if there is any special significance to it...
On our systems we have
agent_expireafter => "30" ;
because, with CFEngine's standard schedule (an agent run every 5 minutes),
that allows at most 6 agents to run concurrently (6 x 5 = 30).
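A minimal Python sketch of that arithmetic follows. It uses an idealized model, which is an assumption and not necessarily how cf-execd schedules and kills agents. In that model one agent starts every 5 minutes, none ever finishes on its own, and each is killed once it has run for agent_expireafter minutes. The function names are hypothetical. The brute-force simulation checks the closed-form bound.

```python
def max_concurrent_agents(expire_after_min: int, interval_min: int = 5) -> int:
    """Worst-case number of agents alive at once (idealized model).

    Agents alive at minute t are those started in the window
    (t - expire_after_min, t]; with one start every interval_min minutes,
    that window holds at most ceil(expire_after_min / interval_min) starts.
    """
    if expire_after_min <= 0 or interval_min <= 0:
        raise ValueError("both durations must be positive")
    return -(-expire_after_min // interval_min)  # integer ceiling division


def simulate_peak(expire_after_min: int, interval_min: int = 5,
                  horizon_min: int = 240) -> int:
    """Brute-force check: every agent hangs until it is killed at expiry."""
    starts = range(0, horizon_min, interval_min)
    peak = 0
    for t in range(horizon_min):
        # An agent started at minute s is alive from s until s + expiry.
        alive = sum(1 for s in starts if s <= t < s + expire_after_min)
        peak = max(peak, alive)
    return peak


print(max_concurrent_agents(30), simulate_peak(30))  # 6 6
```

With a 30-minute expiry and the 5-minute schedule, both give 6. The ceiling only matters when the expiry is not a multiple of the interval: in this model, 32 minutes would allow 7 agents.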
Why 6, you ask. Under normal conditions, we expect an agent to apply our
policies in much less than the canonical 5 minutes.
If an agent is still running after, say, 10 minutes, that could be normal.
For example, it may be waiting for apt to download a package from a slow
repository. So I definitely don't want to kill an agent just because it
hasn't finished within the canonical interval.
How many agents is it reasonable to leave around then?
- having one agent around, or none, is the expected, normal situation;
- having two agents around can still happen (see above);
- having three would already be strange;
So I leave myself some buffer and double that: if I have twice as many
agents as the number I already consider strange (2 x 3 = 6), then I have
a problem and I must start taking them down.
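The rule of thumb above can be written as a small Python check. This is hypothetical: the names and labels are made up for illustration, and CFEngine itself provides no such function. In it, 0 or 1 agents is normal, 2 is plausible, 3 is already strange, and twice that is where agents get taken down. Multiplying that threshold by the 5-minute schedule gives the agent_expireafter value.

```python
RUN_INTERVAL_MIN = 5                      # CFEngine's standard schedule
STRANGE = 3                               # three concurrent agents: already strange
SAFETY_FACTOR = 2                         # buffer: double the "strange" count
KILL_THRESHOLD = STRANGE * SAFETY_FACTOR  # 6 agents


def assess(running_agents: int) -> str:
    """Classify a count of concurrently running agents (hypothetical helper)."""
    if running_agents < 0:
        raise ValueError("count cannot be negative")
    if running_agents <= 1:
        return "normal"
    if running_agents < STRANGE:
        return "plausible"  # e.g. waiting on a slow package repository
    if running_agents < KILL_THRESHOLD:
        return "strange"
    return "take down"


# Expiry (minutes) that caps concurrency at KILL_THRESHOLD agents: 6 x 5 = 30.
AGENT_EXPIREAFTER_MIN = KILL_THRESHOLD * RUN_INTERVAL_MIN

print([assess(n) for n in range(7)], AGENT_EXPIREAFTER_MIN)
```

Keeping STRANGE and SAFETY_FACTOR separate makes the reasoning explicit. If three agents ever became normal on a given system, the threshold and the expiry would move together.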
Say that something is going really wrong on the system, e.g. a
corrupted filesystem that messes up the CFEngine agent. The agents may
start piling up, adding one problem on top of another: not only do you
have a corrupted filesystem and a malfunctioning system, you're now also
filling up the process table and possibly eating memory and CPU
cycles. It's reasonable to stop the madness before it hurts: if 2
running agents are still OK and 3 are strange, then two times 3 is
madness and we must stop it there.
Ciao!
-- bronto