Hi,
I'd like to bring up a point that was raised during the resolution of a
ticket. The idea is to hopefully trigger a discussion and derive actions
from it, if necessary.
Bugs like the one described in PUP-7848 [0] (for which there's a fix
already, thanks!) are quite dangerous from the operations' point of view as
they could quickly reduce the performance of a production Puppet
infrastructure.
Is there any kind of watchdog that can be configured at the Puppetserver
level to automatically destroy instances that are misbehaving like these
(perhaps based on CPU wall time, age...)? We're already using
max-requests-per-instance over here, but for obvious reasons it's not useful
in this case :)
The more agents exercise the bad code and trigger the issue, the faster the
load goes up and therefore the slower the infrastructure becomes. There
should be a way to tell Puppetserver how to protect itself. Perhaps there
already is one, but we could not find it [1]. In the meantime, we're putting
some extra (and very specific) monitoring in place on our side to try to
detect this situation and alert on it, but perhaps something could be done
directly at the Puppetserver level to act earlier.
In case it helps, we're running 2.7.2 over here.
What do you think?
Thanks!
[0] https://tickets.puppetlabs.com/browse/PUP-7848
[1] https://docs.puppet.com/puppetserver/latest/config_file_puppetserver.html
--
bye
Nacho
http://cern.ch/nacho