Over time, the performance of the server seems to drop. Puppet agent run times grow from under a minute to 30 minutes or more, server load rises, and r10k runs take longer and longer until they time out (we have a 30-minute limit, and a run normally takes nowhere near that long). On a Grafana dashboard we can see the number of netstat connections increasing, memory use climbing and staying high, some swapping starting, and a few blocked processes.
We haven't yet found anything that points to the cause. I suspect something in our setup is interacting poorly, but I haven't found what that is. I'm hoping for suggestions on where to look, or tips for addressing it!
The log contains some Ruby-related messages, such as the stack trace below, but I am far from knowledgeable about Ruby!
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/functions/fail.rb:10:in `block in real_function_fail'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/functions.rb:215:in `block in newfunction'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/functions.rb:208:in `block in newfunction'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/functions.rb:751:in `block in call'
org/jruby/RubyKernel.java:1189:in `catch'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/functions.rb:748:in `call'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/puppet_stack.rb:42:in `stack'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_support.rb:305:in `block in call_function'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_support.rb:303:in `call_function'