Thanks for the response. Nice to get some new data points.
Unfortunately we can't lengthen the environment cache timeout past 30s
until we get r10k wired up to clear the cache automatically.
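The sketch below shows roughly what the r10k cache-clearing step could look like. It assumes three things: the Puppet Server version in use exposes the admin API endpoint `DELETE /puppet-admin-api/v1/environment-cache`, the client certificate is authorized for the admin API, and the master listens on the default port 8140. The hostname, certname and SSL directory are hypothetical placeholders; `puppet config print ssldir` reports the real SSL directory. Setting `DRY_RUN=1` prints the curl command instead of sending it.

```shell
#!/bin/sh
# Minimal sketch: flush Puppet Server's environment cache after an r10k deploy.
# All three values are hypothetical placeholders; override them via the environment.
PUPPET_MASTER="${PUPPET_MASTER:-puppet.example.com}"   # hypothetical master hostname
CERTNAME="${CERTNAME:-r10k-deployer}"                  # hypothetical client certname
SSL_DIR="${SSL_DIR:-/etc/puppetlabs/puppet/ssl}"       # check with: puppet config print ssldir

# Sends a DELETE to the admin API's environment-cache endpoint, authenticating
# with the client cert/key under SSL_DIR and verifying the server against the CA cert.
# With DRY_RUN=1 it only prints the command it would run.
flush_environment_cache() {
  set -- curl --silent --show-error --fail \
    --cert "$SSL_DIR/certs/$CERTNAME.pem" \
    --key "$SSL_DIR/private_keys/$CERTNAME.pem" \
    --cacert "$SSL_DIR/certs/ca.pem" \
    -X DELETE \
    "https://$PUPPET_MASTER:8140/puppet-admin-api/v1/environment-cache"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

A deploy script could source this file and call `flush_environment_cache` once r10k finishes. Clearing the cache on every deploy is the prerequisite mentioned above for raising Puppet's `environment_timeout` beyond 30s.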
We're running 32 instances for 32 cores, which seems fine, but I'm not
sure whether I should run fewer. I set
JAVA_ARGS="-Xms18g -Xmx18g -XX:+UseG1GC", which seems fine for that
number of cores so far, and we haven't had any crashes. CPU was running
at 90-100% until I set the heaviest-hitting role to run only once an
hour while we make changes.
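On CentOS/RHEL packages, the heap flags above normally live in `/etc/sysconfig/puppetserver`; Debian-family packages use `/etc/default/puppetserver`. The instance count is a separate setting: `max-active-instances` in the `jruby-puppet` section of `puppetserver.conf`. As rough arithmetic, an 18 GB heap split evenly across 32 instances gives 18 × 1024 / 32 = 576 MB per instance. That figure ignores heap used by the rest of the server, and running fewer instances would raise it. The fragment below reproduces only the values quoted above.

```shell
# /etc/sysconfig/puppetserver (excerpt; other settings in the file left as shipped)
# Fixed 18 GB heap (minimum = maximum) using the G1 garbage collector.
JAVA_ARGS="-Xms18g -Xmx18g -XX:+UseG1GC"
```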
One interesting data point: during the transition I was using an ancient
function as a replacement for fqdn_rand(60,$seed) to keep crons from
moving around. Removing that function from the majority of cases dropped
compile times by 1-3 seconds and smoothed out the CPU curve. I suspect the
function was performing poorly and might be the cause of the Ruby 1.9.3
deprecation notices I see in the logs. My guess is that the more
Ruby/stdlib code I remove from the manifests, the better performance will be.
As for agents, I ran a few tests on our CentOS 6 hosts and definitely
see an improvement in apply times, particularly for roles with a lot of
file resources. One of these roles dropped from 45s to 30s. That should
be down to HTTP connection reuse, though I still need to verify that it's
because we're using something other than Ruby 1.8.7 as the runtime.
Ramin