Sorry for the late reply. Regarding the suggestions above:
- No significant errors in the logs, database-related or otherwise.
- The servers are actively monitored. There are no CPU, load, or memory spikes at all, and nothing shows any kind of resource pressure. Memory usage is stable at around 5G of the 8G available. I haven't tried monitoring Rundeck's internal processes or Java details during one of these slow response events. I will see if that's manageable in our instance.
- DBs are maintained regularly by our DBA team, who handle index maintenance, etc.
- Configs are identical. All four servers were deployed by Ansible, with the only differences being the target databases. They even use the same DB host, a MySQL Enterprise cluster. They were all deployed around the same time and are maintained on the same schedule with monthly updates.
- We'll check the slow query log the next time we get a report of this.
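
For when we do pull the slow query log, here is a rough sketch of how the entries could be ranked by query time. It assumes the standard MySQL slow log text format, where each entry has a `# Query_time: ... Rows_examined: ...` header line followed by the statement. The function name `slowest_queries` is just something I made up for illustration, and MySQL's own `mysqldumpslow` tool is probably the better choice for anything serious.

```python
import re

# Per-entry stats header that MySQL writes to the slow query log, e.g.
# "# Query_time: 4.250000  Lock_time: 0.000200 Rows_sent: 10  Rows_examined: 900000"
STATS_RE = re.compile(
    r"^# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def slowest_queries(lines, top=5):
    """Return up to `top` (query_time, rows_examined, sql) tuples, slowest first.

    `lines` is any iterable of log lines (an open file works).
    Hypothetical helper, not part of Rundeck or MySQL.
    """
    entries = []
    current = None  # [query_time, rows_examined, list of SQL line fragments]

    for raw in lines:
        line = raw.strip()
        match = STATS_RE.match(line)
        if match:
            # A new stats header closes the previous entry, if there was one.
            if current is not None:
                entries.append((current[0], current[1], " ".join(current[2])))
            current = [
                float(match["query_time"]),
                int(match["rows_examined"]),
                [],
            ]
        elif current is not None and line:
            # Skip the other comment headers and the bookkeeping statement
            # MySQL adds before each logged query.
            if line.startswith("#") or line.startswith("SET timestamp="):
                continue
            current[2].append(line)

    if current is not None:
        entries.append((current[0], current[1], " ".join(current[2])))

    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]
```

It could be called as `slowest_queries(open(path))` with `path` pointing at wherever the DBA team has the slow log written. A high `Rows_examined` next to a long `Query_time` would at least hint at whether the project count is what's driving the slowdown.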
But overall, I'm getting the impression that this is simply an inefficiency in the application or database design, where the developers did not anticipate an environment with this many projects?
Thanks!