Thanks, Chris & Matthew.
The splitting/sharding concept is one we talked about a little (a playbook that uses the API to create the sub-inventories and submit jobs against them), but it was kludgy, of course, and I am not sure how reusable it would have been. This feature hits the nail on the head, though. The forks + task methodology makes it hard without something like this (slow hosts hold up the entire fork group, and across 6K hosts, not all of them are super fast).
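For what it's worth, the splitting idea itself is simple enough to sketch. This is only an illustration of dealing a host list into sub-inventories, not how AWX implements it; the function name `slice_hosts`, the host names, and the slice count of 12 are all made up for the example.

```python
from typing import List


def slice_hosts(hosts: List[str], slice_count: int) -> List[List[str]]:
    """Deal hosts round-robin into slice_count sub-inventories.

    Hypothetical helper for illustration only; not an AWX or Ansible API.
    """
    if slice_count < 1:
        raise ValueError("slice_count must be at least 1")
    # Slice i takes hosts i, i + slice_count, i + 2 * slice_count, ...
    return [hosts[i::slice_count] for i in range(slice_count)]


if __name__ == "__main__":
    # Hypothetical host names standing in for the 6K-server inventory.
    hosts = [f"server{n:04d}" for n in range(1, 6001)]
    slices = slice_hosts(hosts, 12)
    print([len(s) for s in slices])
```

Each sub-inventory could then be run as its own job, so a slow host only holds up its own slice rather than the whole run.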
We saw a little talk about clustering AWX, and outside of OpenShift it didn't look like it was supported or even existed. We've run into a few gotchas when trying to do it, though some of the pointers in the HA thread here look helpful. Is AWX clustering official, or is it only functional within OpenShift?
I don't think Ansible itself could do it either. Most of our testing has been within AWX itself (interface, reporting, history of job execution - these are soft requirements). I set up a simple test to get a feeling for it:
Playbook: turns off gather_facts and contains 2 tasks:
task 1: uname -n & store in a variable
task 2: display the variable
Sample size: 50 servers
Forks: 16
Ansible runtime*: 27 seconds
AWX runtime: 32 seconds (18.5% slower)
* Ansible was run on a physical server with 40 threads; forks was still set to 16
Calculating this out to 6K servers comes to about 22.5 minutes for Ansible. That doesn't leave much time for the other 400+ tasks that would probably exist. I'll perform some more tests (more tasks, a bigger server sample) over the next few days and see how they play out. Overall, we are pretty new to Ansible and still learning it as well.
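As a rough check on the numbers, here is a minimal sketch of the arithmetic. It assumes runtime scales linearly with host count at a fixed 16 forks, which is an assumption on my part and may not hold in practice. Under that assumption the 50-server timings extrapolate to a longer runtime than the figure quoted above, so treat any of these as ballpark values only.

```python
# Measurements from the 50-server test (16 forks, 2 tasks, gather_facts off).
SAMPLE_HOSTS = 50
ANSIBLE_SECONDS = 27
AWX_SECONDS = 32
TARGET_HOSTS = 6000


def overhead_percent(baseline: float, measured: float) -> float:
    """Relative slowdown of measured versus baseline, in percent."""
    return (measured - baseline) / baseline * 100


def linear_estimate_minutes(sample_seconds: float, sample_hosts: int,
                            target_hosts: int) -> float:
    """Scale a sample runtime to a larger host count.

    Assumes runtime grows linearly with host count (an assumption,
    not something the test above measured).
    """
    return sample_seconds * (target_hosts / sample_hosts) / 60


if __name__ == "__main__":
    print(f"AWX overhead: {overhead_percent(ANSIBLE_SECONDS, AWX_SECONDS):.1f}%")
    print(f"Ansible at 6K hosts: "
          f"{linear_estimate_minutes(ANSIBLE_SECONDS, SAMPLE_HOSTS, TARGET_HOSTS):.1f} min")
    print(f"AWX at 6K hosts: "
          f"{linear_estimate_minutes(AWX_SECONDS, SAMPLE_HOSTS, TARGET_HOSTS):.1f} min")
```

The overhead line reproduces the 18.5% above; the two estimates are for the same 2-task playbook, not the full 400+ task run.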