Auto scaling AWX on EKS


Wajid Baig

Jun 16, 2022, 11:05:21 AM
to AWX Project
Hi Team, 

We deployed AWX on EKS and configured the Cluster Autoscaler and an HPA (Horizontal Pod Autoscaler).
When we increase the replica count, pods fail with insufficient CPU. The nodes are not autoscaling: no new nodes are being added.

My question: is there anything else we need to add to the cluster for the nodes to autoscale?

Please help! Thanks in advance.

Best wishes & Regards,
AB

AWX Project

Jun 16, 2022, 11:38:02 AM
to AWX Project
Hi AB, 

At the moment, the awx-operator does not support using a HorizontalPodAutoscaler with it.  The problem is that the operator tries to maintain the replicas value set in the spec (or the default of 1).  So as the service comes under load, the HPA tries to scale up, but the operator's reconciliation loop will come by and overwrite the changes the HPA made.  
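
As a toy illustration of the conflict described above (not the operator's actual code; the class and function names here are made up for the example), the following sketch models an HPA raising the replica count and a reconcile pass then resetting it to the value in the spec:

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Stand-in for the AWX Deployment; only the replica count is modelled."""
    replicas: int


def hpa_scale(deployment: Deployment, desired: int) -> None:
    """Toy HPA: raise the replica count in response to load."""
    deployment.replicas = desired


def operator_reconcile(deployment: Deployment, spec_replicas: int = 1) -> None:
    """Toy reconcile pass: force the Deployment back to the spec value."""
    deployment.replicas = spec_replicas


deploy = Deployment(replicas=1)
hpa_scale(deploy, 4)
after_hpa = deploy.replicas          # the HPA's change is visible briefly
operator_reconcile(deploy)           # spec still says 1 (the default)
after_reconcile = deploy.replicas    # the HPA's change has been overwritten
print(after_hpa, after_reconcile)
```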

One approach to supporting HPAs would be to set `watchDependentResources` to False in the watches.yaml, but we have other logic in the operator that depends on it being true.


We are interested in other potential solutions.  For now, if you want to scale up, you can change the value for replicas on the AWX spec and the operator will reconcile that change.  
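
A minimal sketch of that manual scale-up, assuming the AWX resource is named `awx-demo` in the `awx` namespace (both names are placeholders, not taken from this thread). It only builds the `kubectl patch` command for a merge patch on `spec.replicas`; it does not run it:

```python
import json
import shlex


def replicas_patch_command(name: str, namespace: str, replicas: int) -> str:
    """Build a kubectl command that sets spec.replicas on an AWX resource."""
    patch = {"spec": {"replicas": replicas}}
    args = [
        "kubectl", "patch", "awx", name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]
    # shlex.join quotes the JSON so the command can be pasted into a shell.
    return shlex.join(args)


# "awx-demo" and "awx" are hypothetical; substitute your own names.
cmd = replicas_patch_command("awx-demo", "awx", 4)
print(cmd)
```

Editing `replicas` in the manifest you originally applied and re-applying it should have the same effect.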

Thank you,
AWX Team

Wajid Baig

Jun 17, 2022, 4:01:59 AM
to AWX Project
Thank you very much for responding and sharing great info regarding HPA. 

We have set the replica count to 4 for AWX, but two of the pods go into the Pending state due to insufficient CPU.
We have also deployed and configured the Cluster Autoscaler and the metrics server in the EKS cluster, so we expected the node group to scale out, i.e. to add new nodes. That is not happening, and we are wondering why.
Can you please help us in this regard?

Thanks in advance, 
AB   

AWX Project

Jun 17, 2022, 4:36:29 PM
to AWX Project
I am not sure about cluster autoscaling.  It sounds like the AWX pods are hitting a resource constraint.  You could try specifying lower CPU requests on the AWX spec - https://github.com/ansible/awx-operator#containers-resource-requirements

Honestly the default values in that doc are pretty high in my opinion.  We should probably change those to 50m each.  
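
A rough sketch of what lowering the requests could look like, assuming the spec keys are `web_resource_requirements`, `task_resource_requirements` and `ee_resource_requirements` as in the README section linked above (please check them against your operator version). It builds a merge-patch body that sets only the CPU request, using the 50m figure floated above; memory requests and limits are left out and would need adding if you want them:

```python
import json

# Key names are an assumption based on the linked README section.
RESOURCE_KEYS = (
    "web_resource_requirements",
    "task_resource_requirements",
    "ee_resource_requirements",
)


def low_cpu_request_patch(cpu: str = "50m") -> dict:
    """Return a patch body that sets the same CPU request on each container."""
    return {"spec": {key: {"requests": {"cpu": cpu}} for key in RESOURCE_KEYS}}


patch = low_cpu_request_patch()
print(json.dumps(patch, indent=2))
```

The resulting JSON could be passed to `kubectl patch ... --type merge -p`, or the same keys could be written directly into the AWX manifest.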

Thanks,
Christian
