kubernetes-sig-autoscaling@googlegroups.com


Wei Liu

Apr 30, 2024, 9:39:23 PM
to Autoscaling Kubernetes
Hello team,

My name is Wei, and I have a quick question about Cluster Autoscaler (CA) on EKS.
I run about 600 EKS clusters on AWS. After upgrading to CA version 1.25.3 (https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.25.3), the cluster-autoscaler pod started logging many "Request limit exceeded" errors for DescribeLaunchTemplateVersions, like the ones below. Before the upgrade it worked fine, without so many Describe API calls.
```
E0426 02:16:57.279750 1 static_autoscaler.go:298] Failed to get node infos for groups: error while building node template using instance requirements: (RequestLimitExceeded: Request limit exceeded. status code: 503, request id: 73c0a60b-759c-4c9e-892d-042c454b4f6e)
E0426 02:16:57.279721 1 mixed_nodeinfos_processor.go:151] Unable to build proper template node for eks-abuse-decision--personal-failsafe0--us-east-1--prod--np1-eec48ee5-01d3-d6ae-1e63-6ff7514b86d8: error while building node template using instance requirements: (RequestLimitExceeded: Request limit exceeded.
```
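To see which node groups hit the limit most often, the pod log can be tallied per node group. This is only a rough sketch: the regular expression is derived from the `mixed_nodeinfos_processor.go` message shown above, and the function name `count_throttled_groups` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "Unable to build proper template node for <group>: ..." message
# and captures the node group name. The pattern is an assumption based on the
# log excerpt above, not on the Cluster Autoscaler source code.
TEMPLATE_ERR = re.compile(
    r"Unable to build proper template node for (\S+): .*?RequestLimitExceeded"
)


def count_throttled_groups(log_text):
    """Count RequestLimitExceeded template errors per node group."""
    return Counter(TEMPLATE_ERR.findall(log_text))
```

Feeding it the output of `kubectl logs` for the cluster-autoscaler pod should show whether the errors are spread across all node groups or concentrated in a few.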
Checking AWS CloudTrail, I found a lot of DescribeLaunchTemplateVersions API calls failing with the error "Request limit exceeded".

```
When scaling up from 0 nodes, the Cluster Autoscaler reads ASG tags to derive information about the specifications of the nodes i.e labels and taints in that ASG.
```
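For illustration, the tag-to-node-spec mapping described in that quote can be sketched as below. This is a simplified Python approximation, not the Cluster Autoscaler's actual (Go) implementation. The tag prefixes are the ones documented for the AWS cloud provider; the function name `node_template_from_tags` and the returned structure are assumptions made for this example.

```python
# Tag prefixes documented for the Cluster Autoscaler AWS provider.
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"
TAINT_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"


def node_template_from_tags(tags):
    """Derive node labels and taints from a dict of ASG tags (illustrative only)."""
    labels = {}
    taints = []
    for key, value in tags.items():
        if key.startswith(LABEL_PREFIX):
            labels[key[len(LABEL_PREFIX):]] = value
        elif key.startswith(TAINT_PREFIX):
            # Taint tag values have the form "<value>:<effect>", e.g. "true:NoSchedule".
            taint_value, _, effect = value.rpartition(":")
            taints.append(
                {"key": key[len(TAINT_PREFIX):], "value": taint_value, "effect": effect}
            )
    return labels, taints
```

For example, a tag `k8s.io/cluster-autoscaler/node-template/label/foo` with value `bar` would yield the node label `foo=bar`.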

Checking the clusters, I have many node groups with 0/0/100 nodes (desired/min/max). I understand that CA derives the node information for these groups from the Describe APIs, but why does it call them so many times?

Has anyone run into this scenario before? Does anyone know why it happens? Or could someone share some information about the Cluster Autoscaler workflow, specifically when and how DescribeLaunchTemplateVersions is called?


