Hi Team,
I was looking for a way to monitor and graph the usage (utilization) of node groups in clusters that run CA across various clouds. After going through
metrics.go, I can see that there is a nodesGroupMaxNodes metric, described as "Maximum number of nodes in the node group".
To measure a node group's utilization level, though, there would also need to be a metric reporting the current number of nodes in each node group. With that, node group utilization could easily be monitored across clouds using only CA metrics, without digging into cloud-specific metrics (AWS ASGs, GCP, Azure, etc.).
What do you think about this?
I did some brief research in the CA codebase, and I think all the required data is already there:
- perNodeGroupReadiness holds the node count for each node group, split by node state
- We could add a method similar to GetClusterReadiness that returns this perNodeGroupReadiness
- The metric could be defined the same way as nodesGroupMaxNodes, e.g. nodesGroupCurrentNodes
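To make the idea a bit more concrete, here is a minimal, stdlib-only Go sketch (not actual CA code). `groupReadiness`, `Total` and `utilization` are hypothetical names; `groupReadiness` is a simplified stand-in for the per-state node counts that perNodeGroupReadiness holds. It computes the value a nodesGroupCurrentNodes gauge would report for each node group, and the utilization ratio one could then derive by dividing it by nodesGroupMaxNodes.

```go
package main

import (
	"fmt"
	"sort"
)

// groupReadiness is a HYPOTHETICAL, simplified stand-in for the per-node-group
// readiness data in CA; the real type tracks more node states.
type groupReadiness struct {
	Ready      int
	Unready    int
	NotStarted int
}

// Total is the value a hypothetical nodesGroupCurrentNodes gauge would report:
// every node in the group, regardless of state, since all of them count
// toward the group's max size.
func (r groupReadiness) Total() int {
	return r.Ready + r.Unready + r.NotStarted
}

// utilization returns current/max per node group, i.e. what you would get by
// dividing nodesGroupCurrentNodes by nodesGroupMaxNodes. Groups with no known
// max (or a max <= 0) are skipped to avoid dividing by zero.
func utilization(perNodeGroup map[string]groupReadiness, maxNodes map[string]int) map[string]float64 {
	out := make(map[string]float64)
	for group, r := range perNodeGroup {
		maxN, ok := maxNodes[group]
		if !ok || maxN <= 0 {
			continue
		}
		out[group] = float64(r.Total()) / float64(maxN)
	}
	return out
}

func main() {
	perNodeGroup := map[string]groupReadiness{
		"ng-a": {Ready: 3, Unready: 1},
		"ng-b": {Ready: 5},
	}
	maxNodes := map[string]int{"ng-a": 8, "ng-b": 20}

	u := utilization(perNodeGroup, maxNodes)

	// Print in a stable order, since Go map iteration order is random.
	groups := make([]string, 0, len(u))
	for g := range u {
		groups = append(groups, g)
	}
	sort.Strings(groups)
	for _, g := range groups {
		fmt.Printf("%s: %d/%d nodes (%.0f%%)\n", g, perNodeGroup[g].Total(), maxNodes[g], u[g]*100)
	}
}
```

Whether unready or not-yet-started nodes should count as "current" is a design choice; the sketch counts them because they already consume the group's capacity. Alternatively, the gauge could carry a state label so each state is reported separately.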
Thanks in advance for your thoughts on this,
Cheers,
Maciek