New metric proposal: nodesCountPerNodeGroup

Maciej Lasyk

Sep 13, 2023, 9:15:39 AM
to Autoscaling Kubernetes
Hi Team,

I was looking for a way to monitor and graph the utilization of node groups in clusters that use Cluster Autoscaler (CA) across various clouds. Going through metrics.go, I can see that there is a nodesGroupMaxNodes metric, described as "Maximum number of nodes in the node group".

To measure the utilization of a node group, we would also need a metric that reports the current number of nodes in each node group. With both, it would be easy to monitor node group utilization across clouds using only CA metrics, without digging into cloud-specific metrics (AWS ASG, GCP, Azure, etc.).
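
To make the idea concrete, here is a minimal sketch of the ratio I have in mind: current node count divided by the node group's maximum. The function name nodeGroupUtilization is hypothetical and not part of CA; it only illustrates the arithmetic a dashboard or alert would do once both values are available.

```go
package main

import "fmt"

// nodeGroupUtilization is a hypothetical helper (not part of CA).
// It returns currentNodes/maxNodes as a fraction. The second return
// value is false when maxNodes is not positive, in which case the
// ratio is undefined.
func nodeGroupUtilization(currentNodes, maxNodes int) (float64, bool) {
	if maxNodes <= 0 {
		return 0, false
	}
	return float64(currentNodes) / float64(maxNodes), true
}

func main() {
	// Example values only: 5 nodes running in a group capped at 10.
	if u, ok := nodeGroupUtilization(5, 10); ok {
		fmt.Printf("utilization: %.0f%%\n", u*100)
	}
}
```

In practice the same division would probably be done in the monitoring system's query language rather than in Go.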

What do you think about this?

I did some brief research in the CA codebase, and I think all the required data is already there:
  1. perNodeGroupReadiness holds the node count for each node group, split by node state.
  2. We could add a method similar to GetClusterReadiness that returns this perNodeGroupReadiness.
  3. The metric could be defined the same way as nodesGroupMaxNodes, e.g. nodesGroupCurrentNodes.
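
The three steps above could look roughly like the sketch below. This is not CA code: Readiness, Registry, Sample and CurrentNodesSamples are simplified, hypothetical stand-ins written for illustration, and the real readiness struct in CA tracks more states than the three shown here. The sketch returns a copy of the per-node-group map under a lock, then flattens it into one value per node group, which is the shape a gauge labelled by node group would need.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Readiness is a simplified, hypothetical stand-in for the per-state
// node counts CA keeps for a node group.
type Readiness struct {
	Ready      int
	Unready    int
	NotStarted int
}

// Total sums the states tracked in this sketch. Which states should
// count as "current nodes" is a design choice, not something decided here.
func (r Readiness) Total() int {
	return r.Ready + r.Unready + r.NotStarted
}

// Registry is a hypothetical holder for perNodeGroupReadiness.
type Registry struct {
	mu                    sync.Mutex
	perNodeGroupReadiness map[string]Readiness
}

// GetPerNodeGroupReadiness returns a copy of the map so that callers
// cannot modify the registry's internal state.
func (r *Registry) GetPerNodeGroupReadiness() map[string]Readiness {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]Readiness, len(r.perNodeGroupReadiness))
	for id, rd := range r.perNodeGroupReadiness {
		out[id] = rd
	}
	return out
}

// Sample is one gauge value labelled with its node group.
type Sample struct {
	NodeGroup string
	Value     float64
}

// CurrentNodesSamples flattens the readiness map into one sample per
// node group, sorted by node group name for stable output.
func CurrentNodesSamples(readiness map[string]Readiness) []Sample {
	samples := make([]Sample, 0, len(readiness))
	for id, rd := range readiness {
		samples = append(samples, Sample{NodeGroup: id, Value: float64(rd.Total())})
	}
	sort.Slice(samples, func(i, j int) bool {
		return samples[i].NodeGroup < samples[j].NodeGroup
	})
	return samples
}

func main() {
	// Example data only; the node group names are made up.
	reg := &Registry{perNodeGroupReadiness: map[string]Readiness{
		"pool-a": {Ready: 3, Unready: 1},
		"pool-b": {Ready: 2, NotStarted: 2},
	}}
	for _, s := range CurrentNodesSamples(reg.GetPerNodeGroupReadiness()) {
		fmt.Printf("node_group=%q current_nodes=%v\n", s.NodeGroup, s.Value)
	}
}
```

In CA itself, each sample would presumably be set on a gauge with a node group label, mirroring how nodesGroupMaxNodes is exported.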

Thank you for your opinion on this,
Cheers,
Maciek

Michael McCune

Sep 13, 2023, 10:28:51 AM
to Maciej Lasyk, Autoscaling Kubernetes

i think there is definitely a desire from some users to have more labels on metrics so that it is easier to isolate specific data with node groups. i'm very much in favor of us expanding the options available through the `--per-nodegroup-metrics` flag, or even making a new flag if that one doesn't fit. there has been a discussion going on in issue 5850 [0] that i would love to see moving forward. happy to contribute to the discussion if folks want to push further into implementing this.



