Hi all,
I'm starting a new thread to focus on watch support for the Resource Metrics API only.
At yesterday's SIG meeting I briefly talked about why I would like to extend the existing API with watch capability. Let me repeat that here to provide context for anyone who didn't attend.
One of the main consumers of the Resource Metrics API is HPA, which scales Deployments based on CPU usage. This means that a scaling operation happens in reaction to a CPU utilization signal, e.g. ingress traffic has increased and we already have highly utilized Pods. This model doesn't allow us to predict anything - one has to provide another source of data (i.e. Custom Metrics) to know beforehand that a scale-up operation needs to happen. However, as the name suggests, these metrics are custom, which means that something else - CPU usage - is used for scaling by default. Now, in order to scale efficiently, the information about a CPU usage increase should be delivered to HPA as soon as possible. This is hard to achieve in the current model, where any metrics client has to actively poll Metrics Server for fresh data, busy-waiting for a metrics update.
What we could do instead is provide a watch mechanism in the Resource Metrics API, allowing Metrics Server to notify its clients as soon as a new batch of data has been collected.
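To make the idea a bit more concrete, here is a rough sketch in Go of the notification side. It is only an illustration under my own assumptions: `MetricsBatch`, `Notifier`, `Watch` and `Publish` are hypothetical names, not existing Metrics Server or Resource Metrics API types, and a real implementation would go through the usual Kubernetes watch machinery rather than in-process channels.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricsBatch is a hypothetical stand-in for the result of one collection cycle.
type MetricsBatch struct {
	Revision int64            // increases with every collection cycle
	CPUMilli map[string]int64 // pod name -> CPU usage in millicores
}

// Notifier fans out every new batch to all registered watchers.
type Notifier struct {
	mu       sync.Mutex
	watchers []chan MetricsBatch
}

// Watch registers a client and returns the channel it should read from.
func (n *Notifier) Watch() <-chan MetricsBatch {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan MetricsBatch, 1)
	n.watchers = append(n.watchers, ch)
	return ch
}

// Publish delivers a batch to every watcher without blocking.
// A watcher that has not read the previous batch only keeps the newest one.
func (n *Notifier) Publish(b MetricsBatch) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, ch := range n.watchers {
		select {
		case <-ch: // drop the stale, unread batch
		default:
		}
		ch <- b // buffer of size 1 is free at this point
	}
}

func main() {
	n := &Notifier{}
	hpa := n.Watch() // e.g. HPA subscribing instead of polling

	// The server publishes once per collection cycle.
	n.Publish(MetricsBatch{Revision: 1, CPUMilli: map[string]int64{"web-0": 850}})

	b := <-hpa
	fmt.Printf("revision %d: web-0 uses %dm CPU\n", b.Revision, b.CPUMilli["web-0"])
}
```

The point of the sketch is that the client blocks on a channel read and wakes up exactly when new data exists, instead of repeatedly asking the server whether anything changed.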
One concern I heard in the meeting was that not every metrics provider exposes a streaming API. This may indeed be a concern for Custom Metrics, but as far as I understand, CPU usage comes from the Resource Metrics API, which is backed only by Metrics Server today. Moreover, even if there were another implementation based on some external metrics provider (e.g. Stackdriver) without streaming API support, a watch could be simulated by the custom implementation, effectively moving the busy-waiting loop from all the clients to a single place on the server.
I would like to prepare a KEP for this extension of the Resource Metrics API. Let me know your thoughts on this.
Cheers
Daniel