Resource Metrics watch API


Daniel Kłobuszewski

Apr 5, 2019, 8:05:40 AM
to kubernetes-sig-...@googlegroups.com
Hi all,

Starting a new thread to focus on watch support for Resource Metrics API only.

At yesterday's SIG meeting I briefly talked about why I would like to extend the existing API with watch capability. Let me repeat that here to provide context for anyone who didn't attend the meeting.

One of the main consumers of the Resource Metrics API is the HPA, which scales Deployments based on CPU usage. This means that a scaling operation happens in reaction to a CPU utilization signal, e.g. ingress traffic has increased and we already have highly utilized Pods. This model doesn't allow us to predict anything: to know beforehand that a scale-up needs to happen, one has to provide another source of data (i.e. Custom Metrics). However, as the name suggests, these metrics are custom, which means that something else - CPU usage - is what gets used for scaling by default. Now, in order to scale efficiently, the information about a CPU usage increase should be delivered to the HPA as soon as possible. This is hard to achieve in the current model, where every metrics client has to actively poll Metrics Server for fresh data, busy-waiting for a metrics update.

What we could do instead is to provide a watch mechanism in Resource Metrics API, allowing Metrics Server to notify its clients that a new batch of data was collected.
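To illustrate (purely a sketch - the watch verb on metrics.k8s.io does not exist today, and the v1beta2 group version below is an assumption), a client could then do something like this instead of polling:

// Hypothetical client-side sketch: subscribing to pod metrics via a watch
// instead of polling. metrics.k8s.io/v1beta2 with watch support is assumed;
// only v1beta1 without watch exists today.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed future group version with the watch verb enabled.
	gvr := schema.GroupVersionResource{Group: "metrics.k8s.io", Version: "v1beta2", Resource: "pods"}

	// One long-lived watch replaces the busy-waiting poll loop: the server
	// pushes an event whenever a new batch of metrics has been collected.
	w, err := client.Resource(gvr).Namespace("default").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		fmt.Printf("%s: fresh pod metrics received: %v\n", ev.Type, ev.Object)
	}
}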

One concern I heard at the meeting was that not every metrics provider exposes a streaming API. This may indeed be a concern for Custom Metrics, but as far as I understand, CPU usage comes from the Resource Metrics API, which today is backed only by Metrics Server. Moreover, even if there were another implementation based on some external metrics provider (e.g. Stackdriver) without streaming support, streaming could be simulated by that implementation, effectively moving the busy-waiting loop from all the clients to a single place on the server.

I would like to prepare a KEP for this extension of the Resource Metrics API. Let me know your thoughts on this.

Cheers
Daniel

markust...@gmail.com

Apr 5, 2019, 8:09:24 AM
to kubernetes-sig-instrumentation
Hi Daniel,

This ties directly into my proposal of the very same API here: https://groups.google.com/forum/#!topic/kubernetes-sig-instrumentation/nJvDyIwDgu8.

Our assessments regarding the need for this and the possible mitigations for non-streaming backends match up exactly. I'd love to collaborate on a KEP if that's workable from your perspective.

Cheers,
Markus

Daniel Kłobuszewski

Apr 5, 2019, 8:16:19 AM
to markust...@gmail.com, kubernetes-sig-instrumentation
Hi Markus,

Thanks, I've seen that proposal. The difference is that I would like to update the Resource Metrics API, while your proposal talks about the Custom/External Metrics API. That being said, I think watch support makes sense for all of these APIs. I'd be happy to collaborate on the KEP.

Cheers
Daniel 


Frederic Branczyk

Apr 8, 2019, 5:10:17 AM
to kubernetes-sig-instrumentation
As I mentioned in the meeting, I can see the use, but a big difficulty is how to cope with essentially no monitoring system out there being able to provide such functionality. What is the strategy for coping with that situation without causing all metrics adapters to go defunct?

markust...@gmail.com

Apr 8, 2019, 11:04:41 AM
to kubernetes-sig-instrumentation
I specifically mentioned these backends as "Incompatible" backends in my writeup. Here are the details:

Based on the list of current implementations, the common backends (Prometheus, Azure, Stackdriver, Datadog) don't usually provide an API to watch for metric changes like this, which makes them a lot less suitable for the scale-from-zero scenario described above.
 
However, the Watch API could still be implemented for those backends as well, reducing the amount of data sent to the autoscalers if metrics do not change often and/or if changes are not reflected as often in those backend systems. Many metrics in these systems have a fixed sample interval. The Watch API could therefore be implemented as a poll at that frequency, to reduce load on the backend and only nudge the autoscaler when necessary, which improves the efficiency of the system and its backends overall. The implementation of said polling would also be local to the specific custom-metrics API provider, and thus the implementor (e.g. Stackdriver) has all the context necessary to set this polling interval to whatever value is a good fit for the specific backend.

Basically, as Daniel already mentioned, the adapter gets to decide at which interval to poll the backend (if it doesn't support streaming).
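
To make that concrete, the server-side loop in the adapter could look roughly like this (a sketch only; the Backend interface and MetricEvent type are illustrative, not part of any existing adapter):

// Sketch: the adapter polls a non-streaming backend at that metric's refresh
// interval and only emits an event when the value changed, so clients no
// longer busy-wait. All names here are illustrative.
package adapter

import (
	"context"
	"time"
)

// Backend abstracts a non-streaming metrics store (e.g. a cloud monitoring API).
type Backend interface {
	Query(ctx context.Context, metric string) (float64, error)
}

// MetricEvent is what the adapter would translate into a watch event for clients.
type MetricEvent struct {
	Metric string
	Value  float64
}

// PollToWatch runs one polling loop per metric inside the adapter, replacing
// the polling loops that every client would otherwise run on its own.
func PollToWatch(ctx context.Context, b Backend, metric string, refresh time.Duration, out chan<- MetricEvent) {
	t := time.NewTicker(refresh)
	defer t.Stop()

	var last float64
	var seen bool
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			v, err := b.Query(ctx, metric)
			if err != nil {
				continue // a real adapter would surface this as an error event
			}
			// Only nudge watchers when the value actually changed.
			if !seen || v != last {
				last, seen = v, true
				out <- MetricEvent{Metric: metric, Value: v}
			}
		}
	}
}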

Frederic Branczyk

Apr 12, 2019, 3:23:13 PM
to kubernetes-sig-instrumentation
If no monitoring backend actually natively supports anything like this, it seems to me that this polling mechanism should rather be implemented on the consuming side. I'm not yet seeing the actual benefit over that (happy to be convinced otherwise though :) ).

markust...@gmail.com

Apr 15, 2019, 6:34:50 AM
to kubernetes-sig-instrumentation
The benefits I see are:
  1. It allows me to build a system which actually supports it and makes use of it to reduce end-to-end latency in the metric-to-autoscaler pipeline. For some autoscaling decisions (specifically scaling up from 0) this is crucial.
  2. It allows the metric-adapter to decide on the polling interval. Looking through Stackdriver's documentation, for example, metrics have different "refresh intervals". The metric-adapter could know that interval and thus only query/push a new metric at that interval. That makes communication to the backend and communication between adapter and client more efficient.

Daniel Kłobuszewski

Apr 16, 2019, 7:10:16 AM
to markust...@gmail.com, kubernetes-sig-instrumentation
+1 to enabling implementations. It is a good thing to have a well-defined API, even if initially all the implementations will just do the polling. Also, at least for the Resource Metrics API, a real implementation is possible right away: all clients can get notified as soon as the kubelet scraping is done.
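
Roughly, the Metrics Server side could look like this (a sketch only - the Scraper interface stands in for Metrics Server internals and is not the actual code):

// Sketch: after every scrape cycle, the server fans out the fresh batch to
// all watch clients instead of waiting for them to poll. The Scraper
// interface is an assumed stand-in for Metrics Server internals.
package server

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/watch"
	metricsv1beta1 "k8s.io/metrics/pkg/apis/metrics/v1beta1"
)

type Scraper interface {
	// ScrapeAll collects one batch of pod metrics from all kubelets.
	ScrapeAll(ctx context.Context) ([]metricsv1beta1.PodMetrics, error)
}

// Run scrapes at the configured resolution and emits a MODIFIED event per
// updated PodMetrics; a real server would fan these out to all open watches.
func Run(ctx context.Context, s Scraper, resolution time.Duration, out chan<- watch.Event) {
	t := time.NewTicker(resolution)
	defer t.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			batch, err := s.ScrapeAll(ctx)
			if err != nil {
				continue
			}
			for i := range batch {
				out <- watch.Event{Type: watch.Modified, Object: &batch[i]}
			}
		}
	}
}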


Frederic Branczyk

Apr 17, 2019, 8:08:03 AM
to Daniel Kłobuszewski, markust...@gmail.com, kubernetes-sig-instrumentation
I would still like to see and understand what the migration for this would look like. Under which circumstances can we expect the watch endpoints to be available, and when can't we? To answer this, we might have to involve the API review committee, as I'm not sure we can make changes like that at this point. It might have to be an entirely new API, or at least a new API version. Beyond that, I'd like to see more concrete examples and back-of-the-envelope calculations of the additional load this puts on an implementation - it doesn't seem insignificant.

> It allows the metric-adapter to decide on the polling interval. Looking through Stackdriver's documentation, for example, metrics have different "refresh intervals". The metric-adapter could know that interval and thus only query/push a new metric at that interval. That makes communication to the backend and communication between adapter and client more efficient.

This sounds good on the surface, but in the worst case it could mean n watches/polls on Stackdriver for every watch created on the Custom Metrics API, since watch happens on a list basis. And because watch has to comply with the standard Kubernetes API definitions, potential new time-series appearing/disappearing within that list over time have to be captured as well.

Generally speaking, I'm not against this, but I think it's a lot more complicated than it looks on the surface, and it needs to be properly thought through.


Daniel Kłobuszewski

Apr 18, 2019, 8:17:32 AM
to Frederic Branczyk, markust...@gmail.com, kubernetes-sig-instrumentation
Yes, this would likely require bumping the API version. I don't know if it would have to go through an alpha stage first. Given that all these APIs are currently v1beta1, and that the change is backwards-compatible, I'd prefer to introduce the changes and bump the version to v1beta2. API review should be the right place to ask whether this makes sense.

Re: multiple watches - yes, we'd need n polling loops on Stackdriver, but the metrics could be grouped into buckets based on their refresh intervals. Then n would be the number of these buckets.
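
For illustration, the grouping itself is trivial (a sketch; the names are made up, not from any existing adapter):

// Sketch: group watched metrics by their backend refresh interval, so the
// adapter runs one polling loop per distinct interval rather than one per
// watch. Purely illustrative.
package adapter

import "time"

// BucketByRefresh maps each distinct refresh interval to the metrics that
// share it; n polling loops then becomes len(buckets) loops.
func BucketByRefresh(refresh map[string]time.Duration) map[time.Duration][]string {
	buckets := make(map[time.Duration][]string)
	for metric, interval := range refresh {
		buckets[interval] = append(buckets[interval], metric)
	}
	return buckets
}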