Setting up Kubernetes Autoscaling with Custom Metrics

Janaka Bandara

unread,

Apr 27, 2016, 9:51:28 AM4/27/16

to Containers at Google

I am trying to set up autoscaling on a Kubernetes 1.2.3 (beta) cluster based on custom metrics. (I already tried CPU-based autoscaling on the cluster, and it worked fine.)

I tried to follow the custom metrics proposal[1], but I'm having problems in creating the necessary set-up.

This is what I have done so far:

1. Added a custom metrics annotation to the pod spec being deployed (similar to the configuration provided in their proposal):

        apiVersion: v1
        kind: ReplicationController
        metadata:
          name: metrix
          namespace: "default"
        spec:
          replicas: 1
          template:
            metadata:
              labels:
                app: metrix
              annotations:
                metrics.alpha.kubernetes.io/custom-endpoints: >
                  [
                    {
                      "api": "prometheus",
                      "path": "/status",
                      "port": "9090",
                      "names": ["test1"]
                    },
                    {
                      "api": "prometheus",
                      "path": "/metrics",
                      "port": "9090"
                      "names": ["test2"]
                    }
                  ]
            spec:
              containers:
              - name: metrix
                image: janaka/prometheus-ep:v1
                resources:
                  requests:
                    cpu: 400m

2. Created a Docker container tagged `janaka/prometheus-ep:v1` (local) running a Prometheus-compatible server on port 9090, with `/status` and `/metrics` endpoints

3. Enabled custom metrics on the kubelet by appending `--enable-custom-metrics=true` to `KUBELET_OPTS` at `/etc/default/kubelet` (based on the kubelet CLI reference[2]) and restarted kubelet

All pods (in `default` and `kube-system` namespaces) are running, and the heapster pod log doesn't contain any 'anomalous' outputs either (except for a small glitch at startup, due to temporary unavailability of InfluxDB):

    $ kubesys logs -f heapster-daftr

    I0427 05:07:45.807277       1 heapster.go:60] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb:8086
    I0427 05:07:45.807359       1 heapster.go:61] Heapster version 1.1.0-beta1
    I0427 05:07:45.807638       1 configs.go:60] Using Kubernetes client with master "https://kubernetes.default" and version "v1"
    I0427 05:07:45.807661       1 configs.go:61] Using kubelet port 10255
    E0427 05:08:15.847319       1 influxdb.go:185] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp xxx.xxx.xxx.xxx:8086: i/o timeout, will retry on use
    I0427 05:08:15.847376       1 influxdb.go:199] created influxdb sink with options: host:monitoring-influxdb:8086 user:root db:k8s
    I0427 05:08:15.847412       1 heapster.go:87] Starting with InfluxDB Sink
    I0427 05:08:15.847427       1 heapster.go:87] Starting with Metric Sink
    I0427 05:08:15.877349       1 heapster.go:166] Starting heapster on port 8082
    I0427 05:08:35.000342       1 manager.go:79] Scraping metrics start: 2016-04-27 05:08:00 +0000 UTC, end: 2016-04-27 05:08:30 +0000 UTC
    I0427 05:08:35.035800       1 manager.go:152] ScrapeMetrics: time: 35.209696ms size: 24
    I0427 05:08:35.044674       1 influxdb.go:177] Created database "k8s" on influxDB server at "monitoring-influxdb:8086"
    I0427 05:09:05.000441       1 manager.go:79] Scraping metrics start: 2016-04-27 05:08:30 +0000 UTC, end: 2016-04-27 05:09:00 +0000 UTC
    I0427 05:09:06.682941       1 manager.go:152] ScrapeMetrics: time: 1.682157776s size: 24
    I0427 06:43:38.767146       1 manager.go:79] Scraping metrics start: 2016-04-27 05:09:00 +0000 UTC, end: 2016-04-27 05:09:30 +0000 UTC
    I0427 06:43:38.810243       1 manager.go:152] ScrapeMetrics: time: 42.940682ms size: 1
    I0427 06:44:05.012989       1 manager.go:79] Scraping metrics start: 2016-04-27 06:43:30 +0000 UTC, end: 2016-04-27 06:44:00 +0000 UTC
    I0427 06:44:05.063583       1 manager.go:152] ScrapeMetrics: time: 50.368106ms size: 24
    I0427 06:44:35.002038       1 manager.go:79] Scraping metrics start: 2016-04-27 06:44:00 +0000 UTC, end: 2016-04-27 06:44:30 +0000 UTC

However, the custom endpoints are not being scraped. (I verified it by adding stderr logs for startup and endpoint handlers of my server; only the server initialization logs are displayed on kubectl logs of the pod.)

(From what I understood from the proposal as well as the issue [3], we don't have to run a separate Prometheus collector in the cluster as cAdvisor should already pull data from the endpoints defined in the pod spec. Is this true, or do I need a separate Prometheus collector as well?)

It would be really helpful if someone can point me in the right direction regarding how to properly set up custom metrics-based autoscaling.

Thanks!

[1]: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/custom-metrics.md
[2]: http://kubernetes.io/docs/admin/kubelet/
[3]: https://github.com/kubernetes/kubernetes/issues/18352

Vishnu Kannan

unread,

Apr 27, 2016, 3:49:00 PM4/27/16

to Containers at Google, Marcin Wielgus, Jerzy Szczepkowski

+Marcin +Jerzy

--
You received this message because you are subscribed to the Google Groups "Containers at Google" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-contain...@googlegroups.com.
To post to this group, send email to google-c...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-containers.
For more options, visit https://groups.google.com/d/optout.

Jerzy Szczepkowski

unread,

May 4, 2016, 8:06:29 AM5/4/16

to jana...@gmail.com, Containers at Google, David Oppenheimer, Piotr Szczesniak, Marcin Wielgus, Filip Grzadkowski, Nikhil Jindal

Hi Janaka,

The proposal [1] is out of date. There is a PR in flight describing the current design:

https://github.com/kubernetes/kubernetes.github.io/pull/135

Please refer to:

https://github.com/mwielgus/kubernetes.github.io/blob/custom-metrics/docs/user-guide/horizontal-pod-autoscaling/index.md#support-for-custom-metrics

Regards,

Jerzy

On Wed, May 4, 2016 at 10:30 AM, Piotr Szczesniak <pszcz...@google.com> wrote:

+Nikhil

There is also the same question on stackoverflow.

http://stackoverflow.com/questions/36882855/setting-up-kubernetes-autoscaling-with-custom-metrics

On Wed, May 4, 2016 at 10:26 AM, Piotr Szczesniak <pszcz...@google.com> wrote:
+jsz

On Sat, Apr 30, 2016 at 8:23 AM, David Oppenheimer <davi...@google.com> wrote:

--

Janaka Bandara

unread,

May 4, 2016, 12:17:50 PM5/4/16

to Containers at Google

Hi Jerzy,

Thanks!
I tried the approach in the PR and it worked fine.

(I found the same links a few days ago from a different source, but could not update this thread on time.)

Sander Ploegsma

unread,

Jun 16, 2016, 7:59:28 AM6/16/16

to google-c...@googlegroups.com

EDIT: never mind, my endpoint was being served over HTTPS, which made it insecure as the certificate wasn't valid for 'localhost'. When using HTTP everything works as expected. Marking as resolved.

Hi, Janaka,

Could you show how you got it working exactly? I tried setting everything up according to the documentation proposed in the PR, but it's a bit unclear how the endpoint should be structured.

My steps:

Enabled custom metrics on the cluster
Created an endpoint on my application that returns the number of open connections
Created this exact ConfigMap:
apiVersion: v1 kind: ConfigMap metadata: name: cm-config data: # Use Prometheus structure, see https://github.com/google/cadvisor/blob/master/docs/application_metrics.md definition.json: "{\"endpoint\" : \"http://localhost:8443/metrics\", {\"metrics_config\" : [{\"name\" : \"connections\", \"metric_type\" : \"gauge\", \"data_type\" : \"int\", \"polling_frequency\" : 10, \"units\" : \"number of active connections\"}]}}"
Mapped the hostPort and containerPort in the RC
Mounted the ConfigMap in my ReplicationController to /etc/custom-metrics
Added this exact annotation to my HPA:
alpha/target.custom-metrics.podautoscaler.kubernetes.io: '{"items":[{"name":"connections", "value": "1000"}]}'

However, the HPA is unable to get my custom metrics. I also couldn't find any way to debug the setup, all I see is 'FailedGetCustomMetrics metrics obtained for 0/1 of pods' when I check the HPA events.

Cheers,

Sander

Tuna

unread,

Jan 9, 2017, 8:33:57 AM1/9/17

to Kubernetes user discussion and Q&A, google-c...@googlegroups.com

Hi Janaka, Jerzy, Sander and everyone,

There is something unclear for me to understand the setup so I just wanna ask:

- Do we need Prometheus running in order to do hpa with custom metric (queries per second) ? OR cAdvisor and heapster are enough ?

- By binding the configMap to a pod and adding the annotation to HPA manifest, we will be good to have it autoscaled?

-Tuna

Tuna

unread,

Jan 9, 2017, 9:23:00 AM1/9/17

to Kubernetes user discussion and Q&A, google-c...@googlegroups.com

Just received a notification from mailer-daemon saying that I don't have permission to reply on this group. But seeing my response was still there. So just a double reply for a try:

---

Hi Janaka, Jerzy, Sander and everyone,

There is something unclear for me to understand the setup so I just wanna ask:

- Do we need Prometheus running in order to do hpa with custom metric (queries per second) ? OR cAdvisor and heapster are enough ?

- By binding the configMap to a pod and adding the annotation to HPA manifest, we will be good to have it autoscaled?

-Tuna

Jerzy Szczepkowski

unread,

Jan 10, 2017, 3:41:19 AM1/10/17

to Kubernetes user discussion and Q&A, google-c...@googlegroups.com

The custom metrics support described in this thread was experimental/alpha feature and is currently discontinued.

We are working on adding proper CM support in release 1.6, please refer to https://github.com/kubernetes/community/pull/152 and https://github.com/kubernetes/kubernetes/pull/34754.

Reply all

Reply to author

Forward