Prometheus not getting metrics from cadvisor


Max Furman

Sep 1, 2020, 1:51:29 AM9/1/20
to Prometheus Users
Heyo,

I've deployed a prometheus, grafana, kube-state-metrics, alertmanager, etc. setup using kubernetes in GKE v1.16.x. I've used https://github.com/do-community/doks-monitoring as a jumping off point for the yaml files.

I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.

* All the services and pods in the deployments are running - prometheus, kube-state-metrics, node-exporter - with no errors.
* The cadvisor targets in the prometheus UI appear as "up".
* Prometheus is able to collect other metrics from the cluster, but no pod/container level usage metrics.
* I can see cadvisor metrics when I query `kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor"`, but when I look in prometheus for `container_cpu_usage` or `container_memory_usage`, there is no data.
* My cadvisor scrape job config in prometheus

```
    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
```
cribbed from the prometheus/docs/examples.
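While debugging I've also been asking Prometheus itself what it resolved for this job - address, metrics path, last scrape error - via its HTTP API. A rough sketch of that check (the `monitoring` namespace and `prometheus` service name are from my setup, and it assumes a local port-forward and `jq`):

```
# Sketch: ask Prometheus what it actually resolved for the cadvisor job.
# Namespace and service name are from my setup; adjust as needed.
kubectl -n monitoring port-forward svc/prometheus 9090:9090 &
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[]
        | select(.labels.job == "kubernetes-cadvisor")
        | {scrapeUrl, health, lastError}'
```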

It's odd to me that the path I got from the examples is /metrics/cadvisor, but the path I queried using `kubectl get` was /proxy/metrics/cadvisor.

I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Given that I can query the metrics using `kubectl get` (so they definitely exist), it seems to me the issue is Prometheus communicating with the cadvisor target.
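For completeness, the manual check I've been running to mimic what Prometheus does is roughly this (a sketch - the namespace, pod name and NODE_IP are placeholders from my setup, and it assumes the container has a shell and curl):

```
# Sketch: call the kubelet's cadvisor endpoint the way Prometheus would,
# i.e. HTTPS on the kubelet port using the pod's service-account token.
# Namespace, pod name and NODE_IP are placeholders.
kubectl -n monitoring exec prometheus-0 -- sh -c '
  TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
  curl -sk -H "Authorization: Bearer $TOKEN" \
    "https://NODE_IP:10250/metrics/cadvisor" | head -n 5
'
```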

If anyone has experience getting this configured I'd sure appreciate some help debugging.

Cheers,
max

Brian Candler

Sep 1, 2020, 5:17:55 AM9/1/20
to Prometheus Users
On Tuesday, 1 September 2020 06:51:29 UTC+1, Max Furman wrote:
It's odd to me that the path I got from the examples is /metrics/cadvisor, but the path I queried using `kubectl get` was /proxy/metrics/cadvisor.


This is not well documented, but I think it's the difference between scraping the kubelet directly versus scraping through the k8s API.

I did attempt to reverse-engineer some of this here:
but I wouldn't say I really got to the bottom of it.
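As far as I can tell, the two routes to the same cadvisor data look roughly like this (hedging, since I haven't verified every detail):

```
# Directly against the kubelet on each node (what `role: node` targets by default):
#   https://<node-address>:10250/metrics/cadvisor
# Through the API server's node proxy (what `kubectl get --raw` goes through):
#   https://<apiserver>/api/v1/nodes/<node-name>/proxy/metrics/cadvisor

# e.g. the proxy route, from a workstation with cluster credentials:
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | head
```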

Max Furman

Sep 1, 2020, 11:18:14 AM9/1/20
to Prometheus Users
Funnily enough I was exploring that post yesterday - it came up based on some of the keywords I was searching for. Definitely useful to see someone breaking down the various possible data sources. I missed the part right at the bottom (the UPDATE) - it's helpful to see that. <host>:10255/metrics/cadvisor is the url my prometheus config is currently scraping. Still no luck there, but at least nice to be able to confirm that I'm not scraping the completely wrong place.
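I suppose the next thing to check is whether that port is even open, something along these lines (a sketch - NODE_IP is a placeholder, and it needs to run from somewhere that can reach the node):

```
# Sketch: is the kubelet read-only port (10255) reachable and serving cadvisor metrics?
# NODE_IP is a placeholder; run from somewhere with network access to the node.
curl -sf --max-time 5 "http://NODE_IP:10255/metrics/cadvisor" -o /dev/null \
  && echo "10255 reachable" || echo "10255 not reachable"
```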

Brian Candler

Sep 1, 2020, 1:38:07 PM9/1/20
to Prometheus Users
That :10255 is from a test server running the microk8s snap:

```
root@nuc1:~# netstat -natp | grep kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      4129/kubelet
tcp        0      0 127.0.0.1:37888         127.0.0.1:16443         ESTABLISHED 4129/kubelet
tcp6       0      0 :::10255                :::*                    LISTEN      4129/kubelet
tcp6       0      0 :::10250                :::*                    LISTEN      4129/kubelet
root@nuc1:~#
```

Looking at a production k8s 1.16 cluster (installed using kubectl) I don't see this port open.  And I'm afraid I don't know where it's configured, even in microk8s.

I do note that the kubelet command line includes "--insecure-port=0", but I don't think that's it; even with a reboot it comes up on the same ports.
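My best guess - not verified on microk8s - is that 10255 is the kubelet's read-only port, normally set by --read-only-port on the command line or readOnlyPort in the kubelet config file. Something along these lines should show where it's coming from (paths are the usual defaults and may differ on microk8s):

```
# Sketch: look for the read-only-port setting on the kubelet command line
# or in its config file. Paths are the common defaults; microk8s may differ.
ps aux | grep '[k]ubelet' | tr ' ' '\n' | grep -E 'read-only-port|^--config'
grep -i readOnlyPort /var/lib/kubelet/config.yaml 2>/dev/null
```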

Max Furman

Sep 1, 2020, 2:41:19 PM9/1/20
to Prometheus Users
Interesting! This comment helped me a lot. Working from the new assumption that the address was wrong, I searched for where cadvisor metrics can actually be scraped from in GKE, and found a blog post with an example kubernetes-cadvisor config that worked for me.


config pasted below for posterity:
```
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

Now I see the metrics appearing in prometheus.
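For anyone finding this later, the quick way I verified the data was really flowing was to query the HTTP API directly (a sketch - it assumes the server is reachable on localhost:9090 and that the namespace/service names from my setup apply):

```
# Sketch: confirm container-level metrics now exist in Prometheus.
# Assumes the server is reachable at localhost:9090, e.g. via
# `kubectl -n monitoring port-forward svc/prometheus 9090:9090`.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)' \
  | head -c 400
```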

Thanks again for the hint!

Now on to trying to get the grafana dashboards working. No rest for the wicked.