Example Prometheus config to monitor node_exporters based on labels?


SK

Dec 22, 2016, 9:43:16 PM
to Prometheus Users
Hi,

I have a K8s cluster with Prometheus deployed in a pod (as a service) and node_exporters deployed as a DaemonSet, all in the same namespace. The node_exporters are deployed with

       "annotations": {
          "prometheus.io.scrape": "true"
        }

However, Prometheus is not showing any node_exporters in the target list, even though its config looks for these annotations. I found this config in a blog:

  # scrape from node_exporter running on all nodes
  - job_name: 'node-exporters'
    kubernetes_sd_configs:
    - role: pod
      api_server: 'https://10.0.0.1:8080'  # <== also tried kubernetes.default.svc.cluster.local (IP and DNS names, with and without ports 443, 6443, 8080), but none of these worked
      tls_config:
        insecure_skip_verify: true

    # You can specify the following annotations (on pods):
    #   prometheus.io.scrape: true - scrape this pod
    #   prometheus.io.port - scrape this port
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_namespace, __meta_kubernetes_pod_label_name]
      separator: '/'
      target_label: job
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node

Is there anything wrong with the config? Can someone please help? I can provide more information. The Prometheus version is 1.4.1 and the K8s version is 1.5.1.

Thanks,
SK

frederic...@coreos.com

Dec 23, 2016, 8:38:00 AM
to Prometheus Users
Depending on how your cluster is configured, you likely need a bearer token as well. Do the logs say anything? Generally I would recommend having a look at the sample config [1] for Kubernetes SD. A common approach is also to label your pods and put them behind a Service that can be discovered via the `endpoints` role, although that is not strictly required in your specific case.

[1] https://github.com/prometheus/prometheus/blob/93b70ee4eae4a6bd62c13b80a2e3cf27b398be1f/documentation/examples/prometheus-kubernetes.yml#L40
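
For reference, Kubernetes mounts the service account credentials into every pod at fixed paths, so a rough sketch of what that would look like (untested against your setup; the API server address here is just the usual in-cluster DNS name, adjust as needed):

  - job_name: 'node-exporters'
    kubernetes_sd_configs:
    - role: pod
      api_server: 'https://kubernetes.default.svc'  # or whichever API endpoint is reachable from the pod
      # default service-account credentials that Kubernetes mounts into every pod
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt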

Matthias Rampke

Dec 23, 2016, 4:54:45 PM
to SK, Prometheus Users
Does Prometheus log any errors? 

Since it's running inside the cluster, try leaving out the api_server configuration so it properly falls back to "in cluster" mode.

How does Prometheus know which port to scrape? 
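
Unless the pod spec already declares the right container port, you usually need a relabel rule that rewrites __address__. A rough sketch, assuming your pods carry a prometheus.io.port annotation like the one mentioned in the comment in your config (otherwise hard-code 9100 for node_exporter):

    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: '([^:]+)(?::\d+)?;(\d+)'
      replacement: '${1}:${2}'
      target_label: __address__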

/MR


SK

Dec 26, 2016, 3:21:32 PM
to Matthias Rampke, Prometheus Users
Hi,


On Fri, Dec 23, 2016 at 1:54 PM, Matthias Rampke <m...@soundcloud.com> wrote:
Does Prometheus log any errors? 

Since it's running inside the cluster, try leaving out the api_server configuration so it properly falls back to "in cluster" mode.

I tried leaving it out, and I see the following errors in the logs continuously:

[k8-master:~/work/gocode/ws1/cfg/prometheus]kubectl logs kws-deployment-prometheus-1869225016-pt1jw  --namespace=kube-system

time="2016-12-26T20:14:29Z" level=info msg="Starting prometheus (version=1.4.1, branch=master, revision=2a89e8733f240d3cd57a6520b52c36ac4744ce12)" source="main.go:77"
time="2016-12-26T20:14:29Z" level=info msg="Build context (go=go1.7.3, user=root@e685d23d8809, date=20161128-09:59:22)" source="main.go:78"
time="2016-12-26T20:14:29Z" level=info msg="Loading configuration file /usr/local/etc/prometheus/prometheus.yml" source="main.go:250"
time="2016-12-26T20:14:29Z" level=info msg="Loading series map and head chunks..." source="storage.go:354"
time="2016-12-26T20:14:29Z" level=info msg="2838 series loaded." source="storage.go:359"
time="2016-12-26T20:14:29Z" level=info msg="Starting target manager..." source="targetmanager.go:63"
time="2016-12-26T20:14:29Z" level=info msg="Listening on :9090" source="web.go:248"
time="2016-12-26T20:14:29Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:29Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:30Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:30Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:31Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:31Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
 
and the config file is:

scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: https
    kubernetes_sd_configs:
    - role: node



  # scrape from node_exporter running on all nodes
  - job_name: 'node-exporters'
    scheme: https
    kubernetes_sd_configs:
    - role: pod

    # You can specify the following annotations (on pods):
    #   prometheus.io.scrape: true - scrape this pod
    #   prometheus.io.port - scrape this port
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_namespace, __meta_kubernetes_pod_label_name]
      separator: '/'
      target_label: job
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node

Does this mean that it requires some credentials? I followed the instructions at http://kubernetes.io/docs/getting-started-guides/docker-multinode/ (ran master.sh on the master node and worker.sh on the worker nodes).

Thanks,
SK
 

SK

Dec 26, 2016, 3:44:22 PM
to frederic...@coreos.com, Prometheus Users
Hi,

On Fri, Dec 23, 2016 at 5:38 AM, <frederic...@coreos.com> wrote:
Depending on how your cluster is configured, you likely need a bearer token as well. Do the logs say anything?

[Sarat] To set up the cluster, I followed the instructions at http://kubernetes.io/docs/getting-started-guides/docker-multinode/. The logs show that Prometheus is failing to get the pod and node lists from the API server.

...
...
time="2016-12-26T20:14:47Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:47Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:48Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
time="2016-12-26T20:14:48Z" level=error msg="github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused" component="kube_client_runtime" source="kubernetes.go:73"
...
...

Looking at the manifests, it looks like the API server is deployed with the following options:

    {
      "name": "apiserver",
      "image": "gcr.io/google_containers/hyperkube-amd64:v1.5.1",
      "command": [
        "/hyperkube",
        "apiserver",
        "--service-cluster-ip-range=10.0.0.1/24",
        "--insecure-bind-address=0.0.0.0",
        "--etcd-servers=http://127.0.0.1:2379",
        "--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota",
        "--client-ca-file=/srv/kubernetes/ca.crt",
        "--basic-auth-file=/srv/kubernetes/basic_auth.csv",
        "--min-request-timeout=300",
        "--tls-cert-file=/srv/kubernetes/server.cert",
        "--tls-private-key-file=/srv/kubernetes/server.key",
        "--token-auth-file=/srv/kubernetes/known_tokens.csv",
        "--allow-privileged=true",
        "--v=2"
      ],
      "volumeMounts": [
        {
          "name": "data",
          "mountPath": "/srv/kubernetes"
        }
      ]
    },

and I see the following in the pod where the API server is running:

root@k8-master:/srv/kubernetes# cat basic_auth.csv
admin,admin,admin

root@k8-master:/srv/kubernetes# cat known_tokens.csv
0bApr0zB1yRry0WOUQqoC8qth8VJACzJ,admin,admin
7YoDgG4hRXOx3AWMqCGyud6vSXHMHr0S,kubelet,kubelet
edzMYIJazrYl7W7q7ZR2RSZIOm288xXD,kube_proxy,kube_proxy
root@k8-master:/srv/kubernetes#

Do I need a bearer token here given this config? If so, are there any instructions on how to create one? I haven't done that before. If I were to use basic_auth, what username/password should I be trying?
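
For reference, my reading of the kubernetes_sd_configs docs is that the two auth options would be spelled roughly like this (credentials guessed from the files above, assuming the file formats are token,user,uid and password,user,uid; I haven't tried either yet):

    kubernetes_sd_configs:
    - role: pod
      api_server: 'https://10.0.0.1:443'
      tls_config:
        insecure_skip_verify: true
      # option 1: a bearer token taken from known_tokens.csv
      bearer_token: 0bApr0zB1yRry0WOUQqoC8qth8VJACzJ
      # option 2: basic auth taken from basic_auth.csv (use one or the other, not both)
      # basic_auth:
      #   username: admin
      #   password: admin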

Also, I checked how the kubelet was running on one of the worker nodes and found:

/hyperkube kubelet --allow-privileged --api-servers=http://192.168.56.101:8080 --config=/etc/kubernetes/manifests-multi --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --hostname-override=192.168.56.103 --v=2

Here the master's IP is 192.168.56.101 (the worker's IP is 192.168.56.103), and the kubelet is using http. These are the hosts' IP addresses, so I am not hard-coding them in the Prometheus config (they don't belong to this namespace). Instead I am trying 10.0.0.1:443 (the output of kubectl get services shows this is where the kubernetes service is running), or no IP at all to let Prometheus discover it in-cluster, as Matthias was suggesting.

Thanks,
SK
