Incorrect number of worker nodes showing in Grafana

Muhammad Usman

Aug 15, 2022, 5:33:55 PM
to Prometheus Users
Hi everyone,

I installed Prometheus and Grafana on an AWS EKS cluster, and in Grafana I noticed that the wrong number of worker nodes is shown.
When I run "kubectl get nodes" I see only 40 worker nodes in the cluster, but the Grafana query "sum(kube_node_info{node=~"$node"})" shows 160. What could be the reason? Below are my Prometheus config settings:

- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: node
- job_name: kubernetes-pods
  metrics_path: /actuator/prometheus
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: '([^:]+)(?::\d+)?;(\d+)'
      replacement: '$1:$2'
      target_label: __address__
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
      action: replace
      target_label: service

Brian Candler

Aug 16, 2022, 2:53:02 AM
to Prometheus Users
Go into Prometheus's own web interface and enter the query:
kube_node_info{node=~".+"}

Do you get 160 results, all with value 1?
If so, inspecting the series and their labels should make it clear why you have four per node.

It may be obvious from a simple visual inspection. Otherwise, to drill down further you can run queries that group them by any label of your choice, e.g.
 
count by (node) (kube_node_info)

will show you all the unique values of the "node" label and how many instances there are of each.
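Two more queries in the same vein may help narrow it down (a sketch; the numbers you see depend on your setup, and it assumes the "node" label on these series still holds the node name reported by kube-state-metrics). The first lists only the node values that have more than one series; the second counts the distinct node values, which should come out at 40 if the extra 120 series are duplicates. Enter each line as a separate query:

```promql
count by (node) (kube_node_info) > 1
count(count by (node) (kube_node_info))
```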

I notice that you are generating the "node" label from the node name of a *pod* (__meta_kubernetes_pod_node_name is the node that the scraped pod is running on):

            - source_labels: [__meta_kubernetes_pod_node_name]
              action: replace
              target_label: node

So it might be that you're actually asking Kubernetes for a list of pods or services, rather than nodes.
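One way to check is to group by the labels that say where each series was scraped from. As a sketch: if the same kube-state-metrics endpoint is being picked up by both of the jobs in your config, or through more than one service port, you would expect several job/instance pairs each reporting a full set of nodes. Also note that with the default honor_labels: false, a target label such as "node" that clashes with a label exposed by the target takes precedence, and the exposed value is kept as "exported_node", so that label is worth looking at as well. Run each query separately:

```promql
# How many kube_node_info series does each scrape job and target contribute?
count by (job, instance) (kube_node_info)

# Is kube-state-metrics' own node label being renamed to exported_node?
count by (node, exported_node) (kube_node_info)
```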

In the end, I'd say this is really a Kubernetes question, not a Prometheus question. But you can use PromQL queries to explore the metrics you're getting back from Kubernetes.
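If you just need the dashboard panel to show the right figure while the duplicate scraping is being tracked down, one option is to count distinct node values rather than summing every series. A sketch, assuming a Prometheus release recent enough to have the group aggregator and that "node" carries the node name:

```promql
count(group by (node) (kube_node_info{node=~"$node"}))
```

This only hides the symptom in one panel: any other query over kube-state-metrics series will still be inflated until the extra scrape targets are dropped.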