Not all of the sharded Prometheus instances scrape their targets successfully


Jiaxu LIU

Mar 13, 2020, 11:18:57 PM
to Prometheus Users

Hello Prom experts,


I am trying to use the Prometheus sharding feature, but I cannot query metrics from all of the leaf Proms. Is this a bug or a configuration issue?

Could you please help explain? Thanks a lot in advance!

The Prometheus image is prom/prometheus:v2.13.1.



The setup is as below: 4 containers are running on a single host, with 3 leaf Proms listening on 9090, 9091, and 9092 respectively, and one federated Prom listening on 19090.

The idea is that the node exporter's data (localhost:9100/metrics) is split into three shards, which are scraped by Prom1, Prom2, and Prom3, and then all the metrics are federated in the FederatedProm container. This is a very typical use case for Prom sharding.

Prom1 (localhost:9090) ----->
Prom2 (localhost:9091) -----> FederatedProm (localhost:19090)
Prom3 (localhost:9092) ----->

And a node exporter is listening on localhost:9100 as metric data source.

Each leaf Prom's configuration is as below:

Prom1:

global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 10s
  external_labels:
    slave: 0
scrape_configs:
  - job_name: 'local_prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9100']
    relabel_configs:
    - source_labels: [__address__]
      modulus:       3    # 3 slaves
      target_label:  __tmp_hash
      action:        hashmod
    - source_labels: [__tmp_hash]
      regex:         ^0$  # This is the 1st  slave
      action:        keep

Prom2:

global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 10s
  external_labels:
    slave: 1
scrape_configs:
  - job_name: 'local_prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9100']
    relabel_configs:
    - source_labels: [__address__]
      modulus:       3    # 3 slaves
      target_label:  __tmp_hash
      action:        hashmod
    - source_labels: [__tmp_hash]
      regex:         ^1$  # This is the 2nd slave
      action:        keep

Prom3:

global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 10s
  external_labels:
    slave: 2
scrape_configs:
  - job_name: 'local_prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9100']
    relabel_configs:
    - source_labels: [__address__]
      modulus:       3    # 3 slaves
      target_label:  __tmp_hash
      action:        hashmod
    - source_labels: [__tmp_hash]
      regex:         ^2$  # This is the third slave
      action:        keep
=================================

Prom1's console:

[screenshot: Prom1's Targets page; the node_exporter target is listed under "Dropped"]

See the "Dropped" state; it is very confusing...

Prom2's console:

[screenshot: Prom2's Targets page; the node_exporter target is active]

Prom3's console:

[screenshot: Prom3's Targets page; no active node_exporter target]

So my question is: why do I find an active target for the node_exporter job ONLY in the console of Prom2, while there is no active target for that job in the consoles of Prom1 and Prom3?

My federated Prom's config is as below:

global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: 'federated_node_exporter'
    # metrics_path defaults to '/metrics'
    metrics_path: /federate
    # scheme defaults to 'http'.
    honor_labels: true
    honor_timestamps: true
    params:
      match[]:
        - node_disk_reads_completed_total
        - node_cpu_seconds_total
    static_configs:
    - targets:
      - 'localhost:9090'
      - 'localhost:9091'
      - 'localhost:9092'
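(For reference, each scrape this federation job performs is just an HTTP GET against /federate with one repeated match[] query parameter per matcher. A quick sketch of the URLs being requested, using only the Python standard library; the helper name is ours, not part of any Prometheus API:)

```python
from urllib.parse import urlencode

def federate_url(target: str, matchers: list[str]) -> str:
    """Build the /federate URL that a federating Prometheus scrapes."""
    # urlencode with a list of pairs repeats the match[] key once per matcher.
    query = urlencode([("match[]", m) for m in matchers])
    return f"http://{target}/federate?{query}"

# One URL per leaf Prom, mirroring the static_configs above.
for target in ["localhost:9090", "localhost:9091", "localhost:9092"]:
    print(federate_url(target, ["node_disk_reads_completed_total",
                                "node_cpu_seconds_total"]))
```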

And why do I see metrics from ONLY Prom2, but NO metrics from Prom1 and Prom3?

[screenshot: the federated query result, showing series from Prom2 only]

Thanks a lot in advance!

Best Regards
Casper Liu

Brian Candler

Mar 14, 2020, 6:20:24 AM
to Prometheus Users
It's a sharded config. That means each target is scraped by only one of the three nodes (or, put another way, each node scrapes only one third of the targets it is given).

All three nodes have the same config with a single target:

  - job_name: 'node_exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9100']

But the string "localhost:9100" always hashes to the same value, so only the one node whose keep regex matches that hash will process it; the other two drop the target.
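To see why, the hashmod action can be approximated in Python. This roughly mirrors Prometheus's relabel implementation (an MD5 of the joined source label values, folded into a 64-bit integer, then taken modulo the configured modulus), but treat it as an illustrative sketch, not the authoritative code:

```python
import hashlib

def hashmod(value: str, modulus: int) -> int:
    """Approximate Prometheus's 'hashmod' relabel action:
    MD5 the label value, fold the digest into a 64-bit int, take the modulus."""
    digest = hashlib.md5(value.encode()).digest()
    s = 0
    for i, b in enumerate(digest):
        # Fold all 16 digest bytes into one 64-bit value.
        s |= b << (56 - 8 * (i % 8))
    return s % modulus

# Every shard computes the same number for the same target string,
# so exactly one of the three keep regexes (^0$, ^1$, ^2$) matches.
print(hashmod("localhost:9100", 3))
```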

You should use your hashmod config for a big list of remote targets.  You could either include all three nodes as named targets (in which case they'll all be scraped, but not necessarily by themselves); or you can have a separate job for scraping localhost which *doesn't* use the hashmod.
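A minimal sketch of that second option: the local node exporter gets its own unsharded job, while hashmod is reserved for a large list of remote targets (the fleet hostnames below are hypothetical placeholders):

```yaml
scrape_configs:
  # Unsharded: this leaf always scrapes its local node exporter.
  - job_name: 'node_exporter_local'
    static_configs:
    - targets: ['localhost:9100']

  # Sharded: hashmod splits a large fleet of targets across the 3 leaves.
  - job_name: 'node_exporter_fleet'
    static_configs:
    - targets: ['host-a:9100', 'host-b:9100', 'host-c:9100']  # hypothetical
    relabel_configs:
    - source_labels: [__address__]
      modulus:       3
      target_label:  __tmp_hash
      action:        hashmod
    - source_labels: [__tmp_hash]
      regex:         ^0$   # 0 on the 1st leaf, 1 on the 2nd, 2 on the 3rd
      action:        keep
```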