Fluctuating prometheus_sd_discovered_targets for Consul


david...@gmail.com

May 9, 2019, 11:32:43 AM
to Prometheus Users

prometheus_sd_discovered_targets{config=~"consul.*"}

[Attachment: Screenshot from 2019-05-09 17-15-48.png]


This metric is constantly fluctuating and I am not sure what it means.

I have tried running a script that counts nodes from Consul via the catalog services and catalog service endpoints, but the number of nodes was constant at 754 (the same as the upper value of prometheus_sd_discovered_targets).

There are also no missing metrics in Prometheus.

Can someone help me understand what this metric actually means and why it is jumping up and down seemingly randomly?
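
For reference, the counting script was roughly this sketch (illustrative only: the Consul address and the use of the requests library are my simplification, but /v1/catalog/services and /v1/catalog/service/<name> are the standard Consul catalog endpoints):

import requests

CONSUL = "http://consul-prod:8500"  # illustrative address

# /v1/catalog/services lists all service names;
# /v1/catalog/service/<name> lists every registered instance of that service
services = requests.get(f"{CONSUL}/v1/catalog/services").json()
total = sum(
    len(requests.get(f"{CONSUL}/v1/catalog/service/{name}").json())
    for name in services
)
print(total)  # stayed constant at 754 on every run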

Simon Pasquier

May 10, 2019, 9:50:54 AM
to david...@gmail.com, Prometheus Users
Which version of Prometheus do you run?
IIRC there's been an issue with Consul SD that got fixed in Prometheus 2.9 when we bumped the Consul client library.


david...@gmail.com

May 13, 2019, 10:31:17 AM
to Prometheus Users
We are running Prometheus 2.9.2 and Consul 1.4.4.


Simon Pasquier

May 13, 2019, 11:18:06 AM
to david...@gmail.com, Prometheus Users
Can you share your Prometheus config?
What does prometheus_sd_consul_rpc_failures_total look like?

prometheus_sd_discovered_targets reports the current number of targets discovered per scrape configuration. If the number of services/nodes is constant, then the metric should be constant too.
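
For example, a query like this (plain PromQL, nothing specific to your setup) should show spikes around the dips in the target count if Consul RPCs are failing:

rate(prometheus_sd_consul_rpc_failures_total[5m])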


david...@gmail.com

May 13, 2019, 11:47:39 AM
to Prometheus Users
prometheus_sd_consul_rpc_failures_total has only risen by 2 in the last 4 days (from 1100 to 1102).
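
For example, a query along these lines over that window shows the increase:

increase(prometheus_sd_consul_rpc_failures_total[4d])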


global:
  scrape_interval: 30s
  evaluation_interval: 30s

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - ${join("\n      - ", split(",", alertmanagers))}

rule_files:
- "rules/*.yml"

scrape_configs:
- job_name: 'prometheus'
  static_configs:
  - targets: ['localhost:9090']

- job_name: consul discovery service exporter
  sample_limit: 2000
  consul_sd_configs:
  - server: consul-prod:8500
    allow_stale: true
  - server: consul-test:8500
    allow_stale: true

  relabel_configs:
  # ignore services tagged prometheus-io-ignore
  - source_labels:
    - __meta_consul_tags
    regex: '.*,prometheus-io-ignore,.*'
    action: drop

  # require all node metadata to be present
  - source_labels:
    - __meta_consul_metadata_team
    - __meta_consul_metadata_environment
    - __meta_consul_metadata_service
    - __meta_consul_metadata_component
    regex: '[a-z]{2};[a-z]+;[a-z]{2,3}-[a-z0-9]{2,8};[a-z0-9-]+'
    action: keep

  # override port via service tag prometheus-io-port-<port>
  - source_labels:
    - __meta_consul_tags
    regex: '.*,prometheus-io-port-([1-9][0-9]{0,4}),.*'
    target_label: __meta_consul_service_port

  # override port via service metadata
  - source_labels:
    - __meta_consul_service_metadata_prometheus_io_port
    regex: '([1-9][0-9]{0,4})'
    target_label: __meta_consul_service_port

  # require a non-zero service port
  - source_labels:
    - __meta_consul_service_port
    regex: '0'
    action: drop

  # set the scrape address to <node>:<port>
  - source_labels:
    - __meta_consul_node
    - __meta_consul_service_port
    target_label: __address__
    regex: '(.+);(.+)'
    replacement: "$1:$2"

  # set instance to the full address with port
  - source_labels:
    - __address__
    target_label: instance

  # create the job name as <service>-<component>
  - source_labels:
    - __meta_consul_metadata_service
    - __meta_consul_metadata_component
    target_label: job
    regex: '(.+);(.+)'
    replacement: "$1-$2"

  # metadata formats were already validated by the keep rule above
  - source_labels:
    - __meta_consul_metadata_environment
    target_label: environment
  - source_labels:
    - __meta_consul_metadata_team
    target_label: team
  - source_labels:
    - __meta_consul_metadata_service
    target_label: service
  - source_labels:
    - __meta_consul_metadata_component
    target_label: component

Simon Pasquier

May 14, 2019, 4:19:41 AM
to david...@gmail.com, Prometheus Users
Do you have services with the same names in consul-prod:8500 and consul-test:8500? Looking at the code, it might be that Prometheus overwrites the targets in that case.

david...@gmail.com

May 14, 2019, 4:42:58 AM
to Prometheus Users
It seems you were absolutely correct.
I just duplicated the SD into separate scrape configs for test and prod, and the two series match the upper and lower values of prometheus_sd_discovered_targets respectively.
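
Roughly like this (the job names here are illustrative; the relabel rules are unchanged from the config above and elided):

scrape_configs:
- job_name: consul discovery service exporter prod
  sample_limit: 2000
  consul_sd_configs:
  - server: consul-prod:8500
    allow_stale: true
  relabel_configs:
  # ... same relabel rules as before ...

- job_name: consul discovery service exporter test
  sample_limit: 2000
  consul_sd_configs:
  - server: consul-test:8500
    allow_stale: true
  relabel_configs:
  # ... same relabel rules as before ...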

thank you

david...@gmail.com

May 14, 2019, 5:57:53 AM
to Prometheus Users

[Attachment: Screenshot from 2019-05-14 11-54-34.png]

Now it works, although even before the change it was discovering all nodes correctly.

Thank you.