Blackbox exporter not working properly. Need urgent help.


Pooja Chauhan

Jul 10, 2020, 9:13:21 AM
to Prometheus Users
Hi,
I have added URLs for blackbox monitoring in Prometheus, along with an alert that fires if a URL has not returned a 302 redirect for 5 minutes. The blackbox exporter now constantly reports a value of 0 even though the URLs still return 302, so I keep getting false alerts. Can someone please help me with this? It was working fine for the past few days, but today this issue started.

prometheus.yml

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_3xx] # Look for a HTTP 302 response.
    static_configs:
      - targets:
        - https://..........

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115 # blackbox exporter

alertmanager.yml configuration:


  - receiver: Monitoring
    group_wait: 15m
    match_re:
      severity: url-down

alert.rules.yml:


  - alert: URL-Monitoring
    expr: (sum(probe_success{job="blackbox"}) by (instance)) == 0
    for: 5m
    labels:
      severity: url-down
    annotations:
      summary: Health API endpoint down for '{{ $labels.instance }}'





Brian Candler

Jul 10, 2020, 2:48:48 PM
to Prometheus Users
You haven't shown the most important bit, which is your blackbox_exporter config where you've defined the http_3xx module.

Also, show the result if you scrape blackbox_exporter manually using curl:

curl 'localhost:9115/probe?target=https://foo.example.com&module=http_3xx'
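
To compare runs quickly, the few metrics that decide pass/fail can be pulled out of that output. A minimal sketch, assuming a POSIX shell with awk; `probe_summary` is a hypothetical helper name, not part of blackbox_exporter:

```shell
# Hypothetical helper: read blackbox_exporter probe output on stdin and
# print only the metrics that show whether (and how) the probe failed.
probe_summary() {
  awk '$1 == "probe_success" ||
       $1 == "probe_http_status_code" ||
       $1 == "probe_duration_seconds" { print $1 "=" $2 }'
}
```

It could be used as `curl -s 'localhost:9115/probe?target=https://foo.example.com&module=http_3xx' | probe_summary`. A status code of 0 together with probe_success=0 would suggest that no HTTP response was received at all, rather than a response with an unexpected code.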

Pooja Chauhan

Jul 11, 2020, 4:21:25 AM
to Prometheus Users
blackbox.yml:

modules:
  http_2xx:
    prober: http
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: []  # Defaults to 2xx
      method: GET
  http_3xx:
    prober: http
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: [302, 304, 301]  # Overrides the 2xx default
      no_follow_redirects: true
      method: GET

When I run a manual check with debug enabled, the response is slow in some cases, and sometimes probe_success is 0 even though the URL is up.

$ curl -s "localhost:9115/probe?debug=true&target=https://foo.example.com&module=http_3xx" | grep -v \#
Logs for the probe:
ts=2020-07-11T08:14:45.652556074Z caller=main.go:304 module=http_3xx target=https://foo.example.com level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2020-07-11T08:14:45.652666819Z caller=http.go:323 module=http_3xx target=https://foo.example.com level=info msg="Resolving target address" ip_protocol=ip6
ts=2020-07-11T08:14:45.652924988Z caller=http.go:323 module=http_3xx target=https://foo.example.com level=info msg="Resolved target address" ip=10.50.200.70
ts=2020-07-11T08:14:45.652973246Z caller=client.go:252 module=http_3xx target=https://foo.example.com level=info msg="Making HTTP request" url=https://0.0.00.0 host=https://foo.example.com
ts=2020-07-11T08:16:45.152652862Z caller=main.go:119 module=http_3xx target=https://foo.example.com level=error msg="Error for HTTP request" err="Get \"https://0.0.00.0\": context deadline exceeded"
ts=2020-07-11T08:16:45.152700889Z caller=main.go:119 module=http_3xx target=https://foo.example.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2020-07-11T08:14:45.653032203Z dnsDone=2020-07-11T08:14:45.653032203Z connectDone=2020-07-11T08:16:45.152647141Z gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z end=0001-01-01T00:00:00Z
ts=2020-07-11T08:16:45.152728204Z caller=main.go:304 module=http_3xx target=https://foo.example.com level=error msg="Probe failed" duration_seconds=119.500129397



Metrics that would have been returned:
probe_dns_lookup_time_seconds 0.000264448
probe_duration_seconds 119.500129397
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0.000264448
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 1.699831243e+09
probe_ip_protocol 4
probe_success 0



Module configuration:
prober: http
http:
    valid_status_codes:
        - 302
        - 304
        - 301
    valid_http_versions:
        - HTTP/1.1
        - HTTP/2
    ip_protocol_fallback: true
    no_follow_redirects: true
    method: GET
tcp:
    ip_protocol_fallback: true
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

Brian Candler

Jul 11, 2020, 5:01:02 AM
to Prometheus Users
I don't know why you showed two different configurations for blackbox exporter, or why you say "Metrics that would have been returned" - either they were, or they weren't.

It appears to be failing to connect - 120 seconds is a very long deadline.
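
The timeout_seconds=119.5 in the log is consistent with the exporter falling back to a 120-second default when the request carries no X-Prometheus-Scrape-Timeout-Seconds header (a manual curl does not send one) and then subtracting a 0.5-second offset. Treat both numbers as assumptions inferred from this log, not as guaranteed defaults for every version. A small sketch of that arithmetic, with `effective_timeout` as a hypothetical helper:

```shell
# Hypothetical helper: effective probe deadline in seconds.
# Assumptions: 120 s fallback when no scrape-timeout header is sent, and a
# 0.5 s offset, both inferred from timeout_seconds=119.5 in the log.
# A lower module-level timeout, if configured, is ignored by this sketch.
effective_timeout() {
  awk -v t="${1:-120}" -v off="${2:-0.5}" 'BEGIN { print t - off }'
}

effective_timeout      # no header, as with a manual curl: prints 119.5
effective_timeout 10   # header carrying a 10 s scrape timeout: prints 9.5
```

To reproduce what Prometheus itself would see, the header can be passed by hand, for example `curl -s -H 'X-Prometheus-Scrape-Timeout-Seconds: 10' 'localhost:9115/probe?debug=true&target=https://foo.example.com&module=http_3xx'`, where 10 is a stand-in for the job's scrape_timeout.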

I suggest you:

1. Try making a direct connection yourself, from the same machine where blackbox_exporter is running:

openssl s_client -connect foo.example.com:443 -servername foo.example.com
GET / HTTP/1.0

and/or just


2. Use tcpdump to look at the traffic between the blackbox_exporter node and the target host.  If you're seeing only outbound TCP SYNs then it's some sort of firewall or routing problem.
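
A capture for step 2 could be sketched like this. It assumes the target is the address from the "Resolved target address" log line (10.50.200.70) and that the service listens on port 443 because the URL is https; adjust both if your setup differs.

```shell
# Assumption: address taken from the "Resolved target address" log line.
TARGET_IP=10.50.200.70
# Assumption: port 443, since the probed URL uses https.
FILTER="host ${TARGET_IP} and tcp port 443"
echo "$FILTER"

# Run on the blackbox_exporter node (needs root), then trigger a probe
# from another terminal while the capture is running:
# sudo tcpdump -ni any "$FILTER"
```

If the capture shows repeated outbound SYN packets and nothing coming back, that matches the connect phase hanging until the probe deadline.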
