I'm getting a lot of downtime alerts, but I don't know how to interpret them or how to dig deeper.
Here is my setup:
blackbox.yml:
modules:
  http:
    prober: http
    timeout: 60s
prometheus.rules:
...
ALERT service_down
  IF probe_success == 0
  FOR 15m
prometheus.yml:
...
  - job_name: 'blackbox-exporter'
    metrics_path: /probe
    scrape_interval: 1m
    params:
      module: ['http']
    static_configs:
      - targets: # targets to be tested by the blackbox exporter
        - https://domain1.com/
        - https://domain2.com/
    relabel_configs:
      - source_labels: [__address__] # set param 'target' to the original target
        regex: (.*)
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target] # set label 'instance' to it as well
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: [] # set __address__ to the blackbox exporter
        regex: .*
        target_label: __address__
        replacement: blackbox_exporter:9115
...
I keep getting alerted about “domain1.com”. The Slack alert says,
[FIRING:1] service_down (https://domain1.com/ blackbox-exporter)
However I never observe an actual issue with it. If I immediately open my browser to domain1.com everything is fine.
Where do I go from here?
- When does blackbox_exporter try the endpoints: when Prometheus scrapes it, synchronously?
- How can I determine what the cause was? Did it error out (and if so, what was the status code? I guess the response body is asking too much)? Or did it time out (I guess that could only have a boolean answer, though it would be nice to know more specifically how long the too-long responses are)?
- Is this the right tool for the job or is there some other tool I should use together with or instead of this?
Thanks.
- Is this the right tool for the job or is there some other tool I should use together with or instead of this?
This is the correct tool.
I get back these. Now what? What does each of them mean?
ALERTS{alertname="service_down",alertstate="firing",instance="https://domain1.com/",job="blackbox-exporter"}
probe_http_redirects{instance="https://domain1.com/",job="blackbox-exporter"}
probe_http_content_length{instance="https://domain1.com/",job="blackbox-exporter"}
scrape_duration_seconds{instance="https://domain1.com/",job="blackbox-exporter"}
probe_http_redirects{instance="https://domain2/",job="blackbox-exporter"}
probe_http_content_length{instance="https://domain2/",job="blackbox-exporter"}
probe_duration_seconds{instance="https://domain2/",job="blackbox-exporter"}
probe_ssl_earliest_cert_expiry{instance="https://domain2/",job="blackbox-exporter"}
probe_success{instance="https://domain1.com/",job="blackbox-exporter"}
probe_http_ssl{instance="https://domain2/",job="blackbox-exporter"}
probe_http_ssl{instance="https://domain1.com/",job="blackbox-exporter"}
scrape_samples_scraped{instance="https://domain2/",job="blackbox-exporter"}
probe_http_status_code{instance="https://domain2/",job="blackbox-exporter"}
probe_http_status_code{instance="https://domain1.com/",job="blackbox-exporter"}
up{instance="https://domain2/",job="blackbox-exporter"}
scrape_samples_post_metric_relabeling{instance="https://domain1.com/",job="blackbox-exporter"}
scrape_samples_scraped{instance="https://domain1.com/",job="blackbox-exporter"}
probe_duration_seconds{instance="https://domain1.com/",job="blackbox-exporter"}
probe_success{instance="https://domain2/",job="blackbox-exporter"}
probe_ip_protocol{instance="https://domain1.com/",job="blackbox-exporter"}
scrape_samples_post_metric_relabeling{instance="https://domain2/",job="blackbox-exporter"}
probe_ip_protocol{instance="https://domain2/",job="blackbox-exporter"}
scrape_duration_seconds{instance="https://domain2/",job="blackbox-exporter"}
up{instance="https://domain1.com/",job="blackbox-exporter"}
Is this what you mean?
(Obviously domain1.com is substituted for the real domain.)
$ sudo docker-compose exec blackbox_exporter wget -O - -q 'http://localhost:9115/probe?module=http&target=https://domain1.com'
probe_ip_protocol 4
probe_http_status_code 0
probe_http_content_length 0
probe_http_redirects 1
probe_http_ssl 0
probe_duration_seconds 0.022083
probe_success 0
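For repeating this manual check from a script instead of wget, a minimal Python sketch follows. It assumes only what the command above shows: the exporter answers on port 9115 and takes `module` and `target` query parameters on `/probe`. The function names `probe_url` and `run_probe` and the 70-second client timeout are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def probe_url(exporter, module, target):
    """Build the /probe URL for one target (same shape as the wget command)."""
    # urlencode percent-encodes the target so it survives as one query value.
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


def run_probe(exporter, module, target, timeout=70):
    """Request a probe of the target and return the raw metrics text."""
    # The client timeout is an assumption; it is set above the 60s module
    # timeout from this thread so the exporter gets a chance to answer first.
    with urlopen(probe_url(exporter, module, target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `run_probe("localhost:9115", "http", "https://domain1.com")` from inside the exporter container should return the same kind of text as the wget output above.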
On 3 July 2017 at 07:16, Naftoli Gugenheim <nafto...@gmail.com> wrote:
Is this what you mean?

That's an older version, but it looks like correcting to the 2nd server after the redirect failed.
On 3 July 2017 at 10:43, Brian Brazil <brian....@robustperception.io> wrote:
That's an older version, but it looks like correcting to the 2nd server after the redirect failed.

Connecting, not correcting.
Brian
--Brian Brazil--
You received this message because you are subscribed to a topic in the Google Groups "Prometheus Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/prometheus-users/RZUlIh9UF-Q/unsubscribe.
To unsubscribe from this group and all its topics, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAHJKeLrK78YaXa2aEo4h3zEPSyhoVCC7554jcZkJkjkJm8nzyw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
On 3 July 2017 at 10:43, Brian Brazil <brian....@robustperception.io> wrote:
That's an older version,

Well apparently it's the latest version pushed to the docker image prom/blackbox-exporter:latest. It isn't pulling anything newer (I tried before posting).

but it looks like correcting to the 2nd server after the redirect failed.
Connecting, not correcting.

What do you mean "2nd"?
Also that still doesn't answer the question. How did it fail? What was the failure mode? How do I get more information?
Like I said, it works fine for me in the browser, so on a practical level it's a false positive.
Also I still don't have the answer to the larger question. What are the metrics that can be returned (are they always the same or not? how would one know?), and what do they mean (what are the possible values and what do they represent)?
On 3 July 2017 at 20:43, Naftoli Gugenheim <nafto...@gmail.com> wrote:
Well apparently it's the latest version pushed to the docker image prom/blackbox-exporter:latest. It isn't pulling anything newer (I tried before posting).

Ah, I thought I'd already released that. 0.6.0 was just released with these changes.

What do you mean "2nd"?

probe_http_redirects is 1, so there was one redirect followed. This 2nd request failed.

Also that still doesn't answer the question. How did it fail? What was the failure mode? How do I get more information?

The log messages are your best option. There are plans for better diagnostics in future, but there are limits to what can be done with metrics for this sort of thing.
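The reading of the probe output given above (one redirect followed, then the second request failed) can be sketched in Python. This is only an illustration: `parse_metrics` and `summarize` are hypothetical helpers, the parser handles only plain `name value` lines like the ones pasted in this thread, and treating a status code of 0 as "no HTTP response was received" is an assumption based on that output, not a documented guarantee.

```python
def parse_metrics(text):
    """Parse plain 'name value' lines; skip blank lines and '#' comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


def summarize(metrics):
    """Turn the probe metrics into a one-line, human-readable hint."""
    if metrics.get("probe_success") == 1:
        return "probe succeeded"
    redirects = int(metrics.get("probe_http_redirects", 0))
    status = int(metrics.get("probe_http_status_code", 0))
    if status == 0:
        # Assumption: 0 means the last request never produced a response.
        return (f"no HTTP response after following {redirects} redirect(s); "
                "check the exporter logs")
    return f"HTTP status {status} after {redirects} redirect(s)"


# Sample copied from the probe output earlier in this thread.
sample = """probe_ip_protocol 4
probe_http_status_code 0
probe_http_content_length 0
probe_http_redirects 1
probe_http_ssl 0
probe_duration_seconds 0.022083
probe_success 0"""

print(summarize(parse_metrics(sample)))
```

The hint only narrows down where to look. As noted above, the exporter's log messages remain the place to find the actual error.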