Hi there!
I have been racking my brain over this for two days now and hope somebody can point out what I am missing.
I have a Prometheus setup running on AKS 1.19.9 with the following images:
bitnami/prometheus-operator:0.48.1
bitnami/prometheus:2.25.2
blackbox-exporter:0.18.0
Use case:
I want to detect when any of the following DNS servers stops responding:
10.25.22.15
10.25.33.15
10.105.12.21
Problem:
I have the job running and can see metrics being scraped.
I cannot figure out how to distinguish between the metrics of the three DNS servers listed above.
I was expecting a separate time series for each of the three servers, so that I can tell when the probe against one of them is failing. At this stage, I can see only one time series.
I suspect that my relabeling skills are letting me down...
Config:
I have the following definition in my prometheus-config.yaml:
kind: Probe
metadata:
  name: bbexporterdns
  namespace: monitoring
  labels:
    prometheus: prometheus
    role: probes
spec:
  jobName: bbexporter_dns
  prober:
    url: bbexporter.monitoring.svc:9115
    scheme: http
    path: /probe
  module: dns_tcp
  targets:
    staticConfig:
      static:
        - '10.25.22.15'
        - '10.25.33.15'
        - '10.105.12.21'
      relabelingConfigs:
        - sourceLabels: [__address__]
          targetLabel: __param_target
        - sourceLabels: [__param_target]
          targetLabel: instance
---
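To make the intent of the two relabeling rules concrete, here is a rough Python model of what they should do to each static target. It is only a sketch of the default `replace` action (join the source labels, match against a fully anchored regex, write the replacement into the target label), not Prometheus's actual implementation. The `relabel` helper is hypothetical, and it uses Python's `\1` where Prometheus writes `$1`.

```python
import re

def relabel(labels, rules):
    """Apply simplified 'replace' relabel rules to one target's label set."""
    out = dict(labels)
    for rule in rules:
        # Join the source label values; Prometheus uses ';' as the default separator.
        value = rule.get("separator", ";").join(
            out.get(name, "") for name in rule["sourceLabels"]
        )
        # The regex is matched against the whole value; the default '(.*)' matches anything.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if match is None:
            continue  # rule does not apply, labels stay unchanged
        # Default replacement is the first capture group.
        out[rule["targetLabel"]] = match.expand(rule.get("replacement", r"\1"))
    return out

# The two rules from the Probe definition.
rules = [
    {"sourceLabels": ["__address__"], "targetLabel": "__param_target"},
    {"sourceLabels": ["__param_target"], "targetLabel": "instance"},
]

targets = ["10.25.22.15", "10.25.33.15", "10.105.12.21"]
results = [relabel({"__address__": t}, rules) for t in targets]
instances = [r["instance"] for r in results]
print(instances)  # ['10.25.22.15', '10.25.33.15', '10.105.12.21']
```

Under this model each target ends up with its own `instance` label, which is why three separate time series would be the expected result.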
In the bbexporter.yaml I have the following configured:
apiVersion: v1
kind: ConfigMap
metadata:
  name: bbexporter-cfg
  namespace: monitoring
data:
  bbexporter.yml: |
    modules:
      dns_tcp:
        prober: dns
        dns:
          preferred_ip_protocol: "ip4"
          transport_protocol: "tcp"
          tls_config:
            insecure_skip_verify: true
Current results:
I can see the following metrics:
# HELP probe_dns_additional_rrs Returns number of entries in the additional resource record list
# TYPE probe_dns_additional_rrs gauge
probe_dns_additional_rrs 0
# HELP probe_dns_answer_rrs Returns number of entries in the answer resource record list
# TYPE probe_dns_answer_rrs gauge
probe_dns_answer_rrs 1
# HELP probe_dns_authority_rrs Returns number of entries in the authority resource record list
# TYPE probe_dns_authority_rrs gauge
probe_dns_authority_rrs 0
# HELP probe_dns_duration_seconds Duration of DNS request by phase
# TYPE probe_dns_duration_seconds gauge
probe_dns_duration_seconds{phase="connect"} 0.018158141
probe_dns_duration_seconds{phase="request"} 0.017735813
probe_dns_duration_seconds{phase="resolve"} 1.3401e-05
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 1.3401e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.036181472
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.764160084e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
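Note that raw exporter output like the above carries no label naming the probed server. In the usual blackbox exporter setup the server is passed in the `target` query parameter of the `/probe` request, and labels such as `instance` are attached by Prometheus at scrape time. A minimal sketch of the per-server probe URLs, assuming the prober address and module name from the Probe definition (the `probe_url` helper is hypothetical):

```python
from urllib.parse import urlencode

# Values taken from the Probe definition; the http scheme is assumed from 'scheme: http'.
PROBER = "http://bbexporter.monitoring.svc:9115/probe"
MODULE = "dns_tcp"
TARGETS = ["10.25.22.15", "10.25.33.15", "10.105.12.21"]

def probe_url(target, module=MODULE, prober=PROBER):
    """Build the /probe URL that asks the exporter to probe one target."""
    return f"{prober}?{urlencode({'module': module, 'target': target})}"

urls = [probe_url(t) for t in TARGETS]
for url in urls:
    print(url)
```

Fetching each of these URLs by hand (for example with curl from inside the cluster) should return one metrics page per server, which makes it possible to check each DNS server individually, independent of any relabeling.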
In the Prometheus browser, I see the following metric: probe_dns_duration_seconds (see the attachment).
I just have no idea which of the DNS servers it applies to, i.e. which server was probed to produce it.
I would greatly appreciate some help!
Many thanks :-)
Morné Kruger