A potential issue with your Blackbox DNS probes

Chris Siebenmann

Jun 25, 2024, 6:19:35 PM
to Prometheus Users, cks.prom...@cs.toronto.edu
To make a long story short, we've been having mysterious failures with
one of our Blackbox DNS probes against (only) some DNS servers. The
cause turned out to be that Blackbox UDP DNS probes have a 512-byte
limit on the size of the reply: Blackbox doesn't currently set EDNS
options to increase the allowed reply size, and it doesn't fall back to
a TCP query if the UDP query fails because of truncation. We think this
was partially due to these DNS servers using DNS cookies, which
increases the reply size.

(Our DNS probe checks not just for a successful reply but that the query
resolved to at least one A record, so some of the time the reply could
be long enough that the truncated version didn't include any of the A
records.)
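
(To illustrate what truncation looks like on the wire, here is a minimal
Python sketch that decodes the fixed 12-byte DNS header and reports the
TC bit and the per-section record counts. This is not Blackbox code; the
function name and the sample header bytes are made up for illustration.)

```python
import struct

# Bit positions in the 16-bit DNS header flags field (RFC 1035).
TC_BIT = 0x0200  # set by the server when the reply was truncated


def parse_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12]
    )
    return {
        "id": msg_id,
        "truncated": bool(flags & TC_BIT),
        "question_rrs": qdcount,
        "answer_rrs": ancount,
        "authority_rrs": nscount,
        "additional_rrs": arcount,
    }


# Hypothetical reply header: qr, tc, rd and ra set (0x8380), one question,
# and no answer records left after truncation.
sample = struct.pack("!6H", 0x1234, 0x8380, 1, 0, 0, 0)
print(parse_dns_header(sample))
```

A reply like the sample is still a well-formed DNS response, which is
why a check that also requires at least one A record can fail on it.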

Right now the only way to know for sure that your DNS query failed
because of truncation is to examine Blackbox probe logs, usually through
its web interface (but you can manually query with '..&debug=true'), and
notice that one of the log messages reports something like 'flags: qr tc
rd ra;' (the 'tc' is the important bit). If you are sure you know how
many resource records should be in the various sections of the DNS
replies, you can check if the probe got the right number of RRs using
the probe_dns_*_rrs metrics.
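
(If you save the debug output somewhere, spotting the 'tc' flag can be
scripted. The following is a rough Python sketch that only assumes the
output contains a 'flags: ...;' fragment like the one quoted above. The
function name and the sample strings are hypothetical, and the rest of
the real log line format is not something this relies on.)

```python
import re

# Matches fragments such as "flags: qr tc rd ra;" and captures the flag list.
FLAGS_RE = re.compile(r"flags:\s*([a-z ]*);")


def reply_was_truncated(debug_output: str) -> bool:
    """Return True if any 'flags: ...;' fragment in the text includes 'tc'."""
    for match in FLAGS_RE.finditer(debug_output):
        if "tc" in match.group(1).split():
            return True
    return False


# Hypothetical fragments of saved probe debug output.
print(reply_was_truncated("flags: qr tc rd ra; QUERY: 1, ANSWER: 0"))
print(reply_was_truncated("flags: qr rd ra; QUERY: 1, ANSWER: 3"))
```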

For DNS servers that accept TCP connections, you can work around this by
switching your Blackbox DNS module to using TCP instead of the (default)
UDP.

(I suspect that most people will never run into this, but for our sins
we check some external DNS names that have long CNAME chains and other
fun things.)

- cks

Ben Kochie

Jun 26, 2024, 1:51:28 AM
to Chris Siebenmann, Prometheus Users
Thanks for the detailed post. Sounds like a feature request/bug report. I would file an issue on GitHub; this should be easily solved.


Chris Siebenmann

Jun 26, 2024, 9:17:13 AM
to Ben Kochie, Chris Siebenmann, Prometheus Users
I filed two issues for Blackbox on GitHub, one for exposing at least the
'tc' flag state as a metric and one for allowing you to have Blackbox
set an increased EDNS reply size (which is supported by the underlying
Go DNS library Blackbox uses). I didn't file an issue for UDP to TCP
fallback because I suspect that this is out of scope for Blackbox, and
anyway it raises design questions such as how the metrics should work
(since on a fallback Blackbox is now making two DNS requests).
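
(For anyone unfamiliar with what an EDNS increased size means on the
wire: the client appends an OPT pseudo-record to the query's additional
section, and that record's CLASS field carries the UDP payload size the
client is willing to accept. Below is a minimal Python sketch of such a
query. It is not how Blackbox or the Go DNS library builds messages; the
function name, the fixed query ID and the 1232-byte size are
illustrative assumptions.)

```python
import struct


def build_query(name: str, qtype: int = 1, udp_size: int = 0) -> bytes:
    """Build a DNS query; if udp_size is non-zero, append an EDNS OPT record."""
    # Header: fixed ID (a real client should randomize it), RD flag set,
    # one question, and one additional record when EDNS is requested.
    header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 1 if udp_size else 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!2H", qtype, 1)  # QTYPE, QCLASS=IN
    opt = b""
    if udp_size:
        # OPT pseudo-RR: root name, TYPE 41, CLASS = UDP payload size,
        # TTL 0 (extended RCODE and flags), RDLENGTH 0 (no options).
        opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


plain = build_query("example.com")
edns = build_query("example.com", udp_size=1232)
print(len(plain), len(edns))
```

Without the OPT record, a server has to assume the classic 512-byte UDP
limit and set 'tc' on anything larger.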

For any interested parties, these are:
https://github.com/prometheus/blackbox_exporter/issues/1258
https://github.com/prometheus/blackbox_exporter/issues/1259

- cks