dns_sd DNS queries limited to 512 bytes?


craig....@fluxfederation.com

Dec 11, 2017, 7:22:20 PM
to Prometheus Users
Hi,
   Just starting down the road of using Prometheus, and we're looking at using dns_sd to find all our nodes.  However, it seems that it is limited to ~512-byte replies; anything longer results in various DNS resolution failures, e.g.: "dns: bad rdlength", "dns: overflow unpacking uint16", "dns: overflow unpacking uint32".  Which error we get depends on the precise length of the response.

This surprises me; I can only get to around 10 nodes per SRV record, which seems like a very low number; it would take a very short host-naming scheme to get many more.  Am I missing something about how dns_sd should be working or how I should be doing things?

(Yes, there are other ways I could achieve this, but dns_sd seems quite elegant, and works well with our future plans, so I'm curious to know if there's ways I can make it work)

Prometheus version is 2.0.0, if it matters.

Thanks,
Craig Miskell



Ben Kochie

Dec 12, 2017, 5:06:28 AM
to craig....@fluxfederation.com, Prometheus Users
Prometheus should automatically upgrade to TCP for large DNS responses.

What DNS server are you using to host your SRV records?

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/9bdbec6d-e8ca-402e-947f-18199a514619%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

craig....@fluxfederation.com

Dec 12, 2017, 4:58:27 PM
to Prometheus Users
An excellent question: it is dnsmasq, which is where it all goes wrong.

For those playing at home:
*) Prometheus sends a query, with no EDNS0 extra-size options included
*) dnsmasq sends an oversize (>512-byte) reply, with no TC bit set (bad dnsmasq, no cookie)
*) The Prometheus dns client is, justifiably, displeased with this offering

If I put bind9 on my prometheus node, forwarding to our dnsmasq servers, and make prometheus use that instead, it all works.  When bind9 replies to prometheus, it correctly truncates the reply and sets the TC bit; prometheus then re-requests with EDNS0 extra-size options set, and gets the full reply properly encoded by bind9.

Silly dnsmasq.  Now to figure out if I can reconfigure it to do the right thing...

Thank you for your time!

Craig Miskell




craig....@fluxfederation.com

Dec 12, 2017, 5:26:24 PM
to Prometheus Users
For the record, a friend of mine who I trust deeply on DNS matters says that the DNS library should be able to handle that reply, and that the 512-byte limit is historical and a bit silly.  So it's possible my statement that the client is justifiably displeased is wrong.

I'm no longer confident in asserting either way... :)

Craig

Matt Palmer

Dec 12, 2017, 6:47:44 PM
to Prometheus Users
On Mon, Dec 11, 2017 at 04:22:20PM -0800, craig....@fluxfederation.com wrote:
> Just starting down the road of using Prometheus, and we're looking at
> using dns_sd to find all our nodes. However, it seems that it is limited
> to ~512 byte replies; anything longer results in various DNS resolution
> failures, e.g.: "dns: bad rdlength", "dns: overflow unpacking uint16",
> "dns: overflow unpacking uint32". Which error we get depends on the
> precise length of the response.

Oh dear. It sounds like either the DNS library that Prometheus is using has
some sort of bug with EDNS0 support, or the DNS server you're using has bugs
with EDNS0. I'm running a dns_sd-based environment with some pretty
hefty SRV record lists (using Route53 as the DNS provider), and whilst I
haven't gone looking real hard, I don't recall seeing any problems (like
missing alerts or timeseries I expected to see) with large SRV lists.

If you can get pcaps of a complete request/response cycle between Prometheus
and your DNS server that causes the problem, you should be able to see which
end is at fault in Wireshark.  If that's beyond your ken, or if you can't
see any problems, send me the pcap and I'll take a look.

If the problem turns out to be at Prometheus' end, create an issue and ping
me; I've got a vested interest in making sure dns_sd works properly, I've
rummaged around in that code before, and there's a reasonable chance that I
broke whatever's gone wrong anyway.

> (Yes, there are other ways I could achieve this, but dns_sd seems quite
> elegant, and works well with our future plans, so I'm curious to know if
> there's ways I can make it work)

Yes, I'm a huge fan of DNS for service discovery, particularly RFC6763-based
full-blown SD -- I've written registration and querying systems for it, and
I'm talking about it at conferences. We also use it at $DAYJOB, and are
about to expand its deployment to more places.

- Matt

Matt Palmer

Dec 13, 2017, 1:36:08 AM
to Prometheus Users
On Tue, Dec 12, 2017 at 01:58:27PM -0800, craig....@fluxfederation.com wrote:
> An excellent question: it is dnsmasq, which is where it all goes wrong.

Oh my.

> For those playing at home:
> *) Prometheus sends a query, with no EDNS0 extra-size options included
> *) dnsmasq sends an oversize (>512byte) reply, with no TC bit set (bad
> dnsmasq, no cookie)
> *) The Prometheus dns client is, justifiably, displeased with this offering
>
> If I put bind9 on my prometheus node, forwarding to our dnsmasq servers,
> and make prometheus use that instead, it all works. When bind9 replies to
> prometheus, it correctly truncates the reply and sets the TC bit;
> prometheus then re-requests with EDNS0 extra-size options set, and gets the
> full reply properly encoded by bind9.

Hmm... what's different about how BIND queries dnsmasq that allows it to
proceed? Is it just that it ignores the fact of the oversized reply, and
somehow manages to parse the response anyway? If there's a difference in
the way that BIND does the *query* (or sequence of queries), there may be
scope for changing Prometheus to mimic that (say, by sending the
EDNS0-enabled query first, perhaps).

> For the record, a friend of mine who I trust deeply on DNS matters says
> that the DNS library should be able to handle that reply, and that the
> 512-byte limit is historical and a bit silly. So it's possible my
> statement that the client is justifiably displeased is wrong.

The 512 byte limit is historical, and potentially a bit silly, but dnsmasq
is deranged for sending oversized responses, because some clients won't
handle it, and can't be updated to handle it. "Be conservative in what you
send" and all that.

Insofar as the problem exists in the DNS library that Prometheus uses
(https://github.com/miekg/dns), I think you're going to have to have the
argument with them about whether to support parsing over-sized but
non-truncated responses -- if there's nothing Prometheus can do differently
(short of changing DNS libraries, which I doubt is going to happen for a
problem which isn't *strictly* the client library's fault), then there's
nothing that can be fixed in Prometheus, and the problem will have to be
addressed elsewhere.

- Matt

Ben Kochie

Dec 13, 2017, 2:58:19 AM
to Matt Palmer, Prometheus Users
I checked our vendoring of github.com/miekg/dns; it looks like we haven't updated since 2017-02-13T20:16:50Z.  (Both 2.0.0 and 1.8.2.)

I looked over the commits since then, but I don't see anything relevant to packet size handling.  But it's probably worth updating to the current code.  The library made an official v1.0.0 release last week as part of the CoreDNS 1.0 release.

It's maybe worth filing an upstream issue with the DNS library.  Miek is pretty good about these things.


Ben Kochie

Dec 13, 2017, 3:20:34 AM
to Matt Palmer, Prometheus Users
FYI: https://github.com/prometheus/prometheus/pull/3581

Probably won't fix anything, but there are some changes to the network handling in there.

Ben Kochie

Dec 13, 2017, 10:44:20 AM
to craig....@fluxfederation.com, Prometheus Users
I talked to Miek, he recommended that we adjust Prometheus to bufsize=4096 in our initial DNS requests, this would allow us to parse the oversized results from dnsmasq, in addition to possibly reducing the number of TCP upgrades we do.


craig....@fluxfederation.com

Dec 13, 2017, 2:54:23 PM
to Prometheus Users


On Wednesday, 13 December 2017 19:36:08 UTC+13, Matt Palmer wrote:
On Tue, Dec 12, 2017 at 01:58:27PM -0800, craig....@fluxfederation.com wrote:
> An excellent question: it is dnsmasq, which is where it all goes wrong.

Oh my.

> For those playing at home:
> *) Prometheus sends a query, with no EDNS0 extra-size options included
> *) dnsmasq sends an oversize (>512byte) reply, with no TC bit set (bad
> dnsmasq, no cookie)
> *) The Prometheus dns client is, justifiably, displeased with this offering
>
> If I put bind9 on my prometheus node, forwarding to our dnsmasq servers,
> and make prometheus use that instead, it all works.  When bind9 replies to
> prometheus, it correctly truncates the reply and sets the TC bit;
> prometheus then re-requests with EDNS0 extra-size options set, and gets the
> full reply properly encoded by bind9.

Hmm... what's different about how BIND queries dnsmasq that allows it to
proceed?  Is it just that it ignores the fact of the oversized reply, and
somehow manages to parse the response anyway?  If there's a difference in
the way that BIND does the *query* (or sequence of queries), there may be
scope for changing Prometheus to mimic that (say, by sending the
EDNS0-enabled query first, perhaps).

BIND accepts the large reply from dnsmasq but is kind enough to truncate the response to Prometheus to 512 bytes, with the TC bit set, which triggers Prometheus to re-request with the extra-size options and get the full response (and presumably upgrade to TCP if required for even larger responses).  

> For the record, a friend of mine who I trust deeply on DNS matters says
> that the DNS library should be able to handle that reply, and that the
> 512-byte limit is historical and a bit silly.  So it's possible my
> statement that the client is justifiably displeased is wrong.

The 512 byte limit is historical, and potentially a bit silly, but dnsmasq
is deranged for sending oversized responses, because some clients won't
handle it, and can't be updated to handle it.  "Be conservative in what you
send" and all that.
Quite.  I'm tending towards the opinion that dnsmasq is, if not wrong, being exceedingly unhelpful.
 

Insofar as the problem exists in the DNS library that Prometheus uses
(https://github.com/miekg/dns), I think you're going to have to have the
argument with them about whether to support parsing over-sized but
non-truncated responses -- if there's nothing Prometheus can do differently
(short of changing DNS libraries, which I doubt is going to happen for a
problem which isn't *strictly* the client library's fault), then there's
nothing that can be fixed in Prometheus, and the problem will have to be
addressed elsewhere. 

I think the only thing the DNS lib could do differently would be to make the first request include the EDNS0 extra-size options rather than waiting to receive a reply with TC set.   

craig....@fluxfederation.com

Dec 13, 2017, 2:56:25 PM
to Prometheus Users
If that would make it use the EDNS0 size options on the first request, then I'm pretty confident that would solve this particular problem.  I'm happy to run test-builds to see if it does.

Matt Palmer

Dec 13, 2017, 5:24:46 PM
to Prometheus Users
Well, now, *that's* an interesting take on the situation. Prometheus could,
I suppose, do something similar, if the DNS client library exposed the
message length. There's something *like* that, in Msg.Len(), but it appears
to repack the message before calculating the length, so I'm not 100%
confident it would represent the *actual* length of the response that was
sent over the wire (due to response compression).

> > Insofar as the problem exists in the DNS library that Prometheus uses
> > (https://github.com/miekg/dns), I think you're going to have to have the
> > argument with them about whether to support parsing over-sized but
> > non-truncated responses -- if there's nothing Prometheus can do
> > differently
> > (short of changing DNS libraries, which I doubt is going to happen for a
> > problem which isn't *strictly* the client library's fault), then there's
> > nothing that can be fixed in Prometheus, and the problem will have to be
> > addressed elsewhere.
>
> I think the only thing the DNS lib could do differently would be to make
> the first request include the EDNS0 extra-size options rather than waiting
> to receive a reply with TC set.

Whether or not EDNS0 is used is under the control of Prometheus, so that's
something we could potentially do. I strongly suspect the reason why EDNS0
isn't used on the first query is to avoid the situation where a
badly-behaved (and positively ancient) DNS server doesn't handle OPT records
smoothly. The (apparent) behaviour of BIND -- to not send an EDNS0-enabled
query initially -- suggests that is likely to be the correct approach, and
that's supported by RFC2671 s5.3, which acknowledges that servers may send
errors to requests with OPT pseudorecords, so the servers should be
"probed" to determine compatibility. Unless Prometheus were to implement
such probing, it seems like leading with an EDNS0 request may cause problems
for other people using (differently) shonky DNS servers.

I think the ways forward are, in no particular order:

* Get dnsmasq to fix itself.

* Have miekg/dns detect an improperly oversized response and do...
something appropriate, return some sort of useful error we can use to
enable EDNS0.

* Have miekg/dns expose a "this is definitely the size of the packet that
was received" function, so Prometheus can detect an improperly oversized
response and retry with EDNS0.

- Matt

Matt Palmer

Dec 13, 2017, 6:09:46 PM
to Prometheus Users
On Thu, Dec 14, 2017 at 09:24:38AM +1100, Matt Palmer wrote:
> On Wed, Dec 13, 2017 at 11:54:22AM -0800, craig....@fluxfederation.com wrote:
> > > Hmm... what's different about how BIND queries dnsmasq that allows it to
> > > proceed? Is it just that it ignores the fact of the oversized reply, and
> > > somehow manages to parse the response anyway? If there's a difference in
> > > the way that BIND does the *query* (or sequence of queries), there may be
> > > scope for changing Prometheus to mimic that (say, by sending the
> > > EDNS0-enabled query first, perhaps).
> > >
> > BIND accepts the large reply from dnsmasq but is kind enough to truncate
> > the response to Prometheus to 512 bytes, with the TC bit set, which
> > triggers Prometheus to re-request with the extra-size options and get the
> > full response (and presumably upgrade to TCP if required for even larger
> > responses).
>
> Well, now, *that's* an interesting take on the situation. Prometheus could,
> I suppose, do something similar, if the DNS client library exposed the
> message length. There's something *like* that, in Msg.Len(), but it appears
> to repack the message before calculating the length, so I'm not 100%
> confident it would represent the *actual* length of the response that was
> sent over the wire (due to response compression).

Addendum: Craig sent me pcaps of the BIND-mediated exchange, and it looks
like what is *actually* happening is that BIND is sending an EDNS0 request
right out of the gate, without first sending a "traditional" query. If BIND
can get away with it, then I guess so can Prometheus.

I'll submit a PR to this effect shortly, unless someone else beats me to it.

- Matt

Matt Palmer

Dec 16, 2017, 5:24:55 AM
to Prometheus Users
On Thu, Dec 14, 2017 at 10:09:38AM +1100, Matt Palmer wrote:
> Addendum: Craig sent me pcaps of the BIND-mediated exchange, and it looks
> like what is *actually* happening is that BIND is sending an EDNS0 request
> right out of the gate, without first sending a "traditional" query. If BIND
> can get away with it, then I guess so can Prometheus.
>
> I'll submit a PR to this effect shortly, unless someone else beats me to it.

Addendum 2, Reflog Boogaloo: https://github.com/prometheus/prometheus/pull/3586
has been submitted.

- Matt
