8.8.8.8 randomly fails to resolve subdomains e.g. 7157599388.sip.teltel.io

Lukass Lappuķe

Dec 7, 2020, 11:50:07 AM
to public-dns-discuss
Dear all,

We have a lot of users on our VoIP software. The softphone checks each user's domain every minute, and users make calls, send messages, and do the usual VoIP things. Each user typically triggers several DNS lookups a minute, so altogether there are quite a few requests.

We cannot control which public DNS resolver an end user queries. When it is Google Public DNS, resolution sometimes fails at random. With others, such as Cloudflare (1.1.1.1), it always works.

The issue is not the number of DNS queries from end users to Google DNS, which we know is limited. Rather, the issue is between Google DNS and our name server: Google seems to refuse to look up the subdomain. It seems that there is another limit here.

Here is an example of a failed nslookup request to 8.8.8.8 (the domain exists).

{code}
dns.google can't find 7157599388.sip.teltel.io: Non-existent domain
{code}

Any idea how to change this limit?

Help will be highly appreciated! Thank you in advance!

BR,
Lukass

pun...@google.com

Dec 7, 2020, 12:17:29 PM
to public-dns-discuss
On Monday, December 7, 2020 at 11:50:07 AM UTC-5 Lukass Lappuķe wrote:
> The issue is not the number of DNS queries from end users to Google DNS, which we know is limited. Rather, the issue is between Google DNS and our name server: Google seems to refuse to look up the subdomain. It seems that there is another limit here.

We have per-cluster QPS limits on the number of outbound queries to name servers. However, I am not seeing traffic anywhere close to those limits, and queries to your name servers are succeeding.

One likely issue is the presence of a CNAME at the apex of the "sip.teltel.io" zone. This can cause a resolver to not query for names under the zone. See the snippet of output from "dig +trace +dnssec sip.teltel.io" below. The article at https://www.isc.org/blogs/cname-at-the-apex-of-a-zone/ provides more details on the topic. Possible solutions for this need are being discussed in the IETF (https://tools.ietf.org/html/draft-ietf-dnsop-svcb-https), but they are not yet standardized.

sip.teltel.io. 1800 IN NS ns1.teltel.io.
sip.teltel.io. 1800 IN NS ns2.teltel.io.
sip.teltel.io. 1800 IN DS 7064 13 2 3E0E78E540D6FFCA96AFCE4C78CACD6D5B6B4819B6AB87E2A5D41DBA 198243C5
sip.teltel.io. 1800 IN RRSIG DS 13 3 1800 20201208181037 20201206161037 34505 teltel.io. HFauOgVLEJKKxRbSs4BMPh50ntRuxFghBkqXTkOOG7WX5nWQSZgO143T YFc73xcTRdQxfDIOUVBxpmUj3yMOwQ==
;; Received 263 bytes from 172.64.32.107#53(dina.ns.cloudflare.com) in 8 ms

sip.teltel.io. 3600 IN CNAME www.teltel.io.
sip.teltel.io. 3600 IN RRSIG CNAME 13 3 3600 20201217000000 20201126000000 7064 sip.teltel.io. MkJZcI0+oUYh0a+BLZczP2hta8/KtY1QBMrdkDx+Fnb59lJ2N546P6LG Wr17AakDfC9DNcz98mr7kUIPqZHcSA==
;; Received 169 bytes from 3.9.142.25#53(ns1.teltel.io) in 143 ms
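
The apex CNAME in the trace above can also be checked directly against the authoritative server. What follows is a minimal sketch, assuming `dig` is installed; the helper name `apex_cname_conflict` is made up for illustration, and the zone and name server are passed in as arguments rather than taken from any tool's defaults.

```shell
# Report whether a zone apex answers with a CNAME when the authoritative
# server is asked directly. apex_cname_conflict is a hypothetical helper name.
# Usage: apex_cname_conflict ZONE NAMESERVER
apex_cname_conflict() {
    zone=$1
    server=$2
    # +norecurse: ask the server for its own data; +short: print RDATA only.
    target=$(dig +norecurse +short CNAME "$zone" "@$server")
    if [ -n "$target" ]; then
        printf '%s has an apex CNAME -> %s\n' "$zone" "$target"
        return 1
    fi
    printf '%s has no apex CNAME\n' "$zone"
}
```

For example, `apex_cname_conflict sip.teltel.io ns1.teltel.io` would query the server shown in the last line of the trace. A non-zero exit status means a CNAME was found at the apex.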

-Puneet

Lukass Lappuķe

Dec 9, 2020, 8:55:46 AM
to public-dns-discuss
Thank you Puneet for the answer.

Last night we removed the CNAMEs and updated our serial number. We'll do some tests to see if there is an improvement.

However, the issue we still have, although it might be a different one (idk...), is that when I run nslookup in a loop against 8.8.8.8, the name resolves only about 20% of the time. Against 1.1.1.1 it resolves successfully 100% of the time.

Request:
while :
do
    nslookup 7157599388.sip.teltel.io 8.8.8.8
done

Response (mostly):
Server: 8.8.8.8
Address: 8.8.8.8#53
** server can't find 7157599388.sip.teltel.io: NXDOMAIN
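
To put a number on "mostly", the endless loop above could be replaced by a bounded one that tallies response codes. This is only a sketch, assuming `dig` is available; `rcode_tally` is a hypothetical helper name, and queries that time out produce no header line, so they are simply not counted.

```shell
# Tally the response codes returned by COUNT queries to one resolver.
# rcode_tally is a hypothetical helper name.
# Usage: rcode_tally NAME RESOLVER COUNT
rcode_tally() {
    name=$1
    resolver=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # +noall +comments prints only the header block, which contains
        # "status: NOERROR", "status: NXDOMAIN", and so on.
        dig +noall +comments A "$name" "@$resolver" |
            sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
        i=$((i + 1))
    done | sort | uniq -c
}
```

Running `rcode_tally 7157599388.sip.teltel.io 8.8.8.8 50` and then the same with `1.1.1.1` would give two small tables that can be compared side by side.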

----

I'd be very grateful if you could look into it.

Thank you in advance!

Lukass

Puneet Sood

Dec 9, 2020, 1:40:54 PM
to Lukass Lappuķe, public-dns-discuss
On Wed, Dec 9, 2020 at 8:55 AM Lukass Lappuķe <l.la...@gmail.com> wrote:
> Response (mostly):
> Server: 8.8.8.8
> Address: 8.8.8.8#53
> ** server can't find 7157599388.sip.teltel.io: NXDOMAIN

In this case 8.8.8.8 is correct. There is an NSEC record for the wildcard domain (*.sip.teltel.io) that covers the queried name. Please see https://dnsviz.net/d/7157599381.sip.teltel.io/dnssec/.

Also, in the future, please review the documentation at https://developers.google.com/speed/public-dns/docs/troubleshooting/domains first for common problems.
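
An NSEC record states that no name exists between its owner and its "next" name in canonical order, so a validating resolver that holds such a record can answer NXDOMAIN for any name in that range (RFC 8198 describes resolvers reusing cached NSEC records this way). A minimal sketch for listing the ranges a server hands out, assuming `dig` is installed; `nsec_ranges` is a hypothetical helper name.

```shell
# Print "owner -> next" for every NSEC record in the authority section of a
# DNSSEC-enabled response. nsec_ranges is a hypothetical helper name.
# Usage: nsec_ranges NAME SERVER
nsec_ranges() {
    # +dnssec sets the DO bit so NSEC and RRSIG records are returned;
    # +noall +authority prints only the authority section.
    # Record layout is: owner TTL class type rdata..., so $4 is the type
    # and $5 is the first rdata field (the next owner name for NSEC).
    dig +dnssec +noall +authority A "$1" "@$2" |
        awk '$4 == "NSEC" { print $1, "->", $5 }'
}
```

If the queried name sorts between the two names on an output line, that record denies its existence.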

Alex Dupuy

Dec 14, 2020, 7:26:27 PM
to public-dns-discuss
The problem where there is an NSEC record proving the nonexistence of all subdomains (even though there are subdomains – or in this case, a wildcard) seems quite similar to the one described in https://groups.google.com/g/public-dns-discuss/c/JDRPBzhzdvM/m/v7oYxNzlCgAJ

The sole name server (despite having two names, there is only one IP address for the delegated name servers for sip.teltel.io) may be running PowerDNS, which can get into this state sometimes when it is using a database backend rather than a zone file for the authoritative server. If this is true, the advice from mnordhoff in the thread to run pdnsutil rectify-zone may be helpful for successful resolution of domains under sip.teltel.io. It could also be helpful to add an NS record for the sip.teltel.io zone apex, since that zone currently claims to have no name servers.
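
The missing apex NS record mentioned above can be confirmed with a direct query. A small sketch, assuming `dig` is installed; `apex_ns_count` is a hypothetical helper name.

```shell
# Count the NS records an authoritative server returns for a zone apex.
# apex_ns_count is a hypothetical helper name.
# Usage: apex_ns_count ZONE SERVER
apex_ns_count() {
    # +norecurse asks the server for its own data; +short prints one
    # name server per line. awk counts the non-empty lines.
    dig +norecurse +short NS "$1" "@$2" |
        awk 'NF { n++ } END { print n + 0 }'
}
```

A result of 0 from `apex_ns_count sip.teltel.io ns1.teltel.io` would match the observation that the zone claims to have no name servers. If the server does turn out to be PowerDNS with a database backend, the command suggested in the linked thread is `pdnsutil rectify-zone sip.teltel.io`.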

On Wednesday, December 9, 2020 at 8:55 AM Lukass Lappuķe wrote:
> However, the issue we still have, although it might be a different one (idk...), is that when I run nslookup in a loop to 8.8.8.8 it only resolves 20% of the time approximately. To 1.1.1.1 it resolves successfully 100% of the time.

The sip.teltel.io domain has other DNSSEC problems as a result of the bad NSEC records; you can see this by querying for MX records of any subdomain of sip.teltel.io. The authoritative name server returns a NOERROR (NODATA) result, since the name exists (due to the wildcard match) but there is no wildcard MX record. However, the DNSSEC records prove that the subdomain does not exist at all, rather than proving that the (wildcard) name exists but the queried type does not. This results in a SERVFAIL due to a bogus DNSSEC proof, and the Cloudflare 1.1.1.1 resolver even returns a diagnostic Extended DNS Error (EDE) in this case:

$ dig MX foobar.sip.teltel.io @1.1.1.1

; <<>> DiG 9.16.7 <<>> MX foobar.sip.teltel.io @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 12794
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;foobar.sip.teltel.io.        IN    MX

;; Query time: 71 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Dec 15 01:18:26 CET 2020
;; MSG SIZE  rcvd: 55
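
When comparing resolvers, only two lines of this output matter: the status and the EDE. A minimal sketch that extracts them, assuming a `dig` recent enough to print EDE lines (the 9.16.7 build above does); `status_and_ede` is a hypothetical helper name.

```shell
# Print the response status and any Extended DNS Error for one query.
# status_and_ede is a hypothetical helper name.
# Usage: status_and_ede TYPE NAME RESOLVER
status_and_ede() {
    # +noall +comments keeps the header and the OPT pseudosection only.
    dig +noall +comments "$1" "$2" "@$3" |
        sed -n -e 's/.*status: \([A-Z]*\),.*/status=\1/p' \
               -e 's/^; EDE: \(.*\)$/ede=\1/p'
}
```

For example, `status_and_ede MX foobar.sip.teltel.io 1.1.1.1`. Resolvers that do not send an EDE option produce only the `status=` line.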

 

Владимир Коваленко

Dec 15, 2020, 9:14:11 AM
to Lukass Lappuķe, public-dns-discuss


On Mon, Dec 7, 2020, 19:50, Lukass Lappuķe <l.la...@gmail.com> wrote:

Lukass Lappuķe

Dec 15, 2020, 1:56:02 PM
to public-dns-discuss
Dear Puneet & others,

We made all the changes you requested for sip.teltel.io, but the situation did not improve.

1. We even created a test subdomain, xip.teltel.io, which is set up in exactly the same way, and resolving names under it through 8.8.8.8 works every time.
The only difference between sip.teltel.io and xip.teltel.io is the load: sip receives our customers' traffic and xip receives only our test requests.

How can you explain that?

---

2. Here is what we also noticed:
When resolving towards 8.8.8.8 from a local wi-fi or a personal hotspot, etc. it sometimes fails.
Responses are received from these Google IP addresses:
74.125.46.10
74.125.112.1
74.125.112.9
74.125.x.x
74.125.z.z
74.125.w.w

But when resolving towards 8.8.8.8 from a server on AWS or DO it always succeeds. 
The responses are received from different Google IP addresses:
172.253.199.4
172.253.1.194
172.217.33.132

Does Google give higher priority to big customers such as AWS or DO than to the general public? Or is there a different explanation?
---

3. We captured the traffic by performing a tcpdump (tcpdump udp and port 53) and noticed that when Google fails to resolve, our name servers never receive a request from Google.

--

4. Here you can test the loops for yourself:
This fails with "sip"  - while true; do dig A 42894078.sip.teltel.io @8.8.8.8; done;
This is OK with "xip"- while true; do dig A 42894078.xip.teltel.io @8.8.8.8; done;
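
The capture from point 3 can be summarised per source address, which makes it easier to see which Google addresses reach the name server for a given name. This is a rough sketch that assumes the default one-line IPv4 text format of `tcpdump -n` (timestamp, `IP`, `source.port`, ...); `queries_by_source` is a hypothetical helper name and `capture.pcap` below is a placeholder file name.

```shell
# Count DNS queries per source IP from `tcpdump -n` text output on stdin,
# keeping only lines that mention the given query name.
# queries_by_source is a hypothetical helper name.
# Usage: tcpdump -n -r capture.pcap udp and port 53 | queries_by_source NAME
queries_by_source() {
    # Field 3 looks like 74.125.46.10.41000; strip the trailing ".port".
    grep -F -- "$1" |
        awk '$2 == "IP" { src = $3; sub(/\.[0-9]+$/, "", src); print src }' |
        sort | uniq -c | sort -rn
}
```

Reading from a saved capture (`tcpdump -n -r capture.pcap ...`) or limiting a live one with `-c` lets the pipeline finish, since `sort` only prints once its input ends.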

---

Please Google, can you help us solve this? It all seems quite mysterious to us.

Thanks,
Lukass

Lukass Lappuķe

Dec 21, 2020, 9:39:03 AM
to public-dns-discuss
Dear Google,

It has been 4 days now and the issue has not reappeared. We'd like to know what may have solved it.

Could you please tell me if you made any modifications on your side regarding the above-mentioned issue?

Thank you in advance!

Best regards,
Lukass

Lukass Lappuķe

Dec 21, 2020, 9:39:10 AM
to public-dns-discuss
And the issue reappeared.

Server: 8.8.8.8
Address: 8.8.8.8#53
** server can't find 42894078.sip.teltel.io: NXDOMAIN


On Tuesday, December 15, 2020 at 8:56:02 PM UTC+2 Lukass Lappuķe wrote:

Chanan Moll

Feb 2, 2021, 8:39:13 AM
to public-dns-discuss
Hi,

After enabling DNSSEC, we are experiencing the same issue for some of our subdomains: they all resolve fine on e.g. 1.1.1.1, but sometimes the lookup fails on 8.8.8.8. This doesn't happen on our AWS machines, but it does happen with various ISPs in the Netherlands. See below.
The weird thing is that there are no problems with slightly different subdomains that use the same wildcard record in our DNS (Route53). In the example below, the problem occurs for babymax-2.cdn.prod.mas2.media-artists.nl, but not for babymax-1.cdn.prod.mas2.media-artists.nl.
Is there any solution yet?

Kind regards,

RJ

rjg@RJG-MBPR ~ % nslookup -debug babymax-2.cdn.prod.mas2.media-artists.nl 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53

------------
    QUESTIONS:
        babymax-2.cdn.prod.mas2.media-artists.nl, type = A, class = IN
    ANSWERS:
    ->  babymax-2.cdn.prod.mas2.media-artists.nl
        internet address = 13.32.169.7
        ttl = 59
    ->  babymax-2.cdn.prod.mas2.media-artists.nl
        internet address = 13.32.169.19
        ttl = 59
    ->  babymax-2.cdn.prod.mas2.media-artists.nl
        internet address = 13.32.169.41
        ttl = 59
    ->  babymax-2.cdn.prod.mas2.media-artists.nl
        internet address = 13.32.169.111
        ttl = 59
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: babymax-2.cdn.prod.mas2.media-artists.nl
Address: 13.32.169.7
Name: babymax-2.cdn.prod.mas2.media-artists.nl
Address: 13.32.169.19
Name: babymax-2.cdn.prod.mas2.media-artists.nl
Address: 13.32.169.41
Name: babymax-2.cdn.prod.mas2.media-artists.nl
Address: 13.32.169.111

rjg@RJG-MBPR ~ % nslookup -debug babymax-2.cdn.prod.mas2.media-artists.nl 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53

------------
    QUESTIONS:
        babymax-2.cdn.prod.mas2.media-artists.nl, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  media-artists.nl
        origin = ns-286.awsdns-35.com
        mail addr = awsdns-hostmaster.amazon.com
        serial = 1
        refresh = 7200
        retry = 900
        expire = 1209600
        minimum = 86400
        ttl = 19
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
*** Can't find babymax-2.cdn.prod.mas2.media-artists.nl: No answer
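
The second response above is a "no answer" reply (empty answer section with an SOA in the authority section), which is a different outcome from the NXDOMAIN reported earlier in the thread. A small sketch for telling the cases apart, assuming `dig` is installed; `classify_answer` is a hypothetical helper name, and it treats NOERROR with zero answers from a recursive resolver as NODATA.

```shell
# Classify one response as ANSWER, NODATA, NO_RESPONSE, or its raw status
# (NXDOMAIN, SERVFAIL, ...). classify_answer is a hypothetical helper name.
# Usage: classify_answer NAME RESOLVER
classify_answer() {
    # +noall +comments prints only the header lines with status and counts.
    header=$(dig +noall +comments A "$1" "@$2")
    status=$(printf '%s\n' "$header" |
        sed -n 's/.*status: \([A-Z]*\),.*/\1/p')
    answers=$(printf '%s\n' "$header" |
        sed -n 's/.*ANSWER: \([0-9]*\),.*/\1/p')
    case "$status:$answers" in
        NOERROR:0)      echo NODATA ;;
        NOERROR:[1-9]*) echo ANSWER ;;
        '':*)           echo NO_RESPONSE ;;
        *)              echo "$status" ;;
    esac
}
```

Calling `classify_answer babymax-2.cdn.prod.mas2.media-artists.nl 8.8.8.8` repeatedly would show whether the failures are always NODATA or a mix of outcomes.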




On Monday, December 21, 2020 at 15:39:10 UTC+1, Lukass Lappuķe wrote:

Lukass Lappuķe

Feb 2, 2021, 11:58:22 AM
to public-dns-discuss