Possible Recursion Issues

mark page

May 5, 2014, 7:06:34 AM
to nxfil...@googlegroups.com
After the 2.0.6 upgrade, I'm seeing what appears to be a recursion issue. NxFilter resolves "A" records just fine, but after some time (2 to 8 hours, depending on load), the "A" records behind "CNAME" records are no longer returned. Restarting NxFilter makes everything work again. Note the "WARNING: recursion requested but not available" in the NxFilter response.

This is happening on both my 12.04 and 14.04 Ubuntu servers. The NxFilter server at 10.8.8.88 uses 10.9.14.2 as its upstream DNS server.

Any thoughts?

Thanks,
Mark


This is a query to my upstream AD/DNS server:
=====================================================================
[mpage@hbcp122605 ~]$ dig mail.google.com @10.9.14.2

; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> mail.google.com @10.9.14.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48214
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:

;; ANSWER SECTION:
googlemail.l.google.com. 71 IN A 173.194.44.85
googlemail.l.google.com. 71 IN A 173.194.44.86

;; Query time: 0 msec
;; SERVER: 10.9.14.2#53(10.9.14.2)
;; WHEN: Mon May 05 05:32:41 CDT 2014
;; MSG SIZE  rcvd: 103
=====================================================================


And here's a failed query from NxFilter:
=====================================================================
[mpage@hbcp122605 ~]$ dig mail.google.com @10.8.8.88

; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> mail.google.com @10.8.8.88
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54417
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:

;; ANSWER SECTION:

;; Query time: 1 msec
;; SERVER: 10.8.8.88#53(10.8.8.88)
;; WHEN: Mon May 05 05:28:09 CDT 2014
;; MSG SIZE  rcvd: 71
=====================================================================
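
For readers comparing the two responses above, the difference in the flags lines can be sketched in Python. This is only an illustration: the bit masks follow the standard DNS header layout, while `decode_flags`, `recursion_warning`, and the two flag words are hypothetical names and values reconstructed from the flags dig printed (`qr rd ra` and `qr rd ad`), not anything taken from NxFilter.

```python
# Bit masks for the 16-bit flags word in a DNS message header.
FLAG_BITS = [
    ("qr", 0x8000),  # message is a response
    ("aa", 0x0400),  # authoritative answer
    ("tc", 0x0200),  # truncated
    ("rd", 0x0100),  # recursion desired (copied from the query)
    ("ra", 0x0080),  # recursion available (set by the server)
    ("ad", 0x0020),  # authentic data
    ("cd", 0x0010),  # checking disabled
]


def decode_flags(word):
    """Return the names of the flags set in a header flags word."""
    return [name for name, mask in FLAG_BITS if word & mask]


def recursion_warning(word):
    """True when recursion was requested (rd) but not offered (ra)."""
    flags = decode_flags(word)
    return "rd" in flags and "ra" not in flags


# Flag words rebuilt from the dig output (assumed, not captured on the wire).
upstream_reply = 0x8180  # qr rd ra, from 10.9.14.2
nxfilter_reply = 0x8120  # qr rd ad, from 10.8.8.88

print(decode_flags(upstream_reply), recursion_warning(upstream_reply))
print(decode_flags(nxfilter_reply), recursion_warning(nxfilter_reply))
```

The second reply has `rd` without `ra`, which is the condition behind dig's "recursion requested but not available" warning. The flag alone does not explain the empty answer section.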

Jinhee

May 5, 2014, 7:17:00 AM
to nxfil...@googlegroups.com
This could be related to NxFilter's cache or its manipulation of TTLs.
I will look into it tomorrow.

Jinhee

Jinhee

May 5, 2014, 8:04:43 AM
to nxfil...@googlegroups.com
Or there may be an issue between the 2 DNS servers.
What happens if you use only Google DNS as the upstream resolver for NxFilter?
Actually, NxFilter doesn't perform recursive queries itself.

Jinhee

Jinhee

May 5, 2014, 8:43:16 AM
to nxfil...@googlegroups.com
Did you see any error in the log about the domain?
It seems your 2 records were lost when their TTL changed.
How many domains are being queried a day?

Jinhee

Jinhee

May 5, 2014, 8:54:19 AM
to nxfil...@googlegroups.com
If you have more than 100,000 domains being queried a day, you should increase the cache size.
The setting is under Config > DNS setup > Response cache size.

Jinhee

mark page

May 5, 2014, 9:52:08 AM
to nxfil...@googlegroups.com
The cache is already maxed out. If NxFilter does not support recursion, then I would assume this is a stub resolver problem with CNAME records. There are no errors in syslog, upstart, or the nxd.log. 

Jinhee

May 5, 2014, 8:04:41 PM
to nxfil...@googlegroups.com
The only possibility I found in the code is an error occurring when the response is retrieved from the cache.
But if that were the case, there would be an error entry in nxd.log.
And it wouldn't be persistent.

By 'maxed out', do you mean the number is set to 1,000,000?
Were you not using Ubuntu before, or did this not happen on previous versions?
I ask because the cache part of the source code has not changed.

Jinhee

mark page

May 5, 2014, 8:10:33 PM
to nxfil...@googlegroups.com
Yes, the response cache size is set to 1,000,000, and I've been running Ubuntu all along. I've reset the max client cache TTL to 600, and it seems to have made a difference. I'll keep an eye on it and report back.

Thanks for everything,
Mark
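
As a rough illustration of the setting mentioned in this post, a "max client cache TTL" can be read as a ceiling on the TTL handed back to clients. The sketch below assumes that reading and assumes the value is in seconds. `clamp_ttl` and `MAX_CLIENT_TTL` are hypothetical names, and NxFilter's actual handling may differ.

```python
MAX_CLIENT_TTL = 600  # the value set in the post above, assumed to be seconds


def clamp_ttl(upstream_ttl, max_ttl=MAX_CLIENT_TTL):
    """Cap the TTL sent to clients; shorter upstream TTLs pass through."""
    return min(upstream_ttl, max_ttl)


# The 71-second TTL from the upstream dig output is already under the cap.
print(clamp_ttl(71))
# A day-long TTL would be shortened to the cap.
print(clamp_ttl(86400))
```

Under this assumption, a lower cap makes clients re-query sooner, so a stale or incomplete answer would be held for a shorter time.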

Jinhee

May 7, 2014, 4:45:47 AM
to nxfil...@googlegroups.com
I didn't expect that kind of number from a single site, actually.
It might be better to raise the limit to 10,000,000 then.
We just don't know whether that would cause a performance problem.
I will change it to 10,000,000 in v2.0.7.

Jinhee

Jinhee

May 7, 2014, 5:25:42 AM
to nxfil...@googlegroups.com
I think 10 million is too big.
It would require a huge amount of memory.
And it would slow down the system.
NxFilter currently removes expired cache entries every minute.
That removal would be slower with 10 million entries.

However, when the cache reaches its limit, NxFilter currently drops half of it.
That means your cache still works.
1 million is better for now, I guess.

Jinhee
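
The cache behaviour described in this post (a periodic sweep of expired entries, plus dropping half the cache at the size limit) can be sketched as follows. This is not NxFilter's code. `ResponseCache` and its methods are hypothetical, and which half gets dropped is an assumption: this sketch drops the oldest-inserted entries.

```python
import time


class ResponseCache:
    """Toy TTL cache: sweep expired entries, drop half when full."""

    def __init__(self, max_size, clock=time.monotonic):
        self.max_size = max_size
        self.clock = clock   # injectable so the sweep can be tested
        self._entries = {}   # name -> (expires_at, value), insertion ordered

    def put(self, name, value, ttl):
        if name not in self._entries and len(self._entries) >= self.max_size:
            self._drop_half()
        self._entries.pop(name, None)  # re-insert so it counts as newest
        self._entries[name] = (self.clock() + ttl, value)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self.clock():
            del self._entries[name]
            return None
        return value

    def purge_expired(self):
        """Meant to run periodically; visits every entry in the cache."""
        now = self.clock()
        expired = [n for n, (exp, _) in self._entries.items() if exp <= now]
        for name in expired:
            del self._entries[name]
        return len(expired)

    def _drop_half(self):
        # Assumption: the oldest-inserted half is the half that goes.
        count = max(1, len(self._entries) // 2)
        for name in list(self._entries)[:count]:
            del self._entries[name]

    def __len__(self):
        return len(self._entries)
```

Because `purge_expired` visits every entry, the sweep time grows with the cache size, which matches the concern above about a 10 million entry limit.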

mark page

May 7, 2014, 10:18:25 AM
to nxfil...@googlegroups.com
Don't worry about it too much. I believe it's a transitory problem with CNAME records from Google specifically.