Performance Tuning RHEL 5 and Bind

brett smith

Oct 18, 2013, 10:32:35 PM

to bind-...@lists.isc.org

I need to build a pair of DNS cache servers to support 5000+ clients
(PCs and servers). I have been looking for guides on tuning BIND and
the OS for enterprise performance rather than relying on the defaults.
The BIND version is bind-9.8.2.

Thank You,
Brett

sth...@nethelp.no

Oct 19, 2013, 3:20:27 AM

to brett...@gmail.com, bind-...@lists.isc.org

5000 clients is such a low number that I don't think you need to worry
about tuning at all.

Steinar Haug, Nethelp consulting, sth...@nethelp.no

brett smith

Oct 19, 2013, 9:34:45 PM

to sth...@nethelp.no, bind-...@lists.isc.org

When all the Windows PCs are switched to our resolver, BIND stops
responding. rndc querylog shows queries coming through. I changed
tcp-clients from 1000 to 10000, but DNS still seems to lag, so we
switched back to the original Windows domain resolver. Besides raising
the open-files limit, what TCP / sysctl or named.conf settings can be
tuned to speed up DNS queries? Looking at netstat on the server, the
Windows clients seem to be using TCP instead of UDP.

Thanks. Brett.

Steven Carr

Oct 20, 2013, 2:33:29 AM

to brett smith, bind-users

On 20 October 2013 02:34, brett smith <brett...@gmail.com> wrote:
> When all the Windows PC's are switched to our resolver, bind stops responding.
> rndc querylog shows queries coming thru, I changed tcp-clients from
> 1000 to 10000 but DNS seems lagging, so we switched back to the
> original Windows Domain resolver. Besides increasing open files
> tuning, what TCP / sysctl or named.conf settings can be set to
> optimize / speed up DNS queries? Because it seems that Windows clients
> use TCP instead of UDP when looking at netstat on the server.

It will depend on the type and size of the query (and on the
configuration/structure of the network in-between) as to whether
Windows uses UDP or is forced to switch to TCP.

But the option you are probably looking for is "recursive-clients" and
then pick a number. The default is 1000, so this is probably why if
all of your systems are querying at once it stops responding to some
of them.
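
One way to check whether the recursive-clients or tcp-clients quota is really being hit is to look at the client counters printed by `rndc status`. The following is only a rough sketch in Python, assuming status lines of the form `recursive clients: 12/900/1000` and `tcp clients: 3/100` (current count first, hard limit last); the exact wording can differ between BIND versions, and the sample text and the 80% threshold are made up for illustration.

```python
import re
import subprocess

# Matches e.g. "recursive clients: 12/900/1000" or "tcp clients: 3/100".
CLIENT_LINE = re.compile(
    r"^(recursive|tcp) clients:[ \t]*(\d+(?:/\d+)+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def client_usage(status_text):
    """Return {"recursive": (current, limit), "tcp": (current, limit)}."""
    usage = {}
    for kind, numbers in CLIENT_LINE.findall(status_text):
        fields = [int(n) for n in numbers.split("/")]
        # First field is the current count, last field is the hard limit.
        usage[kind.lower()] = (fields[0], fields[-1])
    return usage


def near_limit(usage, threshold=0.8):
    """Names of quotas whose current use is at or above threshold * limit."""
    return sorted(k for k, (cur, lim) in usage.items() if lim and cur >= threshold * lim)


def read_rndc_status():
    """Run `rndc status` (needs a working rndc setup on this host)."""
    result = subprocess.run(["rndc", "status"], capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample output, not taken from the poster's server.
SAMPLE = """\
query logging is ON
recursive clients: 998/900/1000
tcp clients: 41/10000
server is up and running
"""

if __name__ == "__main__":
    usage = client_usage(SAMPLE)
    print(usage, "near limit:", near_limit(usage))
```

On a live server, `client_usage(read_rndc_status())` could be sampled every few seconds while the Windows clients are pointed at the resolver. If the recursive count sits near its limit, raising recursive-clients is worth trying; if it stays low, the bottleneck is probably elsewhere.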

Other than that, it comes down to how much memory and CPU the server
has. Is it a VM? If so, have you reserved enough resources for it? What
data is it serving: caching only, or is it authoritative for any zones?
Is query logging enabled? (That is a big performance hit, as every
query has to be written to disk, so your disk becomes a bottleneck.)

Tuning is not something that you can be told "this is what to do",
there are a huge number of factors that will influence which
parameters to tweak. But I'd definitely look to the
"recursive-clients" option for starters.

Steve

Alan Clegg

Oct 20, 2013, 11:32:46 AM

to brett smith, bind-...@lists.isc.org

On Oct 19, 2013, at 9:34 PM, brett smith <brett...@gmail.com> wrote:

> When all the Windows PC's are switched to our resolver, bind stops responding.

What does "stops responding" mean? Any logs?

> rndc querylog shows queries coming thru, I changed tcp-clients from
> 1000 to 10000 but DNS seems lagging, so we switched back to the
> original Windows Domain resolver.

Are you really getting that many TCP based queries? If so, something is seriously broken.

> Besides increasing open files
> tuning, what TCP / sysctl or named.conf settings can be set to
> optimize / speed up DNS queries? Because it seems that Windows clients
> use TCP instead of UDP when looking at netstat on the server.

Fix your windows clients.

AlanC
--
Alan Clegg | +1-919-355-8851 | al...@clegg.com


Stuart Browne

Oct 20, 2013, 6:27:01 PM

to brett smith, sth...@nethelp.no, bind-...@lists.isc.org


If my experience with high throughput on a Red Hat system is anything to go by, you are probably hitting the iptables conntrack bucket limits.

The simplest way to avoid this is to bypass connection tracking.

You can do one of the following:

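- Turn off iptables (probably not a good idea)
- Turn off connection tracking and stop using the state module, rewriting all rules (nasty)
- Tell iptables not to conntrack just udp/53 and tcp/53, using NOTRACK rules in the raw table (e.g. iptables -t raw -A PREROUTING -p tcp --dport 53 -j NOTRACK ; iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK, plus matching --sport 53 rules in the raw OUTPUT chain for the replies)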

We use the 3rd method and it works beautifully. Just ensure your 'filter' rules don't force the use of conntrack for that traffic. See the man page for more details.
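
Before bypassing conntrack, it can help to confirm that the table really is filling up. The sketch below reads the kernel's conntrack counters and reports how full the table is. The two candidate /proc locations are assumptions that depend on the kernel: older kernels such as the RHEL 5 one are expected to expose ip_conntrack_* under net/ipv4/netfilter, newer ones nf_conntrack_* under net/netfilter. A "table full, dropping packet" message in the kernel log points to the same problem.

```python
from pathlib import Path

# Candidate counter locations (assumed; which pair exists depends on the kernel).
CANDIDATES = [
    ("/proc/sys/net/ipv4/netfilter/ip_conntrack_count",
     "/proc/sys/net/ipv4/netfilter/ip_conntrack_max"),
    ("/proc/sys/net/netfilter/nf_conntrack_count",
     "/proc/sys/net/netfilter/nf_conntrack_max"),
]


def usage_from_text(count_text, max_text):
    """Parse the contents of the count and max files into (count, maximum)."""
    return int(count_text.strip()), int(max_text.strip())


def conntrack_usage(candidates=CANDIDATES):
    """Return (count, maximum) from the first pair of files that exists, else None."""
    for count_path, max_path in candidates:
        count_file, max_file = Path(count_path), Path(max_path)
        if count_file.is_file() and max_file.is_file():
            return usage_from_text(count_file.read_text(), max_file.read_text())
    return None


def describe(usage):
    """Human-readable summary of a (count, maximum) pair."""
    if usage is None:
        return "conntrack counters not found (module not loaded?)"
    count, maximum = usage
    if maximum <= 0:
        return f"{count} entries, no usable maximum reported"
    return f"{count}/{maximum} entries ({100.0 * count / maximum:.1f}% full)"


if __name__ == "__main__":
    print(describe(conntrack_usage()))
```

If the count sits close to the maximum while the clients are pointed at the resolver, that supports the conntrack explanation; once the NOTRACK rules are in place, DNS traffic should no longer add entries.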

Stuart

WBr...@e1b.org

Oct 21, 2013, 9:47:21 AM

to bind-...@lists.isc.org

> From: Alan Clegg <al...@clegg.com>

> Fix your windows clients.

You can't fix stupid.


Lightner, Jeff

Oct 21, 2013, 10:08:58 AM

to bind-...@lists.isc.org

Any reason you're using RHEL5 as opposed to RHEL6 if you're building new servers? RHEL5 is very long in the tooth and will go EOL sooner than RHEL6. Since you're using a BIND package not shipped with RHEL5 there's no reason on that account not to move up to RHEL6.


Mike Hoskins (michoski)

Oct 22, 2013, 11:43:46 AM

to bind-...@lists.isc.org

-----Original Message-----

From: Alan Clegg <al...@clegg.com>
Date: Tuesday, October 22, 2013 7:44 AM
To: "bind-...@lists.isc.org" <bind-...@lists.isc.org>
Subject: Re: Performance Tuning RHEL 5 and Bind

>On Oct 21, 2013, at 9:47 AM, WBr...@e1b.org wrote:
>
>>> From: Alan Clegg <al...@clegg.com>
>>
>>> Fix your windows clients.
>>
>> You can't fix stupid.
>

>I have lots of windows clients and they don't exhibit this "feature".
>There's something wrong on the windows clients and it's not the norm.
>
>To be honest, recent windows releases do a pretty fine job with DNS.

Agreed. The problem here is the TCP fallback rather than BIND/OS
tuning. I've got a lot of Windows clients (mostly VMware-related infra)
that don't query via TCP. I would focus on a deeper inspection of the
environment, including the network layer. The OP needs to figure out
why the queries are using TCP.

Speculating based on the available data, I'm wondering if the new BIND
servers were stood up behind a firewall...possibly with broken protocol
inspection/fixup type configuration limiting UDP packet size to 512
bytes...and zone data with large NS/whatever RR sets resulting in TCP
retries.
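
A quick way to test this kind of speculation is to send the same query over UDP twice, once without EDNS0 and once advertising a larger buffer, and compare the TC (truncated) bit. The following is a minimal sketch using only the Python standard library; the server address and query name in the usage note are placeholders, and the OPT record is the bare minimum (no options, no DO bit).

```python
import socket
import struct


def encode_name(name):
    """Encode a domain name in DNS wire format (length-prefixed labels, no compression)."""
    out = b""
    for label in name.strip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(qname, qtype=2, edns_size=None, txid=0x1234):
    """Build a recursive query (qtype 2 = NS). With edns_size set, append an OPT record."""
    arcount = 1 if edns_size else 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)  # flags: RD set
    question = encode_name(qname) + struct.pack("!HH", qtype, 1)      # class IN
    opt = b""
    if edns_size:
        # OPT RR: root name, TYPE 41, CLASS = advertised UDP size, TTL 0, RDLENGTH 0
        opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0)
    return header + question + opt


def is_truncated(response):
    """True if the TC bit is set in the response header."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def probe(server, qname, edns_size=None, timeout=3.0):
    """Send one UDP query and return (response_length, truncated)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(qname, edns_size=edns_size), (server, 53))
        data, _ = sock.recvfrom(65535)
    return len(data), is_truncated(data)
```

For example, `probe("192.0.2.53", "x.internal.example.com")` compared with `probe("192.0.2.53", "x.internal.example.com", edns_size=4096)`, run from a client subnet. If the plain query comes back truncated and the EDNS0 one does not, the answer is simply larger than 512 bytes. If the EDNS0 query times out or is still truncated, something on the path or on the server may be limiting UDP size.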

Kevin Darcy

Oct 22, 2013, 3:29:58 PM

to bind-...@lists.isc.org

Are these queries mostly for names in an Active Directory domain? The
default for Active Directory is for *every* Domain Controller to
register NS records at the apex of the AD domain. Pretty soon, for any
reasonably-sized AD infrastructure, all of those NSes cause *all*
queries for *any* name in the domain to trigger a TCP retry (because the
Answer + Authority Sections overflow 512 bytes), if EDNS0 is not in
effect. I sat down with our AD folks a few years ago and impressed upon
them how important it is to be selective about which Domain Controllers
are registered at the apex. They appreciated the negative consequences
of being awash in TCP retries, and it's been managed for some time now
(at least for our *main* AD domain; don't get me started on the business
partner that still has 92 NS records at the apex of their AD domain. Sigh)
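
To make the 512-byte arithmetic concrete, here is a back-of-the-envelope estimate in Python of how large a response gets as NS records pile up at the apex. It is a sketch under stated assumptions: the zone and host names (corp.example.com, dcNN) are invented, name compression is assumed to work ideally, the answer is a single A record, and glue in the Additional section is ignored, so real responses will differ somewhat.

```python
UDP_LIMIT = 512            # classic DNS-over-UDP message limit without EDNS0
HEADER = 12                # fixed DNS header
RR_FIXED = 2 + 2 + 4 + 2   # TYPE, CLASS, TTL, RDLENGTH
POINTER = 2                # a compression pointer to an earlier name


def wire_len(name):
    """Uncompressed wire length: one length byte per label plus the root byte."""
    labels = [p for p in name.strip(".").split(".") if p]
    return sum(1 + len(p) for p in labels) + 1


def relative_len(name, zone):
    """Length of `name` when its `zone` suffix is replaced by a compression pointer."""
    name, zone = name.strip(".").lower(), zone.strip(".").lower()
    if name == zone:
        return POINTER
    if name.endswith("." + zone):
        relative = name[: -(len(zone) + 1)]
        return sum(1 + len(p) for p in relative.split(".")) + POINTER
    return wire_len(name)  # out-of-zone name: assume no compression at all


def estimate_size(qname, zone, ns_hosts):
    """Rough size of a response with one A record plus the zone's NS set in Authority."""
    size = HEADER + wire_len(qname) + 4      # question: name + QTYPE + QCLASS
    size += POINTER + RR_FIXED + 4           # answer: one A record (4-byte address)
    for host in ns_hosts:
        size += POINTER + RR_FIXED + relative_len(host, zone)  # one NS record
    return size


def sample_hosts(count, zone):
    """Hypothetical domain controller names dc00, dc01, ... in the zone."""
    return [f"dc{i:02d}.{zone}" for i in range(count)]


if __name__ == "__main__":
    zone = "corp.example.com"
    for count in (5, 24, 25, 92):
        size = estimate_size(f"host.{zone}", zone, sample_hosts(count, zone))
        verdict = "fits in 512" if size <= UDP_LIMIT else "truncated, TCP retry"
        print(count, "NS records:", size, "bytes,", verdict)
```

With these made-up names each NS record costs 19 bytes, so the estimate crosses 512 bytes somewhere in the mid-twenties of NS records; with longer host names it happens sooner.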

Sounds like you might need to have the same discussion with your AD
guys, if in fact AD is a factor here. Even if the users aren't
*consciously* looking up AD-related names, if the AD domain is in the
Suffix Search List and your users' shortname addiction is out of
control, the combination of the two, along with excess NS records at the
apex, can ultimately result in a lot of bogus TCP retries. Sometimes you
can alleviate this with careful ordering or pruning of elements in the
Suffix Search List.

A lot of folks think that query logging is a drain on resources, and
anyone who is serious about DNS performance would never turn it on.
Those folks must not work in a large, chaotic enterprise :-) I find
query logging and associated data-mining tools I've developed over the
years, invaluable to track down broken and/or obsolete query traffic and
eliminate it at the source. This saves me *much* more performance than
the query logging itself, as well as being valuable for security
forensics, incident avoidance (e.g. before I delete this name from DNS,
let me check whether anyone is still looking it up) and a plethora of
other useful stuff.
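
As an illustration of this kind of data mining (not the tools referred to above), here is a small Python sketch that tallies which names and clients show up with the TCP flag in a BIND query log. It assumes log lines of the general shape `client 10.1.2.3#53124: query: name IN A +T (10.0.0.53)`, where a `T` in the flags field marks a query received over TCP; the sample lines are invented, and other BIND versions add fields that the pattern only partly allows for.

```python
import re
from collections import Counter

# Pattern for one query-log entry; optional parts cover some newer log layouts.
QUERY_RE = re.compile(
    r"client (?:@0x[0-9A-Fa-f]+ )?(?P<ip>[0-9A-Fa-f.:]+)#\d+(?: \([^)]*\))?: "
    r"(?:view \S+: )?query: (?P<qname>\S+) (?P<qclass>\S+) (?P<qtype>\S+) "
    r"(?P<flags>[-+]\S*)"
)


def tally(lines):
    """Return (total_queries, tcp_queries_by_name, tcp_queries_by_client)."""
    tcp_by_name, tcp_by_client, total = Counter(), Counter(), 0
    for line in lines:
        match = QUERY_RE.search(line)
        if not match:
            continue  # not a query line
        total += 1
        if "T" in match.group("flags"):  # T flag: query arrived over TCP
            tcp_by_name[match.group("qname").lower()] += 1
            tcp_by_client[match.group("ip")] += 1
    return total, tcp_by_name, tcp_by_client


# Invented sample lines for illustration only.
SAMPLE = """\
22-Oct-2013 15:29:58.101 client 10.1.2.3#53124: query: pc17.x.internal.example.com IN A + (10.0.0.53)
22-Oct-2013 15:29:58.140 client 10.1.2.3#49155: query: pc17.x.internal.example.com IN A +T (10.0.0.53)
22-Oct-2013 15:29:59.002 client 10.1.2.9#50001: query: www.example.org IN AAAA +E (10.0.0.53)
22-Oct-2013 15:29:59.377 client 10.1.2.9#49160: query: fileserver.x.internal.example.com IN A +T (10.0.0.53)
""".splitlines()

if __name__ == "__main__":
    total, by_name, by_client = tally(SAMPLE)
    print("queries:", total)
    print("top TCP names:", by_name.most_common(5))
    print("top TCP clients:", by_client.most_common(5))
```

Run against a real log (for example `tally(open(path))`), the names that dominate the TCP tally are usually the ones whose responses overflow the UDP limit.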

- Kevin


brett smith

Oct 22, 2013, 8:29:01 PM

to bind-...@lists.isc.org

Yes, turning off iptables connection tracking makes a huge difference. I also followed:

https://access.redhat.com/site/solutions/304713
https://access.redhat.com/site/solutions/168483

I still see some SYN_SENT from Windows PCs on TCP port 53 on the DNS
cache server.

Thank You, Brett


Alan Clegg

Oct 22, 2013, 9:39:14 PM

to bind-...@lists.isc.org

On Oct 22, 2013, at 8:29 PM, brett smith <brett...@gmail.com> wrote:

> Yes tuning off IPTABLES conn-tracking makes a huge difference. I also followed:
>
> https://access.redhat.com/site/solutions/304713
> https://access.redhat.com/site/solutions/168483
>
> I still see some SYN_SENT from Windows PC's on tcp port 53 on the DNS
> cache server.

You've cured the symptoms, not the illness.

You really, REALLY need to figure out why your clients are doing TCP. You'll see a world of difference when you solve this part of the puzzle.


Carsten Strotmann

Oct 24, 2013, 4:05:25 PM

to Kevin Darcy, bind-...@lists.isc.org

Hi,

Kevin Darcy <k...@chrysler.com> writes:

> Are these queries mostly for names in an Active Directory domain? The
> default for Active Directory is for *every* Domain Controller to
> register NS records at the apex of the AD domain. Pretty soon, for any
> reasonably-sized AD infrastructure, all of those NSes cause *all*
> queries for *any* name in the domain to trigger a TCP retry (because
> the Answer + Authority Sections overflow 512 bytes), if EDNS0 is not
> in effect. I sat down with our AD folks a few years ago and impressed
> upon them how important it is to be selective about which Domain
> Controllers are registered at the apex. They appreciated the negative
> consequences of being awash in TCP retries, and it's been managed for
> some time now (at least for our *main* AD domain; don't get me started
> on the business partner that still has 92 NS records at the apex of
> their AD domain. Sigh)
>

good point.

Increasing the EDNS0 UDP size might also be an option (default is 1280
for Windows DNS) ->
http://technet.microsoft.com/en-us/library/cc783893%28v=ws.10%29.aspx

It is possible to tell some less critical DCs not to register
themselves in DNS:
http://support.microsoft.com/kb/198767
and
http://technet.microsoft.com/en-us/library/cc782946%28v=ws.10%29.aspx

-- Carsten

brett smith

Oct 28, 2013, 11:08:05 PM

to bind-...@lists.isc.org

OK, I have found the source of the problem. Now I just need an elegant
and cost-effective (in terms of network TCP) way to fix it.

The Windows domain is responsible for X.internal.example.com, and I am
currently forwarding X.internal.example.com to their nameservers (the
DCs), which results in TCP queries. That drags the cache server down
when PCs query for records under [NAME].internal.example.com. I don't
mind not caching X.internal.example.com, so can I create an NS record
or a stub entry that points the PCs elsewhere, rather than forwarding
or caching those queries?

Thank You,
Brett

Alan Clegg

Oct 29, 2013, 12:05:33 AM

to brett smith, bind-...@lists.isc.org

On Oct 28, 2013, at 8:08 PM, brett smith <brett...@gmail.com> wrote:

> OK I have the source of the problem now I just need an elegant way to
> fix it and most cost ( Network TCP ) effective way to fix it
>
> The Windows Domain is responsible for X.internal.example.com and I am
> presently forwarding X.internal.example.com to their nameservers DC,
> resulting in TCP queries. Which is dragging the cache server down when
> PC's query for records off of [NAME].internal.example.com. I don't
> mind not caching X.internal.example.com so can I create an NS record
> or an stub entry that points the PC's else where rather than
> forwarding them or caching them?

Slave X.internal.example.com


Charles Swiger

Oct 29, 2013, 12:35:48 AM

to Alan Clegg, brett smith, bind-...@lists.isc.org

Hi—

On Oct 28, 2013, at 9:05 PM, Alan Clegg <al...@clegg.com> wrote:
> Slave X.internal.example.com

+1; it’s also worth looking into why there is such a high volume
of DNS queries. Is it simply a big network with a lot of chatty
clients? Or are TTLs turned down so low that client-side caching
is ineffective and clients need to requery often?

Or is something doing a host scan? If it’s your network IDS or
security/network admin folks running a portscan, fine; if it’s
malware or an intruder scanning the local subnet(s), one might want
to notice and take steps to solve the problem rather than a symptom.

Regards,
—
-Chuck