BIND 9.4.x and max-clients-per-query

Jan Arild Lindstrøm

Sep 8, 2008, 6:38:58 AM

Hi,

we have several recursive BIND 9.4.x servers running with the following option set
in named.conf:
recursive-clients 50000;

From named.log yesterday @ 19:44 ->

--CUT--
07-Sep-2008 19:44:01.250 resolver: clients-per-query increased to 70
07-Sep-2008 19:44:03.124 resolver: clients-per-query increased to 75
07-Sep-2008 19:44:49.700 general: dispatch.c:2999: INSIST(n == 1) failed
07-Sep-2008 19:44:49.700 general: exiting (due to assertion failure)
07-Sep-2008 19:44:53.939 general: zone 0.0.127.in-addr.arpa/IN/internal: loaded serial 1
07-Sep-2008 19:44:53.948 general: zone localhost/IN/internal: loaded serial 1
07-Sep-2008 19:44:55.109 general: zone 0.0.127.in-addr.arpa/IN/external: loaded serial 1
07-Sep-2008 19:44:55.113 general: zone localhost/IN/external: loaded serial 1
07-Sep-2008 19:44:56.282 general: running
07-Sep-2008 19:44:56.961 resolver: clients-per-query increased to 15
07-Sep-2008 19:44:58.127 resolver: clients-per-query increased to 20
07-Sep-2008 19:45:00.168 resolver: clients-per-query increased to 25
07-Sep-2008 19:45:01.602 resolver: clients-per-query increased to 30
07-Sep-2008 19:45:04.079 resolver: clients-per-query increased to 35
07-Sep-2008 19:45:09.490 resolver: clients-per-query increased to 40
07-Sep-2008 19:45:11.826 resolver: clients-per-query increased to 45
07-Sep-2008 19:45:14.200 resolver: clients-per-query increased to 50
07-Sep-2008 19:45:21.336 resolver: clients-per-query increased to 55
07-Sep-2008 19:45:29.406 resolver: clients-per-query increased to 60
07-Sep-2008 19:46:05.896 resolver: clients-per-query increased to 65
07-Sep-2008 19:47:14.187 resolver: clients-per-query increased to 70
07-Sep-2008 19:49:28.621 client: client xx.xx.xx.xx#59739: view external: recursive-clients soft limit exceeded, aborting oldest query
07-Sep-2008 19:49:29.258 client: client xx.xx.xx.xx#1025: view external: recursive-clients soft limit exceeded, aborting oldest query
07-Sep-2008 19:49:30.043 client: client xx.xx.xx.xx#64760: view external: recursive-clients soft limit exceeded, aborting oldest query
07-Sep-2008 19:49:31.012 client: client xx.xx.xx.xx#38850: view external: recursive-clients soft limit exceeded, aborting oldest query
--CUT--

clients-per-query, max-clients-per-query
These set the initial value (minimum) and maximum number of recursive simultaneous clients for
any given query (<qname,qtype,qclass>) that the server will accept before dropping additional
clients. named will attempt to self tune this value and changes will be logged. The default values
are 10 and 100.

This value should reflect how many queries come in for a given name in the time it takes to resolve
that name. If the number of queries exceeds this value, named will assume that it is dealing with a
non-responsive zone and will drop additional queries. If it gets a response after dropping queries, it
will raise the estimate. The estimate will then be lowered in 20 minutes if it has remained unchanged.

If clients-per-query is set to zero, then there is no limit on the number of clients per query and no
queries will be dropped.

If max-clients-per-query is set to zero, then there is no upper bound other than that imposed by recursive-clients.
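
The tuning behaviour described in the documentation text above can be sketched as a toy model. This is only an illustration of the quoted description, not BIND's actual code: the class and method names are hypothetical, the step of 5 is inferred from the "increased to" log lines earlier in this message, and the 20-minute lowering is left out because the text does not say by how much the estimate drops.

```python
STEP = 5  # assumption: inferred from the log (70 -> 75, 15 -> 20), not from the documentation


class ClientsPerQueryEstimate:
    """Toy model of the self-tuned per-query client limit (hypothetical names)."""

    def __init__(self, minimum=10, maximum=100):
        self.minimum = minimum   # clients-per-query (initial value)
        self.maximum = maximum   # max-clients-per-query (0 means no upper bound)
        self.current = minimum   # the estimate that gets tuned
        self.dropped = False     # True once a client was refused

    def admit(self, waiting):
        """Return True if one more client may join a query that already has `waiting` clients."""
        if self.minimum == 0:
            return True          # clients-per-query 0: no limit, nothing is dropped
        if waiting >= self.current:
            self.dropped = True  # over the estimate: drop the additional client
            return False
        return True

    def on_response(self):
        """A response arrived; raise the estimate if clients had been dropped meanwhile."""
        if not self.dropped:
            return
        self.dropped = False
        raised = self.current + STEP
        if self.maximum != 0:
            raised = min(raised, self.maximum)
        self.current = raised
```

In this model the estimate can never pass max-clients-per-query, which is the behaviour the documentation text leads one to expect and the reason the counts further down look surprising.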

While the recursive queue was filling, I checked the recursive queries:
ns(root) named 536# rndc recursing ; cat named.recursing | awk '{print $6}' | sort | uniq -c | sort -n | tail -5
809 'crl.verisign.net'
826 'apps.facebook.com'
2503 'ocsp.verisign.net'
12850 'www.facebook.com'
20064 'statistik-gallup.net'
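
For anyone who prefers a script to the awk pipeline, a minimal Python sketch of the same count follows. It assumes, as the awk '{print $6}' above does, that the query name is the sixth whitespace-separated field of each line in named.recursing; the function name is made up for illustration.

```python
from collections import Counter


def top_recursing_names(lines, n=5, field=6):
    """Count how often each query name appears and return the n most frequent.

    `field` is 1-based, like awk's $6. Lines with fewer fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= field:
            # Strip the surrounding single quotes seen in the output above.
            counts[parts[field - 1].strip("'")] += 1
    return counts.most_common(n)
```

Passing an open file handle for named.recursing as `lines` gives the busiest names first, whereas the pipeline above lists them last.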

rndc status:
recursive clients: 49662/49900/50000
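
The three numbers in that line appear to be the current number of recursive clients, the soft limit, and the hard limit (the recursive-clients setting). A small sketch for pulling them apart, assuming that reading is right and using hypothetical names:

```python
import re
from typing import NamedTuple


class RecursiveClients(NamedTuple):
    current: int      # clients currently waiting on recursion
    soft_limit: int   # assumed: threshold at which the oldest query is aborted
    hard_limit: int   # assumed: the recursive-clients setting


_PATTERN = re.compile(r"recursive clients:\s*(\d+)/(\d+)/(\d+)")


def parse_recursive_clients(status_text):
    """Extract the three counters from `rndc status` output."""
    match = _PATTERN.search(status_text)
    if match is None:
        raise ValueError("no 'recursive clients' line found")
    return RecursiveClients(*(int(group) for group in match.groups()))


usage = parse_recursive_clients("recursive clients: 49662/49900/50000")
headroom = usage.hard_limit - usage.current  # slots left before the hard limit
```

With the figures above, that leaves 338 slots before the hard limit of 50000 is reached.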

clients-per-query and max-clients-per-query are not set, so they are at their defaults of 10 and 100.

How is it that these queries have so many simultaneous clients? Should max-clients-per-query not keep
it to at most 100 simultaneous clients for each query? All these numbers are far larger than 100.

Or have I not understood the purpose of clients-per-query and max-clients-per-query correctly?

Thanks
Jan Arild Lindstrom


Fr34k

Sep 8, 2008, 1:13:53 PM
Hello,

 
"07-Sep-2008 19:47:14.187 resolver: clients-per-query increased to 70"
 
70 clients per query seems pretty high to me.
I think slow, and bogus, lookups can contribute to this.
 
In our environment, we use:
 clients-per-query 10 ;
 max-clients-per-query 20 ;

I would also check that the network is clean: no interface errors on server or switch, etc.
 
There may also be bots, and such, driving up DNS traffic in attempts to propagate abuse.
Typically, hundreds of MX lookups from DHCP workstations indicate such malware infections.
Once upon a time, someone pointed me to a SURFnet document on using DNS as an IDS -- which has some other great ideas.
Anyway, the goal is inoculating infected hosts to stop bogus traffic.
 
I hope this helps.

Jan Arild Lindstrøm

Sep 9, 2008, 3:01:26 AM

Hi,

I cannot see how that would make any difference, since the numbers are far bigger than the
max-clients-per-query limit, whether it is 70 or 100:

ns(root) named 536# rndc recursing ; cat named.recursing | awk '{print $6}' | sort | uniq -c | sort -n | tail -5
809 'crl.verisign.net'
826 'apps.facebook.com'
2503 'ocsp.verisign.net'
12850 'www.facebook.com'
20064 'statistik-gallup.net'

It seems to me that the limit is not working as I think it should. Have I misunderstood how
the limit should work, or is there something faulty in the code that should limit clients-per-query?

Regards
Jan Arild Lindstrom

Jan Arild Lindstrøm

Sep 16, 2008, 3:14:43 AM

Hi,

is there really no one who can explain why clients-per-query gets so high even though
max-clients-per-query = 100?

Thanks
Jan Arild Lindstrom

JINMEI Tatuya / 神明達哉

Sep 20, 2008, 5:50:50 PM
At Tue, 16 Sep 2008 08:14:43 +0100,

Jan Arild Lindstrøm <j...@telenor.net> wrote:

> is there really no one who can explain why clients-per-query gets so high even though
> max-clients-per-query = 100?

First, please be more specific about operational environment: the
exact BIND9 version, not just 9.4.x; build options of BIND9; OS and
its version; perhaps also your named.conf.

Second, limiting max-clients-per-query doesn't help reduce the number
of recursive clients if the same query is sent from different IP
addresses.

Third, having 49662 recursive clients looks quite extraordinary. I
suspect that the real problem is somewhere else.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

Jan Arild Lindstrøm

Sep 22, 2008, 2:24:02 AM
At 22:50 20/09/2008, JINMEI Tatuya / 神明達哉 wrote:
>At Tue, 16 Sep 2008 08:14:43 +0100,
>Jan Arild Lindstrøm <j...@telenor.net> wrote:
>
>> is there really no one who can explain why clients-per-query gets so high even though
>> max-clients-per-query = 100?
>
>First, please be more specific about operational environment: the
>exact BIND9 version, not just 9.4.x; build options of BIND9; OS and
>its version; perhaps also your named.conf.

Hardware: Sun Fire T2000, 16 GB, 8 cores, 1000 MHz, 32 threads
OS: Solaris 10 (Generic_137111-03)
BIND version: 9.4.3b2

SunStudio 12:
-fast -xtarget=ultraT1 -m64
./configure --prefix=/local --localstatedir=/var --with-openssl=/local/openssl --with-randomdev=/dev/urandom \
--enable-threads --with-libtool --enable-static=yes --disable-shared --sysconfdir=/etc/named

options {
    tcp-clients 1000;
    dnssec-enable no;
    recursive-clients 50000;
    directory "/etc/named";
    recursion yes;
    allow-query { our-nets; };
    allow-recursion { our-nets; };
    allow-query-cache { our-nets; };
    pid-file "/var/run/named/named.pid";
    check-names master ignore;
    check-names slave ignore;
    check-names response ignore;
    sortlist {
        { localhost;            // IF the local host
            { localnets; }; };  // Return local addresses
        { 10/8;                 // IF host on private net
            { 10/8; }; };       // return private addresses
        { localnets; };
    };
};

The acl "our-nets" covers about 100 networks, divided among 5 different acls. We plan to upgrade
to 9.5.x soon to speed up acl processing.

>Second, limiting max-clients-per-query doesn't help reduce the number
>of recursive clients if the same query is sent from different IP
>addresses.

Ouch! Is that really correct? Should it not then be called "max-queries-per-client" and
not "max-clients-per-query"?

Not to repeat, but:


clients-per-query, max-clients-per-query
These set the initial value (minimum) and maximum number of recursive simultaneous clients for
any given query (<qname,qtype,qclass>) that the server will accept before dropping additional
clients. named will attempt to self tune this value and changes will be logged. The default values
are 10 and 100.

As I understand the text, it is supposed to be a limit on the number of clients waiting on any given query,
regardless of client IP address, and not a limit on the number of queries per client.

Am I totally wrong?

>Third, having 49662 recursive clients looks quite extraordinary. I
>suspect that the real problem is somewhere else.

ns11(root) OLD 503# wc -l query.log*
13773918 query.log
13761647 query.log.0
13779648 query.log.1
13781716 query.log.10
--CUT--

Logs are rotated every hour.

That is, more than 13 million queries each hour. Mpstat/CPU load is avg. 0.4,
and core saturation about 20%.
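
As a rough sanity check on those counts, the hourly line totals can be turned into an average query rate. The sketch assumes one log line per query and exactly one hour per file, neither of which is guaranteed; the function name is hypothetical.

```python
# Line counts copied from the wc -l output above.
HOURLY_LINES = {
    "query.log": 13773918,
    "query.log.0": 13761647,
    "query.log.1": 13779648,
    "query.log.10": 13781716,
}


def queries_per_second(lines_per_hour, seconds=3600):
    """Average rate, assuming one log line per query over a full hour."""
    return lines_per_hour / seconds


for name, lines in HOURLY_LINES.items():
    print(f"{name}: about {queries_per_second(lines):.0f} queries/s")
```

Each file works out to roughly 3,800 queries per second on average.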

>---
>JINMEI, Tatuya
>Internet Systems Consortium, Inc.


Thanks
Jan Arild Lindstrom


Jan Arild Lindstrøm

Sep 22, 2008, 3:27:16 AM

Sorry,

>That is, more than 13 million queries each hour. Mpstat/CPU load is avg. 0.4,
>and core saturation about 20%.

... that should be utilization, not saturation.

Regards
Jan Arild Lindstrom

JINMEI Tatuya / 神明達哉

Sep 24, 2008, 8:16:00 PM
At Mon, 22 Sep 2008 07:24:02 +0100,

Jan Arild Lindstrøm <j...@telenor.net> wrote:

> >Second, limiting max-clients-per-query doesn't help reduce the number
> >of recursive clients if the same query is sent from different IP
> >addresses.
>
> Ouch! Is that really correct? Should it not then be called "max-queries-per-client" and
> not "max-clients-per-query"?

Oops, I was wrong. I was thinking of the case where a single client (or
several) keeps sending a high volume of different bogus
queries (for which max-clients-per-query doesn't help).

I now see the problem. It's really strange to have more than 10,000
recursive clients for the same query while max-clients-per-query is
100. I have no specific idea about how this could happen, but I'd
suspect this may be a thread-related bug. Is it possible to rebuild
named without threads and see if the same problem happens?

Jan Arild Lindstrøm

Sep 25, 2008, 1:43:32 AM

I do not know whether our T2000 can reply fast enough, or whether it will quickly become saturated
if it uses only one thread for all the traffic. I will rebuild and test a little in parallel on one
of them.

Regards
Jan Arild Lindstrom
