
no more recursive clients: quota reached


Oliver Henriot

Mar 24, 2010, 10:41:28 AM
to bind-...@lists.isc.org
Dear list users,

I'd like to understand a point about recursive-client quotas. Reading
books, manuals and this list's archives hasn't made it entirely clear
to me.

I keep getting the classic error log entry:

17-Mar-2010 12:14:44.026 client: warning: client 129.88.30.5#57960: no more recursive clients: quota reached

I have a lot of these (two thousand unique clients blocked over the
last two weeks on my main resolver).

Is this quota global for all clients? That is, could one rogue client
sending massive amounts of recursive requests exhaust the quota for
everyone? Or is it per client? That seems unlikely to me, but I'm not
clear on that point.

Is increasing the quota limit the only solution?

It seems odd to me to hit the default BIND limit on my servers when they
are not open recursive servers and only clients on my networks (a few
thousand clients for three recursive resolvers) can query them.

The problem is particularly critical because one of the clients is a
router behind which many of my clients are NATed. Each time the quota
is reached on the servers they use, all the clients behind the router
address are blocked and get network timeouts.

I'm going to increase the quota, but it would be great if you could tell
me whether this is the right thing to do or whether I should be looking
for something else.

Best regards,

Oliver Henriot

Fr34k

Mar 24, 2010, 11:11:19 AM
to Oliver Henriot, bind-...@lists.isc.org
See the BIND ARM for the recursive-clients option.

As in:

options {
    recursive-clients 4000;
};

I don't recall what the default is (maybe 1000), but our environment required an increase to 4000.

You may also want to look at these options: tcp-clients, clients-per-query and max-clients-per-query.

The defaults may vary by BIND version, and the right settings depend on the environment the DNS server is in.

Assuming your BIND version supports the rndc utility, you can see a snapshot of the current settings and activity.
For example:
# rndc status
version: 9.6.0-P1 (version.bind/txt/ch disabled)
CPUs found: 2
worker threads: 2
number of zones: 14
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 71/3900/4000
tcp clients: 0/200
server is up and running
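A small script can take the same snapshot repeatedly. The Python sketch below pulls the "recursive clients" line out of `rndc status` text and reports how much of the limit is in use. It assumes the three numbers are current/soft-limit/hard-limit, which fits the output above (4000 configured, 3900 in the middle) but is an interpretation rather than a documented format; the class and function names are made up for illustration, and the line may look different in other BIND versions.

```python
import re
from typing import NamedTuple, Optional


class RecursiveClients(NamedTuple):
    """The three numbers from a 'recursive clients: a/b/c' line.

    Field meanings (current/soft/hard) are an assumption, not a documented format.
    """
    current: int
    soft_limit: int
    hard_limit: int

    @property
    def utilization(self) -> float:
        """Fraction of the hard limit currently in use."""
        return self.current / self.hard_limit


# Matches e.g. "recursive clients: 71/3900/4000" on any line of the output.
_RECURSIVE_RE = re.compile(
    r"^\s*recursive clients:\s*(\d+)/(\d+)/(\d+)\s*$", re.MULTILINE
)


def parse_recursive_clients(status_text: str) -> Optional[RecursiveClients]:
    """Return the parsed counters, or None if the line is absent."""
    match = _RECURSIVE_RE.search(status_text)
    if match is None:
        return None
    current, soft, hard = (int(g) for g in match.groups())
    return RecursiveClients(current, soft, hard)


if __name__ == "__main__":
    sample = "recursive clients: 71/3900/4000\ntcp clients: 0/200\n"
    rc = parse_recursive_clients(sample)
    if rc is not None:
        print(f"{rc.current}/{rc.hard_limit} in use ({rc.utilization:.1%})")
```

On a live server the text could come from `subprocess.run(["rndc", "status"], capture_output=True, text=True).stdout`, run from cron or a monitoring agent so that the readings can be graphed over time.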


HTH -- Chris

Rich Goodson

Mar 24, 2010, 12:22:41 PM
to Oliver Henriot, bind-...@lists.isc.org
I have 6 resolvers doing recursion for just under a million residential users, and I rarely see the "recursive clients" value go above 1500. We had issues a few months back with firewalls getting overloaded, and one of the symptoms was that recursive clients would climb into the thousands (it hit around 13,000 once) due to packet loss (I assume failed lookups caused queries to be repeated).

Right now, I have one server that's resolving somewhere in the 15k qps range, and it's hovering between 600 and 800 recursive clients. That box is recently upgraded hardware (four hex-core Opterons) and is directly connected to a Cisco 7609 that's on an OC-192. It is running at about 5% CPU utilization. I have another box on older hardware (an 8-core T1000 processor) that is resolving 10-12k qps, and it hovers around 1000 recursive queries on the wire. It is running at about 60% CPU utilization.

Are your servers behind a firewall?
If so, what does the CPU utilization look like on your packet-filtering device?
What is your link saturation like? How about the link between any clients and your servers?
How about CPU utilization on your servers?

Those are the items I'd look at, but it could be that I'm biased by recently being burned by networking :-)

--
Rich Goodson

Chris Thompson

Mar 24, 2010, 1:08:01 PM
to Oliver Henriot, Bind Users Mailing List
On Mar 24 2010, Oliver Henriot wrote:

>Is this quota global for all clients? I.e. one rogue client sending
>massive amounts of recursive requests would blow the quota for everyone.
>Or is it per client? It seems unlikely to me but I'm not clear on that
>point.

It is the length of the queue of all outstanding recursive queries.
This depends not just on the RATE of queries coming in, but also the
time it takes to resolve them. (If the queue fills up, BIND gives up
on the ones that have been outstanding longest.)
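The behaviour described here can be modelled as one table shared by all clients. The Python toy below is not BIND's code: it only keeps outstanding queries in arrival order and, when the table is full, drops the oldest to make room, as described above. Because there is a single table, one client that floods it (or a NAT router fronting many clients) pushes out everyone else's queries. The class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class OutstandingQueries:
    """Toy model of a global limit on outstanding recursive queries.

    Not BIND's implementation: a new query arriving at a full table
    displaces the query that has been waiting longest.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # Insertion order is arrival order, so the first key is the oldest.
        self._pending: "OrderedDict[int, str]" = OrderedDict()
        self.dropped: List[str] = []

    def start(self, query_id: int, qname: str) -> Optional[str]:
        """Register a new recursive query; return the evicted qname, if any."""
        evicted = None
        if len(self._pending) >= self.limit:
            _, evicted = self._pending.popitem(last=False)  # oldest first
            self.dropped.append(evicted)
        self._pending[query_id] = qname
        return evicted

    def finish(self, query_id: int) -> None:
        """An answer (or a final timeout) arrived, so free the slot."""
        self._pending.pop(query_id, None)

    def __len__(self) -> int:
        return len(self._pending)
```

With a limit of 3, a query for a slow domain that started first is the one sacrificed once three newer queries are waiting, regardless of which client sent them.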

Monitor the count with "rndc stats" to find out whether the outstanding
query queue is often close to the limit, or is spiking. In any case,
when the queue is large, take a look at it by using "rndc recursing"
(dumps the queue to "named.recursing" in BIND's current directory). You
may find that you have a lot of queries for some domain that is failing
to resolve in a timely fashion (we've had problems like that with people
trying to use RBLs from which we are blocked, for example).
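When the dump is large, counting the queries by domain quickly shows whether one zone dominates. The layout of named.recursing varies between BIND versions, so this Python sketch assumes you have already extracted the query names from it. `top_domains` and `parent_domain` are hypothetical helpers, and keeping the last two labels is a crude stand-in for a proper public-suffix lookup (use `labels=3` for names under ac.uk, for instance).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parent_domain(qname: str, labels: int = 2) -> str:
    """Crude heuristic: keep the last `labels` labels of a name.

    Ignores public suffixes such as co.uk or ac.uk.
    """
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def top_domains(qnames: Iterable[str], n: int = 10,
                labels: int = 2) -> List[Tuple[str, int]]:
    """Return the `n` most common parent domains among the query names."""
    counts = Counter(parent_domain(q, labels) for q in qnames)
    return counts.most_common(n)
```

A list dominated by one zone, such as an RBL that is not answering you, points to a resolution problem rather than a quota that is simply too small.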

You should also bear in mind the possibility of network problems, as
others have suggested. And firewall software might be mangling certain
outgoing queries, or the responses to them, making them appear to time
out.

--
Chris Thompson
Email: ce...@cam.ac.uk

Stephane Bortzmeyer

Mar 26, 2010, 4:13:12 AM
to Chris Thompson, Bind Users Mailing List
On Wed, Mar 24, 2010 at 05:08:01PM +0000,
Chris Thompson <ce...@cam.ac.uk> wrote
a message of 46 lines which said:

> It is the length of the queue of all outstanding recursive queries.
> This depends not just on the RATE of queries coming in, but also the
> time it takes to resolve them. (If the queue fills up, BIND gives up
> on the ones that have been outstanding longest.)

Yes, and this is the mechanism behind the Baofeng attack:
<https://www.dns-oarc.net/files/workshop-200911/Ziqian_Liu.pdf>

John Wobus

Mar 26, 2010, 10:00:49 AM
to bind-...@lists.isc.org
Typically you can increase the default without harm (e.g., double it,
or multiply it by 10) if you have a recent-vintage server with typical
memory and speed, but something might be causing the behavior that is
impervious to such a change or that needs some other kind of attention.
Such a problem might stem solely from sheer load, but quite often it
stems from queries that are not receiving answers and are just sitting
there until they time out.
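Both causes can be put on the same footing with Little's law (L = λW): the average number of outstanding queries equals the recursive query rate times the average time each query holds a slot. The Python sketch below applies it to a mix of answered queries and unanswered ones that sit until a timeout. The function name and every number in the example are hypothetical, and the 10-second timeout is an assumed figure, not BIND's actual retry schedule.

```python
def expected_outstanding(recursive_qps: float,
                         answered_latency_s: float,
                         unanswered_fraction: float,
                         timeout_s: float) -> float:
    """Little's law (L = lambda * W) applied to recursive queries.

    Each query is either answered after `answered_latency_s` on average,
    or gets no answer and holds a slot for `timeout_s`.
    """
    if not 0.0 <= unanswered_fraction <= 1.0:
        raise ValueError("unanswered_fraction must be between 0 and 1")
    mean_time_in_system = ((1 - unanswered_fraction) * answered_latency_s
                           + unanswered_fraction * timeout_s)
    return recursive_qps * mean_time_in_system
```

With 1000 recursive queries per second answered in about 100 ms, roughly 100 slots are busy. If 5% of them get no answer and wait 10 seconds, the same rate keeps about 595 slots busy, close to six times as many, with no increase in load.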

One of your clients might be making up names and trying them: many
would receive negative responses, but some percentage would receive no
response and sit. Or the nameservers for some locally popular domain
could be down or unreachable. Or it could be intermittent network
problems, or some kind of long-term routing/connectivity issue, e.g.
the consequences of firewalling.

If there are short episodes with tons of these log entries, that hints
at short problems with your Internet connection, or at a specific app
that causes the issue when it runs. If your Internet connectivity goes
away in such a manner that packets "disappear", then the number of
outstanding recursive queries typically rises steadily until the quota
is reached.
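The steady rise is easy to reproduce in a toy simulation. This Python sketch is not BIND's scheduler: it assumes every query is answered instantly until an outage starts, after which replies vanish and each query waits for a fixed, hypothetical timeout. Whether the count reaches the quota depends on whether the query rate times the timeout exceeds it.

```python
from collections import deque
from typing import Deque, List


def simulate(quota: int, qps: int, seconds: int, outage_start: int,
             timeout_s: int = 10) -> List[int]:
    """Return the outstanding-query count at the end of each second.

    Toy model (assumptions, not BIND internals):
      * before `outage_start`, every query is answered within the second;
      * from `outage_start` on, every reply disappears, so queries sit
        until `timeout_s` seconds have passed;
      * when the quota is full, the oldest outstanding query is dropped.
    """
    pending: Deque[int] = deque()  # start times of unanswered queries, oldest first
    history: List[int] = []
    for t in range(seconds):
        # Expire queries that have waited for the full timeout.
        while pending and t - pending[0] >= timeout_s:
            pending.popleft()
        if t >= outage_start:
            for _ in range(qps):
                if len(pending) >= quota:
                    pending.popleft()  # give up on the oldest
                pending.append(t)
        history.append(len(pending))
    return history
```

With 200 queries per second, a 10-second timeout and a quota of 1000, the count climbs by 200 each second and pins at the quota five seconds into the outage. At 50 queries per second it levels off at 500 and never hits the quota.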

If you look at the number of clients at random times and it is always
substantial and/or close to the quota, it may be that increasing the
quota is the right solution.
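A rough way to apply this test to sampled counts (for example, the current value from repeated `rndc status` runs) is sketched below in Python. The 80% threshold, the function name and the three labels are arbitrary choices for illustration: a median near the quota suggests the limit is simply too low, while mostly low readings with occasional bursts point to an episode worth investigating first.

```python
from statistics import median
from typing import Sequence


def classify_samples(counts: Sequence[int], quota: int,
                     near: float = 0.8) -> str:
    """Rough triage of recursive-client counts sampled at random times.

    `near` (a hypothetical threshold) is the fraction of the quota
    treated as "close to the limit".
    """
    if not counts:
        raise ValueError("need at least one sample")
    threshold = near * quota
    high = sum(1 for c in counts if c >= threshold)
    if median(counts) >= threshold:
        return "sustained"  # usually near the quota: raising it may be right
    if high:
        return "spiky"      # mostly low with bursts: look for a cause first
    return "normal"
```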

rndc lets you view the outstanding queries and see how long they've
been waiting, which provides a lot of insight into what is happening.

John Wobus
Cornell IT
