
recursive-clients, what value ?


Mikael

Jun 14, 2004, 9:09:31 AM
Hello,

I'm running BIND 9.2 on Mandrake Linux and I'm getting lots of log entries
like these:

Jun 14 09:23:30 hostname named[1045]: client: client 127.0.0.1#34559: no more recursive clients: quota reached
Jun 14 09:25:22 hostname named[1045]: client: client 127.0.0.1#34565: no more recursive clients: quota reached

I read that increasing "recursive-clients" could help. I'm going to set
it to 2000, but is there a way to know what a good value would be?
Is there a way to monitor the number of simultaneous clients in real time?

Thanks in advance,

Mikael

Kevin Pickard

Jun 14, 2004, 7:20:17 PM
Mikael <pub.n...@grizzli.org> wrote in message news:<caknas$1500$1...@sf1.isc.org>...

A good value for the "recursive-clients" option depends on your
environment. From the logs you included, it looks like you're only just
peaking over the default limit of 1000 simultaneous recursive queries
every now and then (you showed only two entries, and they were a couple
of minutes apart). Unless you expect significantly more recursive
queries in the future, I'd guess that bumping "recursive-clients" to
1100 should suffice. If you still see "quota reached" messages like the
ones above, raise it a little more until they subside.
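To judge how often the quota is actually being hit, the messages can be counted per hour. The following is a minimal sketch, assuming each entry sits on a single line in the syslog file and has the same layout as the two lines quoted in the question; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in the question, e.g.
# "Jun 14 09:23:30 hostname named[1045]: client: client 127.0.0.1#34559:
#  no more recursive clients: quota reached" (on one line in the real file).
QUOTA_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}"
    r"\s+\S+\s+named\[\d+\]:"
    r".*no\s+more\s+recursive\s+clients:\s+quota\s+reached"
)


def quota_hits_per_hour(lines):
    """Return a Counter mapping (month, day, hour) to the number of quota messages."""
    hits = Counter()
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            key = (match["month"], int(match["day"]), int(match["hour"]))
            hits[key] += 1
    return hits


sample = [
    "Jun 14 09:23:30 hostname named[1045]: client: client 127.0.0.1#34559: "
    "no more recursive clients: quota reached",
    "Jun 14 09:25:22 hostname named[1045]: client: client 127.0.0.1#34565: "
    "no more recursive clients: quota reached",
]
for key, count in sorted(quota_hits_per_hour(sample).items()):
    print(key, count)
```

A handful of hits per hour suggests brief peaks just over the limit; a steady stream suggests the limit is well below the real load, or that something else is holding recursion slots open.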

Setting it at 2000, as you mentioned, is probably more than you need
-- theoretically a doubling of your current simultaneous recursive
query capacity.

Lastly, keep in mind that, as the Bv9ARM mentions, "each recursing
client uses a fair bit of memory, on the order of 20 kilobytes". Your
amount of physical memory ultimately dictates how high this value can
go. We have many servers in production which have this value set
above 3000, but they have the RAM to back it up.
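As a back-of-the-envelope check, the manual's "on the order of 20 kilobytes" figure can be multiplied out for a few candidate limits. This is only a rough sketch: the 20 KB value is an order-of-magnitude estimate, and the helper name is hypothetical.

```python
# "On the order of 20 kilobytes" per recursing client, per the manual quote
# above. Treat it as a rough figure, not a measured one.
KILOBYTES_PER_CLIENT = 20


def recursion_memory_mib(recursive_clients, kb_per_client=KILOBYTES_PER_CLIENT):
    """Approximate memory (MiB) used if every recursive-client slot is busy."""
    return recursive_clients * kb_per_client / 1024


for limit in (1000, 1100, 2000, 3000):
    print(f"recursive-clients {limit}: ~{recursion_memory_mib(limit):.0f} MiB")
```

By this estimate even a limit of 3000 stays in the tens of megabytes, so on most machines the memory cost only becomes a concern at much higher values or when RAM is already tight.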

Best Regards,

ksp

Richard Maynard

Jun 14, 2004, 8:18:11 PM
> I read that increasing "recursive-clients" could help. I'm going to set
> it to 2000, but is there a way to know what a good value would be?

As Kevin already pointed out, the answer can vary widely depending on
your environment. First, I would advise against doubling the limit in
one go. If you eventually want to get there, that's okay, but you may
want to do it in smaller incremental steps. The cost of a higher
recursive-clients limit is not only memory: you also have to factor in
the extra machine load, both for handling the additional queries and for
all the extra connections the server will have to keep track of.
Doubling the number of queries will also increase network traffic. If
your network is even lightly congested, the extra queries could cause
more of them to fail than before.

The other thing your logs show is that localhost is the client
triggering these failures. If this is a caching-only name server that
only serves services on that machine, you could also look at other
options, such as increasing the cache time or caching invalid responses,
which I believe by default is off. Caching negative responses can
GREATLY reduce the recursive-clients problem. If DNS for a high-traffic
domain is failing and your recursive clients are all hitting their
timeouts waiting for a response that will never come back, you'll again
end up seeing errors like the ones you saw.

> Is there a way to monitor the number of simultaneous clients in real time?

You could probably get this by turning up debugging/query logging and
parsing the data, but beyond that I don't know of any immediate way to
get the number of simultaneous clients. I would suggest tracking your
stats daily and looking at the long-term trends. Be sure to look at your
cache hit rate too; it gives a good idea of what else you can start
tuning. Most decent hardware on a good network will have no problem with
an extra thousand recursive clients, but that may not be the ultimate
answer: by increasing recursive-clients you could end up masking a
different issue.
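Later BIND 9 releases print a "recursive clients" line in the output of `rndc status`; the thread does not confirm whether the 9.2 series discussed here does, so treat that as an assumption. Where the line is available, a small parser like the sketch below could feed a graphing tool. The exact number of slash-separated fields differs between releases, so the pattern accepts either two or three.

```python
import re

# Assumed line format, e.g. "recursive clients: 12/900/1000"
# (current / soft limit / hard limit) or "recursive clients: 12/1000".
RECURSIVE_RE = re.compile(
    r"^recursive clients:\s*(\d+)(?:/(\d+))?/(\d+)\s*$", re.MULTILINE
)


def parse_recursive_clients(status_text):
    """Return (current, hard_limit) from `rndc status` text, or None if absent."""
    match = RECURSIVE_RE.search(status_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(3))


# Hypothetical excerpt of status output, used only to exercise the parser.
example = "recursive clients: 12/900/1000\ntcp clients: 0/100\n"
print(parse_recursive_clients(example))
```

The text could come from something like `subprocess.run(["rndc", "status"], capture_output=True, text=True).stdout`, sampled every few seconds; on a server that does not print the line the function simply returns `None`.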

-- Richard Maynard


Ladislav Vobr

Jun 14, 2004, 11:23:21 PM
> I read that increasing "recursive-clients" could help. I'm going to set
> it to 2000, but is there a way to know what a good value would be?
> Is there a way to monitor the number of simultaneous clients in real time?

Hmm, it would be nice to see what is in the internal recursive queue and
how full it is. I don't know of anything that can do this, and I miss it
too.

I have the value set to 2000 but regularly get similar messages. BIND
sometimes keeps retrying hundreds of queries in the background without
anybody being aware of it, and that uses up the whole queue without
logging anything other than that the queue is full.

My traffic is around 3000 queries/sec on a caching 9.2.3 server. Some
time back I asked here what the number should be, or how I could roughly
calculate a value that fits this kind of traffic, but got no
recommendation :-(
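Nobody in the thread offers a formula, but one rough way to reason about it is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each one takes. The sketch below applies that to cache misses. Only the 3000 queries/sec figure comes from the post; the hit rate and resolution times are made-up illustrations.

```python
def estimated_concurrent_recursions(queries_per_sec, cache_hit_rate, avg_resolution_sec):
    """Little's law: requests in flight = arrival rate x average time in system."""
    miss_rate = queries_per_sec * (1.0 - cache_hit_rate)  # queries needing recursion
    return miss_rate * avg_resolution_sec


# Illustrative inputs: 75% hit rate, 0.2 s per lookup when upstream servers
# are healthy, 5 s when many lookups are running into timeouts.
healthy = estimated_concurrent_recursions(3000, 0.75, 0.2)
degraded = estimated_concurrent_recursions(3000, 0.75, 5.0)
print(f"healthy: ~{healthy:.0f} concurrent, degraded: ~{degraded:.0f} concurrent")
```

The point of the comparison is that the average resolution time dominates: the same query rate that fits comfortably under a limit of 1000 can exceed 2000 once lookups start waiting on unresponsive servers.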

Ladislav


Ladislav Vobr

Jun 15, 2004, 12:46:56 AM

> default is off. Caching negative responses can GREATLY reduce the
> recursive-clients problem. If DNS for a high-traffic domain is failing and
> your recursive clients are all hitting their timeouts waiting for a
> response that will never come back, you'll again end up seeing errors
> like the ones you saw.
Negative caching is on by default in BIND 9 (3 hours, and 10 minutes for
lame servers). Only NXDOMAIN and NXRRSET are cached; SERVFAIL is not, and
timeouts are not. If *all* nameservers for a high-traffic domain are
down, BIND will keep flooding them, sometimes at an incredible rate that
depends purely on your clients. BIND doesn't control it in any way; it
just amplifies your clients' flood (through retries) and sends it to
those "pure victim servers". The same goes for SERVFAIL.

>
>>Is there a way to monitor the number of simultaneous clients in real time?
> You could probably get this by turning up debugging/query logging and
> parsing the data, but beyond that I don't know of any immediate way to
> get the number of simultaneous clients. I would suggest tracking your
> stats daily and looking at the long-term trends. Be sure to look at your
> cache hit rate too; it gives a good idea of what else you can start
> tuning.
Richard, how can I plot the cache hit rate? Do you mean rndc stats? It
only shows the number of recursive requests.

Ladislav


Richard Maynard

Jun 15, 2004, 1:25:30 AM
> Negative caching is on by default in BIND 9 (3 hours, and 10 minutes for
> lame servers). Only NXDOMAIN and NXRRSET are cached; SERVFAIL is not, and
> timeouts are not. If *all* nameservers for a high-traffic domain are
> down, BIND will keep flooding them, sometimes at an incredible rate that
> depends purely on your clients. BIND doesn't control it in any way; it
> just amplifies your clients' flood (through retries) and sends it to
> those "pure victim servers". The same goes for SERVFAIL.

Ahhh! That's good to know; I didn't realize SERVFAILs weren't cached.
Thanks for the info. I sure wish SERVFAILs were cached. As it stands,
it's not hard to start a short-lived DoS against a DNS farm that will
recurse for you, using bad servers and one bad domain.


> Richard, how can I plot the cache hit rate? Do you mean rndc stats? It
> only shows the number of recursive requests.

You can compare the number of recursive requests with the total number
of requests to get the percentage of requests that are answered from
cache. I have found this to be among the most useful datasets when
correlating machine performance with different configurations. It's not
uncommon to see 75% of the queries on my farm answered out of cache
during peak usage hours. I don't have exact numbers right now on how
much longer a non-cached query takes than a cached one, but I do have
direct performance differences.

Getting my cache hit rate up from ~50% to ~65% yielded more than a 15%
performance improvement, with only minimal changes to our configurations.
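The calculation described above can be sketched as follows. It assumes you already have two counters from the server's statistics, total requests and requests that needed recursion; how those counters are named and extracted depends on the BIND version, so the function and parameter names here are illustrative only.

```python
def cache_hit_rate(total_requests, recursive_requests):
    """Fraction of requests answered without recursion (i.e. from cache)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 1.0 - recursive_requests / total_requests


def hit_rate_between(earlier, later):
    """Hit rate over an interval, given two (total, recursive) counter snapshots."""
    total = later[0] - earlier[0]
    recursive = later[1] - earlier[1]
    return cache_hit_rate(total, recursive)


# Hypothetical snapshots taken an hour apart: (total requests, recursive requests).
print(f"{hit_rate_between((1000, 500), (3000, 1200)):.0%}")
```

Working from the difference between two snapshots gives the hit rate for that interval rather than the average since the server started, which is what you want when plotting trends.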

-- Richard Maynard


Ladislav Vobr

Jun 15, 2004, 1:43:26 AM
> Ahhh! That's good to know; I didn't realize SERVFAILs weren't cached.
> Thanks for the info. I sure wish SERVFAILs were cached. As it stands,
> it's not hard to start a short-lived DoS against a DNS farm that will
> recurse for you, using bad servers and one bad domain.
Yes :-) I guess we all learned it the hard way :-)

>
>>Richard, how can I plot the cache hit rate? Do you mean rndc stats? It
>>only shows the number of recursive requests.
>
> You can compare the number of recursive requests with the total number
> of requests to get the percentage of requests that are answered from
> cache. I have found this to be among the most useful datasets when
> correlating machine performance with different configurations. It's not
> uncommon to see 75% of the queries on my farm answered out of cache
> during peak usage hours. I don't have exact numbers right now on how
> much longer a non-cached query takes than a cached one, but I do have
> direct performance differences.
>
> Getting my cache hit rate up from ~50% to ~65% yielded more than a 15%
> performance improvement, with only minimal changes to our configurations.

Hmm, I thought there might be a relation, but it seemed more complicated
to me. What about failures: don't they initiate a query before they
fail, and is that failed query counted under recursion as well? And what
does "recursion" actually mean here? When I dump stats I see
"recursion", but IMHO a caching server issues non-recursive queries on
behalf of its recursive clients (unless it is forwarding). So if the
counter shows how many non-recursive queries had to be made because the
answer was not cached, it should not be called recursion, but I might be
wrong here :-) In any case I think it is definitely a very good idea to
plot this. We use rrdtool here but so far plot only basic stats; we'll
try this parameter as well.

Thanks for your support

Ladislav


Mikael

Jun 15, 2004, 5:27:02 AM
Thanks for all your comments.
Actually, my problem goes a bit deeper:
I got lots of those "quota reached" log entries: 170 in 6 hours.
And my main problem is that my server froze completely and did not
accept any logins until I rebooted it...
Whether that's the cause or the consequence of the freeze is what I'm
trying to find out... Any ideas?

Mikael
