I'm running BIND 9.2 on Mandrake Linux and I get lots of log entries like
these:
Jun 14 09:23:30 hostname named[1045]: client: client 127.0.0.1#34559: no more recursive clients: quota reached
Jun 14 09:25:22 hostname named[1045]: client: client 127.0.0.1#34565: no more recursive clients: quota reached
I read that increasing "recursive-clients" could help. I'm going to set
it to 2000, but is there a way to know what a good value would be?
Is there a way to monitor the number of simultaneous clients in real time?
Thanks in advance,
Mikael
Determining a good value for the "recursive-clients" option depends on
your environment. From the logs you included, it looks like you're
just barely going over the default of 1000 simultaneous recursive
queries every now and then (you only had two entries, and they were a
couple of minutes apart). Unless you expect significantly more
recursive queries in the future, I'd guess that bumping
"recursive-clients" to 1100 should suffice. If you find that you're
still generating "quota reached" logs like the ones above, bump it up
a little more until the messages subside.
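To see whether the messages are actually subsiding after each bump, one option is to count them per hour. The following is a minimal sketch, assuming the syslog line format quoted above; the function name quota_hits_per_hour is hypothetical and the regex may need adjusting for other syslog setups.

```python
import re
from collections import Counter

# Matches lines in the format quoted in the original post, e.g.
# "Jun 14 09:23:30 hostname named[1045]: client: client 127.0.0.1#34559: no more recursive clients: quota reached"
# and captures month, day and hour so hits can be bucketed per hour.
QUOTA_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2} .*named\[\d+\]: .*"
    r"no more recursive clients: quota reached"
)


def quota_hits_per_hour(lines):
    """Count 'quota reached' messages per (month, day, hour) bucket.

    Buckets keep the order in which they first appear in the log.
    """
    counts = Counter()
    for line in lines:
        m = QUOTA_RE.match(line)
        if m:
            counts[(m["mon"], int(m["day"]), int(m["hour"]))] += 1
    return counts
```

Feeding it the syslog file (the path depends on your syslog configuration, e.g. a hypothetical /var/log/messages) before and after each change gives a simple per-hour trend; if the counts drop to zero the new limit is probably enough.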
Setting it to 2000, as you mentioned, is probably more than you need;
it would theoretically double your current simultaneous recursive
query capacity.
Lastly, keep in mind that, as the Bv9ARM mentions, "each recursing
client uses a fair bit of memory, on the order of 20 kilobytes". Your
amount of physical memory ultimately dictates how high this value can
go. We have many servers in production which have this value set
above 3000, but they have the RAM to back it up.
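As a back-of-the-envelope check, the ARM's figure of roughly 20 KB per recursing client can be turned into a memory estimate for a given limit. This sketch only multiplies that quoted figure out; the real per-client cost on your build may differ, and it ignores the cache and everything else named uses.

```python
# Rough per-client cost quoted from the BIND 9 ARM ("on the order of 20 kilobytes").
KB_PER_RECURSIVE_CLIENT = 20


def recursive_clients_memory_mb(recursive_clients, kb_per_client=KB_PER_RECURSIVE_CLIENT):
    """Approximate memory (MB) used by recursive-client slots if every slot is in use."""
    return recursive_clients * kb_per_client / 1024


for limit in (1000, 1100, 2000, 3000):
    print(f"recursive-clients {limit}: ~{recursive_clients_memory_mb(limit):.1f} MB")
```

At that rate even 3000 slots come to roughly 60 MB, so on most machines the limit is set more by load and network behaviour than by RAM alone, but the ARM's number is only an order-of-magnitude figure.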
Best Regards,
ksp
As Kevin already pointed out, the answer to this question can vary widely
depending on your environment. First, I would advise against doubling the
limit in one go. If you eventually want to get there, that's okay, but you
may want to do it in smaller incremental steps. The recursive-clients limit
is not only about memory: you also have to factor in extra machine load, not
just in managing the additional queries, but also in all the extra
connections it will have to keep track of. Doubling the number of queries
will also increase network traffic, and if you are on a network that is even
lightly congested, more queries could cause a greater number of queries to
fail than before.
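The "smaller incremental steps" approach can be written down as a schedule. This is a hypothetical sketch: the 10% step size simply mirrors the 1000 to 1100 bump suggested earlier, and the idea is to apply one value, reload named (for example with rndc reload), and watch load and the "quota reached" messages before moving to the next.

```python
def step_schedule(current, target, step_fraction=0.10):
    """Return successive recursive-clients values from current up to target.

    Each value is about step_fraction above the previous one (at least +1),
    and the last value is capped at target. Returns [] if no increase is needed.
    """
    steps = []
    value = current
    while value < target:
        value = min(target, max(value + 1, round(value * (1 + step_fraction))))
        steps.append(value)
    return steps


print(step_schedule(1000, 2000))
```

Stopping as soon as the log messages disappear usually means never reaching the final value.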
The other thing to note from your logs is that localhost is the client
triggering these failures. If this is a caching-only name server that serves
only the services on that machine, you could also investigate other options,
such as increasing the cache time or caching invalid responses, which I
believe is off by default. Caching negative responses can GREATLY reduce the
recursive-clients problem. If DNS for a high-traffic domain is failing, and
your recursive clients are all hitting their timeouts waiting for a response
that won't be coming back, then you'll again end up seeing errors like the
ones you saw.
> Is there a way to monitor in real time the number of
> simultaneous clients?
You could probably get this by turning up debugging/query logging and
parsing the data, but beyond that I don't know of any immediate way to get
the number of simultaneous clients. I would suggest tracking your stats daily
and looking at your long-term trends. Be sure to look at your cache hit rate;
it gives a good idea of what else you can start tuning. While most decent
hardware running on a good network will have no problems with an extra
thousand recursive clients, that may not be the ultimate answer: by
increasing recursive-clients you could end up masking a different issue.
-- Richard Maynard
Hmm, that would be nice to see: what kind of queries are in the internal
recursive queue and how full it is. I don't know of anything that can show
this, and I miss it too.
I have the value set to 2000, but I regularly get similar messages. BIND
sometimes keeps retrying hundreds of queries in the background without
anybody being aware of it, and they use up the whole queue without anything
being logged other than that the queue is full.
My traffic is around 3000 queries/sec on a caching-only 9.2.3. Some time
back I asked here what the number should be, or how I could roughly calculate
what number would fit this kind of traffic, but got no recommendation :-(
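One rough way to approach that calculation, which is not something proposed in the thread, is Little's law: the average number of outstanding recursive lookups is roughly the rate of queries that need recursion multiplied by how long each one stays outstanding. The cache hit rate, average resolution time and headroom factor below are hypothetical inputs you would have to measure or choose yourself.

```python
import math


def estimate_recursive_clients(queries_per_sec, cache_hit_rate, avg_resolution_sec, headroom=1.5):
    """Little's law sketch: concurrent slots ~= recursive arrival rate x time each is outstanding.

    queries_per_sec     total client query rate
    cache_hit_rate      fraction answered from cache (0..1), so no recursion needed
    avg_resolution_sec  average time a recursive lookup holds a slot (includes timeouts)
    headroom            hypothetical safety factor for bursts
    """
    recursive_per_sec = queries_per_sec * (1.0 - cache_hit_rate)
    concurrent = recursive_per_sec * avg_resolution_sec
    return math.ceil(concurrent * headroom)


# Hypothetical inputs: 3000 q/s, 65% answered from cache, lookups held 0.5 s vs 2 s on average.
print(estimate_recursive_clients(3000, 0.65, 0.5))
print(estimate_recursive_clients(3000, 0.65, 2.0))
```

The second line shows why timeouts matter so much: lookups that wait on dead servers hold their slots far longer, and the required limit grows in direct proportion to that average wait.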
Ladislav
Ahhh! That's good to know; I didn't realize SERVFAILs weren't cached.
Thanks for the info. I sure wish SERVFAILs were cached: as it stands, it's
not hard to launch a short-lived DoS against a DNS farm that will recurse for
you, using bad servers and one bad domain.
> Richard, how can I plot cache hit rate? Do you mean rndc
> stats? There is only # of recursive requests.
You can look at the # of recursive requests and the total number of requests
to get the % of your requests that are answered from cache. I found this to
be among the most useful datasets when correlating machine performance with
different configurations. It's not uncommon to see 75% of the queries on my
farm answered out of cache during peak hours. I don't have exact numbers
right now on how much longer a non-cached query takes than a cached one, but
I do have direct performance differences: getting my cache hit rate up from
~50% to ~65% yielded more than a 15% performance improvement, with only
minimal changes to our configuration.
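A sketch of that calculation from two successive rndc stats dumps follows. It assumes the BIND 9.2-style named.stats layout, where each dump starts with "+++ Statistics Dump +++" and lists one "name value" counter per line, and it assumes that the outcome counters success, referral, nxrrset, nxdomain and failure together approximate the total number of queries. Both are assumptions to check against your own file, and, as the follow-up below points out, how failures interact with the recursion counter is not clear-cut.

```python
# Assumed outcome counters whose sum approximates total queries answered.
OUTCOME_COUNTERS = ("success", "referral", "nxrrset", "nxdomain", "failure")


def split_dumps(text):
    """Split a named.stats file into the individual dumps appended by each 'rndc stats'."""
    return text.split("+++ Statistics Dump +++")[1:]


def parse_stats(text):
    """Parse '<name> <integer>' counter lines; other lines (dump markers) are skipped."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def cache_hit_rate(before, after):
    """Fraction of queries between two dumps that did NOT trigger recursion.

    Counters are cumulative, so work on the differences; returns None if no queries.
    """
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in after}
    total = sum(delta.get(k, 0) for k in OUTCOME_COUNTERS)
    if total <= 0:
        return None
    return 1.0 - delta.get("recursion", 0) / total
```

Running rndc stats at a fixed interval and feeding the last two dumps to cache_hit_rate gives one data point per interval, which can then be graphed with whatever tool you already use. Counters reset when named restarts, so a negative difference means the interval should be discarded.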
-- Richard Maynard
Hmm, I thought there might be a relation, but it seemed more complicated to
me. What about failures: don't they initiate a query before they fail, and
is that failed query counted in recursion as well? And what is "recursion"
actually? When I dump stats I see recursion, but IMHO a caching server
initiates non-recursive queries on behalf of recursive clients (unless it is
forwarding), so if it counts how many non-recursive queries had to be made
because the answer was not cached, it should not be called recursion. But I
might be wrong here :-) I definitely think it is a very good idea to plot
this. We are using rrdtool here, but only to plot basic stats; we'll try this
parameter as well.
Thanks for your support
Ladislav
Mikael