Slow restart


den...@gmail.com

Jan 20, 2014, 10:38:47 AM
to gd...@googlegroups.com
Hello Brandon!

I'd just like to ask for recommendations on how to speed up the restart phase, because currently it takes a lot of time:
# time /etc/init.d/gdnsd restart
 * Restarting gdnsd gdnsd                                                                                                            [ OK ]

real    0m18.686s
user    0m2.608s
sys    0m0.356s

Top 15 syscalls made during the restart:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.46    0.448028       40730        11           wait4
 11.77    0.116007      116007         1           mlockall
  9.52    0.093860         456       206           mmap
  8.12    0.080006         159       504           nanosleep
  4.90    0.048251          43      1134      1133 connect
  3.91    0.038509         987        39           munmap
  3.65    0.036002        1800        20           clone
  2.45    0.024111          22      1119           recvfrom
  2.44    0.024042          21      1130           getsockopt
  2.44    0.024001        2000        12           execve
  2.08    0.020501          16      1312           close
  2.03    0.020021          39       511           epoll_wait
  0.77    0.007570         505        15           mremap
  0.41    0.004000         235        17           write
  0.02    0.000215           0      2000           sendto
  0.02    0.000186           0      1141         1 socket
  0.02    0.000156           0      1119           shutdown
  0.01    0.000132           0      2265           epoll_ctl
  0.00    0.000019           0      1034           clock_gettime
  0.00    0.000018           0       114           fstat

We are still using restart to apply new configuration, since the number of zone-file changes and of pool changes in the "config" file per day are almost the same. Unfortunately there is no way to reduce the number of pool adjustments made during the day.

Currently I'm considering an ugly workaround:
- create a satellite gdnsd process using the same configuration
- reroute incoming requests to the satellite process with 'iptables -j REDIRECT' for the moment while the main instance restarts (see the sketch below)
- route requests back to the main process by removing the 'iptables -j REDIRECT' rule
- restart the satellite process
- implement all of that logic in the init script
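
Roughly, the redirect step would look something like this (just a sketch; the satellite port 5353 is an arbitrary example, and note that REDIRECT in the nat PREROUTING chain only catches traffic arriving from outside, not local 127.0.0.1 queries):

# send incoming DNS queries to the satellite instance while the main one restarts
iptables -t nat -A PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 5353
iptables -t nat -A PREROUTING -p tcp --dport 53 -j REDIRECT --to-ports 5353

/etc/init.d/gdnsd restart

# route requests back to the main instance, then the satellite can be restarted too
iptables -t nat -D PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 5353
iptables -t nat -D PREROUTING -p tcp --dport 53 -j REDIRECT --to-ports 5353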

I know that according to the RFC a requesting DNS resolver should retry the query a few times before giving up; however, I've met too many non-RFC-compliant implementations to sleep well with the current state.
It's also quite feasible to hit the resolver's retry limit when applying configuration / restarting gdnsd on all servers one by one.
And as a last argument, this temporary unavailability of each server breaks the RFC-compliant nearest-authoritative-server optimizations based on NS weights (like the one BIND implements).

Installed gdnsd version 1.10.1.

Do you have any best practices for that case?
Thank you in advance!

Brandon Black

Jan 20, 2014, 11:08:18 AM
to gdnsd
On Mon, Jan 20, 2014 at 9:38 AM, <gdnsd+noreply-APn2wQfGKkkaYfpced...@googlegroups.com> wrote:
Hello Brandon!

I'd just like to ask for recommendations on how to speed up the restart phase, because currently it takes a lot of time:
# time /etc/init.d/gdnsd restart
 * Restarting gdnsd gdnsd                                                                                                            [ OK ]

real    0m18.686s
user    0m2.608s
sys    0m0.356s

Top 15 syscalls made during the restart:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.46    0.448028       40730        11           wait4
 11.77    0.116007      116007         1           mlockall
  9.52    0.093860         456       206           mmap
  8.12    0.080006         159       504           nanosleep
  4.90    0.048251          43      1134      1133 connect
[....]


Does your initscript actually use gdnsd's native "restart", or is it doing its own stop->start sequence?  gdnsd's native restart command already has some logic to speed things up - it loads the config and zonefiles in a new process while leaving the old one running and servicing requests, and then once all the data is loaded it shuts the old one down.  If this is working correctly, the restart command may take the ~19s it does above to execute, but queries will still be flowing through the old daemon for the vast majority of that time.

So, if the initscript isn't using gdnsd's own fast-restart, that's the first thing to fix, and it may be all you need to fix.  (note also that, last I checked, there was no way to do a native systemd service for gdnsd that uses the fast-restart mode, so if you're using systemd, that could be another can of worms.  I have some thoughts on making it work (some ugly cgroup code), I just haven't been very inclined to work on the problem).

If you're sure that the initscript is executing gdnsd's fast restart (e.g. "/usr/sbin/gdnsd restart") and that there is a true and significant window of service unavailability during the restart, then we'll probably have to dig deeper and find out what's holding things up in your case (my best first guess would be that the old daemon is taking too long to stop when killed).
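
For reference, the restart case in an initscript should boil down to handing the whole operation off to the daemon, something like this (a rough sketch; paths and LSB boilerplate vary by distro):

restart)
    # gdnsd's native restart loads config/zones in a new process while the old
    # daemon keeps answering queries, then swaps them out; a plain stop + start
    # here instead leaves a window with no listener at all
    /usr/sbin/gdnsd restart
    ;;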

den...@gmail.com

Jan 20, 2014, 2:26:09 PM
to gd...@googlegroups.com
On Monday, January 20, 2014 7:08:18 PM UTC+3, blblack wrote:
Does your initscript actually use gdnsd's native "restart", or is it doing its own stop->start sequence?  gdnsd's native restart command already has some logic to speed things up - it loads the config and zonefiles in a new process while leaving the old one running and servicing requests, and then once all the data is loaded it shuts the old one down.  If this is working correctly, the restart command may take the ~19s it does above to execute, but queries will still be flowing through the old daemon for the vast majority of that time.

So, if the initscript isn't using gdnsd's own fast-restart, that's the first thing to fix, and it may be all you need to fix.  (note also that, last I checked, there was no way to do a native systemd service for gdnsd that uses the fast-restart mode, so if you're using systemd, that could be another can of worms.  I have some thoughts on making it work (some ugly cgroup code), I just haven't been very inclined to work on the problem).

If you're sure that the initscript is executing gdnsd's fast restart (e.g. "/usr/sbin/gdnsd restart") and that there is a true and significant window of service unavailability during the restart, then we'll probably have to dig deeper and find out what's holding things up in your case (my best first guess would be that the old daemon is taking too long to stop when killed).

I've just double-checked the initscript, and yes, I'm using the "native" restart option. I've also performed a test by running
# watch -n0.5 'date; dig @127.0.0.1 www.example.com'
and sure enough, the port stopped answering for about 10 seconds.
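
For a finer-grained picture than watch gives, something along these lines (just a sketch; one query every ~0.1s, failing fast instead of retrying) would show the exact gap:

while true; do
    echo -n "$(date +%T.%N) "
    # +time=1 +tries=1: give up quickly so timeouts clearly mark the unavailability window
    dig +time=1 +tries=1 +short @127.0.0.1 www.example.com
    sleep 0.1
done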

I guess the problem might be the number of pools we have configured (more than 50 of various kinds, each one with an HTTP health check).

Brandon Black

Jan 21, 2014, 10:28:24 AM
to gdnsd

On Mon, Jan 20, 2014 at 1:26 PM, <gdnsd+noreply-APn2wQfGKkkaYfpced...@googlegroups.com> wrote:
I guess the problem might be the number of pools we have configured (more than 50 of various kinds, each one with an HTTP health check).

Yeah, reading through the source, I'm pretty sure that's the issue.  The initial round of health checks happens synchronously after killing the old daemon but before starting service in the new one.  There are a couple of ways we could design around that problem:

1) We could stop doing the synchronous wait on initial health checks.  The main issue here is (a) we don't save persistent health state anywhere and (b) it could be very stale if we did.  Just assuming everything is "up" (or that some saved state is still recent/valid) to avoid waiting on a restart could lead to a period of inappropriate responses until the resources fall back down from the normal down_thresh count of bad responses.

It's possible there's a solution here involving code to save/restore current health state and only using the restore data to skip the initial monitor round if the timestamp is relatively recent.  Note that the plans for the upcoming 2.0 features include saved administrative state, just not saved healthcheck state, so there could be some implementation overlap that makes this a cheap idea.  The downside of any solution of this sort is it's complex and heuristic-y and only solves the delay problem for monitoring, not for other potential delay sources.

2) We could overhaul the daemonization code (again) and try to make it possible to not stop the old daemon (on restart action) until just before we acquire the listening sockets (or even later in the case of a functioning SO_REUSEPORT).  I think this would necessitate moving the code for killing the old daemon and acquiring the pidfile into the outer parent process on startup (not the final daemon process), switching the locking from fcntl() to flock() (and losing some atomicity guarantees, but they're not that critical in this case?), and passing the flock()'d fd down to the daemon to hold.  It would also mean adding some additional complexity to the communications between the daemon and the outer parent over their existing status pipe connection (to pass the pidfile fd and perhaps some other state info).

I'm a little leery of touching the daemonization code again because it's tricky code and seems to be very stable in spite of its complexity in current form, but this solution could get us past all possible significant startup delays, and in combination with SO_REUSEPORT we could even get things working so well that not a single request is lost in a typical restart case.

Brandon Black

Jan 21, 2014, 10:48:08 AM
to gdnsd



On Tue, Jan 21, 2014 at 9:28 AM, Brandon Black <blb...@gmail.com> wrote:
2) We could overhaul the daemonization code (again) [...]

Actually, I was re-thinking this, and maybe it can be done without too much invasive change.  We could switch to using a chown()'d subdirectory under the rundir (/var/run or /run) so that the daemon can manage its pidfile post-privdrop.  Even the kill() of the old daemon could be done post-privdrop on the assumption that the running uid hasn't changed (I don't think that's a big problem - if someone changes the user the daemon runs as, they'll have to accept a full stop + start cycle for that).

The reason I've avoided having the pidfile owned by the privdrop user in the past is that it creates a minor attack vector: a compromised daemon could rewrite the pidfile with the pid of an unrelated process so that a future "/etc/init.d/gdnsd stop" run by root kills an unrelated important process.  However, I think that concern is outdated now that we're using fcntl() to get the pid of the locking daemon rather than reading the file, assuming someone doesn't write an initscript that bypasses all of this and tries to read the file and do things manually anyways.
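
To make the "manually" part concrete, this is the pattern an initscript should stick to versus the bypass I mean (a sketch; the pidfile path is only illustrative):

# safe: gdnsd locates the running daemon itself via the fcntl lock on the pidfile
/usr/sbin/gdnsd stop

# the bypass to avoid: blindly trusts whatever pid happens to be written in the file
kill "$(cat /var/run/gdnsd.pid)"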

-- Brandon