Hello all,
and welcome to the BugCast of today!
This is about shutting down. Shutting down the director process. Which doesn't work. Or, let's be more precise: it works sometimes. Other times it looks like this:
Stopping bareos_dir.
Waiting for PIDS: 2501
120 second watchdog timeout expired. Shutdown terminated.
Now, there are a few things to observe with this:
Sending a *second* SIGTERM will always terminate it immediately. (But the shutdown routine doesn't do that; it expects to be obeyed the first time.)
If a console happens to be open, exiting that console after the first SIGTERM didn't work will also terminate it immediately. But the shutdown routine doesn't do that either.
Consequently the shutdown routine will time out, and will either fail to shut down, or after the timeout shut down forcibly (by crashing all other programs and databases).
What is also obvious is that the signal handling is *BROKEN*: it gets hosed somewhere between signals and events.
And the last time I looked into the code for the signal handling (I think that was release 2.2.7 or such), it looked so horrible that I never went there again. It didn't look so very wrong, but it all looked like it was deliberately coded for LUMMUX.
Also, I just happened to find a few abandoned fixes in the garret - but I'm not sure which kind of misbehaviour they actually fix, as there are so many of them - anyway, just have fun with these, or give them to your children to play with...
+ *** src/lib/signal.c.orig Thu Aug 5 16:29:51 2010
+ --- src/lib/signal.c Sun Oct 3 04:16:25 2010
+ ***************
+ *** 357,369 ****
+ /* Now setup signal handlers */
+ sighandle.sa_flags = 0;
+ sighandle.sa_handler = signal_handler;
+ ! sigfillset(&sighandle.sa_mask);
+ sigignore.sa_flags = 0;
+ sigignore.sa_handler = SIG_IGN;
+ sigfillset(&sigignore.sa_mask);
+ sigdefault.sa_flags = 0;
+ sigdefault.sa_handler = SIG_DFL;
+ ! sigfillset(&sigdefault.sa_mask);
+
+
+ sigaction(SIGPIPE, &sigignore, NULL);
+ --- 357,369 ----
+ /* Now setup signal handlers */
+ sighandle.sa_flags = 0;
+ sighandle.sa_handler = signal_handler;
+ ! sigemptyset(&sighandle.sa_mask);
+ sigignore.sa_flags = 0;
+ sigignore.sa_handler = SIG_IGN;
+ sigfillset(&sigignore.sa_mask);
+ sigdefault.sa_flags = 0;
+ sigdefault.sa_handler = SIG_DFL;
+ ! sigemptyset(&sigdefault.sa_mask);
+
+
+ sigaction(SIGPIPE, &sigignore, NULL);
Here is another patch --
+ *** src/lib/signal.c.orig Sun Sep 22 23:02:39 2013
+ --- src/lib/signal.c Sun Sep 22 23:02:01 2013
+ ***************
+ *** 140,146 ****
+ }
+ Dmsg2(900, "sig=%d %s\n", sig, sig_names[sig]);
+ /* Ignore certain signals -- SIGUSR2 used to interrupt threads */
+ ! if (sig == SIGCHLD || sig == SIGUSR2) {
+ return;
+ }
+ already_dead++;
+ --- 140,146 ----
+ }
+ Dmsg2(900, "sig=%d %s\n", sig, sig_names[sig]);
+ /* Ignore certain signals -- SIGUSR2 used to interrupt threads */
+ ! if (sig == SIGCHLD || sig == SIGUSR2 || sig == 0) {
+ return;
+ }
+ already_dead++;
So with this I say: stay tuned for the next BugCast!
----------
Footnotes:
* Bug numbers have been randomized for security reasons.