[Samba] restarting samba using a cron job on Debian

Lars Hanke

Nov 30, 2015, 5:20:03 AM
For some reason my secondary DC loses sync every once in a while. It
looks like this in samba-tool drs showrepl:

Last attempt @ Thu Nov 19 13:53:09 2015 CET failed, result 5
(WERR_ACCESS_DENIED)
229 consecutive failure(s).
Last success @ Wed Nov 18 18:48:07 2015 CET

Restarting samba fixes the issue for an unpredictable time. Sometimes
hours, sometimes many weeks. So I wrote a script to restart samba in
this case:

#!/bin/bash
#
# Check if samba replication broke down and restart samba in this case
#
SAMBA_TOOL=/usr/bin/samba-tool
SED=/bin/sed
MAIL=/usr/bin/mail
RM=/bin/rm
MKTMP=/bin/mktemp

FAIL=$($SAMBA_TOOL drs showrepl | $SED -n '/^\s*[1-9][0-9]* consecutive failure(s)\.$/p')

if [[ -n "$FAIL" ]]; then
    TMP=$($MKTMP)
    $SAMBA_TOOL drs showrepl > "$TMP"
    echo "Restart ..." >> "$TMP"
    /etc/init.d/samba restart >> "$TMP"
    echo "... done!" >> "$TMP"
    $MAIL -s 'DC2 restart' sy...@example.com < "$TMP"
    $RM -f "$TMP"
fi
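The sed filter in this script only tells you whether some partner has a non-zero failure count. If you want the actual number, for example to restart only after a few consecutive failures instead of the first one, a minimal sketch could extract it with awk. The function name `max_consecutive_failures` is hypothetical, and it assumes the `N consecutive failure(s).` line format shown in the showrepl output above.

```shell
#!/bin/sh
# max_consecutive_failures (hypothetical helper): read
# `samba-tool drs showrepl` output on stdin and print the largest
# "N consecutive failure(s)." count found, or 0 if there is none.
max_consecutive_failures() {
    awk '
        # Match lines such as "    229 consecutive failure(s)."
        $2 == "consecutive" && $3 ~ /^failure/ && $1 ~ /^[0-9]+$/ {
            if ($1 + 0 > max) max = $1 + 0
        }
        END { print max + 0 }
    '
}
```

Usage: `samba-tool drs showrepl | max_consecutive_failures` prints the highest count across all partners, so a threshold test such as `[ "$(samba-tool drs showrepl | max_consecutive_failures)" -ge 3 ]` could replace the `-n "$FAIL"` check.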

And it works perfectly if I run it manually. However, the idea is to
run it from cron every 5 minutes, and when it is run from cron,
restarting samba fails:

Restart ...
Stopping NetBIOS name server: nmbd.
Stopping SMB/CIFS daemon: smbd.
Stopping Samba AD DC daemon: samba.
Starting Samba AD DC daemon: samba failed!
... done!

However, running the same script manually from a root shell works fine.

The system is Debian Jessie using samba 4.1.17-Debian. I start the
script using the following entry in root's crontab:

*/5 * * * * /root/samba-restart.sh

Any ideas what I'm doing wrong?

Thanks for your help,
- lars.

--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/options/samba

Rowland Penny

Nov 30, 2015, 6:20:04 AM
I think you may be using the wrong start/stop/restart init script. On
Debian there are usually 4 samba init scripts:

nmbd
smbd
samba
samba-ad-dc

There is also the winbind init script, but this will only be installed
if you are using winbind, i.e. on a domain member.

The nmbd & smbd init scripts are there to start and stop the individual
daemons. The samba init script runs both the nmbd & smbd init
scripts, and samba-ad-dc starts/stops the samba daemon, which will then
start the smbd daemon.

If you are running Samba4 as an AD DC, you should never start the nmbd
daemon, and you should never start smbd manually either.

What you are trying to do is, in my opinion, the wrong way to go about
fixing the problem; you really should try to find out why you are
losing sync.

Rowland

Lars Hanke

Nov 30, 2015, 7:40:05 AM
Thanks Rowland, for the thoughts.

> If you are running Samba4 as an AD DC, you should never start the nmbd
> daemon, you should also never start smbd manually.

Yes, I could streamline it to call /etc/init.d/samba-ad-dc directly, but
in fact this is what /etc/init.d/samba does. The messages with "AD DC
daemon" are generated by that script, and it is the one that fails.

> What you are trying to do is, in my opinion, the wrong way to go about
> fixing the problem, you really should try to ascertain why you are
> losing sync.

Agreed. But I've no idea how to troubleshoot that issue. Any help on
fixing the cause is also appreciated.

Rowland Penny

Nov 30, 2015, 8:30:04 AM
On 30/11/15 12:35, Lars Hanke wrote:
> Thanks Rowland, for the thoughts.
>
> > If you are running Samba4 as an AD DC, you should never start the nmbd
> > daemon, you should also never start smbd manually.
>
> Yes, I could optimize to use /etc/init.d/samba-ad-dc immediately, but
> in fact this is what /etc/init.d/samba does. The messages with "AD DC
> daemon" are generated by that script, and it is the failing one.

Yes, you are correct: running /etc/init.d/samba does run samba-ad-dc, but
I still wouldn't use it. This is part of that script:

case $1 in
    start)
        /etc/init.d/nmbd start
        /etc/init.d/smbd start
        /etc/init.d/samba-ad-dc start
        ;;

So, it checks if it should start nmbd, exits because it shouldn't,
checks if it should start smbd, exits because it shouldn't, it then
checks if it should start samba and because it should, it does.

Why bother with all the above, just use samba-ad-dc instead.
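Following this advice, the restart step of the script can call the samba-ad-dc init script directly. A minimal sketch, assuming the Debian path /etc/init.d/samba-ad-dc discussed in this thread; `restart_ad_dc` and the `INIT_SCRIPT` override are hypothetical names introduced here. It also redirects stderr into the report, so any error text the init script writes there ends up in the mail instead of just the word "failed!".

```shell
#!/bin/sh
# Path of the init script to use; overridable (e.g. for testing).
# The default is the Debian samba-ad-dc init script.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/samba-ad-dc}

# restart_ad_dc REPORT (hypothetical helper): restart only the AD DC
# daemon and append all output to the REPORT file.
# Returns 0 if the init script succeeded, 1 otherwise.
restart_ad_dc() {
    report=$1
    echo "Restart ..." >> "$report"
    # Capture stderr as well as stdout.
    if "$INIT_SCRIPT" restart >> "$report" 2>&1; then
        echo "... done!" >> "$report"
        return 0
    else
        echo "... restart FAILED" >> "$report"
        return 1
    fi
}
```

In the original script, the line `/etc/init.d/samba restart >> "$TMP"` and the two surrounding `echo` lines would become `restart_ad_dc "$TMP"`.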

>
> > What you are trying to do is, in my opinion, the wrong way to go about
> > fixing the problem, you really should try to ascertain why you are
> > losing sync.
>
> Agreed. But I've no idea how to troubleshoot that issue. Any help on
> fixing the cause is also appreciated.
>
>

You are going to have to supply more info: is there anything in the logs
when replication fails?
If not, try raising the log level until you do get something.

Rowland

Lars Hanke

Nov 30, 2015, 2:30:03 PM
> Why bother with all the above, just use samba-ad-dc instead.

Yes, I streamlined the script somewhat. Let's see what happens when
replication fails next time.

> You are going to have to supply more info, is there anything in the logs
> when replication fails?
> If not, try raising the log level until you do get something.

Anything specific I should look for? As I said, it happens at unpredictable
intervals, usually after many days. And of course I cannot leave the
system in that state for long, so I have to gather as much information as
possible in one go. Any recommended settings?

Rowland Penny

Nov 30, 2015, 3:00:04 PM
On 30/11/15 19:16, Lars Hanke wrote:
> > Why bother with all the above, just use samba-ad-dc instead.
>
> Yes, I streamlined the script somewhat. Let's see what happens when
> replication fails next time.
>
> > You are going to have to supply more info, is there anything in the logs
> > when replication fails?
> > If not, try raising the log level until you do get something.
>
> Anything specific I should look for? As said it happens at
> unpredictable intervals, usually after many days. And of course I
> cannot leave the system in that state for long. So I have to get as
> much information at once. Any recommendation for settings?
>

If it was predictable, then you could just raise the log level to 10 and
wait for it to happen, copy the logs and then restart samba.

As it isn't predictable, you are going to have to set up your log
rotation system to rotate the log files every day until it does happen,
hopefully at this point, the logs will reveal your problem.
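The "copy the logs and then restart samba" step could also be automated in the restart script itself, so the logs covering the failure survive even if rotation runs before anyone looks. A minimal sketch, assuming the logs live in /var/log/samba (the usual Debian location; check the `log file` setting in smb.conf) and that a high `log level` (for example `log level = 10` in the [global] section) is already set. `snapshot_logs`, `LOG_DIR` and `ARCHIVE_ROOT` are hypothetical names.

```shell
#!/bin/sh
# Assumed locations; adjust to your smb.conf "log file" setting.
LOG_DIR=${LOG_DIR:-/var/log/samba}
ARCHIVE_ROOT=${ARCHIVE_ROOT:-/root/samba-failure-logs}

# snapshot_logs (hypothetical helper): copy the regular files in
# LOG_DIR into a new timestamped directory under ARCHIVE_ROOT and
# print that directory's path. Returns 1 if anything fails.
snapshot_logs() {
    dest="$ARCHIVE_ROOT/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for f in "$LOG_DIR"/*; do
        # Skip subdirectories such as cores/.
        [ -f "$f" ] || continue
        # -p keeps timestamps, which helps match logs to showrepl output.
        cp -p "$f" "$dest"/ || return 1
    done
    echo "$dest"
}
```

Calling `snapshot_logs` just before the restart command, and adding the printed path to the mail, would keep one log set per incident.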

Rowland

George

Nov 30, 2015, 4:10:03 PM
The same happens to me on Debian 7 with the included Samba 4.1.17.

Although it relates to Ubuntu, this is the same issue:
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1357471

I can tell you that what triggers this are network connectivity issues
between the DCs, even if they last a couple of seconds. Once the
connectivity is restored, the replication stays broken unless Samba is
restarted.

I have recently compiled 4.3.1 and the related libraries from Experimental
and will give it a shot soon, we'll see if it behaves in the same way. I'll
keep you posted.

You can also use this little script as a workaround, run frequently via
cron (provided by Marco van Zwetselaar). It checks whether replication is
broken and restarts the service if so.

-----

#!/bin/sh
#
# check-samba-ad-dc.sh
#
# Stop gap measure to restart the Samba AD DC on WERR_CONNECTION_REFUSED
# https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1357471
# https://bugzilla.samba.org/show_bug.cgi?id=11164

TMPFILE="/tmp/$(basename "$0").$$"

if ! samba-tool drs showrepl > "$TMPFILE" ||
        ! grep -q 'Last attempt .* successful' "$TMPFILE" ||
        grep -q 'Last attempt .* failed' "$TMPFILE"; then
    echo "Restarting Samba AD DC at $(date)"
    service samba-ad-dc restart
fi

rm -f "$TMPFILE"

-----
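The two grep tests in this script can be pulled out into a function that works on a saved showrepl file, which makes them easy to try against sample output before trusting them with a restart. A minimal sketch; `replication_broken` and `check_and_restart` are hypothetical names, and the temporary file is created with mktemp instead of a predictable /tmp/name.$$ path.

```shell
#!/bin/sh
# replication_broken FILE (hypothetical helper): succeed (exit 0) if
# the saved `samba-tool drs showrepl` output in FILE shows no
# successful attempt, or at least one failed attempt. These are the
# same two grep tests used in check-samba-ad-dc.sh.
replication_broken() {
    ! grep -q 'Last attempt .* successful' "$1" ||
        grep -q 'Last attempt .* failed' "$1"
}

# check_and_restart (hypothetical): the script's logic as a function.
check_and_restart() {
    tmpfile=$(mktemp) || return 1
    if ! samba-tool drs showrepl > "$tmpfile" ||
            replication_broken "$tmpfile"; then
        echo "Restarting Samba AD DC at $(date)"
        service samba-ad-dc restart
    fi
    rm -f "$tmpfile"
}
```

Saving a real showrepl output while replication is healthy and running `replication_broken saved.txt; echo $?` should print 1; a capture taken during a failure should print 0.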


Best regards.

George

George

Dec 21, 2015, 1:00:04 PM
I can confirm that this issue seems to be resolved, at least in 4.3.1.

Best regards

Lars Hanke

Dec 23, 2015, 2:10:03 PM
Hi Rowland,

>> > Why bother with all the above, just use samba-ad-dc instead.
>> Yes, I streamlined the script somewhat. Let's see what happens when
>> replication fails next time.
>> > You are going to have to supply more info, is there anything in the logs
>> > when replication fails?

I haven't raised the log level to 10 yet, but will do so now. However, the
new script did an automatic restart, which apparently worked fine but
actually wreaked obscure havoc.

Today I got reports that write access to the file servers was erratic
and slow. Checking that secondary DC I found:

# samba-tool drs showrepl
Failed to connect host 172.16.10.17 on port 135 -
NT_STATUS_CONNECTION_REFUSED

This is something I did not check for. ps aux revealed:

root 23501 3.6 0.6 493684 49016 ? S Dez11 621:24
/usr/sbin/samba -D

i.e. the process was eating a lot of CPU and was quite resistant to being
killed. The automatic restart happened on Dec 21st, yet this process dates
from Dec 11! The primary DC had some 800 consecutive failures. I restarted
the container and everything worked fine.

I'll set the log level to 10 and we'll see what happens ...

Regards,
- lars-