[Samba] samba4 replication issues | sam.ldb inconsistency

mourik jan heupink - merit

Jul 8, 2014, 12:00:03 PM
Hi all,

We seem to have some issues with our samba4 AD setup. I posted about
this last week already but have received no replies so far. :-(

What is our situation:

two domain controllers (dc1 and dc2), one (separate) fileserver, all
running sernet-4.1.7. From the workstations' perspective, everything is
running as it should; there appear to be no issues.

However: something in my replication has gone wrong... on dc2:

==== INBOUND NEIGHBORS ====

DC=DomainDnsZones,DC=samba,DC=company,DC=com
Default-First-Site-Name\DC1 via RPC
DSA object GUID: 81a27497-bdfb-4977-9874-675bbfba490f
Last attempt @ Tue Jul 8 17:12:09 2014 CEST failed,
result 8442 (WERR_DS_DRA_INTERNAL_ERROR)
3252 consecutive failure(s).
Last success @ Tue Jul 1 16:34:57 2014 CEST

CN=Configuration,DC=samba,DC=company,DC=com
Default-First-Site-Name\DC1 via RPC
DSA object GUID: 81a27497-bdfb-4977-9874-675bbfba490f
Last attempt @ Tue Jul 8 17:12:10 2014 CEST was successful
0 consecutive failure(s).
Last success @ Tue Jul 8 17:12:10 2014 CEST
(the rest all replicates successfully)
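
To pick just the failing partitions out of a long showrepl listing, something like the following might help. It is a minimal sketch that assumes the output of `samba-tool drs showrepl` has the layout pasted above (a naming context line starting with DC= or CN=, later followed by a "Last attempt @ ... failed" line). The function name `failing_ncs` is made up for this example.

```shell
#!/bin/sh
# failing_ncs: read `samba-tool drs showrepl` output on stdin and print each
# naming context whose last replication attempt failed (duplicates removed).
# Assumption: the layout matches the listing pasted in this thread.
failing_ncs() {
    awk '
        # A line starting with DC= or CN= names the current naming context.
        /^[ \t]*(DC|CN)=/ { nc = $0; sub(/^[ \t]+/, "", nc) }
        # A failed attempt is reported against the context seen last.
        /Last attempt @ .* failed/ && nc != "" { print nc }
    ' | sort -u
}

# Possible usage on a DC (not run here):
#   samba-tool drs showrepl | failing_ncs
```

For the listing above this would print only the DC=DomainDnsZones partition, since CN=Configuration reports a successful last attempt.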

Then, to verify integrity of DC=DomainDnsZones on dc1, I type:

root@dc1:/var/log/samba# samba-tool dbcheck --cross-ncs
ltdb:
tdb(/var/lib/samba/private/sam.ldb.d/DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb):
tdb_rec_read bad magic 0x198 at offset=1044437120
ERROR(ldb): uncaught exception - Indexed and full searches both failed!

On dc2 the same "samba-tool dbcheck --cross-ncs" says: "checking 187478
objects". It has been running for many hours now, and I have no idea how far
along it is. The server is pretty busy doing it.

So, my working conclusion is that on DC1 the
DC=DomainDnsZones,DC=samba,DC=company,DC=com has become corrupted, and
therefore fails to replicate to dc2.

Does the list agree with this?

I hope that dc2 still has the correct DC=DomainDnsZones. But,
since replication seems to be only from dc1 TO dc2, I'm unsure how to
import the healthy dc2 database into dc1.

Does the above make any sense? How to make both dc's happy and fully
functional again?

Any help would be VERY much appreciated... Hopefully I'll get some
replies this time!

Kind regards,
MJ
--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/options/samba

Daniel Müller

Jul 9, 2014, 1:50:01 AM
I had the same issue in the same situation: "samba-tool dbcheck
--cross-ncs" said "checking 187478 objects" and kept running for many hours.
The only thing I could do was to reinstall Samba on the corrupt DC.


EDV Daniel Müller

Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen
Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mue...@tropenklinik.de
Internet: www.tropenklinik.de




-----Original Message-----
From: samba-...@lists.samba.org [mailto:samba-...@lists.samba.org] On
behalf of mourik jan heupink - merit
Sent: Tuesday, 8 July 2014 17:59
To: sa...@lists.samba.org
Subject: [Samba] samba4 replication issues | sam.ldb inconsistency

mourik jan heupink - merit

Jul 9, 2014, 5:30:02 AM
Hi Daniel,

Thanks for your reply. But I'm unsure which of my dc's is still healthy:

On dc2 "samba-tool dbcheck --cross-ncs" says: "checking 187478 objects",
and has been running for many hours now...

On dc1 the DC=DomainDnsZones,DC=samba,DC=company,DC=com seems to be
corrupt:
user@server:/tmp/sam.ldb.d$ tdbbackup -v
./DC\=DOMAINDNSZONES\,DC\=SAMBA\,DC\=COMPANY\,DC\=COM.ldb
tdb_rec_read bad magic 0x198 at offset=1044437120
restoring ./DC=DOMAINDNSZONES,DC=SAMBA,DC\=COMPANY\,DC\=COM.ldb
./DC=DOMAINDNSZONES,DC=SAMBA,DC\=COMPANY\,DC\=COM.ldb.bak: No such file
or directory
user@server:/tmp/sam.ldb.d$

Yet the network seems to work quite nicely still... but I guess we're
sitting on top of a ticking timebomb...

This sounds as if we're in deep ****, doesn't it? Any help would be very
much appreciated...

By the way, you had EXACTLY the same number of objects samba-tool was
checking eternally??

mourik jan heupink - merit

Jul 9, 2014, 6:00:02 AM
Ok, DC2 seems to be healthy from an ldb point of view: I checked with
"tdbbackup -v" all .ldb files in /private/sam.ldb.d/ and they all finish
successfully.
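
A check like this over every partition file could be scripted roughly as follows. This is only a sketch: the function name `check_ldbs` is made up, it assumes `tdbbackup -v` exits non-zero when verification fails, and it is meant to be run against a copy of the directory (as with /tmp/sam.ldb.d earlier in the thread), because `tdbbackup -v` tries to restore from a .bak file when it finds damage.

```shell
#!/bin/sh
# check_ldbs DIR [CHECK_CMD...]
# Run a verification command on every *.ldb file in DIR and report the result.
# CHECK_CMD defaults to "tdbbackup -v" (assumed to exit non-zero on failure).
# Returns non-zero if any file failed the check.
check_ldbs() {
    dir=$1; shift
    [ "$#" -gt 0 ] || set -- tdbbackup -v
    rc=0
    for f in "$dir"/*.ldb; do
        [ -e "$f" ] || continue          # directory holds no .ldb files
        if "$@" "$f" >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            rc=1
        fi
    done
    return "$rc"
}

# Possible usage on a copy of the partition directory (not run here):
#   check_ldbs /tmp/sam.ldb.d
```

Passing the check command as extra arguments keeps the loop usable with a different verifier, and makes it possible to try the loop itself without touching any real database.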

Yet, on the same DC2 the "samba-tool dbcheck --cross-ncs" is still busy
checking the 187478 objects...

Judging from the above, is DC2 healthy or not?

We know that DC1 has a corrupt DC=DomainDnsZones ldb database, so how to
proceed now? Install a new DC3..? Copy the DC=DomainDnsZones ldb from
DC2 to DC1?

Tips, tricks, pointers?

Daniel Müller

Jul 9, 2014, 7:10:03 AM
Yes, it also said "checking 187478 objects" and never finished...


-----Original Message-----
From: mourik jan heupink - merit [mailto:heu...@merit.unu.edu]
Sent: Wednesday, 9 July 2014 11:27
To: mue...@tropenklinik.de; sa...@lists.samba.org
Subject: Re: AW: [Samba] samba4 replication issues | sam.ldb inconsistency

mourik jan heupink - merit

Jul 9, 2014, 7:30:03 AM
Hi Daniel, list,

Ok, so it sounds as if we're in serious trouble, with a (separate?)
issue on both dc1 and dc2. Yet nobody else (thank you, Daniel!) is
responding at all...

Does nobody know how to handle a situation like this? Has no one ever
been here? All help would be DEEPLY appreciated... Again, to summarize:

We have two dc's:
On dc2: "samba-tool dbcheck --cross-ncs" has been checking 187478 objects
for 18 hours or so, but "tdbbackup -v" tells me all ldb's are healthy

On dc1: root@dc1:/var/log/samba# samba-tool dbcheck --cross-ncs
ltdb:
tdb(/var/lib/samba/private/sam.ldb.d/DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb):
tdb_rec_read bad magic 0x198 at offset=1044437120
ERROR(ldb): uncaught exception - Indexed and full searches both failed!

On both DC's the DC=DOMAINDNSZONES ldb file is ten times bigger than the
other ldb files. Not sure if that's normal.

What to do?
- Add a third (new, fresh) DC, and hope it will sync successfully?
- copy the DC=DOMAINDNSZONES ldb from dc2 to dc1?

Fortunately the network still seems to be running fine.

Achim Gottinger

Jul 9, 2014, 7:50:03 AM
On 09.07.2014 13:24, mourik jan heupink - merit wrote:
> [...]
> On both DC's the DC=DOMAINDNSZONES ldb file is ten times bigger than
> the other ldb files. Not sure if that's normal.
>
> What to do?
> - Add a third (new, fresh) DC, and hope it will sync successfully?
> - copy the DC=DOMAINDNSZONES ldb from dc2 to dc1?
If one of your two DCs is still working flawlessly, you can try to move
all fsmo roles to that server and rejoin the other one.
DC=DOMAINDNSZONES can become pretty huge since it keeps all the deleted
dns entries for 6 months by default.
To shrink it you can tdbbackup the ldb file and use that dumped file,
which should be smaller: stop samba, run tdbbackup, and copy the backup to
the original location.
It seems tdbbackup works on dc1 for
DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb, so maybe using the backup
fixes your issues.

achim~

mourik jan heupink - merit

Jul 9, 2014, 8:40:02 AM
Hi achim, list

> If one of your two DC's is still working flawless you can try to move
> all fsmo roles to that server and rejoin the other one.
But I'm not *sure* that one of my dc's is in perfect shape. I *know*
that the DC=DOMAINDNSZONES on dc1 is corrupt.

DC2 seems to be fine; however, "samba-tool dbcheck --cross-ncs" never stops
checking, and has been running for 18 hours now. So perhaps dc2 is not
healthy either?

samba-tool fsmo show tells me that all roles are currently on the DC1.
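
For a quick overview of which DC holds which role, the output of `samba-tool fsmo show` could be condensed with a small filter. This is a sketch under an assumption: that each output line looks like `<Name>Role owner: CN=NTDS Settings,CN=<server>,CN=Servers,...`, which is how this Samba version appears to print it. The function name `fsmo_holders` is made up.

```shell
#!/bin/sh
# fsmo_holders: read `samba-tool fsmo show` output on stdin and print
# "<role> <server>" for every role line.
# Assumption: lines have the form
#   <Name>Role owner: CN=NTDS Settings,CN=<server>,CN=Servers,...
fsmo_holders() {
    sed -n 's/^\([A-Za-z]*Role\) owner: CN=NTDS Settings,CN=\([^,]*\),.*/\1 \2/p'
}

# Possible usage on a DC (not run here):
#   samba-tool fsmo show | fsmo_holders
```

Lines that do not match the assumed pattern are silently skipped, so an empty result would mean the output format differs from what is assumed here.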

I'm a bit hesitant to start messing with my AD (transferring roles,
etc), because of the uncertain state it seems to be in. I'm not sure if
I'll be able to reverse it, if this goes terribly wrong.

If I *knew* that DC2 is healthy, I could transfer all roles there, etc.
But as Daniel said: he had to reinstall a DC because of a "samba-tool
dbcheck --cross-ncs" that never ended (like the situation on my DC2).

> Seems tdbbackup works on dc1 for
> DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb maybe using the backup
> fixes your issues.
So, is it possible to take the
DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb from the working dc,
and copy it to the problem dc? Can I overwrite the corrupt file with
another dc's file?

Or is my best bet now to install a DC3, and see what gets replicated to
that new dc?

MJ

Achim Gottinger

Jul 9, 2014, 8:50:03 AM
It sounded like tdbbackup of
DC=DOMAINDNSZONES,DC=SAMBA,DC=COMPANY,DC=COM.ldb works on your dc1, so
I'd try the result of that backup operation first.

As far as I understand fsmo roles from following this list, there is
nothing to transfer; it's just a setting, so it can be changed even after
the server holding all the roles has been removed from the network. Someone
please correct me if I'm wrong on this one.

I'd expect you need a server with working fsmo roles to join a new dc
to your domain, be it dc3 as a new one or dc1 demoted and rejoined.

It is best to do a backup of your working server dc2, as described in the
wiki, before proceeding.

L.P.H. van Belle

Jul 9, 2014, 9:00:02 AM
FSMO roles are not "just" a setting.

This is a most important part.
You can put different FSMO roles on different DCs; they do not all have to be on one server.

There are 5 FSMO roles.

Schema master: the FSMO role holder is the DC responsible for performing updates to the directory schema.

Domain naming master: the role holder is the DC responsible for making changes to the forest-wide domain name space of the directory.

RID master: the FSMO role holder is the single DC responsible for processing RID pool requests from all DCs within a given domain. It is also responsible for removing an object from its domain and putting it in another domain during an object move.

PDC emulator: necessary to synchronize time in an enterprise. Windows includes the W32Time (Windows Time) time service that is required by the Kerberos authentication protocol.
It also handles password changes and account lockouts,
and the PDC emulator performs the functions that an MS NT 4.0-based PDC did.

Infrastructure master: should be held by a domain controller that is not a Global Catalog server (GC)
(which is almost never the case).

The above is mostly a copy of:
http://support.microsoft.com/kb/197132


Louis


>-----Original Message-----
>From: ac...@ag-web.biz [mailto:samba-...@lists.samba.org]
>On behalf of Achim Gottinger
>Sent: Wednesday, 9 July 2014 14:47
>To: sa...@lists.samba.org
>Subject: Re: [Samba] samba4 replication issues | sam.ldb
>inconsistency
>

L.P.H. van Belle

Jul 9, 2014, 9:10:02 AM
In your case, I would go for a new install.

Add DC3 to the domain, check the database.
Move the FSMO roles to this server.

Why a new install? It is easier to check for errors.
And... don't hurry too much; give the system time to set things up.
Synchronizing takes time, so be patient.

It also (in my case) takes much less time than solving the problem.
And one important thing: if you do this on a VM, give DC3 at least 8GB RAM.
With virtual DCs you are set up within 15 min. (5 min for an extra dc for me with my scripts ;-) )
This is also why I don't use a DC for anything other than being a DC:
very fast recovery and new installs.

Louis

>-----Original Message-----
>From: heu...@merit.unu.edu
>[mailto:samba-...@lists.samba.org] On behalf of mourik jan
>heupink - merit
>Sent: Wednesday, 9 July 2014 14:31
>To: sa...@lists.samba.org
>Subject: Re: [Samba] samba4 replication issues | sam.ldb
>inconsistency
>

mourik jan heupink - merit

Jul 9, 2014, 10:00:02 AM
Hi Louis,

Thanks for your reply.

I just installed a new dc3, and tried to join it to the AD:

- it finds my writable DC1
- the join succeeds
- replication starts, but then it bails out on:
Replicating DC=DomainDnsZones,DC=samba,DC=company,DC=com
Join failed - cleaning up

And for the record: DC=DomainDnsZones is the corrupted ldb file on my
dc1. So this sounds logical to me.

How dangerous is it, at this point, to transfer all (or perhaps some?)
fsmo roles to dc2, which seems to be in slightly better shape than dc1...?

(however: on dc2 "samba-tool dbcheck --cross-ncs" is STILL checking 187479
objects, and, as indicated by Daniel, it will probably never finish that)

What a mess...

MJ

Daniel Müller

Jul 9, 2014, 10:10:03 AM
In my case I had to kill it with kill -9 and restart the server.


-----Original Message-----
From: samba-...@lists.samba.org [mailto:samba-...@lists.samba.org] On
behalf of mourik jan heupink - merit
Sent: Wednesday, 9 July 2014 15:57
To: sa...@lists.samba.org
Subject: Re: [Samba] samba4 replication issues | sam.ldb inconsistency

L.P.H. van Belle

Jul 9, 2014, 10:40:02 AM

Ok, before you reboot...


what is the output of:

ls -lai .../samba/private/sam.ldb.d/

ls -lai .../samba/private/dns/sam.ldb.d/

And how much RAM does your server have?
Is it running only samba, or more, or lots more?

Greetz,

Louis


>-----Original Message-----
>From: heu...@merit.unu.edu
>[mailto:samba-...@lists.samba.org] On behalf of mourik jan
>heupink - merit
>Sent: Wednesday, 9 July 2014 15:57

Achim Gottinger

Jul 9, 2014, 11:00:01 AM
On 09.07.2014 14:54, L.P.H. van Belle wrote:
> FSMO Roles are not "just" a setting..
> [...]
What I meant was that the branches with fsmo-related information get
replicated across all AD DCs. So when you transfer a role, the
branches do not need to be transferred; it's just a setting in the ldap
tree which changes.

mourik jan heupink - merit

Jul 9, 2014, 11:30:02 AM
Hi Louis,

> ls -lai .../samba/private/sam.ldb.d/
See here: http://pastebin.com/8Uxt7Hza

> ls -lai .../samba/private/dns/sam.ldb.d/
I have no such directory

> and how much ram does you server have.
DC1 2gig, DC2 3gig. (top tells me both DC's use around 3/4 of their
memory, hardly any or no swap in use)

The DC's are only DC's, nothing else. We have around 300 accounts.

Just one *vital* question in all this:

On DC2 the "samba-tool dbcheck --cross-ncs" never finishes checking 187478
objects. However: everything on the DC2 seems to work beautifully. It is
fully replicated (except for the DC=DomainDnsZones), DNS on the DC2
works also for our internal domain, the "tdbbackup -v" tells me all ldb
files are fine, etc, etc.

So: the mere fact that "samba-tool dbcheck --cross-ncs" never finishes...
does it mean that there actually *is* something wrong, or could I
perhaps assume that nothing is wrong with my DC2?

In which case I could transfer roles to DC2, then add a DC3, etc, etc.

The only problem on DC2, as far as I can tell, is: "samba-tool dbcheck
--cross-ncs" and its never-ending quest to check 187478 objects...

Anyway: thanks *very* much for stepping in and helping!

L.P.H. van Belle

Jul 10, 2014, 2:50:02 AM

I can't think of much more besides increasing the log levels and
running samba-tool with verbose so we can see more.

Have you tried to reindex on DC1 and DC2?

samba-tool dbcheck --reindex --verbose

And run samba-tool with verbose so we can maybe see a bit more:
samba-tool dbcheck --cross-ncs --verbose

You can run:
ldbsearch -H path-to-DC=DOMAINDNZ... etc .. .ldb
and compare DC1 and DC2.

Here is a bit more info:
https://wiki.samba.org/index.php/Samba4/LDBIntro

Use ldbmodify to modify... but be careful with this.

And maybe a samba dev can have a look as well.



Louis


>-----Original Message-----
>From: heu...@merit.unu.edu
>[mailto:samba-...@lists.samba.org] On behalf of mourik jan
>heupink - merit
>Sent: Wednesday, 9 July 2014 17:22

Andrew Bartlett

Jul 10, 2014, 6:10:02 AM
On Tue, 2014-07-08 at 17:58 +0200, mourik jan heupink - merit wrote:
> Hi all,
>
> We seem to have some issues with our samba4 ad setup. I posted about
> this last week already but had received no replies at all so far. :-(

If you urgently need help, please contact a Samba commercial support
provider with experience in the AD DC:

https://www.samba.org/samba/support/globalsupport.html
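
> tdb_rec_read bad magic 0x198 at offset=1044437120
> ERROR(ldb): uncaught exception - Indexed and full searches both failed!

This implies very serious corruption of this tdb (ldb) file.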

> On dc2 the same "samba-tool dbcheck cross-ncs" says: "checking 187478
> objects". Has been running for many hours now, I have no idea how far it
> is. The server is pretty buzy doing it.

This is quite likely, as dbcheck is fairly intensive and the internal
DNS bug regarding deleted objects means we get a *lot* of records. It
probably is still making progress however.

Perhaps see the suggestions elsewhere on this list for purging the DNS
records after only 1 month.

> So, my working conclusion is that on DC1 the
> DC=DomainDnsZones,DC=samba,DC=company,DC=com has become corrupted, and
> therefore fails to replicate to dc2.
>
> Does the list agree with this?

Yes.

> I hope that dc2 is still having the correct DC=DomainDnsZones. But,
> since replication seems to be only from dc1 TO dc2, I'm unsure how to
> import the healthy dc2 database into dc1.
>
> Does the above make any sense? How to make both dc's happy and fully
> functional again?
>
> Any help would be VERY much appreciated... Hopefully I'll get some
> replies this time!

This is a difficult situation. Ideally you would get the 'good' DC to
replicate to a new installation, and work from there.

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba

mourik jan heupink - merit

Jul 11, 2014, 6:20:02 AM
Hi Louis,

On 07/10/2014 08:46 AM, L.P.H. van Belle wrote:
>
> I cant think of much more beside increasing the log levels and
> run samba-tool with verbose so we can see more.
>
> have you tried to reindex on DC1 and 2?
>
> samba-tool dbcheck --reindex --verbose
>
> and run samba tool with verbose so we can see a bit more maybe.
> samba-tool dbcheck --cross-ncs --verbose
Thanks for the --verbose tip, I didn't know that one. I'll try that one
shortly.

MJ

mourik jan heupink - merit

Jul 11, 2014, 6:40:02 AM
Hi Andrew, list,

> This is a difficult situation. Ideally you would get the 'good' DC to
> replicate to a new installation, and work from there.
>
> Andrew Bartlett
>
Ok, this is what I thought, yes. Thank you, Andrew. Just some final
confirmation:

All fsmo roles are currently on my DC1 (with the corrupt DomainDnsZones
database). I'm a bit hesitant to start moving around roles, as long as
everything still seems to work. But, as far as I see, there are three
options to proceed from here:

option 1 - move only the DomainNamingMasterRole from (corrupt) DC1 to
(probably healthy) DC2. Then install/add a new DC3, and then it will
replicate everything from DC1, except it will take the DomainDnsZones
from DC2 (is that right?)

(but I don't know if DC=DomainDnsZones and the role
DomainNamingMasterRole are connected with each other like this)

option 2 - take a deep breath, move all roles to DC2, hope & check
everything still works afterwards, and then install/add DC3, so it will
replicate everything from DC2.

And I guess this is NOT possible:
option 3 - Install a new third DC3, and replicate that new DC3 with my
(probably healthy) DC2, WITHOUT doing scary things like transferring
fsmo roles first?

I hope option 1 could work..? Can anyone confirm/advise?

MJ

Andrew Bartlett

Jul 11, 2014, 7:00:03 AM
On Fri, 2014-07-11 at 12:34 +0200, mourik jan heupink - merit wrote:
> Hi Andrew, list,
>
> > This is a difficult situation. Ideally you would get the 'good' DC to
> > replicate to a new installation, and work from there.
> >
> > Andrew Bartlett
> >
> Ok, this is what I thought, yes. Thank you, Andrew. Just some final
> confirmation:
>
> All fsmo roles are currently on my DC1 (with the corrupt DomainDnsZones
> database). I'm a bit hesitant to start moving around roles, as long as
> everything still seems to work. But, as far as I see, there are three
> options to proceed from here:
>
> option 1 - move only the DomainNamingMasterRole from (corrupt) DC1 to
> (probably healthy) DC2. Then install/add a new DC3, and then it will
> replicate everything from DC1, except it will take the DomainDnsZones
> from DC2, is that right?)
>
> (but I don't know if DC=DomainDnsZones and the role
> DomainNamingMasterRole are connected with each other like this)
>
> option 2 - take a deep breath, move all roles to DC2, hope & check
> everything still works afterwards, and then install/add DC3, so it will
> replicate everything from DC2.
>
> And I guess this is NOT possible:
> option 3 - Install a new third DC3, and replicate that new DC3 with my
> (probably healthy) DC2, WITHOUT doing scary things like transferring
> fsmo roles first?

There is no need to move roles in the short term. They don't do
anything until you need to allocate a new RID pool. By then, hopefully
you can successfully seize them.

I think option 3 is the best option. Get that working, as you don't
lose anything by taking this option.

We don't have great tools for removing dead DCs yet (we have tools that
*should* do that, but clearly by reports do not). We need to sort that
out.

Thanks,

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba


mourik jan heupink - merit

Jul 11, 2014, 9:20:01 AM
Hi Andrew, list,

>> And I guess this is NOT possible:
>> option 3 - Install a new third DC3, and replicate that new DC3 with my
>> (probably healthy) DC2, WITHOUT doing scary things like transferring
>> fsmo roles first?
>
> I think option 3 is the best option. Get that working, as you don't
> loose anything by taking this option.
But... I tried adding a DC3, and it started replicating with DC1, and of
course it failed to replicate the faulty DomainDnsZones. So the join
failed, and was reversed.

So... how to add a DC3, and make it sync to DC2 instead of DC1? I assumed
that this is not possible without transferring fsmo roles, but your
recommendation above tells me that it IS possible?

That would be super cool, and by far the best solution out of this mess.

MJ

Achim Gottinger

Jul 11, 2014, 9:30:02 AM
On 11.07.2014 15:16, mourik jan heupink - merit wrote:
> [...]
> So...how to add a DC3, and make it sync to DC2 instead of DC1? I
> assumed that this is not possible without transferring fsmo roles, but
> your recommendation above tells me that it IS possible?
I'd stop samba services on dc1 and join dc3 afterwards.

L.P.H. van Belle

Jul 11, 2014, 9:30:02 AM
here you go.

samba-tool drs replicate
Usage: samba-tool drs replicate <destinationDC> <sourceDC> <NC> [options]
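
Applied to this thread, the argument order (destination first, then source) could be spelled out as below. This is only a sketch that prints the commands instead of running them. The short host names dc3 and dc2, the function name `replicate_cmds`, and the list of five naming contexts (taken from the partitions that show up in the logs in this thread) are assumptions for illustration.

```shell
#!/bin/sh
# replicate_cmds DEST SRC BASE_DN
# Print (do not execute) one `samba-tool drs replicate` command per naming
# context, asking DEST to pull that context from SRC.
# Assumption: the domain has the usual five partitions below BASE_DN.
replicate_cmds() {
    dest=$1 src=$2 base=$3
    for nc in "$base" \
              "CN=Configuration,$base" \
              "CN=Schema,CN=Configuration,$base" \
              "DC=DomainDnsZones,$base" \
              "DC=ForestDnsZones,$base"; do
        echo "samba-tool drs replicate $dest $src $nc"
    done
}

# Example: commands for pulling everything from dc2 into dc3.
replicate_cmds dc3 dc2 'DC=samba,DC=company,DC=com'
```

Printing first makes it easy to review the source and destination before anything touches the directory; a line can then be pasted and run by hand.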


>-----Original Message-----
>From: heu...@merit.unu.edu
>[mailto:samba-...@lists.samba.org] On behalf of mourik jan
>heupink - merit
>Sent: Friday, 11 July 2014 15:16
>To: Andrew Bartlett
>CC: sa...@lists.samba.org
>Subject: Re: [Samba] samba4 replication issues | sam.ldb
>inconsistency
>

Achim Gottinger

Jul 11, 2014, 9:30:04 AM
On 11.07.2014 15:21, Achim Gottinger wrote:
> I'd stop samba services on dc1 and join dc3 afterwards.
To clean up all samba settings, after a failed join for example, I use
this on debian.

rm /var/lock/samba/*
rm /var/log/samba/*
rm -r /var/cache/samba/*
rm -r /var/run/samba/*
rm /var/lib/samba/*
rm -r /var/lib/samba/private/*
rm -r /var/lib/samba/sysvol/
rm -r /var/lib/samba/ntp_signd/
rm -r /var/lib/samba/winbindd_privileged/
rm -r /var/lib/samba/spool/*
rm -r /var/lib/samba/print/*

Also move /etc/samba/smb.conf out of the way before trying to join again.

Andrew Bartlett

Jul 11, 2014, 5:20:02 PM
On Fri, 2014-07-11 at 15:16 +0200, mourik jan heupink - merit wrote:
> Hi Andrew, list,
>
> >> And I guess this is NOT possible:
> >> option 3 - Install a new third DC3, and replicate that new DC3 with my
> >> (probably healthy) DC2, WITHOUT doing scary things like transferring
> >> fsmo roles first?
> >
> > I think option 3 is the best option. Get that working, as you don't
> > loose anything by taking this option.
> But... I tried adding a DC3, and it started replicating with DC1, and of
> course it failed to replicate the faulty DomainDnsZones. So the join
> failed, and was reversed.
>
> So...how to add a DC3, and make it sync to DC2 instead of DC1? I assumed
> that this is not possible without transferring fsmo rules, but your
> recommendation above tells me that it IS possible?

It certainly is. You can join to any DC regardless of roles. You should
be able to point it to DC2 with --server=DC2

> That would be super cool, and by far the best solution out of this mess.

Yeah :-)

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba


mourik jan heupink - merit

Jul 13, 2014, 7:50:02 AM
Hi Andrew, list,

> It certainly is. you can join to any DC regardless of roles. You should
> be able to point it to DC2 with --server=DC2
>
>> That would be super cool, and by far the best solution out of this mess.
>
> Yeah :-)
>

Haha, Yeah :-)


Anyway: it's running as I write this. Since the DC=DomainDnsZones is
(still) so big, it is taking ages to sync, but we're progressing nicely.
Currently at 72361/183094, so it'll be busy for a while.
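
To put a counter like that into perspective, the fraction can be turned into a percentage with plain shell arithmetic. A tiny sketch, with the made-up helper name `percent_done`:

```shell
#!/bin/sh
# percent_done DONE TOTAL: integer percentage (rounded down) of DONE out of TOTAL.
percent_done() {
    echo $(( $1 * 100 / $2 ))
}

percent_done 72361 183094   # prints 39
```

So at this point roughly 39% of the objects in the partition had been transferred, with 110733 still to go.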

With --server=DC2 it has already come much much further than before, and
all in all this is looking very promising.

Thanking the list (Andrew, Achim, Louis, Daniel) for the invaluable help
over the last week, and wishing you all a beautiful (remainder of the)
weekend!

Mourik Jan

mourik jan heupink - merit

Jul 15, 2014, 12:10:02 PM
Hi all,

Despite my first optimism, it seems we're not out of the woods just yet...

> It certainly is. you can join to any DC regardless of roles. You should
> be able to point it to DC2 with --server=DC2
>

I managed to install a new DC3, with --server=DC2:

samba-tool domain join samba.company.com DC -Uadministrator
--realm=samba.company.com --server=DC2

This completes successfully, no errors. However, when I start my DC3, I
receive:
[2014/07/15 17:35:44.891271, 0]
../lib/util/util_runcmd.c:317(samba_runcmd_io_handler)
/usr/sbin/samba_dnsupdate: update failed: SERVFAIL
and
[2014/07/15 17:41:08.790679, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
[2014/07/15 17:41:08.815799, 0]
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID
allocation - WERR_DS_DRA_INTERNAL_ERROR - extended_ret[0x0]
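
When a log fills up with entries like these, a rough tally per error code can show which failure dominates. The following is a minimal sketch; the function name `werr_summary` is made up, and it relies on `grep -o`, which is common but not part of POSIX.

```shell
#!/bin/sh
# werr_summary: read Samba log text on stdin and print "<WERR code> <count>"
# for every WERR_* code that occurs, sorted by code name.
werr_summary() {
    grep -o 'WERR_[A-Z_]*' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Possible usage (the log file path depends on the installation):
#   werr_summary < /path/to/samba/log
```

This only counts occurrences; it says nothing about which naming context each error belongs to, so the full log entries are still needed for diagnosis.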

Checking dns on my DC2 I learned that dc3.samba.company.com did not
resolve correctly, so I did on DC2:

samba-tool dns add ip.address.dc2 samba.company.com DC3 A ip.address.dc3
-Uadministrator

and now dc3.samba.company.com does resolve correctly. However, after
restarting samba, things still don't work:

[2014/07/15 17:42:35.027090, 0]
../lib/util/util_runcmd.c:317(samba_runcmd_io_handler)
/usr/sbin/samba_dnsupdate: ; TSIG error with server: tsig verify failure
[2014/07/15 17:42:35.027250, 0]
../lib/util/util_runcmd.c:317(samba_runcmd_io_handler)
/usr/sbin/samba_dnsupdate: update failed: SERVFAIL
[2014/07/15 17:42:38.642366, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=ForestDnsZones,DC=samba,DC=company,DC=com
[2014/07/15 17:42:38.816639, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=samba,DC=company,DC=com
[2014/07/15 17:42:38.960894, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
[2014/07/15 17:42:39.068958, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
CN=Configuration,DC=samba,DC=company,DC=com
[2014/07/15 17:43:06.580263, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=ForestDnsZones,DC=samba,DC=company,DC=com
[2014/07/15 17:43:06.798779, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
CN=Configuration,DC=samba,DC=company,DC=com
[2014/07/15 17:43:07.113991, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=samba,DC=company,DC=com
[2014/07/15 17:43:07.372502, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
[2014/07/15 17:43:07.390439, 0]
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID
allocation - WERR_DS_DRA_INTERNAL_ERROR - extended_ret[0x0]

So... a lot of access denied, plus an internal error to top things off.
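
To see at a glance which naming contexts are affected, the repeated UpdateRefs entries can be tallied. What follows is only a minimal sketch: it assumes the log is wrapped as in the excerpt above (the naming context sits on the second line after the "UpdateRefs failed" line), and count_updaterefs_failures is a made-up helper name, not part of Samba.

```shell
# Tally "UpdateRefs failed" entries per naming context.
# Assumption: the naming context is on the second line after the
# "UpdateRefs failed" line, as in the wrapped log excerpt above.
# count_updaterefs_failures is a hypothetical helper, not a Samba tool.
count_updaterefs_failures() {
    awk '
        /UpdateRefs failed/ { skip = 2; next }
        skip > 0 { skip--; if (skip == 0) count[$0]++ }
        END { for (nc in count) printf "%d %s\n", count[nc], nc }
    ' | LC_ALL=C sort -k2
}

# Example run on two entries copied from the log above.
count_updaterefs_failures <<'EOF'
[2014/07/15 17:42:38.642366, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=ForestDnsZones,DC=samba,DC=company,DC=com
[2014/07/15 17:43:06.580263, 0]
../source4/dsdb/repl/drepl_out_helpers.c:840(dreplsrv_update_refs_done)
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105
for 9a3d9130-45f3-43b6-bbf4-189c19764bd5._msdcs.samba.company.com
DC=ForestDnsZones,DC=samba,DC=company,DC=com
EOF
```

Each output line is a count followed by the naming context. If the real log file is not wrapped this way, the two-line offset would need adjusting.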

Getting more and more nervous. Any tips on how to proceed are again very
welcome.

mourik jan heupink - merit

Jul 15, 2014, 12:50:01 PM
Some more info on the current situation:

On my new DC3, checking replication, it says 0 failures, except for DC1,
on my corrupted DC=DomainDnsZones:

DC=DomainDnsZones,DC=samba,DC=company,DC=com
Default-First-Site-Name\DC1 via RPC
DSA object GUID: 81a27497-bdfb-4977-9874-675bbfba490f
Last attempt @ Tue Jul 15 18:18:10 2014 CEST failed,
result 8442 (WERR_DS_DRA_INTERNAL_ERROR)
10 consecutive failure(s).
Last success @ NTTIME(0)

Since this is my corrupted DC1, I guess this is to be expected.
Replication from DC2 seems fine, 0 failures.
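
The status block above looks like the output of samba-tool drs showrepl. A rough way to pull out only the failing neighbours is sketched below. It assumes the report has been saved or piped as text in the layout shown above (naming context, partner line, then the "consecutive failure(s)" counter); failing_neighbors is a hypothetical helper name.

```shell
# Print only the inbound neighbours that report one or more failures.
# Assumption: input follows the layout quoted above.
# failing_neighbors is a hypothetical helper, not a Samba tool.
failing_neighbors() {
    awk '
        /^(DC|CN)=/           { nc = $0 }        # naming context header
        / via RPC/            { partner = $1 }   # e.g. Default-First-Site-Name\DC1
        /consecutive failure/ {
            if ($1 + 0 > 0)
                printf "%s <- %s (%d failures)\n", nc, partner, $1
        }
    '
}

# Example run on the entry quoted above.
failing_neighbors <<'EOF'
DC=DomainDnsZones,DC=samba,DC=company,DC=com
Default-First-Site-Name\DC1 via RPC
DSA object GUID: 81a27497-bdfb-4977-9874-675bbfba490f
Last attempt @ Tue Jul 15 18:18:10 2014 CEST failed,
result 8442 (WERR_DS_DRA_INTERNAL_ERROR)
10 consecutive failure(s).
Last success @ NTTIME(0)
EOF
```

Entries with 0 consecutive failures are skipped, so an empty result means every listed neighbour replicated successfully on its last attempt.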

The majority of errors when starting my new DC3 seem to be:
samba_dnsupdate: update failed: SERVFAIL

Taking tips from the list, I tried:
samba_dnsupdate --verbose
(its full output is here: http://pastebin.com/H4EYkxnA)

This command gives the following errors:

Failed to find matching DNS entry A samba.company.com 192.87.x.y

Failed to find matching DNS entry SRV _kpasswd._tcp.samba.company.com
dc3.samba.company.com 464

Failed to find matching DNS entry SRV _kpasswd._udp.samba.company.com
dc3.samba.company.com 464

Failed to find matching DNS entry SRV _kerberos._tcp.samba.company.com
dc3.samba.company.com 88

Failed to find matching DNS entry SRV
_kerberos._tcp.default-first-site-name._sites.samba.company.com
dc3.samba.company.com 88

Failed to find matching DNS entry SRV _kerberos._udp.samba.company.com
dc3.samba.company.com 88

Failed to find matching DNS entry SRV _gc._tcp.samba.company.com
dc3.samba.company.com 3268

; TSIG error with server: tsig verify failure
update failed: SERVFAIL
Failed nsupdate: 2
Calling nsupdate for SRV _kpasswd._tcp.samba.company.com
dc3.samba.company.com 464
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 0
;; flags:; ZONE: 0, PREREQ: 0, UPDATE: 0, ADDITIONAL: 0
;; UPDATE SECTION:
_kpasswd._tcp.samba.company.com. 900 IN SRV 0 100 464
dc3.samba.company.com.

; TSIG error with server: tsig verify failure
update failed: SERVFAIL
Failed nsupdate: 2
Calling nsupdate for SRV _kpasswd._udp.samba.company.com
dc3.samba.company.com 464
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 0
;; flags:; ZONE: 0, PREREQ: 0, UPDATE: 0, ADDITIONAL: 0
;; UPDATE SECTION:
_kpasswd._udp.samba.company.com. 900 IN SRV 0 100 464
dc3.samba.company.com.

; TSIG error with server: tsig verify failure
update failed: SERVFAIL
Failed nsupdate: 2
Failed update of 10 entries
root@dc3:/var/log/samba# samba_dnsupdate --verbose | less
Failed to find matching DNS entry SRV _kerberos._tcp.samba.company.com
dc3.samba.company.com 88
Looking for DNS entry SRV _kerberos._tcp.dc._msdcs.samba.company.com
dc3.samba.company.com 88 as _kerberos._tcp.dc._msdcs.samba.company.com.

My problem seems to be missing dns entries for my new dc3...? Should I
add all these missing dns names..? Surely that cannot be the way..?
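
Before adding anything by hand, it can help to reduce the verbose output to a plain list of the records reported missing. A minimal sketch, assuming each "Failed to find matching DNS entry" message is on a single line in the real output (the mail client wrapped them above); missing_dns_records is a hypothetical helper name.

```shell
# List record type and name for every entry samba_dnsupdate could not find.
# Assumption: one message per line, "Failed to find matching DNS entry TYPE NAME ...".
# missing_dns_records is a hypothetical helper, not a Samba tool.
missing_dns_records() {
    awk '/^Failed to find matching DNS entry/ { print $7, $8 }' | LC_ALL=C sort -u
}

# Example run on two of the messages above, unwrapped.
# On the DC itself this would be: samba_dnsupdate --verbose 2>&1 | missing_dns_records
missing_dns_records <<'EOF'
Failed to find matching DNS entry A samba.company.com 192.87.x.y
Failed to find matching DNS entry SRV _kpasswd._tcp.samba.company.com dc3.samba.company.com 464
EOF
```

The list only shows what samba_dnsupdate expected to find; it does not say why the updates were rejected.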

Thanks very much for any help!

mourik jan heupink - merit

Jul 15, 2014, 1:00:03 PM
Hi,

A thing that just occurred to me:

All fsmo roles are still owned by my DC1, including the
DomainNamingMasterRole.

This probably means that my new DC3 tries to register dns stuff to DC1,
right? As this is the DC with the corrupt DC=DomainDnsZones, it's likely
to fail.

I should probably now transfer the fsmo role DomainNamingMasterRole to
my DC2, and then attempt to restart DC3.

Does the list agree with this?

Sorry for asking all these questions, but this is the first time I'm in
trouble with samba4/AD...

Thanks!

Marc Muehlfeld

Jul 15, 2014, 2:20:03 PM
Hello Mourik,

On 15.07.2014 18:48, mourik jan heupink - merit wrote:
> A thing that just occurred to me:
>
> All fsmo roles are still owned by my DC1, including the
> DomainNamingMasterRole.
>
> This probably means that my new DC3 tries to register dns stuff to DC1,
> right? As this is the DC with the corrupt DC=DomainDnsZones, it's likely
> to fail.

The Domain Naming Master role is, among other things, responsible for the
uniqueness of domain and subdomain names and DC names in an AD forest.
It is not related to DNS.
http://msdn.microsoft.com/en-us/library/cc223750.aspx




> I should probably now transfer the fsmo role DomainNamingMasterRole to
> my DC2, and then attempt to restart DC3.
>
> Does the list agree with this?

If your replication isn't working any more - and you can't get it fixed -
you should shut down your DC1 and seize the roles on your remaining DCs:
https://wiki.samba.org/index.php/Flexible_Single-Master_Operations_%28FSMO%29_roles#Seizing_a_FSMO_role

But you should make sure that DC1 doesn't come back, because the five
roles must not exist twice in your domain/forest (depending on the role).
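
As a rough sketch of the order of operations (not a tested procedure): the snippet below only prints the commands, so the plan can be reviewed before anything is run. The role names are the ones samba-tool fsmo accepts in Samba 4.1; print_fsmo_plan and its transfer/seize argument are hypothetical. The printed commands would be run on the DC that should take over the roles.

```shell
# Print (do not execute) the samba-tool commands for moving the five roles.
# print_fsmo_plan is a hypothetical helper; its argument is either
# "transfer" (old owner still reachable) or "seize" (old owner shut down for good).
print_fsmo_plan() {
    action=$1
    echo "samba-tool fsmo show"
    for role in rid pdc infrastructure schema naming; do
        echo "samba-tool fsmo $action --role=$role"
    done
    echo "samba-tool fsmo show"
}

print_fsmo_plan transfer
```

The fsmo show at the start and end is there to compare the role owners before and after the move.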




Regards,
Marc

heupink, mourik jan c

Jul 15, 2014, 3:20:02 PM
Hi Marc, list,

> If your replication isn't working any more - and you can't get it fixed -
> you should shut down your DC1 and seize the roles on your remaining DCs:
> https://wiki.samba.org/index.php/Flexible_Single-Master_Operations_%28FSMO%29_roles#Seizing_a_FSMO_role
Should I try to transfer roles first?

> But you should make sure that DC1 doesn't come back, because the five
> roles must not exist twice in your domain/forest (depending on the role).
Right. In case the role seize works out, I have to get rid of my current dc1.
But I guess I CAN create a NEW install, call it dc1, and join it as a new domain controller?

(I mean: the name dc1 in our domain is not 'contaminated' or so, it's just this specific samba installation called dc1 that should never appear again?)

Thanks for your reply.
Mourik Jan

Marc Muehlfeld

Jul 15, 2014, 3:40:02 PM
On 15.07.2014 21:14, heupink, mourik jan c wrote:
>> If your replication isn't working any more - and you can't get it fixed -
>> you should shut down your DC1 and seize the roles on your remaining DCs:
>> https://wiki.samba.org/index.php/Flexible_Single-Master_Operations_%28FSMO%29_roles#Seizing_a_FSMO_role
> Should I try to transfer roles first?

Try transferring it first.

The roles are basically just values in the AD. Depending on the
remaining working replication, we will see if it works.




>> But you should make sure that DC1 doesn't come back, because the five
>> roles must not exist twice in your domain/forest (depending on the role).
> Right. In case the role seize works out, I have to get rid of my current dc1.
> But I guess I CAN create a NEW install, call it dc1, and join it as a new domain controller?
>
> (I mean: the name dc1 in our domain is not 'contaminated' or so, it's just this specific samba installation called dc1 that should never appear again?)

You have to demote the broken DC:
https://wiki.samba.org/index.php/Demote_a_Samba_DC

You should try this first, to get old GUIDs, etc. out of your directory.

But demoting with samba-tool currently only works on the DC you want to
demote. Demoting foreign DCs doesn't work (see the linked bug report).

If the demote was successful, you should be able to join the fresh DC1
again.

mourik jan heupink - merit

Jul 16, 2014, 2:30:02 AM
Hi Marc, list,


> Try transferring it first.
All roles transferred successfully. Then I shut down my dc1, booted up a
workstation and tried to log on: "there were no logon servers
available to service your request". This is when I started sweating. :-)

Then I restarted samba on my dc2, and after that I WAS able to log on.
Phew, I never expected to have to restart samba...?

> The roles are basically just values in the AD. Depending on the
> remaining working replication, we will see if it works.
All three DCs show the same fsmo info now, so even the corrupt dc1
knows that all roles are on dc2 now.

I guess in this state, I can leave the dc1 running for a bit? I don't
like doing many big things in a row.

> You have to demote the broken DC:
> https://wiki.samba.org/index.php/Demote_a_Samba_DC
>
> You should try this first, to get old GUIDs, etc. out of your directory.
Yep, I will do that in a few hours, or even tomorrow or so. First see
how this new dc2/dc3 setup behaves.

>
> But demoting with samba-tool currently only works on the DC you want to
> demote. Demoting foreign DCs doesn't work (see the linked bug report).
>
> If the demote was successful, you should be able to join the fresh DC1
> again.
Right, I'll reinstall the dc1, but just out of curiosity: if I removed
all files below /var/lib/samba, I would basically already have a fresh
installation, right?

Thanks very much for your kind assistance through all this mess.

Mourik Jan

mourik jan heupink - merit

Jul 16, 2014, 2:50:01 AM
Hi Marc, list,

On (the new) DC3, I still have some dnsupdate errors showing up:

[2014/07/16 08:39:34.378816, 0]
../lib/util/util_runcmd.c:317(samba_runcmd_io_handler)
/usr/sbin/samba_dnsupdate: ; TSIG error with server: tsig verify failure

How serious are those, and how to get rid of them?

I restarted DC3, and right after that the error occurred 10 times; now it
is running silently.

MJ

Marc Muehlfeld

Jul 16, 2014, 3:30:02 AM
On 16.07.2014 08:28, mourik jan heupink - merit wrote:
> Right, I'll reinstall the dc1, but just out of curiosity: if I removed
> all files below /var/lib/samba, I would basically already have a fresh
> installation, right?

If this is where all your Samba databases are, then yes.

mourik jan heupink - merit

Jul 16, 2014, 3:30:02 AM
Ah I should search more before posting....

In this post: http://marc.info/?l=samba&m=138748499227175&w=2
Günter kukkukk explains:

> That output
> ; TSIG error with server: tsig verify failure
> is usually only seen when the internal DNS server is running.
> It's a glitch, which can be ignored atm (all dyn. updates are done OK).

So I'm guessing now that we are completely healthy again. :-)

Thanking the list very much for its kind assistance, I guess this
thread can be closed now!! :-)

Best regards,
Mourik Jan