Stuck Backlink Obituary

TJM

Aug 15, 2006, 10:32:35 AM

Sorry about the duplicate post in "administration-tools"; I thought I
was in this forum when I posted.

We're running NW 6.5 SP5 and eDir 8.7.3.7. We have an issue with a
stuck obituary from an object that was renamed a few weeks ago. On the
Master server the obit has processed, but on the 3 R/W servers it is
still unprocessed, with the following info:

(1) Found obituary for: EID: 0000D5EE, DN: CN=ETAC Helpdesk.OU=acc.O=nccu.T=AC
  -Value CTS : 7-27-2006 15:08:26 R = 0002 E = 0008
  -Value MTS = 7-27-2006 15:08:26 R = 0002 E = 0008, Type = 0006 BACKLINK,
  -Flags = 0000
  -Backlink: Type = 00000005 NEW_RDN, RemoteID = FFFFFFFF,
   ServerID = 0000A8E5, CN=APOLLO.OU=MAIL.O=DESS.T=ACC
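For anyone wanting to pull the key fields out of an obituary dump like this programmatically, here is a minimal Python sketch. It assumes the text layout shown above, which is inferred from this single dump rather than from any documented Novell format; the names OBIT_SAMPLE, Obituary and parse_obit are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical parser: the field layout is inferred from the one obituary
# dump quoted in this thread, not from a documented Novell format.
OBIT_SAMPLE = """\
(1) Found obituary for: EID: 0000D5EE, DN: CN=ETAC Helpdesk.OU=acc.O=nccu.T=AC
  -Value CTS : 7-27-2006 15:08:26 R = 0002 E = 0008
  -Value MTS = 7-27-2006 15:08:26 R = 0002 E = 0008, Type = 0006 BACKLINK,
  -Flags = 0000
  -Backlink: Type = 00000005 NEW_RDN, RemoteID = FFFFFFFF,
   ServerID = 0000A8E5, CN=APOLLO.OU=MAIL.O=DESS.T=ACC
"""


@dataclass
class Obituary:
    eid: str            # entry ID of the object the obit belongs to
    dn: str             # distinguished name as printed in the dump
    obit_type: str      # e.g. "BACKLINK"
    flags: int          # raw flag value as printed (hex), no interpretation
    backlink_type: str  # e.g. "NEW_RDN"
    server_id: str      # server the backlink points at
    server_dn: str


def parse_obit(text: str) -> Obituary:
    """Extract the fields of one obituary entry with regular expressions."""

    def grab(pattern: str) -> re.Match:
        m = re.search(pattern, text)
        if m is None:
            raise ValueError(f"pattern not found: {pattern}")
        return m

    head = grab(r"EID:\s*([0-9A-F]+),\s*DN:\s*(.+)")
    mts = grab(r"MTS\s*=.*Type\s*=\s*[0-9A-F]+\s+(\w+)")
    flags = grab(r"Flags\s*=\s*([0-9A-F]+)")
    backlink = grab(r"Backlink:\s*Type\s*=\s*[0-9A-F]+\s+(\w+)")
    server = grab(r"ServerID\s*=\s*([0-9A-F]+),\s*(\S+)")
    return Obituary(
        eid=head.group(1),
        dn=head.group(2).strip(),
        obit_type=mts.group(1),
        flags=int(flags.group(1), 16),
        backlink_type=backlink.group(1),
        server_id=server.group(1),
        server_dn=server.group(2),
    )


if __name__ == "__main__":
    print(parse_obit(OBIT_SAMPLE))
```

Running it on several servers' dumps would make it easy to compare which replicas still hold the same obit, though the regexes will need adjusting if other dumps are laid out differently.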

The server listed above (apollo) holds the master replica and shows 0
obits. All other obits are processing normally and there are no problems
with time or replica synchronization. The TIDs I've found, such as
10062149, all reference inhibit move obits, so I wasn't sure if they
apply. What needs to be done to clean this up?

thanks,

Travis

Andy

Aug 15, 2006, 1:04:14 PM

Travis,

I ran into a problem recently where I had a few stuck OBITS in my
tree, and browsing around Novell's TIDs, I did find these:

Processing stuck obituaries in All DS versions: TID 10062149
Removing an orphaned Inhibit Move obituary using NDS iMonitor: TID
10082137

These would have helped me for the most part, but I found that even
after running several processes, including moving the master replica of
that partition to the server holding the stuck obituaries, nothing
would help. A good friend of mine, who happens to be a CDE/CDA, came
over and went through my whole tree with DSBROWSE, and concluded that
the lowest-level container holding the now deleted and defunct object
could be emptied and deleted, and then local and global DSREPAIRs run
to clear out the stuck obits.

Needless to say, we moved all objects in that OU to a new temp OU, then
removed the container that had the stuck deleted OBITS, and poof, all
was better. Of course, once DS had replicated clean, we then had to
rebuild/rename the temp container to match the removed one.

Best of luck.

--Andy

TJM

Aug 15, 2006, 1:14:48 PM

Yow! That would be hugely ugly. All told, that would include hundreds
of objects, including servers, licenses, users, and OUs. I'll wait 'till
I turn over every rock before going that way.

Thanks for the suggestion though; at least I have a worst-case option if
I don't find anything else.

Travis

Jim Henderson

Aug 15, 2006, 1:41:56 PM

Unless it's causing -637s (which would mean it's holding up an
INHIBIT_MOVE obit somewhere), this shouldn't cause a problem for you.

What version of eDir is in use?

Jim

--
Jim Henderson, CNA6, CDE, CNI, LPIC-1
Novell Training Services

TJM

Aug 15, 2006, 2:11:55 PM

Nope, no -637s that I can see. Where exactly should I be looking?
We're using 8.7.3.7 on NetWare 6.5 SP5.

Jim Henderson

Aug 15, 2006, 3:05:15 PM

On Tue, 15 Aug 2006 18:11:55 +0000, TJM wrote:

> Nope, no -637s that I can see. Where exactly should I be looking?
> We're using 8.7.3.7 on NetWare 6.5 SP5.

If you run the iMonitor obit report, that should give you a good overview.
Just change the default time to 0 days rather than 7 days.

-637s, if they're going to occur, occur when you're doing
partition/replica operations. This looks like an orphaned obit; you
could possibly -xk3 the server and then force the backlinker, but I'd
want a second opinion on that before suggesting it as a course of
action. It seems that the servers reporting the obit think there's a
backlink where there isn't one, and that's why the obit is stuck.

Given that it's a couple of weeks old, the backlinker on its own hasn't
corrected the issue. But with eDir 8.7.3.7, backlinks are used only when
a used_by can't be used, so even then it should have minimal impact on
the environment.

Edward van der Maas

Aug 15, 2006, 4:52:23 PM

TJM wrote:

> The server listed above (apollo) holds the master replica and shows 0
> obits. All other obits are processing normally and there no problems
> with time or replica synchronization. The TID's I've found such as
> 10062149 all reference inhibit move obits so I wasn't sure if they
> apply. What needs to be done to clean this up?

As your Master replica doesn't have any obits, just blow away the RW
replicas and put them back. This will fix it.

--
Cheers,
Edward

Richard Beels [SysOp]

Aug 16, 2006, 1:28:17 AM

<warning>
non-chat message... :-)
</warning>

> But with eDir 8.7.3.7, backlinks are used only when
> a used_by can't be used, so even then it should have minimal impact on the
> environment.

Huh?

--
Cheers!
Richard Beels
~ Network Consultant
~ Sysop, Novell Support Connection
~ MCNE, CNE*, CNA*, CNS*, N*LS

TJM

Aug 16, 2006, 9:08:01 AM

Ahhh, that makes sense. I'll schedule some after hours maintenance time
and do exactly that.

Thanks!

TJM

Aug 16, 2006, 9:07:06 AM

Thanks!

Edward van der Maas

Aug 16, 2006, 5:11:31 PM

TJM wrote:

> Ahhh, that makes sense. I'll schedule some after hours maintenance
> time and do exactly that.

Let us know how you get on.

> Thanks!

You're welcome

--
Cheers,
Edward

Nathan Chattaway

Aug 17, 2006, 8:34:03 PM

We have a similar problem. A server went belly-up at the same time as an
admin was moving some user objects out of the OU that the server is in;
that server held a RW replica of a partition beginning at that OU.

We've pulled the dead server out of the tree, and the replica ring for that
partition is now reporting healthy, but we've got a bunch of backlink
obituary objects for the server, licenses, NSS volume objects, etc.
And the user objects that were being moved at the time of the server
failure are now Move Inhibit obituaries. We've followed the TID for using
iMonitor to run the obituary report and release the move entry, but still
they just sit there. We can't delete other RW replicas of the partition
where the obits reside; when we try, we get -637 errors.

Is it safe to use the advanced option (deleteobj) in the iMonitor report to
remove the offending move inhibit and backlink obits?

Or should we create a new temp OU and move the remaining user objects over
to it, delete the OU that contains the obits and then wait for the obits to
clear out, then rename the temp OU back to the original name and rebuild the
failed server into this container?

Nate

Jim Henderson

Aug 18, 2006, 1:04:33 AM

On Wed, 16 Aug 2006 05:28:17 +0000, Richard Beels [SysOp] wrote:

>
> <warning>
> non-chat message... :-)
> </warning>
>
>> But with eDir 8.7.3.7, backlinks are used only when a used_by can't be
>> used, so even then it should have minimal impact on the environment.
>
> Huh?

Starting with 8.7.1 (I believe it was), processing for obits changed to
use DRLs rather than backlinks (as the primary way of handling the obits).
This resolved probably 95% of the stuck obit problems that we were seeing
because this essentially is a means of handing off the obit processing to
the master replica of a ring where a reference exists.

Think in terms of group membership/member associations between users and
groups; if the user exists in partition A, and the group exists in
partition B, then all the servers in the ring for Partition B potentially
hold an external reference for the user in Partition A (the only time it
wouldn't is if both partitions are held on the same server). So if the master of
Partition B is told that the extref needs to be deleted, it can tell all
the other servers in the replica ring that the obit is being processed
instead of the server where the operation initiated having to contact all
servers in the replica ring.

"Used By" is a partition reference (i.e., "this object is used by objects in
this other partition") rather than a server reference (i.e., a backlink
value). This is much more efficient for obit processing because you won't
get hung up on one server losing communications.

There are still cases where backlink obituaries have to be referenced:
for example, NSSv1 servers having filesystem rights (which were based on
EID at that point rather than GUID) or maintaining a user object's extref
for the connection table (i.e., so MONITOR can show the user object). In
those types of cases, there isn't a "pair" partition that "Used By" can be
populated with, so we have to fall back on using the old backlink process
(no alternative).

DRLs were implemented in 8.6.x as I recall, but they didn't become the
primary means of processing obits until 8.7.1 (or .3, memory's getting a
little fuzzy on when that change took place).
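To make the Used By versus backlink point concrete, here is a toy Python model. It is a deliberate simplification with hypothetical names (Partition, ExtRef, contacts_backlink, contacts_used_by, can_finish), not eDirectory's actual obit-processing code: it only counts which servers the originating server must reach itself under each scheme, and whether a single unreachable server stalls it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy model only: these names are hypothetical and illustrate the backlink
# vs. "Used By" hand-off described above. They are not eDirectory APIs, and
# the real obit state machine is not modelled.


@dataclass(frozen=True)
class Partition:
    name: str
    master: str                # server holding the master replica
    replicas: tuple[str, ...]  # every server in the replica ring


@dataclass(frozen=True)
class ExtRef:
    server: str           # server holding an external reference
    partition: Partition  # partition ring that server belongs to


def contacts_backlink(refs: list[ExtRef]) -> set[str]:
    """Backlink scheme: one backlink per server, each contacted directly."""
    return {ref.server for ref in refs}


def contacts_used_by(refs: list[ExtRef]) -> set[str]:
    """Used By scheme: one entry per partition; hand off to that ring's master."""
    return {ref.partition.master for ref in refs}


def can_finish(contacts: set[str], unreachable: set[str]) -> bool:
    """The originating server stalls if any server it must contact is down."""
    return contacts.isdisjoint(unreachable)


if __name__ == "__main__":
    # Partition B: three servers, each holding an extref for a user in Partition A.
    b = Partition("B", master="SRV1", replicas=("SRV1", "SRV2", "SRV3"))
    refs = [ExtRef(server, b) for server in b.replicas]
    down = {"SRV3"}
    print(sorted(contacts_backlink(refs)), can_finish(contacts_backlink(refs), down))
    print(sorted(contacts_used_by(refs)), can_finish(contacts_used_by(refs), down))
```

With SRV3 unreachable, the backlink scheme cannot finish because it must contact all three servers, while the Used By hand-off only needs SRV1. In the real directory the master's ring still has to converge afterwards, which this sketch ignores.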

Jim Henderson

Aug 18, 2006, 12:57:42 AM

On Fri, 18 Aug 2006 00:34:03 +0000, Nathan Chattaway wrote:

> We've pulled the dead server out of the tree, and the replica ring for
> that partition is now reporting healthy, but we've got a bunch of backlink
> obituary objects for the server, licenses, nss volume objects etc. And the
> user objects that were being moved at time of server failure are now move
> inhibit obituaries. We've followed the TID for using iMonitor to run the
> obituary report and release move entry, but still they just sit there. We
> can't delete other RW replicas of the partition where the obits reside,
> when we try we get -637 errors.

That's definitely an inhibit move obit that's causing the problem - the
backlink obits being stuck is probably what's holding them up.

> Is it safe to use the advanced option (deleteobj) in the iMonitor report
> to remove the offending move inhibit and backlink obits?

I don't know that I'd use deleteobj to try to clean this up; you may
want to look at the server recovery article I wrote on Cool Solutions to
see if the process there will help you (do a search by author for the
article; I don't have the link handy at the moment).

> Or should we create a new temp OU and move the remaining user objects over
> to it, delete the OU that contains the obits and then wait for the obits
> to clear out, then rename the temp OU back to the original name and
> rebuild the failed server into this container?

You won't be able to delete the container until the obit has processed.

Nathan Chattaway

Aug 18, 2006, 4:13:19 AM

Thanks Jim,

I've had a look at the CoolSolutions article you recommended:

http://www.novell.com/coolsolutions/feature/15748.html

We have cleared the server from the replica rings (it had RW of two
partitions on it) and all rings are healthy.

It's just those backlink and delete inhibit obituaries. Can you think of
anything else to help us clear these out?

Nate

TJM

Aug 18, 2006, 8:27:06 AM

Thanks Jim. We've got one Partition with 3 replicas, so it's not that
complicated. The stuck obit is from a group that got renamed. I'm not
going to sweat it for now and will follow Edward's suggestion when I get
some downtime.

Much appreciated.

Travis

Jim Henderson

Aug 20, 2006, 2:34:44 AM

On Fri, 18 Aug 2006 08:13:19 +0000, Nathan Chattaway wrote:

> It's just those backlink and delete inhibit obituaries. Can you think of
> anything else to help us clear these out?

"Delete Inhibit" obits? The only ones that should be any sort of concern
are "Move Inhibit" obits - anything else shouldn't prevent you from
completing the recovery.

Jim Henderson

Aug 20, 2006, 2:35:03 AM

On Fri, 18 Aug 2006 12:27:06 +0000, TJM wrote:

> Thanks Jim. We've got one Partition with 3 replicas, so it's not that
> complicated. The stuck obit is from a group that got renamed. I'm not
> going to sweat it for now and will follow Edward's suggestion when I get
> some downtime.

Sounds good, glad to help out. :-)

Nathan Chattaway

Aug 20, 2006, 7:02:15 PM

Sorry, I meant Move Inhibit obits, as per my original post; that's what
we've got, all right.

Nate

Jim Henderson

Aug 20, 2006, 9:03:07 PM

On Sun, 20 Aug 2006 23:02:15 +0000, Nathan Chattaway wrote:

> Sorry, Move Inhibit obits, as per my original post, they're what we've got
> alright.

No problems, just wanted to make sure something hadn't changed. :-)

Do you have a corresponding MOVED obit for the stuck inhibit move?

Nathan Chattaway

Aug 20, 2006, 10:24:51 PM

No, there are no corresponding MOVED obits at all. After establishing this,
I clicked "Release Move Entry" for both "Inhibit Move" obits, and these
were reported as successful. That was last Thursday evening, and here on
Monday lunchtime they're still sitting there. They appear in ConsoleOne as
unknown objects in the container they were moved to. There is also a third
obit, marked "Dead(primary), Inhibit Move", which is also for a user object
that someone tried to move while the server was off the air. This third
user object fails on the "Release Move Entry" command.

The affected users have had their user objects recreated from scratch in
the original (source) container that someone attempted to move them from,
so that they can log in and get on with their work. This may not have been
the best thing to do, but it was done by the admin guy who caused the
problem by trying to move these people while the replica ring was broken.

Nate

Jim Henderson

Aug 21, 2006, 2:11:42 AM

On Mon, 21 Aug 2006 02:24:51 +0000, Nathan Chattaway wrote:

> No, there are no corresponding MOVED obits at all. After establishing
> this, I then proceeded to Click on the "Release Move Entry" for both
> "Inhibit Move" obits,

At this point, I'd probably advise that you get someone to dial in and
take a look, then - that option to release the move entries *should* have
resolved the issue here.

That it hasn't means something weird is going on.

Nathan Chattaway

Aug 21, 2006, 4:14:59 AM

On your advice, I've logged a support request with Novell. I'll let you
know how things proceed.

Thanks for all assistance thus far.

Nate

Jim Henderson

Aug 21, 2006, 12:10:22 PM

On Mon, 21 Aug 2006 08:14:59 +0000, Nathan Chattaway wrote:

> On your advice, I've logged a support request with Novell. I'll let you
> know how things proceed.

Sounds good.

nchat...@bh.nospam.com.nospam.au

Aug 23, 2006, 2:46:09 AM

Jim,

We found an OU object under the container that the crashed server was in,
called "Extend", which wasn't visible from ConsoleOne or NWAdmin, but
could be seen in iManager when walking the tree. We deleted this object,
and suddenly the move inhibit user obits cleared!

However, we now have a pile of dead backlink obits, some flagged 0004,
which are just sitting there. We've followed the TID for removing stuck
obits from any version of eDir, but nothing is working. It looks like the
Extend OU is an obit on many of our servers, even though they have never
contained a replica of the partition that this OU was in.

Any further suggestions? Novell are taking their sweet time in getting back
to us on the incident we opened Monday.

Nate

Jim Henderson

Aug 23, 2006, 12:25:37 PM

On Wed, 23 Aug 2006 06:46:09 +0000, nchattaway wrote:

> Jim,
>
> We found an OU object under the container that the crashed server was in,
> called "Extend", which wasn't visible from ConsoleOne or NWAdmin, but
> could be seen in iManager when walking the tree. We deleted this object,
> and suddenly the move inhibit user obits cleared!
>
> However, we now have a pile of dead backlink obits, some flagged 0004,
> which are just sitting there. We've followed the TID for removing stuck
> obits from any version of eDir, but nothing is working. It looks like the
> Extend OU is an obit on many of our servers, even though they have never
> contained a replica of the partition that this OU was in.

Stuck dead obits shouldn't cause you any issues - the inhibit move being
gone should be sufficient to get things rolling.

Craig Johnson

Aug 24, 2006, 12:30:40 AM

Here's my two cents on clearing stuck obits:

1. Go from the least risky to the most risky methods.
2. Start by moving the master around the replica ring and use
set dstrace=*j
3. If that doesn't clear all the obits, next run DSREPAIR -OT (then
unattended repair) to retimestamp the obits. Follow that with a
set dstrace=*j
4. So far you have done nothing risky. Now we start playing a bit
more. If the obits are for backlinks, try a DSREPAIR -XK3 (delete
backlinks) followed by set dstrace=*b (rebuild backlinks). Perhaps
that gets rid of more of them.
5. If you end up with a replica that has no stuck obits, consider making
it the master, and then sending all objects out from that replica (this
deletes and rebuilds the replicas on the other servers).
6. If you are still stuck after that, I'd need to get more specific
about what I see wrong there before deciding what to do next.
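The ordering above is essentially a small runbook. As a sketch only, it could be encoded in Python so the steps are always tried least risky first and step 4 is skipped when the stuck obits aren't backlinks. The names Step, ESCALATION and plan are hypothetical, the numeric risk labels are an assumption based on the wording of the steps, and the command strings are copied as written above; nothing here executes them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical encoding of the escalation order in the post. Command strings
# are copied from the post as written; the numeric risk labels are our own
# reading of "least risky to most risky". Nothing here runs any command.


@dataclass(frozen=True)
class Step:
    risk: int                    # 0 = "nothing risky" per the post
    action: str
    commands: tuple[str, ...]    # console commands named in the post, if any
    backlink_only: bool = False  # step 4 applies only to backlink obits


ESCALATION: tuple[Step, ...] = (
    Step(0, "Move the master around the replica ring", ("set dstrace=*j",)),
    Step(0, "Retimestamp obits (then unattended repair)",
         ("DSREPAIR -OT", "set dstrace=*j")),
    Step(1, "Delete and rebuild backlinks",
         ("DSREPAIR -XK3", "set dstrace=*b"), backlink_only=True),
    Step(2, "Make an obit-free replica the master and send all objects", ()),
)


def plan(backlink_obits: bool) -> list[Step]:
    """Applicable steps, least risky first (sorted() keeps ties in order)."""
    steps = [s for s in ESCALATION if backlink_obits or not s.backlink_only]
    return sorted(steps, key=lambda s: s.risk)


if __name__ == "__main__":
    for step in plan(backlink_obits=True):
        print(step.risk, step.action, "|", "; ".join(step.commands))
```

Step 6 is deliberately left out, since at that point the next move depends on what the specific obits look like.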

Craig Johnson
Novell Support Connection SysOp

Nathan Chattaway

Aug 27, 2006, 11:37:02 PM

Guys,

Thanks for all the assistance. We've now managed to clear up all the stuck
backlink obits!

Kind Regards,

Nathan

Craig Johnson

Aug 31, 2006, 1:59:38 AM

In article <yxtIg.1369$PP....@prv-forum2.provo.novell.com>, Nathan Chattaway
wrote:

> Thanks for all the assistance. We've now managed to clear up all the stuck
> backlink obits!
>

Glad to help!