Unfencing problem

sekbla...@googlemail.com

Aug 29, 2014, 12:13:11 PM
to esos-...@googlegroups.com
Hi again,

I hope someone can help me understand what's going on.
My goal is to build a secure HA iSCSI cluster, and as I am new to this area, I did some testing.

So far I have built a Master/Slave system with LVM on top of DRBD, and configured Pacemaker to switch the resources.
It works the first time, but not the second, because the resource gets fenced (OK) but the unfencing does not seem to work.
I have the following handlers defined in /etc/drbd.d/global_common.conf:

fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";


As far as I understand the principle, crm-fence-peer.sh is the script that fences the other peer, and as soon as the second node is ready again, crm-unfence-peer.sh is called.
And guess what: fencing works! At least I see this in the CIB:

location drbd-fence-by-handler-r0-ms_drbd ms_drbd \
rule $id="drbd-fence-by-handler-r0-rule-ms_drbd" $role="Master" -inf: #uname ne node1


OK, so far so good. The problem is that the constraint never gets removed when the other node returns, although node 2 shows the following interesting lines in /var/log/messages:

Aug 29 15:56:32 node2 kernel: [10505.350562] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Aug 29 15:56:32 node2 kernel: [10505.350567] block drbd0: updated UUIDs 1D4783F2C11DE4D6:0000000000000000:E772F416B3528F88:E771F416B3528F89
Aug 29 15:56:32 node2 kernel: [10505.350571] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Aug 29 15:56:32 node2 kernel: [10505.373051] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Aug 29 15:56:32 node2 kernel: [10505.388159] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

I guess this command should in fact tell pacemaker to unfence the resource, but somehow this did not work.
I checked "drbdadm dump all" and verified that the unfence entry is there.
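The handler check described above can be scripted. This is only a sketch: has_handlers is a made-up helper name, and the sample text stands in for real "drbdadm dump all" output, whose exact layout may differ.

```shell
# has_handlers: hypothetical helper. Reads a DRBD config dump on stdin and
# succeeds only if both a fence-peer and an after-resync-target handler
# line are present. The patterns are anchored to the start of the line so
# that the script path "crm-unfence-peer.sh" does not count as a match.
has_handlers() {
    cfg=$(cat)
    printf '%s\n' "$cfg" | grep -q '^[[:space:]]*fence-peer[[:space:]]' &&
    printf '%s\n' "$cfg" | grep -q '^[[:space:]]*after-resync-target[[:space:]]'
}

# Sample input written by hand (an assumption, not captured from a node).
sample='handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}'

if printf '%s\n' "$sample" | has_handlers; then
    result="yes"
else
    result="no"
fi
echo "both handlers configured: $result"
```

On a cluster node the same helper would be fed from the real command: drbdadm dump all | has_handlers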

Running ESOS r679 with DRBD userland 8.4.4

Any hints for me?

-Patrick


Marc Smith

Aug 29, 2014, 2:49:11 PM
to esos-...@googlegroups.com
Yeah, Pacemaker is an interesting piece of software... I'm 99% sure that all of the "strangeness" I experience with it (when using it on machines myself) is due to user error (me) / configuration / setup issues. I've noticed the same behavior you describe, and I had always assumed that it was never supposed to un-fence itself. I have always just done a 'crm configure edit' and deleted the constraint rule (the fence) to un-fence the node. I'm not sure this is necessarily an ESOS-specific problem -- it's likely a configuration/setup issue with the DRBD RA and Pacemaker. Not to push it somewhere else, but honestly the Pacemaker/DRBD forums are probably a better place to find the answer to this question.
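
The manual un-fence can also be done without opening the editor, by deleting the constraint by its id. A minimal sketch, assuming the crm shell is installed and that the constraint id is the one shown earlier in this thread; unfence_cmd is a hypothetical helper that only prints the command so it can be reviewed before it is run by hand.

```shell
# unfence_cmd: hypothetical helper. Prints (does not execute) the crm shell
# command that deletes a location constraint by its id.
unfence_cmd() {
    printf 'crm configure delete %s\n' "$1"
}

# The id below is taken from the CIB excerpt earlier in the thread.
cmd=$(unfence_cmd "drbd-fence-by-handler-r0-ms_drbd")
echo "$cmd"
```

Deleting the constraint is only sensible once the peer's disk is UpToDate again, since the constraint is what keeps an outdated node from being promoted.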

I'd be curious to know the resolution (if any).


--Marc


sekbla...@googlemail.com

Sep 1, 2014, 11:16:45 AM
to esos-...@googlegroups.com
OK, here I come with some very interesting details:

After a lot of debugging, I found out that the script is in fact called correctly, but at one point in the script something does not work.
Near the beginning of /usr/lib/drbd/crm-fence-peer.sh, you see the following lines:
 
sed_rsc_location_suitable_for_string_compare()
{
        # expected input: exactly one tag per line: "^[[:space:]]*<.*/?>$"
        sed -ne '
        # within the rsc_location constraint with that id,
        /<rsc_location .*\bid="'"$1"'"/, /<\/rsc_location>/ {
                /<\/rsc_location>/q # done, if closing tag is found
                s/^[[:space:]]*//   # trim spaces
                s/ *\bid="[^"]*"//  # remove id tag
                # print each attribute on its own line, by
                : attr
                h # remember the current (tail of the) line
                # remove all but the first attribute, and print,
                s/^\([^[:space:]]*[[:space:]][^= ]*="[^"]*"\).*$/\1/p
                g # then restore the remembered line,
                # and remove the first attribute.
                s/^\([^[:space:]]*\)[[:space:]][^= ]*="[^"]*"\(.*\)$/\1\2/
                # then repeat, until no more attributes are left
                t attr
        }' | sort
}
What the script does is the following:
1) It stores the output of cibadmin -Ql in the variable cib_xml.
2) It pipes that output through the function above, called like this:
have_constraint=$(set +x; echo "$cib_xml" |
                 sed_rsc_location_suitable_for_string_compare "$id_prefix-$master_id")
The result in have_constraint should be a list of attributes, filtered and formatted by sed.
The problem is: even though the string in "$id_prefix-$master_id" is correct, the result is always empty.

So I have one little question: does the BusyBox sed used in ESOS behave differently from the sed in other distributions?
Maybe differently enough to break the functionality here?
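
One way to check this outside of Pacemaker is to run the same sed program against a small, fixed input on both the ESOS host and a regular distribution, and compare the results. The sketch below copies the sed commands from the function quoted above (comments stripped). The XML is a hand-written approximation of the fencing constraint, not real cibadmin output, so the attribute names and layout in it are assumptions.

```shell
# Same sed program as in crm-fence-peer.sh (as quoted in this thread),
# with the inline comments removed.
sed_rsc_location_suitable_for_string_compare()
{
    sed -ne '
    /<rsc_location .*\bid="'"$1"'"/, /<\/rsc_location>/ {
        /<\/rsc_location>/q
        s/^[[:space:]]*//
        s/ *\bid="[^"]*"//
        : attr
        h
        s/^\([^[:space:]]*[[:space:]][^= ]*="[^"]*"\).*$/\1/p
        g
        s/^\([^[:space:]]*\)[[:space:]][^= ]*="[^"]*"\(.*\)$/\1\2/
        t attr
    }' | sort
}

# Hand-written stand-in for the relevant part of "cibadmin -Ql" output
# (assumption: one tag per line, as the function expects).
cib_xml='<rsc_location rsc="ms_drbd" id="drbd-fence-by-handler-r0-ms_drbd">
  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-r0-rule-ms_drbd">
    <expression attribute="#uname" operation="ne" value="node1" id="drbd-fence-by-handler-r0-expr-ms_drbd"/>
  </rule>
</rsc_location>'

have_constraint=$(echo "$cib_xml" |
    sed_rsc_location_suitable_for_string_compare "drbd-fence-by-handler-r0-ms_drbd")

if [ -n "$have_constraint" ]; then
    echo "sed produced output:"
    echo "$have_constraint"
else
    echo "sed produced NO output"
fi
```

With GNU sed the function should print one line per attribute of the matched constraint. If the same script prints nothing on the ESOS host, that would point at the BusyBox sed rather than at Pacemaker or the DRBD configuration.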

Marc Smith

Sep 1, 2014, 12:34:15 PM
to esos-...@googlegroups.com
Yes, the BusyBox version of sed is used in ESOS, and yes, a lot of the time the built-in BB tools behave differently than the originals. A number of BusyBox built-in tools have already been disabled and replaced with the "real" utilities in ESOS.

A good test would be to copy a sed binary from a local Linux machine (e.g., Fedora, Ubuntu, etc.) up to an ESOS host and see how the script behaves. (Replace the symlink /bin/sed with a real binary.) Use 'ldd' to check that the binary you test with does not need any extra libraries (e.g., SELinux).
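
A small sketch of those checks: it shows what the sed command resolves to and lists the shared libraries the binary needs. It assumes readlink supports -f, and the variable names are only illustrative.

```shell
# Find out what "sed" really is. On a BusyBox-based system /bin/sed is
# typically a symlink to the busybox binary.
sed_path=$(command -v sed)
sed_real=$(readlink -f "$sed_path")

case "$sed_real" in
    *busybox*) sed_kind="busybox" ;;
    *)         sed_kind="standalone" ;;
esac
echo "sed is $sed_path -> $sed_real ($sed_kind)"

# List the shared libraries the binary depends on. Every library shown
# here must also exist on the target host for a copied binary to run.
ldd "$sed_real" || echo "ldd failed (static binary, or ldd not available)"
```

Running the ldd step on the donor machine, against the sed binary to be copied, shows up front which libraries would have to come along.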

Let me know if you need a hand with this. If you confirm it works fine with a real sed, I'll replace the BB version in ESOS.


--Marc




sekbla...@googlemail.com

Sep 2, 2014, 8:02:10 AM
to esos-...@googlegroups.com
I tried copying the sed binary from Ubuntu 14.04 Desktop edition, but ldd shows a lot of dependencies I was unable to meet. Could you give me access to a version that's easier to migrate?

-Patrick