Locally Mounted Shared Storage Cluster Not Failing Over


John Duncan Wannamaker

Aug 30, 2014, 9:04:45 AM8/30/14
to quadst...@googlegroups.com
Hey guys,

I have a question about the best way to set up an HA shared storage cluster.  The problem I'm having is that the secondary client (set as Type=Master) will not take over.  Whenever the first node fails, the second node stays in standby for about 30 seconds until the OCFS2 filesystem gives up waiting to sync, and then both nodes kernel panic.  Does failover work for locally mounted vdisks, or only for iSCSI-based initiators?

Hardware Configuration:

- Old Dell/EMC Clariion AX4-5 Fiber Channel SAN
- Two Dell PowerEdge 1950 servers, 4xCPU and 16GB of RAM
- Each server has two PCI-E cards, a QLogic FC HBA connected to a Cisco MDS switch and an Emulex 10GB CNA card for the storage traffic
- LUNs configured as two 7.5TB RAID5 volumes presented to both servers, managed by multipathd as active/standby
- Both 7.5TB LUNs added to the default storage pool in QUADStor as /dev/mapper/mpathc and /dev/mapper/mpathd

The two servers have the following IP configuration:

fileserver1
-eth0: (emulex 10GBit port 1/2) - 10.80.0.4
-eth2: (internal broadcom 1GBit port 1/2) - 10.10.10.1  (crossover cable for QUADStor/CTDB metadata)
-eth3: (internal broadcom 1GBit port 2/2) - 192.168.100.11 (crossover cable for IPMI fencing)
-iDRAC6: 192.168.100.1 (plugged directly into fileserver2/eth3)

fileserver2
-eth0: (emulex 10GBit port 1/2) - 10.80.0.5
-eth2: (internal broadcom 1GBit port 1/2) - 10.10.10.2 (crossover cable for QUADStor/CTDB metadata)
-eth3: (internal broadcom 1GBit port 2/2) - 192.168.100.12 (crossover cable for IPMI fencing)
-iDRAC6: 192.168.100.2 (plugged directly into fileserver1/eth3)


QUADStor Configuration:

fileserver1: 

/quadstor/etc/ndcontroller.conf
Controller=10.80.0.4
Node=10.80.0.5
HABind=10.10.10.1
HAPeer=10.10.10.2
Fence=/usr/sbin/fence_idrac -a 192.168.100.2 -l root -p vsphereroot -o reboot

/quadstor/bin/ndconfig output (fileserver1)
Node Type: Controller
Controller: 10.80.0.4
HA Peer: 10.10.10.2
HA Bind: 10.10.10.1
Node Status: Controller Inited
Node Role: Master
Sync Status: Sync Done
Nodes: 10.80.0.5


fileserver2:

 /quadstor/etc/ndclient.conf
Controller=10.80.0.4
Node=10.80.0.5
Type=Master
HABind=10.10.10.2
HAPeer=10.10.10.1
Fence=/usr/sbin/fence_idrac -a 192.168.100.1 -l root -p vsphereroot -o reboot

/quadstor/bin/ndconfig output (fileserver2)
Node Type: Client
Controller: 10.80.0.4
Node: 10.80.0.5
HA Peer: 10.10.10.1
HA Bind: 10.10.10.2
Node Status: Master Inited
Node Role: Standby
Sync Status: Sync Done


VDISK Configuration:

[root@fileserver1 ~]# /quadstor/bin/vdconfig -l
Name          Pool    Serial Number                    Size(GB) LUN   Status
infra-backups Default 6e340bdadcca0d25d25e0e01ed15d0af 10000    2     D C E
cln0          Default 6e38124216698641e6271b4942579d8e 5000     1     D C
cln1          Default 6edc37e2b5294e1e0092067f5a418e74 5000     3     D C
cln2          Default 6e31c9affed295dbfe8c522394571c46 5000     4     D C
shared        Default 6eb1656d67181df7f8def6627d206962 10000    5     D C

Each VDISK is mounted locally on both hosts and formatted with the OCFS2 cluster filesystem.  The system fails to mount these VDISKs at boot time even though I have specified them in /etc/fstab and /quadstor/etc/fstab.custom as suggested.  However, once the QUADStor service has finished loading the initial ddtables (380 seconds or so), I am able to mount the disks on both nodes.
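As a workaround for the boot-time mount failures, I've been thinking of something like the sketch below: poll for the vdisk device node before mounting, since the devices only appear after the quadstor service finishes loading its ddtables.  (This is my own hypothetical helper, not anything shipped with QUADStor; the device path and timeout are just examples.)

```shell
#!/bin/sh
# wait_for_device DEV TIMEOUT: poll until the device node DEV exists,
# giving up after TIMEOUT seconds.  Returns 0 if the node appeared.
wait_for_device() {
    dev=$1
    timeout=$2
    waited=0
    while [ ! -e "$dev" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Example use, e.g. from rc.local after the quadstor service starts:
#   wait_for_device /dev/sdj 600 && mount /mnt/shared
```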

fileserver1 vdisk mounts:
/dev/sdj on /mnt/shared type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdk on /mnt/cln0 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdl on /mnt/cln1 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdm on /mnt/cln2 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)

fileserver2 vdisk mounts:
/dev/sdm on /mnt/shared type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdj on /mnt/cln0 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdk on /mnt/cln1 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)
/dev/sdl on /mnt/cln2 type ocfs2 (rw,_netdev,noatime,data=writeback,commit=30,heartbeat=local)

/etc/ocfs2/cluster.conf (both nodes)
node:
        ip_port = 7777
        ip_address = 10.10.10.1
        number = 0
        name = fileserver1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.10.10.2
        number = 1
        name = fileserver2
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

There are also configurations for Samba/Winbind and CTDB that I have not included here initially but I do not think are related to the issue.

Thanks guys, you rock.

Duncan

QUADStor Support

Aug 30, 2014, 10:49:28 AM8/30/14
to quadstor-virt
Have the fence commands been tested?

On fileserver1 the following should reset fileserver2
/usr/sbin/fence_idrac -a 192.168.100.2 -l root -p vsphereroot -o reboot

And on fileserver2 the following should reset fileserver1
/usr/sbin/fence_idrac -a 192.168.100.1 -l root -p vsphereroot -o reboot
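A less disruptive first check is to query power status over IPMI rather than rebooting the peer (a sketch; this assumes the fence_idrac agent in your fence-agents build supports the "-o status" action):

```shell
#!/bin/sh
# fence_ok FENCE_CMD...: run a fence agent status query and succeed
# only if the agent reports the peer's power state on a "Status:" line.
fence_ok() {
    "$@" 2>/dev/null | grep -q "^Status:"
}

# On fileserver1:
#   fence_ok /usr/sbin/fence_idrac -a 192.168.100.2 -l root -p vsphereroot -o status
# On fileserver2:
#   fence_ok /usr/sbin/fence_idrac -a 192.168.100.1 -l root -p vsphereroot -o status
```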

If they are working please send us the syslog from both machines to
sup...@quadstor.com



--
Storage Virtualization Features
http://www.quadstor.com/storage-virtualization.html
Offloaded Data Transfers (ODX) Introduction
http://www.quadstor.com/offloaded-data-transfers-odx.html
vStorage APIs for Array Integration (VAAI)
http://www.quadstor.com/vstorage-apis-for-array-integration-vaai.html
Documentation http://www.quadstor.com/storage-virtualization-documentation.html

John Duncan Wannamaker

Aug 30, 2014, 4:00:30 PM8/30/14
to quadst...@googlegroups.com
Hello and thanks for the response.  I have tested the fence commands on both nodes, the output looks like this:

[root@fileserver1 ~]# /usr/sbin/fence_idrac -a 192.168.100.2 -l root -p vsphereroot -o reboot
Rebooting machine @ IPMI:192.168.100.2...Done
[root@fileserver1 ~]#

I have verified that this command issues a hardware reset on the physical hosts.  And when I run this reset on fileserver1 the locally mounted quadstor vdisks on fileserver2 time out.

Here is the output from /var/log/messages on fileserver2 right after fileserver1 is fenced:  (hardware reset switch)
Aug 30 15:15:30 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 2956155 msg timestamp 2932155 cmd 10 msg_id 7521 xchg id 0 timo 24000
Aug 30 15:15:30 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 2956160 msg timestamp 2932160 cmd 10 msg_id 7522 xchg id 0 timo 24000
Aug 30 15:15:30 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 2956164 msg timestamp 2932164 cmd 10 msg_id 7523 xchg id 0 timo 24000
Aug 30 15:15:30 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 2956168 msg timestamp 2932168 cmd 10 msg_id 7524 xchg id 0 timo 24000
Aug 30 15:15:35 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 has been idle for 30.60 secs, shutting it down.
Aug 30 15:15:35 fileserver2 kernel: o2net: No longer connected to node fileserver1 (num 0) at 10.10.10.1:7777
Aug 30 15:15:35 fileserver2 kernel: (kworker/u:3,243,0):dlm_do_master_request:1332 ERROR: link to 0 went down!
Aug 30 15:15:35 fileserver2 kernel: (kworker/u:3,243,0):dlm_get_lock_resource:917 ERROR: status = -112
Aug 30 15:15:36 fileserver2 kernel: bnx2 0000:03:00.0: eth2: NIC Copper Link is Down
Aug 30 15:15:38 fileserver2 kernel: bnx2 0000:03:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Aug 30 15:15:40 fileserver2 smbd[10901]: [2014/08/30 15:15:40.797359,  0] lib/ctdbd_conn.c:624(ctdb_handle_message)
Aug 30 15:15:40 fileserver2 smbd[10901]:   Got cluster reconfigure message
Aug 30 15:15:41 fileserver2 ntpd[4292]: Listen normally on 13 eth0 10.80.0.7 UDP 123
Aug 30 15:15:41 fileserver2 ntpd[4292]: peers refreshed
Aug 30 15:15:45 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:15:48 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:15:51 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:15:54 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:15:57 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:00 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:03 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:05 fileserver2 kernel: o2net: No connection established with node 0 after 30.0 seconds, giving up.
Aug 30 15:16:06 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:10 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:13 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:16 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:19 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:22 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:25 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:28 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:31 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:34 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:35 fileserver2 kernel: o2net: No connection established with node 0 after 30.0 seconds, giving up.
Aug 30 15:16:37 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:41 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:44 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:47 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:50 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:53 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:54 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 3040155 msg timestamp 3016155 cmd 10 msg_id 7525 xchg id 0 timo 24000
Aug 30 15:16:54 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 3040160 msg timestamp 3016160 cmd 10 msg_id 7526 xchg id 0 timo 24000
Aug 30 15:16:54 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 3040164 msg timestamp 3016164 cmd 10 msg_id 7527 xchg id 0 timo 24000
Aug 30 15:16:54 fileserver2 kernel: WARN: node_msg_wait:48 msg timedout ticks 3040168 msg timestamp 3016168 cmd 10 msg_id 7528 xchg id 0 timo 24000
Aug 30 15:16:56 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
Aug 30 15:16:59 fileserver2 kernel: o2net: Connection to node fileserver1 (num 0) at 10.10.10.1:7777 shutdown, state 7
# machine reboots here, boot-up logs start 6 minutes later

The secondary node status shows Sync Error within 3 seconds:

[root@fileserver2 ~]# /quadstor/bin/ndconfig

         
Node Type: Client
       
Controller: 10.80.0.4
             
Node: 10.80.0.5
           HA
Peer: 10.10.10.1
           HA
Bind: 10.10.10.2
       
Node Status: Master Inited
         
Node Role: Standby

       
Sync Status: Sync Error

If fileserver2 has any active vdisk IO, it times out after 2 minutes and the system kernel panics and reboots.  If there are no open files, then once the quadstor service has finished starting back up on the master, the Sync Status on the client node (fileserver2) continues to say Sync Error until I stop and restart the quadstor service on fileserver2.  Then the status goes back to "Sync Done" within about 10 seconds.

By the way, if the controller node is down, the /quadstor/bin/ndconfig command returns nothing.

I've also noticed that quadstor on the controller loads ~ 11 out of 16GB of hash tables into memory, where this does not seem to happen at all on the client/master node.  The client node only has about 2 out of 16GB utilized when running.

QUADStor Support

Aug 31, 2014, 2:42:20 AM8/31/14
to quadstor-virt
We would need the full syslogs. You could also run diagnostics and
send them to us at sup...@quadstor.com

To get the diagnostics on the controller (HTML UI -> System -> Run
Diagnostics -> Submit)

On the client node you could run
/quadstor/bin/rundiag

On both systems the quadstor service needs to be running for
generating the diagnostics. More comments below.

On Sun, Aug 31, 2014 at 1:30 AM, John Duncan Wannamaker
<jdwa...@gmail.com> wrote:
> Hello and thanks for the response. I have tested the fence commands on both
> nodes, the output looks like this:
>
> [root@fileserver1 ~]# /usr/sbin/fence_idrac -a 192.168.100.2 -l root -p
> vsphereroot -o reboot
> Rebooting machine @ IPMI:192.168.100.2...Done
> [root@fileserver1 ~]#
>
> I have verified that this command issues a hardware reset on the physical
> hosts. And when I run this reset on fileserver1 the locally mounted
> quadstor vdisks on fileserver2 time out.
<snip>
> # machine reboots here, boot-up logs start 6 minutes later
>
> The secondary node status shows Sync Error within 3 seconds:
>
At this point the client node should have taken over management of the
cluster. It seems that the takeover is failing.


> [root@fileserver2 ~]# /quadstor/bin/ndconfig
> Node Type: Client
> Controller: 10.80.0.4
> Node: 10.80.0.5
> HA Peer: 10.10.10.1
> HA Bind: 10.10.10.2
> Node Status: Master Inited
> Node Role: Standby
> Sync Status: Sync Error

> If fileserver2 tries has any active vdisk IO, they time out after 2 minutes
> and the system will kernel panic and reboot. If there are no open files,
> then once the quadstor service is finished starting back up on the master,
> the Sync Status on the client node (fileserver2) will continue to say Sync
> Error until I stop and restart the quadstor service on fileserver2. Then
> the status will go back to "Sync Done" within about 10 seconds.
Ideally, when the controller is back, the client should wait for the
controller to initialize and then do any necessary syncs. In this
case, however, the client node should have become the owner of the
cluster and handed ownership back to the controller, but the original
takeover failed.

> By the way, if the controller node is down, the /quadstor/bin/ndconfig
> command returns nothing.
If the client node cannot connect to the controller and needs the
controller to initialize (disks, vdisks, etc.), then nothing is known
to it yet.

> I've also noticed that quadstor on the controller loads ~ 11 out of 16GB of
> hash tables into memory, where this does not seem to happen at all on the
> client/master node. The client node only has about 2 out of 16GB utilized
> when running.
Memory usage will be different, but we will check whether such a large
difference is expected.