[Lustre-discuss] Active-Active failover configuration

x...@xgl.pereslavl.ru

Apr 20, 2010, 8:42:34 AM
to lustre-...@lists.lustre.org
Greetings!

Sorry to trouble you with such a question, but I cannot find an example in the documentation and have run into some problems.

I want to use an active/active failover configuration. (Where can I find some examples?)

I have two nodes, s1 and s2, used as OSSes.
I also have three block devices:
(MDS)  1 TB, used as the combined MGS/MDT
(OST0) 8 TB, used as the first OST
(OST1) 8 TB, used as the second OST
All devices are accessible from both OSSes.

In the normal state, OST0 is mounted on s1, and the MDT and OST1 are mounted on s2.

How can I configure the system so that if one of the OSSes fails (s2, for example), the other OSS (s1) takes control of all of its resources?

I have heartbeat installed and configured.
[root@s2 ~]# cat /etc/ha.d/haresources
s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre
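
A minimal /etc/ha.d/ha.cf to go with this haresources file would look roughly like the sketch below; the heartbeat interface, the timing values, and the auto_failback choice are assumptions, not values taken from this setup:

# /etc/ha.d/ha.cf (Heartbeat v1-style configuration; values below are assumptions)
logfacility local0
keepalive 2          # seconds between heartbeats
deadtime 30          # declare a node dead after 30 seconds of silence
initdead 120         # extra grace period at startup
udpport 694
bcast eth0           # interface carrying the heartbeat link
auto_failback off    # leave resources where they are when a failed node returns
node s1
node s2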

I configured the system as follows.
On s2 I formatted and mounted the MDT and OST1:
mkfs.lustre --reformat --fsname=lustre --mgs --mdt /dev/disk/by-id/b800
mount -t lustre /dev/disk/b800 /mnt/mdt/
mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/8800
mount -t lustre /dev/disk/8800 /mnt/ost1

On s1 I formatted and mounted OST0:
mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/b801
mount -t lustre /dev/disk/b801 /mnt/ost0

The heartbeat service is up and running on both nodes.

Where do I have to add parameters so that Lustre stays up and running if s2 goes down? Or where can I find some examples?
How can s1 take over the MDT (/mnt/mdt) and OST1 (/mnt/ost1) that are normally mounted on s2?
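
For reference, the piece that is usually missing in a setup like this is the failover NID of the partner server, passed to mkfs.lustre as --failnode (plus a second --mgsnode for the backup MGS location). A rough sketch of how the format commands might look, assuming s1's NID is 192.168.11.11@o2ib and s2's is 192.168.11.12@o2ib (the s1 NID is an assumption; it does not appear above):

# On s2: MGS/MDT, declaring s1 as its failover node (s1 NID assumed)
mkfs.lustre --reformat --fsname=lustre --mgs --mdt \
  --failnode=192.168.11.11@o2ib /dev/disk/by-id/b800

# On s2: OST1, naming both possible MGS locations and s1 as its failover node
mkfs.lustre --reformat --ost --fsname=lustre \
  --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
  --failnode=192.168.11.11@o2ib /dev/disk/8800

# On s1: OST0, with s2 as its failover node
mkfs.lustre --reformat --ost --fsname=lustre \
  --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
  --failnode=192.168.11.12@o2ib /dev/disk/b801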

Thanks,
Katya
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

sheila....@oracle.com

Apr 20, 2010, 11:29:32 AM
to x...@xgl.pereslavl.ru, lustre-...@lists.lustre.org
Section 8.3.2.2 (Configuring Heartbeat) includes a worked example of configuring OST failover (active/active):

http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199
--
Oracle
Sheila Barthel | Documentation Lead
Phone: +1 3035622468
Oracle Lustre Group

Oracle is committed to developing practices and products that help protect the environment

x...@xgl.pereslavl.ru

Apr 21, 2010, 6:36:57 AM
to bar...@oracle.com, lustre-...@lists.lustre.org, x...@xgl.pereslavl.ru
Thank you for your answer!

Unfortunately, I have already read that section of the manual and am still running into problems.

I've configured Heartbeat, defined the resources it controls, and haven't found any errors in the HA logs.
I've got three shared resources controlled by Heartbeat: two OSTs and one MDT (described in my previous message).

When I use "hb_takeover all" utilite on OSS1 (s1 in previous message) to takeover control over OSS2 resources - OST1 and MDS (OST1 and MDT mounted on OSS2 (s2) in standard configuration)) it takes control and I saw all resources mounted on one OSS1; But I cannot use lustre filesystem, can't mount it on a client.

In dmesg on the now-active OSS1 I can see that all OSTs keep trying to connect to the MDS at its old (standard) address on OSS2, even though the MDS has moved to OSS1.

What am I missing?
Maybe I have to specify some options when formatting the MDT/OSTs so that they keep working correctly when resources are switched to another OSS node? How do I do that?
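
In case it helps: if the targets were formatted without failover NIDs, they can normally be fixed up in place with tunefs.lustre rather than being reformatted, and clients then mount against both possible MGS NIDs. A rough sketch, run against unmounted targets and assuming s1's NID is 192.168.11.11@o2ib (that NID does not appear in this thread):

# Add s1 as the failover NID of the MDT (target unmounted)
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/b800

# Same idea for each OST, naming its partner OSS as the failover node
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/8800   # OST1, normally on s2
tunefs.lustre --failnode=192.168.11.12@o2ib /dev/disk/b801   # OST0, normally on s1

# Clients mount with both possible MGS NIDs, colon-separated
mount -t lustre 192.168.11.12@o2ib:192.168.11.11@o2ib:/lustre /mnt/lustre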

__________
Thanks,
Katya

Katya Tutlyaeva

Apr 21, 2010, 11:56:13 PM
to sheila....@oracle.com, lustre-...@lists.lustre.org
Thank you, I have found the answers.


___________
Thanks,
Katya

Katya Tutlyaeva

Apr 22, 2010, 7:23:49 AM
to lustre-...@lists.lustre.org
Hi all,
I'm trying to test Lustre using loadgen, but I get a segmentation fault error.
(I have successfully loaded obdecho.ko on both OSSes beforehand.)

[lustre]# loadgen
loadgen> dev lustre-OST0000-osc
192.168.11.12@o2ib
Added uuid OSS_UUID: 192.168.11.12@o2ib
Target OST name is 'lustre-OST0000-osc'
loadgen> st 3
start 0 to 3
loadgen: running thread #1
Segmentation fault


I hit the same error on both OSSes and on a client, regardless of how many clients I use.

What's wrong?
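
Not a known fix, just a sanity check that may be worth running first: confirm that obdecho is actually loaded and that the device name given to loadgen's "dev" command appears in the device list.

# Verify the echo module is loaded
lsmod | grep obdecho

# List the configured Lustre devices; the name passed to "dev" should appear here
lctl dl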

_____________
Thanks,
Katya

Andreas Dilger

Apr 22, 2010, 3:43:40 PM
to Katya Tutlyaeva, lustre-...@lists.lustre.org
On 2010-04-22, at 05:23, Katya Tutlyaeva wrote:
> I'm trying to test Lustre using loadgen, but I get a segmentation fault error.
> (I have successfully loaded obdecho.ko on both OSSes beforehand.)
>
> [lustre]# loadgen
> loadgen> dev lustre-OST0000-osc
> 192.168.11.12@o2ib
> Added uuid OSS_UUID: 192.168.11.12@o2ib
> Target OST name is 'lustre-OST0000-osc'
> loadgen> st 3
> start 0 to 3
> loadgen: running thread #1
> Segmentation fault
>
>
> I hit the same error on both OSSes and on a client, regardless of how many clients I use.

I believe there is a fix for loadgen in bugzilla.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.