i have an issue with failover of the MGS device within my cluster.
i'm building a simple lustre environment; just one lustre file system
(testfs) ...
i have a two node cluster for my MGS/MDT; this is for an active/passive
config with the MGS and MDT on different devices and mounted separately
(not co-locating).
i have a two node cluster for my OSTs, in an active/active config in
that the first OST is on node one and the second OST is on node two.
if i have the above, then heartbeat is happy with the OSTs and the MDT
mountpoints being mounted on either node in the cluster. the MGS however
is not. i get the following message when it tries to mount on the
alternative node:
---8<---
mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
---8<---
i've noticed that if i consolidate and have my MGS and MDT on the same
device/mountpoint on the MDS cluster nodes, all is well and the file
system mounts on the alternative node perfectly.
any ideas?
i have ensured i created the file systems with --failnode and --mgsnode=
for each MDS server, but no joy.
i can see in a previous post to lustre-discuss someone having a similar,
if not the same issue:
http://lists.lustre.org/pipermail/lustre-discuss/2008-September/008634.html
cheers
Kevin
Hi,
> i have an issue with failover of the MGS device within my cluster.
>
> i'm building a simple lustre environment; just one lustre file system
> (testfs) ...
>
> i have a two node cluster for my MGS/MDT; this is for an active/passive
> config with the MGS and MDT on different devices and mounted separately
> (not co-locating).
>
> i have a two node cluster for my OSTs, in an active/active config in
> that the first OST is on node one and the second OST is on node two.
So you in fact have 4 nodes as your Lustre servers, yes? What is your
shared storage technology? How are the two OSSes accessing the same two
OSTs and how are the two MDSes accessing the single MDT and MGT?
> if i have the above, then heartbeat is happy with the OSTs and the MDT
> mountpoints being mounted on either node in the cluster. the MGS however
> is not. i get the following message when it tries to mount on the
> alternative node:
>
> ---8<---
> mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
> ---8<---
Is /dev/sdb actually accessible on the alternative node? What does
"cat /proc/partitions" say on that node?
What does dmesg tell you after you try to mount /dev/sdb and it fails?
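
The device check above can be scripted on the standby node. This is a minimal sketch, assuming the shared device shows up there as sdb; device_in_partitions is a hypothetical helper name, not part of Lustre:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a block device name is listed in
# /proc/partitions (or in another file given as the second argument).
device_in_partitions() {
    dev="$1"
    file="${2:-/proc/partitions}"
    # The device name is the fourth column of /proc/partitions.
    awk -v d="$dev" '$4 == d { found = 1 } END { exit !found }' "$file"
}

# "sdb" is the device from the failing mount; adjust as needed.
if device_in_partitions sdb; then
    echo "sdb is listed on this node"
else
    echo "sdb is NOT listed on this node"
fi
```

After the mount attempt fails, "dmesg | tail" on the same node shows the most recent kernel messages.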
b.
thanks to you both. i had an error in my mkfs.lustre commands; once it
was pointed out and corrected, the issue was fixed.
Can you share the error here so that future searches of this problem are
complete with a solution?
There's nothing more frustrating than finding the problem you are
having in a mailing list archive with no solution. Actually, there is
something more frustrating: finding that the problem was solved, but
with no details on how.
Thanx,
b.
I gave Neil the correct mkfs commands:
mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb
mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
    --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc
Kevin
Brian,
Here are the original, incorrect, mkfs commands:
mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
    --mgsnode=lustremds1 --mdt /dev/sdc
Hi Kevin,
> Here are the original, incorrect, mkfs commands:
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
>     --mgsnode=lustremds1 --mdt /dev/sdc
So to be clear, was his failure that he specified only one --mgsnode,
or that his hostname specifications did not resolve properly to the IP
addresses he used in his subsequent, working commands? Or both?
b.
actually, the original cmds i ran did have the = signs for the
--failnode=<nodename> arguments; i gave the wrong bash history info to
kevin to analyse when he asked for them from the host :)
however, having said that, it was adding the NIDs that were missing
from the hosts that fixed the issues i was having.
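
One way to catch this before reformatting is to check that every --failnode and --mgsnode value is written as address@network, the form used in the working commands (for example 192.168.123.21@tcp0). This is a rough sketch; is_nid is a hypothetical helper, and it only checks the shape of the string, not whether the NID is reachable or whether Lustre would accept it:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument has the form address@network.
is_nid() {
    case "$1" in
        *@*@*) return 1 ;;   # more than one "@"
        ?*@?*) return 0 ;;   # something on both sides of a single "@"
        *)     return 1 ;;   # no "@" at all, or an empty side
    esac
}

# Sample values taken from the commands in this thread.
for value in 192.168.123.21@tcp0 lustremds2; do
    if is_nid "$value"; then
        echo "$value: looks like a NID"
    else
        echo "$value: not in address@network form"
    fi
done
```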
cheers
Is --failnode evaluated for the MGS? We seem to do fine without it as any client
requires explicit configuration of the MGS failnode anyway. Or is it possible
to override this configuration with the value set on the MGS?
> mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
>     --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc
Regards,
Daniel.