[Lustre-discuss] Fwd: can't mount our lustre filesystem after tunefs.lustre --writeconf


Stu Midgley

Mar 17, 2012, 5:10:33 AM
to lustre, lustrefs
---------- Forwarded message ----------
From: Stu Midgley <sdm...@gmail.com>
Date: Sat, Mar 17, 2012 at 5:10 PM
Subject: can't mount our lustre filesystem after tunefs.lustre --writeconf
To: wc-di...@whamcloud.com


Afternoon

We have a rather severe problem with our Lustre file system.  We had a
full config log and the advice was to rewrite it with a new one.  So,
we unmounted our Lustre file system from all clients, unmounted all the
OSTs and then unmounted the MDS.  I then ran

mds:
  tunefs.lustre --writeconf --erase-params /dev/md2

oss:
  tunefs.lustre --writeconf --erase-params --mgsnode=mds001 /dev/md2
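
For anyone repeating this on their own system: before rewriting anything
it is worth checking what tunefs.lustre thinks is on disk and keeping a
copy of the binary config logs.  A rough, untested sketch (/mnt/md2 is
just our scratch mount point, adjust to suit):

  # review the current target config without changing anything
  tunefs.lustre --print /dev/md2

  # keep a copy of the config logs before the writeconf
  mount -t ldiskfs /dev/md2 /mnt/md2
  cp -a /mnt/md2/CONFIGS /root/CONFIGS.backup
  umount /mnt/md2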

After the tunefs.lustre on the MDS I saw

Mar 17 14:33:02 mds001 kernel: Lustre: MGS MGS started
Mar 17 14:33:02 mds001 kernel: Lustre: MGC172.16.0.251@tcp: Reactivating import
Mar 17 14:33:02 mds001 kernel: Lustre: MGS: Logs for fs p1 were
removed by user request.  All servers must be restarted in order to
regenerate the logs.
Mar 17 14:33:02 mds001 kernel: Lustre: Enabling user_xattr
Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: new disk, initializing
Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: Now serving
p1-MDT0000 on /dev/md2 with recovery enabled

which scared me a little...

The MDS and the OSSs mount happily, BUT I can't mount the file system
on my clients... on the MDS I see


Mar 17 16:42:11 mds001 kernel: LustreError: 137-5: UUID
'prod_mds_001_UUID' is not available  for connect (no target)


On the client I see


Mar 17 16:00:06 host kernel: LustreError: 11-0: an error occurred
while communicating with 172.16.0.251@tcp. The mds_connect operation
failed with -19


Now, it appears the writeconf renamed the UUID of the MDS from
prod_mds_001_UUID to p1-MDT0000_UUID, but I can't work out how to get
it back...


For example, I tried


# tunefs.lustre --mgs --mdt --fsname=p1 /dev/md2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

 Read previous values:
Target:     p1-MDT0000
Index:      0
UUID:       prod_mds_001_UUID
Lustre FS:  p1
Mount type: ldiskfs
Flags:      0x405
            (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

tunefs.lustre: cannot change the name of a registered target
tunefs.lustre: exiting with 1 (Operation not permitted)

I'm now stuck, unable to mount a 1 PB file system... which isn't good :(

--
Dr Stuart Midgley
sdm...@gmail.com


--
Dr Stuart Midgley
sdm...@gmail.com
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Stu Midgley

Mar 17, 2012, 5:13:55 AM
to wc-di...@whamcloud.com, lustre, lustrefs
One extra detail: we are running Lustre 1.8.7.

Thanks.

Stu Midgley

Mar 18, 2012, 1:20:46 AM
to Nathan Rutman, lustre, wc-di...@whamcloud.com, lustrefs
OK, from what I can tell, the root of the problem is this:


[root@mds001 CONFIGS]# hexdump -C p1-MDT0000 | grep -C 2 mds
00002450 0b 00 00 00 04 00 00 00 12 00 00 00 00 00 00 00 |................|
00002460 70 31 2d 4d 44 54 30 30 30 30 00 00 00 00 00 00 |p1-MDT0000......|
00002470 6d 64 73 00 00 00 00 00 70 72 6f 64 5f 6d 64 73 |mds.....prod_mds|
00002480 5f 30 30 31 5f 55 55 49 44 00 00 00 00 00 00 00 |_001_UUID.......|
00002490 78 00 00 00 07 00 00 00 88 00 00 00 08 00 00 00 |x...............|
--
000024c0 00 00 00 00 04 00 00 00 0b 00 00 00 12 00 00 00 |................|
000024d0 02 00 00 00 0b 00 00 00 70 31 2d 4d 44 54 30 30 |........p1-MDT00|
000024e0 30 30 00 00 00 00 00 00 70 72 6f 64 5f 6d 64 73 |00......prod_mds|
000024f0 5f 30 30 31 5f 55 55 49 44 00 00 00 00 00 00 00 |_001_UUID.......|
00002500 30 00 00 00 00 00 00 00 70 31 2d 4d 44 54 30 30 |0.......p1-MDT00|

[root@mds001 CONFIGS]#
[root@mds001 CONFIGS]# hexdump -C /mnt/md2/CONFIGS/p1-MDT0000 | grep -C 2 mds
00002450 0b 00 00 00 04 00 00 00 10 00 00 00 00 00 00 00 |................|
00002460 70 31 2d 4d 44 54 30 30 30 30 00 00 00 00 00 00 |p1-MDT0000......|
00002470 6d 64 73 00 00 00 00 00 70 31 2d 4d 44 54 30 30 |mds.....p1-MDT00|
00002480 30 30 5f 55 55 49 44 00 70 00 00 00 07 00 00 00 |00_UUID.p.......|
00002490 80 00 00 00 08 00 00 00 00 00 62 10 ff ff ff ff |..........b.....|


Now if only I could get the UUID removed or reset...
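
Side note, in case it saves someone else squinting at hex: the
llog_reader utility that ships with the Lustre tools prints a readable
dump of these binary config logs.  For example (assuming the MGS/MDT
device is mounted as ldiskfs at /mnt/md2, as above):

  # decode the client config llog instead of reading raw hexdump output
  llog_reader /mnt/md2/CONFIGS/p1-client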


On Sun, Mar 18, 2012 at 1:05 PM, Dr Stuart Midgley <sdm...@gmail.com> wrote:
> hmmm… that didn't work
>
> # tunefs.lustre --force --fsname=p1 /dev/md2


> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>   Read previous values:
> Target:     p1-MDT0000
> Index:      0
> UUID:       prod_mds_001_UUID
> Lustre FS:  p1
> Mount type: ldiskfs
> Flags:      0x405
>              (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters:
>

> tunefs.lustre: unrecognized option `--force'
> tunefs.lustre: exiting with 22 (Invalid argument)


>
>
>
>
> --
> Dr Stuart Midgley
> sdm...@gmail.com
>
>
>

> On 18/03/2012, at 12:17 AM, Nathan Rutman wrote:
>
>> Take them all down again, use tunefs.lustre --force --fsname.


Dr Stuart Midgley

Mar 18, 2012, 2:40:45 AM
to kit.we...@nyu.edu, lustre, wc-di...@whamcloud.com, lustrefs
No, we have tried that.

This file system started life about six years ago as Lustre 1.4 and has been continually upgraded… hence the wacky UUID. Renaming the FS doesn't work: it doesn't change the UUID that the MGS tells clients to mount.
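
To spell out why that matters: clients mount by fsname through the MGS,
so on a client the mount looks something like this (the nid is the one
from our logs above, the mount point is a placeholder):

  # the client asks the MGS at this nid for the p1 client config llog
  mount -t lustre 172.16.0.251@tcp:/p1 /mnt/p1

The MGS answers from its p1-client log, and that log still tells the
client to connect to a target named prod_mds_001_UUID, which the MDS no
longer serves, hence the "not available for connect (no target)" error.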


--
Dr Stuart Midgley
sdm...@gmail.com

On 18/03/2012, at 2:24 PM, Kit Westneat wrote:

> You should be able to reset the UUID by doing another writeconf with the --fsname flag. After the writeconf, you'll have to writeconf all the OSTs too.
>
> It worked on my very simple test at least:
> [root@mds1 tmp]# tunefs.lustre --writeconf --fsname=test1 /dev/loop0


> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:

> Target: t1-MDT0000
> Index: 0
> Lustre FS: t1
> Mount type: ldiskfs
> Flags: 0x5
> (MDT MGS )
> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>
>
> Permanent disk data:
> Target: test1-MDT0000
> Index: 0
> Lustre FS: test1
> Mount type: ldiskfs
> Flags: 0x105
> (MDT MGS writeconf )
> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>
> Writing CONFIGS/mountdata
>
>
> HTH,
> Kit
> --
> Kit Westneat
> System Administrator, eSys
> kit.we...@nyu.edu
> 212-992-7647

Stu Midgley

Mar 18, 2012, 3:36:21 AM
to kit.we...@nyu.edu, lustre, wc-di...@whamcloud.com, lustrefs
I'm well down this path... I replaced the mountdata with that from my
small temporary MDT (same name) and that didn't help.
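
For reference, a throwaway MDT like that can be formatted on a loop
device, roughly as below (the sizes and paths are placeholders, not the
exact ones I used):

  # make a small scratch MGS/MDT with the same fsname for comparison
  dd if=/dev/zero of=/tmp/p1-mdt.img bs=1M count=256
  losetup /dev/loop0 /tmp/p1-mdt.img
  mkfs.lustre --mgs --mdt --fsname=p1 /dev/loop0

  # mount it as ldiskfs to get at its freshly written CONFIGS directory
  mkdir -p /mnt/tmp-mdt
  mount -t ldiskfs /dev/loop0 /mnt/tmp-mdt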

Now I will do a few tests on the p1-client log. Perhaps after a
writeconf it is basically clean and I can replace it... but currently
it contains lots of info about each of the OSTs.

All the OSTs mount happily and connect to the MDS, and all think that
they are part of our p1 file system.

Thanks.


On Sun, Mar 18, 2012 at 3:04 PM, Kit Westneat <kit.we...@nyu.edu> wrote:
> Oh right, that makes sense. I guess if I were you I would try one of two
> things. First, back up the MDT, and then try:
> 1) format a small loopback device with the parameters you want the MDT to
> have, then replace the CONFIGS directory on your MDT with the CONFIGS
> directory on the loopback device
> - OR -
> 2) use a hex editor to modify the UUID
>
> Then use tunefs.lustre --print to make sure it all looks good before
> mounting it.
>
> Though one thing I wonder about is, are the OSTs on the same page with the
> fsname? Like are they expecting to be part of the p1 filesystem?


>
> HTH,
> Kit
>
> --
> Kit Westneat
> System Administrator, eSys
> kit.we...@nyu.edu
> 212-992-7647
>
>

Dr Stuart Midgley

Mar 18, 2012, 4:58:57 AM
to kit.we...@nyu.edu, lustre, wc-di...@whamcloud.com, lustrefs
Well, our filesystem is back.

I hex-edited CONFIGS/p1-client, replaced prod_mds_001_UUID with p1-MDT0000_UUID, and now our file system mounts.

Ran a heap of checks and it all looks good.
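
For anyone who hits the same thing: the important detail is that the
replacement string is two bytes shorter than the original (15 bytes vs
17), so it has to be NUL-padded to the same length to keep the llog
record layout intact.  I did the edit in a hex editor; an untested
scripted equivalent would look roughly like this, assuming the MGS/MDT
device is unmounted from Lustre and mounted as ldiskfs at /mnt/md2,
with CONFIGS backed up first:

  # pad the shorter UUID with two NULs so every byte offset is unchanged
  perl -pi -e 's/prod_mds_001_UUID/p1-MDT0000_UUID\x00\x00/g' \
      /mnt/md2/CONFIGS/p1-client
  umount /mnt/md2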

Thanks everyone for your help.


--
Dr Stuart Midgley
sdm...@gmail.com

iamatt

Mar 18, 2012, 2:28:51 AM
to kit.we...@nyu.edu, Stu Midgley, lustre, wc-di...@whamcloud.com, lustrefs
Sorry for your situation... the only word of encouragement I can offer
at this time is


http://www.youtube.com/watch?v=hNf5s_vra18
