[Lustre-discuss] MDT backup procedure


Daniel Kulinski

Jun 17, 2009, 11:41:42 AM
to lustre-...@lists.lustre.org

As we move forward with our Lustre testing, I am wondering about MDT backup.

Is it feasible to unmount the MDT, create an image of it, and remount it after the backup? Of course, this would only happen nightly.

From what I can identify, in the case of an MDT failure we would have to do the following:

Restore from the last backup.

Run an lfsck across the filesystem.

Am I missing anything else at this point? We will also be doing file-level backups of the filesystem as a whole, but we are looking for quick ways to recover from an MDT failure.

Thanks,

  Dan Kulinski

Andreas Dilger

Jun 17, 2009, 1:05:30 PM
to Daniel Kulinski, lustre-...@lists.lustre.org

There is a documented process for doing MDT backup/restore that should be
used. In particular, there are some files which should not be restored if
the MDT is being restored.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Ramiro Alba Queipo

Jun 17, 2009, 1:35:51 PM
to Daniel Kulinski, lustre-...@lists.lustre.org
Hi Daniel,

From reading Chapter 15 of the Lustre Operations Manual, it follows that an
MDT backup is only useful if you are changing hardware or the like.
I am afraid you cannot expect to replace a failed MDT with a previous
image, as the data on the OSTs and the MDT would no longer match, right?

Cheers


--
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46


--
This message has been scanned by MailScanner
for viruses and other dangerous content,
and is believed to be clean.
For all your IT requirements visit: http://www.transtec.co.uk

Dr. Hung-Sheng Tsao (LaoTsao)

Jun 17, 2009, 1:49:21 PM
to Ramiro Alba Queipo, lustre-...@lists.lustre.org, Daniel Kulinski
IMHO, maybe one can set up an HA MDS using shared storage.
All the data is on the shared storage, so you can do a failover.


Daniel Kulinski

Jun 17, 2009, 1:51:39 PM
to Hung-Sh...@sun.com, Ramiro Alba Queipo, lustre-...@lists.lustre.org
We are actually in an HA setup now. My main concern is a double disk failure on the MDT device.

Thanks,
Dan Kulinski

Cliff White

Jun 17, 2009, 3:35:29 PM
to Ramiro Alba Queipo, lustre-...@lists.lustre.org, Daniel Kulinski
Ramiro Alba Queipo wrote:
> Hi Daniel,
>
> From reading Chapter 15 of the Lustre Operations Manual, it follows that an
> MDT backup is only useful if you are changing hardware or the like.
> I am afraid you cannot expect to replace a failed MDT with a previous
> image, as the data on the OSTs and the MDT would no longer match, right?

If you do a backup/immediate restore, it should be fine. If you restore
from an old image you will lose the changes made post-backup, but the
rest of the data should be fine.
cliffw


Andreas Dilger

Jun 17, 2009, 6:23:01 PM
to Cliff White, lustre-...@lists.lustre.org, Daniel Kulinski
On Jun 17, 2009 12:35 -0700, Cliff White wrote:

> Ramiro Alba Queipo wrote:
> > From reading Chapter 15 of the Lustre Operations Manual, it follows that an
> > MDT backup is only useful if you are changing hardware or the like.
> > I am afraid you cannot expect to replace a failed MDT with a previous
> > image, as the data on the OSTs and the MDT would no longer match, right?
>
> If you do a backup/immediate restore, it should be fine. If you restore
> from an old image you will lose the changes made post-backup, but the
> rest of the data should be fine.
> cliffw

Right - just like any backup, any changes made after the backup will of
course not be restored. One additional issue is that some OST objects
will not be available if they were deleted after the backup, even though
the restored MDS will still reference them. Accessing these files will
return -ENOENT.

At that point it would be possible (though not necessary) to run "lfsck"
to clean up the inconsistencies between the MDT and OST filesystems.
It is also possible to just re-delete the files that have "-ENOENT" and
restore (from some other filesystem-level backup) the rest of the files.

An MDS backup is a good idea, because it avoids having to restore 100TB+
(or whatever) of data from backup, leaving only a smaller number of changed
files that might need to be restored. It should NOT be the only form of
backup for the filesystem, since it does not contain any of the FILE data.
You, or your users, should do backups of their critical files separately.
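
One way to find the affected files after a restore, as a rough sketch only: walk the client mount and try to read the first byte of every regular file, printing the paths that fail. The function name `list_unreadable` and the mount point in the example are made up for illustration, and the probe reports any read failure (permission problems included), not only ENOENT.

```shell
#!/bin/sh
# list_unreadable DIR: print every regular file under DIR whose first
# byte cannot be read. On a restored MDT this should include files whose
# OST objects were deleted after the backup, but it also reports other
# read errors such as permission problems.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # Reading one byte forces the file to be opened and its data fetched.
            head -c 1 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Example (hypothetical client mount point):
#   list_unreadable /mnt/lustre > /tmp/broken-files.txt
```

The resulting list can then be used to re-delete those files or to restore them from a file-level backup.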

Cheers, Andreas


--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Daniel Kulinski

Jun 17, 2009, 6:28:07 PM
to Andreas Dilger, Cliff White, lustre-...@lists.lustre.org
Thanks for this detailed reply. It is exactly what I needed and what I suspected I would run into. We are planning on multiple backup procedures: users will back up at checkpoints in their workflow, IT will back up the MDT nightly, and we are also looking at the possibility of backing up the complete file system.

Thanks again for everyone's input, this gives me some good ammunition going forward for proposals.

Thanks,
Dan Kulinski


Adam Knight

Jun 17, 2009, 7:09:13 PM
to lustre-...@lists.lustre.org
Pertaining to your original email: rather than taking the MDT down to
back it up, it is very convenient to use LVM snapshots. A snapshot creates
an LV duplicate of the MDT that you can mount as ldiskfs and back up files
from as a consistent copy (it won't change even if your MDT continues to
add/remove data). Your Lustre filesystem therefore stays operational
during the backup. If you time it cleverly, you can snapshot your MDT and
OSTs at the same time and back up from all of them to have a consistent
copy of the whole filesystem as well.
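
A minimal sketch of that snapshot-then-backup sequence, assuming the MDT lives on an LVM logical volume and that the installed tar supports `--xattrs`. The volume group, LV, mount point, and output path are hypothetical, and the `run` helper and `DRY_RUN` switch are only there so the commands can be previewed safely.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# snapshot_backup VG LV SNAP_SIZE MOUNTPOINT OUTPUT_TAR
snapshot_backup() {
    vg=$1 lv=$2 size=$3 mnt=$4 out=$5
    snap="${lv}_snap"
    # Copy-on-write snapshot; SNAP_SIZE bounds how much change it can absorb.
    run lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv" || return 1
    run mkdir -p "$mnt" &&
    run mount -t ldiskfs -o ro "/dev/$vg/$snap" "$mnt" &&
    # File-level backup of the frozen copy, keeping extended attributes.
    run tar --xattrs --sparse -C "$mnt" -cf "$out" .
    status=$?
    # Always try to clean up, even if the backup step failed.
    run umount "$mnt"
    run lvremove -f "/dev/$vg/$snap"
    return $status
}

# Preview with made-up names:
#   DRY_RUN=1 snapshot_backup vg_mdt mdt 2G /mnt/mdt_snap /backup/mdt.tar
```

Removing the snapshot as soon as the backup finishes keeps the copy-on-write overhead on the live MDT short-lived.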


Ramiro Alba Queipo

Jun 18, 2009, 5:32:36 AM
to Andreas Dilger, Cliff White, lustre-...@lists.lustre.org, Daniel Kulinski
Hi all,

To clarify, let me sum up (please tell me if I am
wrong).

There are 3 ways of doing an MDT backup:

1) Device-level using dd command

You can do it from the original device to another local device with at
least the same capacity, BUT no clients and no OSTs should be active, so
NOT SUITABLE for an automated nightly backup

2) File-level using tar or rsync commands

You can make a copy to another directory (even remotely), BUT you MUST STOP
Lustre and remount the MDT as an 'ldiskfs' file system. You also have to
save additional information:
(cd /lustre/mds; getfattr -R -d -m '.*' -P . > /<backup-dir>/ea.bak)
So NOT SUITABLE for an automated nightly backup either.

3) File-level on LVM snapshots

LVM allows you to take a snapshot of the MDT while the Lustre file system
is operational, so you can afterwards make a file-level backup of the
LVM snapshot while everything is running. So it IS SUITABLE for an
automated backup.
The disadvantages are that you need extra local space for the LVM snapshots,
and the performance impact of running the MDT on LVM.


By the way, the procedure described at 'How do I replace an OST or MDS?'
in Appendix B of the Lustre Operations Manual differs from the procedure
described at 15.1.3.1 (Backing Up an MDS File):
- getfattr -R -d -m '.*' -P . > ea.bak
- getfattr -R -e base64 -d . > /tmp/mdsea

Which one is the right one?


Cheers

--
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46


Andreas Dilger

Jun 18, 2009, 12:24:35 PM
to Ramiro Alba Queipo, Cliff White, lustre-...@lists.lustre.org, Sheila Barthel, Daniel Kulinski
On Jun 18, 2009 11:32 +0200, Ramiro Alba Queipo wrote:
> There are 3 ways of doing an MDT backup:
>
> 1) Device-level using dd command
>
> You can do it from the original device to another local device with at
> least the same capacity, BUT no clients and no OSTs should be active, so
> NOT SUITABLE for an automated nightly backup

Well, "no clients/OSTs should be active" is a relative term. You will
almost certainly have a usable backup even if the filesystem was active,
because ext3 has a robust on-disk layout, but you would need to run an
e2fsck afterward.
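
A minimal sketch of the device-level copy, assuming GNU dd and enough space for a full-size image. The function name and the paths in the example are hypothetical. For a clean image the source should be unmounted or be a snapshot; an image taken from an active MDT needs the e2fsck pass mentioned above, run with the e2fsprogs build that matches your Lustre version.

```shell
#!/bin/sh
# image_mdt SRC DST: raw copy of SRC (a block device, or any file) to DST.
image_mdt() {
    src=$1 dst=$2
    # conv=fsync flushes the image to stable storage before dd exits.
    dd if="$src" of="$dst" bs=1M conv=fsync
}

# Example with made-up paths. If the MDT was active during the copy,
# check the image before relying on it (e2fsck can work on an image file):
#   image_mdt /dev/vg_mdt/mdt /backup/mdt.img
#   e2fsck -f -y /backup/mdt.img
```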

> 2) File-level using tar or rsync commands
>
> You can make a copy to another directory (even remotely), BUT you MUST STOP
> Lustre and remount the MDT as an 'ldiskfs' file system. You also have to
> save additional information:
> (cd /lustre/mds; getfattr -R -d -m '.*' -P . > /<backup-dir>/ea.bak)
> So NOT SUITABLE for an automated nightly backup either.

Right. Note that when using "tar" or "rsync" you should use the "--sparse"
option so that it doesn't back up empty files. Also, with newer versions
of tar (on RHEL/FC) and rsync it is possible to have it do the backup/restore
of the extended attributes directly.

You could also use "dump-0.4b40" (or later) to do a hybrid device/file
level backup. It will back up the filesystem directly from the block device,
but only the files that are in use. Versions 0.4b40+ can also do the
backup/restore of extended attributes, which is critical.
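
As a rough sketch of the dump-based variant, assuming dump/restore 0.4b40 or later is installed and the source is an unmounted device or a snapshot: a level-0 dump reads the block device directly, and restore rebuilds the tree inside an already formatted and mounted target. All device names and paths are hypothetical, and `run`/`DRY_RUN` exist only to preview the commands.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# dump_mdt DEVICE DUMPFILE: full (level 0) dump of the filesystem on DEVICE.
dump_mdt() {
    dev=$1 out=$2
    run dump -0 -f "$out" "$dev"
}

# restore_mdt DUMPFILE TARGET_DIR: rebuild the dumped tree inside TARGET_DIR,
# which should be the mount point of a freshly formatted filesystem.
restore_mdt() {
    dumpfile=$1 target=$2
    # restore -r extracts into the current directory, hence the subshell cd.
    (cd "$target" && run restore -r -f "$dumpfile")
}

# Preview with made-up names:
#   DRY_RUN=1 dump_mdt /dev/vg_mdt/mdt_snap /backup/mdt.dump
```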

> 3) File-level on LVM snapshots
>
> LVM allows you to make a duplication of the MDT while lustre file system
> is operational, so you can make afterwards a File-level backup of the
> LVM snapshot while everything is running. Then it IS SUITABLE for an
> automated backup.
> Disadvantages are that you need extra local space for LVM snapshots and
> the impact on performance of using LVM over the MDT.

This is probably the best option. It allows consistent backups to be
done, and if you only keep a single snapshot the performance hit isn't
too big.

> By the way, the procedure described at 'How do I replace an OST or MDS?'
> in Appendix B of the Lustre Operations Manual differs from the procedure
> described at 15.1.3.1 (Backing Up an MDS File):
> - getfattr -R -d -m '.*' -P . > ea.bak
> - getfattr -R -e base64 -d . > /tmp/mdsea

I would say the first one is better, though I like to use "-e hex"
instead of "-e base64" because the hex output is easier for me to
decode if I need to for some reason. Probably the "replace an OST/MDT"
chapter should just reference the backup/restore section instead of
duplicating the content.
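
For illustration, the first form with "-e hex" added could be wrapped roughly as below, together with the matching restore via `setfattr --restore`. The function names and paths are made up; the dump file should be given as an absolute path because both functions change directory first, and root is needed to see and set the trusted.* attributes.

```shell
#!/bin/sh
# save_xattrs MOUNTPOINT DUMPFILE: dump all extended attributes, hex-encoded.
save_xattrs() {
    mnt=$1 out=$2
    # Running from inside the mount point keeps the recorded paths relative,
    # so the dump can be restored under a different mount point.
    (cd "$mnt" && getfattr -R -d -m '.*' -e hex -P .) > "$out"
}

# restore_xattrs MOUNTPOINT DUMPFILE: reapply a dump made by save_xattrs.
restore_xattrs() {
    mnt=$1 dumpfile=$2
    (cd "$mnt" && setfattr --restore="$dumpfile")
}

# Example with made-up paths:
#   save_xattrs /mnt/mdt /backup/ea.bak
#   restore_xattrs /mnt/new_mdt /backup/ea.bak
```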


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


John White

Jun 19, 2009, 1:58:25 AM
to Adam Knight, lustre-...@lists.lustre.org
On Jun 17, 2009, at 4:09 PM, Adam Knight wrote:

> Pertaining to your original email, rather than taking the MDT down to
> backup, it is very convenient to use LVM snapshots. With this
> functionality it creates a LV duplicate of the MDT and allows you to
> mount that as ldiskfs and backup files from a consistent copy (won't be
> changing even if your MDT continues to add/remove data). Your lustre
> filesystem will therefore stay operational during the backup. If you
> time it cleverly, you can snapshot your MDT and OSTs at the same time
> and backup from all of them to have a consistent copy of the whole
> filesystem as well.
>

So following this, has anyone migrated an MDT to new storage with this
sort of procedure?

-create an lvm'd MDT that produces snapshots
-use it for a while in production
-get some snazzy new disk
-shutdown lustre
-take a snapshot of the MDT and shuffle it off to some different
storage media
-create a new LVM with snazzy new disk (specifically of a different
size from the original MDT)
-restore snapshot
-run lfsck for good measure (is this advisable on what could feasibly
be a clean filesystem?)
-bring up lustre

Please keep in mind, I've used LVM but haven't used snapshots, I'm not
familiar with their limitations. We're looking to create a filesystem
immediately but would like to get some much faster storage for the MDT
later without burning and building a new FS.

John White

Jun 19, 2009, 2:12:15 AM
to John White, Adam Knight, lustre-...@lists.lustre.org
Ah, if I had only read one more mail, I would have seen that this procedure is
far more complex than the one described in the Operations Manual. I suppose
that's one of the benefits of an object-based FS.
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720

Ramiro Alba Queipo

Jun 19, 2009, 4:39:06 AM
to Andreas Dilger, Cliff White, lustre-...@lists.lustre.org, Sheila Barthel, Daniel Kulinski
Andreas,

This is a very interesting discussion, and it has raised some doubts on
the matter.

On Thu, 2009-06-18 at 10:24 -0600, Andreas Dilger wrote:
> On Jun 18, 2009 11:32 +0200, Ramiro Alba Queipo wrote:
> > There are 3 ways of doing an MDT backup:
> >
> > 1) Device-level using dd command
> >
> > You can do it from the original device to another local device with at
> > least the same capacity, BUT no clients and no OSTs should be active, so
> > NOT SUITABLE for an automated nightly backup
>
> Well, "no clients/OSTs should be active" is a relative term. You will
> almost certainly have a usable backup even if the filesystem was active,
> because ext3 has a robust on-disk layout, but you would need to run an
> e2fsck afterward.
>
> > 2) File-level using tar or rsync commands
> >
> > You can make a copy to another directory (even remotely), BUT you MUST STOP
> > Lustre and remount the MDT as an 'ldiskfs' file system. You also have to
> > save additional information:
> > (cd /lustre/mds; getfattr -R -d -m '.*' -P . > /<backup-dir>/ea.bak)
> > So NOT SUITABLE for an automated nightly backup either.
>
> Right. Note that when using "tar" or "rsync" you should use the "--sparse"
> option so that it doesn't back up empty files. Also, with newer versions

Can you tell me which versions? (I am using Ubuntu 8.04 with tar-1.19
and rsync-2.6.9).

> of tar (on RHEL/FC) and rsync it is possible to have it do the backup/restore
> of the extended attributes directly.

You mean there is no need to use the getfattr/setfattr commands?

>
> You could also use "dump-0.4b40" (or later) to do a hybrid device/file
> level backup. It will back up the filesystem directly from the block device,
> but only the files that are in use. Versions 0.4b40+ can also do the
> backup/restore of extended attributes, which is critical.
>
> > 3) File-level on LVM snapshots
> >
> > LVM allows you to make a duplication of the MDT while lustre file system
> > is operational, so you can make afterwards a File-level backup of the
> > LVM snapshot while everything is running. Then it IS SUITABLE for an
> > automated backup.
> > Disadvantages are that you need extra local space for LVM snapshots and
> > the impact on performance of using LVM over the MDT.
>
> This is probably the best option. It allows consistent backups to be
> done, and if you only keep a single snapshot the performance hit isn't
> too big.

So the best option for automated backups could be to use LVM
snapshots and then run 'dump', with dump levels, on the mounted
snapshot. There is no need to use the getfattr/setfattr commands, right?

What about the performance impact of running the MDT on LVM on overall
Lustre performance?


Jerome, Ron

Jun 19, 2009, 9:24:21 AM
to John White, lustre-...@lists.lustre.org
Hi John,

I migrated my MGS/MDT to new hardware just a few weeks ago without much
difficulty. I did not use LVM snapshots though, but rather the procedure
outlined in the manual (section 15.1.3.1 of the 1.6 manual), using tar
(with the "sparse" option, this is very important!) and getfattr. Mine
is a combined MGS/MDT, so I also needed to run tunefs.lustre --writeconf
to get the OSTs to update their configuration logs on the new server.
I gave the new server the same IP address as the old one, so there
weren't any issues with changing NIDs. It's been running great ever
since.

FYI, it took a few hours to create the tar and extended attribute files
on the old server (~3.4M inodes) and about half that time to restore
them onto the new server (faster disks :)). All in all, about 4 hours of
down time.
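
The writeconf step might look roughly like the sketch below. It assumes all clients and all targets are unmounted first, and that afterwards the MGS/MDT is mounted before the OSTs. The device names are hypothetical, the function only handles devices local to the server it runs on (so it has to be repeated on each OSS for its own OSTs), and `run`/`DRY_RUN` are there just to preview the commands.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# writeconf_targets DEVICE...: regenerate the configuration logs for each
# local Lustre target device, stopping at the first failure.
writeconf_targets() {
    for dev in "$@"; do
        run tunefs.lustre --writeconf "$dev" || return 1
    done
}

# Preview on the MDS, then on each OSS (made-up device names):
#   DRY_RUN=1 writeconf_targets /dev/vg_mdt/mdt
#   DRY_RUN=1 writeconf_targets /dev/sdb /dev/sdc
```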

Ron Jerome
National Research Council Canada.



Jim Garlick

Jun 19, 2009, 11:59:29 AM
to Ramiro Alba Queipo, Andreas Dilger, Cliff White, Daniel Kulinski, lustre-...@lists.lustre.org, Sheila Barthel

The versions we tested:
- tar-1.20-5 from Fedora 10 works.
- tar-1.15.1-23.0.1 from RHEL 5 does NOT work

Also, for file level backup, exclude /OBJECTS/* and /CATALOGS from the
backup, and make sure clients are unmounted during the restore or their
caches will become corrupt when the restored MDS comes back online
(due to changing inode numbers on the backing fs I believe).

The procedure that we tested a while back is as follows (to which I would
add Andreas's suggestion of --sparse):

# Backup
mount -t ldiskfs -ouser_xattr /dev/sda /mnt/mdt
# The trailing "." names the tree to archive (relative to -C/mnt/mdt);
# --sparse keeps sparse files from being expanded in the archive.
tar --xattrs --sparse --no-selinux --exclude './OBJECTS/*' \
    --exclude './CATALOGS' -C/mnt/mdt -cf backup.tar .

# Restore
mount -t ldiskfs -ouser_xattr /dev/sda /mnt/mdt
tar -C/mnt/mdt -xf backup.tar
# (be afraid if this command produces no output)
getfattr -d -m ".*" -R /mnt/mdt | grep trusted.lov | more
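
If the check is to run unattended, one option is to count the trusted.lov entries instead of paging through them. This is only a sketch: `count_lov` is a made-up helper that reads `getfattr -d` output on stdin, and the mount point in the example is the one used above.

```shell
#!/bin/sh
# count_lov: read "getfattr -d" output on stdin and print how many
# trusted.lov attributes it contains (0 means the striping EAs are missing).
count_lov() {
    # grep exits 1 when nothing matches; treat that as success with count 0.
    grep -c '^trusted\.lov=' || [ $? -eq 1 ]
}

# Example (run as root so the trusted.* attributes are visible):
#   n=$(getfattr -d -m '.*' -R -P /mnt/mdt 2>/dev/null | count_lov)
#   [ "$n" -gt 0 ] || echo "no trusted.lov attributes were restored" >&2
```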


Ramiro Alba Queipo

Jun 19, 2009, 1:10:48 PM
to Jim Garlick, Andreas Dilger, Cliff White, Daniel Kulinski, lustre-...@lists.lustre.org, Sheila Barthel
Hi all,

Thank you very much to all the people in this thread. You have been
really helpful

Cheers

