All of mount point in lost+found, is there any way I'm not screwed?


Doug Freyburger

Jan 22, 2012, 6:03:09 AM
Folks,

One of my systems has a crappy HBA. Every time it reboots I have to
"vgchange -a n vgEXT", "vgexport vgEXT", "vgimport vgEXT",
"vgchange -a y vgEXT" then run fsck against its logical volumes before
mounting. Swapping the HBA cards did not help.

Problem is the mount point has several million files of metadata
associated with a large Oracle database. Backups are so slow they
haven't completed in a month.

Usually the fsck's complete okay. This time the host suffered a memory
module failure and when it booted the fsck moved all of the files to
lost+found. There are 353 directories there. I can only reliably guess
that the bottom one in "ls -ltr | tail" is the top level as its subdirs
match. After that guessing becomes very unreliable. There are 64K+
regular files in lost+found so guessing is out of the question.

Let me see if I have this right. In ext3 a directory is nothing but a
mapping of names to inode numbers. All of the file metadata is in the
inode not the directory. It's a detail that directories are b-trees not
linear lists. If fsck puts a file in lost+found that means the
directory it was in was corrupted. If it were possible to restore the
name fsck would have. Therefore it's not possible in any known sense.

Conceptually here’s how I have restored from lost+found in the past
with smaller numbers of files there. There is one enormous assumption
at the front.

Option one. Have a recursive “ls” output that includes the inode
numbers. Write a script that uses those inode numbers to move the
files back to the names they used to have but no longer do have.

Option two. Have a recursive “ls” output that has file sizes but no
inode numbers. Restore all files that have unambiguous sizes then
start guessing based on contents.

Option three. Have no recursive “ls” output. Start guessing.
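
A minimal sketch of option one in Python, assuming the saved listing has
one "<inode> <path>" pair per line (the sort of thing GNU
find -printf '%i %p\n' would produce) and that fsck named the lost+found
entries "#<inode>", as e2fsck normally does. The function names and the
listing format are illustrative assumptions, not an existing tool. It only
builds a move plan; nothing is renamed.

```python
def parse_inode_listing(lines):
    """Map inode number -> original path from '<inode> <path>' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the saved listing
        inode_text, path = line.split(None, 1)
        mapping[int(inode_text)] = path
    return mapping


def plan_moves(lost_found_names, inode_to_path):
    """Pair each lost+found entry with its old path, where one is known.

    Assumes entries are named '#<inode>'. Returns (moves, unmatched).
    """
    moves, unmatched = [], []
    for name in lost_found_names:
        inode_text = name.lstrip("#")
        if inode_text.isdigit() and int(inode_text) in inode_to_path:
            moves.append((name, inode_to_path[int(inode_text)]))
        else:
            unmatched.append(name)
    return moves, unmatched
```

Each (name, path) pair in moves could then be renamed once the parent
directories exist again. Anything in unmatched is back to option three.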

I believe I'm in situation 3 right now. With a known partial backup but
not a known full backup.

philo

Jan 22, 2012, 10:23:39 AM
On 01/22/2012 05:03 AM, Doug Freyburger wrote:
> Folks,
>
> One of my systems has a crappy HBA. Every time it reboots I have to
> "vgchange -a n vgEXT", "vgexport vgEXT", "vgimport vgEXT",
> "vgchange -a y vgEXT" then run fsck against its logical volumes before
> mounting. Swapping the HBA cards did not help.
>
> Problem is the mount point has several million files of metadata
> associated with a large Oracle database. Backups are so slow they
> haven't completed in a month.
>



Even a *large* data base could not possibly have taken over a month to
back up.
You should have fixed the problem when it first became evident.

Since you have been through this before and apparently had success,
hopefully you will be able to pull it off again.

When you are all done it will be time for you to fix the problem that
caused it.



<snip>

unruh

Jan 22, 2012, 1:03:46 PM
On 2012-01-22, Doug Freyburger <dfre...@yahoo.com> wrote:
> Folks,
>
> One of my systems has a crappy HBA. Every time it reboots I have to
> "vgchange -a n vgEXT", "vgexport vgEXT", "vgimport vgEXT",
> "vgchange -a y vgEXT" then run fsck against its logical volumes before
> mounting. Swapping the HBA cards did not help.
>
> Problem is the mount point has several million files of metadata
> associated with a large Oracle database. Backups are so slow they
> haven't completed in a month.

And now you discover what backups are for! You act as if backups were
some process which was an annoyance and if it did not finish, goody,
that takes up less space and time. Bizarre. How in the world can you
have backups that "haven't completed in a month"?

Note that one primary backup should, for everyone, be a mirror disk on
another computer. That is NOT slow. You can backup terabytes of data or
more a day. Make sure it is on another computer, so if your current one
goes up in smoke, the backup will not be lost as well. Then use tape or
whatever.


>
> Usually the fsck's complete okay. This time the host suffered a memory
> module failure and when it booted the fsck moved all of the files to
> lost+found. There are 353 directories there. I can only reliably guess
> that the bottom one in "ls -ltr | tail" is the top level as its subdirs
> match. After that guessing becomes very unreliable. There are 64K+
> regular files in lost+found so guessing is out of the question.
>
> Let me see if I have this right. In ext3 a directory is nothing but a
> mapping of names to inode numbers. All of the file metadata is in the
> inode not the directory. It's a detail that directories are b-trees not
> linear lists. If fsck puts a file in lost+found that means the
> directory it was in was corrupted. If it were possible to restore the
> name fsck would have. Therefore it's not possible in any known sense.
>
> Conceptually here’s how I have restored from lost+found in the past
> with smaller numbers of files there. There is one enormous assumption
> at the front.
>
> Option one. Have a recursive “ls” output that includes the inode
> numbers. Write a script that uses those inode numbers to move the
> files back to the names they used to have but no longer do have.

And how would you know what names they used to have?

>
> Option two. Have a recursive “ls” output that has file sizes but no
> inode numbers. Restore all files that have unambiguous sizes then
> start guessing based on contents.

And how do you know what file sizes they used to have?

>
> Option three. Have no recursive “ls” output. Start guessing.
>
> I believe I'm in situation 3 right now. With a known partial backup but
> not a known full backup.

Yup.

Doug Freyburger

Jan 23, 2012, 12:55:55 PM
philo wrote:
> Doug Freyburger wrote:
>
>> One of my systems has a crappy HBA. Every time it reboots I have to
>> "vgchange -a n vgEXT", "vgexport vgEXT", "vgimport vgEXT",
>> "vgchange -a y vgEXT" then run fsck against its logical volumes before
>> mounting. Swapping the HBA cards did not help.
>
>> Problem is the mount point has several million files of metadata
>> associated with a large Oracle database. Backups are so slow they
>> haven't completed in a month.
>
> Even a *large* data base could not possibly have taken over a month to
> back up.

Unless the client elects to purchase a backup package that I'd never
heard of, not one of the ones I recommended.

> You should have fixed the problem when it first became evident.

Unless the client chose to have their backups handled by their Windows
admins who never did so. Unless the client chose to not spend the
billable hours to have us fix the problem.

> Since you have been through this before and apparently had success,
> hopefully you will be able to pull it off again.

An algorithm below might work. Else there are a couple of other
possibilities that are being discussed.

> When you are all done it will be time for you to fix the problem that
> caused it.

This client has a long history of making expensive decisions. Sometimes
they are penny wise, pound foolish. Sometimes they sound buzz-word-y
and cool sounding but are really no more than more expensive and less
reliable. Once again I will suggest moving to a backup product that
actually works. Once again I'll send in the bill for a lot of hours,
one way or the other.

One lesson learned - I will start to produce a weekly inode to name
listing of every file on every server I support. And bill for writing
the script. Very profitable given the number of clients this event at
one of them triggered.
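
A rough sketch of what such a listing script could look like in Python.
The function names and the tab-separated output format are my own
assumptions, not the script that was actually written. The output should
live on a different filesystem from the one being listed, otherwise it
ends up in lost+found with everything else.

```python
import os


def inode_listing(root):
    """Yield (inode, path) for every directory and file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            yield st.st_ino, path


def write_listing(root, out_path):
    """Write '<inode><TAB><path>' lines to out_path; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8",
              errors="surrogateescape") as out:
        for inode, path in inode_listing(root):
            out.write(f"{inode}\t{path}\n")
            count += 1
    return count
```

Run weekly from cron against each mount point, this gives the inode to
name mapping that option one needs.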

On the HBA problems -

Part of me observes this and continues to recommend professional
systems for my clients. Sun, HP and IBM commercial servers don't do
this. Commodity hardware does do this and many other sorts of failures
that don't happen on Solaris, HP-UX and AIX. Heck, today I'm doing a
replacement of a mirrored boot drive on an AIX box. Damn thing has
continued to keep the mirror intact for a month using up more and more
reassignment blocks and just plain kept running even on a failed drive.
Ah the standard AIX experience. AIX may be unpleasant to work on but
you can't kill it with a plasma torch.

Part of me observes my bill for support hours and figures it's
profitable for me, for my company, and much of the debugging is fun
because it's challenging.

On the other options -

There are 4 restore options possible. Another consultant on my team
has been working all day on calculating recovery using data available
in combined lost+found and Avamar log files. A contractor from another
firm has been working all day on getting a LUN that contains the
original data in HP-UX format mounted to a legacy HP-UX host to use
rsync. A local Windows admin at the client has been working all day on
assembling partial Avamar backups into a full Avamar restore. The
fourth possibility is to reinstall Legato Networker on an HP-UX host
and import all of the expired tapes back in.

The Legato approach would definitely work but it might take 3 weeks so
it has not yet been addressed. To start we’d have to install HP-UX
11.23 on another legacy HP-UX for it to be able to support the tape
silo, then install Networker and so on. The Legato approach would also
be extremely profitable for me because all of the hours would be
separately billed not a part of the standard fixed price contract.

All in all if that LUN to HP-UX works out we'll go with it and the
rsync. Run mkfs on the logical volume and start copying. Easy peasy.

1) If the LUN is available then the rsync will take about 2-3 days. Big if.

2) If the Avamar partial backups can be assembled into a full restore
that will take 2-3 days. No guarantee there are enough partial backups
to make a full.

3) Below is an algorithm that I have designed to try to restore directly.

4) If we end up needing to go the Legato Networker route that will take
on the order of 3 weeks. Unfortunately this is the only option that's
certain to work. Because of the length of time and cost in task pack
hours involved all three previous options will be exhausted before
falling back.

Note that there are 535 directories in lost+found. And I counted the
digits wrong for regular files. There are about 650,000 out of 2+
million total files on the mount point. All regular files on the mount
point are *.tif images.

This client needs to purchase Documentum!

The algorithm that is being attempted -

One of our DBAs is now developing the Oracle query needed to calculate
the full path to any one *.tif file. The query is needed for the first
loop in the algorithm:

For each directory under /mnt/lost+found do
    Find the name of one *.tif file under it and write down its exact path under lost+found.
    Find that *.tif file in Oracle and calculate what its complete directory path should be.
    Build a table of inode numbered directories to calculated full directory paths.
Done

Topological/alphabetical sort the table of directory mapping so they are
processed shallow first.

For each directory in the table do
    See if the parent directory needs to be created. Note which ones don’t exist because those are suspects for the regular files under /mnt/lost+found.
    Rename the directory from lost+found to its correct place in the tree.
Done

For each directory in the table do
    Compare the list of regular files in it to the listing from the backups.
    If there are any missing directories note them down as incomplete ones.
    Restore the specific files from backup, counting as we go. Should be 535 small restore jobs.
Done

If the restore count equals the number of regular files in /mnt/lost+found we are done.

Else look at restoring from the list marked as incomplete in the
previous loop.
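
The shallow-first ordering and the missing-parent check could be sketched
like this in Python. It assumes the table has already been built as a
mapping from lost+found directory name to calculated absolute path, and
that the set of directories still present on the mount point is known.
The function name and both inputs are hypothetical, and it only plans the
renames without touching the filesystem.

```python
import posixpath


def plan_directory_renames(dir_to_target, existing):
    """Order renames shallow first and list parents that must be created.

    dir_to_target: {'#<inode>': '/abs/target/path', ...} (assumed input)
    existing: set of directories already present on the mount point
    Returns (ordered_renames, missing_parents).
    """
    # Fewer path separators means shallower; ties break alphabetically.
    ordered = sorted(dir_to_target.items(),
                     key=lambda item: (item[1].count("/"), item[1]))
    known = set(existing) | set(dir_to_target.values())
    missing = []
    for _, target in ordered:
        parent = posixpath.dirname(target)
        # Walk upward until reaching a directory that exists or will exist.
        while parent not in known and parent not in ("", "/"):
            missing.append(parent)
            known.add(parent)
            parent = posixpath.dirname(parent)
    return ordered, sorted(missing)
```

The directories reported as missing are the ones whose own entries did
not survive, so their regular files are the likely loose ones sitting
directly in lost+found.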

Doug Freyburger

Jan 23, 2012, 1:20:37 PM
unruh wrote:
> Doug Freyburger <dfre...@yahoo.com> wrote:
>
>> Problem is the mount point has several million files of metadata
>> associated with a large Oracle database. Backups are so slow they
>> haven't completed in a month.
>
> And now you discover what backups are for! You act as if backups were
> some process which was an annoyance and if it did not finish, goody,
> that takes up less space and time. Bizarre. How in the world can you
> have backups that "haven't completed in month"?

It was the client's decision to switch from Legato Networker supported
by me to EMC Avamar supported by their Windows team. Shrug. Sometimes
I just have to accept their decisions and send them my bill. I
recommend. They make the final decision. If that decision ends up very
profitable for me in the long run then I work hard to puzzle out issues
like this. That includes asking around if anyone else has invented
tricks. In another post I listed an algorithm that might work.

> Note that one primary backup should, for everyone, be a mirror disk on
> another computer. That is NOT slow. You can backup terabytes of data or
> more a day. Make sure it is on another computer, so if your current one
> goes up in smoke, the backup will not be lost as well. Then use tape or
> whatever.

First thing I checked. There's a daily BCV that gets mounted on another
host. We weren't notified of the problem for two days. The copy has
now been equally corrupt for two days (today makes three days. May as
well keep copies of our intermediate results as we work). Most days
this would have been noticed the first day and I could do a reverse
BCV by hand. This weekend there happened to be a huge scheduled
production outage to apply Oracle patches and everyone was too busy to
notice it on time. Primary restore that would have handled it in a
couple of hours, too late to use.

>> Conceptually here’s how I have restored from lost+found in the past
>> with smaller numbers of files there. There is one enormous assumption
>> at the front.
>
>> Option one. Have a recursive “ls” output that includes the inode
>> numbers. Write a script that uses those inode numbers to move the
>> files back to the names they used to have but no longer do have.
>
> And how would you know what names they used to have?

That is in fact my question. I'm aware that the designers of fsck
thought it to be impossible or it would have been implemented inside
fsck. But since I've recovered from this sort of problem in the past I
wondered if anyone else ever has and how they did it.

>> Option two. Have a recursive “ls” output that has file sizes but no
>> inode numbers. Restore all files that have unambiguous sizes then
>> start guessing based on contents.
>
> And how do you know what file sizes they used to have?

Even partial backups have log files. Log files list file sizes. If log
files include inode numbers that's even more direct as the files under
lost+found use those inode numbers.
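
A sketch of the size-matching idea in Python, assuming the backup log has
already been parsed into (path, size) pairs and lost+found has been listed
as (name, size) pairs. The function name and input shapes are
illustrative; how to parse a particular product's log is not covered here.
A size that appears exactly once on both sides is treated as a match, and
everything else is left for guessing based on contents.

```python
from collections import defaultdict


def match_by_unique_size(backup_log, lost_found):
    """Match lost+found entries to logged paths by unambiguous size.

    backup_log: iterable of (path, size) taken from a backup log
    lost_found: iterable of (name, size) for entries in lost+found
    Returns (matches, ambiguous).
    """
    paths_by_size = defaultdict(list)
    for path, size in backup_log:
        paths_by_size[size].append(path)
    names_by_size = defaultdict(list)
    for name, size in lost_found:
        names_by_size[size].append(name)

    matches, ambiguous = {}, []
    for size, names in names_by_size.items():
        paths = paths_by_size.get(size, [])
        if len(names) == 1 and len(paths) == 1:
            matches[names[0]] = paths[0]  # exactly one candidate each way
        else:
            ambiguous.extend(names)  # no candidate, or more than one
    return matches, sorted(ambiguous)
```

A size match is only a strong hint. A file rewritten since the log was
taken can change size, so a spot check of contents is still worth doing.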