Filesystem missing when trying to recover RAID

Chewey

Oct 20, 2012, 5:36:35 AM
to al...@googlegroups.com
I have been running ALT-F in RAID1 for some time (works fantastically - thanks!) but have not had to swap out drives before now.  As one drive had its health status as "failed" I thought I'd take the opportunity to upgrade from two 1TB to two 2TB drives. The procedure I used was as follows.
 
1) In the RAID component operations I failed and then removed the right drive (the one which was unhealthy)
2) I inserted a new 2TB drive, partitioned it (swap partition followed by RAID partition), formatted it as ext4 and then added it to the RAID array.
3) The RAID array successfully rebuilt and all was well.
 
This is where it went wrong...
 
4) I then failed and removed the left drive.
5) I inserted a spare (but not new) 2TB drive and partitioned it as for the right drive.  I didn't format it.
6) I then (I know, this is bad) created a new RAID array.
 
At the end of that, I couldn't access anything.
 
Thinking all was not lost, I reinserted the original two 1TB drives.  However the RAID status is inactive and it is not possible to mount md0.
 
I would be very grateful if anyone can provide me with steps to try to recover my original data.  I do have a backup of all the critical data but, for various reasons, recovering it is going to be extremely time consuming.  I have not amended the partitions on any of the drives post step 6.  I have tried working back through various combinations of drives to see if I could find a point from which I could rebuild the array but this hasn't worked.
 
I'm afraid I can only use the web interface.  I've not (yet) learnt anything more sophisticated.
 
Thanks in advance!

Chewey

Oct 20, 2012, 7:05:48 AM
to al...@googlegroups.com
In case it helps, I've been experimenting with ssh.  Completely new to this so the below may or may not be of use. It relates to the two original 1TB drives when reinserted in the same bays as they were originally. Thanks!
 
 
# ls -l /
total 28
-rw-r--r--    1 root     root         17987 Sep 24  2010 COPYING
-rw-r--r--    1 root     root           245 Sep 24  2010 LICENCE
lrwxrwxrwx    1 root     root            16 Oct 20 09:12 Public -> /mnt/SDA4/Public
drwxr-xr-x    2 root     root          1520 Feb 21  2012 bin
drwxr-xr-x    2 root     root            40 Feb 21  2012 boot
drwxr-xr-x    4 root     root          1540 Oct 20 09:12 dev
drwxr-xr-x    9 root     root          1040 Oct 20 10:25 etc
lrwxrwxrwx    1 root     root            15 Oct 20 09:12 home -> /mnt/SDA4/Users
-rwxr-xr-x    1 root     root          2440 Dec  6  2010 init
drwxr-xr-x    2 root     root           500 Feb 21  2012 lib
drwxr-xr-x    4 root     root            80 Oct 20 09:12 mnt
dr-xr-xr-x   49 root     root             0 Jan  1  1970 proc
drwxr-x---    2 root     root           100 Oct 20 10:25 root
drwxrwxrwt    4 root     root           100 Oct 20 09:12 rootmnt
drwxr-xr-x    2 root     root          1320 Oct 20 09:12 sbin
drwxr-xr-x   11 root     root             0 Oct 20 09:12 sys
drwxrwxrwt    6 root     root           180 Oct 20 10:25 tmp
drwxrwxrwt    2 root     root            40 Oct 20 09:12 tmproot
drwxr-xr-x   10 root     root            60 Feb 21  2012 usr
drwxr-xr-x    4 root     root           180 Oct 20 09:12 var
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
 
# cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : inactive sdb2[1](S) sda2[0](S)
      1948266624 blocks
unused devices: <none>
#
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# mdadm --examine /dev/sda2
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : d706218a:f87b1974:b5530359:8733c6f8
  Creation Time : Wed Apr  8 19:49:06 2009
     Raid Level : raid1
  Used Dev Size : 974133312 (929.01 GiB 997.51 GB)
     Array Size : 974133312 (929.01 GiB 997.51 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Fri Oct 19 17:26:47 2012
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 8a1e5e04 - correct
         Events : 4075875

      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
# mdadm --examine /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : d706218a:f87b1974:b5530359:8733c6f8
  Creation Time : Wed Apr  8 19:49:06 2009
     Raid Level : raid1
  Used Dev Size : 974133312 (929.01 GiB 997.51 GB)
     Array Size : 974133312 (929.01 GiB 997.51 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Fri Oct 19 23:07:37 2012
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 8a1ec594 - correct
         Events : 4078897

      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
# blkid
/dev/mtdblock0: TYPE="minix"
/dev/sda4: LABEL="SDA4" UUID="9f134334-9044-470b-a1e4-ec0325e56dfd" TYPE="ext4"
/dev/sdb4: LABEL="SDB4" UUID="187c7b68-17ba-4ea7-8bb6-65193d1ed019" TYPE="ext4"
/dev/loop0: TYPE="squashfs"
/dev/mtdblock1: TYPE="minix"
/dev/sda1: TYPE="swap"
/dev/sda2: UUID="8a2106d7-7419-7bf8-5903-53b5f8c63387" TYPE="mdraid"
/dev/sdb1: TYPE="swap"
/dev/sdb2: UUID="8a2106d7-7419-7bf8-5903-53b5f8c63387" TYPE="mdraid"
#
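
One detail in the two `mdadm --examine` dumps above is that the `Events` counters differ (4075875 on sda2, 4078897 on sdb2), as do the update times. A lower count generally means that member stopped being updated earlier, which would be consistent with one drive having been failed and removed before the other. Whether this is what keeps the array inactive is not established in this thread. Below is a minimal sketch for pulling that field out of `--examine` output so the members can be compared; the helper name `events_of` is hypothetical and not part of mdadm or Alt-F.

```shell
#!/bin/sh
# Hypothetical helper: print the "Events" counter found in `mdadm --examine`
# output supplied on standard input.
events_of() {
    awk -F' : ' '/^ *Events/ { gsub(/ /, "", $2); print $2; exit }'
}

# Sample input trimmed from the listing above.
sda2_events=$(printf '       Checksum : 8a1e5e04 - correct\n         Events : 4075875\n' | events_of)
sdb2_events=$(printf '       Checksum : 8a1ec594 - correct\n         Events : 4078897\n' | events_of)

if [ "$sda2_events" -lt "$sdb2_events" ]; then
    echo "sda2 is behind sdb2 by $((sdb2_events - sda2_events)) events"
fi
```

On the box itself the same helper could be fed directly, for example `mdadm --examine /dev/sda2 | events_of`.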
 

Joao Cardoso

Oct 22, 2012, 3:23:34 PM
to al...@googlegroups.com


On Saturday, October 20, 2012 10:36:35 AM UTC+1, Chewey wrote:
I have been running ALT-F in RAID1 for some time (works fantastically - thanks!)

Thanks.
Alt-F 0.1RC2 or 0.1RC1? Flashed, I guess.
 
but have not had to swap out drives before now.  As one drive had its health status as "failed" I thought I'd take the opportunity to upgrade from two 1TB to two 2TB drives. The procedure I used was as follows.
 
1) In the RAID component operations I failed and then removed the right drive (the one which was unhealthy)

and removed it from the right bay. Let's call it "failed 1TB" drive.
 
2) I inserted a new 2TB drive, partitioned it (swap partition followed by RAID partition), formatted it as ext4

No need for a format here

and then added it to the RAID array. 
3) The RAID array successfully rebuilt and all was well.
 
This is where it went wrong...

At this point you have a 1TB drive in the left bay and a 2TB drive in the right bay.

4) I then failed and removed the left drive.

And removed it from the left bay. Let's call it the "good 1TB" drive. It has all your data intact. I hope you are keeping it safe in a drawer.
 
5) I inserted a spare (but not new) 2TB drive and partitioned it as for the right drive.  I didn't format it.

In 4 you removed the left drive, and in 5 you inserted a right drive? That doesn't fit...
Or did you mean that you removed the left drive from its bay and inserted in the left bay a new drive intended for the right bay?

6) I then (I know, this is bad) created a new RAID array.

Thus the data is lost. On which drives is the question. Were both disks 2TB?

To avoid confusion I always stick a paper label to the disks when I add/eject/partition disks. And always save the backup drive in a drawer, I don't keep it on the desktop!


At the end of that, I couldn't access anything.
 
Thinking all was not lost, I reinserted the original two 1TB drives.  However the RAID status is inactive and it is not possible to mount md0.

From your next post, everything looks fine on the 1TB disks; I don't understand why the RAID is not started.
Doesn't a RAID device appear in the lower section of the RAID web page, with a "Start" button on it? What happens if you hit it?
Or does only a "Stop" button appear (this is a bug in RC2)?

In any case, try at the command line "mdadm --assemble /dev/md0". Error? What does it say?

Do you know which is the "good" and which is the "failed" 1TB disk? Are they the same brand/model? It is difficult to guess which is which...

In any case, I would recommend that you use only *one* disk at a time in the box, and see if your data is still on either of them. With only one disk in the box, the RAID should start in "degraded" mode, but your data should still be available.
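
For readers who prefer the command line, the one-disk-at-a-time check could look roughly like the sketch below. It is not what the Alt-F web page runs: the device names (`/dev/md0`, `/dev/sda2`), the mount point `/mnt/md0`, the function names and the `DRY_RUN` switch are all assumptions for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: start a RAID1 array from a single surviving member and mount it
# read-only so nothing on the disk is modified while checking the data.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

try_single_member() {
    md=$1        # array device, e.g. /dev/md0 (assumed name)
    member=$2    # the one RAID partition present, e.g. /dev/sda2 (assumed)
    mnt=$3       # hypothetical mount point

    run mdadm --stop "$md"                       # release the inactive array first
    run mdadm --assemble --run "$md" "$member"   # --run starts it even though degraded
    run mkdir -p "$mnt"
    run mount -o ro "$md" "$mnt"                 # read-only: look, do not touch
}

try_single_member /dev/md0 /dev/sda2 /mnt/md0
```

Mounting read-only is a deliberate choice in this sketch: it lets you confirm the data is present on each disk in turn without changing either of them.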
 
Please keep reporting, I'm now more available.

Chewey

Oct 22, 2012, 5:03:18 PM
to al...@googlegroups.com
Joao
 
Thanks for your response.  Very grateful.  I've added comments below (in blue) out of interest although I'm hoping (after some lengthy experimentation with Putty - learnt lots!) I've got it sorted.  For reasons I still don't understand (but I suspect it was something I did in steps 1-6 below) it was not possible to mount either of the 1TB drives through Alt-F.  Eventually I gave up and inserted and reformatted both 2TB drives and created a new RAID 1 array.  As a last resort, I then attached the Failed 1TB drive via USB.  Still no joy through Alt-F but, through Putty, I could mount the drive and then Alt-F finally recognised it.  As we speak, all the data from the 1TB drive is being copied onto the new 2TB RAID array.  It is looking like it could take 2-3 days but was still cause of a minor celebration!!  (I'm hoping Failed 1TB won't actually fail in this period but, in theory, I still have Good 1TB to try again if it does.)
 
If the copy isn't successful, and if you don't mind, I'll be back in touch!
 
One final question.  I've noticed that my users folder is now located on the USB drive (Failed 1TB).  Do I have to move this to the new RAID array before disconnecting the USB? If so, do I move it to md0 or should it go on a separate partition?  The latter could be an issue as I have 3 partitions: swap, md0 (=sda2/sdb2) and sda3/sdb3.  The third partition is tiny and was only created because Alt-F presumably does something clever to make sure the main partition isn't an inconvenient size.
 
Keep up the good work!  Now I've used it a bit more, I'm even more impressed with Alt-F.
 
Regards
Saul
 
 

On Monday, October 22, 2012 8:23:34 PM UTC+1, Joao Cardoso wrote:


On Saturday, October 20, 2012 10:36:35 AM UTC+1, Chewey wrote:
I have been running ALT-F in RAID1 for some time (works fantastically - thanks!)

Thanks.
Alt-F 0.1RC2 or 0.1RC1? Flashed, I guess.
[Chewey] Yes, RC2.
 
but have not had to swap out drives before now.  As one drive had its health status as "failed" I thought I'd take the opportunity to upgrade from two 1TB to two 2TB drives. The procedure I used was as follows.
 
1) In the RAID component operations I failed and then removed the right drive (the one which was unhealthy)

and removed it from the right bay. Let's call it "failed 1TB" drive.
 
2) I inserted a new 2TB drive, partitioned it (swap partition followed by RAID partition), formatted it as ext4

No need for a format here

and then added it to the RAID array. 
3) The RAID array successfully rebuilt and all was well.
 
This is where it went wrong...

At this point you have a 1TB drive in the left bay and a 2TB drive in the right bay.
[Chewey] Correct.  Let's call the 2TB drive "New 2TB".

4) I then failed and removed the left drive.

And removed it from the left bay.
[Chewey] Correct.
Let's call it the "good 1TB" drive. It has all your data intact. I hope you are keeping it safe in a drawer.
 
5) I inserted a spare (but not new) 2TB drive and partitioned it as for the right drive.  I didn't format it.
[Chewey] Let's call this "Spare 2TB".

In 4 you removed the left drive, and in 5 you inserted a right drive? That doesn't fit...
Or did you mean that you removed the left drive from its bay and inserted in the left bay a new drive intended for the right bay?
[Chewey] Sorry, slightly poor use of English.  What I meant was I inserted the spare 2TB drive in the left bay and then partitioned it in exactly the same way as the drive in the right bay.  I did it manually rather than copying the partition table but I don't expect that matters.

6) I then (I know, this is bad) created a new RAID array.
[Chewey] The interesting thing about this, which I forgot to mention before, was that there was no option to rebuild the array.  Blame it on the time of night but I stupidly thought creating a new array would have the same effect as the array would be based off New 2TB.

Thus the data is lost. On which drives is the question. Were both disks 2TB?
[Chewey] Yes, both disks were 2TB. What I was hoping to achieve was that by failing Good 1TB I could remove it and replace it with Spare 2TB.  If I added Spare 2TB to the array, I was hoping the array would rebuild, copying all my data from New 2TB to Spare 2TB.

To avoid confusion I always stick a paper label to the disks when I add/eject/partition disks. And always save the backup drive in a drawer, I don't keep it on the desktop!
[Chewey] Good tip!


At the end of that, I couldn't access anything.
 
Thinking all was not lost, I reinserted the original two 1TB drives.  However the RAID status is inactive and it is not possible to mount md0.

From your next post, everything looks fine on the 1TB disks, I don't understand why the RAID is not started.
Doesn't a RAID device appear in the lower section of the RAID web page, with a "Start" button on it? What happens if you hit it?
Or does only a "Stop" button appear (this is a bug in RC2)?
[Chewey] There was only a stop button.  In fact, the only other option I had was to "Destroy" the RAID and that sounded like a bad choice... (I understand that this should not destroy the data but I was still hopeful I could get the array restarted.)

In any case, try at the command line "mdadm --assemble /dev/md0". Error? What does it say?
[Chewey] This is not a command I tried.  Not being confident with mdadm I was worried the assembly would result in the same problem I had from step (6).

Do you know which is the "good" and which is the "failed" 1TB disk? Are they the same brand/model? It is difficult to guess which is which...
[Chewey] Yes, I can tell them apart although, with hindsight, I admit a label would have been sensible on otherwise identical disks!

In any case, I would recommend that you use only *one* disk at a time in the box, and see if your data is still on either of them. With only one disk in the box, the RAID should start in "degraded" mode, but your data should still be available.
[Chewey] I did try this.  The box recognised md0 which was not degraded (weird, given there was only one drive) but refused to mount md0.

Chewey

Oct 22, 2012, 5:11:22 PM
to al...@googlegroups.com
Just to add... on the "RAID Creation and Maintenance" page of Alt-F I've just noticed that sdc2 (which is a partition on the USB drive) has become one of the components of the RAID array.  I definitely did not add sdc2 to the array so that's odd.  The array was created with the 2x 2TB drives before I connected the USB drive.
 
Once it is done copying the data from sdc2 onto md0, I'll remove sdc2 from the array.  Perhaps that explains why my "users" directory is on sdc2 rather than md0 at the moment. 
 
Anyway, have to go now, but thanks again.  Will post to let you know if it worked in a few days.

Joao Cardoso

Oct 23, 2012, 10:30:20 AM
to al...@googlegroups.com


On Monday, October 22, 2012 10:03:20 PM UTC+1, Chewey wrote:
Joao
 
Thanks for your response.  Very grateful.  I've added comments below (in blue) out of interest although I'm hoping (after some lengthy experimentation with Putty - learnt lots!) I've got it sorted.  For reasons I still don't understand (but I suspect it was something I did in steps 1-6 below) it was not possible to mount either of the 1TB drives through Alt-F.  Eventually I gave up and inserted and reformatted both 2TB drives and created a new RAID 1 array.  As a last resort, I then attached the Failed 1TB drive via USB.  Still no joy through Alt-F but, through Putty, I could mount the drive and then Alt-F finally recognised it.  As we speak, all the data from the 1TB drive is being copied onto the new 2TB RAID array.  It is looking like it could take 2-3 days but was still cause of a minor celebration!!  (I'm hoping Failed 1TB won't actually fail in this period but, in theory, I still have Good 1TB to try again if it does.)

Using USB as you did is a simple way to change both RAID1 disks.
The recipe, for others' reference:

1-power off the box using System->Utilities, remove both RAID1 disks from their bays, save one of them in a drawer (your backup, in case things go wrong), and connect the other to a SATA to USB adapter (cheap), but don't connect it to the box yet.

2-insert the two new disks in the box, power it up and using the Disk Wizard create a new RAID1, that will probably be called md0

3-power up the SATA to USB adapter and disk, wait a couple of seconds and connect it to the box rear USB connector. A new degraded RAID array should appear, probably called md1, with all your data.

4-Using Setup->Folders
   a) select the degraded RAID mountpoint, probably /mnt/md1, and hit the CopyContents button
   b) select the new disks RAID array mountpoint, probably /mnt/md0, and hit the Paste button. Wait. And wait, it is going to take a while, depending on the amount of data to copy.

5-When it finishes copying, power down the box using System->Utilities, unplug the SATA to USB adapter and power up the box.
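
For those working over ssh, step 4 amounts to a recursive copy from one mount point to the other. How Alt-F implements CopyContents/Paste internally is not stated in this thread, so the sketch below is only an approximation using plain `cp -a`; the function name `copy_contents` is hypothetical and the mount points in the usage comment are the "probably" names from the recipe.

```shell
#!/bin/sh
# Sketch: copy everything under one mount point into another, preserving
# permissions, ownership and timestamps.
copy_contents() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # "src/." copies the directory's contents (including dot files),
    # not the directory itself.
    cp -a "$src/." "$dst/"
}

# Usage on the box (not executed here): copy_contents /mnt/md1 /mnt/md0
```

As with the web interface, expect this to run for a long time on a nearly full 1TB disk.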

The "right" RAID way to do things recipe:

0-take note of the RAID components and drive bays, usually sda2/right, sdb2/left
1-Fail, then remove one of the RAID components, then remove its drive (Disk->Utilities->Eject). The array will become degraded but your data is still there and usable.
2-insert the new drive and partition it using the Disk Partitioner (be sure you are using the new disk!). Use a 500MB first swap partition and the remaining space for a RAID partition. Don't create a filesystem on it, it's a waste of time.
3-Using Disk->RAID, add the new disk partition to the existing degraded RAID. A resync should start, during which all data is replicated to the new component. Wait until it finishes.
4-Repeat steps 1, 2 and 3 for the other old drive.

If the new disks are of the same capacity as the older ones, you are done.
If the new disks are bigger, then you have to grow the RAID and filesystem:

5-Using Disk->RAID, select "Enlarge". Wait until it finishes.
6-Using Disk->Filesystem, select "Enlarge". That's it.
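
For reference, the web interface steps above correspond roughly to the following mdadm and resize2fs calls. This is a sketch under assumptions: the array is `/dev/md0`, the members are sda2 and sdb2 as in step 0, the filesystem is ext2/3/4, and whether the Alt-F buttons run exactly these commands is not confirmed here. The script only prints the commands; nothing is executed.

```shell
#!/bin/sh
# Sketch of the mdadm/resize2fs calls that roughly correspond to the web
# interface steps above. Prints the commands only; nothing is executed.
# Device names are assumptions (sda2/sdb2 as in step 0, array /dev/md0).
MD=/dev/md0

swap_member() {
    part=$1
    echo "mdadm $MD --fail $part --remove $part"   # step 1: fail, then remove
    echo "# eject the old disk, fit and partition the new one (step 2)"
    echo "mdadm $MD --add $part"                   # step 3: add; resync starts
    echo "cat /proc/mdstat"                        # watch resync progress
}

grow_array() {
    echo "mdadm --grow $MD --size=max"   # step 5: use the full partition size
    echo "resize2fs $MD"                 # step 6: grow the filesystem to match
}

swap_member /dev/sda2
swap_member /dev/sdb2    # step 4: only after the first resync has finished
grow_array
```

Note that an existing array is extended with `--add` and never with `--create`; creating a new array over the surviving member is what lost the data in the original post.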

This RAID way of doing things is more involved than the USB method, but your data will always be available (in read/write mode!) during the process; this might be important if you are running a business or web server.

It might seem to be slower than the USB method, as two resyncs will happen, but it probably will not be, as transfers between internal disks are faster than transfers from USB to internal disks. Also, when copying from USB to the new RAID1, two copies are in fact being made, as there are two disks to write to (the mirror). Only experimentation will tell for sure.

I dislike recipes, as people might follow them blindly, and if a typo happens you end up with a salty dessert :-)

 
If the copy isn't successful, and if you don't mind, I'll be back in touch!

Of course.
  
One final question.  I've noticed that my users folder is now located on the USB drive (Failed 1TB).

The Users, Public, Backup, Alt-F and ffp folders are set up for use when they are first detected (at power-up or hotplug time).

  Do I have to move this to the new RAID array before disconnecting the USB?

After you finish copying the data, two sets of the above folders will exist, but only the first ones (USB) will be used.

When you eject the USB drive these will stop being used, but the new ones on the RAID, although already there, will not be picked up automatically. You have to either reboot or Stop and then Start the RAID array in order for them to be detected and start being used. You can try doing the latter (after ejecting the USB drive), but because of the Start/Stop button RAID bug in RC2 it might not work and you will have to reboot.
 
If so, do I move it to md0 or should it go on a separate partition?

If you are copying all the data, as you should, you will be copying all the above folders to the RAID (md0)
 
  The latter could be an issue as I have 3 partitions: swap, md0 (=sda2/sdb2) and sda3/sdb3.  The third partition is tiny and was only created because Alt-F presumably does something clever to make sure the main partition isn't an inconvenient size.

Yes.
  
Keep up the good work!  Now I've used it a bit more, I'm even more impressed with Alt-F.
 
Regards
Saul
 
...

From your next post, everything looks fine on the 1TB disks, I don't understand why the RAID is not started.
Doesn't a RAID device appear in the lower section of the RAID web page, with a "Start" button on it? What happens if you hit it?
Or does only a "Stop" button appear (this is a bug in RC2)?
[Chewey] There was only a stop button.  In fact, the only other option I had was to "Destroy" the RAID and that sounded like a bad choice... (I understand that this should not destroy the data but I was still hopeful I could get the array restarted.)

Yes, a bug :-(
 
In any case, try at the command line "mdadm --assemble /dev/md0". Error? What does it say?
[Chewey] This is not a command I tried.  Not being confident with mdadm I was worried the assembly would result in the same problem I had from step (6).

This is what the Start button does: "Assemble the components of a previously created array into an active array".
This is different from "Create", which does what it says: "Create a new array".

Thanks, Joao

Joao Cardoso

Oct 23, 2012, 10:36:08 AM
to al...@googlegroups.com


On Monday, October 22, 2012 10:11:22 PM UTC+1, Chewey wrote:
Just to add... on the "RAID Creation and Maintenance" page of Alt-F I've just noticed that sdc2 (which is a partition on the USB drive) has become one of the components of the RAID array.

This is a reason to worry... What are the RAID components and their colors as shown on the RAID page? If sdc2 is green, no problem: it is being considered a spare component, and will be automatically used if one of the current active components (in black) fails (in red).

If it is a spare (green), you have to remove it from the array before ejecting/unplugging the USB. If it is an active component (in black) I can foresee big troubles.
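
The same information is visible on the command line: in `/proc/mdstat` a spare is listed with `(S)` after its name and a failed component with `(F)`. Note that members of an inactive array are also shown with `(S)`, as in the listing earlier in this thread. The helper below is a hypothetical sketch (the name `component_role` and the sample line are made up for illustration) that reports how a given component is listed.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and report how
# a given component (e.g. sdc2) is listed: spare, faulty, active or absent.
component_role() {
    awk -v dev="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # Components look like "sdc2[2]", optionally followed by (S) or (F).
                if (index($i, dev "[") == 1) {
                    if ($i ~ /\(S\)/)      role = "spare"
                    else if ($i ~ /\(F\)/) role = "faulty"
                    else                   role = "active"
                }
            }
        }
        END { print (role == "" ? "absent" : role) }
    '
}

# Made-up sample resembling an array with two active members and a spare.
sample='md0 : active raid1 sdc2[2](S) sdb2[1] sda2[0]'
echo "sdc2 is listed as: $(printf '%s\n' "$sample" | component_role sdc2)"
```

On the box the live state could be checked with `component_role sdc2 < /proc/mdstat` before unplugging the USB drive.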

 
  I definitely did not add sdc2 to the array so that's odd.  The array was created with the 2x 2TB drives before I connected the USB drive.

So it must be green, let's hope.
 
 
Once it is done copying the data from sdc2 onto md0, I'll remove sdc2 from the array.  Perhaps that explains why my "users" directory is on sdc2 rather than md0 at the moment. 

See the other post.

Chewey

Oct 29, 2012, 4:30:21 PM
to al...@googlegroups.com
Well, it took longer than expected but all the data is there and I now have a 2TB RAID.  Thanks again!