
Spares not marked faulty on failure.


lack

Oct 17, 2005, 3:09:49 PM
I have a system with some hotplugging support for my SATA card. I've
found an interesting bug in MD. Or at least I think it's a bug.

If I start with a RAID5 array with 3 drives and one spare:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
md0 : active raid5 sdc[2] sdd[3] sdb[1] sda[0]
10485760 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

and then hot-unplug the spare, every read or write to the array from
then on puts this in the dmesg log:

md: write_disk_sb failed for device sdd
md: errors occurred during superblock update, repeating
scsi3 (0:0): rejecting I/O to dead device
md: write_disk_sb failed for device sdd
md: errors occurred during superblock update, repeating
scsi3 (0:0): rejecting I/O to dead device
md: write_disk_sb failed for device sdd
md: errors occurred during superblock update, repeating
scsi3 (0:0): rejecting I/O to dead device
md: write_disk_sb failed for device sdd
md: errors occurred during superblock update, repeating
scsi3 (0:0): rejecting I/O to dead device
md: write_disk_sb failed for device sdd
md: excessive errors occurred during superblock update, exiting

I would think that this should cause the drive to be marked as
'failed', but apparently that doesn't happen to spares. Once this
spare is actually pulled into the array, it is quickly detected that it
has failed and properly dealt with, but wouldn't it be better to have
it failed earlier instead?

--
Jim Ramsay

cyber

Nov 6, 2005, 3:01:24 PM
----

Jim,

This should help on 2.6.11:

--- linux.orig/drivers/md/md.c 2005-02-13 12:16:24.000000000 -0800
+++ linux/drivers/md/md.c 2005-11-03 15:59:50.934626576 -0800
@@ -1288,17 +1288,25 @@


dprintk("%s ", bdevname(rdev->bdev,b));
if (!rdev->faulty) {
- err += write_disk_sb(rdev);
+ int ret;
+ ret = write_disk_sb(rdev);
+ if (ret) {
+ printk(KERN_ERR "md: have failed device.\n");
+ md_error(mddev, rdev);
+ printk(KERN_ERR "md: device marked as failed.\n");
+ err += ret;
+ }
} else
dprintk(")\n");
if (!err && mddev->level == LEVEL_MULTIPATH)
/* only need to write one superblock... */
break;
}
+
if (err) {
if (--count) {
printk(KERN_ERR "md: errors occurred during superblock"
- " update, repeating\n");
+ " update!\n");
goto repeat;
}
printk(KERN_ERR \
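
To illustrate why marking the device on the first failed superblock write stops the retry storm, here is a minimal userspace sketch of the retry loop. It is a model only, not kernel code: struct model_dev, model_write_sb(), model_update_sb() and the max_tries budget are all hypothetical names and values made up for this example. Only the overall shape (skip faulty devices, sum the errors, repeat while errors remain and the budget is not used up) is taken from the patch context above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an md member device (not the kernel's rdev). */
struct model_dev {
    const char *name;
    bool dead;    /* hardware is gone: every superblock write fails */
    bool faulty;  /* the array has marked this device failed and skips it */
};

/* Models write_disk_sb(): returns nonzero when the write fails. */
static int model_write_sb(const struct model_dev *d)
{
    return d->dead ? 1 : 0;
}

/*
 * Models the superblock update loop and returns how many passes were made
 * over the device list. max_tries is an assumed retry budget.
 * With mark_faulty_on_error == false this mimics the unpatched behaviour;
 * with true it mimics the patched behaviour (device marked on first failure).
 */
static int model_update_sb(struct model_dev *devs, int n, int max_tries,
                           bool mark_faulty_on_error)
{
    int passes = 0;
    int count = max_tries;
    int err;

    do {
        err = 0;
        passes++;
        for (int i = 0; i < n; i++) {
            if (devs[i].faulty)
                continue;               /* faulty devices are not written */
            int ret = model_write_sb(&devs[i]);
            if (ret) {
                fprintf(stderr, "model: write failed for %s\n", devs[i].name);
                if (mark_faulty_on_error)
                    devs[i].faulty = true;  /* stands in for md_error() */
                err += ret;
            }
        }
    } while (err && --count);           /* repeat until clean or out of tries */

    return passes;
}
```

With three healthy devices plus a dead spare and a budget of 5, the unpatched variant makes all 5 passes and never marks the spare, which matches the repeating log in the original report. The patched variant makes 2 passes: the first fails and marks the spare, the second skips it and completes cleanly.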


Cheers,
D.
