raid1: IO failed after 5 retries
raid1: IO failed after 5 retries
raid1: IO failed after 5 retries
/export/mail: got error 5 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
The funny thing is there's actually no I/O error (at least nothing in dmesg).
Upon reboot, fsck showed nothing unusual, just the usual few unref files and
low block counts.
Any ideas?
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-...@muc.de
What does "raidctl -s raid1" say?
(my guess is it thinks one disk has failed, and the reason for the
above is that it also got an IO error in trying to read the other
disk... In passing that EIO back, softdep wasn't happy, and fell
over..)
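To check Greg's guess quickly, one can filter the status output for any component that is not reported as "optimal". The sketch below uses a hypothetical helper name (raid_bad_components is not a NetBSD tool). It assumes that the "Components:" section of raidctl -s prints one "name: status" pair per line and ends at a "No spares." or "Spares:" line.

```shell
#!/bin/sh
# Sketch: print RAIDframe components that raidctl -s does not report as
# "optimal". ASSUMPTION: the "Components:" section has one "name: status"
# pair per line and ends at "No spares." or "Spares:".
raid_bad_components() {
    awk '
        /^Components:/           { in_comp = 1; next }
        /^(No spares\.|Spares:)/ { in_comp = 0 }
        in_comp && NF == 2 && $2 != "optimal" {
            name = $1
            sub(/:$/, "", name)      # drop the trailing colon
            print name ": " $2
        }
    '
}

# On a live system (as root):
#   raidctl -s raid1 | raid_bad_components
```

Restricting the match to the "Components:" section avoids false hits on two-field lines from the component labels further down, such as "Clean: No".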
Later...
Greg Oster
I'm just wondering what if that really had been an actual I/O error. Then,
I would have lost the RAIDframe parity because of the panic, and probably
not have been able to re-write it because of the I/O error.
Is there any decent way to save the RAIDframe parity on a panic?
> So I just got another panic on the file server:
>
> raid1: IO failed after 5 retries
> raid1: IO failed after 5 retries
> raid1: IO failed after 5 retries
> /export/mail: got error 5 while accessing filesystem
> panic: softdep_deallocate_dependencies: unrecovered I/O error
>
> The funny thing is there's actually no I/O error (at least nothing in dmesg).
No other messages?
> Upon reboot, fsck showed nothing unusual, just the usual few unref files and
> low block counts.
>
> Any ideas?
Note that our softdep implementation has no capability to recover from
I/O errors whatsoever, so it just panics. This is one reason we will be
replacing it with journalling in 6.0.
Cheers,
Andrew
> Note that our softdep implementation has no capability to recover from
> I/O errors whatsoever, so it just panics.
Is this any different in the FreeBSD implementation?
However, it looks like what I had was not a real I/O error.
Would it be possible that the softdep code tried to write a block outside
the partition, and that failed without a further error message from the
disk driver?
There are plans to reduce how much parity has to be checked
after a panic, but right now it has to verify all of it...
Later...
Greg Oster
When the first (time-wise) partition of a RAID set is opened, the
first thing that is done is to mark all the components as having
"dirty parity". This is done before any writing of data takes place.
If you then lose a component (e.g. disk dies) then at that point the
system considers all remaining parity as valid -- this is fine,
because the parity was valid before, and it's been kept up-to-date.
If the system now panics, the component labels will say that the
parity is dirty, but since a component is missing it will be forced
to use whatever parity is there. There is a non-zero chance that
data may be lost due to a panic in this instance, but the only
alternative would be to dig out the backup tapes and do a full
restore (guaranteeing loss of whatever data was generated since the
last backup...).
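The decision described above fits in a small table. This is a hedged model of the behaviour as Greg describes it, not RAIDframe's code. The function name and the "dirty"/"clean" and "degraded"/"optimal" arguments are made up for illustration.

```shell
#!/bin/sh
# Hedged model of the decision described above; NOT RAIDframe source code.
# $1: parity state recorded in the component labels ("dirty" or "clean")
# $2: array state ("optimal", or "degraded" if a component is missing)
parity_action_at_config() {
    case "$1:$2" in
        clean:*)        echo "trust parity" ;;
        dirty:optimal)  echo "check and rewrite all parity (as raidctl -P does)" ;;
        dirty:degraded) echo "use existing parity; nothing to recompute it from" ;;
        *)              echo "unknown state: $1:$2" >&2; return 1 ;;
    esac
}

parity_action_at_config dirty degraded
```

The dirty/degraded case is the one Greg flags: the set comes up using whatever parity is on disk, with a small chance of data loss, because the missing component leaves nothing to recompute parity from.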
Later...
Greg Oster
When is this planned for?
Geert
You too have a text editor! (Probably even one with convenient
macros for doing things like indenting source code.)
The parity logging code, if it could be quickly made to work at all
(I keep meaning to find the time to try, but my text editor is...busy),
appears to have most of the functionality required for this already.
Specifically, it chops up the disk into many parity regions and tracks
their state independently of each other.
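A toy model shows why per-region tracking helps. Writes mark only the regions they touch, so after a panic only those regions need their parity re-verified instead of the whole disk. REGION_SECTORS and dirty_regions are hypothetical names, and the region size is arbitrary. This illustrates the idea only and is not the parity logging implementation.

```shell
#!/bin/sh
# Toy model of per-region parity tracking (hypothetical, not RAIDframe code).
# REGION_SECTORS is an arbitrary, assumed region size in sectors.
REGION_SECTORS=65536

# Given the sector offsets written since the last clean point, print the
# distinct regions whose parity would need re-verification after a panic.
dirty_regions() {
    for sector in "$@"; do
        echo $(( sector / REGION_SECTORS ))
    done | sort -nu
}

dirty_regions 10 70000 65535 200000
```

With these four writes only regions 0, 1 and 3 would need checking. Without region tracking, the whole set must be verified, which is the current behaviour Greg describes.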
I have no idea if this code would work in a 2-disk "mirror" setup
like many of us use to boot NetBSD. I suspect that would require
some support in the bootblocks, just as a 2-disk "RAID5" doesn't work
now.
--
Thor Lancelot Simon t...@rek.tjls.com
"Even experienced UNIX users occasionally enter rm *.* at the UNIX
prompt only to realize too late that they have removed the wrong
segment of the directory structure." - Microsoft WSS whitepaper
On Nov 25, 12:10pm, Edgar Fuß wrote:
} Subject: Re: panic: softdep_deallocate_dependencies: unrecovered I/O error
} > No other messages?
} No, the lines immediately before the "raid1: IO failed after 5 retries." are
} "wsdisplay0: screen n added (...)" from the last boot.
}
} > Note that our softdep implementation has no capability to recover from
} > I/O errors whatsoever, so it just panics.
} Is this any different in the FreeBSD implementation?
}
} However, it looks like what I had was not a real I/O error.
} Would it be possible that the softdep code tried to write a block outside
} the partition, and that failed without a further error message from the
} disk driver?
}
>-- End of excerpt from Edgar Fuß
No deadlines have been set... Since we've now branched for 5.0,
however, arguably it should happen between now and the branch for 6.0
:-} (Read that as: I should get serious about doing some RAIDframe
hacking...)
Later...
Greg Oster
Hmmmm... This should be in a PR somewhere then... (RAIDframe
shouldn't be returning EIO unless there really is an IO error on the
underlying components..)
Later...
Greg Oster
Another RAIDframe question: is there a decent way to replace, on the fly, a component that one fears is about to fail?
Say I get recoverable errors or squeaky noises from a component. Of course, I can add a hot spare, fail that component and begin a reconstruction. But then I have a three-hour window where, if another component fails, I have shot myself in the foot.
Can you schedule downtime of the RAID sets for those 3 hours? If you
can't, then you're rather stuck with that 3-hour window of vulnerability.
If you can, then the best thing to do would be to unmount all active
filesystems from the RAID set, and then do the add hot spare and
rebuild.
Let's say wd0e, wd1e, and wd2e are in your RAID set, and that wd2 is
dying. So you hot-add wd3e to the set, and start rebuilding wd2e to
wd3e. Now if wd0e fails, you could always do a 'raidctl -u' and
then 'raidctl -C' to force the configuration with just wd0e, wd1e,
and wd2e in the RAID set. Because nothing was writing to the disk
during the reconstruction, you know that all the data and parity are
still self-consistent, and so you can still trust wd2e to be in-sync
with the contents of wd0e and wd1e. If the filesystem from the RAID
set is active you can't do this, as the instant you would change any
of the data on wd0e or wd1e then wd2e becomes invalid. (yes, it's
only invalid for the stripes that change, but RAIDframe doesn't keep
track of what's changed on a stripe-by-stripe basis...)
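Here is the procedure above written out as a dry run that only prints the commands. The device names follow Greg's example. /etc/raid0.conf is an assumed path for a config file that still lists the original three components, and replace_quiesced is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch: PRINT (do not run) the quiesced hot-spare rebuild.
# Args: RAID device, dying component, new component, and a config file
# listing the ORIGINAL components (the path is an assumption).
replace_quiesced() {
    raid=$1 dying=$2 spare=$3 conf=$4
    cat <<EOF
# first unmount every filesystem on $raid so nothing writes to it
raidctl -a /dev/$spare $raid   # add the new disk as a hot spare
raidctl -F /dev/$dying $raid   # fail the dying disk, rebuild onto the spare
raidctl -S $raid               # check reconstruction progress
# only if another component dies before the rebuild finishes:
raidctl -u $raid               # unconfigure the set
raidctl -C $conf $raid         # force the original configuration
EOF
}

replace_quiesced raid0 wd2e wd3e /etc/raid0.conf
```

The fallback lines are only safe because nothing was written during the rebuild, so the old component is still consistent with the others.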
Of course, you do have backups of all the data, right? ;) (A
verified set or two of backups is still the best guarantee against
data loss in these situations...)
Later...
Greg Oster
Nope... That, and digging through the code a bit more, there is a
very definite path that gets followed in this part of the code, and
there are no "out-of-memory" possibilities along that path...
But I'm completely failing to see how the "raid1: IO failed after 5 retries."
can occur without an actual component failure...
Later...
Greg Oster
There actually is a way around this, but it requires preparation at
RAID setup time.
What you do is, instead of building your RAID 5 out of disk partitions,
make each disk partition a member of a two-component RAID 1 (mirror)
set, with the other member missing. Then build your RAID 5 atop the
RAID 1s.
Then, if you want to swap out a disk underneath the RAID 5, you hot-add
the new disk to the relevant underlying RAID 1, wait until the RAID 1's
reconstruct is done, and pull the original. The RAID 5 never knows
anything is happening. Mark the RAID 1s as autoconfiguring and you don't
even have to think about updating their config files.
However, as I said above, this requires that it be designed in from the
beginning. Not much help in your current predicament, unless you can
take the whole thing out of service for a future-proofing rebuild.
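The layered setup could be prepared with config files along these lines. This is a sketch under stated assumptions. There are three partitions, wd0e, wd1e and wd2e. Each is wrapped in a RAID 1 unit (raid1 to raid3) whose second member is "absent". The RAID 5 is raid4, built on raid1e to raid3e, and the stripe-unit sizes are picked arbitrarily. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: raidctl config files for RAID 1 "wrappers" (one real member, one
# absent) plus a RAID 5 on top. Device names, unit numbers and stripe-unit
# sizes are ASSUMPTIONS for illustration.

write_raid1_conf() {   # $1 = partition (e.g. wd0e), $2 = output file
    printf '%s\n' 'START array' '1 2 0' \
        'START disks' "/dev/$1" 'absent' \
        'START layout' '128 1 1 1' \
        'START queue' 'fifo 100' > "$2"
}

write_raid5_conf() {   # $@ = RAID 1 partitions to stripe over; writes stdout
    printf '%s\n' 'START array' "1 $# 0" 'START disks'
    for comp in "$@"; do
        echo "/dev/$comp"
    done
    printf '%s\n' 'START layout' '32 1 1 5' 'START queue' 'fifo 100'
}

write_layered_confs() {   # $1 = directory for raid1.conf .. raid4.conf
    i=1
    for part in wd0e wd1e wd2e; do
        write_raid1_conf "$part" "$1/raid$i.conf"
        i=$((i + 1))
    done
    write_raid5_conf raid1e raid2e raid3e > "$1/raid4.conf"
}
```

Each raidN would then be configured with raidctl -C (forced, because of the absent member) and given a serial number with raidctl -I. It would also need an "e" partition of type RAID in its disklabel and would be marked for autoconfiguration with raidctl -A yes. Only then would raid4 be configured on top and have its parity initialised with raidctl -i.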
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mo...@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Luckily, I don't have any drives making weird noises just now.
> Of course, you do have backups of all the data, right?
Yes. But restoring from Tivoli would take days to complete.
Does this setup impact performance?
> But I'm completely failing to see how the "raid1: IO failed after 5 retries."
> can occur without an actual component failure...
nosleep memory allocation?
Andrew
> Nice trick, didn't think of that.
:-)
> Does this setup impact performance?
It must have *some* impact. I haven't measured the difference; my
guess would be that the impact is so small it's hard to measure, but
that really is just a guess.
--
That sort of 'out-of-memory' condition was my first reaction, but I
couldn't find anything yesterday... digging a bit deeper reveals there
is a spot where we're not willing to wait for the Disk Queue Data pool
to return us an item. The low-water mark is set to just 64
elements, and the high water mark is at 256... those values should be
enough for 10-43 disks to run at full capacity..
This could also be triggered if getiobuf() returns NULL... Hmm..
(For those playing along at home, src/sys/dev/raidframe/
rf_diskqueue.c:rf_CreateDiskQueueData() is the place where the
allocations could fail...)
Later...
Greg Oster
I've just asked our prime factoring group (being the only ones with relevant
amounts of data) whether they had been moving around large amounts of data
at that time.
However, this time, I do have a dump.
What should I investigate in the dump?
:(
> However, this time, I do have a dump.
>
> What should I investigate in the dump?
I'd like to know what's in raidPtrs[0] (or raidPtrs[1] or whatever
number the RAID set is). In particular, knowing what's in the
raidPtrs[0].Queues structure might be useful.
The trick is to figure out why a getiobuf() call returned NULL, or
whether or not the diskqueuedata pool was exhausted...
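For the dump, a gdb command file along these lines would print the RAID descriptor Greg asks about. Several assumptions here do not come from the thread: a debug kernel at /netbsd.gdb, an uncompressed savecore dump, gdb's "target kvm" support for crash dumps, and raidPtrs being an array of pointers to RF_Raid_t (hence "->"). Depending on the kernel version, Queues may itself be a pointer that needs further dereferencing. make_raid_gdb_cmds is a hypothetical helper that only writes the file and prints the command to run.

```shell
#!/bin/sh
# Sketch: write a gdb command file that prints the RAIDframe descriptor for
# one RAID unit from a crash dump, then print the gdb invocation to use.
# ASSUMED: debug kernel /netbsd.gdb, uncompressed core, gdb "target kvm",
# and raidPtrs as an array of pointers to RF_Raid_t.
make_raid_gdb_cmds() {   # $1 = raid unit, $2 = core file, $3 = command file
    printf '%s\n' \
        "target kvm $2" \
        "print *raidPtrs[$1]" \
        "print raidPtrs[$1]->Queues" > "$3"
    echo "gdb -batch -x $3 /netbsd.gdb"
}
```

For the panic in this thread, the unit would be 1 (raid1), for example make_raid_gdb_cmds 1 /var/crash/netbsd.0.core /tmp/raid.gdb.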
Feel free to contact me off-line about this...
Later...
Greg Oster
> I'd be suspicious of the hardware.
I'm unsure about that. At least, it's ECC memory.
> Especially if you've not made any changes recently.
No.
-Brian
On Apr 8, 6:57pm, Edgar Fuß wrote:
} Subject: Re: lockmgr: non-zero exclusive count
} > If it's been running solidly for a while,
} More or less, yes. At least, that's the first panic of that kind I've seen.
}
} > I'd be suspicious of the hardware.
} I'm unsure about that. At least, it's ECC memory.
}
} > Especially if you've not made any changes recently.
} No.
>-- End of excerpt from Edgar Fuß
--
Good luck.
-Brian
On Apr 7, 5:27pm, Edgar Fuß wrote:
} Subject: RAIDframe component replacement
} I need some advice on replacing a failing RAIDframe component (on
} 4.0.1/amd64).
}
} I have a RAID1 consisting of sd0a and sd1a. Now, sd0 sometimes fails
} with "hardware error", but reconstruction onto it is OK. Of course, I
} want to replace the disc. Luckily, I have a spare drive, everything
} is hotpluggable SCA, and I have unused slots.
}
} It seems I have two options (given the spare disc I have has already been
} fdisk'ed and disklabel'ed):
} 1. Leave the two current discs in, insert the replacement disc, scsictl
} scan it (becoming sd2) and then add it as a hot spare via raidctl -a
} sd2a. Then, raidctl -F sd0a, which should begin a reconstruction on sd2a.
} 2. Do a raidctl -f sd0a (if sd0 hasn't been marked as failed already),
} then scsictl detach it and pull it out. Then, substitute it with the
} replacement disc, scsictl scan (does it become sd0 then?) and raidctl -a
} sd0a. Probably I have to raidctl -F component0 again in order for the
} reconstruction to begin.
}
} What's the preferred method?
} I would like to end up with a setup that is bootable again (given I
} installboot'ed on the spare). Also, the configuration should preferably
} look unchanged (i.e. sd0a/sd1a and not component0/sd1a/sd2a).
} Additionally, I would prefer the procedure that is safer against the
} remaining component (sd1) failing in the middle of it.
}
} Thanks for your help.
>-- End of excerpt from Edgar Fuß
If the replacement drive is identical, you don't need to detach/reattach.
> I've done this several times with non-hotpluggable SCSI hardware where I had to power off anyway. But with SCA, I'm unsure, whether, after detaching sd0 (and sd1 still there), a newly scanned sd will become sd0 or sd2?
it should still be sd0
--
Manuel Bouyer <bou...@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
> But with scsictl detach/scan, I suppose?
If that's an excerpt from me: no.
> I've done this several times with non-hotpluggable SCSI hardware
> where I had to power off anyway.
I've done it with non-hot-pluggable hardware _without_ powering off.
If the SCSI bus is in use by any other devices, I will usually break to
ddb (to ensure the bus is idle) during the unplug-and-replug.
> But with SCA, I'm unsure, whether, after detaching sd0 (and sd1 still
> there), a newly scanned sd will become sd0 or sd2?
I'm not sure either. My guess would be that it would be sd2, but that
is just a guess.
--
> > But with scsictl detach/scan, I suppose?
> > But with SCA, I'm unsure, whether, after detaching sd0 (and sd1 still
> > there), a newly scanned sd will become sd0 or sd2?
>
> I'm not sure either. My guess would be that it would be sd2, but that
> is just a guess.
When I have done this in the past, I have done:
raidctl -f /dev/sd0a raid0 # mark component as failed
scsictl /dev/sd0c stop # stop disk spinning (optional)
scsictl /dev/scsibus0 detach 0 0 # detach old disk
[remove old disk, insert new]
scsictl /dev/scsibus0 scan any any # attach new disk
disklabel -i sd0 # add/edit disklabel
raidctl -R /dev/sd0a raid0 # rebuild raid
Replace with the relevant values for "sd0", "raid0", and "scsibus0" above.
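Here is the same sequence with the values parameterised, as a dry run that only prints the commands. print_swap_steps and its argument order are hypothetical. The target and LUN are the ones passed to "scsictl detach".

```shell
#!/bin/sh
# Dry-run sketch: PRINT the disk-swap steps above with values filled in.
# Args: disk (sd0), raid (raid0), bus (scsibus0), SCSI target, LUN.
print_swap_steps() {
    disk=$1 raid=$2 bus=$3 target=$4 lun=$5
    cat <<EOF
raidctl -f /dev/${disk}a $raid
scsictl /dev/${disk}c stop
scsictl /dev/$bus detach $target $lun
# remove old disk, insert new
scsictl /dev/$bus scan any any
disklabel -i $disk
raidctl -R /dev/${disk}a $raid
EOF
}

print_swap_steps sd0 raid0 scsibus0 0 0
```

Printing rather than executing keeps the interactive disklabel step and the physical swap under the operator's control.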
The new disk gets the same ID as the old one. With SCA, the SCSI ID is
part of the enclosure, and not set on the disk.
Occasionally, inserting the new disk has caused a SCSI bus reset, because I
couldn't guarantee that the bus would be idle when I changed disks. This
might cause other disks to negotiate async transfers. Either waiting, or an
explicit
scsictl /dev/scsibus0 scan X 0
(where X is the SCSI ID of the other disk), restores the correct transfer speed.
(You don't want your other raid disk running async during the rebuild).
Note that a disk isn't always labelled with the same number of sectors as
another of the same model. Hopefully, the replacement has at least as many
sectors as the original.
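One way to check this before starting the rebuild is to compare the "total sectors:" lines of the two disklabels. This sketch assumes the disklabel output contains a line of that form and that both labels were first saved to files (for example, disklabel sd1 > old.label). The helper names are hypothetical. Strictly, what matters is that the new RAID partition can be made at least as large as the one on the original disk.

```shell
#!/bin/sh
# Sketch: compare saved disklabel outputs. ASSUMPTION: each contains a
# line of the form "total sectors: N".
total_sectors() {   # $1 = file holding "disklabel sdN" output
    awk -F': *' '/^total sectors:/ { print $2; exit }' "$1"
}

replacement_big_enough() {   # $1 = old label file, $2 = new label file
    old=$(total_sectors "$1") new=$(total_sectors "$2")
    [ -n "$old" ] && [ -n "$new" ] && [ "$new" -ge "$old" ]
}
```

Usage: replacement_big_enough old.label new.label && echo "replacement is large enough".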
Thanks,
J
--
My other computer also runs NetBSD / Sailing at Newbiggin
http://www.netbsd.org/ / http://www.newbigginsailingclub.org/
> The new disk gets the same ID as the old one. With SCA, the SCSI ID is
> part of the enclosure, and not set on the disk.
Yes, of course. But I haven't hardwired SCSI IDs to sd instances. So I didn't know whether the new SCSI Target 0 would become sd0.
> Yes, of course. But I haven't hardwired SCSI IDs to sd instances. So I didn't know whether the new SCSI Target 0 would become sd0.
It will attach with the same unit number as the one that was detached. The
autoconf code removes the entry when the disk is detached and the new disk
attaches with the first free entry. So, whether or not you have wired down the
entries, an identical replacement will get the same number. If you detach
more than one, make sure that you reattach them in reverse order of detach.
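Here is a toy model of the allocation rule J describes, where a newly attached disk takes the lowest unit number not currently in use. This is an illustration, not the autoconf code, and first_free_unit is a hypothetical name.

```shell
#!/bin/sh
# Toy model (not autoconf code): print the lowest unit number that is not
# among the currently attached units given as arguments.
first_free_unit() {
    n=0
    while :; do
        case " $* " in
            *" $n "*) n=$((n + 1)) ;;   # unit n taken, try the next one
            *) echo "$n"; return ;;
        esac
    done
}

first_free_unit 1
```

With sd1 still attached after sd0 was detached, first_free_unit 1 prints 0. This matches the answer that the replacement comes back as sd0.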
Thanks,
J
--
Now I have yet to find out why an identical disc only came up at 160MB/s after scanning, where the original one ran at 320MB/s. In fact, repeated scsictl scans make it either 80 or 160MB/s. Is there a way to renegotiate the sync speed?
On Apr 8, 10:48am, Edgar Fuß wrote:
} Subject: Re: RAIDframe component replacement
} > then, after replacing the failed sd0 with the new sd0
} But with scsictl detach/scan, I suppose?
} I've done this several times with non-hotpluggable SCSI hardware where I
} had to power off anyway. But with SCA, I'm unsure whether, after
} detaching sd0 (and sd1 still there), a newly scanned sd will become sd0
} or sd2?
} I.e., is an attached sd given the smallest unused device number or the one
} following the highest used one?
>-- End of excerpt from Edgar Fuß
--
-Brian
On Aug 2, 2:53pm, Edgar Fuß wrote:
} Subject: Re: SATA: lost interrupt/reset failed
} > OK, I'll re-attach the drive to another machine.
} I did that and the SMART status is OK.
}
} So, anything short of re-booting the server to unlock the locked up SATA port? I'll be losing 455 days of uptime!
>-- End of excerpt from Edgar Fuß