mdadm badblocks errors


Danny Robson

Jan 31, 2022, 7:01:08 PM
to mlug

Hi all,

I mentioned in passing last night that I was hit by some errors in an
mdadm RAID5 array that resulted in one or two large files silently
becoming unreadable.

I was getting IO errors to particular offsets in these files, and some
_quite_ concerning syslog messages in the form of "bcache:
bch_count_backing_io_errors() md128: IO error on backing device,
unrecoverable"

The disks all had a pass on SMART diagnostics, mdadm reported good
health, and nothing was logged during a full scrub of the array.

A few days later it turned out I had overlooked the value of
"Offline_Uncorrectable" in the SMART reports, and one of the disks had
partially failed.
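For anyone wanting to check this themselves, a rough sketch of the check
I should have been running (assumes smartmontools; `/dev/sdd` is just an
illustrative device name):

```shell
# Print the raw Offline_Uncorrectable count (SMART attribute 198) for a
# drive; anything above zero means the drive has sectors it could not
# recover during offline testing. Substitute your own array members.
smartctl -A /dev/sdd 2>/dev/null \
    | awk '$2 == "Offline_Uncorrectable" { print $NF }'
```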

It appears mdadm has a feature I'd never heard of, named "badblocks",
that can ignore uncorrectable sectors. As far as I can tell this list
isn't taken into account during normal health checks.

The kernel.org wiki has a decent writeup of the underlying ideas:
https://raid.wiki.kernel.org/index.php/The_Badblocks_controversy

You can interrogate individual disks with the `mdadm` command, eg:

sybil ~ # mdadm --examine-badblocks /dev/sdd1
Bad-blocks on /dev/sdd1:
162543728 for 8 sectors
162546208 for 8 sectors

Now I have another test to add to regular array monitoring scripts.
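For what it's worth, the check I'm adding looks roughly like this (a
sketch only; the member list is illustrative, `count_badblocks` is my
own helper name, and it assumes `--examine-badblocks` output in the
"N for M sectors" form shown above):

```shell
# Warn if any array member has recorded bad-block entries.
count_badblocks() {
    # one "OFFSET for N sectors" line per entry
    grep -c 'for .* sectors'
}

for dev in /dev/sd[a-d]1; do   # illustrative member list
    n=$(mdadm --examine-badblocks "$dev" 2>/dev/null | count_badblocks)
    [ "${n:-0}" -gt 0 ] && echo "WARNING: $dev: $n bad-block entries" >&2
done
```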

But I remain somewhat uncertain as to why RAID5 didn't protect me here.

Cheers,
Danny Robson

zak martell

Jan 31, 2022, 7:21:28 PM
to mlu...@googlegroups.com
Hi Danny,

RAID will not protect you from sector failure unless the disk reports
that sector as unwritable/bad. RAID simply copies and duplicates data.
If the sector on the other drive was writable, the write will proceed.
If it later becomes unreadable, the RAID "process" has no idea, which
is likely what happened here.

There are things like RAID scrubbing that you can run on a schedule to
check for problems. Check it out; RAID itself has many scheduled tasks
it can do, or that you can enable.
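A minimal sketch of kicking a scrub off by hand (sysfs path as on
mainline md; `md128` is taken from Danny's logs; the `SYSFS_ROOT`
override is purely hypothetical so the function can be exercised
without real hardware):

```shell
# Start an md "check" scrub by writing to the array's sync_action file.
# SYSFS_ROOT defaults to /sys; the override exists only for dry runs.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

start_scrub() {
    echo check > "$SYSFS_ROOT/block/$1/md/sync_action"
}

# Usage: start_scrub md128; then watch progress in /proc/mdstat and
# inspect .../md/mismatch_cnt afterwards.
```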

It’s part of the old saying: “RAID is not a backup”.

--
You received this message because you are subscribed to the Google Groups "mlug-au" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mlug-au+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mlug-au/065c35fc-c3e7-9254-9f1f-00626c516e1c%40nerdcruft.net.

Danny Robson

Jan 31, 2022, 7:33:55 PM
to mlu...@googlegroups.com

On 1/2/22 10:21, zak martell wrote:

> RAID will not protect you from sector failure, unless the disks
> determine that sector as unwritable/bad.

The drives themselves appear to have listed these sectors as bad. The
extended SMART logs are full of entries like:
"Error: UNC at LBA = 0x09b04a20 = 162548256"

> There are things like raid scrubbing and such you can run on schedule to
> check for things.

The thing that surprised me most here was that scrubbing did not report
any errors whatsoever.

There's a good chance the original badblock entries were logged, but I
haven't had a chance to scour syslog yet.

> It’s part of the old saying “raid is not a backup”.

Amusingly, I caught this during a regular (but delayed) backup.

Cheers,
Danny Robson.

zak martell

Jan 31, 2022, 7:49:09 PM
to mlu...@googlegroups.com
> The drives themselves appear to have listed these sectors as bad. The
> extended SMART logs are full of entries like:
> "Error: UNC at LBA = 0x09b04a20 = 162548256"

Was that before or after the creation of the file? 

RAID is great, use it for performance and uptime purposes, but never trust it to protect your files. 


Danny Robson

Jan 31, 2022, 8:02:13 PM
to mlu...@googlegroups.com

On 1/2/22 10:48, zak martell wrote:
> Was that before or after the creation of the file? 

Ah, fair. My assumption would be after, but it's a little difficult to
determine at this point.

I had naively assumed that if one drive reported the sector as
unreadable then the RAID driver could reconstruct the sector from the
remaining drives (where possible).

Is that not the case?

I can definitely see a few risks, but it feels overly cautious to
completely disable access to these sectors when only one drive in the
array has failed.

> RAID is great, use it for performance and uptime purposes, but never
> trust it to protect your files.

Yeah, I've developed a healthy paranoia around storage given my long
history of catastrophic drive, software, and user failures.

This just adds to the list. :)

Cheers,
Danny Robson.

Brian May

Jan 31, 2022, 8:32:00 PM
to Danny Robson, mlu...@googlegroups.com
Danny Robson <da...@nerdcruft.net> writes:

> I had naively assumed that if one drive reported the sector as
> unreadable then the RAID driver could reconstruct the sector from the
> remaining drives (where possible).
>
> Is that not the case?

I would have expected that to be the case.

Probably none of these explain your situation, but some things that can
go wrong:

* Bad drive can't read data. But doesn't report failure. It returns bad
data instead. RAID driver has no idea data is bad, and passes it on.
Result: silent data corruption. File systems like zfs and btrfs will
checksum data blocks so they can tell if data is corrupt.

* Especially for RAID1 where you only have two drives: Bad drive is
automatically taken offline. This results in more load on the good
drive, which could kill it too. Especially if you are trying to sync
onto a new drive. The good drive - which may have been produced in the
same batch as the bad drive and have the same defect - becomes bad,
and all data is lost.
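The checksumming idea can be shown with a toy sketch (plain shell,
nothing to do with the real zfs/btrfs machinery): record a checksum at
write time and verify it on read, so silent corruption is at least
detected even when the drive reports no error:

```shell
# record: store a SHA-256 of the file alongside it.
record() { sha256sum "$1" > "$1.sha256"; }

# verify: succeeds only while the file still matches its recorded
# checksum; a drive silently returning bad data would fail this check.
verify() { sha256sum -c "$1.sha256" >/dev/null 2>&1; }
```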
--
Brian May <br...@linuxpenguins.xyz>
https://linuxpenguins.xyz/brian/

Andrew McGlashan

Feb 1, 2022, 8:36:57 AM
to mlu...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,

On 1/2/22 11:01 am, Danny Robson wrote:
> The kernel.org wiki has a decent writeup of the underlying ideas:
> https://raid.wiki.kernel.org/index.php/The_Badblocks_controversy

Nothing in there shows any update on the situation, and the page says:
"This page was last modified on 6 May 2018, at 21:09."

Does that page need updates?

To me, it's not that clear.

Drive "self-healing" is certainly a modern feature, so just exercising
the data will get some problems fixed. I too thought that using RAID
would allow some level of protection against partial or full drive
failure. I was a fan of ZFS in the Solaris world, but I'm not happy
with Oracle's licensing of said feature and never trusted BTRFS. I
know there are other implementations of ZFS, but there is the worry
about not having ECC-type RAM, and the amount of RAM you need to
operate well with ZFS (something that may have improved over the
years; I just don't know).

The self-healing magic, one would expect, would allow for some kind of
checksumming to be present, but it needs to report errors back to the
driver so that the RAID layer protecting the same data can determine
whether a good copy is available. That would help with data integrity.

For many years, data has only been somewhat protected by RAID, and
it's perhaps a miracle that more data hasn't been lost; or rather,
more likely, bit rot and other failures go on undetected even with
RAID-level scrubbing. A "clean" state for an mdadm RAID volume...
can't be trusted?

Kind Regards
AndrewM
-----BEGIN PGP SIGNATURE-----

iHUEAREIAB0WIQTJAoMHtC6YydLfjUOoFmvLt+/i+wUCYfk3bQAKCRCoFmvLt+/i
+9FuAP91UxIsHVBz1Td96sBQVf42TOm0dvhVMwmTZHUDeANtfQD/alqpstDAm0Ru
lZqR2NTCO71C9reLx4MNbt15/TxGUCA=
=bl6g
-----END PGP SIGNATURE-----