
SMART Uncorrectable_Error_Cnt rising - should I be worried?


The Wanderer

Jan 9, 2024, 8:20:06 AM
This is not directly Debian-related, except insofar as the system
involved is running Debian, but we've already had a somewhat similar
thread recently and this forum is as likely as any I'm aware of to have
people who might have the experience to address the question(s). I would
be open to recommendations for alternate / better forums for this
inquiry, if people have such.


For background: I have an eight-drive RAID-6 array of 2TB SSDs, built
back in early-to-mid 2021.

Until recently, as far as I'm aware there have not been any problems
related to it.


Within the past few weeks, I got root-mail notifications from smartd
that the ATA error count on two of the drives had increased - one from 0
to a fairly low value (I think between 10 and 20), the other from 0 to
1. I figured this was nothing to worry about - because of the relatively
low values, because the other drives had not shown any such thing, and
because of the expected stability and lifetime of good-quality SSDs.


On Sunday (two days ago), I got root-mail notifications from smartd
about *all* of the drives in the array. This time, the total error
counts had gone up to values in the multiple hundreds per drive. Since
then (yesterday), I've also gotten further notification mails about at
least one of the drives increasing further. So far today I have not
gotten any such notifications.

One thing I don't know, which may or may not be important, is whether
these alert mails are being triggered when the error-count increase
happens, or when a scheduled check of some type is run. If it's the
latter, then it might be that there's a monthly check and that's the
reason why all eight drives got mails sent at once, but if it's the
former, then the so-close-in-time alerts from all eight drives would
seem more likely to reflect a real problem.
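(For reference, in case it matters to an answer: my understanding is
that smartd polls the drives itself on a fixed interval - every 30
minutes by default - and mails when a tracked attribute has changed,
rather than running on any monthly schedule. On Debian the relevant
config is /etc/smartd.conf; I'm quoting the stock directive from
memory, so treat it as a sketch:

  # Monitor every device found at startup; -m root mails root on
  # trouble, -M exec hands the report to Debian's mail helper script.
  DEVICESCAN -d removable -n standby -m root \
    -M exec /usr/share/smartmontools/smartd-runner

The poll interval is smartd's -i option, in seconds, set on the
daemon's command line rather than in this file.)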


I've looked at the SMART attributes for the drives, and am having a hard
time determining whether or not there's anything worth being actually
concerned about here. Some of the information I'm seeing seems to
suggest yes, but other information seems to suggest no.

Relevant-seeming excerpts from the output of 'smartctl -a' on one of the
drives are attached (rather than inline, to avoid line-wrapping). I can
provide full output of that command for that drive, or even for all of
the drives, if desired.


Things that seem to suggest that there may be reason to be concerned
include, but may not be limited to:

The Uncorrectable_Error_Cnt, which is the value referenced by the alert
mails, has risen well above its apparent previous value of 0, and signs
are that it may be going to keep rising.

The Runtime_Bad_Block count is nonzero.

The ECC_Error_Rate is nonzero (and, at least in the case of this
specific drive, also equal to the Uncorrectable_Error_Cnt).

Most of the attributes are listed as of type "Old_age". That strikes me
as unexpected; two and a half years of mostly-read-based operation does
not seem like enough to qualify a SSD as "old", although my expectations
here may well be off. (I would be inclined to expect five-to-ten years
of operation out of a non-defective drive, assuming reasonable physical
treatment otherwise, if not considerably more.)

As mentioned above, the increase in Uncorrectable_Error_Cnt has happened
at nearly the same time (relative to drive installation date) for all
the drives, and for some of the drives it seems to be continuing to
increase.


I don't know how to interpret the "Pre-fail" notation for the other
attributes. That terminology could be intended to mean "This drive has
entered the final stage before failure, and its failure is expected to
be imminent" - or it could equally well be the status that the
attributes *start* in, with the intended meaning "This drive has not yet
reached a stage where there is any reason to think it might fail".


Things that seem to suggest that there may *not* be a reason to be
concerned include, but may not be limited to:

The "VALUE" column for each of the attributes remains high; most are in
the range from 098 to 100, and excluding the Airflow_Temperature_Cel
figure, the lowest is 095, for Power_On_Hours. From what I've managed to
find in reading online, this column is typically a percentage value,
with lower percentages indicating that the drive is closer to failure.

The Total_LBAs_Written value, when combined with the Sector Size,
results (if my math is correct) in a total-data-written figure of
between 3TB and 4TB. That should be *well* under the advertised write
endurance of this drive, given that the drive is 2TB and (both IIRC and
from what I've found in reading up on such things again after these
errors started to occur) those advertised values for similar-capacity
drives seem to start in the hundreds of TB and go up.
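(Two illustrations of the above, with numbers invented rather than
copied from my drives. The attribute lines in question are laid out
like this:

  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age  Always       -       345

where the normalized VALUE typically starts at 100 and counts down
toward THRESH, and RAW_VALUE is the actual event count. And the
write-endurance arithmetic, assuming 512-byte logical sectors and a
hypothetical raw Total_LBAs_Written:

  7,000,000,000 LBAs x 512 bytes/LBA = 3.584e12 bytes, i.e. ~3.6 TB

which is the sort of figure I mean by "between 3TB and 4TB".)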


So... as the Subject asks, should I be worried? How do I interpret these
results, and at what point do they start to reflect something to take
action over? If there is no reason to be worried, what *do* these
alerts indicate, and at what point *should* I start to be worried about
them?

I already *am* worried, to the point of having heartburn and difficulty
sleeping over the possibility of data loss (there's enough on here that
external backup would be somewhat difficult to arrange), but I'm not
sure whether or not that is warranted.

My default plan is to identify an appropriate model and buy a pair of
replacement drives, but not install them yet; buy another two drives
every six months, until I have a full replacement set; and start failing
drives out of the RAID array and installing replacements as soon as one
either fails, or looks like it's imminently about to fail. But if the
mass notification mails indicate that all eight are nearing failure,
that might not be enough - and if they don't indicate any likelihood of
failure this year, then buying replacement drives this soon might be
premature.

What drives I choose to buy as replacement would also be influenced by
how likely it is that this indicates impending failure. If it doesn't,
then drives similar to what I already have would probably still be
appropriate; if it does, then I'm going to want to go up-market and buy
long-endurance drives intended for high uptime - i.e., data-center
storage drives, which are likely to be more expensive.

--
The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw
(attachment: sdc-202401090732-partial)

Dan Ritter

Jan 9, 2024, 10:00:06 AM
The Wanderer wrote:
> So... as the Subject asks, should I be worried? How do I interpret these
> results, and at what point do they start to reflect something to take
> action over? If there is no reason to be worried, what *do* these
> alerts indicate, and at what point *should* I start to be worried about
> them?
>
> I already *am* worried, to the point of having heartburn and difficulty
> sleeping over the possibility of data loss (there's enough on here that
> external backup would be somewhat difficult to arrange), but I'm not
> sure whether or not that is warranted.

YES. Backup ASAP.

2TB and 4TB Samsung 870 EVO disks produced before November 2022 have this as
a known failure mode.


> Model Family: Samsung based SSDs
> Device Model: Samsung SSD 870 EVO 2TB
> Serial Number: S620NJ0R410888A

These may or may not be under warranty, depending on when you
purchased them and from whom. Assume Samsung will take a long
time, no matter what.

-dsr-

The Wanderer

Jan 9, 2024, 10:30:06 AM
On 2024-01-09 at 09:38, Dan Ritter wrote:

> The Wanderer wrote:
>
>> So... as the Subject asks, should I be worried? How do I interpret
>> these results, and at what point do they start to reflect something
>> to take action over? If there is no reason to be worried, what
>> *do* these alerts indicate, and at what point *should* I start to
>> be worried about them?
>>
>> I already *am* worried, to the point of having heartburn and
>> difficulty sleeping over the possibility of data loss (there's
>> enough on here that external backup would be somewhat difficult to
>> arrange), but I'm not sure whether or not that is warranted.
>
> YES. Backup ASAP.
>
> 2TB and 4TB Samsung 870 EVO disks produced before November 2022 have
> this as a known failure mode.

AARGH.

I spent (what I'm now fairly sure was) *thousands* on these things,
after comparing a fair few drive models in reviews, and this is the
first I've heard that there was any issue.

Thank you for pointing it out. I have now done a bit of specific looking
regarding this model, and found a thread discussing it which started in
January 2022 and *is still going today*.

I have now ordered a high-capacity external hard drive to back up the
data (delivery date is this coming Monday), and two Intel
enterprise-class drives to start having a reserve on hand for the
expected failure. I don't really have the funds right now to buy
replacements for everything immediately, at least not without carrying a
lot more of a credit-card balance than I want to, but I want to at least
get started.

>> Model Family: Samsung based SSDs
>> Device Model: Samsung SSD 870 EVO 2TB
>> Serial Number: S620NJ0R410888A
>
> These may or may not be under warranty, depending on when you
> purchased them and from whom. Assume Samsung will take a long time,
> no matter what.

IIRC, I bought them via Newegg, in early-to-maybe-mid 2021. (The
alternative would be Amazon, but in this case I don't think that's how
it happened.) I would be surprised if there were warranty coverage at
this point, but might look a bit deeper; even if there is coverage,
however, it's not worth waiting (and risking data loss) for the process
to complete.

This is *not* a stress I need right now...

Curt

Jan 9, 2024, 11:20:06 AM
On 2024-01-09, The Wanderer <wand...@fastmail.fm> wrote:
>
> My default plan is to identify an appropriate model and buy a pair of
> replacement drives, but not install them yet; buy another two drives
> every six months, until I have a full replacement set; and start failing
> drives out of the RAID array and installing replacements as soon as one
> either fails, or looks like it's imminently about to fail. But if the
> mass notification mails indicate that all eight are nearing failure,
> that might not be enough - and if they don't indicate any likelihood of
> failure this year, then buying replacement drives this soon might be
> premature.
>

Isn't it advised not to use the drives in this case, or to unmount them,
or to avoid all reads and writes (or however it should be termed, as we all
agree your symptoms mean trouble) in order to avoid exacerbating the
upcoming failure? Or does this go without saying?

Michael Kjörling

Jan 9, 2024, 11:30:06 AM
On 9 Jan 2024 08:11 -0500, from wand...@fastmail.fm (The Wanderer):
> Within the past few weeks, I got root-mail notifications from smartd
> that the ATA error count on two of the drives had increased - one from 0
> to a fairly low value (I think between 10 and 20), the other from 0 to
> 1. I figured this was nothing to worry about - because of the relatively
> low values, because the other drives had not shown any such thing, and
> because of the expected stability and lifetime of good-quality SSDs.
>
>
> On Sunday (two days ago), I got root-mail notifications from smartd
> about *all* of the drives in the array. This time, the total error
> counts had gone up to values in the multiple hundreds per drive. Since
> then (yesterday), I've also gotten further notification mails about at
> least one of the drives increasing further. So far today I have not
> gotten any such notifications.

A single or a few bad blocks is nothing to be overly concerned about.
I had an Intel SSD which lived a long, healthy, happy life with one
bad sector and never gave any signs of further problems.

Hundreds of bad blocks per drive is certainly cause for concern.

More worrying is a _significant increase in the rate of increase_ of
the bad blocks count. That suggests that the drive is suffering from
some underlying problem.


> So... as the Subject asks, should I be worried? How do I interpret these
> results, and at what point do they start to reflect something to take
> action over? If there is no reason to be worried, what *do* these
> alerts indicate, and at what point *should* I start to be worried about
> them?

At an absolute minimum, were it me, I would refresh my backups. As
8-wide RAID-6 of 2TB drives nets you about 12 TB of storage, I'd say
get yourself a ~16 TB external rotational HDD and set up to back up
onto it. You should have backups anyway; there's no time like the
present to get started.

You are admittedly in a much better position than many; if the errors
are randomly located, odds are that you have sufficient redundancy to
manage within the storage array.

The good news is in SMART attributes 5 and 179: taken in combination,
I take them as an indication that all (31) reallocated sectors have
been reallocated into the spare sectors pool, and that this represents
approximately 2% of the spare sectors pool.

Absolutely do keep an eye on attribute 179. If the spare sectors pool
starts to fill up, the drive won't be able to reallocate any further
sectors, and your RAID array won't do you much good.

I would also keep an eye out for I/O errors in the kernel log, but be
mindful of which devices they are coming from.
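Something along these lines is usually enough to spot them; the exact
wording of the messages varies between kernel versions, so adjust the
pattern to taste:

  dmesg --level=err,crit | grep -i 'i/o error'
  journalctl -k | grep -iE 'i/o error|ata[0-9]'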

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

The Wanderer

Jan 9, 2024, 1:20:06 PM
It probably is, but I don't have an alternative, since this is my
daily-driver computer and I can't afford to go without one in the
interim.

The Wanderer

Jan 9, 2024, 1:30:07 PM
On 2024-01-09 at 11:21, Michael Kjörling wrote:

> On 9 Jan 2024 08:11 -0500, from wand...@fastmail.fm (The Wanderer):
>
>> Within the past few weeks, I got root-mail notifications from
>> smartd that the ATA error count on two of the drives had increased
>> - one from 0 to a fairly low value (I think between 10 and 20), the
>> other from 0 to 1. I figured this was nothing to worry about -
>> because of the relatively low values, because the other drives had
>> not shown any such thing, and because of the expected stability and
>> lifetime of good-quality SSDs.
>>
>>
>> On Sunday (two days ago), I got root-mail notifications from
>> smartd about *all* of the drives in the array. This time, the total
>> error counts had gone up to values in the multiple hundreds per
>> drive. Since then (yesterday), I've also gotten further
>> notification mails about at least one of the drives increasing
>> further. So far today I have not gotten any such notifications.
>
> A single or a few bad blocks is nothing to be overly concerned
> about. I had an Intel SSD which lived a long, healthy, happy life
> with one bad sector and never gave any signs of further problems.
>
> Hundreds of bad blocks per drive is certainly cause for concern.
>
> More worrying is a _significant increase in the rate of increase_ of
> the bad blocks count. That suggests that the drive is suffering from
> some underlying problem.

Do you read the provided excerpt from the SMART data as indicating that
there are hundreds of bad blocks, or that they are rising rapidly?

The Runtime_Bad_Block count for that drive is nonzero, but it is only 31.

What's high and seems as if it may be rising is the
Uncorrectable_Error_Cnt value (attribute 187) - which I understand to
represent *incidents* in which the drive attempted to read a sector or
block and was unable to do so.

>> So... as the Subject asks, should I be worried? How do I interpret
>> these results, and at what point do they start to reflect something
>> to take action over? If there is no reason to be worried, what
>> *do* these alerts indicate, and at what point *should* I start to
>> be worried about them?
>
> At an absolute minimum, were it me, I would refresh my backups. As
> 8-wide RAID-6 of 2TB drives nets you about 12 TB of storage, I'd say
> get yourself a ~16 TB external rotational HDD and set up to back up
> onto it. You should have backups anyway; there's no time like the
> present to get started.

I've ordered a 22TB external drive for the purpose of creating such a
backup. Fingers crossed that things last long enough for it to get here
and get the backup created.

> You are admittedly in a much better position than many; if the
> errors are randomly located, odds are that you have sufficient
> redundancy to manage within the storage array.

That's what I'm relying on.

> The good news is in SMART attributes 5 and 179: taken in combination,
> I take them as an indication that all (31) reallocated sectors have
> been reallocated into the spare sectors pool, and that this represents
> approximately 2% of the spare sectors pool.

The fact that this is the same value as the Runtime_Bad_Block count
(attribute 183) is something I'd noticed before sending that mail, and
is probably not a coincidence.

> Absolutely do keep an eye on attribute 179. If the spare sectors
> pool starts to fill up, the drive won't be able to reallocate any
> further sectors, and your RAID array won't do you much good.
>
> I would also keep an eye out for I/O errors in the kernel log, but
> be mindful of which devices they are coming from.

dmesg does have what appears to be an error entry for each of the events
reported in the alert mails, correlated with the devices in question. I
can provide a sample of one of those, if desired.

Michael Kjörling

Jan 9, 2024, 1:50:06 PM
On 9 Jan 2024 10:21 -0500, from wand...@fastmail.fm (The Wanderer):
>>> Model Family: Samsung based SSDs
>>> Device Model: Samsung SSD 870 EVO 2TB
>>
>> These may or may not be under warranty,
>
> I would be surprised if there were warranty coverage at
> this point, but might look a bit deeper;

https://www.samsung.com/us/computing/memory-storage/solid-state-drives/870-evo-sata-2-5-ssd-1tb-mz-77e1t0b-am/
indicates that the warranty is five years or 1200 TBW for the 2 TB model.

Michael Kjörling

Jan 9, 2024, 2:10:06 PM
On 9 Jan 2024 13:25 -0500, from wand...@fastmail.fm (The Wanderer):
>>> Within the past few weeks, I got root-mail notifications from
>>> smartd that the ATA error count on two of the drives had increased
>>> - one from 0 to a fairly low value (I think between 10 and 20), the
>>> other from 0 to 1. I figured this was nothing to worry about -
>>> because of the relatively low values, because the other drives had
>>> not shown any such thing, and because of the expected stability and
>>> lifetime of good-quality SSDs.
>>>
>>> On Sunday (two days ago), I got root-mail notifications from
>>> smartd about *all* of the drives in the array. This time, the total
>>> error counts had gone up to values in the multiple hundreds per
>>> drive. Since then (yesterday), I've also gotten further
>>> notification mails about at least one of the drives increasing
>>> further. So far today I have not gotten any such notifications.
>
> Do you read the provided excerpt from the SMART data as indicating that
> there are hundreds of bad blocks, or that they are rising rapidly?

No; that was your claim, in the paragraph about Sunday's events.


> The Runtime_Bad_Block count for that drive is nonzero, but it is only 31.
>
> What's high and seems as if it may be rising is the
> Uncorrectable_Error_Cnt value (attribute 187) - which I understand to
> represent *incidents* in which the drive attempted to read a sector or
> block and was unable to do so.

The drive may be performing internal housekeeping and in doing so
trying to read those blocks, or something about your RAID array setup
may be doing so.

Exactly what are you using for RAID-6? mdraid? An off-board hardware
RAID HBA? Motherboard RAID? Or something else? What you say suggests
mdraid or something similar.


> I've ordered a 22TB external drive for the purpose of creating such a
> backup. Fingers crossed that things last long enough for it to get here
> and get the backup created.

I suggest selecting, installing and configuring (as much as possible)
whatever software you will use to actually perform the backup while
you wait for the drive to arrive. It might save you a little time
later. Opinions differ but I like rsnapshot myself; it's really just a
front-end for rsync, so the copy is simply files, making partial or
full restoration easy without any special tools.
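A minimal rsnapshot.conf for a job like yours might look something
like the below (an untested sketch; the fields must be separated by
literal tabs, and the paths are placeholders):

  snapshot_root   /mnt/external/snapshots/
  retain          daily   7
  retain          weekly  4
  backup          /home/  localhost/
  backup          /opt/   localhost/

after which a cron job (or a manual run) of "rsnapshot daily" takes a
snapshot, hardlinking unchanged files against the previous one.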


> dmesg does have what appears to be an error entry for each of the events
> reported in the alert mails, correlated with the devices in question. I
> can provide a sample of one of those, if desired.

As long as the drive is being honest about failures and is reporting
failures rapidly, the RAID array can do its work. What you absolutely
don't want to see is I/O errors relating to the RAID array device (for
example, with mdraid, /dev/md*), because that would presumably mean
that the redundancy was insufficient to correct for the failure. If
that happens, you are falling off a proverbial cliff.
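Checking the array's own view of its health is cheap, so while this
is going on I would do it often:

  cat /proc/mdstat         # want e.g. [8/8] [UUUUUUUU]; a _ is a failed member
  mdadm --detail /dev/md0  # per-member state; device name is an example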

The Wanderer

Jan 9, 2024, 2:30:06 PM
On 2024-01-09 at 14:01, Michael Kjörling wrote:

> On 9 Jan 2024 13:25 -0500, from wand...@fastmail.fm (The Wanderer):
>
>>>> Within the past few weeks, I got root-mail notifications from
>>>> smartd that the ATA error count on two of the drives had
>>>> increased - one from 0 to a fairly low value (I think between
>>>> 10 and 20), the other from 0 to 1. I figured this was nothing
>>>> to worry about - because of the relatively low values, because
>>>> the other drives had not shown any such thing, and because of
>>>> the expected stability and lifetime of good-quality SSDs.
>>>>
>>>> On Sunday (two days ago), I got root-mail notifications from
>>>> smartd about *all* of the drives in the array. This time, the
>>>> total error counts had gone up to values in the multiple
>>>> hundreds per drive. Since then (yesterday), I've also gotten
>>>> further notification mails about at least one of the drives
>>>> increasing further. So far today I have not gotten any such
>>>> notifications.
>>
>> Do you read the provided excerpt from the SMART data as indicating
>> that there are hundreds of bad blocks, or that they are rising
>> rapidly?
>
> No; that was your claim, in the paragraph about Sunday's events.

That paragraph was about the Uncorrectable_Error_Cnt value, which I do
not understand to directly reflect a count of bad blocks. That's why I
wanted to clarify; if you *do* understand that to directly reflect bad
blocks, I'd like to understand your reasoning in arriving at that
conclusion, and if you were instead reaching it from other sources,
I'd like to know how and from what, because it would be something I've
missed.

>> The Runtime_Bad_Block count for that drive is nonzero, but it is
>> only 31.
>>
>> What's high and seems as if it may be rising is the
>> Uncorrectable_Error_Cnt value (attribute 187) - which I understand
>> to represent *incidents* in which the drive attempted to read a
>> sector or block and was unable to do so.
>
> The drive may be performing internal housekeeping and in doing so
> trying to read those blocks, or something about your RAID array setup
> may be doing so.
>
> Exactly what are you using for RAID-6? mdraid? An off-board hardware
> RAID HBA? Motherboard RAID? Or something else? What you say suggests
> mdraid or something similar.

mdraid, yes.

>> I've ordered a 22TB external drive for the purpose of creating such
>> a backup. Fingers crossed that things last long enough for it to
>> get here and get the backup created.
>
> I suggest selecting, installing and configuring (as much as
> possible) whatever software you will use to actually perform the
> backup while you wait for the drive to arrive. It might save you a
> little time later. Opinions differ but I like rsnapshot myself; it's
> really just a front-end for rsync, so the copy is simply files,
> making partial or full restoration easy without any special tools.

My intention was to shut down everything that normally runs, log out as
the user who normally runs it, log in as root (whose home directory,
like the main installed system, is on a different RAID array with
different backing drives), and use rsync from that point. My
understanding is that in that arrangement, the only thing accessing the
RAID-6 array should be the rsync process itself.
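Concretely, I expect it to be something along these lines, with the
destination path invented for the example:

  rsync -aHAXv --numeric-ids /home /opt /mnt/external-backup/

-H being the flag that matters most for me, since I have
hardlink-deduplicated trees whose link structure needs to be
preserved.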

For additional clarity: the RAID-6 array is backing a pair of logical
volumes, which are backing the /home and /opt partitions. The entire
rest of the system is on a series of other logical volumes which are
backed by a RAID-1 array, which is based on entirely different drives
(different model, different form factor, different capacity, I think
even different connection technology) and which has not seen any
warnings arise.

>> dmesg does have what appears to be an error entry for each of the
>> events reported in the alert mails, correlated with the devices in
>> question. I can provide a sample of one of those, if desired.
>
> As long as the drive is being honest about failures and is reporting
> failures rapidly, the RAID array can do its work. What you
> absolutely don't want to see is I/O errors relating to the RAID array
> device (for example, with mdraid, /dev/md*), because that would
> presumably mean that the redundancy was insufficient to correct for
> the failure. If that happens, you are falling off a proverbial
> cliff.

Yeah, *that* would be indicative of current catastrophic failure. I have
not seen any messages related to the RAID array itself.


(For awareness: this is all a source of considerable psychological
stress to me, to an extent that is leaving me on the edge of physically
ill, and I am managing to remain on the good side of that line only by
minimizing my mental engagement with the issue as much as possible. I am
currently able to read and respond to these mails without pressing that
line, but that may change at any moment, and if so I will stop replying
without notice until things change again.)

David Christensen

Jan 9, 2024, 5:40:06 PM
On 1/9/24 05:11, The Wanderer wrote:

> I have an eight-drive RAID-6 array of 2TB SSDs, built
> back in early-to-mid 2021.


> Within the past few weeks, I got root-mail notifications from smartd
> that the ATA error count on two of the drives had increased ...

> On Sunday (two days ago), I got root-mail notifications from smartd
> about *all* of the drives in the array.

> One thing I don't know, which may or may not be important, is whether
> these alert mails are being triggered when the error-count increase
> happens, or when a scheduled check of some type is run.


Please do a full backup to a portable HDD ASAP. Put that HDD off-site.
Get another HDD and do a full backup. Then do incremental backups
daily. After a week, two weeks, or a month, swap the drives. At some
point, start destroying older backups to make room for new backups.


Please burn your most critical data to high-quality optical media.
Enable checksums by some means (extended attributes, checksum file in
volume root, etc.). Validate the burn using the checksums. Then burn
and validate new critical data every week, two weeks, month, etc..
Validate checksums periodically.
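One simple way to implement the checksum-file variant of that, using
only coreutils/findutils (paths are examples):

  cd /path/to/staging-tree   # the tree about to be burned
  find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > SHA256SUMS
  # After burning, mount the disc and validate:
  cd /media/cdrom0 && sha256sum -c SHA256SUMS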


AIUI smartd runs periodically via systemd. Perhaps another reader can
post the incantation required to display the settings and/or locate past
SMART reports on disk.
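Perhaps something like the following, though I have not verified the
unit name against every Debian release:

  systemctl status smartmontools.service   # the smartd daemon itself
  journalctl -u smartmontools              # past smartd reports and triggers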


You can always run smartctl manually to get a SMART report whenever you
want (I like the --xall/-x option):

# smartctl -x DEV


> I've looked at the SMART attributes for the drives, and am having a hard
> time determining whether or not there's anything worth being actually
> concerned about here. Some of the information I'm seeing seems to
> suggest yes, but other information seems to suggest no.


Reading SMART reports has a learning curve. STFW for the terms you do
not understand. And, beware that different manufacturers with different
engineers make different long-term predictions based upon different
short-term test data.


Looking at SMART reports over time for the same drive, looking for
trends, and noticing problems is exactly the right thing to do. You and
smartd did good. :-)


> Most of the attributes are listed as of type "Old_age".


Samsung 870 EVO are good drives, but they are "consumer" drives -- e.g.
intended for laptop/desktop computers that are powered off or
hibernating most of the time. The SMART report you attached showed a
"Power_On_Hours" attribute value of 22286. Assuming an operational
specification of 40 hours/week, that SSD has usage equivalent to 10.7
years. So, it is old.


> I don't know how to interpret the "Pre-fail" notation for the other
> attributes.


AIUI "Pre-fail" indicates the drive is going to fail soon and should be
replaced.


> My default plan is to identify an appropriate model and buy a pair of
> replacement drives, but not install them yet; buy another two drives
> every six months, until I have a full replacement set; and start failing
> drives out of the RAID array and installing replacements as soon as one
> either fails, or looks like it's imminently about to fail.


If you want 24x7 storage at minimum total cost of ownership, I suggest
3.5" enterprise HDD's. I buy "new" or "open box" older model drives on
eBay, the older the cheaper. They typically die within a month or run
for years. SAS has more features, yet can be cheaper (assuming you have
compatible hardware).


I prefer RAID-10 over RAID-5 or RAID-6 because IOPS scales up linearly
with the number of mirrors (spindles). So, if one mirror does 120 IOPS
(7200 RPM), two mirrors do 240 IOPS, three do 360 IOPS, etc.. Also,
resilvering is a direct disk-to-disk copy at sequential read and write
speeds. To get protection against two-device failure, you need 3-way
mirrors; or, a hot spare and a time delay longer than resilvering time
between failures.


Finally, depending upon your choice of RAID, volume management,
filesystem, etc., you might be able to re-use those SSD's as
accelerators -- read cache, write cache, metadata, etc.. (This is easy
on ZFS. Perhaps other readers with mdadm, LVM, btrfs, etc., can comment
on SSD acceleration for those.)
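(To seed that for the LVM case: the mechanism I'm aware of is
lvmcache -- add the SSD to the volume group, then attach it as a
cache for the slow logical volume. Roughly, with every name invented:

  vgextend vg_data /dev/fast_ssd1
  lvcreate --type cache --cachemode writethrough \
      -L 200G -n data_cache vg_data/slow_lv /dev/fast_ssd1

See lvmcache(7) for the details, particularly what happens when the
cache device itself dies.)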


David


Michael Kjörling

Jan 10, 2024, 4:40:06 AM
On 9 Jan 2024 14:34 -0800, from dpch...@holgerdanske.com (David Christensen):
>> I don't know how to interpret the "Pre-fail" notation for the other
>> attributes.
>
> AIUI "Pre-fail" indicates the drive is going to fail soon and should be
> replaced.

Only if the attribute hits the "failure" threshold, whatever that
happens to be or mean for that particular attribute.

David Christensen

Jan 10, 2024, 10:40:07 AM
On 1/10/24 01:35, Michael Kjörling wrote:
> On 9 Jan 2024 14:34 -0800, from dpch...@holgerdanske.com (David Christensen):
>>> I don't know how to interpret the "Pre-fail" notation for the other
>>> attributes.
>>
>> AIUI "Pre-fail" indicates the drive is going to fail soon and should be
>> replaced.
>
> Only if the attribute hits the "failure" threshold, whatever that
> happens to be or mean for that particular attribute.


If you choose to run RAID member drives all the way to failure, then you
need to have a reasonable expectation that the remaining drives have
enough reliability and the RAID has enough redundancy to protect the
data until the sysadmin notices the failed drive(s), the sysadmin
replaces the failed drive(s), and the RAID resilvers.


Given the OP's situation -- 8 consumer SSD's, same make and model,
possibly from a defective manufacturing batch, all purchased at the same
time, all deployed in the same RAID-6, all run 2.5 years 24x7, and all
suddenly showing lots of SMART warnings -- I would not have confidence
in that RAID.


David

Curt

Jan 10, 2024, 12:10:06 PM
On 2024-01-10, David Christensen <dpch...@holgerdanske.com> wrote:
>
>
> Given the OP's situation -- 8 consumer SSD's, same make and model,
> possibly from a defective manufacturing batch, all purchased at the same
> time, all deployed in the same RAID-6, all run 2.5 years 24x7, and all
> suddenly showing lots of SMART warnings -- I would not have confidence
> in that RAID.

It's curious, but I just heard something on French TV from a journalist
that's relevant to this. She said she'd covered the aeronautics field in
the past and mentioned the *principe de dissemblance* (dissimilarity
principle). Critical redundant parts on aircraft, she claimed, would be
sourced from different manufacturers in order to obviate the possibility
of redundant failures you've raised here.

Dan Ritter

Jan 10, 2024, 12:40:06 PM
I don't know whether that's true in aeronautics, but at the home
and small business scale, that's always something I've
practiced.

At the large scale, server assemblers don't want to mix parts
very often (you can get some of them to do it), so you usually
need your servers as a whole to be the unit of redundancy, not
disks in an array.

-dsr-

Michael Kjörling

Jan 10, 2024, 12:40:06 PM
On 10 Jan 2024 17:07 -0000, from cu...@free.fr (Curt):
> It's curious, but I just heard something on French TV from a journalist
> that's relevant to this. She said she'd covered the aeronautics field in
> the past and mentioned the *principe de dissemblance* (dissimilarity
> principle). Critical redundant parts on aircraft, she claimed, would be
> sourced from different manufacturers in order to obviate the possibility
> of redundant failures you've raised here.

Indeed. My understanding is that it's even relatively common, at least
for flight-critical components, to use totally different
implementations (of both hardware and software), not just sourced from
different vendors, resellers or batches, such that the same software
bug _cannot_ reasonably appear in both, reducing the scope of software
errors to _specification_ bugs, which an inherently engineering field
(physical engineering, fluid dynamics, ...) is better equipped to deal
with early. Recent events notwithstanding.

As for David's note on OP's RAID array, I think that point has been
sufficiently made by now in this thread; and let's hope that the new
backup drive arrives soon enough that a full copy can be made before
there is any actual data loss.

David Christensen

Jan 10, 2024, 9:20:06 PM
https://en.wikipedia.org/wiki/Failure_analysis


Using components from different vendors is a known mitigation technique
and can help in the right situations.


Relevant to this thread, some people use disk drives from different
manufacturers in their RAID's. Doing so with RAID-10 (stripe of
mirrors) is straightforward -- within each mirror, use brand A for the
first disk, brand B for the second, brand C for the third (or hot
spares), etc.. It then makes sense to do the same with HBA's -- use HBA
brand X for the first disks in each mirror, HBA brand Y for the second
disks, HBA brand Z for the third/ spare disks, etc.. For x86
workstations and servers, ECC memory, dual network interfaces, and dual
power supplies come to mind. I am unclear about dual processors and/or
dual memory banks. Moving beyond one computer, the process continues
with KVM/ serial console fabric, networks, electric power, cooling,
etc.. It's just a question of what failure modes you want to protect
against and how much time and money you want to spend.


David

David Christensen

Jan 10, 2024, 9:30:06 PM
On 1/10/24 09:30, Michael Kjörling wrote:

> My understanding is that it's even relatively common, at least
> for flight-critical components, to use totally different
> implementations (of both hardware and software), not just sourced from
> different vendors, resellers or batches, such that the same software
> bug _cannot_ reasonably appear in both, reducing the scope of software
> errors to _specification_ bugs, which an inherently engineering field
> (physical engineering, fluid dynamics, ...) is better equipped to deal
> with early. Recent events notwithstanding.


Erlang has a different and interesting philosophy of software systems:

https://medium.com/pragmatic-programmers/error-handling-philosophy-d820bd68a469


David

Dan Ritter

Jan 11, 2024, 9:10:06 AM
David Christensen wrote:
> On 1/10/24 09:07, Curt wrote:
> > On 2024-01-10, David Christensen <dpch...@holgerdanske.com> wrote:
>
> dual network interfaces, and dual power supplies come to mind. I am unclear
> about dual processors and/or dual memory banks. Moving beyond one computer,

There are no systems that I'm aware of which allow you to use 2
or more processors of different models; they always have to be
exact duplicates. Sometimes different steppings of the same
model will not work -- if Intel makes a Xeon 5254 in March, and
fixes things in June, August and November, sometimes the
November release won't work perfectly with a chip produced in
March.

You can always use identically spec'd RAM from different
manufacturers in different memory banks, but since it's always
possible to power down, replace or just remove memory, and power
up again, I don't know that there's any reason to bother
distributing the manufacturers in a single machine.

-dsr-

David Christensen

Jan 11, 2024, 10:10:07 AM
On 1/11/24 05:50, Dan Ritter wrote:
> David Christensen wrote:
>> dual network interfaces, and dual power supplies come to mind. I am
>> unclear about dual processors and/or dual memory banks.
>
> There are no systems that I'm aware of which allow you to use 2
> or more processors of different models; they always have to be
> exact duplicates. Sometimes different step revisions of the same
> model will not work -- if Intel makes a Xeon 5254 in March, and
> fixes things in June, August and November, sometimes the
> November release won't work perfectly with a chip produced in
> March.


Okay.


> You can always use identically spec'd RAM from different
> manufacturers in different memory banks, but since it's always
> possible to power down, replace or just remove memory, and power
> up again, I don't know that there's any reason to bother
> distributing the manufacturers in a single machine.


Okay.


STFW the Dell PowerEdge 6850 (circa 2004) featured "hot plug" disk
drives, expansion slots, memory risers, power supplies, and system
cooling fans:

https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_poweredge/poweredge-6850_user%27s%20guide4_en-us.pdf


STFW dell.com today, I see servers with:

* hot plug hard drives
* hot spare hard drives
* dual hot plug redundant power supplies
* dual hot plug fully redundant power supplies
* dual hot plug fault tolerant power supplies
* dual hot plug fault tolerant redundant power supplies.


This Dell article explains some of the PSU options:

https://infohub.delltechnologies.com/p/full-redundancy-vs-fault-tolerant-redundancy-for-poweredge-server-psus/


David

Dan Ritter

Jan 11, 2024, 1:30:06 PM
David Christensen wrote:
> On 1/11/24 05:50, Dan Ritter wrote:
> > David Christensen wrote:
> STFW the Dell PowerEdge 6850 (circa 2004) featured "hot plug" disk drives,
> expansion slots, memory risers, power supplies, and system cooling fans:
>
> https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_poweredge/poweredge-6850_user%27s%20guide4_en-us.pdf
>
>
> STFW dell.com today, I see servers with:
>
> * hot plug hard drives
> * hot spare hard drives
> * dual hot plug redundant power supplies
> * dual hot plug fully redundant power supplies
> * dual hot plug fault tolerant power supplies
> * dual hot plug fault tolerant redundant power supplies.

Hot plug disks are easy -- SATA, SAS and NVMe U.2 interfaces are all
specified so that the chassis manufacturer can arrange for data
to be disconnected before power. This is nearly but not quite ubiquitous
in rackmountable servers; somewhat rare in desktops.

Hot plugged fans are extremely easy -- there's no data being
stored and no state of any consequence. Arranging an easy
disconnect mechanism is the maximum difficulty.

Hot spare is more a property of the RAID management software.
mdadm, btrfs and zfs all support marking a disk as 'spare' and
not using it until another disk is marked as failing.

Interestingly: all PCIe cards are nominally hot-pluggable.

-dsr-

Stefan Monnier

Jan 11, 2024, 3:30:05 PM
> manufacturers in different memory banks, but since it's always
> possible to power down, replace or just remove memory, and power
> up again,

Hmm... "always"? What about long running computations like that
simulation (or LLM training) launched a month ago and that's expected to
finish in another month or so?

Some mainframes have supported hot (un)plugging RAM modules as well and
I wouldn't be surprised if some x86 servers also support it nowadays.


Stefan

Michael Stone

Jan 11, 2024, 6:00:06 PM
On Thu, Jan 11, 2024 at 03:25:51PM -0500, Stefan Monnier wrote:
>> manufacturers in different memory banks, but since it's always
>> possible to power down, replace or just remove memory, and power
>> up again,
>
>Hmm... "always"? What about long running computations like that
>simulation (or LLM training) launched a month ago and that's expected to
>finish in another month or so?

I'd expect something like that to have a checkpoint/restart capability
if avoiding a restart from scratch actually matters.

>Some mainframes have supported hot (un)plugging RAM modules as well

Yes, mainframes have been engineered that way for a long time. It makes
them very expensive, and their market share has been declining for
decades because most problems can be solved more cheaply in software
(even while maintaining high availability). Hot *spare* memory is
relatively common, as it solves most problems without the complexity of
hot *swapping*, at the (generally low) cost of having to schedule
downtime at some point in the future to actually replace the failed
module.

Dan Ritter

Jan 12, 2024, 6:20:06 AM
Stefan Monnier wrote:
> > manufacturers in different memory banks, but since it's always
> > possible to power down, replace or just remove memory, and power
> > up again,
>
> Hmm... "always"? What about long running computations like that
> simulation (or LLM training) launched a month ago and that's expected to
> finish in another month or so?

If the job is that big, it's being run on multiple machines. This
machine's current chunk is corrupt, so you can't use it anyway.
The orchestrator stops using this machine, someone comes in to
replace the RAM. Later the machine is re-added to the pool.


> Some mainframes have supported hot (un)plugging RAM modules as well and
> I wouldn't be surprised if some x86 servers also support it nowadays.

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html

That said, you won't find this feature without specifying it
when you buy it, and very few have a use case for it.

-dsr-

The Wanderer

Feb 14, 2024, 10:00:08 PM
TL;DR: It worked! I'm back up and running, with what appears to be all
my data safely recovered from the failing storage stack!


On 2024-01-09 at 14:22, The Wanderer wrote:

> On 2024-01-09 at 14:01, Michael Kjörling wrote:
>
>> On 9 Jan 2024 13:25 -0500, from wand...@fastmail.fm (The
>> Wanderer):

In the time since this, I continued mostly-normal but somewhat-curtailed
use of the system, and saw few messages about these matters that did not
arise from attempts to back up the data for later recovery purposes.

> (For awareness: this is all a source of considerable psychological
> stress to me, to an extent that is leaving me on the edge of
> physically ill, and I am managing to remain on the good side of that
> line only by minimizing my mental engagement with the issue as much
> as possible. I am currently able to read and respond to these mails
> without pressing that line, but that may change at any moment, and if
> so I will stop replying without notice until things change again.)

This need to stop reading wound up happening almost immediately after I
sent the message to which I am replying.

I now, however, have good news to report back: after more than a month,
at least one change of plans, nearly $2200 in replacement hard drives,
much nervous stress, several days of running data copies to and from a
20+-terabyte mechanical hard drive over USB, and a complete manual
removal of my old 8-drive RAID-6 array and build of a new 6-drive RAID-6
array (and of the LVM structure on top of it), I now appear to have
complete success.

I am now running on a restored copy of the data on the affected
partitions, taken from a nearly-fully-shut-down system state, which is
sitting on a new RAID-6 array built on what I understand to be
data-center-class SSDs (which should, therefore, be more suitable to the
24/7-uptime read-mostly workload I expect of my storage). The current
filesystems involved are roughly the same size as the ones previously in
use, but the underlying drives are nearly 2x the size; I decided to
leave the extra capacity for later allocation via LVM, if and when I may
need it.


I did my initial data backup to the external drive, from a
still-up-and-running system, via rsnapshot. Attempting to do a second
rsnapshot, however, failed at the 'cp -al' stage with "too many
hardlinks" errors. It turns out that there is a hard limit of 65000
hardlinks per on-disk file; I had so many files already hardlinked
together on the source filesystem that trying to hardlink each one to
just as many new names as there were already hardlinks for that file ran
into that limit.

(The default rsnapshot configuration doesn't preserve hardlinks,
possibly in order to avoid this exact problem - but that isn't viable
for the case I had at hand, because in some cases I *need* to preserve
the hardlink status, and because without that deduplication there
wouldn't have been enough space on the drive for more than the single
copy, in which case there'd be very little point in using rsnapshot
rather than just rsync.)
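For anyone who hits the same wall: GNU find can filter on link count,
so something like

  find /home -xdev -type f -links +32000 -printf '%n %p\n'

lists every file whose link count is already past half of ext4's
65000 limit, with the current count in the first column. (The
threshold here is arbitrary.)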

In the end, after several flailing-around attempts to minimize or
mitigate that problem, I wound up moving the initial external copy of
the biggest hardlink-deduplicated tree (which is essentially 100%
read-only at this point; it's backup copies of an old system state,
preserved since one of those copies has corrupted data and I haven't yet
been able to confirm that all of the files in my current copy of that
data were taken from the non-corrupt version) out of the way, shutting
down all parts of the system that might be writing to the affected
filesystems, and manually copying out the final state of the *other*
parts of those filesystems via rsync, bypassing rsnapshot. That was on
Saturday the 10th.

Then I grabbed copies of various metadata about the filesystems, the
LVM, and the mdraid config; modified /etc/fstab to not mount them;
deactivated the mdraid, and commented it out of /etc/mdadm/mdadm.conf;
updated the initramfs; shut down; pulled all eight Samsung 870 EVO
drives; installed six brand-new Intel data-center-class (or so I gather)
SSDs; booted up; partitioned the new drives based on the data I had
about what config the Debian installer put in place when creating the
mdraid config on the old ones; created a new mdraid RAID-6 array on
them, based on the copied metadata; created a new LVM stack on top of
that, based on *that* copied metadata; created new filesystems on top of
that, based on *that* copied metadata; rsync'ed the data in from the
manually-created external backup; adjusted /etc/fstab and
/etc/mdadm/mdadm.conf to reflect the new UUID and names of the new
storage configuration; updated the initramfs; and rebooted. Given delay
times for the drives to arrive and for various data-validation and
plan-double-checking steps to complete, the end of that process happened
this afternoon.
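In command terms, the core of that sequence was roughly the following
- reconstructed from memory, with device names, VG/LV names, and
sizes as placeholders rather than my actual values:

  mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[b-g]1
  pvcreate /dev/md1
  vgcreate vg_data /dev/md1
  lvcreate -L 1.5T -n home vg_data && lvcreate -L 500G -n opt vg_data
  mkfs.ext4 /dev/vg_data/home && mkfs.ext4 /dev/vg_data/opt
  rsync -aHAX /mnt/external-backup/home/ /home/   # and likewise for /opt
  mdadm --detail --scan   # for the new ARRAY line in /etc/mdadm/mdadm.conf
  update-initramfs -u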

And it appears to Just Work. I haven't examined all the data to validate
that it's in good condition, obviously (since there's nearly 3TB of it),
but the parts I use on a day-to-day basis are all looking exactly the
way they should be. It appears that the cross-drive redundancy of the
RAID-6 array was enough to avoid data loss from the scattered read
failures of the underlying drives before I could get the data out.
data out.

(This does leave me without having restored the read-only backup data
from the old system state. I care less about that; I'll want it
eventually, but it isn't important enough to warrant postponing getting
the system back in working order.)


I do still want/need to figure out what to do about an *actual* backup
system, to external storage, since the rsnapshot thing apparently isn't
going to be viable for my circumstance and use case. There is, however,
now *time* to work on doing that, without living under the shadow of a
known immediate/imminent data-loss hardware failure.

I also do mean to read the rest of the replies in this thread, now that
doing so is unlikely to aggravate my stress heartburn...

songbird

Feb 15, 2024, 1:20:05 AM
The Wanderer wrote:
> TL;DR: It worked! I'm back up and running, with what appears to be all
> my data safely recovered from the failing storage stack!
...

i'm glad you got it back up and running and i hope all your
data is intact. :)

which SSDs did you use?


songbird

David Christensen

Feb 15, 2024, 3:20:07 AM
On 2/14/24 18:54, The Wanderer wrote:
> TL;DR: It worked! I'm back up and running, with what appears to be all
> my data safely recovered from the failing storage stack!


That is good to hear. :-)


> On 2024-01-09 at 14:22, The Wanderer wrote:
>
>> On 2024-01-09 at 14:01, Michael Kjörling wrote:
>>
>>> On 9 Jan 2024 13:25 -0500, from wand...@fastmail.fm (The
>>> Wanderer):
>
>>>> I've ordered a 22TB external drive


Make? Model? How is it interfaced to your computer?

Migrating large amounts of data from one storage configuration to
another storage configuration is non-trivial. Anticipating problems and
preparing for them ahead of time (e.g. backups) makes it even less
trivial. The last time I lost data was during a migration when I had
barely enough hardware. I made a conscious decision to always have a
surplus of hardware.


>> (For awareness: this is all a source of considerable psychological
>> stress to me, to an extent that is leaving me on the edge of
>> physically ill, and I am managing to remain on the good side of that
>> line only by minimizing my mental engagement with the issue as much
>> as possible. I am currently able to read and respond to these mails
>> without pressing that line, but that may change at any moment, and if
>> so I will stop replying without notice until things change again.)
>
> This need to stop reading wound up happening almost immediately after I
> sent the message to which I am replying.


I remember reading your comment and then noticing you went silent. I
apologize if I pushed your button.


> I now, however, have good news to report back: after more than a month,
> at least one change of plans, nearly $2200 in replacement hard drives,


Ouch.


If you have a processor, memory, PCIe slot, and HBA to match those
SSD's, the performance of those SSD's should be very nice.


> much nervous stress, several days of running data copies to and from a
> 20+-terabyte mechanical hard drive over USB, and a complete manual
> removal of my old 8-drive RAID-6 array and build of a new 6-drive RAID-6
> array (and of the LVM structure on top of it), I now appear to have
> complete success.
>
> I am now running on a restored copy of the data on the affected
> partitions, taken from a nearly-fully-shut-down system state, which is
> sitting on a new RAID-6 array built on what I understand to be
> data-center-class SSDs (which should, therefore, be more suitable to the
> 24/7-uptime read-mostly workload I expect of my storage). The current
> filesystems involved are roughly the same size as the ones previously in
> use, but the underlying drives are nearly 2x the size; I decided to
> leave the extra capacity for later allocation via LVM, if and when I may
> need it.


When I was thinking about building md RAID, and then ZFS, I worried
about having enough capacity for my data. Now I worry about
zfs-auto-snapshot(8), daily backups, monthly archives, monthly images,
etc., clogging my ZFS pools.


The key concept is "data lifetime". (Or alternatively, "destruction
policy".)


> I did my initial data backup to the external drive, from a
> still-up-and-running system, via rsnapshot. Attempting to do a second
> rsnapshot, however, failed at the 'cp -al' stage with "too many
> hardlinks" errors. It turns out that there is a hard limit of 65000
> hardlinks per on-disk file;


65,000 hard links seems to be an ext4 limit:

https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624


I believe ZFS can do more hard links. (Much more? Limited by available
storage space?)


> I had so many files already hardlinked
> together on the source filesystem that trying to hardlink each one to
> just as many new names as there were already hardlinks for that file ran
> into that limit.
>
> (The default rsnapshot configuration doesn't preserve hardlinks,
> possibly in order to avoid this exact problem - but that isn't viable
> for the case I had at hand, because in some cases I *need* to preserve
> the hardlink status, and because without that deduplication there
> wouldn't have been enough space on the drive for more than the single
> copy, in which case there'd be very little point in using rsnapshot
> rather than just rsync.)


ZFS provides similarly useful results with built-in compression and
de-duplication.


> In the end, after several flailing-around attempts to minimize or
> mitigate that problem, I wound up moving the initial external copy of
> the biggest hardlink-deduplicated tree (which is essentially 100%
> read-only at this point; it's backup copies of an old system state,
> preserved since one of those copies has corrupted data and I haven't yet
> been able to confirm that all of the files in my current copy of that
> data were taken from the non-corrupt version)


That sounds like an N-way merge problem -- old file system, multiple old
backups, and current file system as inputs, all merged into an updated
current file system as output. LVM snapshots, jdupes(1), and your
favorite scripting language come to mind. Take good notes and be
prepared to rollback at any step.


> out of the way, shutting
> down all parts of the system that might be writing to the affected
> filesystems, and manually copying out the final state of the *other*
> parts of those filesystems via rsync, bypassing rsnapshot. That was on
> Saturday the 10th.
>
> Then I grabbed copies of various metadata about the filesystems, the
> LVM, and the mdraid config; modified /etc/fstab to not mount them;
> deactivated the mdraid, and commented it out of /etc/mdadm/mdadm.conf;
> updated the initramfs; shut down; pulled all eight Samsung 870 EVO
> drives; installed six brand-new Intel data-center-class (or so I gather)
> SSDs;


Which model? What size?


> booted up; partitioned the new drives based on the data I had
> about what config the Debian installer put in place when creating the
> mdraid config on the old ones; created a new mdraid RAID-6 array on
> them, based on the copied metadata; created a new LVM stack on top of
> that, based on *that* copied metadata; created new filesystems on top of
> that, based on *that* copied metadata; rsync'ed the data in from the
> manually-created external backup; adjusted /etc/fstab and
> /etc/mdadm/mdadm.conf to reflect the new UUID and names of the new
> storage configuration; updated the initramfs; and rebooted. Given delay
> times for the drives to arrive and for various data-validation and
> plan-double-checking steps to complete, the end of that process happened
> this afternoon.
>
> And it appears to Just Work. I haven't examined all the data to validate
> that it's in good condition, obviously (since there's nearly 3TB of it),
> but the parts I use on a day-to-day basis are all looking exactly the
> way they should be. It appears that the cross-drive redundancy of the
RAID-6 array was enough to avoid data loss from the
> scattered read failures of the underlying drives before I could get the
> data out.


Data integrity validation is tough without a mechanism. Adding an
rsnapshot(1) postexec MD5SUMS, etc., file into the root of each backup
tree could solve this need, but could waste a lot of time and energy
checksumming files that have not changed.


One of the reasons I switched to ZFS was because ZFS has built-in data
and metadata integrity checking (and repair; depending upon redundancy).


> (This does leave me without having restored the read-only backup data
> from the old system state. I care less about that; I'll want it
> eventually, but it isn't important enough to warrant postponing getting
> the system back in working order.)
>
>
> I do still want/need to figure out what to do about an *actual* backup
> system, to external storage, since the rsnapshot thing apparently isn't
> going to be viable for my circumstance and use case. There is, however,
> now *time* to work on doing that, without living under the shadow of a
> known immediate/imminent data-loss hardware failure.


rsync(1) should be able to copy backups onto an external HDD.
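
A minimal form of that, assuming the drive is mounted at /mnt/backup
(the paths here are placeholders):

  # -a preserves ownership/permissions/times, -H preserves hard links,
  # -x stays on one filesystem, --delete mirrors deletions
  rsync -aHx --delete /home/ /mnt/backup/home/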


If your chassis has an available 5.25" half-height external drive bay
and you have an available SATA 6 Gbps port, mobile racks are a more
reliable connection than USB for 3.5" HDD's because there are no cables
to bump or power adapters to fail or unplug:

https://www.startech.com/en-us/hdd/drw150satbk


> I also do mean to read the rest of the replies in this thread, now that
> doing so is unlikely to aggravate my stress heartburn...


Okay.


David

debia...@howorth.org.uk

unread,
Feb 15, 2024, 7:20:06 AMFeb 15
to
The Wanderer <wand...@fastmail.fm> wrote:

> It turns out that there is a hard limit of 65000
> hardlinks per on-disk file;

That's a filesystem dependent value. That's the value for ext4.

XFS has a much larger limit I believe. As well as some other helpful
properties for large filesystems.

btrfs has different limits, depending on where the hardlinks are,
apparently. Some larger, some ridiculously smaller.

The Wanderer

unread,
Feb 15, 2024, 10:10:08 AMFeb 15
to
On 2024-02-15 at 07:14, debia...@howorth.org.uk wrote:

> The Wanderer <wand...@fastmail.fm> wrote:
>
>> It turns out that there is a hard limit of 65000 hardlinks per
>> on-disk file;
>
> That's a filesystem dependent value. That's the value for ext4.

I think I recall reading that while I was flailing over this, yes. ext4
is what I use for daily-driver purposes these days; from the little I've
looked into the matter, everything else seems to be either too
complicated, or too non-robust, to be worth risking my live data on.

> XFS has a much larger limit I believe. As well as some other helpful
> properties for large filesystems.
>
> btrfs has different limits, depending on where the hardlinks are,
> apparently. Some larger, some ridiculously smaller.

So it might make sense to use one of those as the underpinning for
whatever external system I wind up setting up for tiered backup, then.
Though experimentation to determine the limits would be warranted.

That's not immediately actionable, but it's good to have in the
background as planning etc. takes place.

The Wanderer

unread,
Feb 15, 2024, 10:50:05 AMFeb 15
to
On 2024-02-15 at 01:18, songbird wrote:

> The Wanderer wrote:
>
>> TL;DR: It worked! I'm back up and running, with what appears to be
>> all my data safely recovered from the failing storage stack!
>
> i'm glad you got it back up and running and i hope all your data is
> intact. :)

Thank you. It's quite a relief on my end as well.

> which SSDs did you use?

The model name/number isn't terribly meaningful-looking. I gave it in my
reply to David, fairly deep in the wall of text. They're Intel 3.84 TB
(or possibly TiB) SSDs, reportedly intended for server or data-center use.

The Wanderer

unread,
Feb 15, 2024, 10:50:07 AMFeb 15
to
On 2024-02-15 at 03:09, David Christensen wrote:

> On 2/14/24 18:54, The Wanderer wrote:
>
>> TL;DR: It worked! I'm back up and running, with what appears to be
>> all my data safely recovered from the failing storage stack!
>
> That is good to hear. :-)
>
>> On 2024-01-09 at 14:22, The Wanderer wrote:
>>
>>> On 2024-01-09 at 14:01, Michael Kjörling wrote:
>>>
>>>> On 9 Jan 2024 13:25 -0500, from wand...@fastmail.fm (The
>>>> Wanderer):
>>
>>>>> I've ordered a 22TB external drive
>
> Make? Model? How it is interfaced to your computer?

It's a WD Elements 20TB drive (I'm not sure where I got the 22 from);
the back of the case has the part number WDBWLG0200HBK-X8 (or possibly
-XB, the font is kind of ambiguous). The connection, per the packaging
label, is USB-3.

>> In the time since this, I continued mostly-normal but
>> somewhat-curtailed use of the system, and saw few messages about
>> these matters that did not arise from attempts to back up the data
>> for later recovery purposes.
>
> Migrating large amounts of data from one storage configuration to
> another storage configuration is non-trivial. Anticipating problems
> and preparing for them ahead of time (e.g. backups) makes it even
> less trivial. The last time I lost data was during a migration when
> I had barely enough hardware. I made a conscious decision to always
> have a surplus of hardware.

The big change of plans in the middle of my month-plus process was the
decision to replace the entire 8-drive array with a 6-drive array, and
the reason for that was because the 8-drive array left me with no open
SATA ports to be able to connect spare drives in order to do drive
replacements without needing to rebuild the whole shaboozle.

I don't currently have a surplus of hardware (see the $2200 it already
cost me for the replacement drives I have), but I also haven't yet
initiated a warranty claim on the 870 EVO drives, and it seems possible
that that process might leave me with either replacement drives on that
front or just plain money (even if from selling the replacement drives
on e.g. eBay) with which to purchase spare-able hardware.

>>> (For awareness: this is all a source of considerable
>>> psychological stress to me, to an extent that is leaving me on
>>> the edge of physically ill, and I am managing to remain on the
>>> good side of that line only by minimizing my mental engagement
>>> with the issue as much as possible. I am currently able to read
>>> and respond to these mails without pressing that line, but that
>>> may change at any moment, and if so I will stop replying without
>>> notice until things change again.)
>>
>> This need to stop reading wound up happening almost immediately
>> after I sent the message to which I am replying.
>
> I remember reading your comment and then noticing you went silent. I
> apologize if I pushed your button.

As far as I know you didn't. I don't think I even read any of the
replies after sending that message, and if I did, I don't remember any
of them having this type of impact; it was just the holistic stress of
the entire situation.

>> I now, however, have good news to report back: after more than a
>> month, at least one change of plans, nearly $2200 in replacement
>> hard drives,
>
> Ouch.

Yeah. The cost factor is why I was originally planning to spread this
out over time, buying two drives a month until I had enough to replace
drives one at a time in the 8-drive array. I eventually decided that -
especially with the rsnapshot tiered backups turning out not to be
viable, because of the hardlinks thing - the risk factor of stretching
things out further wasn't going to be worth the benefit.

IIRC, the drives were actually $339 apiece, which would put the total
price for six in the $2030-$2040 range; sales tax and shipping costs
were what put it up to nearly $2200.

> If you have a processor, memory, PCIe slot, and HBA to match those
> SSD's, the performance of those SSD's should be very nice.

The CPU is a Ryzen 5 5600X. The RAM is G-Skill DDR4 2666MHz, in two 32GB
DIMMs. I don't know how to assess PCIe slots and HBA, but the
motherboard is an Asus ROG Crosshair VIII Dark Hero, which I think was
the top-of-the-line enthusiast motherboard (with the port set my
criteria called for) the year I built this machine.

I'm pretty sure my performance bottleneck for most things is the CPU (or
the GPU, where that comes into play, which here it doesn't);
storage-wise this seems so far to be at least as fast as what I had
before, but it's hard to tell if it's faster.

>> much nervous stress, several days of running data copies to and
>> from a 20+-terabyte mechanical hard drive over USB, and a complete
>> manual removal of my old 8-drive RAID-6 array and build of a new
>> 6-drive RAID-6 array (and of the LVM structure on top of it), I now
>> appear to have complete success.
>>
>> I am now running on a restored copy of the data on the affected
>> partitions, taken from a nearly-fully-shut-down system state, which
>> is sitting on a new RAID-6 array built on what I understand to be
>> data-center-class SSDs (which should, therefore, be more suitable
>> to the 24/7-uptime read-mostly workload I expect of my storage).
>> The current filesystems involved are roughly the same size as the
>> ones previously in use, but the underlying drives are nearly 2x the
>> size; I decided to leave the extra capacity for later allocation
>> via LVM, if and when I may need it.
>
> When I was thinking about building md RAID, and then ZFS, I worried
> about having enough capacity for my data. Now I worry about
> zfs-auto-snapshot(8), daily backups, monthly archives, monthly
> images, etc., clogging my ZFS pools.
>
> The key concept is "data lifetime". (Or alternatively, "destruction
> policy".)

I can see that for when you have a tiered backup structure, and are
looking at the lifetimes of each backup copy. For my live system, my
intended data lifetime (outside of caches and data kept in /tmp) is
basically "forever".

>> I did my initial data backup to the external drive, from a
>> still-up-and-running system, via rsnapshot. Attempting to do a
>> second rsnapshot, however, failed at the 'cp -al' stage with "too
>> many hardlinks" errors. It turns out that there is a hard limit of
>> 65000 hardlinks per on-disk file;
>
> 65,000 hard links seems to be an ext4 limit:
>
> https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624

That sounds right.

> I believe ZFS can do more hard links. (Much more? Limited by
> available storage space?)

I'm not sure, but I'll have to look into that, when I get to the point
of trying to set up that tiered backup.

>> I had so many files already hardlinked together on the source
>> filesystem that trying to hardlink each one to just as many new
>> names as there were already hardlinks for that file ran into that
>> limit.
>>
>> (The default rsnapshot configuration doesn't preserve hardlinks,
>> possibly in order to avoid this exact problem - but that isn't
>> viable for the case I had at hand, because in some cases I *need*
>> to preserve the hardlink status, and because without that
>> deduplication there wouldn't have been enough space on the drive
>> for more than the single copy, in which case there'd be very little
>> point in using rsnapshot rather than just rsync.)
>
> ZFS provides similarly useful results with built-in compression and
> de-duplication.

I have the impression that there are risk and/or complexity aspects to
it which make it less attractive as a choice, but those features do
sound appealing. I will have to look into it, when I get to that point.

>> In the end, after several flailing-around attempts to minimize or
>> mitigate that problem, I wound up moving the initial external copy
>> of the biggest hardlink-deduplicated tree (which is essentially
>> 100% read-only at this point; it's backup copies of an old system
>> state, preserved since one of those copies has corrupted data and I
>> haven't yet been able to confirm that all of the files in my
>> current copy of that data were taken from the non-corrupt version)
>
> That sounds like an N-way merge problem -- old file system, multiple
> old backups, and current file system as inputs, all merged into an
> updated current file system as output. LVM snapshots, jdupes(1), and
> your favorite scripting language come to mind. Take good notes and
> be prepared to roll back at any step.

It does sound like that, yes. I'm already aware of jdupes, and of a few
other tools (part of the work I already did in getting this far was
rdfind, which is what I used to set up much of the hardlink
deduplication that wound up biting me in the butt), but have not
investigated LVM snapshots - and the idea of trying to script something
like this, without an existing known-safe copy of the data to fall back
on, leaves me *very* nervous.

Figuring out how to be prepared to roll back is the other uncertain and
nervous-making part. In some cases it's straightforward enough, but
doing it at the scale of the size of those copies is at best daunting.

>> out of the way, shutting down all parts of the system that might be
>> writing to the affected filesystems, and manually copying out the
>> final state of the *other* parts of those filesystems via rsync,
>> bypassing rsnapshot. That was on Saturday the 10th.
>>
>> Then I grabbed copies of various metadata about the filesystems,
>> the LVM, and the mdraid config; modified /etc/fstab to not mount
>> them; deactivated the mdraid, and commented it out of
>> /etc/mdadm/mdadm.conf; updated the initramfs; shut down; pulled all
>> eight Samsung 870 EVO drives; installed six brand-new Intel
>> data-center-class (or so I gather) SSDs;
>
> Which model? What size?

lshw says they're INTEL SSDSCK2B03. The packaging says SSDSCK2B038T801.

IIRC, the product listing said they were 3.84 TB (or possibly TiB). lshw
says 'size: 3567GiB (3840GB)'. IIRC, the tools I used to partition them
and build the mdraid and so forth said 3.84 TB/TiB (not sure which), or
3840 GB/GiB (same).

For comparison, the 870 EVO drives - which were supposed to be 2TB
apiece - were reported by some of those same tools as exactly 2000 of
the same unit.

This does mean that I have more total space available in the new array
than in the old one, but I've tried to allocate only as much space as
was in the old array, insofar as I could figure out how to do that in
the limited environment I was working in. (The old array and/or LV setup
had sizes listed along the lines of '<10TiB', but my best attempt at
replicating it gave something which reports sizes along the lines of
'10TiB', so I suspect that my current setup is actually slightly too
large to fit on the old disks.)

>> booted up; partitioned the new drives based on the data I had about
>> what config the Debian installer put in place when creating the
>> mdraid config on the old ones; created a new mdraid RAID-6 array
>> on them, based on the copied metadata; created a new LVM stack on
>> top of that, based on *that* copied metadata; created new
>> filesystems on top of that, based on *that* copied metadata;
>> rsync'ed the data in from the manually-created external backup;
>> adjusted /etc/fstab and /etc/mdadm/mdadm.conf to reflect the new
>> UUID and names of the new storage configuration; updated the
>> initramfs; and rebooted. Given delay times for the drives to arrive
>> and for various data-validation and plan-double-checking steps to
>> complete, the end of that process happened this afternoon.
>>
>> And it appears to Just Work. I haven't examined all the data to
>> validate that it's in good condition, obviously (since there's
>> nearly 3TB of it), but the parts I use on a day-to-day basis are
>> all looking exactly the way they should be. It appears that the
>> cross-drive redundancy of the RAID-6 array was enough to avoid
>> data loss from the scattered read failures of the
>> underlying drives before I could get the data out.
>
> Data integrity validation is tough without a mechanism. Adding an
> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
> backup tree could solve this need, but could waste a lot of time and
> energy checksumming files that have not changed.

AFAIK, all such things require you to be starting from a point with a
known-good copy of the data, which is a luxury I don't currently have
(as far as validating my current data goes). It's something to keep in
mind when planning a more proper backup system, however.

> One of the reasons I switched to ZFS was because ZFS has built-in
> data and metadata integrity checking (and repair; depending upon
> redundancy).

I'm not sure I understand how this would be useful in the case I have at
hand; that probably means that I'm not understanding the picture properly.

>> (This does leave me without having restored the read-only backup
>> data from the old system state. I care less about that; I'll want
>> it eventually, but it isn't important enough to warrant postponing
>> getting the system back in working order.)
>>
>>
>> I do still want/need to figure out what to do about an *actual*
>> backup system, to external storage, since the rsnapshot thing
>> apparently isn't going to be viable for my circumstance and use
>> case. There is, however, now *time* to work on doing that, without
>> living under the shadow of a known immediate/imminent data-loss
>> hardware failure.
>
> rsync(1) should be able to copy backups onto an external HDD.

Yeah, but that only provides one tier of backup; the advantage of
rsnapshot (or similar) is the multiple deduplicated tiers, which gives
you options if it turns out the latest backup already included the
damage you're trying to recover from.

> If your chassis has an available 5.25" half-height external drive bay
> and you have an available SATA 6 Gbps port, mobile racks are a more
> reliable connection than USB for 3.5" HDD's because there are no
> cables to bump or power adapters to fail or unplug:
>
> https://www.startech.com/en-us/hdd/drw150satbk

I don't think it does have one; at least from the outside, I don't see
any 5.25" bays on the case at all. I know I didn't include an internal
optical drive when building this system, and while part of that was lack
of free SATA ports, the lack of such an exposed bay would also have been
a contributing factor.

(USB-3 will almost certainly not be a viable option for an automatic
scheduled backup of the sort rsnapshot's documentation suggests, because
the *fastest* backup cycle I saw from my working with the data I had was
over three hours, and the initial pass to copy the data out to the drive
in the first place took nearly *20* hours. A cron job to run even an
incremental backup even once a day, much less the several times a day
suggested for the deeper rsnapshot tiers, would not be *remotely*
workable in that sort of environment. Though on the flip side, that's
not just a USB-3 bottleneck, but also the bottleneck of the spinning
mechanical hard drive inside the external case...)

The Wanderer

unread,
Feb 15, 2024, 10:50:07 AMFeb 15
to
I remember, in my previous job (back in the oughts, now), one occasion
on which I was going around adding RAM to various desktop computers in
the area under my purview, by adding more DIMMs to the open slots - and
discovering, when I put the case back together on one of those computers
and went to power it back on, that *it was already powered on and the
system was still booted*.

Surprisingly, none of the hardware showed any sign of damage, and the
system recognized the RAM just fine after a reboot. But it was a bit of
a jolt at the time to realize that I'd just done parts surgery, however
mild, on a powered and running system.

Michael Kjörling

unread,
Feb 15, 2024, 12:40:05 PMFeb 15
to
On 15 Feb 2024 10:41 -0500, from wand...@fastmail.fm (The Wanderer):
>> 65,000 hard links seems to be an ext4 limit:
>>
>> https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624
>
> That sounds right.
>
>> I believe ZFS can do more hard links. (Much more? Limited by
>> available storage space?)
>
> I'm not sure, but I'll have to look into that, when I get to the point
> of trying to set up that tiered backup.

ZFS can definitely do more; I ran a background loop hardlinking a
single file on a new pool while typing up this email, and toward the
end, it's at >75K and still going strong. That consumed about 5 MB of
storage.
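
A loop along these lines reproduces the test (the dataset path is a
placeholder):

  # hard-link one file until the filesystem refuses
  touch /tank/test/f
  i=0
  while ln /tank/test/f /tank/test/f.$i 2>/dev/null; do
      i=$((i + 1))
  done
  echo "gave up after $i links"

On ext4 the same loop stops just short of 65,000.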


>> Data integrity validation is tough without a mechanism. Adding an
>> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
>> backup tree could solve this need, but could waste a lot of time and
>> energy checksumming files that have not changed.
>
> AFAIK, all such things require you to be starting from a point with a
> known-good copy of the data, which is a luxury I don't currently have
> (as far as validating my current data goes). It's something to keep in
> mind when planning a more proper backup system, however.

What you do have is a _current_ state. Being able to detect unintended
changes from that state may be beneficial even if the current state
isn't known-perfect.


>> One of the reasons I switched to ZFS was because ZFS has built-in
>> data and metadata integrity checking (and repair; depending upon
>> redundancy).
>
> I'm not sure I understand how this would be useful in the case I have at
> hand; that probably means that I'm not understanding the picture properly.

Unless you go out of your way to turn off checksumming beforehand, ZFS
will refuse to let you read a block where the checksum doesn't match
the block's payload data. Meaning that if you're able to _read_ a
block of data by normal means, you can be certain that the probability
of it not matching what was originally written to disk is _very_
low.

ZFS will also automatically repair any repairable error it detects. In
a redundant setup, this is almost everything; in a non-redundant
setup, it's rather less, but still more than nothing.
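
The same checking can also be invoked on demand; a scrub reads every
allocated block and verifies (and, where redundancy allows, repairs) it:

  zpool scrub tank       # "tank" is a placeholder pool name
  zpool status -v tank   # progress, plus any files with unrecoverable errors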


>> rsync(1) should be able to copy backups onto an external HDD.
>
> Yeah, but that only provides one tier of backup; the advantage of
> rsnapshot (or similar) is the multiple deduplicated tiers, which gives
> you options if it turns out the latest backup already included the
> damage you're trying to recover from.

rsnapshot is largely a front-end for rsync --link-dest=<something>. It
does make a few things easier but there isn't much you can do with
rsnapshot that you can't do with rsync and a little shell scripting if
you're willing to live with a specialized tool for your purposes.
rsnapshot is generic.
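
A hand-rolled rotation step looks roughly like this (the paths and
retention depth are placeholders):

  # age the snapshots, then sync against the newest one; unchanged
  # files become hard links into daily.1 instead of fresh copies
  rm -rf /mnt/backup/daily.6
  for i in 5 4 3 2 1 0; do
      [ -d /mnt/backup/daily.$i ] && \
          mv /mnt/backup/daily.$i /mnt/backup/daily.$((i + 1))
  done
  rsync -aH --delete --link-dest=/mnt/backup/daily.1 /home/ /mnt/backup/daily.0/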


> (USB-3 will almost certainly not be a viable option for an automatic
> scheduled backup of the sort rsnapshot's documentation suggests, because
> the *fastest* backup cycle I saw from my working with the data I had was
> over three hours, and the initial pass to copy the data out to the drive
> in the first place took nearly *20* hours. A cron job to run even an
> incremental backup even once a day, much less the several times a day
> suggested for the deeper rsnapshot tiers, would not be *remotely*
> workable in that sort of environment. Though on the flip side, that's
> not just a USB-3 bottleneck, but also the bottleneck of the spinning
> mechanical hard drive inside the external case...)

I think rsnapshot's suggested backup schedule is excessively frequent
for pretty much anything more than a relatively small home directory.
In my case rsnapshot runs for several hours, much of which is likely
for checking file metadata for updates; I run backups once a day and
there is no realistic way that enough data is modified each day to
take that long to copy.

I recently wrote a script to take advantage of ZFS snapshots to get a
basically point-in-time atomic snapshot of the data onto the backup
drive, even in the presence of live changes while the backup is
running. (It's not necessarily _quite_ point-in-time atomic because I
have two ZFS pools plus an ext4 file system; but it's close enough to
be a workable approximation.)
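
The core of such a script is small (the pool, dataset, and mount-point
names here are placeholders):

  # take an atomic snapshot, copy from its frozen view, then drop it
  snap="backup-$(date +%F)"
  zfs snapshot tank/home@"$snap"
  rsync -aH "/tank/home/.zfs/snapshot/$snap/" /mnt/backup/home/
  zfs destroy tank/home@"$snap"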

David Christensen

unread,
Feb 16, 2024, 5:00:07 AMFeb 16
to
On 2/15/24 07:41, The Wanderer wrote:
> On 2024-02-15 at 03:09, David Christensen wrote:
>> On 2/14/24 18:54, The Wanderer wrote:
>>> On 2024-01-09 at 14:22, The Wanderer wrote:
>>>> On 2024-01-09 at 14:01, Michael Kjörling wrote:
>>>>> On 9 Jan 2024 13:25 -0500, from The Wanderer
>>>>>> I've ordered a 22TB external drive
>>
>> Make? Model? How it is interfaced to your computer?
>
> It's a WD Elements 20TB drive (I'm not sure where I got the 22 from);
> the back of the case has the part number WDBWLG0200HBK-X8 (or possibly
> -XB, the font is kind of ambiguous). The connection, per the packaging
> label, is USB-3.


Okay.


STFW it seems that drive uses CMR, which is good:

https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/


> The big change of plans in the middle of my month-plus process was the
> decision to replace the entire 8-drive array with a 6-drive array, and
> the reason for that was because the 8-drive array left me with no open
> SATA ports to be able to connect spare drives in order to do drive
> replacements without needing to rebuild the whole shaboozle.


Having spare drive bays for RAID drive replacement is smart.
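
It also lets md copy onto the replacement while the old disk is still
in the array, avoiding a window of reduced redundancy. A sketch, with
placeholder device names (needs a reasonably recent mdadm):

  mdadm /dev/md0 --add /dev/sdi1                      # new disk joins as a spare
  mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sdi1 # rebuild onto it in place
  mdadm /dev/md0 --remove /dev/sdc1                   # detach the old disk when done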


>> If you have a processor, memory, PCIe slot, and HBA to match those
>> SSD's, the performance of those SSD's should be very nice.
>
> The CPU is a Ryzen 5 5600X. The RAM is G-Skill DDR4 2666MHz, in two 32GB
> DIMMs. I don't know how to assess PCIe slots and HBA, but the
> motherboard is an Asus ROG Crosshair VIII Dark Hero, which I think was
> the top-of-the-line enthusiast motherboard (with the port set my
> criteria called for) the year I built this machine.
>
> I'm pretty sure my performance bottleneck for most things is the CPU (or
> the GPU, where that comes into play, which here it doesn't);
> storage-wise this seems so far to be at least as fast as what I had
> before, but it's hard to tell if it's faster.


It would not surprise me if the Intel D3-S4510 server drives are
somewhat slower than the Samsung 870 EVO desktop drives. But the Intel
disks are designed to pull a heavy load all day for years on end.


Do you have a tool to monitor disk throughput and utilization? I use
Xfce panel Disk Performance Monitor applets and nmon(1) in a Terminal.
Those plus CPU and memory monitoring tools should allow you to determine
if your workload is CPU bound, memory bound, or I/O bound.
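
On Debian, the sysstat package provides another option:

  # per-device throughput and %util, refreshed every 5 seconds
  iostat -dxm 5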


>> The key concept is "data lifetime". (Or alternatively, "destruction
>> policy".)
>
> I can see that for when you have a tiered backup structure, and are
> looking at the lifetimes of each backup copy. For my live system, my
> intended data lifetime (outside of caches and data kept in /tmp) is
> basically "forever".


I try to group my data in anticipation of backup, etc., requirements.
When I get it right, disaster preparedness and disaster recovery are easier.


>> I believe ZFS can do more hard links. (Much more? Limited by
>> available storage space?)
>
> I'm not sure, but I'll have to look into that, when I get to the point
> of trying to set up that tiered backup.
> ...
>>> ... without [rsnapshot hard link]
>>> deduplication there wouldn't have been enough space on the drive
>>> for more than the single copy, ...
>>
>> ZFS provides similarly useful results with built-in compression and
>> de-duplication.
>
> I have the impression that there are risk and/or complexity aspects to
> it ...


Of course. ZFS is sophisticated storage technology. It looks
deceptively simple when you are window shopping, but becomes non-trivial
once you put real data on it, have to live with it 24x7, have to prepare
for disasters, and have to recover from disasters. There is a lot to
learn and "more than enough rope to shoot yourself in the foot".


>> That sounds like an N-way merge problem ...
>
> It does sound like that, yes. I'm already aware of jdupes, and of a few
> other tools (part of the work I already did in getting this far was
> rdfind, which is what I used to set up much of the hardlink
> deduplication that wound up biting me in the butt), but have not
> investigated LVM snapshot - and the idea of trying to script something
> like this, without an existing known-safe copy of the data to fall back
> on, leaves me *very* nervous.
>
> Figuring out how to be prepared to roll back is the other uncertain and
> nervous-making part. In some cases it's straightforward enough, but
> doing it at the scale of the size of those copies is at best daunting.


https://html.duckduckgo.com/html?q=lvm%20snapshot%20restore


Use another computer or a VM to learn and practice LVM snapshots and
restores, then use those skills when doing the N-way merge.
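
The whole cycle is short enough to rehearse in minutes (the VG/LV names
and snapshot size are placeholders):

  lvcreate -s -L 20G -n work-snap vg0/work   # snapshot before the risky step
  # ... run the experiment on vg0/work ...
  lvconvert --merge vg0/work-snap            # roll the origin back to the snapshot
  # (a merge of an in-use origin completes on the next activation)
  # or, if the experiment succeeded, discard the snapshot instead:
  lvremove vg0/work-snap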


>>> out of the way, shutting down all parts of the system that might be
>>> writing to the affected filesystems, and manually copying out the
>>> final state of the *other* parts of those filesystems via rsync,
>>> bypassing rsnapshot. That was on Saturday the 10th.
>>>
>>> Then I grabbed copies of various metadata about the filesystems,
>>> the LVM, and the mdraid config; modified /etc/fstab to not mount
>>> them; deactivated the mdraid, and commented it out of
>>> /etc/mdadm/mdadm.conf; updated the initramfs; shut down; pulled all
>>> eight Samsung 870 EVO drives; installed six brand-new Intel
>>> data-center-class (or so I gather) SSDs;
>>
>> Which model? What size?
>
> lshw says they're INTEL SSDSCK2B03. The packaging says SSDSCK2B038T801.


Nice.


> IIRC, the product listing said they were 3.84 TB (or possibly TiB). lshw
> says 'size: 3567GiB (3840GB)'. IIRC, the tools I used to partition them
> and build the mdraid and so forth said 3.84 TB/TiB (not sure which), or
> 3840 GB/GiB (same).
>
> For comparison, the 870 EVO drives - which were supposed to be 2TB
> apiece - were reported by some of those same tools as exactly 2000 of
> the same unit.
>
> This does mean that I have more total space available in the new array
> than in the old one,


8 @ 2 TB disks in RAID6 should provide 12 TB of capacity.

6 @ 3.84 TB disks in RAID6 should provide 15.36 TB of capacity.
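
(RAID-6 usable capacity is (N - 2) x drive size, since two drives'
worth of space goes to parity: (8 - 2) x 2 TB = 12 TB, and
(6 - 2) x 3.84 TB = 15.36 TB.)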


> but I've tried to allocate only as much space as
> was in the old array, insofar as I could figure out how to do that in
> the limited environment I was working in. (The old array and/or LV setup
> had sizes listed along the lines of '<10TiB', but my best attempt at
> replicating it gave something which reports sizes along the lines of
> '10TiB', so I suspect that my current setup is actually slightly too
> large to fit on the old disks.)


LVM should give you the ability to resize logical volumes as required
going forward.
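
For example, growing a volume and its filesystem in one step (the names
and size are placeholders):

  lvextend --resizefs -L +1T vg0/home   # extend the LV and grow ext4 to match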


>> Data integrity validation is tough without a mechanism. Adding an
>> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
>> backup tree could solve this need, but could waste a lot of time and
>> energy checksumming files that have not changed.
>
> AFAIK, all such things require you to be starting from a point with a
> known-good copy of the data, which is a luxury I don't currently have
> (as far as validating my current data goes). It's something to keep in
> mind when planning a more proper backup system, however.
>
>> One of the reasons I switched to ZFS was because ZFS has built-in
>> data and metadata integrity checking (and repair; depending upon
>> redundancy).
>
> I'm not sure I understand how this would be useful in the case I have at
> hand; that probably means that I'm not understanding the picture properly.


Bit rot is the enemy of forever:

https://html.duckduckgo.com/html?q=bit%20rot


The sooner you have MD5SUMS, etc., the sooner you can start monitoring
for file damage by any means. The rsnapshot(1) community may already
have a working solution.
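
A baseline can be built from the current state with stock tools (the
paths are placeholders):

  # record checksums of the tree, kept outside the tree itself
  cd /mnt/backup/daily.0 && find . -type f -print0 | xargs -0 md5sum > ../daily.0.md5
  # later: report only files that changed or became unreadable
  cd /mnt/backup/daily.0 && md5sum --quiet -c ../daily.0.md5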


>>> (This does leave me without having restored the read-only backup
>>> data from the old system state. I care less about that; I'll want
>>> it eventually, but it isn't important enough to warrant postponing
>>> getting the system back in working order.)
>>>
>>>
>>> I do still want/need to figure out what to do about an *actual*
>>> backup system, to external storage, since the rsnapshot thing
>>> apparently isn't going to be viable for my circumstance and use
>>> case. There is, however, now *time* to work on doing that, without
>>> living under the shadow of a known immediate/imminent data-loss
>>> hardware failure.
>>
>> rsync(1) should be able to copy backups onto an external HDD.
>
> Yeah, but that only provides one tier of backup;


It appears I misunderstood.


So, live data on 6 @ Intel SSD's and rsnapshot(1) backups on the 20 TB
WD Elements USB HDD?


> the advantage of
> rsnapshot (or similar) is the multiple deduplicated tiers, which gives
> you options if it turns out the latest backup already included the
> damage you're trying to recover from.


If rsnapshot(1) is your chosen backup tool, you will want to learn
everything you can about it. Beyond RTFM rsnapshot(1), STFW I see:

https://rsnapshot.org/rsnapshot/docs/docbook/rest.html

http://www2.rsnapshot.org/

https://sourceforge.net/p/rsnapshot/mailman/rsnapshot-discuss/


> (USB-3 will almost certainly not be a viable option for an automatic
> scheduled backup of the sort rsnapshot's documentation suggests, because
> the *fastest* backup cycle I saw from my working with the data I had was
> over three hours, and the initial pass to copy the data out to the drive
> in the first place took nearly *20* hours. A cron job to run even an
> incremental backup even once a day, much less the several times a day
> suggested for the deeper rsnapshot tiers, would not be *remotely*
> workable in that sort of environment. Though on the flip side, that's
> not just a USB-3 bottleneck, but also the bottleneck of the spinning
> mechanical hard drive inside the external case...)


I think the Raspberry Pi, etc., users on this list live with USB storage
and have found it to be reliable enough for personal and SOHO network use.


I have used the rsync(1) command for many years, both interactively and
via scripts, but RTFM indicates rsync(1) can be run as a server. I
wonder if that would help performance, as a service can cache things
that a command must find on every run (?).
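
A sketch of that setup, untested, with a made-up module name:

  # /etc/rsyncd.conf on the machine holding the backup disk
  [backup]
      path = /mnt/backup
      read only = false

  # start it with 'rsync --daemon'; clients then use the double-colon
  # (native rsync protocol) syntax:
  rsync -aH /home/ backuphost::backup/home/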


David

Roy J. Tellason, Sr.

unread,
Feb 16, 2024, 2:00:05 PMFeb 16
to
On Friday 16 February 2024 04:52:22 am David Christensen wrote:
> I think the Raspberry Pi, etc., users on this list live with USB storage
> and have found it to be reliable enough for personal and SOHO network use.

I have one, haven't done much with it. Are there any alternative ways to interface storage? Maybe add SATA ports or something?


--
Member of the toughest, meanest, deadliest, most unrelenting -- and
ablest -- form of life in this section of space,  a critter that can
be killed but can't be tamed.  --Robert A. Heinlein, "The Puppet Masters"
-
Information is more dangerous than cannon to a society ruled by lies. --James
M Dakin

David Christensen

unread,
Feb 16, 2024, 3:00:07 PMFeb 16
to
On 2/16/24 10:56, Roy J. Tellason, Sr. wrote:
> On Friday 16 February 2024 04:52:22 am David Christensen wrote:
>> I think the Raspberry Pi, etc., users on this list live with USB storage
>> and have found it to be reliable enough for personal and SOHO network use.
>
> I have one, haven't done much with it. Are there any alternative ways to interface storage? Maybe add SATA ports or something?


In general, there are many combinations of storage interfaces offered in
the marketplace. For your specific single-board computer (SBC), I
suggest checking the manual and checking the manufacturer sales and/or
support web pages.


David

Gremlin

unread,
Feb 16, 2024, 4:50:06 PMFeb 16
to
On 2/16/24 13:56, Roy J. Tellason, Sr. wrote:
> On Friday 16 February 2024 04:52:22 am David Christensen wrote:
>> I think the Raspberry Pi, etc., users on this list live with USB storage
>> and have found it to be reliable enough for personal and SOHO network use.
>
> I have one, haven't done much with it. Are there any alternative ways to interface storage? Maybe add SATA ports or something?
>
>

On Raspberry Pi 1 to 4, no; you have a choice of USB 2 or USB 3.

On the Raspberry Pi 5, yes, with an NVMe HAT interfaced to the PCIe "port".

I am using a Pi 5 (desktop) with a USB 3 port hooked to an NVMe
external drive and it works just fine.

It is much faster than the Pi 4 I was using because of the new "south
bridge".

Roy J. Tellason, Sr.

unread,
Feb 17, 2024, 1:50:05 PMFeb 17
to
On Friday 16 February 2024 04:42:12 pm Gremlin wrote:
> On 2/16/24 13:56, Roy J. Tellason, Sr. wrote:
> > On Friday 16 February 2024 04:52:22 am David Christensen wrote:
> >> I think the Raspberry Pi, etc., users on this list live with USB storage
> >> and have found it to be reliable enough for personal and SOHO network use.
> >
> > I have one, haven't done much with it. Are there any alternative ways to interface storage? Maybe add SATA ports or something?
> >
> On Raspberry Pi 1 to 4, no; you have a choice of USB 2 or USB 3.

Looks like I'll have to go with a USB - SATA adapter, then. It's a 4, I bought the "Canakit" package that included an enclosure, keyboard, mouse, and a small touch screen (4"?).

> On the Raspberry Pi 5, yes, with an NVMe HAT interfaced to the PCIe "port".
>
> I am using a Pi 5 (desktop) with USB 3 port hooked to an NVME external
> drive and it works just fine.
>
> It is much faster than the Pi 4 I was using because of the new "south
> bridge"

I'm aware of the 5 having come out, but haven't explored the possibility of getting one of those yet.

gene heskett

unread,
Feb 17, 2024, 8:50:06 PMFeb 17
to
On 2/17/24 13:45, Roy J. Tellason, Sr. wrote:
> On Friday 16 February 2024 04:42:12 pm Gremlin wrote:
>> On 2/16/24 13:56, Roy J. Tellason, Sr. wrote:
>>> On Friday 16 February 2024 04:52:22 am David Christensen wrote:
>>>> I think the Raspberry Pi, etc., users on this list live with USB storage
>>>> and have found it to be reliable enough for personal and SOHO network use.
>>>
>>> I have one, haven't done much with it. Are there any alternative ways to interface storage? Maybe add SATA ports or something?
>>>
>> On Raspberry Pi 1 to 4, no; you have a choice of USB 2 or USB 3.
>
> Looks like I'll have to go with a USB - SATA adapter, then. It's a 4, I bought the "Canakit" package that included an enclosure, keyboard, mouse, and a small touch screen (4"?).
>
>> On the Raspberry Pi 5, yes, with an NVMe HAT interfaced to the PCIe "port".
>>
>> I am using a Pi 5 (desktop) with USB 3 port hooked to an NVME external
>> drive and it works just fine.
>>
>> It is much faster than the Pi 4 I was using because of the new "south
>> bridge"
>
> I'm aware of the 5 having come out, but haven't explored the possibility of getting one of those yet.
>
StarTech makes an excellent SATA III to USB 3 adapter for about a tenner
a copy, so a 7-port hub takes up only 1 of the 4 USB 3 ports on a bpi-m5,
leaving 3 more ports available on the bpi-m5 itself. See <startech.com>.
SSH into it from the main system and run the Pi headless.

Cheers, Gene Heskett, CET.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis