
1 Currently unreadable (pending) sectors. How worried should I be?


Charles Curley

Jan 2, 2024, 5:50:06 PM
I have a brand new NVME device, details below, in a brand new computer.
smartd just started returning pending sector errors.

A recent extended (long) test run since the first reported pending
sector returned no errors.

How worried should I be?


Device Model: NS256GSSD330
Serial Number: W3ZK047027T
Firmware Version: V0823A0
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: mSATA
TRIM Command: Available
Device is: Not in smartctl database 7.3/5533
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Jan 2 15:27:45 2024 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED



SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 764 -
# 2 Short offline Completed without error 00% 116 -


root@tiassa:~# journalctl -u smartmontools.service | grep unreadable
Jan 02 13:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 13:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 14:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 14:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 15:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
root@tiassa:~#


--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Dan Purgert

Jan 2, 2024, 6:10:06 PM
On Jan 02, 2024, Charles Curley wrote:
> I have a brand new NVME device, details below, in a brand new computer.
> smartd just started returning pending sector errors.

Means you've got "N" bad sector(s) on the drive. It happens, even on
new drives.

>
> A recent extended (long) test run since the first reported pending
> sector returned no errors.
>
> How worried should I be?

I wouldn't be "very" worried; but I'd keep an eye on it (especially with
regards to any warranties you may have on the machine)

> Device Model: NS256GSSD330
> Serial Number: W3ZK047027T
> Firmware Version: V0823A0
> User Capacity: 256,060,514,304 bytes [256 GB]
> Sector Size: 512 bytes logical/physical
> Rotation Rate: Solid State Device
> Form Factor: mSATA
> TRIM Command: Available
> Device is: Not in smartctl database 7.3/5533
> ATA Version is: ACS-2 T13/2015-D revision 3
> SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is: Tue Jan 2 15:27:45 2024 MST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> …
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 764 -
> # 2 Short offline Completed without error 00% 116 -


You kinda removed the important bits out of this report with regards to
the drive health. That being said, this drive is not an NVMe -- did you
check the right one?


--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860

Dan Ritter

Jan 2, 2024, 6:10:07 PM
Charles Curley wrote:
> I have a brand new NVME device, details below, in a brand new computer.

You might, but that's not what the details you show us are
saying.

> smartd just started returning pending sector errors.
>
> A recent extended (long) test run since the first reported pending
> sector returned no errors.
>
> How worried should I be?
>
>
> Device Model: NS256GSSD330
> Serial Number: W3ZK047027T
> Firmware Version: V0823A0
> User Capacity: 256,060,514,304 bytes [256 GB]
> Sector Size: 512 bytes logical/physical
> Rotation Rate: Solid State Device
> Form Factor: mSATA

That says this is a SATA device, not an NVMe device.

Looking up the device model shows me this:
https://smarthdd.com/database/Netac-SSD-256GB/S0626A0/

which confirms: SATA in an M.2 form factor, not NVMe.

> ATA Version is: ACS-2 T13/2015-D revision 3
> SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is: Tue Jan 2 15:27:45 2024 MST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> …
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 764 -
> # 2 Short offline Completed without error 00% 116 -
>
>
> root@tiassa:~# journalctl -u smartmontools.service | grep unreadable
> Jan 02 13:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> Jan 02 13:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> Jan 02 14:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> Jan 02 14:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> Jan 02 15:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

These are logged at suspiciously even times, like something is
looking at the disk every 30 minutes exactly.

Note that "currently unreadable" sometimes means "the disk is
too busy to get back to us" and sometimes means "there's damage
on the disk". The disk's onboard controller should map around
damage automatically.

Do you have any other symptoms? Anything interesting in the
SMART variables?

-dsr-

Charles Curley

Jan 2, 2024, 6:40:06 PM
On Tue, 2 Jan 2024 17:47:18 -0500
Dan Ritter <d...@randomstring.org> wrote:

> Charles Curley wrote:
> > I have a brand new NVME device, details below, in a brand new
> > computer.
>
> You might, but that's not what the details you show us are
> saying.
>
> [...]
>
> That says this is a SATA device, not an NVMe device.
>
> Looking up the device model shows me this:
> https://smarthdd.com/database/Netac-SSD-256GB/S0626A0/
>
> which confirms: SATA in an M.2 form factor, not NVMe.

Thank you for that correction.

>
> [...]
>
> These are logged at suspiciously even times, like something is
> looking at the disk every 30 minutes exactly.

If I correctly read the journal entries I appended to my previous email,
that would be smartd.



>
> Note that "currently unreadable" sometimes means "the disk is
> too busy to get back to us" and sometimes means "there's damage
> on the disk". The disk's onboard controller should map around
> damage automatically.
>
> Do you have any other symptoms? Anything interesting in the
> SMART variables?

Nothing that jumps out at me.

Report appended as a text file.
smartmontool.2024.01.02.txt

Charles Curley

Jan 2, 2024, 6:50:05 PM
On Tue, 2 Jan 2024 18:01:32 -0500
Dan Purgert <d...@djph.net> wrote:

> On Jan 02, 2024, Charles Curley wrote:
> > I have a brand new NVME device, details below, in a brand new
> > computer. smartd just started returning pending sector errors.
>
> Means you've got "N" bad sector(s) on the drive. It happens, even on
> new drives.

Good to know.

>
> >
> > A recent extended (long) test run since the first reported pending
> > sector returned no errors.
> >
> > How worried should I be?
>
> I wouldn't be "very" worried; but I'd keep an eye on it (especially
> with regards to any warranties you may have on the machine)

OK, will do. If I understand that entry in the SMART report, the
offending sector should eventually be re-mapped or else marked as
unrecoverable. If the latter, I'll get really concerned.


>
> > Device Model: NS256GSSD330
> > Serial Number: W3ZK047027T

>
> You kinda removed the important bits out of this report with regards
> to the drive health.

Sorry. See my recent reply to Dan Ritter <d...@randomstring.org>.

> That being said, this drive is not an NVMe --
> did you check the right one?

It's the only one on the computer. Dan Ritter <d...@randomstring.org>
corrected that. https://smarthdd.com/database/Netac-SSD-256GB/S0626A0/

Andy Smith

Jan 2, 2024, 7:40:06 PM
Hello,

On Tue, Jan 02, 2024 at 04:42:37PM -0700, Charles Curley wrote:
> If I understand that entry in the SMART report, the offending
> sector should eventually be re-mapped or else marked as
> unrecoverable. If the latter, I'll get really concerned.

If a SMART long self-test came back clean then it already has been
re-mapped, as a long self-test reads every user-accessible sector.

If you really want to reassure yourself, look back in your logs for
the actual sector number and then read it with hdparm. Either it
prints the raw data or it gives an error.

# hdparm --read-sector [sector number] /dev/sda

(generally safe as it's only a read)
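If the failing LBA only shows up in a kernel I/O error line, a small
shell sketch like this can pull it out (the exact log wording varies by
kernel version, and the sample line below is made up for illustration):

```shell
# Hypothetical kernel log line; real wording differs between kernels.
logline='blk_update_request: I/O error, dev sda, sector 123456 op 0x0:(READ)'

# Print the word following "sector", stripping any trailing comma.
sector=$(printf '%s\n' "$logline" |
    awk '{for (i = 1; i < NF; i++) if ($i == "sector") print $(i+1)}' |
    tr -d ',')

echo "hdparm --read-sector $sector /dev/sda"
```

The printed command is the one to run as root once you trust the sector
number.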

It is annoying when a remapped bad sector doesn't seem to increment
the "remapped" count and decrement the "pending" count, but I've had
it happen. I wouldn't particularly worry about it unless the number
keeps going up OR the actual sector is still unreadable (though the
self-test should have spotted that).

You can reconfigure smartd so that it only warns you about error values
that increase, not just the presence of the non-zero value every 30
minutes. That's discussed in the comments of /etc/smartd.conf and
its man page.
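A sketch of what that change might look like, assuming attribute 197 is
the pending-sector count and 198 the offline-uncorrectable count as on
this drive (the -C and -U directives and their trailing '+' are
described in smartd.conf(5)):

```
# /etc/smartd.conf (sketch): monitor everything (-a), but report
# attributes 197 and 198 only when their raw values increase, not
# whenever they are merely non-zero.
/dev/sda -a -C 197+ -U 198+
```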

> It's the only one on the computer.

Like to live dangerously, huh…

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

Dan Ritter

Jan 2, 2024, 8:00:06 PM
Charles Curley wrote:
> On Tue, 2 Jan 2024 17:47:18 -0500
> Dan Ritter <d...@randomstring.org> wrote:
>
> root@tiassa:~# smartctl -a /dev/sda
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-amd64] (local build)

> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
> 5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 1
> 9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 764
> 12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 25
> 178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 1
> 194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 45
> 195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
> 196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
> 197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 1
> 198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
> 199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
> 232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 96
> 241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 13943
> 242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 5610

These are the values that can indicate health problems with the
disk.

None of them look bad except the temperature - which is only bad
because of the specs on the disk - and
> 197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 1

which confirms that something is stuck, but it's just one
sector.

I would not worry about this unless some new symptom emerges.

Make backups, but only because you should pretty much always
have backups.

-dsr-

Charles Curley

Jan 2, 2024, 10:20:05 PM
On Wed, 3 Jan 2024 00:29:42 +0000
Andy Smith <an...@strugglers.net> wrote:

> Hello,
>
> On Tue, Jan 02, 2024 at 04:42:37PM -0700, Charles Curley wrote:
> [...]
>
> If a SMART long self-test came back clean then it already has been
> re-mapped as a long self-test reads every user-accessible sector.

I'm not so sure about that. See the journalctl output at the bottom of
this email.


>
> If you really want to reassure yourself, look back in your logs for
> the actual sector number and then read it with hdparm. Either it
> prints the raw data or it gives an error.
>
> # hdparm --read-sector [sector number] /dev/sda
>
> (generally safe as it's only a read)

I'll try that later. I don't want to take the time now to isolate the
relevant log entries.

>
> It is annoying when a remapped bad sector doesn't seem to increment
> the "remapped" count and decrement the "pending" count, but I've had
> it happen. I wouldn't particularly worry about it unless the number
> keeps going up OR the actual sector is still unreadable (though the
> self-test should have spotted that).
>
> You can reconfigure smartd so that it only warns you about error
> values that increase, not just the presence of the non-zero value
> every 30 minutes. That's discussed in the comments of
> /etc/smartd.conf and its man page.

Good thoughts, thank you.

>
> > It's the only one on the computer.
>
> Like to live dangerously, huh…

No. That's what fast networks, multiple good backup programs, a
good RAID array on another computer, and multiple off-site backups are
for.

>
> Thanks,
> Andy
>

root@tiassa:~# journalctl -b -u smartmontools.service
Jan 02 12:37:39 tiassa systemd[1]: Starting smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon...
Jan 02 12:37:39 tiassa smartd[740]: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-amd64] (local build)
Jan 02 12:37:39 tiassa smartd[740]: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
Jan 02 12:37:39 tiassa smartd[740]: Opened configuration file /etc/smartd.conf
Jan 02 12:37:39 tiassa smartd[740]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
Jan 02 12:37:39 tiassa smartd[740]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], opened
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], NS256GSSD330, S/N:W3ZK047027T, FW:V0823A0, 256 GB
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], not found in smartd database 7.3/5533.
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.NS256GSSD330-W3ZK047027T.ata.state
Jan 02 12:37:39 tiassa smartd[740]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Jan 02 12:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.NS256GSSD330-W3ZK047027T.ata.state
Jan 02 12:37:39 tiassa systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.
Jan 02 13:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 13:07:39 tiassa smartd[740]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Jan 02 13:07:39 tiassa smartd[740]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Jan 02 13:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 14:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 14:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 14:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], self-test in progress, 20% remaining
Jan 02 15:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 15:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], previous self-test completed without error
Jan 02 15:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 16:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 16:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 17:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 17:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 18:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 18:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 19:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 19:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Jan 02 20:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
root@tiassa:~#

Tixy

Jan 3, 2024, 2:50:06 AM
On Tue, 2024-01-02 at 17:47 -0500, Dan Ritter wrote:
> > root@tiassa:~# journalctl -u smartmontools.service | grep unreadable
> > Jan 02 13:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> > Jan 02 13:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> > Jan 02 14:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> > Jan 02 14:37:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
> > Jan 02 15:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
>
> These are logged at suspiciously even times, like something is
> looking at the disk every 30 minutes exactly.

Perhaps 'smartd' the "SMART Disk Monitoring Daemon" ;-)

--
Tixy

Michael Kjörling

Jan 3, 2024, 6:10:05 AM
On 2 Jan 2024 20:17 -0700, from charle...@charlescurley.com (Charles Curley):
> Jan 02 20:07:39 tiassa smartd[740]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

This is not the problem. This is smartd reporting something about the
drive's health which you might be interested in. (Also, about what
someone else wrote, it's not really surprising if smartd checks the
drive every 30 minutes. It would have been more curious if there were
kernel I/O errors logged exactly every 30 minutes, but you haven't
shown anything from those logs in this thread AFAICT.)

What I find curious is the combination of Reallocated_Sector_Ct == 1
and Reallocated_Event_Count == 0. There's also the
Current_Pending_Sector == 1 but Offline_Uncorrectable == 0 even after
two SMART health tests, one of which was an extended offline test.

If a sector has been reallocated, that should have happened at some
point, so if Reallocated_Sector_Ct > 0 then Reallocated_Event_Count
_should_ also be greater than 0 (and hopefully not greater than
Reallocated_Sector_Ct), which it isn't reported as in your case.

Likewise, after an extended offline SMART test, each sector should
have a known status of either readable or not readable. If the
firmware detects a sector as being marginal, it _should_ rewrite it
and check again; if it's still marginal, it _should_ reallocate that
sector, which _should_ increment Reallocated_Event_Count. The "pending
sectors" SMART attribute is supposed to count sectors which the drive
has failed to read, so they cannot be reallocated, and which will be
reallocated on the next write (when the drive knows what data to put
in the reallocated-to sector). Since both tests finished without
finding any errors, there _should_ have been no unreadable sectors.

I'm inclined to believe that your drive is fibbing SMART data.

As a background process, try running something like

# ionice find / -xdev -type f -exec cat {} + >/dev/null

and if that doesn't cause any I/O errors to be output or logged, then
the drive is _likely_ fine. (You may need to adjust for other file
systems also on that drive, such as /boot.)

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Andy Smith

Jan 3, 2024, 8:30:07 AM
Hi,

On Tue, Jan 02, 2024 at 08:17:55PM -0700, Charles Curley wrote:
> On Wed, 3 Jan 2024 00:29:42 +0000
> Andy Smith <an...@strugglers.net> wrote:
> > If a SMART long self-test came back clean then it already has been
> > re-mapped as a long self-test reads every user-accessible sector.
>
> I'm not so sure about that. See the journalctl output at the bottom of
> this email.

I don't see anything but smartd repeatedly warning you about the 1
pending sector.

As I said, it's annoying when a drive doesn't decrement its
pending sector count after remapping. If you can read the whole
drive then it certainly has been remapped (or was a transient error
that isn't "pending" any more).

None of the logs you presented show any error coming from the drive,
but then they won't as you've only selected logs from smartd. smartd
will complain about that 1 pending sector count until the end of
time unless:

- Drive just decides to clear it, or;

- You reconfigure smartd

All smartd is doing here is reading the attributes of the drive and
reporting them to you. It will never show you the actual error that
caused those attributes to change.

> > Like to live dangerously, huh…
>
> No. That's what fast networks, good and multiple backup programs, a
> good RAID array on another computer, and multiple off-site backups are
> for.

It's not my view because, in my experience, storage is one of the most
failure-prone parts of a computer, and an outage from non-redundant
storage typically annoys me more than making it redundant does.
Redundancy is really easy and cheap in most cases these days, so I do
regard going without it as living dangerously. Not always a good
cost-benefit trade-off though, I grant you.

Charles Curley

Jan 3, 2024, 3:30:06 PM
On Wed, 3 Jan 2024 11:05:10 +0000
Michael Kjörling <2695bd...@ewoof.net> wrote:

> Since both tests finished without
> finding any errors, there _should_ have been no unreadable sectors.

Agree.

>
> I'm inclined to believe that your drive is fibbing SMART data.

Sigh. I am inclined to agree. Obviously they didn't hire me to write
the firmware on the drive.

>
> As a background process, try running something like
>
> # ionice find / -xdev -type f -exec cat {} + >/dev/null

That would only reach files on the partition where it is run. Since
there is another operating system on this drive, and there are parts of
the drive normally inaccessible to any operating system, I decided
instead to boot to a USB stick and run badblocks. The read-only test
took 12 minutes and reported no errors.

I now have a writing test (-w) running. It has reported no failures on
its first pass.

Charles Curley

Jan 3, 2024, 6:30:05 PM
On Wed, 3 Jan 2024 13:25:26 -0700
Charles Curley <charle...@charlescurley.com> wrote:

> I now have a writing test (-w) running. It has reported no failures on
> its first pass.

OOPS! -w is the destructive test. I now have a hard drive full of 0x00s.
I should have used the -n option. However, it reported no failures.

to...@tuxteam.de

Jan 4, 2024, 6:00:06 AM
On Wed, Jan 03, 2024 at 04:27:54PM -0700, Charles Curley wrote:
> On Wed, 3 Jan 2024 13:25:26 -0700
> Charles Curley <charle...@charlescurley.com> wrote:
>
> > I now have a writing test (-w) running. It has reported no failures on
> > its first pass.
>
> OOPS! -w is the destructive test. I now have a hard drive full of 0x00s.
> I should have used the -n option. However, it reported no failures.

Ouch, I hope you had a backup.

> --
> Does anybody read signatures any more?

I *never* do.

Cheers
--
t

Charles Curley

Jan 4, 2024, 9:10:06 AM
On Thu, 4 Jan 2024 11:58:54 +0100
<to...@tuxteam.de> wrote:

> >
> > OOPS! -w is the destructive test. I now have a hard drive full of
> > 0x00s. I should have used the -n option. However, it reported no
> > failures.
>
> Ouch, I hope you had a backup.

All the essential stuff, yes.

--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Max Nikulin

Jan 4, 2024, 10:20:06 AM
On 04/01/2024 03:25, Charles Curley wrote:
> I decided
> instead to boot to a USB stick and run badblocks. The read-only test
> took 12 minutes and reported no errors.
>
> I now have a writing test (-w) running. It has reported no failures on
> its first pass.

Is a badblocks write test useful for an SSD, taking wear leveling into
account? Each write may be mapped to a different physical address, and
any errors should be handled by the firmware.

To test low-end USB pen drives and SD cards there is the f3 (Fight
Flash Fraud, or Fight Fake Flash) tool; however, such a test should not
be necessary for a SATA SSD.

Have you checked that no firmware update is available for this drive?

I have experienced just a few HDD failures. It may be irrelevant for
SSDs, but in the case of an HDD I would replace a disk reporting
Current_Pending_Sector as soon as possible. It seems the repeating
reports from smartd are intentional.

On the other hand, the "VALUE" has not decreased and is still 100, and
the attribute is not marked as "pre-fail". Perhaps there is no reason
to worry too much.

I am unsure how the accounting works if an error happens while reading
a file and the file is then deleted without being overwritten, leaving
the address range marked unused (trimmed).

Michael Kjörling

Jan 4, 2024, 11:50:05 AM
On 3 Jan 2024 13:25 -0700, from charle...@charlescurley.com (Charles Curley):
>> As a background process, try running something like
>>
>> # ionice find / -xdev -type f -exec cat {} + >/dev/null
>
> That would only reach files on the partition where it is run.

I covered that on the next few lines, which you chose not to quote.

> Since there is another operating system on this drive,

That kind of information might be helpful to include up front.

> and there are parts of
> the drive normally inaccessible to any operating system,

That's true, but it would have told you about any error in any
accessible data _and_ also told you which file or directory was
affected if that was the case. If the error is in an inaccessible
portion of the drive and the system is working normally aside from a
note in SMART data, then it would stand to reason that the error would
be in an unused location; thereby not likely to affect usage (because
a known bad spot would be reallocated elsewhere by the firmware on the
next write).

Also, badblocks too will only deal with user-accessible blocks; if the
drive already had remapped a bad location, for example as a part of a
write to an identified bad block, the original error would be
invisible to badblocks even if it still was an error.


On 3 Jan 2024 16:27 -0700, from charle...@charlescurley.com (Charles Curley):
> OOPS! -w is the destructive test. I now have a hard drive full of 0x00s.

After restoring your most recent backup, consider doing a fstrim to
TRIM unused blocks.

Andy Smith

Jan 5, 2024, 4:10:06 PM
Hello,

On Wed, Jan 03, 2024 at 04:27:54PM -0700, Charles Curley wrote:
> OOPS! -w is the destructive test. I now have a hard drive full of 0x00s.
> I should have used the -n option. However, it reported no failures.

So has this coaxed the drive into reducing its pending sector count
to zero or does that still say 1?

I have had drives in the past that never decremented it even though
they had clearly done a remap, and others that took a long time
(weeks) to get around to doing so.

Charles Curley

Jan 5, 2024, 6:30:07 PM
On Fri, 5 Jan 2024 21:01:28 +0000
Andy Smith <an...@strugglers.net> wrote:

> So has this coaxed the drive into reducing its pending sector count
> to zero or does that still say 1?

Last I looked, it was still at 1. When I finish my reinstallation, I
will look again.

>
> I have had drives in the past that never decremented it even though
> they had clearly done a remap, and others that took a long time
> (weeks) to get around to doing so.

As the Zen master said, we will see.

David Christensen

Jan 5, 2024, 8:30:06 PM
On 1/5/24 15:20, Charles Curley wrote:
> On Fri, 5 Jan 2024 21:01:28 +0000 Andy Smith wrote:
>> So has this coaxed the drive into reducing its pending sector count
>> to zero or does that still say 1?
>
> Last I looked, it was still at 1. When I finish my reinstallation, I
> will look again.


I like to do a secure erase before re-deploying an SSD. The UEFI
firmware in my newer Dell computers provides an option that makes
secure erase easy. Other choices include an SSD manufacturer's toolkit
or install/live/rescue media with the right tools. It is also useful to
have a hot-swap drive rack and a matching port, because powering down,
installing the target drive, powering up, and booting an OS (on
different media) can leave the drive's ATA security in a frozen state,
which blocks the erase.


I save 'smartctl -x ...' output to text files and check them into a
version control system. This facilitates looking for changes and trends
over time.
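A minimal sketch of that workflow, assuming git and an arbitrary
~/smart-logs directory (none of these names are mandated by
smartmontools):

```
mkdir -p ~/smart-logs && cd ~/smart-logs
git init                                   # harmless if already a repo
smartctl -x /dev/sda > sda.txt             # overwrite the previous snapshot
git add sda.txt
git commit -m "SMART snapshot $(date -I)"  # attribute changes show as diffs
```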


I would be curious to know if a secure erase forces the pending sector
issue and, if so, what the result is.


David

Max Nikulin

Jan 5, 2024, 9:50:06 PM
On 06/01/2024 08:25, David Christensen wrote:
> I like to do a secure erase before re-deploying an SSD.  The UEFI ROM
> firmware in my newer Dell computers provides an option to make secure
> erase easy.  Other choices include an SSD manufacturer toolkit or
> install/ live/ rescue media with the right tools.

I have seen a couple of warnings concerning hdparm, but I am unsure
about the current state of affairs. Maybe something has changed.

https://archive.kernel.org/oldwiki/ata.wiki.kernel.org/index.php/ATA_Secure_Erase.html>

> - Do not attempt to do this through a USB interface!
> - Do not set the password to an empty string or NULL.
>
> OBSOLETE CONTENT
>
> This wiki has been archived and the content is no longer updated.
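For reference, the procedure that archived wiki describes boils down to
roughly these steps (a sketch; /dev/sdX and the password are
placeholders, and the warnings quoted above, especially about USB
interfaces, still apply):

```
hdparm -I /dev/sdX    # the Security section must say "not frozen"
hdparm --user-master u --security-set-pass SOMEPASS /dev/sdX
hdparm --user-master u --security-erase SOMEPASS /dev/sdX
```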

Charles Curley

Jan 6, 2024, 12:20:06 AM
On Fri, 5 Jan 2024 17:25:48 -0800
David Christensen <dpch...@holgerdanske.com> wrote:

> I would be curious to know if a secure erase forces the pending
> sector issue and, if so, what the result is.

An interesting thought. Alas, I am far enough along on re-installing
that I do not want to try it. Sorry.

David Christensen

Jan 6, 2024, 3:40:06 AM
On 1/5/24 21:10, Charles Curley wrote:
> On Fri, 5 Jan 2024 17:25:48 -0800
> David Christensen <dpch...@holgerdanske.com> wrote:
>
>> I would be curious to know if a secure erase forces the pending
>> sector issue and, if so, what the result is.
>
> An interesting thought. Alas, I am far enough along on re-installing
> that I do not want to try it. Sorry.


I suggest taking an image (backup) with dd(1), Clonezilla, etc., when
you're done. This will allow you to restore the image later: to roll
back a change you do not like, to recover from a disaster, to clone the
image to another device, or to facilitate experiments (such as doing a
secure erase to see if it resolves the SSD pending sector issue).


If you also keep your system configuration files in a version control
system, restoring an image is faster than wipe/ fresh install/
configure/ restore data.


David

Michael Kjörling

Jan 6, 2024, 7:40:07 AM
On 6 Jan 2024 00:37 -0800, from dpch...@holgerdanske.com (David Christensen):
> I suggest taking an image (backup) with dd(1), Clonezilla, etc., when you're
> done. This will allow you to restore the image later -- to roll-back a
> change you do not like, to recovery from a disaster, to clone the image to
> another device, to facilitate experiments, (such as doing a secure erase to
> see if it resolves the SSD pending sector issue), etc..
>
> If you also keep your system configuration files in a version control
> system, restoring an image is faster than wipe/ fresh install/ configure/
> restore data.

I would go even further. Backups should be designed such that
recovering from a catastrophic storage failure, such as getting hit by
ransomware, unintentionally running a destructive badblocks write test,
or the sudden failure of a storage device, requires at most
something very similar to:

* Boot some kind of live environment
* Set up file systems on the storage device to be restored onto
(partitioning, setting up LUKS containers, formatting, whatever else
might be called for)
* Within the live environment, install and configure the software
needed to access the backup (if any) (this may include things like
cryptographic keys, access passphrases and the likes)
* Perform the restoration from the most recent backup (this is the
part that likely will take a significant amount of time)
* Update the restored copies of /etc/fstab, /etc/crypttab and any
other files that directly reference the partitions or file systems
by some kind of ID (UUID, /dev/disk/by-*/*, ...)
* Reinstall the boot loader
* Reboot
* Reinstall the boot loader again from within the restored environment
to ensure that everything relating to it is in sync
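
As a concrete sketch of the fstab/crypttab step, with invented UUIDs
and a scratch file standing in for the restored /etc/fstab; on real
hardware you would read the new UUID from blkid(8) and edit the file
under the restore mount point:

```shell
# Invented UUIDs; on real hardware, get the new one with:  blkid /dev/sdX2
OLD_UUID=0b5e8c3a-dead-beef-0000-000000000001
NEW_UUID=4f2a9d17-cafe-f00d-0000-000000000002

# scratch stand-in for the restored /mnt/etc/fstab
cat > ./fstab <<EOF
UUID=$OLD_UUID /     ext4 errors=remount-ro 0 1
UUID=$OLD_UUID /home ext4 defaults          0 2
EOF

# point the entries at the freshly created filesystem
sed -i "s/UUID=$OLD_UUID/UUID=$NEW_UUID/g" ./fstab

# both lines now reference the new UUID
grep -c "UUID=$NEW_UUID" ./fstab
```

The same substitution applies to /etc/crypttab and any other file that
names partitions by UUID or /dev/disk/by-*/* path.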

Such recovery should _not_ need to involve significant reconfiguration
of anything. Any such requirements will massively increase your time
to recovery, as I think we're seeing an example of here. And yes,
pretty much all of this could be scripted, but I strongly suspect that
few people need to do a bare-metal restore of their most recent backup
often enough for _that_ to be worth the effort to create and maintain.

Which is not to say that keeping configuration files
version-controlled cannot provide benefits anyway; but given a proper,
frequent backup regime, the benefits even of that are reduced.

David Christensen

Jan 6, 2024, 6:40:06 PM
On 1/6/24 04:36, Michael Kjörling wrote:
> On 6 Jan 2024 00:37 -0800, from dpch...@holgerdanske.com (David Christensen):
>> I suggest taking an image (backup) with dd(1), Clonezilla, etc., when you're
>> done. This will allow you to restore the image later -- to roll-back a
>> change you do not like, to recovery from a disaster, to clone the image to
>> another device, to facilitate experiments, (such as doing a secure erase to
>> see if it resolves the SSD pending sector issue), etc..
>>
>> If you also keep your system configuration files in a version control
>> system, restoring an image is faster than wipe/ fresh install/ configure/
>> restore data.
>
> I would go even farther. Backups should be designed such that
> recovering from a catastrophic storage failure, such as getting hit by
> ransomware, unintentionally doing a destructive badblocks write test
> or the sudden failure of a storage device, is possible by at most
> something very similar to:
>
> * Boot some kind of live environment


I wanted more tools than what the Debian installer rescue shell provides
(e.g. BusyBox) and I am too lazy to learn yet another live system (e.g.
Knoppix), so I installed Debian with Xfce onto two USB drives -- one
with BIOS/MBR and the other with secure UEFI/GPT. They are both
complete installs, so they are familiar and I can add whatever I want.


> * Set up file systems on the storage device to be restored onto
> (partitioning, setting up LUKS containers, formatting, whatever else
> might be called for)
> * Within the live environment, install and configure the software
> needed to access the backup (if any) (this may include things like
> cryptographic keys, access passphrases and the likes)
> * Perform the restoration from the most recent backup (this is the
> part that likely will take a significant amount of time)


I keep my Debian instances small, simple, and self-contained (1 GB ext4
boot, 1 GB dm-crypt swap, and 12 GB LUKS ext4 root on one 16+ GB 2.5"
SATA SSD). dd(1) meets all of my imaging needs. It's fast and requires
minimal storage -- less than 10 minutes using an old-school USB 2.0 HDD;
each 100 GB holds 6+ images. (`apt-get autoremove`, `apt-get
autoclean`, fstrim(8), and/or gzip(1) can reduce time and storage
requirements.)


If my OS instances were larger, more complex, shared disk space, etc. --
e.g. a multi-boot Windows/ Debian machine with a shared data partition,
which is likely what the OP had -- I would think about a tool such as
Clonezilla. Then I would get a big USB 3.0+ HDD/RAID, boot one of my
Debian USB instances, look at the partition table, and take dd(1) images
in chunks -- block 0 to the last block before the ESP, the ESP, then
each partition or contiguous span of related partitions, and finally the
secondary GPT header.
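
In dd(1) terms, the chunked scheme might look like the sketch below.
The sector ranges are invented; on a real disk you would read them from
`sfdisk -d /dev/sdX`, and a plain file plays the disk here so the
commands are harmless.

```shell
# A file plays the disk; the start/size values below are made up.
# On real hardware, read them from:  sfdisk -d /dev/sdX
DISK=./disk.img
dd if=/dev/urandom of="$DISK" bs=512 count=4096 status=none  # 2 MiB "disk"

# block 0 up to the first partition (protective MBR, GPT header, gap)
dd if="$DISK" of=./chunk0-pre.img  bs=512 skip=0    count=2048 status=none

# the "partition": sectors 2048..4062
dd if="$DISK" of=./chunk1-part.img bs=512 skip=2048 count=2015 status=none

# the secondary GPT header lives in the last 33 sectors
dd if="$DISK" of=./chunk2-gpt.img  bs=512 skip=4063 count=33   status=none

# the chunks cover the whole disk: 2048 + 2015 + 33 = 4096 sectors
```

Concatenating the chunks back in order reproduces the original disk
image exactly, which makes partial restores (just one partition, or
just the GPT headers) straightforward.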


> * Update the restored copies of /etc/fstab, /etc/crypttab and any
> other files that directly reference the partitions or file systems
> by some kind of ID (UUID, /dev/disk/by-*/*, ...)
> * Reinstall the boot loader


When I take a dd(1) image of an MBR disk, I copy from block 0 through
the end of the root partition. So:

1. UUID's are preserved.

2. All boot loader stages are preserved.


When I take a dd(1) image of a GPT disk with lots of zeros (fresh wipe
and install), I copy the whole thing. Again, UUID's and boot loader
stages are preserved.


Using live media for UUID and/or boot loader surgery is non-trivial, as
discussed in more than a few posts to this list. But, such may be
required after restoring an image onto a different disk and/or hardware
arrangement.
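
A dry-run sketch of that surgery, under the assumption of a typical
GRUB/UEFI Debian layout; the device names are hypothetical, and every
command is echoed rather than executed:

```shell
# Dry-run sketch: /dev/sdb2 = restored root, /dev/sdb1 = ESP (both
# hypothetical). Each command is echoed, not run; on real hardware,
# as root from live media, change run=echo to run= (empty).
run=echo

$run mount /dev/sdb2 /mnt
$run mount /dev/sdb1 /mnt/boot/efi
$run mount --bind /dev  /mnt/dev
$run mount --bind /proc /mnt/proc
$run mount --bind /sys  /mnt/sys

# fix UUIDs in /mnt/etc/fstab and /mnt/etc/crypttab first (see blkid),
# then reinstall the boot loader from inside the restored system:
$run chroot /mnt grub-install /dev/sdb
$run chroot /mnt update-grub
```

The bind mounts give the chroot the /dev, /proc, and /sys that
grub-install and update-grub expect to find.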


> * Reboot
> * Reinstall the boot loader again from within the restored environment
> to ensure that everything relating to it is in sync


For the simple case of restoring an image onto the exact same hardware,
a restored MBR image just works. Same for GPT. If a GPT disk was
zeroed or secure erased, a secondary GPT header will need to be
written. I believe GRUB, Linux, or something on Debian did this
automagically for me the last time I tried.


> Such recovery should _not_ need to involve significant reconfiguration
> of anything. Any such requirements will massively increase your time
> to recovery, as I think we're seeing an example of here. And yes,
> pretty much all of this could be scripted, but I strongly suspect that
> few people need to do a bare-metal restore of their most recent backup
> often enough for _that_ to be worth the effort to create and maintain.


AIUI the OP accidentally zeroed a Windows/ Debian multi-boot disk in a
relatively new computer. Rebuilding both operating systems from scratch
is going to involve more than twice the effort of rebuilding one, but
hopefully no live data was lost.


I have a half dozen computers in my SOHO network. I trash my daily
driver at least once a year and my workhorse more often than that.


I started with disaster preparedness/ recovery using
lowest-common-denominator tools -- tar(1), gzip(1), rsync(1), dd(1),
etc.. I am a coder, so I wrapped those with shell and Perl scripts.
For better or worse, I have built my own backup, recovery, image,
archive, etc., suite and have tailored my work flow to match. The tool
chain is Rube Goldberg, but the backup and archive products are
identifiable as standard Unix tool outputs and accessible by hand.


> Which is not to say that keeping configuration files
> version-controlled cannot provide benefits anyway; but given a proper,
> frequent backup regime, the benefits even of that are reduced.


The goal is defense in depth -- version control, backup, restore,
imaging, archive, zfs-auto-snapshot, replication, rotation, RAID, etc..


David