
smartctl cannot access my storage, need syntax help


gene heskett

Jan 12, 2024, 9:50:05 PM
I just found an mbox file in my home directory, containing about 90
days' worth of undelivered messages from smartctl running as root.

smartctl says my raid10 is dying, but will not access the drives for
detail. The -d /dev/sde1, for instance, generates a help message saying
it needs a device name as the final argument, when run as "sudo smartctl
-i -d /dev/sde1". Or as -i -d /dev/md0p1???
Typical: sudo smartctl -i -d /dev/md0p1:
gene@coyote:~$ sudo smartctl -i -d /dev/md0p1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl requires a device name as the final command-line argument.


Use smartctl -h to get a usage summary


Instructions please...

Thank you.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis

Gareth Evans

Jan 12, 2024, 10:00:07 PM
On Sat 13/01/2024 at 02:42, gene heskett <ghes...@shentel.net> wrote:
> I just found an mbox file in my home directory, containing about 90 days
> worth of undelivered msgs from smartctl running as root.
>
> smartctl says my raid10 is dying, but will not access the drives for
> detail. The -d /dev/sde1 for instance generates a help msg saying it
> needs a devicename as final argument, being run as "sudo smartctl -i -d
> /dev/sde1". or as -i -d /dev/md0p1???
> Typical: sudo smartctl -i -d /dev/md0p1:
> gene@coyote:~$ sudo smartctl -i -d /dev/md0p1
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
> Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>
> ERROR: smartctl requires a device name as the final command-line argument.
>
>
> Use smartctl -h to get a usage summary
>
>
> Instructions please...

Hi Gene,

You seem to be asking it to operate on partitions.

smartctl -options /dev/sdX

is the form, I believe, in which sdX must be a whole physical drive, not a virtual device such as an MD array or LVM volume.
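
If it helps, one way to get from the partition names smartd mentions to
the whole-drive names smartctl wants is just to strip the trailing
partition number. A minimal sketch, valid only for /dev/sdXN-style names
(md and NVMe nodes need "lsblk -no pkname" instead); this helper is
hypothetical, not part of smartmontools:

```shell
#!/bin/sh
# Hypothetical helper: derive the whole-drive node smartctl expects
# from a partition node like /dev/sde1 (sdXN names only).
part=/dev/sde1
disk=${part%"${part##*[a-z]}"}      # strip the trailing digits
echo "$disk"                        # prints /dev/sde
```

With the real drive name in hand, "sudo smartctl -i /dev/sde" is then
the form described above.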

Andy Smith

Jan 12, 2024, 10:00:07 PM
Hi Gene,

There are some indicators of a fundamental lack of understanding
here I'm afraid.

On Fri, Jan 12, 2024 at 09:42:54PM -0500, gene heskett wrote:
> smartctl says my raid10 is dying,

No it doesn't; smartctl works on drives, not mdadm arrays. mdadm
arrays are composed of block devices. Therefore any output you get
from smartd refers to a storage drive, not an mdadm array.

As usual you have not bothered to show us what you are talking about
(the email from smartd), so we are left to guess. We should not
assume that it even says what you think it says.

> The -d /dev/sde1 for instance generates a help msg saying it needs a
> devicename as final argument, being run as "sudo smartctl -i -d /dev/sde1".
> or as -i -d /dev/md0p1???

Neither. /dev/sde1 is a partition on a block device.
/dev/md0p1 is a partition on an mdadm array. Neither one is
something that smartd works with.

You probably wanted /dev/sde.

Also, the -d option of smartctl specifies the device type. You
almost certainly don't need it. If you don't absolutely know why
you are using -d, don't use it. So:

# smartctl -i /dev/sde

or, heck, get all the info at once:

# smartctl -a /dev/sde
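
To sketch that for all three drives the smartd mails reportedly named
(sde, sdg, sdd), here is a loop that only prints the commands rather
than running them, so nothing touches the hardware by accident; the
device list is an assumption taken from Gene's report:

```shell
#!/bin/sh
# Print (do not run) a smartctl health query for each whole drive
# behind the partitions the smartd mails reportedly named.
for part in /dev/sde1 /dev/sdg1 /dev/sdd1; do
    disk=${part%1}                  # sdX1 -> sdX
    echo "sudo smartctl -a $disk"
done
```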

**********************************************************************
If there is anything in that output that you have questions about,
please make sure to quote the full and unedited output back here to
the list, so we aren't left guessing what the subject of
discussion is.
**********************************************************************

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

gene heskett

Jan 13, 2024, 12:00:06 AM
On 1/12/24 21:56, Andy Smith wrote:
> Hi Gene,
>
> There are some indicators of a fundamental lack of understanding
> here I'm afraid.
>
> On Fri, Jan 12, 2024 at 09:42:54PM -0500, gene heskett wrote:
>> smartctl says my raid10 is dying,
>
> No it doesn't; smartctl works on drives, not mdadm arrays. mdadm
> arrays are composed of block devices. Therefore any output you get
> from smartd refers to a storage drive, not an mdadm array.
>
This appears to be true; there are 4 1T drives as a raid10, and the
various messages in that mbox file name 3 of the individual drives.
But those individual drives cannot now be found by smartctl, so I must
be doing something wrong. Individually it names /dev/sde1, /dev/sdg1,
and /dev/sdd1, but -h offers no syntax help that works.

> As usual you have not bothered to show us what you are talking about
> (the email from smartd), so we are left to guess. We should not
> assume that it even says what you think it says.

copy paste from another shell:
gene@coyote:~$ sudo smartctl -i -d /dev/sde1
[sudo] password for gene:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl requires a device name as the final command-line argument.

OK, got that figured out: the -d specs the interface, sata or whatever.
Then I find that Linux has played 52-pickup with the device names.
There are in actual fact 3 SATA controllers in this machine: the
motherboard's 6 ports; 6 more on an inexpensive SATA controller, 4 of
which are actually the raid10's Samsung 870 1T drives; and 4 more on a
more expensive 16-port card which has a quartet of 2T Gigastone SSDs on
it. But the drives are not found in the order of the controllers. That
raid10 was composed w/o the third controller.

blkid does not sort them in order either, and of course does not list
what's unmounted, forcing me to identify the drive with gparted in
order to get its device name. From that I might be able to construct
another raid from the 8T of 4 2T drives, but it's confusing as hell
when the first of those 2T drives is assigned /dev/sde and the next 4
on the new controller are /dev/sdi, j, k, & l.

So it appears I have 5 of those Gigastones, and sde is the odd one,
so that one could be formatted ext4 and serve as a backup of the
raid10. But since I can't copy a locked file, how do I make an image of
that raid10 to /dev/sde and get every byte? That seems like the first
step to me.
> Neither. /dev/sde1 is a partition on a block device.
> /dev/md0p1 is a partition on an mdadm array. Neither one is
> something that smartd works with.
I've got that now.
>
> You probably wanted /dev/sde.
>
> Also, the -d option of smartctl specifies the device type. You
> almost certainly don't need it. If you don't absolutely know why
> you are using -d, don't use it. So:
>
> # smartctl -i /dev/sde
>
> or, heck, get all the info at once:
>
> # smartctl -a /dev/sde
>
> **********************************************************************
> If there is anything in that output that you have questions about,
> please make sure to quote the full and unedited output back here to
> the list, so we aren't left guessing what the subject of
> discussion is.
> **********************************************************************
>
> Thanks,
> Andy
>
/dev/sde1 has been formatted and mounted; what command line will copy
every byte, including locked files, in that raid10 to it?

Thank you

Charles Curley

Jan 13, 2024, 12:40:05 AM
On Fri, 12 Jan 2024 21:42:54 -0500
gene heskett <ghes...@shentel.net> wrote:

> gene@coyote:~$ sudo smartctl -i -d /dev/md0p1

Gene, you could try reading the fine man page. The -d option takes an
argument, which eats the /dev/md0p1, leaving no device for smartctl to
look at. I have no idea what md0p1 is, but I doubt it's a physical
drive, like sda. So that's two problems. And you probably don't need
the -d option.

Try "ls /dev/sd?" and go from there.

--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Andy Smith

Jan 13, 2024, 10:50:06 AM
Hi Gene,

On Fri, Jan 12, 2024 at 11:57:23PM -0500, gene heskett wrote:
> On 1/12/24 21:56, Andy Smith wrote:
> > No it doesn't; smartctl works on drives, not mdadm arrays. mdadm
> > arrays are composed of block devices. Therefore any output you get
> > from smartd refers to a storage drive, not an mdadm array.
> >
> This appears to be true, there are 4 1t drives as a raid10, and the various
> messages in that mbox file name 3 of the individual drives.

Messages you do not show us, meanwhile the rest of your report is
littered with errors, so I'm afraid I can't take you at your word
until you show me.

I repeat, smartd only works with whole drives. Those emails will
show device paths for whole drives.

> But those individual drives cannot now be found by smartctl.

You have not yet demonstrated use of a single correct smartctl
command even though I literally told you what to type.

> individually it names /dev/sde1, /dev/sdg1, and
> /dev/sdd1.

I don't believe that you have an email from smartctl saying any of
that. So please show us. Again, it would be plausible for
these emails to mention /dev/sde etc.

> > As usual you have not bothered to show us what you are talking about
> > (the email from smartd), so we are left to guess. We should not
> > assume that it even says what you think it says.
>
> copy paste from another shell:
> gene@coyote:~$ sudo smartctl -i -d /dev/sde1

Here is what I said, which is quoted above, but I'll repeat it here
for emphasis:

> As usual you have not bothered to show us what you are talking
> about (THE EMAIL FROM SMARTD)

You then proceed to show us something that is not the email from
smartd — that is the very topic of your email — but just repeat the
output of a command that I already advised you was erroneously
formed.

> blkid does not sort them in order either, and of course does not list
> what's unmounted, forcing me to identify the drive with gparted in
> order to get its device name. From that I might be able to construct
> another raid from the 8T of 4 2T drives, but it's confusing as hell
> when the first of those 2T drives is assigned /dev/sde and the next 4
> on the new controller are /dev/sdi, j, k, & l.

WHAT ON EARTH are you talking about? You start off by complaining
about an email that you don't show us, by email two you are on about
tearing your RAID apart and making a new one, all without a shred of
relevant information or the first idea of how to show the status of
anything.

You are working blind here, DO NOT DO ANYTHING until you fully
understand what is going on.

Start with your first concern which was these emails from smartd.
SHOW THEM TO US.

> > or, heck, get all the info at once:
> >
> > # smartctl -a /dev/sde
> >
> > **********************************************************************
> > If there is anything in that output that you have questions about,
> > please make sure to quote the full and unedited output back here to
> > the list, so we aren't left guessing what the subject of
> > discussion is.
> > **********************************************************************
> >
> > Thanks,
> > Andy
> >
> /dev/sde1 has been formatted and mounted, what cmd line will copy every byte
> including locked files in that that raid10 to it?

!?

For the love of God can someone, anyone, any intelligent entity
out there, explain to me how I could have been ANY MORE EXPLICIT
about the need for you to run a single command that I specified and
show us the output of it?

And did you do that?

No, apparently you have nuked a drive that we don't know the status
of.

Incredible.

Let's just assume for a second that we can just ignore everything
you have said previously and focus on your last question about
copying data: why would anyone even bother responding, given that, as
demonstrated here, you are prepared to ignore even the most basic
explicit advice and do something insane like nuke a whole drive?

Just what is the point?

Lost for words.

gene heskett

Jan 13, 2024, 12:10:06 PM
So am I, Andy. Since writing that, and given my urge to get rid of a
30+ second delay on opening ANYTHING that wants write perms to this
raid, I've done this this morning: used gparted to format to ext4 a
single gpt partition on that /dev/sde with a LABEL=homesde1, but forgot
the 1 when editing /etc/fstab to remount it on a reboot to
/mnt/homesde1, which resulted in a failed boot; had to look up the root
pw and finally get in to fix /etc/fstab for the missing 1 in the label
name.

But first I mounted a 2T Gigastone SSD to /mnt/homesde1, which is where
it showed up in an lsblk -f report.
Spent 2+ hours rsync'ing with:
sudo rsync -av /home/ /mnt/homesde1
which worked entirely within the same 6-port controller this raid10 is
running on.

Reboot failed; moved the data cable to the motherboard port 5 or 6 (or
maybe 1 or 2; 6 ports, nfi which is 0 and which is 5), but it's on the
mobo ports now, should be easily found at boot time.

Finally looked up the root pw, got in to fix /etc/fstab and got booted.
Talk about portable device names: that drive is now /dev/sdk1!!! And
empty of a LABEL name, but now has the 360 gigs of data I just rsync'd
to it. But on reboot, it's now /dev/sdb1 and empty.

from a df:
gene@coyote:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16327704 0 16327704 0% /dev
tmpfs 3272684 1888 3270796 1% /run
/dev/sda1 863983352 376505108 443516596 46% /
tmpfs 16363420 1244 16362176 1% /dev/shm
tmpfs 5120 8 5112 1% /run/lock
/dev/sda3 47749868 132 45291728 1% /tmp
/dev/md0p1 1796382580 334985304 1370072300 20% /home
/dev/sdb1 1967892164 28 1867855888 1% /mnt/homesde1
tmpfs 3272684 2544 3270140 1% /run/user/1000
gene@coyote:~$
and gparted now says that indeed, /dev/sdb is the drive with the label
"homesde1" on it, and showing 31GiB used. What for, unless that's ext4
overhead? All I can see on /mnt/homesde1 is lost+found, which is empty.

So at this point I still have a home raid10, and have NDI where the
he!! the rsync line actually copied 360 GB of stuff from home to.
smartctl -a /dev/sdb shows:
gene@coyote:~$ sudo smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Gigastone SSD <- the device's name
Serial Number: GST02TBG221146
Firmware Version: T0917A0
User Capacity: 2,048,408,248,320 bytes [2.04 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database 7.3/5319
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jan 13 11:28:50 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       884
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       5
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       10
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       46
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1500
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       986
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       7
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
*********
please, $deity, deliver me from Linux's vaporous disk naming scheme
that changes faster than the weather. Even device LABEL= does not work.
I mounted that drive by its label to /mnt/homesde1 and rsync'd /home/
to it, but that 360GB of data went someplace else. Since the data,
according to what I see in gparted, actually went to /dev/sdk1, which
is another of the 2T Gigastones I intend to make a raid6 out of, no
harm to my data is done. My raid10 was not destroyed. But I'm burned
out and frustrated. This is hardware, not a roll of the dice per boot.

I can easily erase and restart that drive for a raid with gparted, but
how in hell do I get a stable drive detection system so I know what I
am doing??

Besides that, I'm running low on hair too.

gene heskett

Jan 13, 2024, 12:30:06 PM
On 1/13/24 10:49, Andy Smith wrote:
One last question before I embark on a replay of what I just did, and
failed at. This time by using device labels.

Does making a raid erase the drive's label field in a GPT partition scheme?

That question ought to have a simple yes or no answer.

Thanks.

Andy Smith

Jan 13, 2024, 1:50:07 PM
Gene,

On Sat, Jan 13, 2024 at 12:23:28PM -0500, gene heskett wrote:
> Does making a raid erase the drives label field in a gpt partition scheme?
>
> That question ought to have a simple yes or no answer.

I'm forced to conclude that it's a waste of anyone's time to try to
help you since you don't listen to advice, won't even type commands
you are asked to, and don't provide relevant information. On top of
that between any two emails you embark on misguided epic
restructuring of your entire computing environment rendering
anything that was said in between utterly pointless.

At this point you are just howling into the wind and we're hearing
the noise from far away. Please get a blog or something, instead of
directing it here, where well-meaning people can mistake it for
something they could engage with.

I assume this thread can now close since you are never going to
provide the email from smartd that prompted it, for a system that no
longer exists. Otherwise it's destined to become another Gene
megathread with a shifting stream-of-consciousness topic of the hour.

Andy

gene heskett

Jan 13, 2024, 6:30:06 PM
On 1/13/24 13:41, Andy Smith wrote:
> Gene,
>
> On Sat, Jan 13, 2024 at 12:23:28PM -0500, gene heskett wrote:
>> Does making a raid erase the drives label field in a gpt partition scheme?
>>
>> That question ought to have a simple yes or no answer.
>
> I'm forced to conclude that it's a waste of anyone's time to try to
> help you since you don't listen to advice, won't even type commands
> you are asked to, and don't provide relevant information.

I've furnished exactly what you asked for in previous messages, until
you ask for something that is NOT installed by bookworm and cannot be found
by synaptic. It must take an extra long ladder to get up on the horse
you are riding.

> On top of
> that between any two emails you embark on misguided epic
> restructuring of your entire computing environment rendering
> anything that was said in between utterly pointless.

I'm looking for a solution to a broken install, all caused by the
installer finding a plugged-in FTDI usb-serial adapter, so it
automatically assumed I was blind and automatically installed brltty
and orca, which are not removable, once installed, without ripping the
system apart, rendering it unbootable. If orca is disabled, the system
will _NOT_ reboot. And I catch hell for discriminating against the
blind when I complained at the time.

That took me 20+ installs to get this far, because if I removed the
exec bits on orca, disabling it = no reboot = yet another re-install,
going through the same thing with orca yelling at me for every
keystroke entered, till someone took pity on me and wrote to unplug
the usb stuff, which looks like a weeping willow tree here, nothing
more or less.

And I'm forced to conclude that a simple yes or no answer to what looks
like a single, simple question to me, included above, is beyond you.
Surely there is someone who /can/ answer that question.

Do take care, stay warm, dry and well Andy. And unvaxed, so you might
live to be my age.

I am not. You are helpful, just not to me. That, I do not understand.
It comes across to me that you have no time for anyone north of 50
years old; we are too dumb to be helped when something goes south. The
only part of the advanced-age category I fit into is the poor
short-term memory of someone 89 years old, which I am.

Nicolas George

Jan 14, 2024, 5:40:05 AM
Hi.

Andy Smith (12024-01-13):
> As usual you have not bothered to show us what you are talking about
> (the email from smartd)

And that leads you to write a patient and detailed answer, so surely it
was the best way to proceed.

Regards,

--
Nicolas George

David Christensen

Jan 14, 2024, 7:50:07 AM
Re-ordered for clarity -- David.


On 1/12/24 18:42, gene heskett wrote:
> I just found an mbox file in my home directory, containing about 90 days
> worth of undelivered msgs from smartctl running as root.


Do you know how the mbox file got there?


> smartctl says my raid10 is dying, ...


Please post a console session with a command that displays the message.


On 1/12/24 20:57, gene heskett wrote:
> ... there are 4 1t drives as a raid10, and the
> various messages in that mbox file name 3 of the individual drives.


Please post a representative sample of the messages.


> Then I find that Linux has played 52-pickup with the device names.


/dev/sd* device node names are unpredictable. The traditional solution
is UUID's. Linux added /dev/disk/by-id/* a while ago and I am starting
to use them as much as possible. Make sure you look very carefully at
the serial numbers when you have several drives of the same make and model.
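
The reason those paths are stable is that udev names the symlink after
the drive's model and serial number, not the probe order. A toy
illustration using a temporary directory as a stand-in for
/dev/disk/by-id (the serial is the one from Gene's smartctl output;
the link name format is typical, not verified for this drive):

```shell
#!/bin/sh
# Toy model of /dev/disk/by-id: udev makes a symlink named after the
# model and serial, pointing at whatever /dev/sdX the kernel assigned
# this boot. A temp dir stands in for the real /dev/disk/by-id.
byid=$(mktemp -d)
ln -s ../../sdb "$byid/ata-Gigastone_SSD_GST02TBG221146"
target=$(readlink "$byid/ata-Gigastone_SSD_GST02TBG221146")
echo "$target"                      # prints ../../sdb
rm -rf "$byid"
```

On a real system, "ls -l /dev/disk/by-id/" shows these links, and the
by-id path can be used anywhere a /dev/sdX path is accepted (fstab,
mdadm, smartctl).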


> There are in actual fact 3 SATA controllers in this machine: the
> motherboard's 6 ports; 6 more on an inexpensive SATA controller, 4 of
> which are actually the raid10's Samsung 870 1T drives; and 4 more on
> a more expensive 16-port card which has a quartet of 2T Gigastone
> SSDs on it. But the drives are not found in the order of the
> controllers. That raid10 was composed w/o the third controller.


So:

* /home is on a RAID 10 with 2 @ mirror of 2 @ 1 TB Samsung 870 SSD?

* 4 @ 2 TB Gigastone SSD for a new RAID 10?


What drives are connected to which ports?


What is on the other 20 ports?


> blkid does not sort them in order either, and of course does not list
> what's unmounted, forcing me to identify the drive with gparted in
> order to get its device name. From that I might be able to construct
> another raid from the 8T of 4 2T drives, but it's confusing as hell
> when the first of those 2T drives is assigned /dev/sde and the next 4
> on the new controller are /dev/sdi, j, k, & l.
> So it appears I have 5 of those Gigastones, and sde is the odd one


I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?


> So that one could be formatted ext4 and serve as a backup of the raid10.

> how do I make an image of that
> raid10 to /dev/sde and get every byte? That seems like the first step
> to me.


Please get a USB 3.x HDD, do a full backup of your entire computer, put
it off-site, get another USB 3.x HDD, do another full backup, and keep
it nearby.


> But since I can't copy a locked file,


What file is locked? Please post a console session that demonstrates.


> /dev/sde1 has been formatted and mounted, what cmd line will copy every
> byte including locked files in that that raid10 to it?


See above for locked. Otherwise, I suggest rsync(1).


On 1/13/24 09:02, gene heskett wrote:
> ... I've done this this morning:

What does "NDI" mean?


> where the he!!
> the rsync line actually copied 360 Gb of stuff from home to.


Please post:

# ls -a /mnt

# ls -a /mnt/homesde1


> gene@coyote:~$ sudo smartctl -a /dev/sdb


Please use /dev/disk/by-id/* paths.


> ...
> Device Model: Gigastone SSD <- the devices name
> Serial Number: GST02TBG221146
> ...
> User Capacity: 2,048,408,248,320 bytes [2.04 TB]
> Sector Size: 512 bytes logical/physical
> ...
> Form Factor: 2.5 inches
> TRIM Command: Available
> ...
> SMART overall-health self-assessment test result: PASSED


Okay.


> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
> WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age
> Always - 0
> 5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age
> Always - 0
> 9 Power_On_Hours 0x0032 100 100 050 Old_age
> Always - 884
> ...


Attribute values of 100 for all VALUE and WORST make sense for a brand
new drive, which contradicts "Old_age" and "Pre-fail" (?).


How long has that drive been in your computer? How many hours per week
has it been on?


> SMART Error Log Version: 1
> No Errors Logged


Okay.


> SMART Self-test log structure revision number 1
> No self-tests have been logged. [To run self-tests, use: smartctl -t]

Please run:

# smartctl -t short /dev/disk/by-id/*GST02TBG221146*


Wait a few minutes. Then run:


# smartctl -x /dev/disk/by-id/*GST02TBG221146*


We do not need to see the whole output, but please post the "SMART
Self-test log structure revision number 1" section. It should show the
short test.


> please, $deity, deliver me from Linux's vaporous disk naming scheme
> that changes faster than the weather. Even device LABEL= does not
> work. I mounted that drive by its label to /mnt/homesde1 and rsync'd
> /home/ to it, but that 360GB of data went someplace else.


Use script(1) to capture your console sessions. Save the files it
generates. Use cat(1) to display their contents.


> Since the data, according
> to what I see in gparted, actually went to /dev/sdk1,
> which is another
> of the 2T gigastones, I intend to make a raid6 out of, no harm to my
> data is done. My raid10 was not destroyed. But I'm burned out and
> frustrated. This is hardware, not a roll of the dice per boot.
>
> I can easily erase and restart that drive for a raid with gparted, But
> howinhell do I get a stable drive detection system so I know what I am
> doing??????????????????????????????????????????
>
> Besides that, I'm running low on hair too.


Please use /dev/disk/by-id/* paths.


On 1/13/24 09:23, gene heskett wrote:
> One last question before I embark on a replay of what I just did, and
> failed at. This time by using device labels.
>
> Does making a raid erase the drives label field in a gpt partition
scheme?


It has been several years since I used mdadm(8), but I suspect the
answer depends upon whether you give whole disks as arguments or give
disk partitions as arguments. If you give whole disks, it would not
surprise me if mdadm(8) overwrote any and all sectors as it pleased,
including MBR and/or GPT partition tables. If you give partitions, I
would expect mdadm(8) to not write outside those partitions, so GPT
labels should be untouched. Please also see mdadm(8) "CREATE MODE" and
the discussion about partition type and version-1.x metadata.
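
Following that reasoning, a hedged sketch of the partition-based
invocation; the array number, RAID level, and device names are
hypothetical, and the command is only printed here for review, never
executed:

```shell
#!/bin/sh
# Build, and print for review, an mdadm create command that names
# partitions (sdX1) rather than whole disks, so each drive's GPT
# header and partition labels should survive. Devices hypothetical.
cmd='mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1'
printf '%s\n' "$cmd"
```

Review the printed command and run it by hand as root only once the
device names are confirmed against /dev/disk/by-id.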


On 1/13/24 15:21, gene heskett wrote:
> I'm looking for a solution to a broken install, all caused by the
> installer finding a plugged-in FTDI usb-serial adapter so it
> automatically assumed I was blind and automatically installed brltty and
> orca, which are not removable once installed without ripping the system
> apart rendering it unbootable. If orca is disabled, the system will
> _NOT_ reboot. And I catch hell for discriminating against the blind when
> I complained at the time.
>
> That took me 20+ installs to get this far because if I removed the exec
> bits on orca, disabling it=no reboot=yet another re-install go thru the
> same thing with orca yelling at me for every keystroke entered, till
> someone took pity on me and wrote to unplug the usb stuff which looks
> like a weeping willow tree here, nothing more or less.


Do you have a USB drive with an installation of Debian? If not, build
one. I used SanDisk Ultra Fit USB 3.0 16 GB drives for many years. Now
I use Samsung UM410 16 GB SSD's and a StarTech USB to SATA adapter cable.


Then disconnect all the drives except the 4 @ 1 TB SSD's for the RAID10
/home, boot your USB Debian drive, assemble the RAID10, mount the file
system read-only, and test for the 30 second delay.


David

Steve McIntyre

Jan 14, 2024, 8:40:06 AM
Gene Heskett wrote:
>
>I'm looking for a solution to a broken install, all caused by the
>installer finding a plugged-in FTDI usb-serial adapter so it
>automatically assumed I was blind and automatically installed brltty and
>orca, which are not removable once installed without ripping the system
>apart rendering it unbootable. If orca is disabled, the system will
>_NOT_ reboot. And I catch hell for discriminating against the blind when
>I complained at the time.
>
>That took me 20+ installs to get this far because if I removed the exec
>bits on orca, disabling it=no reboot=yet another re-install go thru the
>same thing with orca yelling at me for every keystroke entered, till
>someone took pity on me and wrote to unplug the usb stuff which looks
>like a weeping willow tree here, nothing more or less.

Gene, *stop* doing this.

When you ask for help with *one* issue (in this case, smartctl),
looping around other issues you've had in the last few years *does not
help*. It's unrelated, it obfuscates what you're saying, and it's
*intensely* frustrating to the people here who might actually be
trying to help you.

You do this *a lot*. Please focus on one thing at a time, and we might
be able to help you better. The usual advice applies:

* Give a clear description of the problem you're seeing. Include
command lines and command output, log entries or similar. Make it
possible for people to actually identify the problem - vague
descriptions make it much harder.

* When people reply to you asking for more information, they're not
doing that to annoy you. They're trying to understand the problem
you've reported more, so they can help you fix it. If they ask you
to run extra commands and report the output, *please do that*.

* Stay on topic. If you have another issue you'd like help with, send
a separate mail about that and have a separate thread of
conversation about that issue. Don't mix things up.

Please think about this, and help people to help you.

>And I'm forced to conclude that a simple yes or no answer to what looks
>like a single, simple question to me, included above, is beyond you.
>Surely there is someone who /can/ answer that question.
>
>Do take care, stay warm, dry and well Andy. And unvaxed, so you might
>live to be my age.
>
>I am not. You are helpful, just not to me. That, I do not understand. It
>comes across to me that you have no time for anyone north of 50 years
>old, we are too dumb to be help when something goes south. The only
>part of the advanced age category I fit into is the poor short term
>memory of someone 89 years old, which I am.

It's nothing to do with your age. You keep on bringing this up. People
are volunteering their time to help you. When you don't pay attention
and go wandering off-topic it makes it much harder for people to
help. I hope you can understand that.

--
Steve McIntyre, Cambridge, UK. st...@einval.com
Can't keep my eyes from the circling sky,
Tongue-tied & twisted, Just an earth-bound misfit, I...

Andy Smith

unread,
Jan 14, 2024, 11:30:06 AM1/14/24
to
Hello,

On Sun, Jan 14, 2024 at 11:37:08AM +0100, Nicolas George wrote:
> Andy Smith (12024-01-13):
> > As usual you have not bothered to show us what you are talking about
> > (the email from smartd)
>
> And that leads you to write a patient and detailed answer, so surely it
> was the best way to proceed.

If only Gene had signalled that no matter what I had written, he was
going to nuke it all and start again anyway. Then I could have given
a detailed answer piped from /dev/urandom and nothing would have
changed for anyone involved.

gene heskett

unread,
Jan 14, 2024, 2:50:06 PM1/14/24
to
On 1/14/24 07:42, David Christensen wrote:
> Re-ordered for clarity -- David.
And snipped by Gene as I updated
>
> On 1/12/24 18:42, gene heskett wrote:
>> I just found an mbox file in my home directory, containing about 90
>> days worth of undelivered msgs from smartctl running as root.
>
>
> Do you know how the mbox file got there?
No, it just appeared.
>
>
>> smartctl says my raid10 is dying, ...
>
>
> Please post a console session with a command that displays the message.
This is a copy/paste of the second message in that file, the first from
smartctl, followed by the last message in that file:

From ro...@coyote.coyote.den Wed Nov 02 00:29:05 2022
Return-path: <ro...@coyote.coyote.den>
Envelope-to: ro...@coyote.coyote.den
Delivery-date: Wed, 02 Nov 2022 00:29:05 -0400
Received: from root by coyote.coyote.den with local (Exim 4.94.2)
(envelope-from <ro...@coyote.coyote.den>)
id 1oq5NB-000DBx-15
for ro...@coyote.coyote.den; Wed, 02 Nov 2022 00:29:05 -0400
To: ro...@coyote.coyote.den
Subject: SMART error (SelfTest) detected on host: coyote
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Message-Id: <E1oq5NB-...@coyote.coyote.den>
From: root <ro...@coyote.coyote.den>
Date: Wed, 02 Nov 2022 00:29:05 -0400
Content-Length: 513
Lines: 16
Status: RO
X-Status:
X-Keywords:
X-UID: 2

This message was generated by the smartd daemon running on:

host name: coyote
DNS domain: coyote.den

The following warning/error was logged by the smartd daemon:

Device: /dev/sde [SAT], Self-Test Log error count increased from 0 to 1

Device info:
Samsung SSD 870 EVO 1TB, S/N:S626NF0R302507V, WWN:5-002538-f413394ae,
FW:SVT01B6Q, 1.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
======= 3 more identical msgs referring to the other 3 drives in the
raid. =======
From ro...@coyote.coyote.den Wed Nov 16 06:22:02 2022
Return-path: <ro...@coyote.coyote.den>
Envelope-to: ro...@coyote.coyote.den
Delivery-date: Wed, 16 Nov 2022 06:22:02 -0500
Received: from root by coyote.coyote.den with local (Exim 4.94.2)
(envelope-from <ro...@coyote.coyote.den>)
id 1ovGUR-0000De-Bc
for ro...@coyote.coyote.den; Wed, 16 Nov 2022 06:21:59 -0500
To: ro...@coyote.coyote.den
Subject: SMART error (SelfTest) detected on host: coyote
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Message-Id: <E1ovGUR-...@coyote.coyote.den>
From: root <ro...@coyote.coyote.den>
Date: Wed, 16 Nov 2022 06:21:59 -0500
Content-Length: 592
Lines: 17
Status: RO
X-Status:
X-Keywords:
X-UID: 9

This message was generated by the smartd daemon running on:

host name: coyote
DNS domain: coyote.den

The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], Self-Test Log error count increased from 1 to 2

Device info:
Samsung SSD 870 EVO 1TB, S/N:S626NF0R302502E, WWN:5-002538-f413394a9,
FW:SVT01B6Q, 1.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed Nov 2 06:59:04
2022 EDT
Another message will be sent in 24 hours if the problem persists.

I also note these are now very old messages, but the file itself is dated
Jan 7th. And syslog has been rotated several times since.

I'm no expert at interpreting smartctl reports, but I do not see such
errors in the smartctl output now. Going backwards through the list, the
4th drive in the raid has had 3334 errors, as has the third drive with
3332 errors; the 1st and 2nd are clean.

One stanza of the error report:
Error 3328 occurred at disk power-on lifetime: 21027 hours (876 days + 3
hours)
When the command that caused the error occurred, the device was
active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 28 00 54 a9 40 Error: UNC at LBA = 0x00a95400 = 11097088

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 28 00 54 a9 40 05 15:16:34.891 READ FPDMA QUEUED
61 18 18 e8 ea 67 40 03 15:16:34.891 WRITE FPDMA QUEUED
60 00 10 00 5e a9 40 02 15:16:34.891 READ FPDMA QUEUED
60 28 08 00 f4 87 40 01 15:16:34.891 READ FPDMA QUEUED
60 00 00 00 7c a9 40 00 15:16:34.891 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%            10917          1847474376
# 2  Extended offline    Completed: read failure       50%            10586          1847474376

So half the samsung 870's are on their way out. But nothing recent...
So I am now trying to get a good rsync copy on another drive.
>
> On 1/12/24 20:57, gene heskett wrote:
> > ... there are 4 1t drives as a raid10, and the
> > various messages in that mbox file name all of the individual drives.
>
>
> Please post a representative sample of the messages.

See above, most of it is Swahili to me.
>
> > Then I find the linux has played 52 pickup with the device names.
>
>
> /dev/sd* device node names are unpredictable.  The traditional solution
> is UUID's.  Linux added /dev/disk/by-id/* a while ago and I am starting
> to use them as much as possible.  Make sure you look very carefully at
> the serial numbers when you have several drives of the same make and model.
>
>
> > There are in actual fact 3 sata controller is this machine, the
> > motherboards 6 ports, 6 more on an inexpensive sata controller that are
> > actually the 4 raid10 Samsung 870 1T drives, and 4 more on a more
> > sxpensive 16 port card which has a quartet of 2T gigastone SSD's on it,
> > but the drives are not found in the order of the controllers. That
> > raid10 was composed w/o the third controller.
>
>
> So:
>
> * /home is on a RAID 10 with 2 @ mirror of 2 @ 1 TB Samsung 870 SSD?
I think that's what you call a raid10
> * 4 @ 2 TB Gigastone SSD for a new RAID 10?

just installed, not mounted or made into a raid yet. WIP?

>
> What drives are connected to which ports?
4 Samsung 870 1T's are on the 1st added controller.
ATM 5 2T Gigastone's are on the 2nd, 16 port added controller;
smartctl says all 5 of those are fine.
>
>
> What is on the other 20 ports?
On the mobo? A big dvd writer and 2 other half T or 1T samsung drives
from earlier 860 runs, not currently mounted. No spinning rust anyplace
now. I don't appreciate being a lab rat for seagate to experiment on.
A current lsblk:
gene@coyote:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 838.2G 0 part /
├─sda2 8:2 0 46.8G 0 part [SWAP]
└─sda3 8:3 0 46.6G 0 part /tmp

sdb 8:16 1 0B 0 disk is probably my camera, currently
plugged in

sdc 8:32 1 0B 0 disk is probably my brother
MFP-J6920DW printer, always plugged in
first controller, 6 port
sdd 8:48 0 931.5G 0 disk
├─sdd1 8:49 0 900G 0 part
│ └─md0 9:0 0 1.7T 0 raid10
│ └─md0p1 259:0 0 1.7T 0 part /home
├─sdd2 8:50 0 30G 0 part
│ └─md1 9:1 0 60G 0 raid10 [SWAP]
└─sdd3 8:51 0 1.5G 0 part
└─md2 9:2 0 3G 0 raid10
sde 8:64 0 931.5G 0 disk
├─sde1 8:65 0 900G 0 part
│ └─md0 9:0 0 1.7T 0 raid10
│ └─md0p1 259:0 0 1.7T 0 part /home
├─sde2 8:66 0 30G 0 part
│ └─md1 9:1 0 60G 0 raid10 [SWAP]
└─sde3 8:67 0 1.5G 0 part
└─md2 9:2 0 3G 0 raid10
sdf 8:80 0 931.5G 0 disk
├─sdf1 8:81 0 900G 0 part
│ └─md0 9:0 0 1.7T 0 raid10
│ └─md0p1 259:0 0 1.7T 0 part /home
├─sdf2 8:82 0 30G 0 part
│ └─md1 9:1 0 60G 0 raid10 [SWAP]
└─sdf3 8:83 0 1.5G 0 part
└─md2 9:2 0 3G 0 raid10
sdg 8:96 0 931.5G 0 disk
├─sdg1 8:97 0 900G 0 part
│ └─md0 9:0 0 1.7T 0 raid10
│ └─md0p1 259:0 0 1.7T 0 part /home
├─sdg2 8:98 0 30G 0 part
│ └─md1 9:1 0 60G 0 raid10 [SWAP]
└─sdg3 8:99 0 1.5G 0 part
└─md2 9:2 0 3G 0 raid10

2nd controller, 16 ports, all 5 2T gigastone's
sdh 8:112 0 1.9T 0 disk
└─sdh1 8:113 0 1.9T 0 part
sdi 8:128 0 1.9T 0 disk
└─sdi1 8:129 0 1.9T 0 part
sdj 8:144 0 1.9T 0 disk
└─sdj1 8:145 0 1.9T 0 part
sdk 8:160 0 1.9T 0 disk
└─sdk1 8:161 0 1.9T 0 part
sdl 8:176 0 1.9T 0 disk
└─sdl1 8:177 0 1.9T 0 part
sr0 11:0 1 1024M 0 rom The internal dvd writer
gene@coyote:~$

>
>
> > blkid does not sort them in order either. And of coarse does not list
> > whats unmounted, forcing me to ident the drive by gparted in order to
> > get its device name. From that I might be able to construct another raid
> > from the 8T of 4 2T drives but its confusing as hell when the first of
> > those 2T drives is assigned /dev/sde and the next 4 on the new
> > controller are /dev/sdi, j, k, & l.
> > So it appears I have 5 of those gigastones, and sde is the odd one
Which, when it was /dev/sde1, was plugged into the 1st extra controller.
When the data cable was plugged into a motherboard port, it became
/dev/sdb1. So I've relabeled it, and am about to test it on the second 16
port controller.
>
>
> I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?

5, ordered in 2 separate orders.
>
> > So that one could be formatted ext4 and serve as a backup of the raid10.
What I am trying to do now, but cannot if it is plugged into a
motherboard port, hence the repeat of this exercise on the 2nd sata card.
>
> > how do I make an image of that
> > raid10  to /dev/sde and get every byte?  That seems like the first step
> > to me.
This I am still trying to do, the first pass copied all 350G of /home
but went to the wrong drive, and I had mounted the drive by its label.
It is now /dev/sdh and all labels above it are now wrong. Crazy.
These SSD's all have an OTP serial number. I am tempted to use that
serial number as a label _I_ can control. And according to gparted,
labels do not survive being incorporated into a raid as the raid is all
labeled with hostname : partition number. So there really is no way in
linux to define a drive that is that drive forever. Unreal...

> Please get a USB 3.x HDD, do a full backup of your entire computer, put
> it off-site, get another USB 3.x HDD, do another full backup, and keep
> it nearby

That, using amanda is the end target of this. But I have bought 3 such
spinning rust drives over the years and not had any survive being hot
plugged into a usb port more than twice.

With that track record, I'll not waste any more money down that rabbit hole.

>
> >   But since I can't copy a locked file,
>
>
> What file is locked?  Please post a console session that demonstrates.
A file that is opened but not closed is exclusive to that app and its
lock, and cannot be copied except by rsync, or so I have been told. And
there are quite a few such open locks on this system right now. This
killed my full housed amiga when the boot drive with all its custom
scripts died, and I found the backups I had were totally devoid of any
of those scripts. I still have about 20 QIC tapes from that machine, but
now no drives to read them. I need to cull the midden heap.

>
> > /dev/sde1 has been formatted and mounted, what cmd line will copy every
> > byte including locked files in that that raid10 to it?
>
>
> See above for locked.  Otherwise, I suggest rsync(1).
>
[...]
Thank you David.

David Wright

unread,
Jan 14, 2024, 7:50:06 PM1/14/24
to
On Sun 14 Jan 2024 at 14:48:49 (-0500), gene heskett wrote:
> On 1/14/24 07:42, David Christensen wrote:

> > I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?
>
> 5, ordered in 2 separate orders.
> >
> > > So that one could be formatted ext4 and serve as a backup of the raid10.
> What I am trying to do now, but cannot if it is plugged into a
> motherboard port, hence the repeat of this exercise on the 2nd sata
> card.
> >
> > > how do I make an image of that
> > > raid10  to /dev/sde and get every byte?  That seems like the first step
> > > to me.
> This I am still trying to do, the first pass copied all 350G of /home
> but went to the wrong drive, and I had mounted the drive by its label.
> It is now /dev/sdh and all labels above it are now wrong. Crazy.
> These SSD's all have an OTP serial number. I am tempted to use that
> serial number as a label _I_ can control. And according to gparted,
> labels do not survive being incorporated into a raid as the raid is
> all labeled with hostname : partition number. So there really is no
> way in linux to define a drive that is that drive forever. Unreal...

Interesting to see in how many different ways you can use the
term "label". BTW I have no idea what an "OTP serial number" is.

On Sun 14 Jan 2024 at 16:47:41 (-0500), gene heskett wrote:
> ene@coyote:~/src/klipper-docs$ sudo smartctl -a /dev/sde
> [sudo] password for gene:
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
> Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Device Model: Gigastone SSD
> Serial Number: GST02TBG221146

↑↑↑↑↑↑↑↑↑↑↑↑↑↑

You see that there? You should find this symlink on your system:

/dev/disk/by-id/… …GST02TBG221146

pointing at some random /dev/sdX. Where you put /dev/sdX,
put /dev/disk/by-id/… …GST02TBG221146 instead. Then you'll
know you're referring to that disk. Likewise the others.
(As already suggested by David C, Sun, 14 Jan 2024 04:41:51 -0800)
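The mechanism can be demonstrated concretely: a by-id name is just a symlink that udev points at whichever kernel node the drive landed on this boot. A minimal sketch in a throwaway directory (the real links live in /dev/disk/by-id and are created by udev, not by hand):

```shell
# How /dev/disk/by-id works, in miniature: a symlink named after the
# drive's model and serial points at whatever sdX the kernel assigned.
demo=$(mktemp -d)
touch "$demo/sdh"                                     # stand-in for the kernel node
ln -s "$demo/sdh" "$demo/ata-Gigastone_SSD_GST02TBG221146"
readlink -f "$demo/ata-Gigastone_SSD_GST02TBG221146"  # resolves to the sdh path
rm -r "$demo"
```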

Cheers,
David.

gene heskett

unread,
Jan 14, 2024, 8:20:05 PM1/14/24
to
On 1/14/24 19:48, David Wright wrote:
> On Sun 14 Jan 2024 at 14:48:49 (-0500), gene heskett wrote:
>> On 1/14/24 07:42, David Christensen wrote:
>
>>> I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?
>>
>> 5, ordered in 2 separate orders.
>>>
>>> > So that one could be formatted ext4 and serve as a backup of the raid10.
>> What I am trying to do now, but cannot if it is plugged into a
>> motherboard port, hence the repeat of this exercise on the 2nd sata
>> card.
>>>
>>> > how do I make an image of that
>>> > raid10  to /dev/sde and get every byte?  That seems like the first step
>>> > to me.
>> This I am still trying to do, the first pass copied all 350G of /home
>> but went to the wrong drive, and I had mounted the drive by its label.
>> It is now /dev/sdh and all labels above it are now wrong. Crazy.
>> These SSD's all have an OTP serial number. I am tempted to use that
>> serial number as a label _I_ can control. And according to gparted,
>> labels do not survive being incorporated into a raid as the raid is
>> all labeled with hostname : partition number. So there really is no
>> way in linux to define a drive that is that drive forever. Unreal...
>
> Interesting to see in how many different ways you can use the
> term "label". BTW I have no idea what an "OTP serial number" is.
>
OTP=One Time Pad, never to be used again.

> On Sun 14 Jan 2024 at 16:47:41 (-0500), gene heskett wrote:
>> ene@coyote:~/src/klipper-docs$ sudo smartctl -a /dev/sde
>> [sudo] password for gene:
>> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-rt-amd64] (local build)
>> Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model: Gigastone SSD
>> Serial Number: GST02TBG221146
>
> ↑↑↑↑↑↑↑↑↑↑↑↑↑↑
>
> You see that there? You should find this symlink on your system:
>
> /dev/disk/by-id/… …GST02TBG221146
>
> pointing at some random /dev/sdX. Where you put /dev/sdX,
> put /dev/disk/by-id/… …GST02TBG221146 instead. Then you'll
> know you're referring to that disk. Likewise the others.
> (As already suggested by David C, Sun, 14 Jan 2024 04:41:51 -0800)
>
> Cheers,
> David.
>

David Christensen

unread,
Jan 15, 2024, 2:50:07 AM1/15/24
to
On 1/14/24 11:48, gene heskett wrote:
> On 1/14/24 07:42, David Christensen wrote:
>> Re-ordered for clarity -- David.
> And snipped by Gene as I updated
>>
>> On 1/12/24 18:42, gene heskett wrote:
>>> I just found an mbox file in my home directory, containing about 90
>>> days worth of undelivered msgs from smartctl running as root.
>> Do you know how the mbox file got there?
> No, it just appeared.
>>
>>> smartctl says my raid10 is dying, ...
>>
>>
>> Please post a console session with a command that displays the message.
> This is a copy/paste of the second message in that file, the first from
> smartctl, followed by the last message in that file:
>
> From ro...@coyote.coyote.den Wed Nov 02 00:29:05 2022
> Return-path: <ro...@coyote.coyote.den>
> Envelope-to: ro...@coyote.coyote.den
> Delivery-date: Wed, 02 Nov 2022 00:29:05 -0400
> Received: from root by coyote.coyote.den with local (Exim 4.94.2)


It looks like you configured Exim to put root's mailbox in your home
directory, to make it easier to read (?).
> ...
> I also note these are now very old messages, but the file itself is dated
> Jan 7th. And syslog has been rotated several times since.
>
> I'm no expert at interpreting smartctl reports, but I do not see such
> errors in the smartctl output now. Going backwards through the list, the
> 4th drive in the raid has had 3334 errors, as has the third drive with
> 3332 errors; the 1st and 2nd are clean.
>
> One stanza of the error report:
> Error 3328 occurred at disk power-on lifetime: 21027 hours (876 days + 3
> hours)


I believe "3328" is an error number, not the quantity of errors -- the
smartd mail said the count increased from 0 to 1.


> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining
> LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       50%     10917
> 1847474376
> # 2  Extended offline    Completed: read failure       50%     10586
> 1847474376
>
> So half the samsung 870's are on their way out. But nothing recent... So
> I am now trying to get a good rsync copy on another drive.


Before you conclude that two of the Samsung 870 1 TB's are dying, please
run a SMART short test on all four:

# smartctl -t short /dev/disk/by-id/...


Wait a few minutes for the test to complete (10 minutes should be more
than enough).


Then get full SMART reports and save them to files:

# smartctl -x /dev/disk/by-id/... >
YYYYMMDD-HHMM-smartctl-x-MANF-MODEL-SERIAL.out


Then upload the SMART reports someplace we can see them and post the URL's.
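The two commands above can be wrapped in a loop over all four RAID members. A dry-run sketch (the serial numbers are the Samsung 870 serials quoted elsewhere in this thread; the 'echo' prefix prints the commands instead of running them, so remove it and run as root when ready):

```shell
# Dry run: start a short self-test on each RAID member, and show the
# report-capture commands to run once the tests finish.
stamp=$(date +%Y%m%d-%H%M)
for serial in S626NF0R302507V S626NF0R302502E S626NF0R302498T S626NF0R302509W; do
    dev=/dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_$serial
    echo smartctl -t short "$dev"
    # after the short tests finish (a few minutes), capture full reports:
    echo "smartctl -x $dev > $stamp-smartctl-x-Samsung-870EVO-$serial.out"
done
```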


>> * /home is on a RAID 10 with 2 @ mirror of 2 @ 1 TB Samsung 870 SSD?
>
> I think that's what you call a raid10


Okay.


>> * 4 @ 2 TB Gigastone SSD for a new RAID 10?
>
> just installed, not mounted or made into a raid yet. WIP?


Okay.


>> What drives are connected to which ports?
>
> 4 Samsung 870 1T's are on the 1st added controller.
> ATM 5, 2T gigastone's are on the 2nd, 16 port added controller
> smartctl says all 5 of those are fine.


Okay.


>> What is on the other 20 ports?
> On the mobo? A big dvd writer and 2 other half T or 1T samsung drives
> from earlier 860 runs, not currently mounted.
> No spinning rust anyplace
> now. ...
> A current lsblk:
> gene@coyote:~$ lsblk
> NAME        MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINTS
> sda           8:0    0 931.5G  0 disk
> ├─sda1        8:1    0 838.2G  0 part   /
> ├─sda2        8:2    0  46.8G  0 part   [SWAP]
> └─sda3        8:3    0  46.6G  0 part   /tmp
>
> sdb           8:16   1     0B  0 disk   is probably my camera, currently
> plugged in
>
> sdc           8:32   1     0B  0 disk   is probably my brother
> MFP-J6920DW printer, always plugged in


So, your OS disk is a Samsung 1 TB SSD on port /dev/sda.


I do not see the second Samsung SSD (?). I would use dmesg(1) and
grep(1) to figure out what /dev/sdb and /dev/sdc are:

# dmesg | grep -E 'sd[bc]'
Okay. Those are the 4 @ Samsung 870 1 TB SSD's.


It looks like you partitioned them for three RAID10's:

1. 900 GB first partitions for /home RAID10.

2. 30 GB second partitions for swap RAID10.

3. What are the 1.5 GB third partitions for?


> 2nd controller, 16 ports, all 5 2T gigastone's
> sdh           8:112  0   1.9T  0 disk
> └─sdh1        8:113  0   1.9T  0 part
> sdi           8:128  0   1.9T  0 disk
> └─sdi1        8:129  0   1.9T  0 part
> sdj           8:144  0   1.9T  0 disk
> └─sdj1        8:145  0   1.9T  0 part
> sdk           8:160  0   1.9T  0 disk
> └─sdk1        8:161  0   1.9T  0 part
> sdl           8:176  0   1.9T  0 disk
> └─sdl1        8:177  0   1.9T  0 part
> sr0          11:0    1  1024M  0 rom  The internal dvd writer
> gene@coyote:~$


Those are the 5 @ Gigastone 2 TB SSD's, with one big partition on each.


>>  > blkid does not sort them in order either. And of coarse does not list
>>  > whats unmounted, forcing me to ident the drive by gparted in order to
>>  > get its device name. From that I might be able to construct another
>> raid
>>  > from the 8T of 4 2T drives but its confusing as hell when the first of
>>  > those 2T drives is assigned /dev/sde and the next 4 on the new
>>  > controller are /dev/sdi, j, k, & l.


Use /dev/disk/by-id/* paths when referring to drives.


>>  > So it appears I have 5 of those gigastones, and sde is the odd one
> Which when it was /dev/sde1, was plugged into the 1st extra controller
> When the data cable was plugged into a motherboard port, it became
> /dev/sdb1.  So I've relabeled it, and about to test it on the second 16
> port controller. >>
>>
>> I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?
>
> 5,  ordered in 2 separate orders.
>>
>>  > So that one could be formatted ext4 and serve as a backup of the
>> raid10.
> What I am trying to do now, but cannot if it is plugged into a
> motherboard port, hence the repeat of this exercise on the 2nd sata card.
>>
>>  > how do I make an image of that
>>  > raid10  to /dev/sde and get every byte?  That seems like the first
>> step
>>  > to me.
> This I am still trying to do, the first pass copied all 350G of /home
> but went to the wrong drive, and I had mounted the drive by its label.
> It is now /dev/sdh and all labels above it are now wrong. Crazy.
> These SSD's all have an OTP serial number. I am tempted to use that
> serial number as a label _I_ can control.


When I built and ran a Debian 2 @ HDD RAID1 using mdadm(8), I did not
partiton the HDD's -- I gave mdadm(8) the whole drives.


> And according to gparted,
> labels do not survive being incorporated into a raid as the raid is all
> labeled with hostname : partition number. So there really is no way in
> linux to define a drive that is that drive forever. Unreal...


Do what I did -- forget partitions and give the whole SSD's to mdadm(8).
Make sure you zero or secure erase them first.
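A dry-run sketch of that whole-disk approach (DISK1..DISK4 are placeholders, not real device names; substitute your own /dev/disk/by-id paths, and drop the 'echo' prefixes only once you are certain of the targets):

```shell
# Wipe old signatures from each member, then hand the bare devices to
# mdadm.  'echo' makes this a dry run; nothing is written.
members="/dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 \
/dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4"
for m in $members; do
    echo wipefs -a "$m"          # clear leftover filesystem/RAID signatures
done
echo mdadm --create /dev/md3 --level=10 --raid-devices=4 $members
```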


>> Please get a USB 3.x HDD, do a full backup of your entire computer,
>> put it off-site, get another USB 3.x HDD, do another full backup, and
>> keep it nearby
>
> That, using amanda is the end target of this. But I have bought 3 such
> spinning rust drives over the years and not had any survive being hot
> plugged into a usb port more than twice.
>
> With that track record, I'll not waste any more money down that rabbit
> hole.


Okay. I would not mind two big USB 3.x SSD's for backups, but I cannot
justify the expense.


>>  >   But since I can't copy a locked file,
>>
>> What file is locked?  Please post a console session that demonstrates.
>
> A file that is opened but not closed is exclusive to that app and its
> lock, and cannot be copied except by rsync, or so I have been told.


AIUI that depends upon how locks are implemented -- advisory or enforced.


That said, you want to backup files when they are closed. Coordinating
applications, services, and backups such that you obtain correct and
consistent backup files every time is non-trivial. My SOHO network is
easy -- I close all apps, do not use any services, and run my backup script.


> And there are quite a few such open locks on this system right now.


If you installed Debian onto a USB drive (flash or SSD), you could boot
that, mount your disks/ RAID's read-only, and run your backups without
any open or locked files.


> This
> killed my full housed amiga when the boot drive with all its custom
> scripts died, and I found the backups I had were totally devoid of any
> of those scripts.


That is a good reason to validate your backup/ restore processes.


> I still have about 20 QIC tapes from that machine, but
> now no drives to read them. I need to cull the midden heap.


That is a good reason to backup/ archive onto multiple media types.


David

gene heskett

unread,
Jan 15, 2024, 8:50:06 AM1/15/24
to
There is a big horse-fly in that soup !!!
I moved the data cable to where I knew I could find it again, as one of
5 drives attached to the 16 port card, and on reboot it shows up in an
lsblk list as:
root@coyote:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 838.2G 0 part /
├─sda2 8:2 0 46.8G 0 part [SWAP]
└─sda3 8:3 0 46.6G 0 part /tmp
sdb 8:16 1 0B 0 disk
sdc 8:32 1 0B 0 disk
sdh 8:112 0 1.9T 0 disk
└─sdh1 8:113 0 1.9T 0 part <<<
sdi 8:128 0 1.9T 0 disk
└─sdi1 8:129 0 1.9T 0 part <<<
sdj 8:144 0 1.9T 0 disk
└─sdj1 8:145 0 1.9T 0 part <<<
sdk 8:160 0 1.9T 0 disk
└─sdk1 8:161 0 1.9T 0 part <<<
sdl 8:176 0 1.9T 0 disk
└─sdl1 8:177 0 1.9T 0 part <<<
sr0 11:0 1 1024M 0 rom
root@coyote:~#
Now confirmed by looking at all 5 with gparted, there are only 3 unique
serial numbers:
root@coyote:~# ls /dev/disk/by-id
ata-ATAPI_iHAS424_B_3524253_327133504865
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
wwn-0x5002538f413394a5-part1
ata-Gigastone_SSD_GST02TBG221146
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part1
wwn-0x5002538f413394a5-part2
ata-Gigastone_SSD_GST02TBG221146-part1 ===========
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part2
wwn-0x5002538f413394a5-part3
ata-Gigastone_SSD_GSTD02TB230102
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part3 wwn-0x5002538f413394a9
ata-Gigastone_SSD_GSTD02TB230102-part1 ===========
ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
wwn-0x5002538f413394a9-part1
ata-Gigastone_SSD_GSTG02TB230206
ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part1
wwn-0x5002538f413394a9-part2
ata-Gigastone_SSD_GSTG02TB230206-part1 ===========
ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part2
wwn-0x5002538f413394a9-part3
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part3 wwn-0x5002538f413394ae
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part1
md-name-coyote:0
wwn-0x5002538f413394ae-part1
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part2
md-name-coyote:0-part1
wwn-0x5002538f413394ae-part2
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part3
md-name-coyote:2
wwn-0x5002538f413394ae-part3
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
md-name-_none_:1 wwn-0x5002538f413394b0
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part1
md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb
wwn-0x5002538f413394b0-part1
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part2
md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb-part1
wwn-0x5002538f413394b0-part2
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part3
md-uuid-57a88605:27f5a773:5be347c1:7c5e7342
wwn-0x5002538f413394b0-part3
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
md-uuid-bb6e03ce:19d290c8:5171004f:0127a392 wwn-0x5002538f42205e8e
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part1
usb-Brother_MFC-J6920DW_BROG5F229909-0:0
wwn-0x5002538f42205e8e-part1
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part2
usb-USB_Mass_Storage_Device_816820130806-0:0
wwn-0x5002538f42205e8e-part2
ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part3
wwn-0x5002538f413394a5
wwn-0x5002538f42205e8e-part3
root@coyote:~#
only 3 unique serial numbers!!!!!!
udev, when finding that situation, should scream from the rooftops!!
not silently overwrite an entry in by-id that it's already made.
HTH do I fix that?
can tune2fs edit a serial number? no.
can the UUID be rendered read-only? no.
gparted and similar can change the UUID with a click of the mouse.
I'm officially screwed, and I have had them too long to return them now.
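One way to spot this condition, sketched as a one-liner: list every whole disk with the serial the kernel reports, then print only lines whose serial (the second column) appears more than once.

```shell
# Flag disks with duplicate serials, which produce clashing by-id names.
# uniq -f1 skips the first field (the sdX name) when comparing, and -D
# prints every line belonging to a duplicated group.
lsblk -dn -o NAME,SERIAL 2>/dev/null | sort -k2 | uniq -f1 -D
```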

That could even explain why my first run of rsync worked fine but went
to a drive that was NOT mounted. And it was mounted by LABEL=, the only
one of those 5 SSD's I had labeled at that time.

Thanks everybody.

gene heskett

unread,
Jan 15, 2024, 9:50:05 AM1/15/24
to
On 1/15/24 02:45, David Christensen wrote:
> On 1/14/24 11:48, gene heskett wrote:
>> On 1/14/24 07:42, David Christensen wrote:
>>> Re-ordered for clarity -- David.
>> And snipped by Gene as I updated
[...]
which aren't atm; the 2nd seagate 2T drive failure was my amanda vtapes
drive, and bookworm has been such a headache I've not managed to restart
it, since all the helper scripts I've written over the last 20+ years were
either on the main drive or on the vtapes drive, so I'm restarting from
square one.
rsync has been running for about an hour at --bwlimit=5m, so progress
makes a slug look speedy.
gene@coyote:/etc$ df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16327704 0 16327704 0% /dev
tmpfs 3272684 1880 3270804 1% /run
/dev/sda1 863983352 22267116 797754588 3% /
tmpfs 16363420 1244 16362176 1% /dev/shm
tmpfs 5120 8 5112 1% /run/lock
/dev/sda3 47749868 45060 45246800 1% /tmp
/dev/md0p1 1796382580 335012600 1370045004 20% /home
tmpfs 3272684 2544 3270140 1% /run/user/1000
/dev/sdh1 1967892164 22190160 1845665756 2% /mnt/homevol
the last line, 22G out of about 350G total. But it hasn't locked me up,
yet...

>
> That is a good reason to validate your backup/ restore processes.
>
>
>> I still have about 20 QIC tapes from that machine, but now no drives
>> to read them. I need to cull the midden heap.
>
>
> That is a good reason to backup/ archive onto multiple media types.
If you have the facilities.
ATM I have rsync in its 7th attempt to make a copy of the nominally
350G in /home, and it's now working again, very slowly with a
--bwlimit=5m. IOW these Taiwanese Gigastone 2T SSD drives cannot handle
data in big gulps. The copy locked me up and needed the reset button
twice already this morning at 20m and 10m, so now I'm trying 5m. If only
5m allows the copy to complete, these drives will get a damning review
on amazon. uSD cards are faster than that.
>
>
> David
>

Thanks David, take care now.

gene heskett (Jan 15, 2024, 11:10 AM)
On 1/15/24 02:45, David Christensen wrote:
and rsync just locked me up for about the 8th time, requiring the reset
button. And that was at --bwlimit=5m.
Rebooted, ran test=short on the SSD; it looks fine.
Restarted rsync -av --bwlimit=3m, but it's hung on an akonadi .glass
file under .local, two of them totaling almost 80 gigs! WTH? What the
heck is akonadi up to?

And I may have a small clue about my lagging: on boot, akonadi is trying
to open kwallet, but neither my pw nor root's succeeds. Is this my 30+
second lag?

Need a kde5 expert here, and I'm not it; kde threw us all under the bus
with kde5 & plasma. I'm using tbird, which isn't very controllable, but I
spent a day trying to get a connection to my isp's imap mail server and
failed at passwords, and now akonadi is too. Does it have a default pw? IDK.

Thanks David.

Felix Miata (Jan 15, 2024, 1:50 PM)
gene heskett composed on 2024-01-15 08:39 (UTC-0500):

> └─md2 9:2 0 3G 0 raid10
> sdh 8:112 0 1.9T 0 disk
> └─sdh1 8:113 0 1.9T 0 part <<<
> sdi 8:128 0 1.9T 0 disk
> └─sdi1 8:129 0 1.9T 0 part <<<
> sdj 8:144 0 1.9T 0 disk
> └─sdj1 8:145 0 1.9T 0 part <<<
> sdk 8:160 0 1.9T 0 disk
> └─sdk1 8:161 0 1.9T 0 part <<<
> sdl 8:176 0 1.9T 0 disk
> └─sdl1 8:177 0 1.9T 0 part <<<
> sr0 11:0 1 1024M 0 rom

Is there a smart card reader in coyote? They can cause what looks like phantom
drives. If not, I have no idea what those 5 <<< devices might be assigned to,
other than their sizes. :( I suspect it could be some kind of bug, possibly in
SATA expansion card firmware. There was a buggy Bookworm kernel recently causing
I/O problems. What kernel are you running? 6.1.0-17 is current. Bug was sometime
after 6.1.0-13.

I have a card reader that produces sd[cdef]:
# lsscsi
[2:0:0:0] disk ATA ST1000NM0011 SN02 /dev/sda
[3:0:0:0] disk ATA ST1000DM003-1CH1 CC49 /dev/sdb
[4:0:0:0] cd/dvd ASUS DRW-24B1ST j 1.11 /dev/sr0
[6:0:0:0] disk Generic USB SD Reader 1.00 /dev/sdc
[6:0:0:1] disk Generic USB CF Reader 1.01 /dev/sdd
[6:0:0:2] disk Generic USB SM Reader 1.02 /dev/sde
[6:0:0:3] disk Generic USB MS Reader 1.03 /dev/sdf
#
So, when I insert a generic USB stick, it gets /dev/sdg. :p
--
Evolution as taught in public schools is, like religion,
based on faith, not based on science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata

David Wright (Jan 15, 2024, 3:00 PM)
I too can look up acronyms with ease. I asked about "OTP serial number",
not "OTP" serial number.

> I moved the data cable to where I knew I could find it again, as one
> of 5 drives attached to the 16 port card,

No idea what that means.

> and on reboot it shows up in
> an lsblk list as:
> root@coyote:~# lsblk

[ … table showing five Samsungs, five Gigastones, and two
other items, perhaps printer (confirmed later) and camera … ]

> Now confirmed by looking at all 5 with gparted, there are only 3
> unique serial numbers:

I don't see any parted output.

> root@coyote:~# ls /dev/disk/by-id

ls -1 would at least sort out this mess, but more useful would be ls -l
or for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done
as you could then see what the symlinks point to, which after all
is their raison d'être.

Extracting:

> ata-Gigastone_SSD_GST02TBG221146
> ata-Gigastone_SSD_GSTD02TB230102
> ata-Gigastone_SSD_GSTG02TB230206

these devices appear to have normal serial numbers. Do they bear
any other indication, like engravings or stickers? If not, I would,
in turn, plug each one in, read the serial number from its symlink,
and write on it with a marker. While doing that, you could also
run smartctl.

> only 3 unique serial numbers!!!!!!
> udev when finding that situation should scream from the rooftops!!
> not silently overwrite an entry in the by-id that its already made.

If you say so. I never had any problem with udev when I had
three identical USB sticks with the "serial" number
ID_SERIAL=SMI_USB_DISK-0:0
I scratched a distinguishing letter on them, and watched xconsole for
the drive letter whenever I plugged one in. At 8GB apiece, the
largest I had at the time, they were very useful. And they didn't
cost me a penny, as they were giveaways.

> How TH do I fix that?
> Can tune2fs edit a serial number? No.
> Can the UUID be rendered read-only? No.
> gparted and similar can change the UUID with a click of the mouse.
> I'm officially screwed, and I have had them too long to return them now.

You need to find out, in turn, which ones work, and that goes for
both the SSDs and wherever you plug them in. You can make little
progress at all without distinguishing them, as you have already
shown by copying data to who knows where.

> That could even explain why my first run of rsync worked fine but went
> to a drive that was NOT mounted. And it was mounted by LABEL=, the
> only one of those 5 SSDs I had labeled at that time.

You haven't shown any evidence of such LABELling, and most of your
anecdotal narratives don't give much confidence for us to really
know what was actually done. But to be fair, anything could
happen if the hardware is not working properly.

Cheers,
David.

David Christensen (Jan 15, 2024, 5:00 PM)
On 1/15/24 06:45, gene heskett wrote:
> On 1/15/24 02:45, David Christensen wrote:
>> On 1/14/24 11:48, gene heskett wrote:
>>> On 1/14/24 07:42, David Christensen wrote:
> ... the 2nd Seagate 2T drive failure was my amanda vtapes
> drive, and bookworm has been such a headache I haven't managed to restart it,
> since all the helper scripts I've written over the last 20+ years were
> either on the main drive or on the vtapes drive, so I'm restarting from
> square one.


That is a good reason to use a version control system. Once you have
one, you will wonder how you ever got by without it.


> ATM I have rsync in its 7th attempt to make a copy of the nominally
> 350G in /home, and it's now working again, very slowly with
> --bwlimit=5m.


350 G / 5 M/s * 1000 M/G = 70000 s, or about 19 hours.
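
That estimate checks out with shell arithmetic (a sketch, taking 1 G as 1000 M for round numbers):

```shell
# 350 G at 5 M/s: total seconds, then whole hours
size_g=350
rate_m=5
secs=$(( size_g * 1000 / rate_m ))
hours=$(( secs / 3600 ))
echo "$secs s, about $hours hours"   # 70000 s, about 19 hours
```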


> IOW these Taiwanese Gigastone 2T SSD drives cannot handle
> data in big gulps. The copy locked me up and needed the reset button
> twice already this morning, at 20m and 10m, so now I'm trying 5m. These
> drives will get a damning review on Amazon if only 5m allows the copy
> to work. Micro-SD cards are faster than that.


I think your computer has numerous issues, including storage. Unless
and until you benchmark the Gigastone SSD's in a stripped-down machine
with a reference OS and tool set, I would not blame the Gigastone SSD's.


David

gene heskett (Jan 15, 2024, 5:00 PM)
On 1/15/24 13:44, Felix Miata wrote:
> gene heskett composed on 2024-01-15 08:39 (UTC-0500):
>
>> └─md2 9:2 0 3G 0 raid10
>> sdh 8:112 0 1.9T 0 disk
>> └─sdh1 8:113 0 1.9T 0 part <<< the one I'm fooling with
I have one plugged into USB, but nothing is plugged into it ATM, an ONN
from wallies. No problems here, but it can mess with armbian booting on a
bananapi-m5. Got a bunch of those. Nothing on this box looks
like the last 4 above. I look at dmesg to see what it is before I mount it.

Thank you Felix.

David Christensen (Jan 15, 2024, 6:10 PM)
On 1/15/24 08:03, gene heskett wrote:
> On 1/15/24 02:45, David Christensen wrote:
> and rsync just locked me up for about the 8th time, requiring the reset
> button. And that was at --bwlimit=5m.
> Rebooted, ran test=short on the SSD; it looks fine.
> Restarted rsync -av --bwlimit=3m, but it's hung on an akonadi .glass
> file under .local, two of them totaling almost 80 gigs! WTH? What the
> heck is akonadi up to?
>
> And I may have a small clue about my lagging: on boot, akonadi is trying
> to open kwallet, but neither my pw nor root's succeeds. Is this my 30+
> second lag?
>
> Need a kde5 expert here, and I'm not it; kde threw us all under the bus
> with kde5 & plasma. I'm using tbird, which isn't very controllable, but I
> spent a day trying to get a connection to my isp's imap mail server and
> failed at passwords, and now akonadi is too. Does it have a default pw? IDK.
>
> Thanks David.
>
> Cheers, Gene Heskett.


I think your computer is overloaded with stuff and cruft, and you are
trapped in an infinite loop of bugs.


As I have mentioned before, I have installed Debian amd64 BIOS/MBR onto
one USB device and installed Debian amd64 UEFI/GPT onto another. These
are very useful tools. I suggest you build whichever corresponds to
your computer, and use it to help with troubleshooting, backups, etc.


David

David Christensen (Jan 15, 2024, 6:20 PM)
On 1/15/24 14:56, gene heskett wrote:
> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> "$(realpath "$j")" "$j" ; done
> /dev/sr0        /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> /dev/sdi        /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> /dev/sdj1       /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
> /dev/sdh        /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
> /dev/sdh1       /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1
> /dev/sdk        /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
> /dev/sdk1       /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1
> /dev/sdf        /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T

> ... 2 pairs with identical "serial numbers", ...


Are you certain that it is not two drives that fail to connect at boot?
You previously posted smartctl reports indicating a bad SATA connection.


If you disconnect everything except one Gigastone SSD, connect it to a
known good motherboard SATA port using a known good SATA cable, connect
it to a known good PSU power cable, boot live media into a rescue shell,
examine the Gigastone, write down the serial number, shutdown, and
repeat for the four other Gigastone drives, can you confirm the
duplicate serial numbers?


David

gene heskett (Jan 15, 2024, 6:40 PM)
Ah, but I finally glommed onto the bug: watching the tan memory bar in
htop as it was running, someplace in the data chain there is a huge
memory leak, and my crash is caused by the OOM daemon killing things. It
only occurs when I run rsync. It only takes 10 minutes to eat 32G of
memory, then 500k into swap, and the OOM daemon starts killing the
system until there's nothing left to run.

>
> David

gene heskett (Jan 15, 2024, 6:50 PM)
On 1/15/24 17:58, gene heskett wrote:
> cuz it doesn't want to be copy/pasted.
>>
>>> root@coyote:~# ls /dev/disk/by-id
>>
>> ls -1   would at least sort out this mess, but more useful would be ls -l
>> or  for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath
>> "$j")" "$j" ; done
>> as you could then see what the symlinks point to, which after all
>> is their raison d'être.
> Thanks for that composition: but it will be word wrapped:
> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> "$(realpath "$j")" "$j" ; done
> /dev/sr0        /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> /dev/sdi        /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> /dev/sdj1       /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
> /dev/sdh        /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
> /dev/sdh1       /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1
> /dev/sdk        /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
> /dev/sdk1       /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1
> /dev/sdf        /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
> /dev/sdf1 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part1
> /dev/sdf2 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part2
> /dev/sdf3 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part3
> /dev/sde        /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
> /dev/sde1 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part1
> /dev/sde2 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part2
> /dev/sde3 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part3
> /dev/sdd        /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
> /dev/sdd1 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part1
> /dev/sdd2 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part2
> /dev/sdd3 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part3
> /dev/sdg        /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
> /dev/sdg1 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part1
> /dev/sdg2 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part2
> /dev/sdg3 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part3
> /dev/sda        /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
> /dev/sda1 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part1
> /dev/sda2 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part2
> /dev/sda3 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part3
> /dev/md0        /dev/disk/by-id/md-name-coyote:0
> /dev/md0p1      /dev/disk/by-id/md-name-coyote:0-part1
> /dev/md2        /dev/disk/by-id/md-name-coyote:2
> /dev/md1        /dev/disk/by-id/md-name-_none_:1
> /dev/md0        /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb
> /dev/md0p1
> /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb-part1
> /dev/md1        /dev/disk/by-id/md-uuid-57a88605:27f5a773:5be347c1:7c5e7342
> /dev/md2        /dev/disk/by-id/md-uuid-bb6e03ce:19d290c8:5171004f:0127a392
> /dev/sdc        /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0
> /dev/sdb
> /dev/disk/by-id/usb-USB_Mass_Storage_Device_816820130806-0:0
> /dev/sdf        /dev/disk/by-id/wwn-0x5002538f413394a5
> /dev/sdf1       /dev/disk/by-id/wwn-0x5002538f413394a5-part1
> /dev/sdf2       /dev/disk/by-id/wwn-0x5002538f413394a5-part2
> /dev/sdf3       /dev/disk/by-id/wwn-0x5002538f413394a5-part3
> /dev/sde        /dev/disk/by-id/wwn-0x5002538f413394a9
> /dev/sde1       /dev/disk/by-id/wwn-0x5002538f413394a9-part1
> /dev/sde2       /dev/disk/by-id/wwn-0x5002538f413394a9-part2
> /dev/sde3       /dev/disk/by-id/wwn-0x5002538f413394a9-part3
> /dev/sdd        /dev/disk/by-id/wwn-0x5002538f413394ae
> /dev/sdd1       /dev/disk/by-id/wwn-0x5002538f413394ae-part1
> /dev/sdd2       /dev/disk/by-id/wwn-0x5002538f413394ae-part2
> /dev/sdd3       /dev/disk/by-id/wwn-0x5002538f413394ae-part3
> /dev/sdg        /dev/disk/by-id/wwn-0x5002538f413394b0
> /dev/sdg1       /dev/disk/by-id/wwn-0x5002538f413394b0-part1
> /dev/sdg2       /dev/disk/by-id/wwn-0x5002538f413394b0-part2
> /dev/sdg3       /dev/disk/by-id/wwn-0x5002538f413394b0-part3
> /dev/sda        /dev/disk/by-id/wwn-0x5002538f42205e8e
> /dev/sda1       /dev/disk/by-id/wwn-0x5002538f42205e8e-part1
> /dev/sda2       /dev/disk/by-id/wwn-0x5002538f42205e8e-part2
> /dev/sda3       /dev/disk/by-id/wwn-0x5002538f42205e8e-part3
> root@coyote:~#
> but like I wrote, 2 pairs with identical "serial numbers", so the
> assumption is that the last one overwrites the first one in udev's
> by-id directory, when IMO it should be yelling about the duplicates.
> Which occurred before I knew about the dups. Now you seem intent on
> calling me a liar, which I do not intentionally do. I have now LABELed
> the drives, but the only way I can prove it to you, it seems, would be to
> take screen snapshots x5 of them, and one of those would be too big for
> the server.
>
> Now, I have spent since around 08:30 my time this morning trying to
> back up this raid, with around 350G of data on it, to one of those 2T
> Gigastones, but have lost the whole system because (apparently) rsync
> has a huge memory leak. The system is fine until I start, as root,
>
> "rsync -av --bwlimit=3m --fsync /home/ /mnt/homevol"
>
> which is where /dev/sdh1 is mounted. Watch the system with htop: the
> memory bar goes to full scale in tan in 10 to 15 minutes, and 500
> kilobytes into swap in another 5. If OOM hasn't killed htop by then, the
> OOM daemon starts killing things, and is not the least bit fussy about what.
>
> I have tried it without the --bwlimit and --fsync, but that just bombs
> the system faster. I thought I was overpowering the write speed of the
> drive, but slowing it to garden-slug speeds still eats memory and the
> system gets killed. 38G of the 350G is as far as I've managed to get
> copied from 08:30 to 17:35. If that command line above is correct, then
> AFAIAC rsync is busted. The memory gauge bar is in 3 colors: green at
> the left is, I presume, allocated memory, next is red (cache maybe) and
> then tan or orange. The man page does not define which color signifies
> which, but run rsync and it runs to the right end of the bar in orange/tan,
> then into swap, and the system slowly dies.
>
> I'm not a great user of rsync, use mc more often than not. I have just
> seen it pick up from where the reset or power switch was used to reboot,
> because control of the system had been killed. So the command line
> above might be the problem. You tell me, please.
>
>>> That could even explain why my first run of rsync worked fine but went
>>> to a drive that was NOT mounted.  And it was mounted by LABEL= The
>>> only one of those 5 SSD's I had labeled at that time.
>>
>> You haven't shown any evidence of such LABELling, and most of your
>> anecdotal narratives don't give much confidence for us to really
>> know what was actually done. But to be fair, anything could
>> happen if the hardware is not working properly.
> I give up; let's see if a gparted screenshot comes through. There was one
> attached when I hit send.
>
>> Cheers,
>> David.
>>

Whoopie-ding, it worked. Call me a liar again.
>> .
>
> Cheers, Gene Heskett.

David Christensen (Jan 15, 2024, 6:50 PM)
On 1/15/24 15:37, gene heskett wrote:
> On 1/15/24 16:51, David Christensen wrote:
>> I think your computer has numerous issues, including storage.  Unless
>> and until you benchmark the Gigastone SSD's in a stripped-down machine
>> with a reference OS and tool set, I would not blame the Gigastone SSD's.
>
> Ah, but I finally glommed onto the bug: watching the tan memory bar in
> htop as it was running, someplace in the data chain there is a huge
> memory leak, and my crash is caused by the OOM daemon killing things. It
> only occurs when I run rsync. It only takes 10 minutes to eat 32G of
> memory, then 500k into swap, and the OOM daemon starts killing the
> system until there's nothing left to run.


That computer has issues. Please build a USB Debian drive and use it to
copy out everything to two USB backup drives.


David

gene heskett (Jan 15, 2024, 7:00 PM)
The serial number that shows in the pix I just posted is everything
right of the SSD_ above, up to the "part1". If there is a different one
someplace, tell me how to extract it. In the 6 entries above there are
only 3 unique numbers. If gparted is showing me a pack of lies, show me
how to prove gparted is lying.
>
> David

David Christensen (Jan 15, 2024, 7:10 PM)
I have no confidence in the Debian instance on your computer. I think
your first priority should be to back up your data, using Debian
installer media, Debian live, a Debian USB drive you make yourself, or
some other known good live media.


David

gene heskett (Jan 15, 2024, 7:10 PM)
On 1/15/24 18:41, gene heskett wrote:
> On 1/15/24 17:58, gene heskett wrote:
>> On 1/15/24 14:55, David Wright wrote:
>>> On Mon 15 Jan 2024 at 08:39:37 (-0500), gene heskett wrote:
>>>> On 1/14/24 20:19, gene heskett wrote:
>>>>> On 1/14/24 19:48, David Wright wrote:
>>>>>> On Sun 14 Jan 2024 at 14:48:49 (-0500), gene heskett wrote:
>>>>>>> On 1/14/24 07:42, David Christensen wrote:
>>>>>>
>>>>>>>> I am confused -- do you have 4 or 5 Gigastone 2 TB SSD?
>>>>>>>
>>>>>>> 5,  ordered in 2 separate orders.
>>>>>>>>
>>>>>>>>    > So that one could be formatted ext4 and serve as a
>>>>>>>> backup of the raid10.
>>>>>>> What I am trying to do now, but cannot if it is plugged into a
>>>>>>> motherboard port, hence the repeat of thnis exercise on the 2nd sata
There is a sticker on the bottom containing the numbers you see above,
and a (UPC?) bar code I don't have a reader for.

David Christensen (Jan 15, 2024, 7:20 PM)
On 1/15/24 16:03, gene heskett wrote:
> On 1/15/24 18:41, gene heskett wrote:
>> On 1/15/24 17:58, gene heskett wrote:
>>> On 1/15/24 14:55, David Wright wrote:
>>>> On Mon 15 Jan 2024 at 08:39:37 (-0500), gene heskett wrote:
>>>>> ata-Gigastone_SSD_GST02TBG221146
>>>>> ata-Gigastone_SSD_GSTD02TB230102
>>>>> ata-Gigastone_SSD_GSTG02TB230206
>>>>
>>>> these devices appear to have normal serial numbers. Do they bear
>>>> any other indication, like engravings or stickers? If not, I would,
>>>> in turn, plug each one in, read the serial number from its symlink,
>>>> and write on it with a marker. While doing that, you could also
>>>> run smartctl.
>>>>
> There is a sticker on the bottom containing the numbers you see above,
> and a (upc?) bar code I don't have a reader for.


So, two stickers have one number, two stickers have another number, and
one sticker has a third number? Or, three stickers have one number, one
sticker has another number, and the last sticker has a third number?


David

gene heskett (Jan 15, 2024, 8:40 PM)
5 SSDs,
3 unique numbers on those 5 stickers, the same numbers you can see
above: 2 drives with the same sticker, 2 more drives that have identical
stickers, and one with a different sticker. I am inclined to think the
numbers are based on production batches, and not unique, as there may be
500 in each batch.
>
> David

David Christensen (Jan 15, 2024, 9:20 PM)
Duplicate serial numbers are going to cause confusion.


If any of the drives with duplicate numbers are eligible for return, I
would return them. If not, perhaps you could resell them to somebody.


If you are going to keep them, I seem to recall that all five drives
were partitioned with GPT and had one large partition (?). You could
invent a unique identifier for each drive, put a physical label on each
drive, and assign the same identifier to each GPT partition label.
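
That could be sketched as below. The device name /dev/sdX and the identifier gigastone-A are invented placeholders, so the commands are echoed rather than executed; sgdisk sets the GPT partition name (PARTLABEL), e2label the ext4 filesystem label:

```shell
#!/bin/sh
# Placeholders: substitute the real device and whatever identifier you
# write on the drive's physical label.  Echoed, not run.
disk=/dev/sdX
tag=gigastone-A
cmd_partlabel="sgdisk --change-name=1:$tag $disk"   # GPT partition name
cmd_fslabel="e2label ${disk}1 $tag"                 # ext4 filesystem label
echo "$cmd_partlabel"
echo "$cmd_fslabel"
```

After that, the drive shows up under /dev/disk/by-partlabel/ regardless of what serial number the firmware reports.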


Alternatively, UUID's and/or PARTUUID's should be unique for both MBR
and GPT:

# ls -l /dev/disk/by-uuid/

# ls -l /dev/disk/by-partuuid/


David

David Wright (Jan 15, 2024, 10:00 PM)
Ouch. Well that leaves you with several choices, like exchanging two
of them, or moving them to different machines, or using them for
backing up but not at the same time as their twin. That's if your
use case relies on their serial numbers.

If you're using them in a more conventional manner, where UUIDs,
LABELs, PARTUUIDs and PARTLABELs are stable, and serial numbers
are ignored, then you should have no problems. Just start by
inserting them separately for partitioning and filesystem creation
with unique strings.
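
For example, a sketch with placeholder names (echoed, not executed): give the filesystem a unique LABEL at creation time, then mount by that LABEL so the firmware serial number never matters:

```shell
#!/bin/sh
# Placeholders: /dev/sdX and HOMEVOL-A are invented; use one unique
# label string per drive.  Commands are echoed, not run.
disk=/dev/sdX
label=HOMEVOL-A
mkfs_cmd="mkfs.ext4 -L $label ${disk}1"
fstab_line="LABEL=$label /mnt/homevol ext4 defaults 0 2"
echo "$mkfs_cmd"
echo "$fstab_line"
```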

But obviously step one is labelling them (unless you're exchanging
two of them in nearly-new condition).

BTW, I wrote:

> You haven't shown any evidence of such LABELling, and most of your
> anecdotal narratives don't give much confidence for us to really
> know what was actually done. But to be fair, anything could
> happen if the hardware is not working properly.

Nothing at all there about lying.

Cheers,
David.

Felix Miata (Jan 16, 2024, 1:00 AM)
gene heskett composed on 2024-01-15 17:56 (UTC-0500):

> Thanks for that composition: but it will be word wrapped:
> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> "$(realpath "$j")" "$j" ; done
I straightened out the wrapping mess, and gave each entry a line number. I see
nothing I recognize as representing serial number duplication among /dev/sdX
(physical device) names:

/dev/md0 1 /dev/disk/by-id/md-name-coyote:0
/dev/md0 2 /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb
/dev/md0p1 3 /dev/disk/by-id/md-name-coyote:0-part1
/dev/md0p1 4 /dev/disk/by-id/md-uuid-3d5a3621:c0e32c8a:e3f7ebb3:318edbfb-part1
/dev/md1 5 /dev/disk/by-id/md-name-_none_:1
/dev/md1 6 /dev/disk/by-id/md-uuid-57a88605:27f5a773:5be347c1:7c5e7342
/dev/md2 7 /dev/disk/by-id/md-name-coyote:2
/dev/md2 8 /dev/disk/by-id/md-uuid-bb6e03ce:19d290c8:5171004f:0127a392
/dev/sda 9 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
/dev/sda 10 /dev/disk/by-id/wwn-0x5002538f42205e8e
/dev/sda1 11 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part1
/dev/sda1 12 /dev/disk/by-id/wwn-0x5002538f42205e8e-part1
/dev/sda2 13 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part2
/dev/sda2 14 /dev/disk/by-id/wwn-0x5002538f42205e8e-part2
/dev/sda3 15 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V-part3
/dev/sda3 16 /dev/disk/by-id/wwn-0x5002538f42205e8e-part3
/dev/sdb 17 /dev/disk/by-id/usb-USB_Mass_Storage_Device_816820130806-0:0
/dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 # How does a printer get a storage device assignment???
/dev/sdd 19 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
/dev/sdd 20 /dev/disk/by-id/wwn-0x5002538f413394ae
/dev/sdd1 21 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part1
/dev/sdd1 22 /dev/disk/by-id/wwn-0x5002538f413394ae-part1
/dev/sdd2 23 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part2
/dev/sdd2 24 /dev/disk/by-id/wwn-0x5002538f413394ae-part2
/dev/sdd3 25 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V-part3
/dev/sdd3 26 /dev/disk/by-id/wwn-0x5002538f413394ae-part3
/dev/sde 27 /dev/disk/by-id/wwn-0x5002538f413394a9
/dev/sde 28 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
/dev/sde1 29 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part1
/dev/sde1 30 /dev/disk/by-id/wwn-0x5002538f413394a9-part1
/dev/sde2 31 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part2
/dev/sde2 32 /dev/disk/by-id/wwn-0x5002538f413394a9-part2
/dev/sde3 33 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E-part3
/dev/sde3 34 /dev/disk/by-id/wwn-0x5002538f413394a9-part3
/dev/sdf 35 /dev/disk/by-id/wwn-0x5002538f413394a5
/dev/sdf 36 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
/dev/sdf1 37 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part1
/dev/sdf1 38 /dev/disk/by-id/wwn-0x5002538f413394a5-part1
/dev/sdf2 39 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part2
/dev/sdf2 40 /dev/disk/by-id/wwn-0x5002538f413394a5-part2
/dev/sdf3 41 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T-part3
/dev/sdf3 42 /dev/disk/by-id/wwn-0x5002538f413394a5-part3
/dev/sdg 43 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
/dev/sdg 44 /dev/disk/by-id/wwn-0x5002538f413394b0
/dev/sdg1 45 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part1
/dev/sdg1 46 /dev/disk/by-id/wwn-0x5002538f413394b0-part1
/dev/sdg2 47 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part2
/dev/sdg2 48 /dev/disk/by-id/wwn-0x5002538f413394b0-part2
/dev/sdg3 49 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W-part3
/dev/sdg3 50 /dev/disk/by-id/wwn-0x5002538f413394b0-part3
/dev/sdh 51 /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
/dev/sdh1 52 /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1
/dev/sdi 53 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
/dev/sdj1 54 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
/dev/sdk 55 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
/dev/sdk1 56 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1
/dev/sr0 57 /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865

Exactly which line numbers represent duplication among the physical drives?

Felix Miata (Jan 16, 2024, 1:10 AM)
gene heskett composed on 2024-01-15 18:37 (UTC-0500):

> Ah, but I finally glommed onto the bug: watching the tan memory bar in
> htop as it was running, someplace in the data chain there is a huge
> memory leak, and my crash is caused by the OOM daemon killing things. It
> only occurs when I run rsync. It only takes 10 minutes to eat 32G of
> memory, then 500k into swap, and the OOM daemon starts killing the
> system until there's nothing left to run.

What does free report before starting rsync? Do you have all your swap on a
partition? Do you have any swapspace?

I would log out of XFCE, log in on a vtty to open top, then log in on another to try
to run rsync. If that fails OOM too, since the target is ostensibly starting from
scratch, use MC, and divide the job into the source's directories if necessary. MC
gets rather bogged down if you try to do a bazillion individual files in a single
copy operation.
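
Dividing the job per top-level directory can be sketched as follows (paths as in Gene's command; the rsync invocations are echoed so nothing is copied by accident; drop the echo to run for real):

```shell
#!/bin/sh
# One rsync per top-level directory of /home, so no single invocation
# ever builds a transfer list for the whole 350G.
src=/home
dst=/mnt/homevol
for d in "$src"/*/; do
    [ -d "$d" ] || continue
    echo rsync -a --bwlimit=5m "$d" "$dst/$(basename "$d")/"
done
```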

Felix Miata (Jan 16, 2024, 1:20 AM)
Felix Miata composed on 2024-01-16 01:05 (UTC-0500):

> gene heskett composed on 2024-01-15 18:37 (UTC-0500):

>> Ah, but I finally glommed onto the bug: watching the tan memory bar in
>> htop as it was running, someplace in the data chain there is a huge
>> memory leak, and my crash is caused by the OOM daemon killing things. It
>> only occurs when I run rsync. It only takes 10 minutes to eat 32G of
>> memory, then 500k into swap, and the OOM daemon starts killing the
>> system until there's nothing left to run.

> What does free report before starting rsync? Do you have all your swap on a
> partition? Do you have any swapspace?

> I would log out of XFCE, login on a vtty to open top, then login on another to try
> to run rsync. If that fails OOM too, since the target is ostensibly starting from
> scratch, use MC, and divide the job into the source's directories if necessary. MC
> gets rather bogged down if you try to do a bazillion individual files in a single
> copy operation.

Trying to think outside the box, something else to think about, from the man page:
[quote]
--archive, -a
This is equivalent to -rlptgoD. It is a quick way of saying you want recursion
and want to preserve almost everything. Be aware that it does not include
preserving ACLs (-A), xattrs (-X), atimes (-U), crtimes (-N), nor the finding and
preserving of hardlinks (-H).
[/quote]

If rsync really is bugged, maybe a change of options would avoid the bug. Try
-rlptgoDAXUNH instead of -av. Could it be that verbosity is the OOM crippler, and
not necessarily from rsync itself, but possibly from the xterm in which rsync is
running? Does your source contain any hard links? Do you use ACLs or xattrs?
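
A hedged variant of that suggestion, with -v dropped in favour of a log file, in case the terminal scrollback is part of the memory growth. The -N and -U flags are omitted here because they need an rsync built with crtime/atime support, and the log path is a placeholder; the command is echoed only:

```shell
# Felix's flag expansion minus -v and the build-dependent -N/-U,
# logging to a file instead of the terminal.  Echoed, not run.
cmd="rsync -rlptgoDAXH --log-file=/root/rsync.log /home/ /mnt/homevol"
echo "$cmd"
```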

Tom Furie (Jan 16, 2024, 3:20 AM)
Felix Miata <mrm...@earthlink.net> writes:

> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
> How does a printer get a storage device assignment???

By having some kind of SD card slot or similar.

Felix Miata (Jan 16, 2024, 6:10 AM)
Tom Furie composed on 2024-01-16 08:18 (UTC):

> Felix Miata writes:

>> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
>> How does a printer get a storage device assignment???

> By having some kind of SD card slot or similar.

So this pollution only results from a USB-connected printer? IP printer
connections don't cause it too?

Valerio Vanni (Jan 16, 2024, 7:40 AM)
On 16/01/2024 12:08, Felix Miata wrote:
> Tom Furie composed on 2024-01-16 08:18 (UTC):
>
>> Felix Miata writes:
>
>>> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
>>> How does a printer get a storage device assignment???
>
>> By having some kind of SD card slot or similar.
>
> So this pollution only results from a USB-connected printer? IP printer
> connections don't cause it too?
Yes, IP printers don't.

David Wright (Jan 16, 2024, 9:10 AM)
On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:
> gene heskett composed on 2024-01-15 17:56 (UTC-0500):
>
> > Thanks for that composition: but it will be word wrapped:
> > root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> > "$(realpath "$j")" "$j" ; done
> > /dev/sr0 /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> > /dev/sdi /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> > /dev/sdj1 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1

It's right here at the top.

> /dev/sdi 53 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> /dev/sdj1 54 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
  ↑↑↑↑↑↑↑↑↑ that is /really/ bad!

[ … ]

I'd /love/ my HP8500 scanner (that's all that works now) to have
the USB stick (which I scan onto) be visible to a connected PC
or the network. Is that what /dev/sdc is?

> /dev/sdk 55 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
> /dev/sdk1 56 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1
> /dev/sr0 57 /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
>
> Exactly which line numbers represent duplication among the physical drives?

Cheers,
David.

Felix Miata

unread,
Jan 16, 2024, 9:40:07 AM1/16/24
to
David Wright composed on 2024-01-16 08:05 (UTC-0600):

> On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:

>> gene heskett composed on 2024-01-15 17:56 (UTC-0500):

>>> Thanks for that composition: but it will be word wrapped:
>>> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
>>> "$(realpath "$j")" "$j" ; done
>>> /dev/sr0 /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
>>> /dev/sdi /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
>>> /dev/sdj1 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1

> It's right here at the top.

I missed that, probably because i & j look similar in the big sea of
alphanumerics, /and/ sdi has no partitions, while sdj1 has no parent disk. That
seems to smell as much like a bug somewhere as two different disks with the same
serial number, a cheap SATA port card maybe. Does ...1146 get duplication like
that when connected to any/every available SATA port?

Greg Wooledge

unread,
Jan 16, 2024, 9:50:05 AM1/16/24
to
On Tue, Jan 16, 2024 at 09:31:54AM -0500, Felix Miata wrote:
> David Wright composed on 2024-01-16 08:05 (UTC-0600):
>
> > On Tue 16 Jan 2024 at 00:55:52 (-0500), Felix Miata wrote:
>
> >> gene heskett composed on 2024-01-15 17:56 (UTC-0500):
>
> >>> Thanks for that composition: but it will be word wrapped:
> >>> root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> >>> "$(realpath "$j")" "$j" ; done
> >>> /dev/sr0 /dev/disk/by-id/ata-ATAPI_iHAS424_B_3524253_327133504865
> >>> /dev/sdi /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> >>> /dev/sdj1 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
>
> > It's right here at the top.
>
> I missed that, probably because i & j look similar in the big sea of
> alphanumerics, /and/ sdi has no partitions, while sdj1 has no parent disk. That
> seems to smell as much like a bug somewhere as two different disks with the same
> serial number, a cheap SATA port card maybe. Does ...1146 get duplication like
> that when connected to any/every available SATA port?

I missed it too. It actually looks like someone copy/pasted the
pathnames on the right, but then manually typed the device names on
the left, and made a typo here. Or, somehow, the device names and
the pathnames got mixed together, and someone tried to separate them
manually, and got these two crossed.

Max Nikulin

unread,
Jan 16, 2024, 10:00:06 AM1/16/24
to
On 16/01/2024 15:18, Tom Furie wrote:
>> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
>> How does a printer get a storage device assignment???
>
> By having some kind of SD card slot or similar.

I have heard that some devices expose a USB mass storage interface out
of the box to autorun an installer when the device is plugged. Finally
the installer switches the device to its normal mode. On Linux
usb-modeswitch might be required.

Thomas Schmitt

unread,
Jan 16, 2024, 11:10:06 AM1/16/24
to
Hi,

i, too, wondered where there should be a duplicate serial number.
But indeed:

David Wright wrote:
> > /dev/sdi 53 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
> > /dev/sdj1 54 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1
> ↑↑↑↑↑↑↑↑↑ that is /really/ bad!

Does the number of 4 device files /dev/sd[h-k] match the number of
installed ata-Gigastone_SSD devices ? Gene talked of
"5, ordered in 2 separate orders".
(Looking at https://lists.debian.org/debian-user/2024/01/msg00667.html)
Now we see 3 to 4, depending on what one wants to believe.

Wild ideas:
One possible reason could be that a device is mapped to both, /dev/sdi
and /dev/sdj. udev would then suffer a race condition when creating the
/dev/disk/by-id.
Another could be that udev's assessment of the drives derails and that
serial number information spilled from the assessment of /dev/sdi to
the assessment of /dev/sdj*.

It would be interesting to see the output of

ls -l /dev/sd[ij]*

in order to learn about the existence of /dev/sdj and the device
numbers of sdi* and sdj*.

Further one should inquire the serial numbers by

lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijk]


Have a nice day :)

Thomas

Andy Smith

unread,
Jan 16, 2024, 11:40:07 AM1/16/24
to
Hello,

On Tue, Jan 16, 2024 at 01:17:42AM -0500, Felix Miata wrote:
> If rsync really is bugged, maybe a change of options would avoid the bug. Try
> instead of -av, -rlptgoDAXUNH. Could it be that verbosity is the OOM crippler, and
> not necessarily from rsync itself, but possibly from the xterm in which rsync is
> running? Does your source contain any hard links? Do you use ACLs or xattrs?

I'm totally burned out on trying to get info out of Gene, but my
experience with rsync is that use of some options can massively
increase memory usage.

The options covered by -a don't tend to do it (and I doubt -v does
anything), but things like --delay-updates, --delete--before,
--delete-after and --prune-empty-dirs do. This is because rsync
normally incrementally finds files to transfer so it only keeps a
certain number of entries in memory and can sync any number of files
without blowing up RAM, but those options disable that strategy.

Even so, rsync only needs about 100 bytes of RAM per file that is
checked on source, and the size of the files doesn't matter.
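That 100-bytes-per-file figure makes the file-list size easy to estimate. A back-of-envelope check (the 3 million file count is an invented example, not Gene's actual tree):

```shell
# ~100 bytes of file-list memory per source file, expressed in MiB
files=3000000
echo "$(( files * 100 / 1024 / 1024 )) MiB"
```

For 3 million files that comes to roughly 286 MiB, nowhere near 32 GB.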

In desperate circumstances, a file tree can be rsynced in multiple
segments, e.g. one rsync for each subdir or whatever other split
makes sense.
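A minimal sketch of that split, assuming the /home to /mnt/homevol copy discussed in this thread (paths are illustrative):

```shell
# One rsync per top-level directory under the source, so each run's
# file list stays small.  Trailing slashes matter to rsync: "$d" ends
# in a slash, so its *contents* land inside the matching target dir.
for d in /home/*/; do
    rsync -a "$d" "/mnt/homevol/$(basename "$d")/"
done
```

Dotfiles directly under /home would still need one extra pass, since the glob skips them.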

Maybe also ulimit can be used to set an artificially low value on
the memory that rsync is allowed to use. It will fail sooner, but
hopefully before using all the system's RAM and swap and having the
oom-killer intervene.
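A sketch of that ulimit idea (the 2 GiB cap is arbitrary; ulimit -v takes KiB and only affects the subshell, and the paths are the ones from this thread):

```shell
# Cap rsync's virtual memory at ~2 GiB; if it leaks, it dies with
# ENOMEM instead of dragging the whole box into the OOM killer.
( ulimit -v 2097152; rsync -a /home/ /mnt/homevol/ )
```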

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

Franco Martelli

unread,
Jan 16, 2024, 3:00:37 PM1/16/24
to
On 15/01/24 at 08:43, David Christensen wrote:
>> This I am still trying to do, the first pass copied all 350G of /home
>> but went to the wrong drive, and I had mounted the drive by its label.
>> It is now /dev/sdh and all labels above it are now wrong. Crazy.
>> These SSD's all have an OTP serial number. I am tempted to use that
>> serial number as a label _I_ can control.
>
>
> When I built and ran a Debian 2 @ HDD RAID1 using mdadm(8), I did not
> partiton the HDD's -- I gave mdadm(8) the whole drives.

I don't know if it is a good idea; in fact there exists a special partition
type for RAID arrays listed in fdisk. I used that for my RAID:

---
~# fdisk -l /dev/sd[a-d]
Disk /dev/sda: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00088ecc

Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdb: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x000d65c9

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdc: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x000306a3

Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 1953523711 1953521664 931,5G fd Linux raid autodetect


Disk /dev/sdd: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: ST1000DM003-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x0007a1fe

Device Boot Start End Sectors Size Id Type
/dev/sdd1 2048 1953523711 1953521664 931,5G fd Linux raid autodetect
---

I thought it was mandatory for a RAID to partition drives with this
partition type, am I wrong?

Cheers,

--
Franco Martelli

David Christensen

unread,
Jan 16, 2024, 4:10:06 PM1/16/24
to
On 1/16/24 11:51, Franco Martelli wrote:
> On 15/01/24 at 08:43, David Christensen wrote:
>> When I built and ran a Debian 2 @ HDD RAID1 using mdadm(8), I did not
>> partiton the HDD's -- I gave mdadm(8) the whole drives.
>
> I don't know if it is a good idea; in fact there exists a special partition
> type for RAID arrays listed in fdisk. I used that for my RAID:
>
> ---
> ~# fdisk -l /dev/sd[a-d]
> Disk /dev/sda: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
> Disk model: ST1000DM003-1CH1
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00088ecc > ...
> I thought it was mandatory for a RAID to partition drives with this
> partition type, am I wrong?


STFW and RTFM I have seen recommendations for and against using whole
disks for RAID and for and against using partitions for RAID. And, as
this is the Internet, there are countless rumors and speculation. As I
switched from mdadm(8) to zfs(8) years ago, perhaps another reader can
explain what mdadm(8) does when given whole disks and when given disk
partitions.


David

Andy Smith

unread,
Jan 16, 2024, 4:40:07 PM1/16/24
to
Hello,

On Tue, Jan 16, 2024 at 01:01:02PM -0800, David Christensen wrote:
> On 1/16/24 11:51, Franco Martelli wrote:
> > I thought it was mandatory for a RAID to partition drives with this
> > partition type, am I wrong?

In the ancient past it was required, because that was one of the
ways that mdadm arrays were assembled: the md kernel driver saw the
"Linux RAID" partition types and tried using them. If you weren't
going to do that, you had to have an mdadm config file, or even
specify the topology on the kernel command line. This was 15
or more years ago.

Ever since udev, each newly-appearing block device is handed to a
script for incremental assembly based on the md metadata on the
device itself, so any kind of block device will do.

> As I switched from mdadm(8) to zfs(8) years ago, perhaps another
> reader can explain what mdadm(8) does when given whole disks and
> when given disk partitions.

mdadm doesn't care.

The older set of people recommending partitions were because drive
capacities used to vary quite a lot more than they do today. So
people used to say, "put a partition on and make it few hundred MB
less than the total size of the drive, then if you have to replace
it with a slightly smaller one you'll be fine."

Since 2005 or so there has been a standard called IDEMA LBA1-03¹
about what the actual capacity in sectors should be for any stated
drive capacity, and most drives obey this, though there are still a
few exceptions. So this is very much less of a concern, especially
for those buying "enterprise" storage.

The newer set of people recommending partitions are mostly doing so
because there's been a few incidents of "helpful" PC motherboards
detecting on boot what they think is a corrupt GPT, and replacing it
with a blank one, damaging the RAID. This is a real thing that has
happened to more than one person; it even got linked on Hacker News
I believe.

Then there will just be people going by taste.

Personally I still put them directly on drives. If I ever get taken
out by one of those crappy motherboards, I reserve the right to get
a different religion. 😀

Thanks,
Andy

¹ https://idema.org/wp-content/downloads/2169.pdf

Felix Miata

unread,
Jan 16, 2024, 5:00:07 PM1/16/24
to
David Christensen composed on 2024-01-16 13:01 (UTC-0800):

> STFW and RTFM I have seen recommendations for and against using whole
> disks for RAID and for and against using partitions for RAID. And, as
> this in the Internet, there are countless rumors and speculation. As I
> switched from mdadm(8) to zfs(8) years ago, perhaps another reader can
> explain what mdadm(8) does when given whole disks and when given disk
> partitions.

I've been running RAID1 on pairs of multi-partition disks for well over a decade,
first with 320G, then 500G, currently 1T. Since the move to 1T, I've replaced both
disks. Both were originally 512/512 v2.0 Hitachis. Now, one is a ST1000NM0011
512l/512p Seagate Constellation, the other a ST1000DM003-1CH1 512l/4096p Seagate
Barracuda.

They've been divided into partitions to comprise 5-6 md devices, currently 6, with
small other partitions not parts of any RAID, such as no longer used /boot/s.
Since moving the OS onto an SSD, I have one md device not in use, previously used
for swap:

# hdparm -t /dev/md0

/dev/md0:
Timing buffered disk reads: 554 MB in 3.02 seconds = 183.59 MB/sec
#
What can I use to test what its write speed is? I'm not seeing any option to do so
in hdparm.
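hdparm indeed has no write test. One crude option is dd; a sketch, where /mnt/md0test is a hypothetical mount point for the filesystem on md0, and oflag=direct requires a filesystem that supports O_DIRECT:

```shell
# Rough sequential write speed: write 512 MiB of zeroes, bypassing the
# page cache, and read the MB/s figure off dd's summary line.
dd if=/dev/zero of=/mnt/md0test/ddtest bs=1M count=512 oflag=direct conv=fsync
rm /mnt/md0test/ddtest
```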

David Wright

unread,
Jan 16, 2024, 7:20:05 PM1/16/24
to
On Tue 16 Jan 2024 at 06:08:35 (-0500), Felix Miata wrote:
> Tom Furie composed on 2024-01-16 08:18 (UTC):
> > Felix Miata writes:
>
> >> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
> >> How does a printer get a storage device assignment???
>
> > By having some kind of SD card slot or similar.
>
> So this pollution only results from a USB-connected printer? IP printer
> connections don't cause it too?

AIUI (not very well), you only get a /dev/sdX when the linux kernel
is what's writing the blocks on the filesystem.

So when I plug in my Galaxy 4 mobile and tap the appropriate buttons
on its screen, /dev/sdb{,1} appear as a block device and partition:

sdb 8:16 1 29.7G 0 disk
└─sdb1 8:17 1 29.7G 0 part

so I can run fdisk on the SD card while in the phone, for example:

$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 29.72 GiB, 31914983424 bytes, 62333952 sectors
Disk model: S5360 Card
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x03399e11

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 62333951 62331904 29.7G c W95 FAT32 (LBA)
$

OTOH with my A13 phone, I don't get a block device created, but just
a FUSE wrapper round the filesystems that Android is running, both
internal and any SD card:

$ mount
[ … ]
aft-mtp-mount on /media/samsungd type fuse.aft-mtp-mount (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
$

Cheers,
David.

David Wright

unread,
Jan 16, 2024, 7:20:05 PM1/16/24
to
It's the sticky labels that convinced me. I had one last possibility
in mind, that the serial numbers were being generated by the
interfaces somehow, but they wouldn't be able to read the labels.

I know nothing about Gene's interfaces, but my SD cards can appear
with false by-id/ values depending on where they're plugged in:
slots (on different PCs), via µSD-SD adapter, SD-USB adapter, etc.

Cheers,
David.

gene heskett

unread,
Jan 16, 2024, 8:10:06 PM1/16/24
to
lsblk, which I've published several times, shows 5 drives. by-id listing
only shows 3. The drive I've been trying to use bounces from /dev/sdd to
sde to sdh depending on which controller it is currently plugged into.

And I've since tried cp in addition to rsync, does the same thing,
killing the system with the OOM but much quicker. cp using all system
memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs,
and the OOM kills the system. So both cp and rsync act broken.

rsync, with a --bwlimit=3m set, takes much longer to kill the system but
the amount of data moved is very similar, 13.5G from clean disk to
system freeze for rsync, 13.4G for cp.

Cheers, Gene Heskett

gene heskett

unread,
Jan 16, 2024, 8:30:05 PM1/16/24
to
On 1/16/24 01:05, Felix Miata wrote:
> gene heskett composed on 2024-01-15 18:37 (UTC-0500):
>
>> Ah,but I finally glombed onto the bug tan memory bar in htop as it was
>> runniing, someplace in the data chain is a huge memory leak, my crash is
>> caused by the OOM daemon killing things. And it only occurs when I run
>> rsync. Only takes it 10 minute to eat 32G of memory, then 500k into
>> swap, and the OOM daemon start killing the system until there's nothing
>> left to run.
>
> What does free report before starting rsync? Do you have all your swap on a
> partition? Do you have any swapspace?

Actually, swap is in 2 locations, one is a swap-dir on /dev/sda, 47G
IIRC, and 60G on md1. Shows in htop as 107G total.
>
> I would log out of XFCE, login on a vtty to open top, then login on another to try
> to run rsync. If that fails OOM too, since the target is ostensibly starting from
> scratch, use MC, and divide the job into the source's directories if necessary. MC
> gets rather bogged down if you try to do a bazillion individual files in a single
> copy operation.

True, but I don't recall it ever failing

Cheers, Gene Heskett.

gene heskett

unread,
Jan 16, 2024, 8:50:06 PM1/16/24
to
unreported here because it didn't seem to have any effect. I've tried to
test that theory by clearing the back-trace buffer at 30 second
intervals: no obviously detectable effect. Still untested is setting that
back to the 1000 line default.

And since I've driven around 170 miles in poor visibility bad weather
today, no more tests will be done tonight, I'm not the 16 years old I
was when I learned to drive 70 mph on even worse roads 75 years ago. So
I'll sign off shortly.

David Wright

unread,
Jan 16, 2024, 9:00:05 PM1/16/24
to
On Tue 16 Jan 2024 at 20:08:12 (-0500), gene heskett wrote:
> On 1/16/24 00:56, Felix Miata wrote:
> > gene heskett composed on 2024-01-15 17:56 (UTC-0500):
> >
> > > Thanks for that composition: but it will be word wrapped:
> > > root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n'
> > > "$(realpath "$j")" "$j" ; done
[ … ]
> > I straightened out the wrapping mess, and gave each entry a line number. I see
> > nothing I recognize as representing serial number duplication among /dev/sdX
> > (physical device) names:
> > [ … ]
> lsblk, which I've published several times, shows 5 drives. by-id
> listing only shows 3. The drive I've been trying to use bounces from
> /dev/sdd to sde to sdh depending on which controller it is currently
> plugged into.

I take it that you're trying to copy to one Gigastone SSD. Presumably
the kernel favours some controllers over others in the race to name
them. This is why using the kernel's device names is no longer
recommended.
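To see which kernel name a stable link currently resolves to (the Gigastone glob is taken from the by-id listings earlier in the thread):

```shell
# Map stable /dev/disk/by-id names to today's /dev/sdX assignments.
for id in /dev/disk/by-id/ata-Gigastone*; do
    printf '%s -> %s\n' "$id" "$(readlink -f "$id")"
done
```

Scripting against the by-id path keeps working even when the sdX letters reshuffle across boots.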

> And I've since tried cp in addition to rsync, does the same thing,
> killing the sysytem with the OOM but much quicker. cp using all system
> memory (32Gb) in 1 minute, another 500K into swap adds another 15
> secs, and the OOM kills the system. So both cp and rsync act broken.

I'd be tempted to bisect the problem by copying to another machine
though a cat5 cable.

> rsync, with a --bwlimit=3m set, takes much longer to kill the system
> but the amount of data moved is very similar, 13.5G from clean disk to
> system freeze for rsync, 13.4G for cp.

I don't know enough about how rsync behaves to interpret that
coincidence, but it seems ominous on its face.

Cheers,
David.

gene heskett

unread,
Jan 16, 2024, 9:00:06 PM1/16/24
to
On 1/16/24 06:09, Felix Miata wrote:
> Tom Furie composed on 2024-01-16 08:18 (UTC):
>
>> Felix Miata writes:
>
>>> /dev/sdc 18 /dev/disk/by-id/usb-Brother_MFC-J6920DW_BROG5F229909-0:0 #
>>> How does a printer get a storage device assignment???
>
>> By having some kind of SD card slot or similar.
>
> So this pollution only results from a USB-connected printer? IP printer
> connections don't cause it too?

Since I have one of the above printers it does indeed have an editable
ipv4 address, but I don't generally use it as the usb2 is faster. It's
been so long since I did use that interface that I do not recall if it
listed the card memory. I'd expect it would, since it can also do a
free-standing copy from its tabloid sized scanner. The printer can handle
tabloid sized paper by hand feeding, so the copy function includes
tabloid size too.

gene heskett

unread,
Jan 16, 2024, 9:20:06 PM1/16/24
to
On 1/16/24 11:08, Thomas Schmitt wrote:
> ls -l /dev/sd[ij]*
root@coyote:~# ls -l /dev/sd[ij]*
brw-rw---- 1 root disk 8, 128 Jan 16 05:01 /dev/sdi
brw-rw---- 1 root disk 8, 129 Jan 16 05:01 /dev/sdi1
brw-rw---- 1 root disk 8, 144 Jan 16 05:01 /dev/sdj
brw-rw---- 1 root disk 8, 145 Jan 16 05:01 /dev/sdj1
root@coyote:~#

lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
gene@coyote:~/src/klipper-docs$ lsblk -d -o
NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
NAME MAJ:MIN MODEL SERIAL WWN
sdh 8:112 Gigastone SSD GSTD02TB230102
sdi 8:128 Gigastone SSD GST02TBG221146
sdj 8:144 Gigastone SSD GST02TBG221146
sdk 8:160 Gigastone SSD GSTG02TB230206
sdl 8:176 Gigastone SSD GSTG02TB230206
note added l to get them all

gene@coyote:~/src/klipper-docs$

Andy Smith

unread,
Jan 16, 2024, 10:40:06 PM1/16/24
to
Hi Felix,

On Tue, Jan 16, 2024 at 04:50:01PM -0500, Felix Miata wrote:
> What can I use to test what its write speed is? I'm not seeing any option to do so
> in hdparm.

The king of storage performance testing is "fio". It's packaged in
Debian. It's really worth learning a bit about.

What sort of performance were you looking to test?

Bear in mind that not all IO is the same and can have vastly
different performance characteristics. For example, reading data is
generally faster than writing data, because you can take advantages
of several layers of caching in Linux and in your drives.

On HDDs, a streaming read will be much faster than a random read,
because the drive head will get into position once and stream the
data from there, as opposed to having to seek about the platters.
Less of a concern for non-rotational media.

Small writes will go into Linux's buffers and drive buffers and
appear to happen at memory speed, until those buffers are full,
and then you'll start to see the real ability of the underlying
drives. Various types of write IO, e.g. synchronous, bypass cache
and will be slower from the start. There are options for fio to
simulate that.

Typically you care about:

- random read and random write performance as measured in IO
operations per second (IOPS) at a given mix of read/write and IO
size, e.g. "100% randreads at 4KiB"

- streaming (sequential) read and write bandwidth in MB/sec at a
given IO size

- Different levels of concurrency, e.g. queue depth of 1, 8, 64…

A modern HDD doesn't have much concurrency for the simple reason
that it generally has just one arm, whereas a modern NVMe may have
a queue depth of 64 or more.

You might expect 150 IOPS and 150MB/s 4K streaming read out of a
decent HDD, and thousands or millions of IOPS and more than a GB/s
out of a modern NVMe.

Drive manufacturers often list IOPS and MB/sec figures for 4K IOs in
their spec sheets.

You can target fio at your underlying drives, or at any other block
device, so you can test filesystems, RAID arrays, zfs, etc. Though
obviously the write tests will write data and may destroy the
contents of the device unless done in a filesystem directory.
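As a sketch, a minimal fio job file along those lines. Every value is illustrative; run it with "fio randread.fio", and note fio will create and write the named file itself:

```ini
; randread.fio -- 4 KiB random reads at queue depth 8 (illustrative values)
[randread-4k]
rw=randread
bs=4k
iodepth=8
ioengine=libaio
direct=1
size=256M
runtime=30
time_based
filename=/tmp/fio.testfile
```

Swapping rw=randread for rw=write and raising bs gives the streaming-write counterpart.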

Thanks,
Andy

Felix Miata

unread,
Jan 16, 2024, 11:10:06 PM1/16/24
to
gene heskett composed on 2024-01-16 20:08 (UTC-0500):

> Felix Miata wrote:

>> I straightened out the wrapping mess, and gave each entry a line number. I see
>> nothing I recognize as representing serial number duplication among /dev/sdX
>> (physical device) names:

>> /dev/sda 9 /dev/disk/by-id/ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T201730V
>> /dev/sdd 19 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302507V
>> /dev/sde 28 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302502E
>> /dev/sdf 36 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302498T
>> /dev/sdg 43 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R302509W
>> /dev/sdh 51 /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
>> /dev/sdi 53 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
>> /dev/sdk 55 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206

>> Exactly which line numbers represent duplication among the physical drives?

> lsblk, which I've published several times, shows 5 drives. by-id listing
> only shows 3. The drive I've been trying to use bounces from /dev/sdd to
> sde to sdh dependin on which controller it is curently plugged into.

From your 2024-01-15 17:56 -0500 post, I see 8 unique serial numbers from SATA
SSDs, 5 Samsung, 3 Gigastone.

I ignore all your posts with lsblk that didn't use the -f option to facilitate
identifying individual SSDs.

> And I've since tried cp in addition to rsync, does the same thing,
> killing the sysytem with the OOM but much quicker. cp using all system
> memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs,
> and the OOM kills the system. So both cp and rsync act broken.

> rsync, with a --bwlimit=3m set, takes much longer to kill the system but
> the amount of data moved is very similar, 13.5G from clean disk to
> system freeze for rsync, 13.4G for cp.

David Christensen

unread,
Jan 17, 2024, 1:00:05 AM1/17/24
to
On 1/16/24 17:08, gene heskett wrote:
> lsblk, which I've published several times, shows 5 drives. by-id listing
> only shows 3. The drive I've been trying to use bounces from /dev/sdd to
> sde to sdh depending on which controller it is currently plugged into.
>
> And I've since tried cp in addition to rsync, does the same thing,
> killing the system with the OOM but much quicker. cp using all system
> memory (32Gb) in 1 minute, another 500K into swap adds another 15 secs,
> and the OOM kills the system. So both cp and rsync act broken.
>
> rsync, with a --bwlimit=3m set, takes much longer to kill the system but
> the amount of data moved is very similar, 13.5G from clean disk to
> system freeze for rsync, 13.4G for cp.


On 1/16/24 18:10, gene heskett wrote:
> On 1/16/24 11:08, Thomas Schmitt wrote:
>>   ls -l /dev/sd[ij]*
> root@coyote:~#  ls -l /dev/sd[ij]*
> brw-rw---- 1 root disk 8, 128 Jan 16 05:01 /dev/sdi
> brw-rw---- 1 root disk 8, 129 Jan 16 05:01 /dev/sdi1
> brw-rw---- 1 root disk 8, 144 Jan 16 05:01 /dev/sdj
> brw-rw---- 1 root disk 8, 145 Jan 16 05:01 /dev/sdj1
> root@coyote:~#
>
> lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> gene@coyote:~/src/klipper-docs$  lsblk -d -o
> NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> NAME MAJ:MIN MODEL         SERIAL         WWN
> sdh    8:112 Gigastone SSD GSTD02TB230102
> sdi    8:128 Gigastone SSD GST02TBG221146
> sdj    8:144 Gigastone SSD GST02TBG221146
> sdk    8:160 Gigastone SSD GSTG02TB230206
> sdl    8:176 Gigastone SSD GSTG02TB230206


I suggest removing one GST02TBG221146 and one GSTG02TB230206. Put them
on the shelf, in other computer(s), or sell them. Then perhaps copying
the /home RAID10 2 TB to one Gigastone 2 TB SSD would work.


David

Thomas Schmitt

unread,
Jan 17, 2024, 2:50:06 AM1/17/24
to
Hi,

Gene Heskett wrote:
> lsblk, which I've published several times, shows 5 drives.

Duh. Obviously this thread overstretches my mental capacity.


> And I've since tried cp in addition to rsync, does the same thing, killing
> the sysytem with the OOM but much quicker. cp using all system memory (32Gb)
> in 1 minute, another 500K into swap adds another 15 secs, and the OOM kills
> the system. So both cp and rsync act broken.

I get the suspicion that your disk set overstretches the mental capacity
of the hardware or the operating system.
Both "cp" and "rsync" are heavily tested by the GNU/Linux community and
quite independently developed. A common memory leak would have to sit
deeper in the software stack, i.e. in kernel or firmware.


> rsync, with a --bwlimit=3m set, takes much longer to kill the system but the
> amount of data moved is very similar, 13.5G from clean disk to system freeze
> for rsync, 13.4G for cp.

This observation might be significant. But i fail to make up a theory.


> gene@coyote:~/src/klipper-docs$ lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> NAME MAJ:MIN MODEL SERIAL WWN
> sdh 8:112 Gigastone SSD GSTD02TB230102
> sdi 8:128 Gigastone SSD GST02TBG221146
> sdj 8:144 Gigastone SSD GST02TBG221146
> sdk 8:160 Gigastone SSD GSTG02TB230206
> sdl 8:176 Gigastone SSD GSTG02TB230206

This is just weird.
I still have difficulties to believe that any disk manufacturer would
hand out disks with colliding serial numbers. I googled for this
phenomenon, but except two mails of Gene nothing similar popped up.

One of these mails from a thread in december reveals that the three
unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
each come with a different version of "1C0", "7A0", "5A0", respectively.
https://www.mail-archive.com/debia...@lists.debian.org/msg799307.html
That's unexpected, too, as the disk properties look identical elsewise.

I guess that it is not possible to identify which disk came with which
of the two separate purchases ?
How many days were these purchases apart ?


David Christensen wrote:
> I suggest removing one GST02TBG221146 and one GSTG02TB230206. Put them on
> the shelf, in other computer(s), or sell them. Then perhaps copying the
> /home RAID10 2 TB to one Gigastone 2 TB SSD would work.

I join this proposal.
... and dimly remember to have seen the proposal to attach the disks
one by one without the other four, in order to see whether the serial
numbers are the same as with all five together.

Since you've got quite a hardware zoo:
Consider trying the Gigastone disks with a different machine.
Do the serial numbers show up the same as on the machine where you
experience all those difficulties?
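When testing a disk in isolation, asking the drive itself is less ambiguous than the udev links. A sketch, assuming smartmontools is installed; /dev/sdh is one of the names from Gene's listing:

```shell
# Read identity data straight from the drive, one disk at a time.
smartctl -i /dev/sdh | grep -E 'Model|Serial|Firmware'
```

If two physically different disks report the same serial here, the collision is in the firmware, not in udev.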

gene heskett

unread,
Jan 17, 2024, 9:34:41 AM1/17/24
to
> .
Or LABEL them.
And I seem to be making some progress this morning: opening a konsole
and setting scrollback to 200 lines, limiting its use of memory, the tan
memory bar in htop is full scale and it's a couple megs into swap out of
107G, and the system still feels normal.
In another multitabbed xfce4 shell, a "df && free" is showing this:
root@coyote:~# df && free
Filesystem      1K-blocks       Used  Available Use% Mounted on
udev             16327704          0   16327704   0% /dev
tmpfs             3272684       1904    3270780   1% /run
/dev/sda1       863983352   22346276  797675428   3% /
tmpfs            16363420       1244   16362176   1% /dev/shm
tmpfs                5120          8       5112   1% /run/lock
/dev/sda3        47749868        580   45291280   1% /tmp
/dev/md0p1     1796382580  335100148 1369957456  20% /home
tmpfs             3272684       3752    3268932   1% /run/user/1000
/dev/sdh1      1967892164   23830812 1844025104   2% /mnt/homevol
              total        used        free      shared  buff/cache   available
Mem:       32726840     3343048      218316      922196    30443960    29383792
Swap:     111902712        1536   111901176
root@coyote:~#

rsync has been stopped and restarted, 4 times, but stopping it has not
recovered the cache, so swap is increasing slowly.
That faint knocking sound? Me, knocking on wood... ;o)>

command line: rsync -a --bwlimit=10m --fsync --progress /home/ /mnt/homevol

So we'll eventually either git-r-done or crash the system but this is
farther than it ever got before in several days.

Thanks everybody.

gene heskett

Jan 17, 2024, 10:40:05 AM
On 1/17/24 02:42, Thomas Schmitt wrote:
> Hi,
>
> Gene Heskett wrote:
>> lsblk, which I've published several times, shows 5 drives.
>
> Duh. Obviously this thread overstretches my mental capacity.
>
>
>> And I've since tried cp in addition to rsync, does the same thing, killing
>> the sysytem with the OOM but much quicker. cp using all system memory (32Gb)
>> in 1 minute, another 500K into swap adds another 15 secs, and the OOM kills
>> the system. So both cp and rsync act broken.
>
> I get the suspicion that your disk set overstretches the mental capacity
> of the hardware or the operating system.
> Both "cp" and "rsync" are heavily tested by the GNU/Linux community and
> quite independently developed. A common memory leak would have to sit
> deeper in the software stack, i.e. in kernel or firmware.

Kernel, firmware, or the terminal's scrollback memory; I purposely set this
particular terminal's scrollback to 200 lines with that in mind.

>
>> rsync, with a --bwlimit=3m set, takes much longer to kill the system but the
>> amount of data moved is very similar, 13.5G from clean disk to system freeze
>> for rsync, 13.4G for cp.
>
> This observation might be significant. But i fail to make up a theory.

One of the things I'm fairly good at: they gave all 7th graders the
Iowa test in 1947, similar to the S/B IQ test but not copyrighted, therefore
a lot cheaper, and I came out of that with an equivalent of 147. I
quit school 2 years later when I could and went to work fixing TVs. Had
my draft number moved up in '52, in the middle of Korea, to get that out
of the way; drafted was 2 years, volunteered was 4 years, but I failed the
AFQT by getting a 98 out of 100, which earned me a 4F classification
because I wouldn't take orders from the sergeant. I found out the next
best score that day among 130+ boys was 36/100, which freed me to let a
girl become my wife in '57, & started making kids. Got a 1st phone in
1962 without cracking a book, did the same thing in 1972 to become a
registered CET, which I'll readily admit is getting rusty in my dotage at
89 yo. The technology has been slowly passing me by since I retired in the
middle of 2002. Because I went diabetic in the '80's, my beer limit is
1, but I'd do it with any of you folks if we ever meet in person. Let
the war stories flow. ;o)> <-smiley with a goatee.

That copy is now up to 4x the data copied in any other try.
root@coyote:~# df && free
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16327704 0 16327704 0% /dev
tmpfs 3272684 1904 3270780 1% /run
/dev/sda1 863983352 22346308 797675396 3% /
tmpfs 16363420 1244 16362176 1% /dev/shm
tmpfs 5120 8 5112 1% /run/lock
/dev/sda3 47749868 612 45291248 1% /tmp
/dev/md0p1 1796382580 335101664 1369955940 20% /home
tmpfs 3272684 3752 3268932 1% /run/user/1000
/dev/sdh1 1967892164 64369552 1803486364 4% /mnt/homevol
              total        used        free      shared  buff/cache   available
Mem:       32726840     3453372      199708      919044    30336824    29273468
Swap:     111902712        1536   111901176
And swap use has not increased; it's stabilized.


>
>> gene@coyote:~/src/klipper-docs$ lsblk -d -o NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
>> NAME MAJ:MIN MODEL SERIAL WWN
>> sdh 8:112 Gigastone SSD GSTD02TB230102
>> sdi 8:128 Gigastone SSD GST02TBG221146
>> sdj 8:144 Gigastone SSD GST02TBG221146
>> sdk 8:160 Gigastone SSD GSTG02TB230206
>> sdl 8:176 Gigastone SSD GSTG02TB230206
>
> This is just weird.
> I still have difficulties to believe that any disk manufacturer would
> hand out disks with colliding serial numbers. I googled for this
> phenomenon, but except two mails of Gene nothing similar popped up.
>
> One of these mails from a thread in december reveals that the three
> unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
> each come with a different version of "1C0", "7A0", "5A0", respectively.

Which is why, when I let my imagination out to play w/o a chaperone, my
thoughts run toward some invented date code for a batch number.

> https://www.mail-archive.com/debia...@lists.debian.org/msg799307.html
> That's unexpected, too, as the disk properties look identical elsewise.
>
> I guess that it is not possible to identify which disk came with which
> of the two separate purchases ?

Once removed from the boxes, no.

> How many days were these purchases apart ?

6 weeks or so, as I formulated what to do next. But that isn't carved
even in sandstone.
>
> David Christensen wrote:
>> I suggest removing one GST02TBG221146 and one GSTG02TB230206. Put them on
>> the shelf, in other computer(s), or sell them. Then perhaps copying the
>> /home RAID10 2 TB to one Gigastone 2 TB SSD would work.
>
> I join this proposal.
> ... and dimly remember to have seen the proposal to attach the disks
> one by one without the other four, in order to see whether the serial
> numbers are the same as with all five together.
Not as easily tried, the other 4 are in twin mounts in another portion
of the drive cages in this 30" tall tiger direct cage and not too
readily accessible w/o tipping the mobo out on its hinged mount.

> Since you got quite some hardware zoo:
> Consider to try the Gigastone disks with a different machine.
> Do the serial numbers show up as with the machine where you experience
> all those difficulties.

Again, that has not been tried.
>
> Have a nice day :)
>
> Thomas

Back at you, Thomas; thanks for your patience, and to Felix, David &
pocket. One of you made the remark that seems to be the secret password.
It's still working, slowly at 10 megs a second. And I appreciate it, a lot.

Thomas Schmitt

Jan 17, 2024, 11:40:05 AM
Hi,

after i began enumerating suspects, gene heskett wrote:
> terminals scroll back memory, I purposely set this
> particular terminals scrollback to 200 lines with that in mind.

How large was it set when your runs caused the OOM killer to act?

I have a good number of xterms with 10,000 lines each. No tabs, no KDE,
but 8 fvwm "desktops" (virtual screens) full of terminal windows.


> > [Request to test the disks one-by-one on some other computer, whether
> > they bear the same serial number at all controllers in all machines.]

> Not as easily tried, the other 4 are in twin mounts in another portion of
> the drive cages in this 30" tall tiger direct cage and not too readily
> accessible w/o tipping the mobo out on its hinged mount.

One should raise protest at Gigastone if the disks really have the same
serial numbers. But before doing so, one would have to make sure that
it is not some weird effect of them all being plugged into that machine
at the same time.


> One of you made the remark that seems to be the secret password.

What finally helped? Just the shorter terminal scrollback memory?

It would explain why a verbose rsync could always summon the OOM killer
around the same stage of progress. But what waste of memory would have
to happen with each of the rsync messages?

(You mentioned LABEL as a possibility. But not as actually used.)


> Its still, slowly at 10 megs a second, working.

I see the rsync option --bwlimit=10m in your previous mail. But in the
same mail there is an older quote from you saying that --bwlimit=3m only
prolonged the time until the OOM killer appeared.
So I wonder whether it would work at a more contemporary speed.


------------------------------------------------------------------------
Self-incrimination: The rest of this mail is off topic.

> they gave all 7nth graders the Iowa
> test in 1947, similar to the S/B IQ test but not copyrighted, there fore a
> lot cheaper, and I came out of that with an equivalent of 147.

I was tested in the 1960s but they did not tell the results to kids or
parents. We only got recommendations at which of our three types of school
we should continue at the age of 10 or 11 years.
(So it was not to avoid discrimination of the dumb but rather to avoid
that pupils feel more intelligent than their teachers.)

Curt

Jan 17, 2024, 11:40:08 AM
On 2024-01-17, Thomas Schmitt <scdb...@gmx.net> wrote:
>
> This is just weird.
> I still have difficulties to believe that any disk manufacturer would
> hand out disks with colliding serial numbers. I googled for this
> phenomenon, but except two mails of Gene nothing similar popped up.

I discovered a couple of discussions of the phenomenon, the upshot of which
were:

1) That's what you get when you purchase cheap SSDs.

https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

2) SSDs belonging to the same software RAID show identical serial numbers
in software, but these numbers don't match the serial numbers printed on the SSDs themselves.

https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

But you said *similar*. As Gene's threads have too many movable parts
for me to follow, on that point I couldn't say.

David Christensen

Jan 17, 2024, 12:10:05 PM
On 1/16/24 23:46, Thomas Schmitt wrote:
> Gene Heskett wrote:
> One of these mails from a thread in december reveals that the three
> unique serial numbers GSTD02TB230102, GST02TBG221146, GSTG02TB230206
> each come with a different version of "1C0", "7A0", "5A0", respectively.
> https://www.mail-archive.com/debia...@lists.debian.org/msg799307.html
> That's unexpected, too, as the disk properties look identical elsewise.


Thank you for locating the lshw(1) output. It appears to have been run
when one Gigastone SSD was on the motherboard SATA controller and four
Gigastone SSD's were on the 6-port HBA:

2024-01-17 08:58:54 dpchrist@laalaa ~
$ egrep 'sata|disk|product|version|serial' gene-heskett-coyote-lshw.out
| grep -B 1 -A 2 Gigastone
*-disk:1
product: Gigastone SSD
version: 7A0
serial: GST02TBG221146
--
*-disk:0
product: Gigastone SSD
version: 7A0
serial: GST02TBG221146
--
*-disk:1
product: Gigastone SSD
version: 5A0
serial: GSTG02TB230206
--
*-disk:2
product: Gigastone SSD
version: 5A0
serial: GSTG02TB230206
--
*-disk:3
product: Gigastone SSD
version: 1C0
serial: GSTD02TB230102


David

David Christensen

Jan 17, 2024, 12:10:07 PM
On 1/17/24 06:18, gene heskett wrote:
> On 1/17/24 00:52, David Christensen wrote:
>> I suggest removing one GST02TBG221146 and one GSTG02TB230206.  Put
>> them on the shelf, in other computer(s), or sell them.  Then perhaps
>> copying the /home RAID10 2 TB to one Gigastone 2 TB SSD would work.
>
> Or LABEL them.


I suspect the conflicting serial numbers are causing problems in the
kernel, as indicated by the /dev/disk/by-id/* problems. I would remove
one each of the duplicate serial number disks to eliminate that possibility.


David
> Cheers, Gene Heskett.

Thomas Schmitt

Jan 17, 2024, 12:20:05 PM
Hi,

Curt wrote:
> I discovered a couple of discussions of the phenomenon, the upshot of which
> were:
> 1) That's what you get when you purchase cheap SSDs.
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
> 2) SSDs belonging to the same software RAID show identical serial numbers
> in software, but these numbers don't match the serial numbers printed on the
> SSDs themselves.
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/

Those URLs are identical. (OMG! Is it contagious?)

Number 2 would match my suspicion that some layer in the disk driving
gets confused and mixes up the serial numbers.


> But you said *similar*.

By "colliding serial numbers" I indeed mean "identical serial numbers".

However cheap the disks may be, that is no excuse for not making
them individually distinguishable.


> As Gene's threads have too many movable parts
> for me to follow, on that point I couldn't say.

This one begins to gain presence on the web, so one can use search engines
and AI to untangle its sub-threads. I meanwhile participate in two of them:
the serial number collision, and the rsync-caused OOM killer (solved now,
but how?).

Thomas Schmitt

Jan 17, 2024, 12:30:05 PM
Hi,

David Christensen wrote:
> I suspect the conflicting serial numbers are causing problems in the kernel,
> as indicated by the /dev/disk/by-id/* problems.

That's not in the kernel but in udev/systemd's process of creating the
symbolic links in /dev/disk/by-id/.
It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
sdl and sdl1 are missing.

The open question (at least to me) is whether it's the disks or the
controllers or the drivers which cause the duplication.

Thomas Schmitt

Jan 17, 2024, 2:40:06 PM
Hi,

I see that I messed up "h" and "k" in my explanation of the fight over
the link targets in /dev/disk/by-id. So, another attempt:

sdh has a unique serial number GSTD02TB230102. Thus we see in
https://lists.debian.org/debian-user/2024/01/msg00667.html
these two links:

/dev/sdh /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102
/dev/sdh1 /dev/disk/by-id/ata-Gigastone_SSD_GSTD02TB230102-part1

sdi and sdj share the serial number GST02TBG221146. So the concurrent
attempts to create the links let only these two survive:

/dev/sdi /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146
/dev/sdj1 /dev/disk/by-id/ata-Gigastone_SSD_GST02TBG221146-part1

sdk and sdl share GSTG02TB230206. The survivors are:

/dev/sdk /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206
/dev/sdk1 /dev/disk/by-id/ata-Gigastone_SSD_GSTG02TB230206-part1

The next system startup might yield other survivors.
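The last-writer-wins behavior described above can be sketched with plain
symlinks in a scratch directory. This is a toy model, not udev itself; the
device and link names are simply the ones from this thread:

```shell
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/by-id"
touch "$tmp/sdi" "$tmp/sdj"

# udev would derive the same link name from the duplicated serial number:
name="ata-Gigastone_SSD_GST02TBG221146"

ln -sf "$tmp/sdi" "$tmp/by-id/$name"   # link created for sdi ...
ln -sf "$tmp/sdj" "$tmp/by-id/$name"   # ... then silently replaced for sdj

survivor=$(basename "$(readlink "$tmp/by-id/$name")")
echo "surviving target: $survivor"
rm -rf "$tmp"
```

Whichever device is processed last owns the link, which also matches the
observation that the next startup might yield other survivors.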

gene heskett

Jan 17, 2024, 3:20:06 PM
On 1/17/24 11:30, Thomas Schmitt wrote:
> Hi,
>
> after i began enumerating suspects, gene heskett wrote:
>> terminals scroll back memory, I purposely set this
>> particular terminals scrollback to 200 lines with that in mind.
>
> How large was it set when your runs caused the OOM killer to act ?
A different terminal; xfce4's is apparently unlimited, but I can't find
the setting in the config prefs.

> I have a good number of xterms with 10,000 lines each. No tabs, no KDE,
> but 8 fvwm "desktops" (virtual screens) full of terminal windows.

12 workspaces with 1 to 8 tabs open. 32G of main memory.
>
>
>>> [Request to test the disks one-by-one on some other computer, whether
>>> they bear the same serial number at all controllers in all machines.]
>
>> Not as easily tried, the other 4 are in twin mounts in another portion of
>> the drive cages in this 30" tall tiger direct cage and not too readily
>> accessible w/o tipping the mobo out on its hinged mount.
>
> One should raise protest at Gigastone if the disks really have the same
> serial numbers. But before doing so, one would have to make sure that
> it is not some weird effect of them all being plugged into that machine
> at the same time.
Should not be a problem if labeled uniquely. And that's easily done
with gparted.
>
>
>> One of you made the remark that seems to be the secret password.
>
> What did finally help ? Just the shorter terminal scroll back memory ?

That, and possibly the --bwlimit=10m, giving the SSDs time to keep their
stuff in one sock.

> It would explain why a verbose rsync could summon the OOM killer always
> around the same stage of progress. But what waste of memory would have
> to happen with each of the rsync messages ?
>
> (You mentioned LABEL as a possibility. But not as actually used.)
>
>
>> Its still, slowly at 10 megs a second, working.
>
> I see in your previous mail rsync option --bwlimit=10m . But in the
> same mail there is an older quote from you that --bwlimit=3m only
> prolonged the time until the OOM killer appeared.
> So i wonder whether it would work at a more contemporary speed.

A probably informative test.
But as yet not tested.

>
> ------------------------------------------------------------------------
> Self-incrimination: The rest of this mail is off topic.
>
>> they gave all 7nth graders the Iowa
>> test in 1947, similar to the S/B IQ test but not copyrighted, there fore a
>> lot cheaper, and I came out of that with an equivalent of 147.
>
> I was tested in the 1960s but they did not tell the results to kids or
> parents. We only got recommendations at which of our three types of school
> we should continue at the age of 10 or 11 years.

That I believe was the intention but one of the teachers was a blabbermouth.

> (So it was not to avoid discrimination of the dumb but rather to avoid
> that pupils feel more intelligent than their teachers.)

That avoidance was untenable; in the 1st semester of my freshman year I
got thrown out of the senior physics class for correcting an erroneous
statement by the teacher that was patently at odds with Newton's 3rd law
of motion: for every action, there is an equal and opposite reaction.

Pretty basic stuff. But correcting the teacher in front of the other
students was absolutely not to be tolerated. But I felt correcting him
AND setting it straight was more important to the rest of the nominally
20 students present than any embarrassment it may have caused him.

Same with the papered EE's who can't understand that E = ½mv² does not
have a speed floor below which it doesn't work, when the electron beam in
a klystron amplifier is only moving at a potential of 20,000 volts. The
problem not understood is that the amplification is obtained not from a
current variation, but from a velocity variation induced by a 1 watt signal
speeding up or slowing down the passing beam as it traverses the first
cavity of 4; the next two control the bandwidth, and the last one picks
30 kilowatts back off the beam by capacitive coupling effects as the
beam goes on through into a copper funnel cooled by 70 gallons of very
pure water, to absorb the end of that beam, which takes around 125
kilowatts to generate.

But that beam's electrons have mass, and one watt to slow them slows them
more than one watt to speed them up speeds them up, so at high power
levels the tube is effective longer in terms of the transit time. This
puts a time-of-flight error into the signal that we didn't know how to
pre-distort for in the 1970's. A very dependable way to generate
transmitter power levels, but not a very efficient one; 95% of the UHF
stations that went dark in those years were bankrupted by the power
bills, even at 3 cents a kWh.

So there was a huge financial push to find a better method, as that time
distortion would have killed hidef TV before it ever got out of the
laboratory.

And E = ½mv² is as valid at 25 mph as it is at c, nominally 186,282
miles per second.

Yup, I understand Albert Einstein's theory. Did this help you to
understand it? I hope so.

> Have a nice day :)
>
> Thomas
>
Thanks for pulling the trigger of the teacher I can be, Thomas. BTW,
rsync is still toddling along.
root@coyote:~# df && free
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16327704 0 16327704 0% /dev
tmpfs 3272684 1904 3270780 1% /run
/dev/sda1 863983352 22348260 797673444 3% /
tmpfs 16363420 1244 16362176 1% /dev/shm
tmpfs 5120 8 5112 1% /run/lock
/dev/sda3 47749868 680 45291180 1% /tmp
/dev/md0p1 1796382580 335104628 1369952976 20% /home
tmpfs 3272684 3756 3268928 1% /run/user/1000
/dev/sdh1 1967892164 239677008 1628178908 13% /mnt/homevol
              total        used        free      shared  buff/cache   available
Mem:       32726840     3552420      307328      921164    30132056    29174420
Swap:     111902712        1536   111901176

So the copy is a little over 2/3rds done in nominally 8 hours.

gene heskett

Jan 17, 2024, 3:20:06 PM
On 1/17/24 11:38, Curt wrote:
> On 2024-01-17, Thomas Schmitt <scdb...@gmx.net> wrote:
>>
>> This is just weird.
>> I still have difficulties to believe that any disk manufacturer would
>> hand out disks with colliding serial numbers. I googled for this
>> phenomenon, but except two mails of Gene nothing similar popped up.
>
> I discovered a couple of discussions of the phenomenon, the upshot of which
> were:
>
> 1) That's what you get when you purchase cheap SSDs.
>
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>
> 2) SSDs belonging to the same software RAID show identical serial numbers
> in software, but these numbers don't match the serial numbers printed on the SSDs themselves.

But the drives in question are not, and never have been, in a raid;
they're just plugged in, awaiting my putting them to work.
>
> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>
> But you said *similar*. As Gene's threads have too many movable parts
> for me to follow, on that point I couldn't say.
>

gene heskett

Jan 17, 2024, 3:40:05 PM
On 1/17/24 12:27, Thomas Schmitt wrote:
> Hi,
>
> David Christensen wrote:
>> I suspect the conflicting serial numbers are causing problems in the kernel,
>> as indicated by the /dev/disk/by-id/* problems.
>
> That's not in the kernel but in udev/systemd's process of creating the
> symbolic links in /dev/disk/by-id/.
> It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
> But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
> In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
> In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
> sdl and sdl1 are missing.
>
Missing because the original command line did not look at sdl.
I added the l and it showed up. No magic.

> The open question (at least to me) is whether it's the disks or the
> controllers or the drivers which cause the duplication.
Neither; a typo in the original command.
>
>
> Have a nice day :)
>
> Thomas
>

gene heskett

Jan 17, 2024, 3:40:05 PM
By LABELing the partitions uniquely, that problem, so far as I can see,
is solved.

The OOM death of the system was the xfce4 terminal apparently being set
for unlimited scrollback, and that was eating the memory. I switched to
Konsole, which has the ability to limit the scrollback to 200 lines;
it's taken all 32G as cache and 1536 1k blocks of swap, and it's
working w/o any OOM actions I've detected.
>
> Have a nice day :)
>
> Thomas


gene heskett

Jan 17, 2024, 4:10:05 PM
Everything you see flying by when the -v is in the opts, and some of the
pathnames are 250-300 bytes long.

>> (You mentioned LABEL as a possibility. But not as actually used.)

Yes I have, repeatedly.
>>
>>> Its still, slowly at 10 megs a second, working.
>>
>> I see in your previous mail rsync option --bwlimit=10m . But in the
>> same mail there is an older quote from you that --bwlimit=3m only
>> prolonged the time until the OOM killer appeared.
>> So i wonder whether it would work at a more contemporary speed.

I can't change it for testing? Boggles my mind.
I forgot to mention that the 70 gallons figure is a per-minute value
supplied by a 15 hp Ingersoll-Rand pump. A semi-sealed system that has a
4' wide x 8' long x 1.5' thick radiator supplied with external cooling air
by another 20 horse motor, rigged with vent louvers to control the air
flow and keep the water above freezing. That 20 horse had the power
to blow the whole louver out into the field behind the building when
the modutrol motor that controlled the hot air exit louver failed to
open it at sign-on time one morning. Panic call from the remote control
site, as it was only about 20F outside and the water was getting colder,
endangering the radiators with freezing, and $250,000 worth of klystrons.

Thomas Schmitt

Jan 17, 2024, 4:20:06 PM
Hi,

i wrote:
> > What did finally help ? Just the shorter terminal scroll back memory ?

gene heskett wrote:
> That, and possibly the --bwlimit=10m, giving the SSD time to keep their
> stuff in one sock.

Then I place my bet on the terminal alone.
Linux is able to handle disk-to-disk copies that are larger than the
available memory. This is a standard use case.


> > How large was it set when your runs caused the OOM killer to act ?

> different terminal, xfce4's is apparently unlimited but can't find it in the
> config prefs.

I normally start new xterms by

xterm -ls -geometry 80x24 -bg wheat -fg black -sl 10000 +sb &

The -sl option gives the number of lines to be memorized for scrollback.
Black-on-wheat is a calmative color combination which does not overwork
the eyes.

David Christensen

Jan 17, 2024, 4:20:06 PM
Thank you for the explanation.


I would still remove them.


David

gene heskett

Jan 17, 2024, 4:40:06 PM
Thank you, I did not know that.
>
> Have a nice day :)
>
> Thomas
>

David Christensen

Jan 17, 2024, 4:50:06 PM
On 1/17/24 12:30, gene heskett wrote:
> By LABELing the partitions uniquely, that problem so far as I can
> see, is solved.


Okay.


So, are you confident that your motherboard ports, HBA ports, and SSD's
are all working correctly now?


> The OOM death of the system was the xfce4 terminal apparently being
> set for unlimited scrollback and that was eating the memory.
> Switching to Konsole with has the ability to control the scrollback
> to 200 lines, and its taken all 32G's as .cache and 1536 1k blocks of
> swap, and its working w/o any OOM actions I've detected.


Okay.


Xfce -> Terminal Emulator -> right click on screen -> Preferences ->
General -> Scrolling:

Scrollback 200
Unlimited scrollback uncheck


Using tee(1) would allow you to both monitor progress and save standard
output and/or standard error (via shell redirection).


A related issue is that lots of standard output can slow a program.
Minimizing a terminal can help. Redirecting standard output to a file
or to /dev/null can help, especially when done on the remote host while
using ssh(1).


The best solution is to tell rsync(1) not to generate messages on
standard output -- do not use --verbose, --info, --progress, etc.;
use --quiet instead.


David

gene heskett

Jan 17, 2024, 7:00:06 PM
All good hints, after it is done. Now the question is how it ended up
like this: homevol should be very close to /home in size, but:
root@coyote:~# df && free
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16327704 0 16327704 0% /dev
tmpfs 3272684 1912 3270772 1% /run
/dev/sda1 863983352 22348472 797673232 3% /
tmpfs 16363420 1244 16362176 1% /dev/shm
tmpfs 5120 8 5112 1% /run/lock
/dev/sda3 47749868 784 45291076 1% /tmp
/dev/md0p1 1796382580 335102676 1369954928 20% /home
tmpfs 3272684 4956 3267728 1% /run/user/1000
/dev/sdh1 1967892164 354519236 1513336680 19% /mnt/homevol
              total        used        free      shared  buff/cache   available
Mem:       32726840     3417576      515520      934540    30072184    29309264
Swap:     111902712        2048   111900664
root@coyote:~#

It somehow changed 335G into 354G. Thinking the AppImages dir, which is
full of soft links with short names pointing at the long filenames, had
had its links turned into duplicate files, that was the first thing I
checked; but they were all good soft links. So where did the extra 19.4G
come from? Can ext4's filesystem overhead account for that?
>
> David
>
Thanks David.

David Christensen

Jan 17, 2024, 8:00:06 PM
I suggest running rsync(1) with --dry-run, --log-file=FILE,
--itemize-changes, and whatever other options are needed to find the
differences. Please RTFM rsync(1) to choose your options. These look
useful:

--archive, -a (-rlptgoD)
--delete
--hard-links, -H
--one-file-system, -x
--sparse, -S


David

Steve McIntyre

Jan 17, 2024, 8:00:06 PM
Andy Smith wrote:
>
>The newer set of people recommending partitions are mostly doing so
>because there's been a few incidents of "helpful" PC motherboards
>detecting on boot what they think is a corrupt GPT, and replacing it
>with a blank one, damaging the RAID. This is a real thing that has
>happened to more than one person; it even got linked on Hacker News
>I believe.
>
>Then there will just be people going by taste.
>
>Personally I still put them directly on drives. If I ever get taken
>out by one of those crappy motherboards, I reserve the right to get
>a different religion. 😀

I'm clearly a member of a third group of people... :-)

Putting partitions on the RAID drives helps *me* identify them.

--
Steve McIntyre, Cambridge, UK. st...@einval.com
Can't keep my eyes from the circling sky,
Tongue-tied & twisted, Just an earth-bound misfit, I...

gene heskett

Jan 17, 2024, 10:40:05 PM
On 1/17/24 19:54, Steve McIntyre wrote:
> Andy Smith wrote:
>>
>> The newer set of people recommending partitions are mostly doing so
>> because there's been a few incidents of "helpful" PC motherboards
>> detecting on boot what they think is a corrupt GPT, and replacing it
>> with a blank one, damaging the RAID. This is a real thing that has
>> happened to more than one person; it even got linked on Hacker News
>> I believe.
>>
>> Then there will just be people going by taste.
>>
>> Personally I still put them directly on drives. If I ever get taken
>> out by one of those crappy motherboards, I reserve the right to get
>> a different religion. 😀
>
> I'm clearly a member of a third group of people,,, :-)
>
> Putting partitions on the RAID drives helps *me* identify them.
>
you aren't alone Steve.

David Wright

Jan 17, 2024, 10:50:05 PM
On Wed 17 Jan 2024 at 15:34:09 (-0500), gene heskett wrote:
> On 1/17/24 12:27, Thomas Schmitt wrote:
> > David Christensen wrote:
> > > I suspect the conflicting serial numbers are causing problems in the kernel,
> > > as indicated by the /dev/disk/by-id/* problems.
> >
> > That's not in the kernel but in udev/systemd's process of creating the
> > symbolic links in /dev/disk/by-id/.
> > It gets /dev/sd[h-l] and /dev/sd[h-l]1 as kernel generated device files.
> > But sd[ij] and also sd[hl] show pair-wise the same serial numbers.
> > In case of sd[ij] the outcome is mixed: links to sdi and sdj1 survive.
> > In case of sd[hl] we see a less strange outcome: sdh and sdh1, while
> > sdl and sdl1 are missing.
> >
> missing because the original command line did not look at sdl.
> I added the l and it showed up. No magic.

What do you mean, it was "missing"? The original command, which I wrote
for you, contained a wildcard, so it doesn't miss anything that's there:

root@coyote:~# for j in /dev/disk/by-id/* ; do printf '%s\t%s\n' "$(realpath "$j")" "$j" ; done

and there was no sdl in the output from that command. In fact,
there was no "l" in your post between the "l" in "realpath",
above, and the "l" in "like", below:

root@coyote:~#
but like I wrote, 2 pairs with identical "serial numbers", so the

https://lists.debian.org/debian-user/2024/01/msg00658.html
shows this: no sdl was seen under by-id/.

> > The open question (at least to me) is whether it's the disks or the
> > controllers or the drivers which cause the duplication.
> Neither, a typu in the original command.

Cheers,
David.

gene heskett

Jan 17, 2024, 11:30:05 PM
Why --delete?
>     --hard-links, -H
>     --one-file-system, -x
>     --sparse, -S
or --sparse?

Well, my abundance of curiosity may have killed the cat, but if I
understand how rsync's -a works, re-running the same command will only
update for the incoming email and any posts I've made while it was
running the first time. So the same command quoted last is now running
again. When it has exited, which it has now done in about 15 minutes,
I'll edit fstab to remove the 60 gigs of swap on md1 and remove the
existing mount of md0p1 as /home, taking the raid10 completely out of the
system, and add the mounting of LABEL=homevolsdh1 as the /home partition,
then reboot. In the event I have to re-install, the raid will still
contain my data and can be recovered.
I already have a dvd with the most recent netinstall burnt. All I have
to do is convince it to not install orca and brltty, probably by
unplugging _all_ usb stuff except the keyboard and mouse buttons.
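For reference, the fstab change described above might look like this.
This is a sketch; it assumes the partition's ext4 label really is
"homevolsdh1", e.g. as set with "e2label /dev/sdh1 homevolsdh1":

```
# /etc/fstab: mount /home by filesystem LABEL instead of device name,
# sidestepping the duplicate-serial /dev/disk/by-id/ links entirely
LABEL=homevolsdh1  /home  ext4  defaults  0  2
```

Mounting by LABEL is stable across the sdX renumbering that plagued the
by-id links in this thread.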

What would solve many of my problems is a bit of help from someone who
is running trinity, to tell me how to install it on a system w/o any
installed gui, which obviously rules out synaptic. That leaves apt,
apt-get, and aptitude, unless there is a better way. aptitude is
uncontrollable: it has fixed me once, but has torn the system down to
another install 3 times, so the odds are not in my favor.

So those fstab edits have been done; next is a reboot.
>
>
> David

David Christensen

Jan 18, 2024, 1:00:05 AM
If you have files on the destination from a previous run of rsync(1) and
they no longer exist on the source, --delete will get rid of extraneous
files on the destination.


>>      --hard-links, -H
>>      --one-file-system, -x
>>      --sparse, -S
> or --sparse?


First, you need to understand what "sparse file" means:

https://en.wikipedia.org/wiki/Sparse_file


If you have sparse files on the source -- say, 10 GB virtual machine
images -- then you want rsync(1) to create sparse files on the destination.
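A quick way to see the difference, assuming GNU coreutils:

```shell
# Create a file with a 1 GiB apparent size but no allocated data blocks.
truncate -s 1G /tmp/sparse.img
ls -lh /tmp/sparse.img   # apparent size: 1.0G
du -h /tmp/sparse.img    # blocks actually allocated: (almost) nothing
```

Copy such a file with a tool that does not handle sparseness and the destination really does consume the full apparent size.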


> Well, my abundance of curiosity, may have killed the cat, but if I
> understand how rsync's -a works, re-running the same command will only
> update for the incoming email and any posts I've made while it was
> running the first time.  So the same command quoted last is now running
> again. when it has exited, which it has now done in about 15 minutes
> I'll edit fstab to remove the 60 gigs of swap on md1, remove the
> existing mount of md0p1 as /home taking the raid10 completely out of the
> system. And add the mounting of LABEL=homevolsdh1 as the /home partition
> and reboot. In the event I have to re-install, the raid will still
> contain my data and can be recovered.
> I already have a dvd with the most recent netinstall burnt. All I have
> to do is convince it to not install orca and brltty. Probably by
> unplugging _all_ usb stuff except the keyboard and mouse buttons.
>
> What would solve many of my problems is a bit of help from someone who
> it running trinity to tell me how to install it on a system w/o any
> installed gui which obviously disables synaptic. That leaves apt,
> apt-get, and aptitude, unless there is a better way. aptitude is
> uncontrollable, has fixed me once, has torn the system down to another
> install 3 times so the odds are not in my favor.
>
> So those fstab edits have been done, next is a reboot


You should be able to migrate your /home file system from RAID10 to an
SSD without needing to reinstall Debian.


Copying a file system that is mounted read-write is problematic. It is
best to remount it read-only, and then copy. This is hard to do when
you are logged in and using the file system you want to copy. Options
include rebooting into single-user root console or using live media.


To make an exact copy of the source, consider using a tool designed for
this task -- such as cpio(1), tar(1), or a backup/restore system such as
amanda(8).
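One common shape for such a copy is a tar pipeline; the directories below are throwaway stand-ins for the real source and destination mount points:

```shell
# Sketch: copy a tree while preserving permissions (-p).
mkdir -p /tmp/tar-old /tmp/tar-new
echo data > /tmp/tar-old/file
(cd /tmp/tar-old && tar -cpf - .) | (cd /tmp/tar-new && tar -xpf -)
cat /tmp/tar-new/file
```

On a real migration you would run the pipeline from a rescue shell, with the source mounted read-only as discussed above.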


If you use rsync(1), I suggest using some kind of integrity checking
tool to verify that the source and destination file systems are
identical. I prefer BSD mtree(8):

https://manpages.debian.org/bullseye/mtree-netbsd/mtree.8.en.html


(Be careful not to confuse the above with mtree(5) via libarchive.)
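If mtree(8) is not to hand, a cruder sha256 checksum sweep catches content mismatches (though not ownership or mode differences); /tmp/a and /tmp/b below are stand-ins for the two trees being compared:

```shell
mkdir -p /tmp/a /tmp/b
echo same > /tmp/a/f
echo same > /tmp/b/f
# Record checksums of the source tree, then verify the destination against them.
(cd /tmp/a && find . -type f -print0 | sort -z | xargs -0 sha256sum) > /tmp/a.sums
(cd /tmp/b && sha256sum --quiet -c /tmp/a.sums) && echo trees match
```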


David

gene heskett

Jan 18, 2024, 1:50:07 AM
The migration took two passes because udev can't make up its alleged
mind, so I was finally forced to use the rescue mode to edit fstab to
mount it by UUID, and that worked; I've got /home on the copy right now.
And I took the 60 G's of swap out too, since I've never used more than 20G
with any gfx program, so I figure 47G's on /dev/sda is enough.

So now none of the raid is mounted, but the 30+ second lag when opening
a write path is still there, so I was erroneously blaming the raid. So
I've narrowed the problem, but w/o a good clue what to do next.

One thing that bothers me is there is no way the installer's parted shows
partition names for non-raid disks. To me that is a serious bug. It
appears from the help that it can LABEL a partition but can't read that
LABEL. parted, when asked to print all, does that just fine, but the |
doesn't pipe it to less, so the top 60% of parted's print-all output
scrolls off screen at some fraction of c. Not exactly helpful.

I have other things to do while I cogitate on what to do next. Many
thanks to all that helped.


> If you use rsync(1), I suggest using some kind of integrity checking
> tool to verify that the source and destination file systems are
> identical.  I prefer BSD mtree(8):

I assume I'd have to remount the raid somewhere, like /raid?
Whew! That's got more arguments than rsync...

>
> https://manpages.debian.org/bullseye/mtree-netbsd/mtree.8.en.html
>
>
> (Be careful not to confuse the above with mtree(5) via libarchive.)
>
>
> David
>

Charles Curley

Jan 18, 2024, 2:20:06 AM
On Tue, 16 Jan 2024 21:10:28 -0500
gene heskett <ghes...@shentel.net> wrote:

> gene@coyote:~/src/klipper-docs$ lsblk -d -o
> NAME,MAJ:MIN,MODEL,SERIAL,WWN /dev/sd[hijkl]
> NAME MAJ:MIN MODEL SERIAL WWN
> sdh 8:112 Gigastone SSD GSTD02TB230102
> sdi 8:128 Gigastone SSD GST02TBG221146
> sdj 8:144 Gigastone SSD GST02TBG221146
> sdk 8:160 Gigastone SSD GSTG02TB230206
> sdl 8:176 Gigastone SSD GSTG02TB230206

Something is seriously wrong here. I worked at Maxtor for a while. They
went out of their way to be sure there were no duplicate serial
numbers.

Gene, I suggest you check these SNs with the SN on the packages (if
there is one) and on the label on the drive.

Also, take each drive, one at a time, attach it to another computer
with a fresh installation of Debian, one you haven't mucked with in any
way, and only one other drive already in it, and read the SNs there.

I also went looking for Gigastone's web site. Every page I tried at
gigastone.com led to what I presume was an Error 404 page. I say
presume because most of the text was in non-English, probably Chinese,
characters.

--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Thomas Schmitt

Jan 18, 2024, 2:40:06 AM
Hi,

gene heskett wrote:
> > where did the extra 19.4G's come from? Can filesystem
> > ext4's overhead account for that?

In an earlier mail:

> > > command line: rsync -a --bwlimit=10m --fsync --progress /home/ /mnt/homevol

David Christensen wrote:
> Please RTFM rsync(1) to choose your options. These look
> useful:
> --archive, -a (-rlptgoD)
> --delete
> --hard-links, -H
> --one-file-system, -x
> --sparse, -S

I bet on --hard-links and --sparse as means to avoid the extra disk space
consumption. (--archive is important for other reasons, but it was
already in use as -a with your successful rsync run. --delete will be
of importance if the rsync run gets repeated on the already filled target
directory tree.)

man rsync:

-H, --hard-links
This tells rsync to look for hard-linked files in the source and
link together the corresponding files on the destination. With‐
out this option, hard-linked files in the source are treated as
though they were separate files.
[...]
-S, --sparse
Try to handle sparse files efficiently so they take up less
space on the destination. [...]

One can observe a similar inflation effect when copying the files of a
Debian installation ISO to hard disk. In the original disk directory
on the machine which created the ISO there were hardlinked kernels and
firmware packages. In the ISO these link siblings share the same file
content storage.
But when mounted, the siblings get treated as separate files with
different inode numbers. So the 8,135,584 bytes of the hardlink siblings
/install.amd/gtk/vmlinuz
/install.amd/vmlinuz
/install.amd/xen/vmlinuz
get triplicated when these three files get copied out of the ISO.

I am somewhat astonished that --hard-links is not default in rsync,
as it is quite important for backup fidelity.
(On the other hand it is some effort to find all siblings on the disk.)
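The hard-link effect is easy to demonstrate with plain coreutils (the file names below are made up):

```shell
# Two directory entries, one inode: the data exists exactly once on disk.
echo kernel > /tmp/hl-vmlinuz
ln /tmp/hl-vmlinuz /tmp/hl-vmlinuz.sibling
stat -c '%i %h %n' /tmp/hl-vmlinuz /tmp/hl-vmlinuz.sibling
```

Both lines show the same inode number and a link count of 2; without -H, rsync writes two independent copies of the data on the target.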

Sparse files are files with large areas of 0-bytes. Many filesystems
don't store the zeros but rather an instruction to hand out the given
number of 0-bytes when requested by a reader.


If I were you, I'd let rsync make a complete new copy with --hard-links
--sparse, and --delete, but without --bwlimit= in order to get a higher
copy fidelity and also to check whether the transfer speed really was not
to blame for the appearance of the OOM killer.

David Christensen

Jan 18, 2024, 4:00:05 AM
On 1/17/24 22:44, gene heskett wrote:
>> On 1/18/24 00:50, David Christensen wrote:
> The migration took two passes because udev can't make up its alleged
> mind so I was finally forced to use the rescue mode to edit fstab to
> mount it by UUID and that worked, I've got /home on the copy right now.


Congratulations! :-)


> and I took the 60 G's of swap out too since I've never used more than 20G
> with any gfx program, so I figure 47G's on /dev/sda is enough.


1 GB swap works for me. When a memory leak gets out of control, I do
not have to wait long for the lock up.


> So now
> none of the raid is mounted, but the 30+ second lag when opening a write
> path is still there, so I was erroneously blaming the raid. So I've
> narrowed the problem


Good to know.


> but w/o a good clue what to do next.


Find the needle in the haystack or do a fresh install. I prefer the
latter, because I can estimate the effort and I am reasonably confident
of the outcome.


> One thing that
> bothers me is there is no way the installer's parted shows partition
> names for non-raid disks. To me that is a serious bug. It appears from
> the help that it can LABEL a partition but can't read that LABEL.


When installing to UEFI/GPT, I am able to label partitions in the Debian
Installer, the labels are visible in the installer, and the labels
persist on disk after installation is complete.


> parted
> when asked to print all does that just fine, but the | doesn't put it to
> less, so it scrolls off screen the top 60% of a parted's print all
> output at some fraction of C speed. Not exactly helpful. I have other
> things to do while I cogitate on what to do next.


The following works as expected on my machine:

2024-01-18 00:34:41 root@laalaa ~
# parted -l | less


> Many thanks to all that helped.


YW. :-)


>> If you use rsync(1), I suggest using some kind of integrity checking
>> tool to verify that the source and destination file systems are
>> identical.  I prefer BSD mtree(8):
>
> I assume I'd have to remount the raid like to /raid?
> Whew!  That's got more arguments than rsync...


The old /home RAID10 still has its metadata on disk. I would install
the "mdadm" package, edit /etc/fstab, copy and rework the old /home line
(new mount point, add option "ro"), create the mount point, and mount.


David

gene heskett

Jan 18, 2024, 6:50:07 AM
I believe mdadm is already installed; at least enough to collect and
mount this raid10 and use it for /home for nearly the last 2 years.

Now, after all this folderol, all 4 of the SSDs are reporting read
errors at very high LBAs.

All 4 drives report the same power-on hours, 21027, for the occurrence
of the error. That sounds like it could be just one crash or dirty power
down, in which case it should be repairable.

Do we have a repair utility that will force the drive to reallocate a
spare sector and fix those?
I have issued a smartctl -t long on all 4 drives; results in about 3 hours.

Andy Smith

Jan 18, 2024, 9:20:06 AM
Hello,

On Thu, Jan 18, 2024 at 12:53:43AM +0000, Steve McIntyre wrote:
> I'm clearly a member of a third group of people,,, :-)

Oh, I didn't mean to imply that those going by taste were in a
minority! Taste, or possibly, "just never thought about it" could
well be the biggest group. I was only talking about my observations
of those who seem to hold strong opinions on this, usually to the
point where they will advocate "their way" to others.

> Putting partitions on the RAID drives helps *me* identify them.

So, I don't care what people do and I'm not trying to change your
mind. Would you mind going into what makes "sda1" more identifiable
for you than "sda" though?

Or is it that you make use of partition labels for some extra info?

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

Steve McIntyre

Jan 18, 2024, 11:10:06 AM
Hey Andy.

Andy Smith wrote:
>
>On Thu, Jan 18, 2024 at 12:53:43AM +0000, Steve McIntyre wrote:
>> I'm clearly a member of a third group of people,,, :-)
>
>Oh, I didn't mean to imply that those going by taste were in a
>minority! Taste, or possibly, "just never thought about it" could
>well be the biggest group. I was only talking about my observations
>of those who seem to hold strong opinions on this, usually to the
>point where they will advocate "their way" to others.

ACK!

>> Putting partitions on the RAID drives helps *me* identify them.
>
>So, I don't care what people do and I'm not trying to change your
>mind. Would you mind going into what makes "sda1" more identifiable
>for you than "sda" though?
>
>Or is it that you make use of partition labels for some extra info?

If I'm looking at disks on a system, the first thing I'll look for is
the partition table. If a disk has a partition table with "Linux RAID"
partitions visible, that gives me a strong hint of what I should
expect on the disk. Especially if I'm swapping disks around between
systems, commissioning new systems, re-using disks, etc.

Nicholas Geovanis

Jan 18, 2024, 11:30:06 AM
On Wed, Jan 17, 2024, 9:35 PM gene heskett <ghes...@shentel.net> wrote:
On 1/17/24 19:54, Steve McIntyre wrote:
> Andy Smith wrote:
.......

>> Then there will just be people going by taste.
>>
>> Personally I still put them directly on drives. If I ever get taken
>> out by one of those crappy motherboards, I reserve the right to get
>> a different religion. 😀
>
> I'm clearly a member of a third group of people,,, :-)
>
> Putting partitions on the RAID drives helps *me* identify them.
>
you aren't alone Steve.
Cheers, Gene Heskett. 

Sounds like this group has finally achieved a long overdue consensus. How many times since LVM was ready for root/boot volumes have I been told that using partitions was necessary good practice? Even had that in job interviews, where half the team would grin at me saying it and the other half scowling at my "poor practice".

Now we know it was just personal preference all along. Like somebody said :-)

Max Nikulin

Jan 18, 2024, 11:40:06 AM
On 18/01/2024 04:20, Thomas Schmitt wrote:
>
> I normally start new xterms by
>
> xterm -ls -geometry 80x24 -bg wheat -fg black -sl 10000 +sb &

Options may be put into ~/.Xresources

xterm*vt100.saveLines: 10000
xterm*VT100.background: wheat
xterm*VT100.foreground: black
! etc

Use xrdb to merge changes without restarting the X session. It is possible
to have several presets (-name or -class); see /etc/X11/app-defaults/

Curt

Jan 18, 2024, 11:40:06 AM
On 2024-01-17, Thomas Schmitt <scdb...@gmx.net> wrote:
> Hi,
>
> Curt wrote:
>> I discovered a couple of discussions of the phenomenon, the upshot of which
>> were:
>> 1) That's what you get when you purchase cheap SSDs.
>> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>> 2) SSDs belonging to the same software RAID show identical serial numbers
>> in software, but these numbers don't match the serial numbers printed on the
>> SSDs themselves.
>> https://www.reddit.com/r/truenas/comments/s0rrpo/two_sata_ssds_with_identical_serial_numbers/
>
> Those URLs are identical. (OMG ! Is it contagious ?)

Human error may well be the explanation:
https://www.reddit.com/r/synology/comments/18fe6ez/how_to_fix_2_drives_with_same_serial_number/


> Number 2 would match my suspicion that some layer in the disk driving
> gets confused and mixes up the serial numbers.
>
>
>> But you said *similar*.
>
> By "colliding serial numbers" i mean indeed "identical serial numbers".
>
> How cheap the disks may ever be, that would be no excuse for not making
> them individually distinguishable.
>
>
>> As Gene's threads have too many movable parts
>> for me to follow, on that point I couldn't say.
>
> This one begins to gain presence in the web. So one can use search engines
> and AI to untangle its sub-threads. I meanwhile participate in two of them:
> serial number collision, rsync caused OOM killer (solved now, but how ?).
>
>
> Have a nice day :)
>
> Thomas
>
>

