rant -- followup questions

hymie!

May 29, 2019, 9:43:02 AM
So ... Over the last two days, I got what I think are incredibly stupid
follow-up questions.

===

(Technical one)

I have a machine. It's running RAID 5 or 6. A disk failed. My team of
users has monitoring software that notices this, so they know a disk
failed. A replacement disk has been ordered. Today I was asked (by one
of my users) if I know which disk failed.

Why the hell does it matter?

===

(Non-technical one)

I was at a convention, and I was doing data entry for an art show.
The options for the various items in the show are "unsold", "sold",
"purchased/released", or "went to auction." I needed to print the
list of "went to auction", and I only mention that because the printer
was not cooperating. But since there were only eight of them, the
person who needed the print-out was content to hand-write them rather
than fight with the printer.

As I related this story to my wife, she asked me "Which items went to
auction?"

Why the hell does it matter? Why would you think I remembered this
incredibly trivial detail?

===

Is it just part of living in The Information Age, where everybody wants
to know every detail as soon as it happens? Or am I missing some
fundamental reason why the user needs to know which RAID disk is being
replaced in a machine two time-zones away?

--hymie! http://lactose.homelinux.net/~hymie hy...@lactose.homelinux.net

Grant Taylor

May 29, 2019, 11:47:59 AM
On 5/29/19 7:43 AM, hymie! wrote:
> I have a machine. It's running RAID 5 or 6. A disk failed. My team
> of users has monitoring software that notices this, so they know a
> disk failed. A replacement disk has been ordered. Today I was asked
> (by one of my users) if I know which disk failed.
>
> Why the hell does it matter?

They may have been asking to make sure that you knew which drive to replace.

Sadly, I've worked behind people that were just going to pull a random
drive and check the raid status. That's how they identify which drive
is bad.



--
Grant. . . .
unix || die

The Horny Goat

May 29, 2019, 12:28:08 PM
On Wed, 29 May 2019 09:48:00 -0600, Grant Taylor
<gta...@tnetconsulting.net> wrote:

>They may have been asking to make sure that you knew which drive to replace.
>
>Sadly, I've worked behind people that were just going to pull a random
>drive and check the raid status. That's how they identify which drive
>is bad.

I worked with a guy years ago who headed for the server room saying he
was going to do that but quickly made it clear that he was pulling my
leg.

Scott

May 29, 2019, 12:46:51 PM
On Wed, 29 May 2019 13:43:00 GMT, hymie! <hy...@lactose.homelinux.net>
wrote:

>I have a machine. It's running RAID 5 or 6. A disk failed. My team of
>users has monitoring software that notices this, so they know a disk
>failed. A replacement disk has been ordered. Today I was asked (by one
>of my users) if I know which disk failed.
>
>Why the hell does it matter?

Just say yes. Do you know which disk failed? Yes. Yes I do.

Same answer you give a traffic cop when he asks, do you know how fast
you were going? Yes. Yes I do.

And leave it at that.

Wojciech Derechowski

May 29, 2019, 2:26:26 PM
On Wed, 29 May 2019 13:43:00 +0000, hymie! wrote:
> So ... Over the last two days, I got what I think are incredibly stupid
> follow-up questions.

One of the worst follow-up questions I can think of is what if... he put
forth his hand, and take also of the tree of life, and eat, and live
for ever...? or something to that effect, not to mention an extremely bad
case of induction that followed it.

WD
--
Who is Entscheidungs and what is his problem?

Steve VanDevender

May 30, 2019, 3:59:17 AM
There was also the time I was called in to help with a disk replacement.
Somehow these people had obtained a server that had no drive activity
lights or any visible numbering on the drive slots. The RAID controller
management utility told us which drive of the four had failed by number.
So we made a reasonable guess about how the numbering corresponded to
slots -- left-to-right as seen from the front. This was, unfortunately,
the wrong guess.

Peter Corlett

May 30, 2019, 4:34:17 AM
Roger Bell_West <roger+a...@nospam.firedrake.org> wrote:
> On 2019-05-29, hymie! wrote:
>> Why the hell does it matter?
> They might be curious to know what brand to avoid in their own machines.

This doesn't necessarily help that much, since said manufacturer uses a number
of different trading names. See also Calder Hall, Windscale and Sellafield.

Grant Taylor

May 30, 2019, 2:02:44 PM
On 5/30/19 1:59 AM, Steve VanDevender wrote:
> There was also the time I was called in to help with a disk
> replacement. Somehow these people had obtained a server that had no
> drive activity lights or any visible numbering on the drive slots.
> The RAID controller management utility told us which drive of the
> four had failed by number. So we made a reasonable guess about how
> the numbering corresponded to slots -- left-to-right as seen from
> the front. This was, unfortunately, the wrong guess.

Ew.

Every time I ran into that, I always went back to the identifiers on the
controller (for channel) and jumpers on the drive for ID.

Peter Corlett

May 30, 2019, 3:43:25 PM
Grant Taylor <gta...@tnetconsulting.net> wrote:
[...]
> Every time I ran into that, I always went back to the identifiers on the
> controller (for channel) and jumpers on the drive for ID.

"Jumpers on the drive" dates you somewhat. SATA and SAS are all point-to-point
links rather than the multidrop busses of yore, and drives no longer have any
interesting jumpers.

My preferred approach is to use a competent RAID system which indicates the
serial number of the bad disk (and not just those of the working disks), which
can then be compared with the serial number printed on the tiny label on the
edge of the drive, readable by any common-or-garden electron microscope.

I do occasionally use the "bugger this for a lark" disk-identification system
of momentarily yanking each disk in turn to see what turns up in the logs.
Again, this involves having selected a competent RAID system in the first place
which isn't stuck in the 1970s and doesn't fail-deadly.
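
The serial-number comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming someone wrote down which serial went into which bay when the drives were installed; the function name `find_slot`, the bay names, and the serial numbers are all made up for the example and do not come from any real RAID tool.

```python
def find_slot(failed_serial, slot_serials):
    """Return the slot whose recorded serial matches failed_serial, or None.

    Comparison ignores case and surrounding whitespace, since serials are
    often copied by hand from a tiny label.
    """
    wanted = failed_serial.strip().upper()
    for slot, serial in slot_serials.items():
        if serial.strip().upper() == wanted:
            return slot
    return None


# Hypothetical inventory, recorded when the drives were installed.
inventory = {
    "bay0": "ZA1X0001",
    "bay1": "ZA1X0002",
    "bay2": "ZA1X0003",
    "bay3": "ZA1X0004",
}

# The serial the RAID system reports for the failed disk (made up here).
print(find_slot(" za1x0003 ", inventory))  # bay2
```

A `None` result means the reported serial is not in the inventory, which is a hint to stop and check before pulling anything.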

Garrett Wollman

May 30, 2019, 5:05:05 PM
In article <qcpbor$pvv$1...@mooli.org.uk>,
Peter Corlett <ab...@mooli.org.uk> wrote:

>My preferred approach is to use a competent RAID system which indicates the
>serial number of the bad disk (and not just those of the working disks), which
>can then be compared with the serial number printed on the tiny label on the
>edge of the drive, readable by any common-or-garden electron microscope.

I prefer to use a competently integrated chassis or drive shelf that
has locator lights for each disk, and use frfhgvy to flash the light
for the bad disk. (But I also try to assign software labels that
reflect the physical location.)

-GAWollman

--
Garrett A. Wollman | "Act to avoid constraining the future; if you can,
wol...@bimajority.org| act to remove constraint from the future. This is
Opinions not shared by| a thing you can do, are able to do, to do together."
my employers. | - Graydon Saunders, _A Succession of Bad Days_ (2015)

The Horny Goat

May 30, 2019, 5:19:04 PM
On Thu, 30 May 2019 19:43:23 -0000 (UTC), ab...@mooli.org.uk (Peter
Corlett) wrote:

>> Every time I ran into that, I always went back to the identifiers on the
>> controller (for channel) and jumpers on the drive for ID.
>
>"Jumpers on the drive" dates you somewhat. SATA and SAS are all point-to-point
>links rather than the multidrop busses of yore, and drives no longer have any
>interesting jumpers.

Indeed - the most recent time I dealt with anything like "interesting
jumpers" was when I was working on my (personal) Apple II.

How times change! (wink)

Peter Corlett

May 31, 2019, 7:18:22 AM
Garrett Wollman <wol...@bimajority.org> wrote:
> Peter Corlett <ab...@mooli.org.uk> wrote:
>> My preferred approach is to use a competent RAID system which indicates the
>> serial number of the bad disk (and not just those of the working disks),
>> which can then be compared with the serial number printed on the tiny label
>> on the edge of the drive, readable by any common-or-garden electron
>> microscope.

> I prefer to use a competently integrated chassis or drive shelf that has
> locator lights for each disk, and use frfhgvy to flash the light for the bad
> disk. (But I also try to assign software labels that reflect the physical
> location.)

Oh to have a hardware budget expansive enough to cover such fripperies. Why, I
bet you even have a safe electricity supply and easy physical access to the
equipment. Next you're going to tell me that you haven't ended up using Frntngr
because even though they're not fit for purpose, they're 20% cheaper than the
stuff that actually works and are already in stock and need using up.

Chris Adams

May 31, 2019, 9:52:26 AM
Once upon a time, Peter Corlett <ab...@mooli.org.uk> said:
>I do occasionally use the "bugger this for a lark" disk-identification system
>of momentarily yanking each disk in turn to see what turns up in the logs.
>Again, this involves having selected a competent RAID system in the first place
>which isn't stuck in the 1970s and doesn't fail-deadly.

My "no idea which drive is which" method is to run something on the
system continuously reading the drive (like dd if=/dev/sda
of=/dev/null), and watch the drive activity LEDs. Pull the drive(s)
with no LED lit!
--
Chris Adams <cma...@cmadams.net>
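
For anyone without dd to hand, the same "keep one drive busy and watch the lights" trick can be sketched in Python. This is a rough equivalent under stated assumptions, not what the poster ran: the function name `keep_drive_busy` and its parameters are invented for the example, reading a raw device such as /dev/sda normally needs root, and on a small file the OS page cache may serve repeat reads without touching the disk at all.

```python
import time


def keep_drive_busy(path, seconds, chunk_size=1024 * 1024):
    """Read `path` sequentially for roughly `seconds`, wrapping at EOF.

    Returns the total number of bytes read. The data is discarded, as
    with `dd if=... of=/dev/null`.
    """
    total = 0
    at_start = True
    deadline = time.monotonic() + seconds
    # buffering=0 gives an unbuffered raw file object, so each read() is
    # one request to the OS rather than a slice of Python's own buffer.
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() < deadline:
            chunk = dev.read(chunk_size)
            if chunk:
                total += len(chunk)
                at_start = False
            elif at_start:
                break  # nothing readable at all; give up
            else:
                dev.seek(0)  # reached the end; start over
                at_start = True
    return total


# Example (assumed device name; run as root and adjust to taste):
# keep_drive_busy("/dev/sda", 60)
```

Run it against each known-good device in turn, or against the array as a whole, and the drive whose LED stays dark is the candidate to pull.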

Grant Taylor

May 31, 2019, 10:47:53 AM
On 5/31/19 7:52 AM, Chris Adams wrote:
> My "no idea which drive is which" method is to run something on
> the system continuously reading the drive (like dd if=/dev/sda
> of=/dev/null), and watch the drive activity LEDs. Pull the drive(s)
> with no LED lit!

That's my preferred method.

But it does require drive activity LEDs. I've had more than one
occasion where I didn't have that luxury.

Garrett Wollman

May 31, 2019, 1:04:48 PM
In article <qcr2ht$nij$1...@mooli.org.uk>,
Peter Corlett <ab...@mooli.org.uk> wrote:

>Oh to have a hardware budget expansive enough to cover such fripperies. Why, I
>bet you even have a safe electricity supply and easy physical access to the
>equipment.

Theoretically. Except that about half of the servers are in a remote
DC 90 miles away, and I'm one of only two people from our group who
are authorized. Luckily we have "remote hands" there but they're
mostly good for pushing buttons and taking pictures of consoles. Oh,
and stuff in the remote DC has 208 power and IEC connectors on the
PDU, which is fine for normal servers but not so great for things that
require wall warts. (Do your PDUs have CEE 7/[357], BS1363, or IEC
60320?)

>Next you're going to tell me that you haven't ended up using Frntngr
>because even though they're not fit for purpose, they're 20% cheaper
>than the stuff that actually works and are already in stock and need
>using up.

Actually, Frntngr is our preferred vendor, but the integrators we work
with seem to prefer JQ these days. And now we're building more
SSD-only fileservers so there's a completely different set of vendors
whose modes of suckitude we haven't yet identified.

Grant Taylor

May 31, 2019, 1:47:14 PM
On 5/31/19 11:04 AM, Garrett Wollman wrote:
> Theoretically. Except that about half of the servers are in a remote
> DC 90 miles away, and I'm one of only two people from our group who
> are authorized. Luckily we have "remote hands" there but they're
> mostly good for pushing buttons and taking pictures of consoles.

That's when a good OoB console / remotely managed PDUs /
iDRAC/iLOM/iLO/etc. are nice things to have.

> Oh, and stuff in the remote DC has 208 power and IEC connectors on
> the PDU, which is fine for normal servers but not so great for things
> that require wall warts. (Do your PDUs have CEE 7/[357], BS1363,
> or IEC 60320?)

That means that there is extremely likely 3ɸ power to the DC, feeding
PDUs with 1ɸ wired across two legs. I'm betting that each ɸ is 120 VAC
to ground. This means that you can use a C14 to NEMA 5-15 adapter like
the following to connect wall warts.

https://www.amazon.com/ACA1017-Adapter-Official-Certification-Standard/dp/B07DCWXTYM

Obviously, confirm with the facility electrician.

I've got a handful of these in my DC.
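
The relationship between the two figures in this post can be checked with a line of arithmetic: in a balanced three-phase system, the voltage between two legs is the leg-to-neutral voltage times the square root of three. The sketch below only verifies that 120 V legs give roughly the "208" quoted for the PDU; the function name is made up for the example, and it says nothing about whether a particular wall wart accepts that voltage.

```python
import math


def line_to_line_voltage(line_to_neutral_volts):
    """Voltage between two legs of a balanced three-phase supply."""
    return math.sqrt(3) * line_to_neutral_volts


print(round(line_to_line_voltage(120.0), 1))  # 207.8
```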

Peter Corlett

Jul 8, 2019, 3:29:07 AM
Garrett Wollman <wol...@bimajority.org> wrote:
> Peter Corlett <ab...@mooli.org.uk> wrote:
>> Oh to have a hardware budget expansive enough to cover such fripperies. Why,
>> I bet you even have a safe electricity supply and easy physical access to
>> the equipment.

(To clarify, I was referring to my domestic kit. The stuff in datacentres is
rented and therefore dealing with the hardware is Somebody Else's Problem.)

[...]
> (Do your PDUs have CEE 7/[357], BS1363, or IEC 60320?)

"PDU" is a fancy name for an extension lead. Those are a mix of BS1363 and CEE
7/7. Which are plugged into Ol' Sparky CEE 7/1 sockets because earth
connections or indeed building wiring newer than 1964 is for wusses. No wonder
that one of the flats in the block goes up in flames every few years. I intend
to move out before this one joins them.

[...]
> Actually, Frntngr is our preferred vendor, but the integrators we work with
> seem to prefer JQ these days. And now we're building more SSD-only
> fileservers so there's a completely different set of vendors whose modes of
> suckitude we haven't yet identified.

My admittedly relatively limited experience with SSDs is that data which is not
also backed up to hard disk might as well not exist. The phrase "RAID is not a
backup" applies in spades with SSDs.

Sir Chewbury Gubbins

Jul 11, 2019, 8:35:37 AM
Peter Corlett <ab...@mooli.org.uk> ejaculated:
>
> My admittedly relatively limited experience with SSDs is that data which is not
> also backed up to hard disk might as well not exist. The phrase "RAID is not a
> backup" applies in spades with SSDs.

</lurk> I did once enjoy a long, confused, blinking session at a $coworker
who thought it would be a great idea to run SSDs in a mirrorset. <lurk>

J

--
John Dow <j...@nelefa.org.invalid>
... Blog & Game Diary : http://www.nelefa.org
/|\ Constructed using Mutt, Tin and Vi.
/ | \ Zomoniac is Wrong. Fact.

Chris Adams

Jul 11, 2019, 11:07:05 AM
Once upon a time, Sir Chewbury Gubbins <chewbury...@nelefa.org> said:
></lurk> I did once enjoy a long, confused, blinking session at a $coworker
>who thought it would be a great idea to run SSDs in a mirrorset. <lurk>

Why wouldn't you run SSDs in a mirror, assuming a proper RAID setup that
supports SSDs (for example, can pass down TRIM)?

RAID is about high availability... most things don't handle a filesystem
going away very well, so RAID allows the system to continue to operate
while you replace failed drives. You can (and should) have HA above the
single system layer as well, but usually failure at that level is at
least somewhat disruptive.

--
Chris Adams <cma...@cmadams.net>

Alexander Schreiber

Jul 13, 2019, 12:10:05 PM
Sir Chewbury Gubbins <chewbury...@nelefa.org> wrote:
> Peter Corlett <ab...@mooli.org.uk> ejaculated:
>>
>> My admittedly relatively limited experience with SSDs is that data which is not
>> also backed up to hard disk might as well not exist. The phrase "RAID is not a
>> backup" applies in spades with SSDs.
>
></lurk> I did once enjoy a long, confused, blinking session at a $coworker
> who thought it would be a great idea to run SSDs in a mirrorset. <lurk>

Why not? Knowing that SSDs tend to fail quietly and totally (whereas
spinning rust usually warns you with bad blocks before entirely dying),
putting them in a mirror at least gives you a chance to survive the failure
of one of them and continue to run (and then quickly replace the failed
one). If both fail, well, that's what your backups are for. Sure, RAID
is not backup, but it can do wonders for service availability. Having
to cold restore from backup tends to be somewhat disruptive, usually.

Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison

Peter Corlett

Jul 16, 2019, 7:01:45 PM
Sir Chewbury Gubbins <chewbury...@nelefa.org> wrote:
> Peter Corlett <ab...@mooli.org.uk> ejaculated:
>> My admittedly relatively limited experience with SSDs is that data which is
>> not also backed up to hard disk might as well not exist. The phrase "RAID is
>> not a backup" applies in spades with SSDs.

> </lurk> I did once enjoy a long, confused, blinking session at a $coworker
> who thought it would be a great idea to run SSDs in a mirrorset. <lurk>

Check out uggcf://jjj.nznmba.qr/qc/O07Q998212. €88 per terabyte. Prime Day has
already ended on this side of the North Sea, so that's the regular deal.
Welcome to the future.

At that sort of price, and with the commensurate reliability of all
slightly-too-cheap consumer-grade storage, you're a fool to not buy a second
and mirror them.

Mans Nilsson

Jul 23, 2019, 9:33:39 AM
Den 2019-05-30 skrev Satya <sat...@satyaonline.cjb.net>:

> Lrnu zl jvsr nfxf zr gevivny qrgnvyf yvxr gung (nqzvggrqyl fbzr ner
> aba-gevivny) naq V'z bire urer guvaxvat V unir orggre guvatf gb erzrzore, yvxr
> gur rknpg bcgvbaf V arrq sbe eflap gb qb gur evtug guvat.

FJZOB vf n ybg orggre jvgu eflap guna lbhef gehyl. V raq hc nfxvat ure,
be erfbegvat gb gne va cvcrf.

--
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE SA0XLR +46 705 989668
Content: 80% POLYESTER, 20% DACRONi ... The waitress's UNIFORM sheds
TARTAR SAUCE like an 8" by 10" GLOSSY ...