On Aug 9, 2:44 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> Alex McDonald <b...@rivadpm.com> writes:
> >> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> >> Alex McDonald <b...@rivadpm.com> writes:
> >> [...]
> >> >Then the best of luck getting DHL to deliver your 4TB drive in one
> >> >piece.
>
> >> The drives are delivered to us in one piece; why wouldn't they be
> >> delivered elsewhere in one piece?
>
> >They don't contain your data.
>
> So what? If it's broken, I send another one; it's a backup. It's
> redundant, and it's definitely not the only backup.
>
> >> Sounds like you swallowed some horror stories some people like to
> >> spin. Why should spin down exacerbate these problems?
>
> >Several reasons.
>
> >Rated start/stop cycles; 250 average on/off cycles per year at the
> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
> >class drive).
>
> What does AFR have to do with the horror stories about corrupted data?
AFR includes corrupted data.
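For scale, here's a back-of-the-envelope sketch of what the 0.55% population AFR quoted above means for a fleet (the fleet size is an illustrative assumption, not a figure from this thread):

```python
# Expected annual drive failures for a fleet at a given population AFR.
# The 0.55% default is the Seagate Cheetah figure quoted above; the
# fleet size passed in is purely illustrative.
def expected_annual_failures(fleet_size, afr=0.0055):
    return fleet_size * afr

# A 10,000-drive installation would expect roughly 55 failures a year,
# some fraction of which surface as corrupted or unreadable data.
failures = expected_annual_failures(10_000)
```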
> And anyone who uses "enterprise class" drives for backup has too much
> money.
Why? Many operations value data integrity above the cost, so this is
an economic argument, not one of wealth causing stupidity.
>
> >Low temperature operation; the AFR increases significantly (5 times
> >the AFR at <20C to those running >40C, Google study on 100000 desktop
> >class drives), and spun down drives will be cooler during early hours
> >of operation.
>
> Fortunately even our spun-down drives have a higher temperature. And
> again, what does AFR have to do with your horror stories about
> corrupted data?
AFR includes corrupted data.
>
> >Slow spin; heads are designed for flight at a given RPM. Slow spin
> >reduces the air cushion/head height and makes the drives more
> >susceptible to shock. Even at full speed they can be shouted into
> >submission;
> >http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded
>
> What do head crashes resulting from shock have to do with the horror
> stories about corrupted data?
Shock can cause high-flying writes, the exact opposite of a head
crash: the data simply isn't written. What the video demonstrates is
the effect of drive error recovery (which will be successful only if
the software is up to the task, something most OSes find hard to deal
with) on response time as disks fail to write data.
And AFR includes corrupted data. I'm mystified; where did I say that
corrupted data was the only issue?
>
> >Due to commercial NDAs and other reasons, I can't do any better than
> >point you at what is publicly available. Our AFRs are much lower for a
> >variety of reasons; dual parity RAID
>
> How does RAID make individual drives more reliable?
It doesn't. It makes them collectively more reliable.
>
> >> BTW, in my experience (based on several occasions) the most frequent
> >> cause of corrupted disk blocks is due to misdesigned drives that do not
> >> react correctly to power fluctuations.
>
> >That is rarely a problem on a well designed storage array, where the
> >power management is more sophisticated than that of a server. Pulling
> >the plug on such a system should have no deleterious effects.
>
> It's also not a problem for well-designed disk drives, but yes, to
> some extent the power supply can alleviate the problems coming from
> misdesigned drives; but if the problem is between the power supply and
> the drive (i.e., a suboptimal power connection), the misdesigned drive
> will still produce corrupt blocks.
>
>
Caveat emptor.
> >> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
> >> own measurements are in the same ballpark), and an average power
> >> consumption of 8W for the 3TB model. It takes about 10s to spin up a
> >> drive, so spinning up takes as much as running a disk for half a
> >> minute.
>
> >Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
> >drives at 15K, which may take minutes to stabilize at operating speed.
> >During that time, the disk isn't usable, and I stand by my assertion
> >that spin up wastes as much power as several minutes of full
> >operation.
>
> Sure, if a drive takes several minutes to spin up, it will consume as
> much power as several minutes of full operation.
>
> But who in his right mind uses an expensive and power-hungry high-RPM
> drive that takes forever to spin up for a storage solution that
> requires low power and fast spin-up? Ok, a sales guy selling to a
> clueless and rich customer will do it, but not because of technical
> merit.
I was giving an example of slow spin-up as a counterpoint to the "10
seconds and you're good to go" example you gave.
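For the record, even the fast case is not free. Using the figures quoted earlier in the thread (2A @ 12V, i.e. 24W peak during a 10s spin-up, against an 8W running average), the energy cost of one spin-up works out as:

```python
# Spin-up energy vs. steady-state running, from the figures quoted
# earlier in the thread (worst case; real draw varies by drive).
SPIN_UP_POWER_W = 24.0   # 2 A @ 12 V during spin-up
SPIN_UP_TIME_S = 10.0    # time to reach operating speed
AVG_POWER_W = 8.0        # average running consumption

spin_up_energy_j = SPIN_UP_POWER_W * SPIN_UP_TIME_S   # 240 J
equivalent_run_s = spin_up_energy_j / AVG_POWER_W     # 30 s of running
```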
Spinning up a RAID group of, say, 14 drives on a shelf of disks
requires that the drives be turned on serially in small groups. By the
time they're all turned on and ready to go, regardless of whether
they're SATA or SAS, enterprise or desktop, slow or fast, a certain
amount of time will have elapsed. In the case of systems I know and
understand -- the majority of commercially available systems --
minutes will have passed during which there has been (a) no productive
work and (b) higher than average power consumption. Then there's the
decision on when to power down; that's made after a period of
inactivity, during which there has been no productive work and
continued power consumption.
All spin-down/up schemes for infrequently accessed data have to
account for these issues, and none do so in any effective way since
crystal balls aren't part of the armoury of most storage management
systems. That's where the cluelessness plays its part.
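A minimal sketch of that dead time, assuming drives come up in pairs at 10s per group (both figures are illustrative; real shelves stagger differently to limit inrush current):

```python
import math

def shelf_ready_time_s(n_drives=14, group_size=2, spin_up_s=10.0):
    """Drives power on serially in small groups to limit inrush
    current; the RAID group serves no I/O until the last group is up.
    Group size and per-group spin-up time are assumed examples."""
    groups = math.ceil(n_drives / group_size)
    return groups * spin_up_s

# 14 drives in pairs: 7 groups x 10 s = 70 s of unproductive,
# above-average power draw before the first byte moves.
ready = shelf_ready_time_s()
```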
>
> >> With the power supply you would need for the 480 tape
> >> drives, yes, you could spin them all up at the same time. But this is
> >> typically not needed, certainly not for a saner backup management (but
> >> neither are the 480 tape drives).
>
> >I don't know where you got the idea that 480 tape drives was the
> >equivalent to 480 disk drives, but it's not an assertion I made and
> >certainly qualifies as insane.
>
> You claimed that lots of disks had to be spun up for bandwidth
> reasons, and you wrote:
>
> |It's the economics of competing with tape; big power supplies to
> |support 480 disks packed in a single rack cost lots of money.
>
> which suggests that you think that a backup solution needs 480 disks
> spun up for bandwidth reasons.
No, that was the COPAN solution. (IIRC it was the smallest COPAN
system you could buy.) Streaming backups is not a difficult task; if
all you have is a single stream, then a couple of active disks will
do. For 100s of streams to a single backup system, then you need a lot
more, and the task is correspondingly more complicated to achieve at
decent speeds.
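The sizing arithmetic is simple enough to sketch (the per-stream and per-disk rates below are assumed round numbers, not measurements from any particular system):

```python
import math

def spindles_needed(n_streams, stream_mb_s, disk_write_mb_s=100.0):
    """Active disks needed to absorb concurrent backup streams at full
    rate. The per-disk sequential write rate is an assumed figure."""
    return math.ceil(n_streams * stream_mb_s / disk_write_mb_s)

few = spindles_needed(1, 50)     # one stream: a couple of disks will do
many = spindles_needed(300, 50)  # 100s of streams: far more spindles
```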
>
> >> >For the occasional server with a handful of disk
> >> >drives, it's not so much of a problem, but at scale, even a moderate
> >> >scale in the 10s of TB range, it's unworkable.
>
> >> That's nonsense.
>
> >Why? The limiting factor isn't the disk or tape that you're backing up
> >to, but how fast you can shovel it off the server.
>
> It's nonsense, because we are backing up to disks with a total of 10s
> of TB, and it's workable, and if we wanted to back up to more disks,
> we would just use more disks. And the main bandwidth limit is, as you
> write, getting the data off the main storage.
That was my point. If you want off-server backup, then the bandwidth
off the server is the issue. That's what prevents very large disk
server systems from doing adequate and timely backups; not everyone
has a backup window. Adding more disks inside the same box isn't a
backup.
>
> >> >Again, the power economics
> >> >don't make sense for main storage where a complete stripe of 10s or
> >> >more of them need to be spun up to get at a single 4K file.
>
> >> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
> >> sense.
>
> >Ignoring RAID-0, since RAID-any systems also stripe,
>
> RAID-1 doesn't.
True, if a nitpick, since in RAID-1 the stripe is the mirror.
>
> >the problem is
> >that such files do get spread across an unknown number of disks.
>
> With typical block sizes, a 4KB block is not distributed across
> multiple disks, even with RAID-0.
It would appear on at least 3 disks in most modern systems using large
multi TB disks with adequate protection like RAID-6. Once as a data
block, and twice for its contribution to parity. It's at least 2 on
RAID-5 or RAID-1/10; it may be many more on systems that employ
erasure coding schemes. Without meta data (see below), it's not
possible to tell which disks to fire up to cover the blocks in
question; and the meta data is on the disks, normally well distributed
over them to increase opportunities for parallelism.
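To illustrate the "at least 3 disks" point, a sketch of one rotating-parity layout (layouts differ between implementations; this particular rotation is an assumption for illustration, not any vendor's actual scheme):

```python
def raid6_disks_for_block(chunk_index, n_disks):
    """Map a logical chunk to the disks it touches in a RAID-6 group
    with rotating P and Q parity: one data disk plus two parity disks.
    The rotation shown here is one illustrative layout among several."""
    data_per_stripe = n_disks - 2
    stripe = chunk_index // data_per_stripe
    p = (n_disks - 1 - stripe) % n_disks   # P parity rotates per stripe
    q = (p + 1) % n_disks                  # Q sits next to P
    data_slots = [d for d in range(n_disks) if d not in (p, q)]
    data = data_slots[chunk_index % data_per_stripe]
    return {"data": data, "P": p, "Q": q}

# Every small write lands on three distinct disks: its data disk plus
# both parity disks, all of which must be spun up to service it.
placement = raid6_disks_for_block(0, 8)
```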
>
> >They
> >all need to be fired up to find even the smallest file, since it's
> >not just the file, but the meta data that needs to be accessed too.
>
> Meta data is often in OS caches, at least on decent OSs.
In shared system environments, caches can and do contain stale
information; coherency is a big issue, and high end clusters (both
storage and server types) spend a lot of expensive compute and wire
time (and presumably power) making sure that they are consistent.
Plus, infrequently used data should be flushed, along with its meta
data; if you don't need the former, you're unlikely to need the latter
any time soon.
>
> But yes, I agree that spin-down is not practical for main storage; but
> from what I read, the idea of COPAN was to make it practical by
> rearranging data such that frequently-accessed data resides on a few
> drives.
At last! Agreement! Yes, that was the very thing they failed to
accomplish.