
Implementing virtual memory on cassette tape


chitselb

Aug 7, 2012, 9:21:33 AM
I'm working on a retro computing project, a 6502 Forth implementation for the Commodore PET 2001. https://github.com/chitselb/pettil if you're curious. The goal is for the language to be fast, tight, and capable of running on the actual hardware. For development I'm using the viceteam.org PET emulator with the xa65 cross-assembler, on Linux.

Since most of us back then (1980) didn't have disk drives, I am going to use the cassette tape for mass storage. These are a few ways I'm considering:

1) Simulate random access using two cassette decks and copy/merge

The PET cassette had two file types, sequential (data) and program.
a) For program files, there's a long tone followed by a short header block containing the filename, and then a shorter tone followed by one continuous block of memory (two-byte load address followed by the data).
b) For data files, there's the same long tone/filename header, followed by zero or more short tone/192-byte data blocks.

On the PET (not the VIC-20 or C=64) there were two datassette ports, and I have two drives. Using the sequential file format and both decks, FLUSH would copy the entire virtual memory from one tape to the other in 1024-byte blocks (preceded by a 16-bit unsigned block number), inserting and replacing blocks from the memory buffers. Then rewind both tapes and go the other way. Slow, tedious, cumbersome. Welcome to my world in 1980.
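
A quick simulation may make the copy/merge pass concrete. This is illustrative only: `flush_pass` and the data layout are invented for the sketch, and the real thing would be Forth driving the PET tape ROM, not Python.

```python
# Each "tape" is modelled as a list of (block_number, payload) records,
# read and written strictly in sequence, like the PET's sequential files.

def flush_pass(source_tape, dirty_buffers):
    """One FLUSH pass: copy source_tape to the other deck, substituting
    any block held dirty in memory and appending brand-new blocks."""
    pending = dict(dirty_buffers)            # block number -> 1024-byte payload
    output_tape = []
    for block_num, payload in source_tape:   # sequential read from deck 1
        output_tape.append((block_num, pending.pop(block_num, payload)))
    for block_num in sorted(pending):        # blocks not yet on tape
        output_tape.append((block_num, pending[block_num]))
    return output_tape                       # sequential write to deck 2

source = [(0, b'a' * 1024), (1, b'b' * 1024), (2, b'c' * 1024)]
merged = flush_pass(source, {1: b'X' * 1024, 7: b'Y' * 1024})
# merged now holds blocks 0, 1 (replaced), 2, and the new block 7
```

Going "the other way" is the same pass with the decks swapped; every FLUSH is a full read of one tape and a full write of the other.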

2) Historically accurate

Some Forth implementations from back then implemented tape storage. I have been unable to locate one for the PET but yesterday I found tape images for Datatronic Forth on the C=64 and another thing called "C=64 Forth". Both of these appear to implement some type of mass storage on tape.

I'd be very interested to know what other Forth implementations of that era did as far as tape storage. What Forth words, what did they do, etc...

3) Save source code as sequential files

Using native named files instead of blocks. Not very Forth-like, but possibly the most expedient.

I'm very grateful for the help of this community with my earlier design considerations (circa 2010) on this project, particularly the hashed dictionary and the incredibly fast inner interpreter. Check the project link above if you're curious to see how those parts turned out.

Charlie

Andrew Haley

Aug 7, 2012, 9:44:19 AM
chitselb <chit...@gmail.com> wrote:

> I'm working on a retro computing project, a 6502 Forth
> implementation for the Commodore PET 2001.
> https://github.com/chitselb/pettil if you're curious. The goal is
> for the language to be fast, tight, and capable of running on the
> actual hardware. For development I'm using the viceteam.org PET
> emulator with the xa65 cross-assembler, on Linux.
>
> Since most of us back then (1980) didn't have disk drives, I am
> going to use the cassette tape for mass storage.

I've ported a Forth for PET.

> These are a few ways I'm considering:
>
> 1) Simulate random access using two cassette decks and copy/merge
>
> The PET cassette had two file types, sequential(data) and program.
> a) For program files, there's a long tone followed by a short header
> block containing the filename, and then a shorter tone followed by
> one continuous block of memory (two byte load address followed by
> the data)
> b) for data files, there's the same long tone/file name header,
> followed by zero or more short tone/192-byte data blocks
>
> On the PET (not the VIC-20 or C=64) there were two datassette ports,
> and I have two drives. Using the sequential file format and both
> decks, FLUSH would copy the entire virtual memory from one tape to
> the other in 1024-byte blocks (preceded by a 16-bit unsigned block
> number), inserting and replacing blocks from the memory buffers.
> Then rewind both tapes and go the other way. Slow, tedious,
> cumbersome. Welcome to my world in 1980.

God, no. I used floppies at the time, and they worked OK. Cassette
storage and Forth do not work at all well together.

Andrew.

Anton Ertl

Aug 7, 2012, 10:01:15 AM
chitselb <chit...@gmail.com> writes:
>I'm working on a retro computing project, a 6502 Forth implementation for the Commodore PET 2001. https://github.com/chitselb/pettil if you're curious. The goal is for the language to be fast, tight, and capable of running on the actual hardware. For development I'm using the viceteam.org PET emulator with the xa65 cross-assembler, on Linux.
>
>Since most of us back then (1980) didn't have disk drives, I am going to use the cassette tape for mass storage.

On my C64, I used the datasette for half a year, then got a floppy
drive. The datasette was not working reliably in the end anyway.

>1) Simulate random access using two cassette decks and copy/merge

The datasette was already unbearably slow with sequential access;
simulated random access would be even slower.

>3) Save source code as sequential files
>
>Using native named files instead of blocks. Not very Forth-like, but possibly the most expedient.

Blocks were designed for disk drives. They are not very appropriate
for the datasette. Using named files where the underlying system
provides them seems more Forth-like to me than using disk-inspired
blocks on a tape-based system.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Mark Wills

Aug 7, 2012, 10:24:28 AM
On Aug 7, 2:21 pm, chitselb <chits...@gmail.com> wrote:
> I'm working on a retro computing project, a 6502 Forth implementation for the Commodore PET 2001. https://github.com/chitselb/pettil if you're curious. The goal is for the language to be fast, tight, and capable of running on the actual hardware. For development I'm using the viceteam.org PET emulator with the xa65 cross-assembler, on Linux.
Hmm... This sounds like a really bad idea to me! Sorry to sound so
negative!

If I were FORCED at GUN-POINT to implement block storage on
cassette, I would take the Forth approach and keep it very simple...

"You want an 80-block cassette-based block system? Okay, give me 80
five-minute cassette tapes. One tape per block. Screen prompts will tell
you which cassette to put in the tape deck. Now, please stop pointing
that gun at me."

Stan Barr

Aug 7, 2012, 11:30:36 AM
On Tue, 7 Aug 2012 06:21:33 -0700 (PDT), chitselb <chit...@gmail.com> wrote:
>
> I'd be very interested to know what other Forth implementations of that era
> did as far as tape storage. What Forth words, what did they do, etc...

I ran a converted ZX81 with a Skywave Forth ROM for a while. I'll try
and find the manual and see how it worked.

Skywave Forth was a multi-tasking, windowing Forth. Multiple
applications running in separate windows on a ZX-81 was something to
behold! Still got the computer and it still works, except the add-on
keyboard has grown a fault. (16K ram, 16 in/out lines and 8-port a2d
converter too...)

--
Cheers,
Stan Barr plan.b .at. dsl .dot. pipex .dot. com

The future was never like this!

Stan Barr

Aug 7, 2012, 1:36:19 PM
On 7 Aug 2012 15:30:36 GMT, Stan Barr <pla...@dsl.pipex.com> wrote:
> On Tue, 7 Aug 2012 06:21:33 -0700 (PDT), chitselb <chit...@gmail.com> wrote:
>>
>> I'd be very interested to know what other Forth implementations of that era
>> did as far as tape storage. What Forth words, what did they do, etc...
>
> I ran a converted ZX81 with a Skywave Forth ROM for a while. I'll try
> and find the manual and see how it worked.
>
> Skywave Forth was a multi-tasking, windowing Forth. Multiple
> applications running in separate windows on a ZX-81 was something to
> behold! Still got the computer and it still works, except the add-on
> keyboard has grown a fault. (16K ram, 16 in/out lines and 8-port a2d
> converter too...)
>

Couldn't locate the manual, but see here...

http://www.dibsco.co.uk/index.php/skywave-software/78-forth-general/forth-skywave/73

Jason Damisch

Aug 7, 2012, 2:52:27 PM

You can assume, for the sake of the simulation, that you didn't get your computer from K-Mart or Toys-R-Us and didn't have enough to pay for a disk drive, so went with the tape drive. LOL

Paul Rubin

Aug 7, 2012, 3:39:15 PM
The PET and microcomputer cassette interfaces were before my time but
looking at

http://en.wikipedia.org/wiki/Datassette

was interesting. It was apparently a standard cassette recorder with
some adc/dac's and a special edge connector. In particular the article
doesn't say whether the computer could start and stop the tape transport
or rewind the tape under software control. Do you know if that was
possible?

If it wasn't, I don't think the datasette was usable for much
other than a program loader. In particular, the usual things one did
with magtape, such as merge sorting, required controlling the transport.
Merge sorting involved reading blocks of data from 2 drives, merging
them, and writing the output to a third drive. That meant you had to
write to the output drive twice as fast as you were reading from the
individual input drives, so you had to control the drive speed or be
able to start and stop the tape when you read a block.
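
The merge phase described above can be sketched as a toy balanced two-way merge sort, with Python lists standing in for sequential tape runs (the name `tape_merge_sort` and the run sizes are invented for the sketch):

```python
import heapq

def tape_merge_sort(records, run_length):
    """Sort the magtape way: cut the input into sorted runs, then merge
    runs pairwise, pass after pass, until a single run remains."""
    runs = [sorted(records[i:i + run_length])
            for i in range(0, len(records), run_length)]
    while len(runs) > 1:
        # Each pass reads two input "tapes" strictly sequentially and
        # writes one output "tape" at roughly twice the rate of either
        # input, which is why transport control mattered.
        merged = [list(heapq.merge(a, b))
                  for a, b in zip(runs[::2], runs[1::2])]
        if len(runs) % 2:                    # odd run carried forward
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []
```

On real drives each pass is a complete read of both input reels and a complete write of the output reel, so the number of passes, not the comparisons, dominates the cost.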

I did see some 9-track magtape drives back in the day and they had
powerful motors and tape slack mechanisms to be able to start and stop
the tape quickly. I don't think audio recorders were built anything
like that.

Paul Rubin

Aug 7, 2012, 3:55:35 PM
Paul Rubin <no.e...@nospam.invalid> writes:
> Merge sorting involved reading blocks of data from 2 drives, merging
> them, and writing the output to a third drive. That meant you had to
> write to the output drive twice as fast as you were reading from the
> individual input drives,

Should have added, of course there were fancier methods with more than 3
drives. Maybe there were ways to do it with 4 drives, reading from two
and simultaneously writing to two. A lot of Knuth vol 3 is about how to
do stuff like this. The most recent edition says something like: nobody
uses those methods any more, but they're still in the book just in case
they come back into use someday, and because the methods were so
interesting that it's worth preserving the info.

Bernd Paysan

Aug 7, 2012, 4:00:39 PM
Paul Rubin wrote:
> The most recent edition says something like:
> nobody uses those methods any more, but they're still in the book just
> in case they come back into use someday, and because the methods were
> so interesting that it's worth preserving the info.

It should be noted that current hard disks already have somewhat similar
performance to these old tape drives: In the time you can do *one* seek,
you can also read/write a megabyte of data.
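
The rule of thumb checks out with round numbers (assumed for the sketch, not taken from the post: a ~10 ms average seek and a ~100 MB/s sequential transfer rate):

```python
seek_ms = 10                        # one average seek, assumed ~10 ms
mb_per_second = 100                 # assumed sequential rate, ~100 MB/s
mb_lost_per_seek = mb_per_second * seek_ms / 1000
print(mb_lost_per_seek)             # prints 1.0: one seek costs a megabyte
```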

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Anton Ertl

Aug 8, 2012, 3:08:09 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>The PET and microcomputer cassette interfaces were before my time but
>looking at
>
> http://en.wikipedia.org/wiki/Datassette
>
>was interesting. It was apparently a standard cassette recorder with
>some adc/dac's and a special edge connector. In particular the article
>doesn't say whether the computer could start and stop the tape transport
>or rewind the tape under software control. Do you know if that was
>possible?

IIRC it was not.

Paul Rubin

Aug 8, 2012, 3:27:58 AM
Bernd Paysan <bernd....@gmx.de> writes:
> It should be noted that current hard disks already have somewhat similar
> performance to these old tape drives: In the time you can do *one* seek,
> you can also read/write a megabyte of data.

There is a more recent saying, "flash is disk, disk is tape, tape is
dead". I think disk is still "disk-like" in important ways, though. At
least on serious tape drives, "seeking" (spinning) a tape from one end
to the other was at most a few times faster than actually reading the
whole tape. Seeking across a disk is many orders of magnitude faster
than reading the disk.

Also, you can do quite a lot of disk seeks in the time it would take to
fill memory from disk transfers, on today's computers with gigabytes of
ram. That means you can treat a disk as "multiple" tape drives for
external sort/merge: in the merge phase you'd read a few hundred
contiguous megabytes of data into ram, then seek to another run
elsewhere on the disk, read that into ram, do the same with several more
runs until ram is full, then merge from ram to a new combined run on the
disk.

I saw in some storage newsletter earlier today that tape is essentially
not used any more even for backup, which surprised me. It's apparently
only used for massive archiving (I guess meaning long term retention of
data that probably won't be accessed anytime soon, as distinct from
backups which are for short term use in case a disk crashes).

Mark Wills

Aug 8, 2012, 4:23:52 AM
On Aug 7, 8:39 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
It could start and stop the tape, but nothing else. No fast-forward or
rewind via software. That had to be done by a human. Loading from
cassette tape was a big part of my computing life as a young child. I
shudder when I think of it now!

Mark Wills

Aug 8, 2012, 4:26:17 AM
On Aug 8, 8:27 am, Paul Rubin <no.em...@nospam.invalid> wrote:
That's interesting. Is there any evidence to suggest that tape is less
susceptible to bit-rot than, say, flash?

Alex McDonald

Aug 8, 2012, 5:23:32 AM
On Aug 8, 8:27 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Bernd Paysan <bernd.pay...@gmx.de> writes:
> > It should be noted that current hard disks already have somewhat similar
> > performance to these old tape drives: In the time you can do *one* seek,
> > you can also read/write a megabyte of data.
>
> There is a more recent saying, "flash is disk, disk is tape, tape is
> dead".  I think disk is still "disk-like" in important ways, though.  At
> least on serious tape drives, "seeking" (spinning) a tape from one end
> to the other was at most a few times faster than actually reading the
> whole tape.  Seeking across a disk is many orders of magnitude faster
> than reading the disk.

I think you may have that back to front?

>
> Also, you can do quite a lot of disk seeks in the time it would take to
> fill memory from disk transfers, on today's computers with gigabytes of
> ram.

Err, no. Disk seeks are to be avoided, as they are horrendously
expensive in time; they're measured in low milliseconds at best. The bandwidth
still lags main memory on data transfer, but it's not far behind;
arrays that can transfer 10s to 100s of GBytes/sec (that's bytes, not
bits) are commonplace.

> That means you can treat a disk as "multiple" tape drives for
> external sort/merge: in the merge phase you'd read a few hundred
> contiguous megabytes of data into ram, then seek to antoher run
> elsewhere on the disk, read that into ram, do the same with several more
> runs until ram is full, then merge from ram to a new combined run on the
> disk.
>

It's a long time since I've seen a merge sort as you describe here.
It's bottlenecked on the write, and isn't practical. Building an index
is cheaper and provides the needed parallelism.

> I saw in some storage newsletter earlier today that tape is essentially
> not used any more even for backup, which surprised me.  It's apparently
> only used for massive archiving (I guess meaning long term retention of
> data that probably won't be accessed anytime soon, as distinct from
> backups which are for short term use in case a disk crashes).

Tape is not dead; far from it. It has some distinct advantages over
other media, foremost of which is a very low cost per bit and no power
requirements when not being accessed (ignoring disk spin down, which
in contrast is pretty much a dead technology). When the archive has to
be "deep and cold", tape wins. Disk costs are just about bearable for
medium term storage. Flash memory technologies are just too expensive
for petabyte scale backup and archive. There are new tape file systems
being developed; if you're interested, take a look at LTFS.

Alex McDonald

Aug 8, 2012, 5:31:53 AM
Economically, comparing flash and tape (today, anyway) doesn't stack
up, regardless of bit error rates. The advantage of flash and disk
over tape is parity or its equivalent; it can be maintained on a
different device. More important to designers of storage systems is
ensuring that what you think you wrote to the media actually got
written, and that if it didn't, or when devices fail, the data
they were supposed to contain can be recreated from elsewhere. Even
the flakiest of devices can be made to look very robust when you make
a pool out of them and use techniques such as erasure coding.
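
A minimal sketch of the pooling idea, using single XOR parity, the simplest erasure code (real systems use fancier codes, and the names here are invented):

```python
from functools import reduce

def parity(blocks):
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

devices = [b'\x01\x02', b'\x04\x08', b'\x10\x20']   # data on three devices
p = parity(devices)                                  # parity on a fourth

# Device 1 dies; rebuild its contents from the survivors plus parity.
rebuilt = parity([devices[0], devices[2], p])
assert rebuilt == devices[1]
```

Any single device can fail and be reconstructed, which is exactly what makes a pool of flaky devices look robust from the outside.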

Paul Rubin

Aug 8, 2012, 5:46:02 AM
Mark Wills <markrob...@yahoo.co.uk> writes:
> That's interesting. Is there any evidence to suggest that tape is less
> susceptible to bit-rot than say, flash?

Flash is prohibitively expensive for large archiving. "Large" in this
context means potentially 1000's of petabytes.

ken...@cix.compulink.co.uk

Aug 8, 2012, 6:06:13 AM
In article <7xipcui...@ruckus.brouhaha.com>, no.e...@nospam.invalid
(Paul Rubin) wrote:

> Do you know if that was
> possible?

Don't know about the PET, but it was certainly possible to start and
stop the tape on my Video Genie. However, the only file system supported
was sequential access. Personally I would abandon the idea of using
blocks on a tape system. Using a file system should make it possible to
use routines in the PET ROM to control the cassette.

Ken Young

Anton Ertl

Aug 8, 2012, 6:57:16 AM
Alex McDonald <bl...@rivadpm.com> writes:
>> I saw in some storage newsletter earlier today that tape is essentially
>> not used any more even for backup, which surprised me. It's apparently
>> only used for massive archiving (I guess meaning long term retention of
>> data that probably won't be accessed anytime soon, as distinct from
>> backups which are for short term use in case a disk crashes).
>
>Tape is not dead; far from it. It has some distinct advantages over
>other media, foremost of which is a very low cost per bit and no power
>requirements when not being accessed (ignoring disk spin down, which
>in contrast is pretty much a dead technology). When the archive has to
>be "deep and cold", tape wins. Disk costs are just about bearable for
>medium term storage.

Disk costs are lowest for the kinds of backup activities we do in our
group. Tape is only cheaper for much bigger volumes and then by not
much, and it also has handling disadvantages. Concerning power
requirements and deep and cold, a disk drive that is powered off
consumes no power, either. And for powered drives, disk spin-down is
anything but dead.

We switched from tape for backups to disks about a decade ago and are
happy with that. Tape is not dead, but has been pushed into the
very-high-volume niche.

As for long-term storage, I would not rely on being able to read
current tapes with drives twenty years down the road. So the way to
go is to copy to new formats regularly, and you can do the same with
disks; reliability is determined by redundancy, so again which one is
lower cost depends on the volume.

Alex McDonald

Aug 8, 2012, 7:59:48 AM
On Aug 8, 11:57 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >> I saw in some storage newsletter earlier today that tape is essentially
> >> not used any more even for backup, which surprised me. It's apparently
> >> only used for massive archiving (I guess meaning long term retention of
> >> data that probably won't be accessed anytime soon, as distinct from
> >> backups which are for short term use in case a disk crashes).
>
> >Tape is not dead; far from it. It has some distinct advantages over
> >other media, foremost of which is a very low cost per bit and no power
> >requirements when not being accessed (ignoring disk spin down, which
> >in contrast is pretty much a dead technology). When the archive has to
> >be "deep and cold", tape wins. Disk costs are just about bearable for
> >medium term storage.
>
> Disk costs are lowest for the kinds of backup activities we do in our
> group.  Tape is only cheaper for much bigger volumes and then by not
> much, and it also has handling disadvantages.

That's true for many classes of use, but... One of the well-understood
issues of backup is ensuring that your backup is offsite. What is
often not understood is that the per-byte transmission cost of tape is
ridiculously low and its effective bandwidth enormous, manual handling and all.
I can move an 8TB LTO tape anywhere in the world in 24 hours or less;
that kind of bandwidth to offsite disk isn't feasible, even if you
could afford it.


> Concerning power
> requirements and deep and cold, a disk drive that is powered off
> consumes no power, either.  And for powered drives, disk spin-down is
> anything but dead.

MAID (Massive Array of Idle Disks) is dead. COPAN, the one company
that tried this on an industrial scale, went out of business several
years ago. There are far too many issues to overcome to make the
technology reliable and practical. About the only surviving technique
is spin-slow, and it has very few supporters or suppliers. In general,
drives don't like being spun up; they fail much more quickly than
disks that are spun throughout their entire lives. The goal these days
is to increase areal density and reduce (or at least keep flat) the
power requirements to reduce the power cost per byte. For example, 4TB
at 7 watts is achievable today; with shingled drives, that could rise
to 30+TB for the same amount of power.


>
> We switched from tape for backups to disks about a decade ago and are
> happy with that.  Tape is not dead, but has been pushed into the
> very-high-volume niche.
>
> As for long-term storage, I would not rely on being able to read
> current tapes with drives twenty years down the road.  So the way to
> go is to copy to new formats regularly, and you can do the same with
> disks; reliability is determined by redundancy, so again which one is
> lower cost depends on the volume.

Neither would I rely on reading dusty tapes. Data needs to be on the
move all the time if it is to be immortal.

Anton Ertl

Aug 8, 2012, 8:24:17 AM
Alex McDonald <bl...@rivadpm.com> writes:
>On Aug 8, 11:57 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>That's true for many classes of use, but... One of the well understood
>issues of backup is ensuring that your backup is offsite. What is
>often not understood is that the per-byte transmission costs and
>bandwidth of tape are ridiculously low, manual handling or otherwise.
>I can move an 8TB LTO tape anywhere in the world in 24 hours or less;
>that kind of bandwidth to offsite disk isn't feasible, even if you
>could afford it.

According to Wikipedia, and confirmed by my local price watch site,
the largest LTO tapes can hold 1500GB of data (without compression).
Disk drives can hold up to 4000GB without compression. So if you want
to move 8TB, you can do it with either technology at similar costs.
And for moving, flash should also be in the same ballpark.

As for moving something to anywhere in the world in 24h, that may be
possible, but extremely expensive (regular flights from Vienna to
Auckland take more than 24h from airport to airport, and what if you
want to move the things from some more remote place than VIE to a more
remote place than AKL?).

But anyway, for off-site storage I don't expect "anywhere in the world"
to be relevant.

>> Concerning power
>> requirements and deep and cold, a disk drive that is powered off
>> consumes no power, either. And for powered drives, disk spin-down is
>> everything but dead.
>
>MAID (Massive Array of Idle Disks) is dead. COPAN, the one company
>that tried this on an industrial scale, went out of business several
>years ago. There are far too many issues to overcome to make the
>technology reliable and practical. About the only surviving technique
>is spin-slow, and it has very few supporters or suppliers. In general,
>drives don't like being spun up; they fail much more quickly than
>disks that are spun throughout their entire lives.

Not in my experience. We have a backup server that spins down idle
disks, and have not noticed any reliability problems. And we have not
noticed reliability problems with our off-line storage disks, either.
So disk spin-down is practical, and, in our experience, reliable.

I never heard about MAID and COPAN before, but it seems that this was
not sold as a backup solution, but as main storage. There, I agree,
it is not very practical for most uses, and spin-slow is better.

chitselb

Aug 8, 2012, 9:25:44 AM
> > http://en.wikipedia.org/wiki/Datassette
> >
> > was interesting. It was apparently a standard cassette recorder with
> > some adc/dac's and a special edge connector. In particular the article
> > doesn't say whether the computer could start and stop the tape transport
> > or rewind the tape under software control. Do you know if that was
> > possible?
>
> IIRC it was not.
>
> - anton


The user has to push the buttons on the drive. The PET can start/stop either tape motor under program control.

Alex McDonald

Aug 8, 2012, 2:10:40 PM
On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 8, 11:57 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >That's true for many classes of use, but... One of the well understood
> >issues of backup is ensuring that your backup is offsite. What is
> >often not understood is that the per-byte transmission costs and
> >bandwidth of tape are ridiculously low, manual handling or otherwise.
> >I can move an 8TB LTO tape anywhere in the world in 24 hours or less;
> >that kind of bandwidth to offsite disk isn't feasible, even if you
> >could afford it.
>
> According to Wikipedia, and confirmed by my local price watch site,
> the largest LTO tapes can hold 1500GB of data (without compression).

3.5TB raw, 8TB with 2.5:1 compression on LTO6. You can't buy them at
your corner shop quite yet, but they are available.

> Disk drives can hold up to 4000GB without compression.  So if you want
> to move 8TB, you can do it with either technology at similar costs.
> And for moving, flash should also be in the same ballpark.

Then the best of luck getting DHL to deliver your 4TB drive in one
piece. 8TB of quality flash is a little expensive...

>
> As for moving something to anywhere in the world in 24h, that may be
> possible, but extremely expensive (regular flights from Vienna to
> Auckland take more than 24h from airport to airport, and what if you
> want to move the things from some more remote place than VIE to a more
> remote place than AKL?
>
> But anyway, for off-site storage I don't expect "anywhere in the world"
> to be relevant.

That's true. But for a handful of tapes or more, tape is hard to beat.
Moving your data center out of Vienna to a city better servicing
Auckland seems like a good idea too. Unless you value a good cup of
coffee more highly, in which case I would stay where you are.

>
> >> Concerning power
> >> requirements and deep and cold, a disk drive that is powered off
> >> consumes no power, either. And for powered drives, disk spin-down is
> >> everything but dead.
>
> >MAID (Massive Array of Idle Disks) is dead. COPAN, the one company
> >that tried this on an industrial scale, went out of business several
> >years ago. There are far too many issues to overcome to make the
> >technology reliable and practical. About the only surviving technique
> >is spin-slow, and it has very few supporters or suppliers. In general,
> >drives don't like being spun up; they fail much more quickly than
> >disks that are spun throughout their entire lives.
>
> Not in my experience.  We have a backup server that spins down idle
> disks, and have not noticed any reliability problems.  And we have not
> noticed reliability problems with our off-line storage disks, either.
> So disk spin-down is practical, and, in our experience, reliable.

Yes, you're one of the lucky devils; I meet them much less
frequently than I used to, since the advent of super-large SATA
drives. The stats speak for themselves; failure rates of SATA drives
are in the ones and twos of % per annum, and if you have an array with
a couple of hundred plus, failure is to be expected and needs to be
managed.

It's not just that the drive fails to spin up, or dies with a
catastrophic failure in operation either. Bit error rates per byte
haven't changed in 10 years, but the size of drives has grown
exponentially. Every 4TB drive will have, on average, several
correctly sent but badly written blocks, and a number of blocks where
the drive declared that it had -- honest! -- written your data but
hadn't. You just haven't found those corrupt or silent blocks yet.
Spin down & up exacerbates these problems.
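
One way to see why scale makes this routine, assuming the commonly quoted unrecoverable-read-error rate of 1 per 1e14 bits for consumer SATA (an assumed figure; the post doesn't give one):

```python
capacity_bits = 4 * 10**12 * 8      # a full read of a 4TB drive
ure_per_bit = 1e-14                 # assumed consumer-class error rate
expected_errors = capacity_bits * ure_per_bit
# ~0.32 expected unrecoverable errors per full pass: roughly one drive
# in three hits an error just reading itself end to end, so across an
# array of hundreds such errors are the norm, not the exception.
```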

>
> I never heard about MAID and COPAN before, but it seems that this was
> not sold as a backup solution, but as main storage.  There, I agree,
> it is not very practical for most uses, and spin-slow is better.

It was sold as backup; the systems couldn't support all the disks
spinning at the same time.

To get adequate bandwidth, data needs to be striped across several 10s
of disks, and everyone wants to do their backups at the same time. But
the COPAN power supplies were inadequate to support all the disks.
It's the economics of competing with tape; big power supplies to
support 480 disks packed in a single rack cost lots of money. Then,
when data is required for restore, lots of disks have to be spun down,
and others spun up, an activity that draws a lot of juice; it takes as
much power to spin up a disk as running it for several minutes. And it
takes 10s of minutes to do so as they can't all be powered up at
once.

For users that want to do restores quickly, it's useless. The power
economics vs the high latency, low bandwidth & inconvenience just
don't stack up. For the occasional server with a handful of disk
drives, it's not so much of a problem, but at scale, even a moderate
scale in the 10s of TB range, it's unworkable.

Spin-slow disks only work once spun up. Again, the power economics
don't make sense for main storage where a complete stripe of 10s or
more of them need to be spun up to get at a single 4K file. It's the age-old
problem of working out in advance if the data will ever be used again.
It's hard to tell; ask Google or Yahoo. Big disks with high rates of
compression and dedupe, with cache (DRAM and flash) in front of SATA
performs well and saves a significant amount more juice than turning
drives down or off.

(As an aside, I admit to working for a disk storage vendor. I still
like tape though! My company has stats on disk drives and their
behaviour and reliability going back to the mid 1980s for literally
100s of millions of disk drives. We shipped an exabyte of storage in
2009. Goodness knows how much we've shipped in the intervening years.
That's a lot of disks, and a huge amount of knowledge on how and why
they fail, and how best to avoid the problems that depending on
unreliable devices causes. Believe me, you're gambling with your
current backup strategy...)

Bernd Paysan

unread,
Aug 8, 2012, 6:13:24 PM8/8/12
to
Alex McDonald wrote:
> 3.5TB raw, 8TB with 2.5:1 compression on LTO6. You can't buy them at
> your corner shop just quite yet, but they are available.

I don't think it makes sense to compare compressed size with
uncompressed hard disk size. We have better compression algorithms, and
for hard disk backups, we use those - and usually, the tape backup is
already pre-compressed. Tapes are filled with already compressed data,
because you only copy hard disk backups to tape, and the hard disk backup
is already compressed. AFAIK, LTO-6 is 2.5TB raw, and the usual status
you get is "planned". As a data center, I would not consider that as
"available", even if there are low-quantity prototypes available for
testing. Available is the 1.5TB LTO-5, at a price a bit above $45.
About half the price of a similar-sized hard disk, as hard disk prices
are still suffering a bit from the Thailand flood.

> That's true. But for a handful of tapes or more, tape is hard to beat.
> Moving your data center out of Vienna to a city better servicing
> Auckland seems like a good idea too. Unless you value a good cup of
> coffee more highly, in which case I would stay where you are.

Wherever you move, there's a place on the other side of the world,
which is more than 24h flight away - and for Vienna, the place on the
other side of the world is Auckland. The best I can get from Frankfurt
is 28:25h via Dubai&Melbourne. And that sort of flight has two slots
per day or so, so add an average waiting time of 12h (customer calls you
"I need the data *now*" in the middle of your night).

Both A380 and 787 have a maximum range of ~15000km. So you can't go
nonstop to the other side of the world.

Alex McDonald

unread,
Aug 8, 2012, 7:05:17 PM8/8/12
to
On Aug 8, 11:13 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > 3.5TB raw, 8TB with 2.5:1 compression on LTO6. You can't buy them at
> > your corner shop just quite yet, but they are available.
>
> I don't think it makes sense to compare compressed size with
> uncompressed hard disk size.  We have better compression algorithms, and
> for hard disk backups, we use those - and usually, the tape backup is
> already pre-compressed.  Tapes are filled with already compressed data,
> because you only copy hard disk backups to tape, and the hard disk backup
> is already compressed.

That's true, but it doesn't change the fundamentals; a 4TB disk still
contains 4TB of data, and a 3.2TB tape 3.2TB of data, compressed or
otherwise.

> AFAIK, LTO-6 is 2.5TB raw, and the usual status
> you get is "planned".  As a data center, I would not consider that as
> "available", even if there are low-quantity prototypes available for
> testing.  Available is the 1.5TB LTO-5, the price is a bit above $45.

HP's LTO6 is 3.2TB uncompressed, which is what the Ultrium consortium
indicated. I mistyped 3.2TB as 3.5TB above.

> About half the price of a similar-sized hard disk, as hard disk prices
> are still suffering a bit from the Thailand flood.

True. But you're quoting bog standard desktop drives at that price,
which I sincerely hope you aren't using in your backup servers.
Enterprise class drives are a good bit more expensive than that.

>
> > That's true. But for a handful of tapes or more, tape is hard to beat.
> > Moving your data center out of Vienna to a city better servicing
> > Auckland seems like a good idea too. Unless you value a good cup of
> > coffee more highly, in which case I would stay where you are.
>
> Wherever you move, there's a place on the other side of the world,
> which is more than 24h flight away - and for Vienna, the place on the
> other side of the world is Auckland.  The best I can get from Frankfurt
> is 28:25h via Dubai&Melbourne.  And that sort of flight has two slots
> per day or so, so add an average waiting time of 12h (customer calls you
> "I need the data *now*" in the middle of your night).
>
> Both A380 and 787 have a maximum range of ~15000km.  So you can't go
> nonstop to the other side of the world.

Perhaps I should have used a few smilies rather than references to the
crap coffee in NZ on this one; the point that tape has incredible
bandwidth per km seems to be getting lost in plane timetables...

Paul Rubin

unread,
Aug 8, 2012, 8:30:54 PM8/8/12
to
Alex McDonald <bl...@rivadpm.com> writes:
> HP's LTO6 is 3.2TB uncompressed, which is what the Ultrium consortium
> indicated. I mistyped 3.2TB as 3.5TB above.

It was decreased to 2.5TB:

http://www.storagenewsletter.com/news/tapes/licensing-specs-august-2012

"The new main specs of LTO-6 are below what was formerly announced by
the LTO consortium. For uncompressed capacity and transfer rates, it
was supposed to be 3.2TB and 210MB/s for LTO-6, it's now 2.5TB and
160MB/s, or an increase of only 67% and 14% respectively, in
comparison to 1.5TB and 140MB/s for LTO-5."

van...@vsta.org

unread,
Aug 8, 2012, 8:32:02 PM8/8/12
to
Bernd Paysan <bernd....@gmx.de> wrote:
> We have better compression algorithms, and
> for hard disk backups, we use those

I've had some troubles with compressed archives, where there were some hits
on the media. The fact that the archive was compressed made it much harder
to recover the remaining bits on the media. I'd recommend avoiding
compression in your backups if you can afford the storage.

--
Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html

Bernd Paysan

unread,
Aug 8, 2012, 9:26:27 PM8/8/12
to
Alex McDonald wrote:
> True. But you're quoting bog standard desktop drives at that price,
> which I sincerely hope you aren't using in your backup servers.

Of course I do. Backup is redundancy, not expensive disks; the
likelihood that a bog standard desktop drive fails is not that much
different from a snake-oil expensive SAS drive - the fundamental
construction of both drives is the same. Backup is not something you
need extremely high bandwidth for; bog standard desktop drives are fine.
LTO-5 is 140MB/s, which is in the range of cheap 5400rpm desktop drives -
7200rpm drives are already in the 200MB/s range of LTO-6.

Bernd Paysan

unread,
Aug 8, 2012, 9:33:04 PM8/8/12
to
van...@vsta.org wrote:

> Bernd Paysan <bernd....@gmx.de> wrote:
>> We have better compression algorithms, and
>> for hard disk backups, we use those
>
> I've had some troubles with compressed archives, where there were some
> hits
> on the media. The fact that the archive was compressed made it much
> harder
> to recover the remaining bits on the media. I'd recommend avoiding
> compression in your backups if you can afford the storage.

I recommend saving twice if you can afford the storage. It's way more
robust to have redundancy to recover these problems than to have
uncompressed data. Anyways, most of the current data that really takes
space is already compressed - videos, images, music. Some data, like
textures, are even compressed in RAM, because decompression on the fly
is worth the effort (RAM is slow... the GPU is much faster). We keep
text files uncompressed, and to be honest, I don't know why. Usually,
we read and write them in one go today, compressing/decompressing on the
fly is not really a problem.
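The "save twice" recommendation is just an independence argument; a toy sketch (the 5% per-copy loss probability is purely hypothetical, and assumes the two media fail independently):

```python
# Why two compressed copies can beat one uncompressed copy: with
# independent failures and per-copy loss probability p, both copies
# must fail before the data is gone.
p = 0.05  # hypothetical probability that any one archive is unreadable

one_copy_lost = p         # single uncompressed copy
both_copies_lost = p * p  # two compressed copies, independent failures

print(one_copy_lost, round(both_copies_lost, 6))  # 0.05 vs 0.0025
```

In this model the second copy buys far more safety than uncompressed storage does, which is the point being made above.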

Percy

unread,
Aug 9, 2012, 12:11:23 AM8/9/12
to
I'm not sure how determined you are to use original cassette hardware, but my PET (4032) had a robust 8-bit user-port that you could access with a suitable PCB edge-connector. How would you feel about building a small microcontroller project to interface with the user-port and provide permanent storage?

On the PET side you'd need to develop a very small set of routines in assembler that implement a basic command protocol for communication with the microcontroller. You could make the command protocol closely aligned with the FORTH BLOCK word-set (i.e., load and save a block, etc.) and let the microcontroller do all the work of actually fetching and saving. The microcontroller could use an SD-card for storing the data. You would not need to implement a proper file system since it's all just blocks, just very basic commands for the SD-card interface in SPI mode. If you do something like that, helpful resources are available from a very impressive website here: http://elm-chan.org/docs/mmc/mmc_e.html
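To make the idea concrete, here is a minimal host-side sketch of what such a BLOCK-aligned command protocol could look like. Everything in it - the opcodes, the framing, the little-endian block number, even the 1024-byte block size - is invented for illustration, not taken from any real PETdisk or elm-chan design:

```python
# Hypothetical framing for a user-port block-storage protocol,
# aligned with the Forth BLOCK word-set: one opcode byte, a 16-bit
# little-endian block number, then 1024 data bytes for saves.
BLOCK_SIZE = 1024
CMD_LOAD, CMD_SAVE = 0x01, 0x02  # invented opcodes

def frame_load(block_no: int) -> bytes:
    """Ask the microcontroller to fetch one block from the SD card."""
    return bytes([CMD_LOAD]) + block_no.to_bytes(2, "little")

def frame_save(block_no: int, data: bytes) -> bytes:
    """Send one full block for the microcontroller to store."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("a BLOCK is always exactly 1024 bytes")
    return bytes([CMD_SAVE]) + block_no.to_bytes(2, "little") + data

print(frame_load(3).hex())  # 010300
```

The 6502 side would only need to clock these few bytes out over the user-port; all SD-card/SPI work stays on the microcontroller.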

chitselb

unread,
Aug 9, 2012, 12:30:30 AM8/9/12
to
My reasons for using the PET's C2N cassette are:
1) It's the hardware most PET owners had
2) It's the hardware I have already
3) It feels like a cool problem to solve. Reminds me of reading those Knuth books on tape sorting

Put another way: I am very unlikely in 2012 to go shopping for an 8050 dual floppy drive (or even a 4040) and some diskettes to put in it.

http://www.bitfixer.com/bf/PETdisk <-- however, this is very appealing

percival...@gmail.com

unread,
Aug 9, 2012, 2:50:59 AM8/9/12
to
The bitfixer PETdisk looks great! Thank you for sharing the link. Maybe you can develop and test your FORTH using the PETdisk as a stop-gap storage solution to support you while you port the FORTH tape drive as phase 2?

Anton Ertl

unread,
Aug 9, 2012, 2:00:22 AM8/9/12
to
Alex McDonald <bl...@rivadpm.com> writes:
>On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
[...]
>Then the best of luck getting DHL to deliver your 4TB drive in one
>piece.

The drives are delivered to us in one piece; why wouldn't they be
delivered elsewhere in one piece?

>> >In general,
>> >drives don't like being spun up; they fail much more quickly than
>> >disks that are spun throughout their entire lives.
>>
>> Not in my experience.  We have a backup server that spins down idle
>> disks, and have not noticed any reliability problems.  And we have not
>> noticed reliability problems with our off-line storage disks, either.
>> So disk spin-down is practical, and, in our experience, reliable.
>
>Yes, you're one of the lucky devils that I meet; but much less
>frequently than I used to, since the advent of super large SATA
>drives. The stats speak for themselves; failure rates of SATA drives
>are in the ones and twos of % per annum, and if you have an array with
>a couple of hundred plus, failure is to be expected and needs to be
>managed.

Sure, hard disks fail now and then. And we certainly have organized
our backups such that the failure of one or two drives does not lead
to catastrophic loss.

>It's not just that the drive fails to spin up, or dies with a
>catastrophic failure in operation either. Bit error rates per byte
>haven't changed in 10 years, but the size of drives has grown
>exponentially. Every 4TB drives will have, on average, several
>correctly sent but badly written blocks, and a number of blocks where
>the drive declared that it had -- honest! -- written your data but
>hadn't. You just haven't found those corrupt or silent blocks yet.
>Spin down & up exacerbates these problems.

Sounds like you swallowed some horror stories some people like to
spin. Why should spin down exacerbate these problems?

BTW, in my experience (based on several occasions) the most frequent
cause of corrupted disk blocks is due to misdesigned drives that do not
react correctly to power fluctuations.

>> I never heard about MAID and COPAN before, but it seems that this was
>> not sold as a backup solution, but as main storage. =A0There, I agree,
>> it is not very practical for most uses, and spin-slow is better.
>
>It was sold as backup; the systems couldn't support all the disks
>spinning at the same time.

The latter is true. But according to
<http://wikibon.org/blog/copan-may-be-dead-but-maid-isnt/>, this was
sold as main storage.

>To get adequate bandwidth, data needs to be striped across several 10s
>of disks, and everyone wants to do their backups at the same time. But
>the COPAN power supplies were inadequate to support all the disks.
>It's the economics of competing with tape; big power supplies to
>support 480 disks packed in a single rack cost lots of money.

If you need that much bandwidth from your backup system, the tape
solution needs a similar number of tape drives (because tape drives
have a similar bandwidth), and the cost of that would dwarf the costs
of everything in the disk system, including a power supply for
spinning all the disks. And these tape drives would need an even more
powerful power supply: Looking at
<https://iq.quantum.com/exLink.asp?8078910OS53M15I46299120>, idle
power consumption is 6.5W, typical 21.4W, peak 30.2W, i.e., about 2-3
times of a hard disk drive.

And how are you getting all this bandwidth to and from the backup
system? 480 disks with, say, 150MB/s each means 72000MB/s.

> Then,
>when data is required for restore, lots of disks have to be spun down,
>and others spun up, an activity that draws a lot of juice; it takes as
>much power to spin up a disk as running it for several minutes.

<http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/disc/barracuda-ds1737-1-1111de.pdf>
lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
own measurements are in the same ballpark), and an average power
consumption of 8W for the 3TB model. It takes about 10s to spin up a
drive, so spinning up takes as much as running a disk for half a
minute.
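A quick check of that arithmetic, using the datasheet figures just quoted:

```python
# Back-of-the-envelope check of the spin-up energy claim, using the
# Seagate datasheet figures quoted above.
spinup_power_w = 24    # 2 A @ 12 V peak draw during spin-up
spinup_time_s = 10     # roughly how long spin-up takes
average_power_w = 8    # average operating power, 3TB model

spinup_energy_j = spinup_power_w * spinup_time_s          # 240 J
equivalent_runtime_s = spinup_energy_j / average_power_w  # 30 s

print(equivalent_runtime_s)  # 30.0 -> about half a minute of operation
```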

> And it
>takes 10s of minutes to do so as they can't all be powered up at
>once.

With the power supply you would need for the 480 tape
drives, yes, you could spin them all up at the same time. But this is
typically not needed, certainly not for a saner backup management (but
neither are the 480 tape drives).

>For users that want to do restores quickly, it's useless. The power
>economics vs the high latency, low bandwidth & inconvenience just
>don't stack up.

Tape loses in power, latency, bandwidth, and convenience. The only
thing where it wins is cost for low-bandwidth high-capacity storage.
Taking the numbers from my price watch site, I get:

EUR 46/TB for external 3TB disks (similar price for internal disks)
EUR 1311 for an internal LTO-5 tape drive (>1700 for external)
EUR 41 for a 1.5TB LTO-5 tape (EUR27/TB)

the crossover is at about 69 TB per tape drive (higher if you use
external tape drives). If you want reliable and timely access to your
tapes, you need at least two tape drives, so tape is only cheaper if
you want to store more than 138TB on it (and even then it still has
all the other disadvantages).
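The crossover figure follows from equating total costs; a sketch with the EUR numbers quoted above (2012 prices from my price watch site):

```python
# Cost crossover between disk-only backup and LTO-5 tape, using the
# EUR figures quoted above.
disk_eur_per_tb = 46   # external 3TB disks
tape_eur_per_tb = 27   # EUR 41 per 1.5TB LTO-5 cartridge
tape_drive_eur = 1311  # internal LTO-5 tape drive

# Tape is cheaper once the per-TB saving has paid off the drive:
crossover_tb = tape_drive_eur / (disk_eur_per_tb - tape_eur_per_tb)
print(crossover_tb)      # 69.0 -> "about 69 TB per tape drive"
print(2 * crossover_tb)  # 138.0 with a second drive for redundancy
```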

>For the occasional server with a handful of disk
>drives, it's not so much of a problem, but at scale, even a moderate
>scale in the 10s of TB range, it's unworkable.

That's nonsense.

>Again, the power economics
>don't make sense for main storage where a complete stripe of 10s or
>more of them need spun up to get at a single 4K file.

Yes, striping (RAID-0) a 4KB file across tens of disks does not make
sense.

>Believe me, you're gambling with your
>current backup strategy...)

The only thing you know about our current backup strategy is that we
use disks and spin-down. If by "gambling" you mean that we are
relying on luck, no, not much. Of course there is the possibility
that everything fails at the same time, but that possibility is not
exclusive to disk drives. Actually the probability that two tape
drives fail before we get a replacement is much higher than the
probability that all the disks on which we have our backups fail
between two backups.

chitselb

unread,
Aug 9, 2012, 6:54:13 AM8/9/12
to
On Thursday, August 9, 2012 2:50:59 AM UTC-4, (unknown) wrote:
> The bitfixer PETdisk looks great! Thank you for sharing the link. Maybe you can develop and test your FORTH using the PETdisk as a stop-gap storage solution to support you while you port the FORTH tape drive as phase 2?

I do most of the development and testing on my VAIO netbook which mostly runs Ubuntu 12.04 Precise. One problem is there is currently only support for a single tape drive in the xpet emulator. In a rare historical inversion, the viceteam.org xpet code came *after* the x64 and that's where the tape code came from.

chitselb@pakora:~$ dpkg -l vice xa65
||/ Name Version Description
+++-==============-==============-============================================
ii vice 2.3.dfsg-2 Versatile Commodore Emulator
ii xa65 2.3.5-1 cross-assembler and utility suite for 65xx/6

Alex McDonald

unread,
Aug 9, 2012, 8:26:21 AM8/9/12
to
On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >> Alex McDonald <b...@rivadpm.com> writes:
> [...]
> >Then the best of luck getting DHL to deliver your 4TB drive in one
> >piece.
>
> The drives are delivered to us in one piece, why wouldn't they
> delivered elsewhere in one piece.

They don't contain your data. The failure rate of new drives is partly
as high as it is due to shipping.

As for why spin down exacerbates these problems: several reasons.

Rated start/stop cycles; 250 average on/off cycles per year at the
expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
class drive). Cumulative head damage; carbonisation during spin up
drag.
Low temperature operation; the AFR increases significantly (5 times
the AFR at <20C to those running >40C, Google study on 100000 desktop
class drives), and spun down drives will be cooler during early hours
of operation.
Slow spin; heads are designed for flight at a given RPM. Slow spin
reduces the air cushion/head height and makes the drives more
susceptible to shock. Even at full speed they can be shouted into
submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded

Overview of the Google experience, including a pointer to the paper
http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

Due to commercial NDAs and other reasons, I can't do any better than
point you at what is publicly available. Our AFRs are much lower for a
variety of reasons; dual parity RAID, enterprise class drives,
temperature & vibration control, scrubbing, not depending on SMART or
for the drive to terminally die before replacing it amongst them.


>
> BTW, in my experience (based on several occasions) the most frequent
> cause of corrupted disk blocks is due to misdesigned drives that do not
> react correctly to power fluctuations.

That is rarely a problem on a well designed storage array, where the
power management is more sophisticated than that of a server. Pulling
the plug on such a system should have no deleterious effects.

>
> >> I never heard about MAID and COPAN before, but it seems that this was
> >> not sold as a backup solution, but as main storage. =A0There, I agree,
> >> it is not very practical for most uses, and spin-slow is better.
>
> >It was sold as backup; the systems couldn't support all the disks
> >spinning at the same time.
>
> The latter is true.  But according to
> <http://wikibon.org/blog/copan-may-be-dead-but-maid-isnt/>, this was
> sold as main storage.

I beg to differ. Dave Vellante is a sharp analyst, but to suggest
that a system that could only support 25% of its disks running at any
one time as "main storage" is a stretch; nor is it what he says in
that 3 year old article. He describes it as "disk arrays for storing
less active enterprise data"; the rest of the industry and COPAN's
hundred-odd customers were less charitable, and it only ever found a
place as a backup device.

>
> >To get adequate bandwidth, data needs to be striped across several 10s
> >of disks, and everyone wants to do their backups at the same time. But
> >the COPAN power supplies were inadequate to support all the disks.
> >It's the economics of competing with tape; big power supplies to
> >support 480 disks packed in a single rack cost lots of money.
>
> If you need that much bandwidth from your backup system, the tape
> solution needs a similar number of tape drives (because tape drives
> have a similar bandwidth), and the cost of that would dwarf the costs
> of everything in the disk system, including a power supply for
> spinning all the disks.  And these tape drives would need an even more
> powerful power supply: Looking at
> <https://iq.quantum.com/exLink.asp?8078910OS53M15I46299120>, idle
> power consumption is 6.5W, typical 21.4W, peak 30.2W, i.e., about 2-3
> times of a hard disk drive.
>
> And how are you getting all this bandwidth to and from the backup
> system?  480 disks with, say, 150MB/s each means 72000MB/s.

A lot of connectivity. It's not unusual to see 100s of 8Gb/s FC
interconnects or 6Gb/s SAS, or 10GbE on high end systems; they are
designed to support multiple parallel streams from 100s of systems.
Note that out of 480 disk drives, COPAN could only support 120
spinning.

>
> > Then,
> >when data is required for restore, lots of disks have to be spun down,
> >and others spun up, an activity that draws a lot of juice; it takes as
> >much power to spin up a disk as running it for several minutes.
>
> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...>
> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
> own measurements are in the same ballpark), and an average power
> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
> drive, so spinning up takes as much as running a disk for half a
> minute.

Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
drives at 15K, which may take minutes to stabilize at operating speed.
During that time, the disk isn't usable, and I stand by my assertion
that spin up wastes as much power as several minutes of full
operation.

>
> > And it
> >takes 10s of minutes to do so as they can't all be powered up at
> >once.
>
> With the power supply you would need for the 480 tape
> drives, yes, you could spin them all up at the same time.  But this is
> typically not needed, certainly not for a saner backup management (but
> neither are the 480 tape drives).

I don't know where you got the idea that 480 tape drives was the
equivalent to 480 disk drives, but it's not an assertion I made and
certainly qualifies as insane.

>
> >For users that want to do restores quickly, it's useless. The power
> >economics vs the high latency, low bandwidth & inconvenience just
> >don't stack up.
>
> Tape loses in power, latency, bandwidth, and convenience.  The only
> thing where it wins is cost for low-bandwidth high-capacity storage.
> Taking the numbers from my price watch site, I get:
>
> EUR 46/TB for external 3TB disks (similar price for internal disks)
> EUR 1311 for an internal LTO-5 tape drive (>1700 for external)
> EUR 41 for a 1.5TB LTO-5 tape (EUR27/TB)
>
> the crossover is at about 69 TB per tape drive (higher if you use
> external tape drives).  If you want reliable and timely access to your
> tapes, you need at least two tape drives, so tape is only cheaper if
> you want to store more than 138TB on it (and even then it still has
> all the other disadvantages).
>
> >For the occasional server with a handful of disk
> >drives, it's not so much of a problem, but at scale, even a moderate
> >scale in the 10s of TB range, it's unworkable.
>
> That's nonsense.

Why? The limiting factor isn't the disk or tape that you're backing up
to, but how fast you can shovel it off the server.

>
> >Again, the power economics
> >don't make sense for main storage where a complete stripe of 10s or
> >more of them need spun up to get at a single 4K file.
>
> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
> sense.

Ignoring RAID-0, since RAID-any systems also stripe, the problem is
that such files do get spread across an unknown number of disks. They
all need fired up to find even the smallest file, since it's not just
the file, but the meta data that needs accessed too.

>
> >Believe me, you're gambling with your
> >current backup strategy...)
>
> The only thing you know about our current backup strategy is that we
> use disks and spin-down.  If by "gambling" you mean that we are
> relying on luck, no, not much.  Of course there is the possibility
> that everything fails at the same time, but that possibility is not
> exclusive to disk drives.  Actually the probability that two tape
> drives fail before we get a replacement is much higher than the
> probability that all the disks on which we have our backups fail
> between two backups.

That's true. I didn't mean to imply that your backup strategy wasn't
thoughtful or adequate, but it's my experience that such things are
rarely on anyone's mind until they fail to provide an adequate
restore, particularly when disaster strikes. A fire a few years ago at
Edinburgh Uni destroyed much of the AI department; they had just
implemented a DR system that saved the main electronic archives
(although much personal research data & the non-digitized archive was
lost). Good on them for recognizing at least part of the problem; many
don't until it's too late.

Alex McDonald

unread,
Aug 9, 2012, 8:30:34 AM8/9/12
to
On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > True. But you're quoting bog standard desktop drives at that price,
> > which I sincerely hope you aren't using in your backup servers.
>
> Of course I do.  Backup is redundancy, not expensive disks, the
>> likelihood that a bog standard desktop drive fails is not that much
>> different from a snake-oil expensive SAS drive - the fundamental
>> construction of both drives is the same.

Yet they have different specs. Why?

> Backup is not something you
> need extremely high bandwidth for, bog standard desktop drives are fine.
> LTO-5 is 140MB/s, that's in the range of cheap 5400rpm desktop drives -
> 7200rpm are already in the 200MB/s range of LTO-6.

Backup needs reliability, which is where I take issue with your
assertion that desktop drives are "good enough". Your backup is not
redundancy when you only have it to continue with.

Anton Ertl

unread,
Aug 9, 2012, 9:44:25 AM8/9/12
to
Alex McDonald <bl...@rivadpm.com> writes:
>On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
>> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>> >wrote:
>> >> Alex McDonald <b...@rivadpm.com> writes:
>> [...]
>> >Then the best of luck getting DHL to deliver your 4TB drive in one
>> >piece.
>>
>> The drives are delivered to us in one piece, why wouldn't they
>> delivered elsewhere in one piece.
>
>They don't contain your data.

So what? If it's broken, I send another one; it's a backup. It's
redundant, and it's definitely not the only backup.

>> Sounds like you swallowed some horror stories some people like to
>> spin.  Why should spin down exacerbate these problems?
>
>Several reasons.
>
>Rated start/stop cycles; 250 average on/off cycles per year at the
>expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
>class drive).

What does AFR have to do with the horror stories about corrupted data?
And anyone who uses "enterprise class" drives for backup has too much
money.

>Low temperature operation; the AFR increases significantly (5 times
>the AFR at <20C to those running >40C, Google study on 100000 desktop
>class drives), and spun down drives will be cooler during early hours
>of operation.

Fortunately even our spun-down drives have a higher temperature. And
again, what does AFR have to do with your horror stories about
corrupted data?

>Slow spin; heads are designed for flight at a given RPM. Slow spin
>reduces the air cushion/head height and makes the drives more
>susceptible to shock. Even at full speed they can be shouted into
>submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded

What do head crashes resulting from shock have to do with the horror
stories about corrupted data?

>Due to commercial NDAs and other reasons, I can't do any better than
>point you at what is publicly available. Our AFRs are much lower for a
>variety of reasons; dual parity RAID

How does RAID make individual drives more reliable?

>> BTW, in my experience (based on several occasions) the most frequent
>> cause of corrupted disk blocks is due to misdesigned drives that do not
>> react correctly to power fluctuations.
>
>That is rarely a problem on a well designed storage array, where the
>power management is more sophisticated than that of a server. Pulling
>the plug on such a system should have no deleterious effects.

It's also not a problem for well-designed disk drives, but yes, to
some extent the power supply can alleviate the problems coming from
misdesigned drives; but if the problem is between the power supply and
the drive (i.e., a suboptimal power connection), the misdesigned drive
will still produce corrupt blocks.

>> > Then,
>> >when data is required for restore, lots of disks have to be spun down,
>> >and others spun up, an activity that draws a lot of juice; it takes as
>> >much power to spin up a disk as running it for several minutes.
>>
>> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...>
>>
>> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
>> own measurements are in the same ballpark), and an average power
>> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
>> drive, so spinning up takes as much as running a disk for half a
>> minute.
>
>Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
>drives at 15K, which may take minutes to stabilize at operating speed.
>During that time, the disk isn't usable, and I stand by my assertion
>that spin up wastes as much power as several minutes of full
>operation.

Sure, if a drive takes several minutes to spin up, it will consume as
much power as several minutes of full operation.

But who in his right mind uses an expensive and power-hungry high-RPM
drive that takes forever to spin up for a storage solution that
requires low power and fast spin-up? Ok, a sales guy selling to a
clueless and rich customer will do it, but not because of technical
merit.

>> With the power supply you would need for the 480 tape
>> drives, yes, you could spin them all up at the same time.  But this is
>> typically not needed, certainly not for a saner backup management (but
>> neither are the 480 tape drives).
>
>I don't know where you got the idea that 480 tape drives was the
>equivalent to 480 disk drives, but it's not an assertion I made and
>certainly qualifies as insane.

You claimed that lots of disks had to be spun up for bandwidth
reasons, and you wrote:

|It's the economics of competing with tape; big power supplies to
|support 480 disks packed in a single rack cost lots of money.

which suggest that you think that a backup solution needs 480 disks
spun up for bandwidth reasons.

>> >For the occasional server with a handful of disk
>> >drives, it's not so much of a problem, but at scale, even a moderate
>> >scale in the 10s of TB range, it's unworkable.
>>
>> That's nonsense.
>
>Why? The limiting factor isn't the disk or tape that you're backing up
>to, but how fast you can shovel it off the server.

It's nonsense, because we are backing up to disks with a total of 10s
of TB, and it's workable, and if we wanted to back up to more disks,
we would just use more disks. And the main bandwidth limit is, as you
write, getting the data off the main storage.

>> >Again, the power economics
>> >don't make sense for main storage where a complete stripe of 10s or
>> >more of them need spun up to get at a single 4K file.
>>
>> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
>> sense.
>
>Ignoring RAID-0, since RAID-any systems also stripe,

RAID-1 doesn't.

>the problem is
>that such files do get spread across an unknown number of disks.

With typical block sizes, a 4KB block is not distributed across
multiple disks, even with RAID-0.

>They
>all need fired up to find even the smallest file, since it's not just
>the file, but the meta data that needs accessed too.

Meta data is often in OS caches, at least on decent OSs.

But yes, I agree that spin-down is not practical for main storage; but
from what I read, the idea of COPAN was to make it practical by
rearranging data such that frequently-accessed data resides on a few
drives. Anyway, for backups spin-down is totally practical, certainly
the way we do our backups. When the backup is written (or read), the
disk spins up, and some time after the access, it spins down.

Paul Rubin

Aug 9, 2012, 12:07:05 PM
chitselb <chit...@gmail.com> writes:
> One problem is there is currently only support for a single tape drive
> in the xpet emulator. In a rare historical inversion, the
> viceteam.org xpet code came *after* the x64 and that's where the tape
> code came from.

Did anyone actually use multiple cassette drives with a PET? Given what
I've seen about the datasette's transfer speed, running any of those
multi-tape algorithms sounds utterly unbearable. Old-fashioned magtape
transferred fast enough to fill main memory in a few seconds on
computers of the era. At 50 bytes/sec the datasette would take over 2
minutes to fill 8k on a PET. Also, it sounds like the datasette didn't
have a way to switch from writing to reading or vice versa, without the
user having to push buttons on the drive.
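A quick check of the arithmetic above (the 50 bytes/sec rate is the post's assumption; real datasette throughput depended on the encoding), sketched in Python:

```python
# Back-of-the-envelope check of the transfer-time figures above.
# The 50 bytes/sec rate is the post's assumption, not a measurement.

DATASSETTE_RATE = 50           # bytes per second (assumed)

def fill_time_seconds(nbytes, rate=DATASSETTE_RATE):
    """Seconds needed to stream nbytes at the given tape rate."""
    return nbytes / rate

t = fill_time_seconds(8 * 1024)
print(f"8 KB at {DATASSETTE_RATE} B/s: {t:.0f} s ({t / 60:.1f} min)")
```

which works out to roughly 164 seconds, confirming the "over 2 minutes" figure.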

I think you have to treat the datasette as more like a paper tape
reader/punch, than like a magtape-like device.

It looks like Commodore floppies are pretty cheap:

http://www.ebay.com/itm/350581937903

Getting one is probably more in the retro spirit, than trying to do
contorted things with cassette drives that nobody ever(?) considered on
real systems of the day.

Bernd Paysan

Aug 9, 2012, 1:21:13 PM
Alex McDonald wrote:

> On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>> Alex McDonald wrote:
>> > True. But you're quoting bog standard desktop drives at that price,
>> > which I sincerely hope you aren't using in your backup servers.
>>
>> Of course I do. Backup is redundancy, not expensive disks, the
>> likelihood that a bog standard desktop drive fails is not that much
>> different from a snake-oil expensive SAS drive - the fundamental
>> construction of both drives is the same.
>
> Yet they have different specs. Why?

To get the money of idiots believing that expensive is better. And
because the SAS controller is low volume, while the SATA controller is
high volume. SAS drives have faster spindle speeds and shorter access
times, which is completely useless in this case: All we want is
reasonable speed to fill the backup disk with files.

And BTW: You can buy disks with similar specs both for SATA and SAS,
they can have similar pricing, though.

>> Backup is not something you
>> need extremely high bandwidth for, bog standard desktop drives are
>> fine. LTO-5 is 140MB/s, that's in the range of cheap 5400rpm desktop
>> drives - 7200rpm are already in the 200MB/s range of LTO-6.
>
> Backup needs reliability, which is where I take issue with your
> assertion that desktop drives are "good enough". Your backup is not
> redundancy when you only have it to continue with.

You have only *one* backup? That's why I suggest using cheap drives:
Buy another, make two backups.

Alex McDonald

Aug 9, 2012, 1:21:14 PM
On Aug 9, 2:44 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> Alex McDonald <b...@rivadpm.com> writes:
> >On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >wrote:
> >> Alex McDonald <b...@rivadpm.com> writes:
> >> >On Aug 8, 1:24 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> >> >wrote:
> >> >> Alex McDonald <b...@rivadpm.com> writes:
> >> [...]
> >> >Then the best of luck getting DHL to deliver your 4TB drive in one
> >> >piece.
>
> >> The drives are delivered to us in one piece, why wouldn't they
> >> delivered elsewhere in one piece.
>
> >They don't contain your data.
>
> So what?  If it's broken, I send another one; it's a backup.  It's
> redundant, and it's definitely not the only backup.
>
> >> Sounds like you swallowed some horror stories some people like to
> >> spin.  Why should spin down exacerbate these problems?
>
> >Several reasons.
>
> >Rated start/stop cycles; 250 average on/off cycles per year at the
> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
> >class drive).
>
> What does AFR have to do with the horror stories about corrupted data?

AFR includes corrupted data.

> And anyone who uses "enterprise class" drives for backup has too much
> money.

Why? Since many operations value data integrity greater than the cost,
this is an economic argument, not one of wealth causing stupidity.

>
> >Low temperature operation; the AFR increases significantly (5 times
> >the AFR at <20C to those running >40C, Google study on 100000 desktop
> >class drives), and spun down drives will be cooler during early hours
> >of operation.
>
> Fortunately even our spun-down drives have a higher temperature.  And
> again, what does AFR have to do with your horror stories about
> corrupted data?

AFR includes corrupted data.

>
> >Slow spin; heads are designed for flight at a given RPM. Slow spin
> >reduces the air cushion/head height and makes the drives more
> >susceptible to shock. Even at full speed they can be shouted into
> >submission; http://www.youtube.com/watch?v=tDacjrSCeq4&feature=player_embedded
>
> What do head crashes resulting from shock have to do with the horror
> stories about corrupted data?

Shock can cause high flying writes; the exact opposite of a head
crash. The data isn't written. What is being demonstrated in the video
is the effect of drive recovery (which will be successful if the
software is up to the task, something that most OSes find hard to deal
with) on response time as disks fail to write data.

And AFR includes corrupted data. I'm mystified; where did I say that
corrupted data was the only issue?

>
> >Due to commercial NDAs and other reasons, I can't do any better than
> >point you at what is publicly available. Our AFRs are much lower for a
> >variety of reasons; dual parity RAID
>
> How does RAID make individual drives more reliable?

It doesn't. It makes them collectively more reliable.

>
> >> BTW, in my experience (based on several occasions) the most frequent
> >> cause of corrupted disk blocks is due to misdesigned drives that do not
> >> react correctly to power fluctuations.
>
> >That is rarely a problem on a well designed storage array, where the
> >power management is more sophisticated than that of a server. Pulling
> >the plug on such a system should have no deleterious effects.
>
> It's also not a problem for well-designed disk drives, but yes, to
> some extent the power supply can alleviate the problems coming from
> misdesigned drives; but if the problem is between the power supply and
> the drive (i.e., a suboptimal power connection), the misdesigned drive
> will still produce corrupt blocks.
>
>

Caveat emptor.

> >> > Then,
> >> >when data is required for restore, lots of disks have to be spun down,
> >> >and others spun up, an activity that draws a lot of juice; it takes as
> >> >much power to spin up a disk as running it for several minutes.
>
> >> <http://www.seagate.com/files/staticfiles/docs/pdf/de-DE/datasheet/dis...
>
> >> lists a power-up power consumption of at most 2A @12V, i.e., 24W (my
> >> own measurements are in the same ballpark), and an average power
> >> consumption of 8W for the 3TB model.  It takes about 10s to spin up a
> >> drive, so spinning up takes as much as running a disk for half a
> >> minute.
>
> >Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
> >drives at 15K, which may take minutes to stabilize at operating speed.
> >During that time, the disk isn't usable, and I stand by my assertion
> >that spin up wastes as much power as several minutes of full
> >operation.
>
> Sure, if a drive takes several minutes to spin up, it will consume as
> much power as several minutes of full operation.
>
> But who in his right mind uses an expensive and power-hungry high-RPM
> drive that takes forever to spin up for a storage solution that
> requires low power and fast spin-up?  Ok, a sales guy selling to a
> clueless and rich customer will do it, but not because of technical
> merit.

I was giving an example of slow spin up to counterpoint the "10
seconds and you're good to go" example you gave.

To spin up a RAID group of say 14 drives on a shelf of disks will
require that the drives are turned on serially in small groups. By the
time they're all turned on and ready to go, regardless of whether
they're SATA or SAS, enterprise or desktop, slow or fast, a certain
amount of time will have elapsed. In the case of systems I know and
understand -- the majority of commercially available systems --
minutes will have passed during which there has been (a) no productive
work and (b) higher than average power consumption. Then there's the
decision on when to power down; that's made after a period of
inactivity, during which there has been no productive work and
continued power consumption.

All spin-down/up schemes for infrequently accessed data have to
account for these issues, and none do so in any effective way since
crystal balls aren't part of the armoury of most storage management
systems. That's where the cluelessness plays its part.

>
> >> With the power supply you would need for the 480 tape
> >> drives, yes, you could spin them all up at the same time.  But this is
> >> typically not needed, certainly not for a saner backup management (but
> >> neither are the 480 tape drives).
>
> >I don't know where you got the idea that 480 tape drives was the
> >equivalent to 480 disk drives, but it's not an assertion I made and
> >certainly qualifies as insane.
>
> You claimed that lots of disks had to be spun up for bandwidth
> reasons, and you wrote:
>
> |It's the economics of competing with tape; big power supplies to
> |support 480 disks packed in a single rack cost lots of money.
>
> which suggests that you think that a backup solution needs 480 disks
> spun up for bandwidth reasons.

No, that was the COPAN solution. (IIRC it was the smallest COPAN
system you could buy.) Streaming backups is not a difficult task; if
all you have is a single stream, then a couple of active disks will
do. For 100s of streams to a single backup system, then you need a lot
more, and the task is correspondingly more complicated to achieve at
decent speeds.

>
> >> >For the occasional server with a handful of disk
> >> >drives, it's not so much of a problem, but at scale, even a moderate
> >> >scale in the 10s of TB range, it's unworkable.
>
> >> That's nonsense.
>
> >Why? The limiting factor isn't the disk or tape that you're backing up
> >to, but how fast you can shovel it off the server.
>
> It's nonsense, because we are backing up to disks with a total of 10s
> of TB, and it's workable, and if we wanted to back up to more disks,
> we would just use more disks.  And the main bandwidth limit is, as you
> write, getting the data off the main storage.

That was my point. If you want off-server backup, then the bandwidth
off the server is the issue. That's what kills very large disk server
systems from doing adequate & timely backups; not everyone has a
backup window. Adding more disks inside the same box isn't a backup.

>
> >> >Again, the power economics
> >> >don't make sense for main storage where a complete stripe of 10s or
> >> >more of them need spun up to get at a single 4K file.
>
> >> Yes, striping (RAID-0) a 4KB file across tens of disks does not make
> >> sense.
>
> >Ignoring RAID-0, since RAID-any systems also stripe,
>
> RAID-1 doesn't.

True, if a nit pick, since the stripe is a mirror.

>
> >the problem is
> >that such files do get spread across an unknown number of disks.
>
> With typical block sizes, a 4KB block is not distributed across
> multiple disks, even with RAID-0.

It would appear on at least 3 disks in most modern systems using large
multi TB disks with adequate protection like RAID-6. Once as a data
block, and twice for its contribution to parity. It's at least 2 on
RAID-5 or RAID-1/10; it may be many more on systems that employ
erasure encoding schemes. Without meta data (see below), it's not
possible to tell which disks to fire up to cover the blocks in
question; and the meta data is on the disks, normally well distributed
over them to increase opportunities for parallelism.

>
> >They
> >all need fired up to find even the smallest file, since it's not just
> >the file, but the meta data that needs accessed too.
>
> Meta data is often in OS caches, at least on decent OSs.

In shared system environments, caches can and do contain stale
information; coherency is a big issue, and high end clusters (both
storage and server types) spend a lot of expensive compute and wire
time (and presumably power) making sure that they are consistent.
Plus, infrequently used data should be flushed, along with its meta
data; if you don't need the former, you're unlikely to need the latter
any time soon.

>
> But yes, I agree that spin-down is not practical for main storage; but
> from what I read, the idea of COPAN was to make it practical by
> rearranging data such that frequently-accessed data resides on a few
> drives.

At last! Agreement! Yes, that was the very thing they failed to
accomplish.

Bernd Paysan

unread,
Aug 9, 2012, 1:50:38 PM8/9/12
to
Alex McDonald wrote:
>> >Rated start/stop cycles; 250 average on/off cycles per year at the
>> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
>> >class drive).
>>
>> What does AFR have to do with the horror stories about corrupted
>> data?
>
> AFR includes corrupted data.

I can believe that a 15k RPM drive which takes minutes to stabilize will
have start-stop problems, and will even corrupt data in operation
when vibrations cause it to write over the next track. This
simply means that these drives are not built for reliability, but for
speed. We are talking about backup here. If you think the Cheetah 15.7
is the right drive to backup your data, you are simply wrong - you need
an elephant for backups, not a cheetah. I think you are simply wrong by
buying the Cheetah at all (no matter what metric), and not an SSD for
the same price per gigabyte - if you need the performance, the SSD will
beat the Cheetah hands down.

chitselb

Aug 9, 2012, 3:20:15 PM
On Thursday, August 9, 2012 12:07:05 PM UTC-4, Paul Rubin wrote:
Yes, I have used two cassettes on the PET. Circa 1981 I wrote a program in Microsoft BASIC to catalog my record (now we say "vinyls") collection. There was a screen for each album with artist, title, and a track listing. I had a few hundred albums which exceeded my 32K of RAM, so I resorted to datafiles on tape, and I used two cassette drives (not unlike the "Plan 1" from the original post on this thread). Yes, it was slow.

Scanning through even a 30-minute tape to find the 10th program was slow too, so the first program on every cassette was a short menu program that would show a listing of the programs on the rest of the tape. After selecting, say, #7, the user would be prompted to press fast-forward, then the tape motor would stop after 1517 jiffies (25.28 seconds). The user would then press stop, play, and the tape would be cued up in the right position to load the 7th program.
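The cueing arithmetic behind that menu program can be sketched in a few lines. The PET jiffy clock ticks 60 times a second; the directory below is a made-up example except for program 7, whose 1517-jiffy figure comes from the paragraph above:

```python
# Sketch of the tape-cueing scheme described above: the menu knows,
# for each program, how many jiffies (1/60 s ticks) of fast-forward
# reach its start.  The directory is hypothetical except for the
# 1517-jiffy figure for program 7, taken from the post.

JIFFIES_PER_SECOND = 60

def wind_seconds(jiffies):
    """Convert a jiffy count into seconds of fast-forward."""
    return jiffies / JIFFIES_PER_SECOND

directory = {7: 1517}          # program number -> jiffies to wind

print(f"Program 7: fast-forward {wind_seconds(directory[7]):.2f} s")
```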

The reason I was going for blocks is two-fold:
1) I want to make PETTIL (anachronistically) Forth-83 compliant. BLOCK is in the required wordset
2) Forth is (to me) both a high-level and a bare-metal language that gives the programmer maximum control over low-resource hardware. 16-bit words are ideal within a 64K address space. It has a tradition and a feel, and BLOCK-based virtual memory is a part of that.

A primary goal is to leverage and present as much of the power of the PET as possible to the programmer. Particularly, I want to mine the ROMs for value. Floating point math would use the BASIC floating point routines. The screen editor will be the native 25x40 screen editor already familiar to the PET BASIC programmer, with the RVS/OFF as a meta key to invoke editor commands. The PET 2001 and 2001-N don't have a "Control" or "Alt" button, so I have to do something else to let the programmer escape out of the editor and replace control-meta-cokebottle type key sequences.

Another of my goals is to have PETTIL eventually become self-compiling, with the entire language written in itself. The Klein-bottle nature of Forth was a huge part of its initial appeal to me when I discovered it in R.G. Loeliger's "Threaded Interpretive Languages" book in 1982. I hadn't considered how this could be done in native source files, but I suppose it would work.

The way the design is evolving, I'll add a set of words to use the Kernal routines for OPEN, CLOSE, CMDIN and CMDOUT etc... using Ballantyne's excellent Blazin' Forth as a model, implement sequential files on top of that, and that should be good enough. To get all the way to FORTH-83 compliance I would just implement BLOCK, FLUSH, etc... on top of that scheme.
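That layering can be sketched in miniature. The model below is illustrative Python, not PETTIL code: a dict stands in for the sequential tape file, and the buffer policy is the simplest possible.

```python
# Minimal model of Forth-83 BLOCK/UPDATE/FLUSH semantics layered on a
# sequential backing store.  An illustrative sketch, not PETTIL's
# actual design; a real tape implementation would replace `backing`
# with Kernal sequential-file reads and writes.

BLOCK_SIZE = 1024

class BlockStore:
    def __init__(self):
        self.backing = {}      # block number -> bytes (the "tape")
        self.buffers = {}      # block number -> bytearray in RAM
        self.dirty = set()     # blocks touched by UPDATE

    def block(self, n):
        """BLOCK: return the in-memory buffer for block n, reading it
        from the backing store on a miss."""
        if n not in self.buffers:
            data = self.backing.get(n, bytes(BLOCK_SIZE))
            self.buffers[n] = bytearray(data)
        return self.buffers[n]

    def update(self, n):
        """UPDATE: mark block n's buffer as modified."""
        self.dirty.add(n)

    def flush(self):
        """FLUSH: write all modified buffers back, then empty them."""
        for n in self.dirty:
            self.backing[n] = bytes(self.buffers[n])
        self.dirty.clear()
        self.buffers.clear()
```

On real tape, FLUSH would rewrite the whole sequential file, merging dirty buffers in block-number order, much as plan 1 in the original post describes.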

Alex McDonald

Aug 9, 2012, 3:32:42 PM
On Aug 9, 6:50 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> >> >Rated start/stop cycles; 250 average on/off cycles per year at the
> >> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
> >> >class drive).
>
> >> What does AFR have to do with the horror stories about corrupted
> >> data?
>
> > AFR includes corrupted data.
>
> I can believe that a 15k RPM drive which takes minutes to stabilize

It takes a lot longer to get up to a stable spin speed, yes.

> will
> have start-stop problems, and will even have data corruptions in
> operation by vibrations causing to write over the next track.

Politely put, I'd say you were guessing, and that's not what I said.

> This
> simply means that these drives are not built for reliability, but for
> speed.

Because of the guesswork in the previous sentence, no doubt.

> We are talking about backup here.  If you think the Cheetah 15.7
> is the right drive to backup your data,

I don't think and didn't say any such thing. This was in the context
of spin down and the subsequent reliability, availability and so on vs
power savings that could be achieved as a "main storage" system. See
my reply to Anton.

> you are simply wrong - you need

Well, there's a surprise. Strawman up...

> an elephant for backups, not a cheetah.  I think you are simply wrong by
> buying the Cheetah at all (no matter what metric), and not an SSD for
> the same price per gigabyte - if you need the performance, the SSD will
> beat the Cheetah hands down.

...and knocked down.

Look, if you're happy with backups to large TB desktop class drives
and can afford the time and effort to do it several times to avoid the
lottery that are unrecoverable disk errors, good on you. I'll withdraw
my "best of luck" comment and reserve it for the companies that take
your approach but go down the pan while footering around looking for
an end to end accurate & readable copy to do a restore.

Bernd Paysan

Aug 9, 2012, 4:07:11 PM
Alex McDonald wrote:
> Look, if you're happy with backups to large TB desktop class drives
> and can afford the time and effort to do it several times to avoid the
> lottery that are unrecoverable disk errors, good on you. I'll withdraw
> my "best of luck" comment and reserve it for the companies that take
> your approach but go down the pan while footering around looking for
> an end to end accurate & readable copy to do a restore.

Honestly, I don't understand what you mean. No media is completely 100%
reliable and error-proof. When I did tape backups, I had them stored
off-site, and I carried them to the off-site storage by bike. So
there's always the risk of a bus driving over the tape or the hard disk
(this is regardless of how you transport them). In either case, the
medium is gone, they will not survive. So whatever you do, you must
make sure that this is not the only backup you have.

And the hard disk is not a tape. If you have really bad luck, and you
end up in a situation where both hard disks you made the backup on have
non-recoverable read errors on several blocks, you just mount them
RAID-1, and read the RAID volume. The RAID controller (or the software
that mimics a RAID controller) will do all the work for you. The RAID
controller also does the work for you to create duplicated backups,
almost effort-less.

The only medium I bought in my five-year IT side-job career that was
damaged beyond recovery was a LTO tape. The LTO drive ate it. We got a
replacement for the drive on warranty, but the tape was completely
destroyed. This wasn't a problem, as said above - the tape was just one
part of the redundant backup strategy, and it was destroyed while
writing. The other medium that did fail wasn't bought by me, and it was
an expensive SAS drive - and this left the server without spares,
because due to the high price, the IT department didn't have hot spares,
and due to incompetence, they weren't informed about the problem. I
just saw the red light blinking on their server in the server room. The
cheap desktop hard disks I bought and intended to replace after two
years with newer, higher capacity ones, lasted all five years, because
the bosses didn't understand why you should replace things which work
perfectly fine ;-).

Alex McDonald

Aug 9, 2012, 4:30:02 PM
On Aug 9, 6:21 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Alex McDonald wrote:
> > On Aug 9, 2:26 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> >> Alex McDonald wrote:
> >> > True. But you're quoting bog standard desktop drives at that price,
> >> > which I sincerely hope you aren't using in your backup servers.
>
> >> Of course I do.  Backup is redundancy, not expensive disks, the
> >> likelyhood that a bog standard desktop drive fails is not that much
> >> different from a snake-oil expensive SAS drive - the fundamental
> >> construction of both drives are the same.
>
> > Yet they have different specs. Why?
>
> To get the money of idiots believing that expensive is better.  And
> because the SAS controller is low volume, while the SATA controller is
> high volume.  SAS drives have faster spindle speeds and shorter access
> times, which is completely useless in this case: All we want is
> reasonable speed to fill the backup disk with files.
>
> And BTW: You can buy disks with similar specs both for SATA and SAS,
> they can have similar pricing, though.

There are enterprise class SATA drives too, btw, something that may
not be apparent from a casual inspection of a disk mfrs website. Both
SATA and SAS enterprise drives have a much lower bit error rate (a
factor of 10), a lower AFR and higher MTBF than the corresponding
desktop variety. The firmware is not the same either, since the
assumption is made that the drives are part of a RAID group, and the
hardware or software upstream can handle the errors. Desktop drives go
to extraordinary lengths to read data to the point of "spasm" for what
might be several minutes, since they assume that this is your only
copy. They re-allocate bad blocks out of line, requiring a hidden
seek. That's not desirable in a system that needs to perform as though
these issues don't exist. Enterprise drives need to tolerate much
higher levels of vibration, since they are mounted cheek by jowl in
dense arrays where vibration can be a significant factor. They have
multiple servo wedges (track markers monitored by the read heads) to
provide accurate tracking & feedback through the servo system; desktop
drives may have one or even none, and track entirely based on data
written.

And so on. They are not the same. The price is not hugely different
from a desktop drive; around twice the price.


Alex McDonald

Aug 9, 2012, 4:58:34 PM
Humans are notoriously bad at assessing risk; crossing the road vs
flying will produce all sorts of negative responses for flying when
it's demonstrably safer. Very few IT specialists understand risk
assessment either; identify, estimate, evaluate, mitigate,
communicate, measure. Even I have sometimes forgotten this, and I was
recently undone by an unprofessional approach to my personal and my
company's data. The last disk drive I bought as a backup & archive
failed after a month. It was a desktop class MLC SSD. I will not be
repeating the experiment.

Bernd Paysan

Aug 9, 2012, 7:27:21 PM
Alex McDonald wrote:
> There are enterprise class SATA drives too, btw, something that may
> not be apparent from a casual inspection of a disk mfrs website. Both
> SATA and SAS enterprise drives have a much lower bit error rate (a
> factor of 10), a lower AFR and higher MTBF than the corresponding
> desktop variety.

Actually, most of the desktop varieties have bit error rate, AFR and
MTBF unspecified, and the enterprise-class disks have them specified.
This is, IMHO, because they are actually identical, maybe except firmware
issues, as below. What is certainly possible is that the desktop
varieties haven't been tested and contain severe bugs.

> The firmware is not the same either, since the
> assumption is made that the drives are part of a RAID group, and the
> hardware or software upstream can handle the errors. Desktop drives go
> to extraordinary lengths to read data to the point of "spasm" for what
> might be several minutes, since they assume that this is your only
> copy.

Yes, and the RAID controller assumes that when exceeding a certain
timeout, the disk is due to replacement. Which is true. I'm completely
unconvinced by what e.g Western Digital says about this topic: If your
drive in a RAID array has problems reading data, replace it *now*.
Crash early, as we Forthers say. This long spasm is the right reaction
in both environments: The desktop user gets his precious data back, and
the RAID controller throws the bad disk out. Which it should.

IMHO, they got the complete protocol wrong, this shouldn't be hidden.
The correct way to deal with these problems should be:

Say "Oops, read error" when you encounter a read error. The host then
can respond with "retry", "retry harder", and "attempt to repair", if it
feels like it. For a RAID system, a read error is no problem, there is
enough redundancy to deliver the data, anyways. It's much more
important that you say "Oops" quickly. Even on a mirrored system where
you only access one disk for one request to improve throughput (they can
serve twice as many read requests in that mode), retrying on the other
drive is faster than the thorough "retry harder".
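A toy version of that handshake (class and function names are invented; this is not any real drive's command set):

```python
# Toy model of the error protocol sketched above: the drive says
# "Oops" immediately, and the host -- not the drive -- chooses the
# recovery policy.  All names and policies here are invented.

class ReadError(Exception):
    """Raised by the drive as soon as a sector fails to read."""

def failing_drive():
    raise ReadError("unreadable sector")

def host_read(drive_read, reconstruct, quick_retries=1):
    """Host policy: a quick retry or two, then fall back on
    redundancy instead of letting the drive 'spasm' for minutes."""
    for _ in range(1 + quick_retries):
        try:
            return drive_read()
        except ReadError:
            continue           # say "Oops" quickly, retry quickly
    return reconstruct()       # RAID redundancy delivers the data

data = host_read(failing_drive, lambda: b"rebuilt from parity")
print(data)
```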

We had that 20 years ago with floppy disk drivers, and the
"Abort/Retry/Ignore" message from DOS. The wrong thing was to present
this message to the user; internally, the protocol was perfectly ok.
Say something when you don't feel ok, say it quickly.

I've a similar thing in my net2o protocol. TCP tries to retransmit
packets which have been dropped. net2o tries to re-request packets
which didn't arrive. This turns the situation around, the client is
responsible for correct transmission, not the server. Which allows the
client to use more intelligent strategies - e.g. when copying identical
files from several peers, you can ask any of them to transmit that lost
block. Or when you stream real-time low-latency audio data, just
interpolate the lost block. Assuming that you have to deliver 100%
quality of service all the time can be wrong. Deliver what you can, and
say when you can't. If that's not acceptable, the other side will
complain, and then you can try harder.
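The receiver-driven idea can be illustrated with a few lines of bookkeeping (this sketches only the concept, not net2o's actual wire protocol):

```python
# Concept sketch of receiver-driven retransmission as described
# above: the client tracks which blocks arrived and re-requests only
# the gaps -- from any peer that holds the same file.  Not net2o's
# real protocol; just the client-side bookkeeping.

def missing_blocks(received, total):
    """Block numbers the client still needs to re-request."""
    return sorted(set(range(total)) - set(received))

# an 8-block transfer where blocks 2 and 5 were dropped in transit
print(missing_blocks([0, 1, 3, 4, 6, 7], 8))
```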

> And so on. They are not the same. The price is not hugely different
> from a desktop drive; around twice the price.

Yes, I know, and I'm quite convinced that this is not worth it, and that
this comes from ill-percieved risk assessment. Or ill-perceived ways to
save costs or something - it *is* cheaper to remove the vibrations of
hard-disks than to make vibration-resistant ones, which probably are
more vibration-resistant on paper than in reality.

Always remember: For twice the price, you can get twice as many cheap disks.
Usually, you don't need that many to reduce the risk to the same level
you paid twice the price for. Or put differently: Flying business class
is no safer than flying economy class. But when paying a higher
price makes you feel better, you should fly business class.

Paul Rubin

Aug 9, 2012, 8:36:29 PM
Alex McDonald <bl...@rivadpm.com> writes:
> Look, if you're happy with backups to large TB desktop class drives
> and can afford the time and effort to do it several times to avoid the
> lottery that are unrecoverable disk errors, good on you. I'll withdraw
> my "best of luck" comment and reserve it for the companies that take
> your approach but go down the pan while footering around looking for
> an end to end accurate & readable copy to do a restore.

I don't understand what the big deal is.

1) If your data is valuable, you need multiple backups in physically
dispersed locations in case of earthquake, meteor, etc. regardless.

2) The issue of disk errors is handled by redundancy within the
backup set (RAID and maybe some ECC applied within the dump streams),
plus storing checksums in the metadata and doing a verification pass
after writing the data. This is surely more cost effective than using
drives that are 2x as expensive so you can get by with a few percent
less redundancy.
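The write-then-verify step in point 2 can be sketched concretely; the dict-backed store and all names below are illustrative, with SHA-256 standing in for whatever checksum a real backup tool uses:

```python
# Minimal sketch of "store checksums in the metadata and do a
# verification pass after writing": a dict stands in for the backup
# medium; layout and names are invented for the example.

import hashlib

def write_backup(store, name, data):
    """Write the dump and record its digest as metadata."""
    store[name] = data
    store[name + ".sha256"] = hashlib.sha256(data).hexdigest()

def verify_backup(store, name):
    """Verification pass: re-read and compare against the digest."""
    return hashlib.sha256(store[name]).hexdigest() == store[name + ".sha256"]

store = {}
write_backup(store, "dump0", b"precious bits")
print(verify_backup(store, "dump0"))
```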

Alex McDonald

Aug 10, 2012, 7:13:44 AM
On Aug 10, 1:36 am, Paul Rubin <no.em...@nospam.invalid> wrote:
We've been over a lot of ground (probably OT for CLF, but even so more
interesting than Gavino on-topic).

I haven't advocated "2x more expensive drives" because I'm paid a
penny on every sale. There was also some discussion about the
bandwidth of shipping data that got lost in airline timetables and the
quality of coffee but I haven't suggested that the airlines should
drop their prices or that datacenters should be near sources of fine
Arabica beans either (well, perhaps I did tongue in cheek to Anton).

All I'm advocating is a robust backup (and I provided some information
to explain what can mitigate the issues of data corruption or loss),
and disk dumps to large multi-TB desktop drives are a no-no in my
book. The rest fell out of that discussion.

Anton Ertl

Aug 10, 2012, 11:57:04 AM
Alex McDonald <bl...@rivadpm.com> writes:
>On Aug 9, 2:44 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
>> >On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>> >wrote:
>> >> Sounds like you swallowed some horror stories some people like to
>> >> spin.  Why should spin down exacerbate these problems?
>>
>> >Several reasons.
>>
>> >Rated start/stop cycles; 250 average on/off cycles per year at the
>> >expected population AFR of 0.55% (Seagate Cheetah 15.7, enterprise
>> >class drive).
>>
>> What does AFR have to do with the horror stories about corrupted data?
>
>AFR includes corrupted data.

It includes other failure modes, so this says nothing about spin-down
exacerbating disk corruption.

>> And anyone who uses "enterprise class" drives for backup has too much
>> money.
>
>Why? Since many operations value data integrity greater than the cost,
>this is an economic argument, not one of wealth causing stupidity.

In backup and also in RAIDs, we increase safety/reliability through
redundancy. For a given amount of money, we get more
safety/reliability by using more cheap drives instead of fewer
expensive drives.
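The redundancy arithmetic behind that can be made concrete. The failure rates below are illustrative round numbers (not vendor data), and drive failures are assumed independent:

```python
# Illustrative arithmetic for "more cheap drives beat fewer expensive
# ones": with independent failures, the chance of losing *every*
# copy in a year is the product of the per-copy rates.  The AFR
# numbers are made up for the example.

def p_all_copies_fail(afr, copies):
    """Probability that all `copies` fail, assuming independence."""
    return afr ** copies

cheap, expensive = 0.03, 0.0055    # hypothetical annual failure rates

print(f"two cheap copies:   {p_all_copies_fail(cheap, 2):.4%}")
print(f"one expensive copy: {p_all_copies_fail(expensive, 1):.4%}")
```

Even with a per-drive rate several times worse, two cheap copies come out an order of magnitude safer than one expensive one.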

>And AFR includes corrupted data. I'm mystified; where did I say that
>corrupted data was the only issue?

You spun horror stories about data corruption as if it was the main
issue. In my experience it's a minor issue.

>> >Drives vary; SATA drives at 5k RPM spin up faster than high RPM SAS
>> >drives at 15K, which may take minutes to stabilize at operating speed.
>> >During that time, the disk isn't usable, and I stand by my assertion
>> >that spin up wastes as much power as several minutes of full
>> >operation.
>>
>> Sure, if a drive takes several minutes to spin up, it will consume as
>> much power as several minutes of full operation.
>>
>> But who in his right mind uses an expensive and power-hungry high-RPM
>> drive that takes forever to spin up for a storage solution that
>> requires low power and fast spin-up? Ok, a sales guy selling to a
>> clueless and rich customer will do it, but not because of technical
>> merit.
>
>I was giving an example of slow spin up to counterpoint the "10
>seconds and you're good to go" example you gave.

It's an irrelevant example, because nobody in his right mind will use
such drives for such a design. The 10s example is an ordinary 7200rpm
drive. If somebody wanted to use special drives for a spin-down
system and spin-up time is of any relevance, they will choose drives
that spin up at least as fast as the one I measured.

>To spin up a RAID group of say 14 drives on a shelf of disks will
>require that the drives are turned on serially in small groups.

No. If the hardware cannot spin them up at the same time, one will
not choose such a large RAID group. Conversely, if RAID groups of 14
disks are desired, the hardware should be designed so that the group
can be spun up at the same time. For a system that contains 480 disk
drives, dimensioning the power supply such that it can spin up 14
drives at once should be no problem.

>> >I don't know where you got the idea that 480 tape drives was the
>> >equivalent to 480 disk drives, but it's not an assertion I made and
>> >certainly qualifies as insane.
>>
>> You claimed that lots of disks had to be spun up for bandwidth
>>
>> reasons, and you wrote:
>>
>> |It's the economics of competing with tape; big power supplies to
>> |support 480 disks packed in a single rack cost lots of money.
>>
>> which suggests that you think that a backup solution needs 480 disks
>> spun up for bandwidth reasons.
>
>No, that was the COPAN solution. (IIRC it was the smallest COPAN
>system you could buy.)

And you claimed that their solution was insufficient because they
could only spin 25% of the disks, and that that was insufficient
because it limits the bandwidth too much.

>> It's nonsense, because we are backing up to disks with a total of 10s
>> of TB, and it's workable, and if we wanted to back up to more disks,
>> we would just use more disks. And the main bandwidth limit is, as you
>> write, getting the data off the main storage.
>
>That was my point. If you want off-server backup, then the bandwidth
>off the server is the issue.

For our servers, the bandwidth off the server disks is the limit most
of the time, because there is a lot of seeking during backups, and
also, the data is already compressed when it goes off the server.

Mat

unread,
Aug 10, 2012, 4:41:09 PM8/10/12
to
Hello,
I don't understand why you burden your Forth system (and possible
users) with block access on cassette storage. In my opinion it
would be both simpler and better to stream the whole memory back to
cassette on demand, as load and save times would be slow anyway.

Implementing threaded dispatch for 6502-class CPUs is a bad idea in
my opinion. It would be better to implement a simple native-code
compiler for these processor types.

But please finish your project and show me I'm wrong with these two
cents.

chitselb schrieb:
> I'm working on a retro computing project, a 6502 Forth implementation for the Commodore PET 2001. https://github.com/chitselb/pettil if you're curious. The goal is for the language to be fast, tight, and capable of running on the actual hardware. For development I'm using the viceteam.org PET emulator with the xa65 cross-assembler, on Linux.
>
> Since most of us back then (1980) didn't have disk drives, I am going to use the cassette tape for mass storage. These are a few ways I'm considering:
>
> 1) Simulate random access using two cassette decks and copy/merge
>
> The PET cassette had two file types, sequential(data) and program.
> a) For program files, there's a long tone followed by a short header block containing the filename, and then a shorter tone followed by one continuous block of memory (two byte load address followed by the data)
> b) for data files, there's the same long tone/file name header, followed by zero or more short tone/192-byte data blocks
>
> On the PET (not the VIC-20 or C=64) there were two datassette ports, and I have two drives. Using the sequential file format and both decks, FLUSH would copy the entire virtual memory from one tape to the other in 1024-byte blocks (preceded by a 16-bit unsigned block number), inserting and replacing blocks from the memory buffers. Then rewind both tapes and go the other way. Slow, tedious, cumbersome. Welcome to my world in 1980.
>
> 2) Historically accurate
>
> Some Forth implementations from back then implemented tape storage. I have been unable to locate one for the PET but yesterday I found tape images for Datatronic Forth on the C=64 and another thing called "C=64 Forth". Both of these appear to implement some type of mass storage on tape.
>
> I'd be very interested to know what other Forth implementations of that era did as far as tape storage. What Forth words, what did they do, etc...
>
> 3) Save source code as sequential files
>
> Using native named files instead of blocks. Not very Forth-like, but possibly the most expedient.
>
> I'm very grateful for the help of this community with my earlier design considerations (circa 2010) on this project, particularly the hashed dictionary and the incredibly fast inner interpreter. Check the project link above if you're curious to see how those parts turned out.
>
> Charlie

Coos Haak

unread,
Aug 10, 2012, 5:54:32 PM8/10/12
to
On Fri, 10 Aug 2012 13:41:09 -0700 (PDT), Mat wrote:

> Hello,
> I don't understand why you burden your Forth system (and possible
> users) with block access on cassette storage. In my opinion it
> would be both simpler and better to stream the whole memory back to
> cassette on demand, as load and save times would be slow anyway.
>
> Implementing threaded dispatch for 6502-class CPUs is a bad idea in
> my opinion. It would be better to implement a simple native-code
> compiler for these processor types.
>
> But please finish your project and show me I'm wrong with these two
> cents.
>
What if you have 100 blocks (100 KB) of data, how would you load that many
in one go into the memory of an 8-bit computer?

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

dam...@web.de

unread,
Aug 10, 2012, 6:41:45 PM8/10/12
to
On Friday, 10 August 2012 23:54:32 UTC+2, Coos Haak wrote:
> What if you have 100 blocks (100 KB) of data, how would you load that many
> in one go into the memory of an 8-bit computer?

Typical PETs had 8 KB of RAM! How would you process 100 KB of data on such a platform? You would process the data in chunks of a few KB. This would also be possible with a Forth system that streams its state back to tape, because nothing prevents words from loading and saving processed data on demand. Instead of block access without motor control, which forces you to adjust the tape position by hand for each data block regardless of its use, you would gain the freedom to format a tape well suited to a specific task, so that all the processing would need is pressing the play and stop keys.

Coos Haak

unread,
Aug 10, 2012, 7:47:31 PM8/10/12
to
On Fri, 10 Aug 2012 15:41:45 -0700 (PDT), dam...@web.de wrote:
Of course, that's what I meant. Blocks are neat for this sort of work. I've
used a cassette drive with C90 cassettes in 1981, but not for long. My ZX
Spectrum had two microdrives that I could control from within my own Forth.
Much faster and simpler than pressing buttons on the previously mentioned
computer.

Andrew Haley

unread,
Aug 11, 2012, 4:50:33 AM8/11/12
to
Mat <dam...@web.de> wrote:

> Implementing threading dispatch for 6502 class cpu's is a bad idea in
> my opinion. It would be better to implement a simple native-code
> compiler for these processor type.

I'm sure you're right about speed, but there isn't much memory, and
for that reason all the language implementations I came across at the
time used some sort of interpretation, whether Forth or Pascal.
6502 Pascal was even more compact than Forth, using a very tight
bytecode. Maybe a JSR-threaded Forth would be OK, but that's still a
considerable code expansion.

Andrew.

Anton Ertl

unread,
Aug 11, 2012, 5:03:15 AM8/11/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Mat <dam...@web.de> wrote:
>
>> Implementing threading dispatch for 6502 class cpu's is a bad idea in
>> my opinion. It would be better to implement a simple native-code
>> compiler for these processor type.
>
>I'm sure you're right about speed,

A JSR-RTS pair is 12 cycles, and the OP mentioned 17 cycles for his
NEXT, so yes, subroutine threading would be a little faster.

> but there isn't much memory, and
>for that reason all the language implementations I came across at the
>time used some sort of interpretation, whether Forth or Pascal.
>6502 Pascal was even more compact than Forth, using a very tight
>bytecode. Maybe a JSR-threaded Forth would be OK, but that's still a
>considerable code expansion.

Going in the other direction, if the target machine has only 8KB,
there probably won't be more than 256 words anyway, so one could
represent words with bytes in interpreted code. NEXT would probably
be a little slower, though.
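Anton's byte-token idea can be sketched in a few lines (in Python rather than 6502 code, purely for illustration): threaded code becomes a string of one-byte tokens indexing a 256-entry dispatch table, halving the size of two-byte address threading. The word set and program here are invented.

```python
stack = []
ip = 0
running = True

# A few primitive "words", each a Python function standing in for a
# 6502 code fragment.
def push1(): stack.append(1)
def dup():   stack.append(stack[-1])
def plus():  stack.append(stack.pop() + stack.pop())
def halt():
    global running
    running = False

# 256-entry dispatch table; only a few slots are populated here.
table = [halt] * 256
table[0x00] = halt
table[0x01] = push1
table[0x02] = dup
table[0x03] = plus

# Threaded code for: 1 DUP + DUP +  -- leaves 4 on the stack.
code = bytes([0x01, 0x02, 0x03, 0x02, 0x03, 0x00])

def run(code):
    """The inner interpreter (NEXT): fetch a byte token, dispatch, repeat."""
    global ip, running
    ip, running = 0, True
    while running:
        token = code[ip]
        ip += 1
        table[token]()

run(code)
print(stack)  # [4]
```

The extra table indirection is what would make this NEXT a little slower than direct-threaded code on a real 6502.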

Andrew Haley

unread,
Aug 11, 2012, 5:08:04 PM8/11/12
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Mat <dam...@web.de> wrote:
>>
>>> Implementing threading dispatch for 6502 class cpu's is a bad idea in
>>> my opinion. It would be better to implement a simple native-code
>>> compiler for these processor type.
>>
>>I'm sure you're right about speed,
>
> A JSR-RTS pair is 12 cycles, and the OP mentioned 17 cycles for his
> NEXT, so yes, subroutine threading would be a little faster.

Right, but it's a bit better than that would suggest because enter and
exit are pretty fast too. A problem with JSR threading is that it
would make multi-tasking very messy because the return stack has to
live on Page 1; that might not matter to some but would spoil it for
me. (6502 fig-FORTH had a similar problem because the data stack had
to live in Page 0.)

>>but there isn't much memory, and for that reason all the language
>>implementations I came across at the time used some sort of
>>interpretation, whether Forth or Pascal. 6502 Pascal was even more
>>compact than Forth, using a very tight bytecode. Maybe a
>>JSR-threaded Forth would be OK, but that's still a considerable code
>>expansion.
>
> Going in the other direction, if the target machine has only 8KB,
> there probably won't be more than 256 words anyway, so one could
> represent words with bytes in interpreted code.

I'm a bit baffled by all this "8kbyte PET" talk. I don't think I ever
saw one with only 8k.

Andrew.

Paul Rubin

unread,
Aug 11, 2012, 11:27:35 PM8/11/12
to
Alex McDonald <bl...@rivadpm.com> writes:
> All I'm advocating is a robust backup (and I provided some information
> to explain what can mitigate the issues of data corruption or loss),
> and disk dumps to large multi TB destktop drives is a no-no in my
> book. The rest fell out of that discussion.

OK, I'm just missing the part about what's wrong with desktop drives
compared with enterprise drives. You listed a number of issues but it
seems to me that all of them can be handled by software. When 100s or
1000s of drives are involved, a 2x cost difference per drive adds up to
a lot of cash, so it has to be justified rather rigorously.

Elizabeth D. Rather

unread,
Aug 12, 2012, 3:51:29 AM8/12/12
to
On 8/10/12 10:41 AM, Mat wrote:
> Hello,
> I don't understand why you burden your forth system (and possible
> users) with block accessing on cassette storage. In my opinion it
> would be both simplier and better to stream the whole memory back to
> cassette at demand as load and save times would be slow anyway.
>
> Implementing threading dispatch for 6502 class cpu's is a bad idea in
> my opinion. It would be better to implement a simple native-code
> compiler for these processor type.
>
> But please fiish your project and show me I'm wrong with these two
> cents.

A memory transfer is appropriate for saving a program image, but I
believe the OP wanted to use tape for source and data, as well. Native
Forths have traditionally organized mass storage as 1024-char blocks
(source blocks were formatted as 16 lines of 64 chars for display and
editing). Managing such blocks as records on a tape can be made to work,
although of course it's slow.

One of the very early (1970-72) Forth systems at NRAO had only tape for
mass storage, and another had a 64 Kb drum plus tape. These systems kept
the program image in the first (longish) record on the tape, for booting
purposes, and 1024-byte records following for source and data. The
system maintained an index of blocks on tape (beyond the program image)
for pseudo-random access. It was better than nothing :-)
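The pseudo-random access scheme Elizabeth describes might be sketched like this; the index structure and the API are invented for illustration, not the actual NRAO code:

```python
# The tape holds a boot image followed by 1024-byte records; an
# in-memory index maps Forth block numbers to record positions, so the
# driver knows how far to wind (forward or back) to reach any block.

BLOCK_SIZE = 1024

class TapeBlocks:
    def __init__(self, records_on_tape):
        # index[block_number] = record position after the boot image
        self.index = dict(records_on_tape)
        self.position = 0          # record the head is currently at

    def seek(self, block):
        """Return how many records to wind (negative means rewind)."""
        target = self.index[block]
        delta = target - self.position
        self.position = target
        return delta

# Blocks need not be in numeric order on the tape:
tape = TapeBlocks({10: 0, 11: 1, 50: 2, 12: 3})
print(tape.seek(50))   # wind forward 2 records
print(tape.seek(11))   # rewind 1 record
```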

We were delighted to get our new PDP-11 in 1973, with a 1.25 Mb
removable disk.

Cheers,
Elizabeth



--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Alex McDonald

unread,
Aug 13, 2012, 8:23:21 AM8/13/12
to
On Aug 10, 4:57 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
The power supplies are relatively small, and serve a disk shelf that
is (commonly) organised in groups that can be fitted in a 19-inch rack
system. 14 drives or more is a common number, although very dense 48-
drive systems are also available. In total, the loading on a rack of
such shelves (which may hold from the mid hundreds of drives upwards)
cannot exceed certain limits in terms of amperage due to the heat
generated; 15KW or more of heat from a rack is difficult to dissipate.
Scaling power supplies on a shelf to support 14 drives powering up
simultaneously means that most of the time the supplies are operating at
low loads, which is where power supplies are very inefficient; running
them near their maximum rating is preferable, when conversion rates
can be 90% or better.

RAID group size is (relatively) small for RAID-5 type schemes;
normally no more than 6+1 parity or so. Dual-parity schemes may employ
12+2 up to around 16+2. Much higher than these limits, and RAID
rebuild times become prohibitively expensive and riskier due to
failures during rebuild; much lower, and total space efficiency is
compromised and performance suffers from the loss of parallelism.

> >> >I don't know where you got the idea that 480 tape drives was the
> >> >equivalent to 480 disk drives, but it's not an assertion I made and
> >> >certainly qualifies as insane.
>
> >> You claimed that lots of disks had to be spun up for bandwidth
>
> >> reasons, and you wrote:
>
> >> |It's the economics of competing with tape; big power supplies to
> >> |support 480 disks packed in a single rack cost lots of money.
>
> >> which suggests that you think that a backup solution needs 480 disks
> >> spun up for bandwidth reasons.
>
> >No, that was the COPAN solution. (IIRC it was the smallest COPAN
> >system you could buy.)
>
> And you claimed that their solution was insufficient because they
> could only spin 25% of the disks, and that that was insufficient
> because it limits the bandwidth too much.

It may do. Bandwidth is a problem during massively parallel backups
and due to the design of the shelves. Many systems employ a bus into
which the disks are plugged; disks are addressed via two or more fibre
channel arbitrated loops or a multi-path SAS arrangement (even for
SATA disks). Getting parallelism on such a system requires many
shelves to be active, and the RAID groups are sometimes split across
them, since a single shelf doesn't have max-bandwidth = (disk
bandwidth * number of disks). That is why having only 25% of the
shelves powered on in the COPAN system limited bandwidth. (Some
systems employ "active" servers supporting the 14+ disks that make up
a shelf and can drive higher sequential (but not random) bandwidth
rates, but they are very power hungry indeed.)

>
> >> It's nonsense, because we are backing up to disks with a total of 10s
> >> of TB, and it's workable, and if we wanted to back up to more disks,
> >> we would just use more disks. And the main bandwidth limit is, as you
> >> write, getting the data off the main storage.
>
> >That was my point. If you want off-server backup, then the bandwidth
> >off the server is the issue.
>
> For our servers, the bandwidth off the server disks is the limit most
> of the time, because there is a lot of seeking during backups, and
> also, the data is already compressed when it goes off the server.

A smart backup program can reduce seeks by sorting, say, a snapshot of
the disk, so that blocks are read serially. Enterprise-
class disks often support "skip read" semantics that can reduce the
requirement to seek when reading data from a single track. The order
in which the blocks are read and sent is immaterial to the construction
of a backup on the target.
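The sorting idea can be sketched as follows; the snapshot contents (file names and on-disk block addresses) are invented for illustration:

```python
# Given a snapshot's list of (file, on-disk block address) pairs, read
# in address order rather than directory order; the backup format
# records where each block belongs, so the read order doesn't matter.

snapshot = [
    ("a.txt", 900), ("b.txt", 12), ("a.txt", 15), ("c.txt", 400),
]

# Directory order would seek 900 -> 12 -> 15 -> 400; address order
# sweeps the disk once.
serial_order = sorted(snapshot, key=lambda entry: entry[1])
print(serial_order)
# [('b.txt', 12), ('a.txt', 15), ('c.txt', 400), ('a.txt', 900)]
```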

Anton Ertl

unread,
Aug 15, 2012, 11:13:36 AM8/15/12
to
Alex McDonald <bl...@rivadpm.com> writes:
>On Aug 10, 4:57 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> Alex McDonald <b...@rivadpm.com> writes:
>> >On Aug 9, 2:44 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>> >wrote:
>> >> Alex McDonald <b...@rivadpm.com> writes:
>> >> >On Aug 9, 7:00 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
>> >> >wrote:


>> >To spin up a RAID group of say 14 drives on a shelf of disks will
>> >require that the drives are turned on serially in small groups.
>>
>> No. If the hardware cannot spin them up at the same time, one will
>> not choose such a large RAID group. Conversely, if RAID groups of 14
>> disks are desired, the hardware should be designed so that the group
>> can be spun up at the same time. For a system that contains 480 disk
>> drives, dimensioning the power supply such that it can spin up 14
>> drives at once should be no problem.
>
>The power supplies are relatively small, and serve a disk shelf that
>is (commonly) organised in groups that can be fitted in a 19-inch rack
>system. 14 drives or more is a common number, although very dense 48-
>drive systems are also available. In total, the loading on a rack of
>such shelves (which may hold from the mid hundreds of drives upwards)
>cannot exceed certain limits in terms of amperage due to the heat
>generated; 15KW or more of heat from a rack is difficult to dissipate.

If I have several power supplies, each powering a bunch of drives, I
would distribute the RAID group across these bunches, ideally one
drive per bunch. A nice side benefit is that the system can now
survive a power supply failure without needing any additional power
supply redundancy (not sure if the following rebuilding of lots of
RAID groups on power supply failure is practical, though, but if it
isn't, then we'll just have to bite the bullet and provide power
supply redundancy after all). So, to spin up a whole RAID group at
the same time, each power supply only needs to be able to support
spinning up one drive.
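Anton's layout can be sketched as a simple assignment rule (the numbers and naming are illustrative, not from any real array):

```python
# Stripe each RAID group across the power-supply "bunches" so that no
# bunch holds more than one drive of any group: a single supply failure
# then costs each affected group at most one drive, and spinning up a
# whole group loads each supply with only one spin-up.

def layout(num_groups, group_size, num_bunches):
    assert group_size <= num_bunches, "need one bunch per group member"
    groups = {}
    for g in range(num_groups):
        # group g's k-th member lives in bunch (g + k) % num_bunches;
        # rotating the start spreads the load evenly across bunches
        groups[g] = [(g + k) % num_bunches for k in range(group_size)]
    return groups

groups = layout(num_groups=6, group_size=4, num_bunches=8)
for members in groups.values():
    assert len(set(members)) == len(members)  # no bunch repeated in a group
print(groups[0])  # [0, 1, 2, 3]
```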

15KW would allow spinning up 480 drives at the same time (and would
also be necessary to let 480 LTO-5 tape drives work at the same time).

>Scaling power supplies on a shelf to support 14 drives powering up
>simultaneously means that most of the time the supplies are operating at
>low loads, which is where power supplies are very inefficient; running
>them near their maximum rating is preferable, when conversion rates
>can be 90% or better.

Typical power supplies are relatively efficient across a pretty wide
range, and the highest efficiency is not at maximum load. E.g.,
looking at
http://www.anandtech.com/show/6013/350450w-roundup-11-cheap-psus/3,
even for a cheap power supply there is relatively little
efficiency variation between 20% and 110% load, and the highest
efficiency is at 50% load. I also looked at the next one in the test
(FSP OEM 400W) and found the same pattern there.

>> For our servers, the bandwidth off the server disks is the limit most
>> of the time, because there is a lot of seeking during backups, and
>> also, the data is already compressed when it goes off the server.
>
>A smart backup program can reduce seeks by sorting, say, a snapshot of
>the disk, to reduce the seeks and read blocks serially. Enterprise
>class disks often support "skip read" semantics that can reduce the
>requirement to seek when reading data from a single track.

Any commodity drive I or my students have measured in the last 15
years or so has cached the data of several tracks for reading (I guess
but have not confirmed that in particular they cache data that they
read while waiting for the disk to rotate to the target sector, but if
there was no request right afterward, probably also the rest of the
track), and that's why some OS-side optimizations we (and others) did
were not as effective as I expected: the drives already did part of
them for us. Anyway, I had not heard that this is a marketing feature
for enterprise drives, and Google has not heard about "skip read
semantics", either.

Concerning our backup program, we are just using tar instead of a
smart one. It's good enough for our needs, but it does not optimize
disk reads the way you suggest (it would need to know more about file
systems than I find comfortable), so there's quite a bit of
waiting for disk seeks despite drive caches.

Alex McDonald

unread,
Aug 15, 2012, 2:57:54 PM8/15/12
to
On Aug 15, 4:13 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> If I have several power supplies, each powering a bunch of drives, I
> would distribute the RAID group across these bunches, ideally one
> drive per bunch.

In practice, that leads to issues with failure modes at the shelf
level. For instance, a failure of a single shelf with something as
simple as a tripped power supply, where the disks in that shelf
contribute to (say) 10 RAID groups may cause 10 simultaneous RAID
rebuilds requiring the involvement of several hundred drives. I have
seen this happen on an HP EVA, where their vdisk RAID supports such a
scheme (although it was not recommended); the resulting mess is not
pretty. It also requires a very large number of spare drives for such
a rebuild.

> A nice side benefit is that the system can now
> survive a power supply failure without needing any additional power
> supply redundancy (not sure if the following rebuilding of lots of
> RAID groups on power supply failure is practical, though, but if it
> isn't, then we'll just have to bite the bullet and provide power
> supply redundancy after all).  So, to spin up a whole RAID group at
> the same time, each power supply only needs to be able to support
> spinning up one drive.
>
> 15KW would allow spinning up 480 drives at the same time (and would
> also be necessary to let 480 LTO-5 tape drives work at the same time).
>
> >Scaling power supplies on a shelf to support 14 drives powering up
> >simultaneously means that most of the time the supplies are operating at
> >low loads, which is where power supplies are very inefficient; running
> >them near their maximum rating is preferable, when conversion rates
> >can be 90% or better.
>
> Typical power supplies are relatively efficient across a pretty wide
> range, and the highest efficiency is not at maximum load.  E.g.,
> looking at http://www.anandtech.com/show/6013/350450w-roundup-11-cheap-psus/3,
> i.e., even looking at a cheap power supply, there is relatively little
> efficiency variation between 20% and 110% load, and the highest
> efficiency is at 50% load.  I also looked at the next one in the test
> (FSP OEM 400W) and find the same pattern there.

Here's an analysis of power efficiencies in a data center you might
find interesting. It confirms your measurements.
http://www.thegreengrid.org/~/media/WhitePapers/White_Paper_16_-_Quantitative_Efficiency_Analysis_30DEC08.pdf?lang=en.

Take a 14-disk system with 2 power supplies. For one power supply to
support all 14 drives plus 1 spinning up requires around 17 disks'
worth of power (approximately; e.g. 20W spin-up as opposed to 7W in
use for a single drive) from a single PSU running at 100%. With two
running in steady state, each supports 7 drives, and the PSUs are
running at approximately 45% load or less. For a single power supply
to support 14 spin-ups simultaneously, a pair of PSUs would run nearer
the 20% mark, where they are less efficient.
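For what it's worth, that arithmetic can be checked directly with the stated (approximate) wattage figures:

```python
# 20 W to spin a drive up, 7 W in steady use, 14 drives, 2 PSUs.
SPIN_UP_W, STEADY_W, DRIVES = 20, 7, 14

# One PSU carrying all 14 running drives plus one spinning up:
# 14*7 + 20 = 118 W, i.e. roughly "17 disks' worth" (17*7 = 119 W).
psu_rating = DRIVES * STEADY_W + SPIN_UP_W
print(psu_rating)                            # 118

# In steady state the two PSUs share the load, 7 drives' worth each:
steady_share = (DRIVES * STEADY_W) / 2       # 49.0 W
print(round(steady_share / psu_rating, 3))   # 0.415 -> "approx 45% or less"

# Sizing each PSU for 14 simultaneous spin-ups instead (14*20 = 280 W)
# leaves it near 49/280 = 17.5% in steady state -> "nearer the 20% mark".
big_psu = DRIVES * SPIN_UP_W
print(steady_share / big_psu)                # 0.175
```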

>
> >> For our servers, the bandwidth off the server disks is the limit most
> >> of the time, because there is a lot of seeking during backups, and
> >> also, the data is already compressed when it goes off the server.
>
> >A smart backup program can reduce seeks by sorting, say, a snapshot of
>the disk, to reduce the seeks and read blocks serially. Enterprise
> >class disks often support "skip read" semantics that can reduce the
> >requirement to seek when reading data from a single track.
>
> Any commodity drive I or my students have measured in the last 15
> years or so has cached the data of several tracks for reading (I guess
> but have not confirmed that in particular they cache data that they
> read while waiting for the disk to rotate to the target sector, but if
> there was no request right afterward, probably also the rest of the
> track), and that's why some OS-side optimizations we (and others) did
> were not as effective as I expected: the drives already did part of
> them for us.  Anyway, I had not heard that this is a marketing feature
> for enterprise drives, and Google has not heard about "skip read
> semantics", either.

See http://ps-2.kev009.com/rs6000/manuals/SAN/ESS/2105_Model_ExxFxx/ESS_SCSI_Command_Reference_ExxFxx_SC26-7297-01.PDF
for an example (page 66) and http://www.ibmsystemsmag.com/getattachment/d5e03906-aa31-40c2-88f6-adce31c776ab/
for a diagram of 1 command with skip vs 2 commands with no skip.

This kind of feature is only available on enterprise class drives.