
File wiping: why isn't one pass sufficient?


George Orwell

unread,
May 15, 2006, 2:05:06 PM5/15/06
to
I don't understand (intuitively) why you would need to wipe a hard drive with more than one pass unless the data is somehow "layered" magnetically or parts of it are accidentally skipped altogether in the initial passes.

If you overwrite zeros and ones with irrelevant data, how can they possibly be recreated in their original form? In other words, if I had exactly 1MB of free space, couldn't I effectively "wipe" it by filling every byte with 1MB of new files? Where else would that 1MB go if it wasn't covering up the free space? It's not like a magnetic disk is paper with watermarks on it.

Is the sole purpose of multiple passes (i.e. 7 for the DOD) to root out any "nooks and crannies" of data that weren't overwritten AT ALL in the previous pass? I'm trying to figure out what's going on physically and magnetically during a file wipe.

G.T.

Ron B.

unread,
May 15, 2006, 2:10:20 PM5/15/06
to


See:

http://wipe.sourceforge.net/secure_del.html

Anonyma

unread,
May 15, 2006, 2:34:00 PM5/15/06
to

Will Dickson

unread,
May 15, 2006, 2:54:07 PM5/15/06
to
On Mon, 15 May 2006 13:10:20 -0500, Ron B. wrote:

>> I don't understand (intuitively) why you would need to wipe a hard
>> drive with more than one pass unless the data is somehow "layered"
>> magnetically or parts of it are accidentally skipped altogether in the
>> initial passes.

Short (and over-simplified) version: Like all physical systems, the actual
disk is analogue in nature: the black-or-white "digitalisation" part
happens in the firmware. If the wiping is insufficient, specialist
hardware can be used to see traces of the previous bit pattern.

Put crudely, suppose we overwrite some bit with a zero. If the overwritten
bit was a one, the result is more like (say) 0.1 than zero, whereas if the
overwritten bit was also a zero, then the result is closer to
0.05. Since either is way below 0.5, the firmware reads them both the
same. However, specialist equipment which reads the platter directly can
tell the difference.
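
(As a toy illustration, here is a short Python sketch of that idea; the
residue size, the noise level, and the thresholds are invented figures,
not measurements of any real drive.)

    import random

    RESIDUE, NOISE = 0.05, 0.005    # invented figures, purely for illustration

    def overwrite(old_bits, new_bits):
        # Analog cell value after an overwrite: the new bit, plus a small
        # leftover trace of the old bit, plus a little read noise.
        return [n + RESIDUE * o + random.gauss(0, NOISE)
                for o, n in zip(old_bits, new_bits)]

    old = [random.randint(0, 1) for _ in range(16)]
    new = [random.randint(0, 1) for _ in range(16)]
    analog = overwrite(old, new)

    firmware = [1 if v > 0.5 else 0 for v in analog]    # sees only the new data
    lab = [1 if v - n > RESIDUE / 2 else 0              # subtract the nominal new
           for v, n in zip(analog, new)]                # value, threshold the rest
    print(firmware == new, lab == old)                  # normally: True True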

The seminal paper on this subject is
http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html


You should note that many modern filesystems use "journalling" techniques
to improve reliability. A side-effect of many of these is that as new
blocks are written which logically overwrite existing blocks of a file,
the new blocks may actually be written to a different place on the
physical disk. This means that file-level wiping doesn't work any more.
:-( In such cases, you'd need to wipe the entire partition.
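
(For what it's worth, the original poster's "fill the free space with new
files" idea looks roughly like the Python sketch below, assuming a
POSIX-like filesystem; the file name is arbitrary. For exactly the
journalling/relocation reasons above it only covers blocks that are
currently free, and it does nothing about journal areas, remapped
sectors, or stale copies of live files.)

    import os
    import secrets

    def fill_free_space(mount_point, chunk=1 << 20):
        # Overwrite currently-free blocks by filling the filesystem with random
        # data, then deleting the fill file.  Does nothing about journal areas,
        # remapped sectors, or relocated copies of live files.
        path = os.path.join(mount_point, "wipe_fill.tmp")
        written = 0
        try:
            with open(path, "wb", buffering=0) as f:
                while True:
                    try:
                        written += f.write(secrets.token_bytes(chunk))
                    except OSError:          # disk full: free space is covered
                        break
                os.fsync(f.fileno())
        finally:
            if os.path.exists(path):
                os.remove(path)
        return written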

HTH

Will.

Unruh

unread,
May 15, 2006, 3:45:49 PM5/15/06
to
George Orwell <nob...@mixmaster.it> writes:

>I don't understand (intuitively) why you would need to wipe a hard drive with more than one pass unless the data is somehow "layered" magnetically or parts of it are accidentally skipped altogether in the initial passes.

>If you overwrite zeros and ones with irrelevant data, how can they possibly be recreated in their original form? In other words, if I had exactly 1MB of free space, couldn't I effectively "wipe" it by filling every byte with 1MB of new files? Where else would that 1MB go if it wasn't covering up the free space? It's not like a magnetic disk is paper with watermarks on it.

Because of hysteresis in magnetic fields and because of misalignments of
the write/read heads.

If you write a one over a zero, the magnetic field on the disk is slightly
different than when you write a one over a one. Thus, by rewiring the read
head so that it reports the actual magnetic field rather than simply the
binarized version (if the field is above x, report 1; else report 0), you
can tell whether the previous bit was 1 or 0.

Furthermore, if the head shifted since writing the previous track, a
remnant of that track can be left at the edge of the current track.

Now all of these require "heroic" efforts -- i.e. you have to rewire the
output logic, or remove the disk from the drive to scan it with different
heads or in a scanning magnetic microscope. So, if you are worried about
some kiddie stealing your drive and discovering your recipe for
brownies, one wipe is enough. If you are worried about IBM discovering your
invention of a new type of superconducting memory, many wipes may be
prudent, and even then they should probably be done at different ambient
temperatures to make sure the heads were not centered at exactly the same
place. ...


>Is the sole purpose of multiple passes (i.e. 7 for the DOD) to root out any "nooks and crannies" of data that weren't overwritten AT ALL in the previous pass? I'm trying to figure out what's going on physically and magnetically during a file wipe.


No, that is not the sole purpose.

>G.T.

Unruh

unread,
May 15, 2006, 3:47:16 PM5/15/06
to
Anonyma <anon-b...@deuxpi.ca> writes:

>I don't understand (intuitively) why you would need to wipe a hard drive with more than one pass unless the data is somehow "layered" magnetically or parts of it are accidentally skipped altogether in the initial passes.

>If you overwrite zeros and ones with irrelevant data, how can they possibly be recreated in their original form? In other words, if I had exactly 1MB of free space, couldn't I effectively "wipe" it by filling every byte with 1MB of new files? Where else would that 1MB go if it wasn't covering up the free space? It's not like a magnetic disk is paper with watermarks on it.

Think of magnetic writing as covering with tissue paper with new writing on
it. The tissue paper could well allow you to see the old writing through
it. Enough pieces of tissue paper and you cannot see the original.

Dan Evans

unread,
May 15, 2006, 3:05:36 PM5/15/06
to
George Orwell wrote:
> [snip]

> Is the sole purpose of multiple passes (i.e. 7 for the DOD) to root
> out any "nooks and crannies" of data that weren't overwritten AT ALL in
> the previous pass? I'm trying to figure out what's going on physically
> and magnetically during a file wipe.

The best description I've heard goes something like this:

Imagine the data track on a hard drive is like a stretch of road. The
data is actually contained in a small line that takes up about as much
space in the road as the double yellow line in the middle of the
blacktop. Data can be written anywhere "on the road" due to the
unreliability of the operating environment of the drive and margin of
error in the drive's engineering.

Recovery programs are able to discern the older "tracks" that exist in
the road, just as you can sometimes see where road crews have moved the
yellow lines during construction projects. So what multiple passes of
wiping programs like DBAN do is try to confuse the issue so much that
the recovery programs get lost in all the sets of old lines and make bad
decisions about which path to follow next, which should result in what
amounts to random reads of old data.

--
- Dan

Unruh

unread,
May 15, 2006, 5:01:22 PM5/15/06
to
OK, this is the sixth time I have seen this same post with the same
question.

This is ridiculous.

Novus

unread,
May 15, 2006, 6:12:14 PM5/15/06
to
On 2006-05-15 15:05:36 -0400, Dan Evans <evan...@gmail.nosp.am.com> said:

> Recovery programs are able to discern the older "tracks" that exist in
> the road, just how sometimes you can see where road crews have moved
> the yellow lines during construction projects. So what multiple passes
> of wiping programs like DBAN do is try to confuse the issue so much
> that the recovery programs get lost in all the sets of old lines and
> make bad decisions about which path to follow next, which should result
> in what amounts to random reads of old data.

If you want to make sure people can't read something on a drive then
melt the platters in a furnace or with appropriate chemicals.

Or better yet, follow the old saying "The only way to keep a secret is
to never tell anyone." and never put the data on the drive in the first
place.

Novus

example

unread,
May 15, 2006, 8:14:16 PM5/15/06
to

Just leave the hard drive on a speaker magnet overnight

Joseph Ashwood

unread,
May 15, 2006, 9:10:31 PM5/15/06
to
"example" <example@example.i> wrote in message
news:9tOdnXLnyrP...@newedgenetworks.com...

> Where else would that 1MB go if it wasn't covering up the free space? It's
> not like a magnetic disk is paper with watermarks on it.

Actually in many ways it is surprisingly similar. The previously posted
yellow stripe on the road is a very good analogy for why erasure doesn't
work very well.

> Just leave the hard drive on a speaker magnet overnight

Won't work. A speaker magnet has a static magnetic field, and while it would
corrupt the drive, it would not unrecoverably corrupt the drive, or at least
not all of it. Unrecoverably erasing a disk takes a large variable
magnetic field, or an absurdly large static one. A speaker magnet is simply
not going to have a dense enough magnetic field to dependably and permanently
erase the data. The reason is fairly simple: it is necessary to introduce a
field that will change the entire field of the disk to a uniform (or
unpredictable) state. Unfortunately, magnetic fields have a memory, and so
you have to either overpower the memory (a massive magnetic field) or
overwrite it (a variable one) to the point where the original field cannot be
read from the memory. The easiest way to do this is to trigger a cascade
effect on the platter, but this requires the introduction of substantial
energy (typically heat) and destroys the usefulness of the media. By
introducing a large number of changes to the state, you can use the small
magnetic field of the write head to serve the same purpose by overwriting
the memory of the platter field. There is plenty of discussion to be had
about how many times it should be overwritten, but the fundamental reason
why will remain.
Joe


Unruh

unread,
May 16, 2006, 2:08:06 AM5/16/06
to
example <example@example.i> writes:

Pretty useless way of wiping it. While it might make the drive unreadable
via the standard interface, all of the data is left completely accessible.
That hysteresis plays a role here as well, in addition to the magnetic
field not being strong enough to really erase the data.


giorgio.tani

unread,
May 16, 2006, 3:26:32 AM5/16/06
to
> I don't understand (intuitively) why you would need to wipe a hard drive with more than one pass unless the data is somehow "layered" magnetically or parts of it are accidentally skipped altogether in the initial passes.

AFAIK, it depends on the underlying physical structure of the disk.
The 0s and 1s of file content are not plainly written "as is" on the
platter because, as you may imagine, writing is not a fault-proof
physical operation, so different strategies exist for writing chunks of
data and verifying them against accidental physical errors.
Moreover, writing data occupies a physical area of many atoms, and it is
unlikely that every single atom gets "rewritten" on each operation; more
likely the magnetization will cover "almost" the same area as the
previous operation on the same bit, but a higher-resolution scanner may
find a sizeable marginal area where the atoms were not remagnetized.
For that reason, rewriting the disk with patterns known to zero and one
all bits of a specific disk's chunk encoding, and doing it many times
(plus doing it with random data to simulate actual data and fool possible
hardware-level compression strategies of the disk itself), should be
needed to do the job.
Gutmann analyzes those issues in a very interesting work: "Secure
Deletion of Data from Magnetic and Solid-State Memory", Peter Gutmann,
Department of Computer Science, University of Auckland. It may be the
best reference for understanding how secure deletion works.


Bryan Olson

unread,
May 16, 2006, 5:41:16 AM5/16/06
to
George Orwell wrote:
> I don't understand (intuitively) why you would need to wipe
> a hard drive with more than one pass [...]

> If you overwrite zeros and ones with irrelevant data, how
> can they possibly be recreated in their original form?

[...]

We over-write multiple times because it's cheap-and-easy
to do; plus, like chicken soup, it couldn't hurt. There's
no good evidence it actually helps.

No one actually demos reading data that's been physically
over-written by a single pass of a modern disk. What
future technology will bring is unknown, so any particular
measures to thwart future advances are basically pulled
from someone's ...uh...imagination.

Disk drives keep changing technology because they run into
the physical limits of the older technology. The popular
standards for disk wiping date from when drives had less
than one one-thousandth the data density that they do
today. In addition to miniaturization, signal modulation
and error-control methods have advanced markedly. If there
really were lab equipment that could read the previous
contents of over-written sectors, the drive vendors would
use those techniques to double capacity.


Sometimes paranoia pays, but we might as well direct our
efforts against the more likely threats. On many modern
systems, ensuring a physical over-write of all copies of
sensitive data is surprisingly difficult. Temp files,
swap/page files, transparent backups, and log-structured
file systems can leave data recorded after users thought
it was destroyed. That's a vastly larger threat than the
possibility of reading a magnetic signal after a physical
over-write.


--
--Bryan

Dirty Ed

unread,
May 16, 2006, 6:10:49 AM5/16/06
to
On or about 5/16/2006 5:41 AM, Bryan Olson penned the following:

Well, over 40 years ago I was working on a classified project at
Wright-Patterson AFB in Ohio where we DID recover information from
magnetic tapes that had been overwritten, and some were also de-gaussed
in large commercial de-gaussers. Guess what? We could recover up to 50%
of what we were looking for on those supposedly erased and de-gaussed
tapes. Why do you think the government requires up to 30 re-writes to
declassify disks? There is equipment out there that WILL get enough of
the previous information to make sense unless there are enough re-writes
to prevent it.


--
Ed
"Under no circumstances will I ever purchase anything offered to me as
the result of an unsolicited email message. Nor will I forward chain
letters, petitions, mass mailings, or virus warnings to large numbers of
others. This is my contribution to the survival of the online
community." - Roger Ebert, December, 1996

giorgio.tani

unread,
May 16, 2006, 6:41:11 AM5/16/06
to
> Disk drives keep changing technology because they run into
> the physical limits of the older technology. The popular
> standards for disk wiping date from when drives had less
> than one one-thousandth the data density that they do
> today. In addition to miniaturization, signal modulation
> and error-control methods have advanced markedly.
I fully agree. The more precisely the information is "targeted" on the
disk surface, the less we should fear that a single-pass (over)write will
"miss the target".
As you rightly note later in your post, there are more likely threats
today.

> If there
> really were lab equipment that could read the previous
> contents of over-written sectors, the drive vendors would
> use those techniques to double capacity.

There I cannot fully agree. No one today would buy a disk requiring
precision machinery that costs millions of dollars and is as big as a
room, even if it doubled the precision of the magnetization and the
capacity of the disk.
While we must assume that an attacker has greater resources than us and
no other goal than recovering our data, disk vendors have practical
constraints of size and cost to meet in order to build a sellable product.
What a specialized lab scanner costing millions of dollars and as big as
a room can do today, we may one day find in our disks for a few dollars
and in a few cubic inches of space, but certainly not today nor,
probably, very soon. So we should assume that some attackers have enough
resources to use state-of-the-art or experimental technology that may be
sold as mainstream technology only in future decades (when cost and size
shrink enough to make it commercially affordable), while the data we are
trying to protect sits on our actual hard disk, built with the technology
that today is cheap enough to be sold as mainstream.

Y(J)S

unread,
May 16, 2006, 10:23:27 AM5/16/06
to
Old and admittedly anecdotal evidence that one wipe is not enough:
A long time ago I wrote a random string of bits on a floppy diskette. I
then overwrote it four times: once with all ones, after that with all
zeros, after that with 010101..., and finally with 101010... .

I then opened up the diskette's cardboard packaging, spun the inside,
and recorded from an induction loop used for QA of diskette magnetic
media. I then cross-correlated the analog signal retrieved with the
original (supposedly erased) bit stream.

Guess what? I saw a definite peak in the cross-correlation. That
doesn't really prove that I could have pulled the bits off the diskette,
but it does show that something was still there.

Y(J)S

Unruh

unread,
May 16, 2006, 10:50:43 AM5/16/06
to
Bryan Olson <fakea...@nowhere.org> writes:


>Disk drives keep changing technology because they run into
>the physical limits of the older technology. The popular
>standards for disk wiping date from when drives had less
>than one one-thousandth the data density that they do
>today. In addition to miniaturization, signal modulation
>and error-control methods have advanced markedly. If there
>really were lab equipment that could read the previous
>contents of over-written sectors, the drive vendors would
>use those techniques to double capacity.

Well no, because the error rate would go way up. I.e., you probably cannot
recover the data with 100% efficiency. 99% would probably be fine for
recovery, but a 1% error rate would not be fine for the disk drive
manufacturers.

So let's say the original was 1101. This ideally would correspond to
magnetisations of 1 1 -1 1. Now overwrite with 0 1 1 0. Because of
hysteresis, the magnetizations would now be -.8 1.2 .8 -.8. This is still
plenty to unambiguously pick out the 0 1 1 0, but it may also be enough to
pick out the 1 1 0 1. I.e., the first signal is overlaid on the second at
much reduced amplitude. Eventually that amplitude drops into the noise. E.g.,
even at this stage that .8 may be .8 +- .05; i.e., there may be a .05 noise
on top of the signal.
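
(A toy Python rendering of those numbers; the 0.2 hysteresis factor and
the +-0.05 noise are just the illustrative figures from this post, not
drive measurements.)

    import random

    old = [+1, +1, -1, +1]      # 1 1 0 1 in a +/-1 encoding
    new = [-1, +1, +1, -1]      # overwritten with 0 1 1 0

    # Field after the overwrite: the new value plus ~20% of the old one
    # (standing in for hysteresis), plus a little noise.
    field = [n + 0.2 * o + random.uniform(-0.05, 0.05)
             for o, n in zip(old, new)]

    drive_reads = [1 if f > 0 else 0 for f in field]    # -> 0 1 1 0 (new data)
    residue = [f - n for f, n in zip(field, new)]       # -> roughly +/-0.2
    recovered = [1 if r > 0 else 0 for r in residue]    # -> 1 1 0 1 (old data)
    print(["%+.2f" % f for f in field], drive_reads, recovered)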

Bryan Olson

unread,
May 16, 2006, 12:03:18 PM5/16/06
to
Unruh wrote:
> Bryan Olson writes:
>>[...] In addition to miniaturization, signal modulation

>>and error-control methods have advanced markedly. If there
>>really were lab equipment that could read the previous
>>contents of over-written sectors, the drive vendors would
>>use those techniques to double capacity.
>
> Well no, because the error rate would go way up, Ie, you probably cannot
> recover the data with 100% efficiency. But 99% would probably be fine, but
> a 1% error rate would not be fine for the disk drive manufacturers.

They use error-correction. Modern codes get within a few decibels
of theoretical channel capacity.

> So lets say the original was 1101. This ideally would correspond to
> magnetisations of 1 1 -1 1.

That's an antique code. It required one surface devoted to a
clocking signal and the heads to be aligned together. Hasn't been
used in decades.


--
--Bryan

Bryan Olson

unread,
May 16, 2006, 1:00:09 PM5/16/06
to
Dirty Ed wrote:
> Well, over 40 years ago I was working on a classified project at
> Wright-Patterson AFB in Ohio where we DID recover information from
> magnetic tapes that were overwritten and some were also de-gaussed in
> large commercial de-gaussers. Guess what? We could recover up to 50%
> of what we were looking for on those supposedly erased and de-gaussed
> tapes.

40 years ago we didn't know how to get near the channel capacity of
the medium. A lot happened in that 40 years.

> Why do you think the government requires up to 30 re-writes to
> declassify disks.

I said why I think standards say to do things like that: because
doing so is so cheap. Also, they're probably concerned about older
drives. I'm kind of surprised they ever let a disk be declassified;
drives now cost so little you might as well just buy a new one.

> There is equipment out there that WILL get enough of
> the previous information to make sense unless there are enough re-writes
> to prevent it.

Do you have any citation showing that for a modern disk drive
"enough re-writes to prevent it" is more than one?


--
--Bryan

Paul Rubin

unread,
May 16, 2006, 1:13:36 PM5/16/06
to
Bryan Olson <fakea...@nowhere.org> writes:
> Do you have any citation showing that for a modern disk drive
> "enough re-writes to prevent it" is more than one?

I thought drives these days sometimes do totally automatic and
transparent bad block forwarding. Once a block is forwarded, it can
never be overwritten. If I were the NSA, I'd say that the only way to
declassify a drive would be to melt it. Better yet: classified
plaintext should never touch the drive in the first place.

Seagate is now working on drives with built-in encryption. Whether
they did it correctly is a different question, of course.

http://www.seagate.com/cda/newsinfo/newsroom/releases/article/0,,2732,00.html
http://www.seagate.com/content/docs/pdf/marketing/PO-Momentus-FDE.pdf

Bryan Olson

unread,
May 16, 2006, 1:17:12 PM5/16/06
to
giorgio.tani wrote:
>>If there
>>really were lab equipment that could read the previous
>>contents of over-written sectors, the drive vendors would
>>use those techniques to double capacity.
>
> There I cannot fully agree. Noone would buy today a disk requiring
> precision machinery that cost million dollars and are big like a room
> even if it double the precision of the magnetization and the capacity
> of the disk.
> While we must assume that an attacker has greater resources than us and
> no other target than recover our data, disks vendor has practical
> issues of sizes and costs to meet in order to build a soldable product.
> What a specialized lab's scanner costing million dollars

The disk-drive companies have spent billions of dollars in research
and development to get all the capacity they can from the medium.
That million-dollar lab equipment is going to have a hard time beating
the drive at its own game.

> and big like a
> room may do today we may find in our disk for few dollars and in few
> cubic inches of space, but certainly not today nor, probably, so soon,

Is there such lab equipment? Has any such thing been demo'd with
a modern drive?


--
--Bryan

Unruh

unread,
May 16, 2006, 1:40:39 PM5/16/06
to
Bryan Olson <fakea...@nowhere.org> writes:

>Unruh wrote:
>> Bryan Olson writes:
>>>[...] In addition to miniaturization, signal modulation
>>>and error-control methods have advanced markedly. If there
>>>really were lab equipment that could read the previous
>>>contents of over-written sectors, the drive vendors would
>>>use those techniques to double capacity.
>>
>> Well no, because the error rate would go way up, Ie, you probably cannot
>> recover the data with 100% efficiency. But 99% would probably be fine, but
>> a 1% error rate would not be fine for the disk drive manufacturers.

>They use error-correction. Modern codes get within a few decibels
>of theoretical channel capacity.

Of course they use error correction. They have to get the error rate down
to about 10^-15. But the higher the raw error rate, the more complex and
longer the error correction code. If the raw rate is 1%, error correction
codes become completely unwieldy.

>> So lets say the original was 1101. This ideally would correspond to
>> magnetisations of 1 1 -1 1.

>That's an antique code. It required one surface devoted to a
>clocking signal and the heads to be aligned together. Hasn't been
>used in decades.

Oh dear. You are dense, aren't you. This is an example to let the person
see why overwriting does not remove the original data. ALL codes are coded
as magnetic differences on the disk. Exactly how they use that to
actually code the logical bits differs. But that is irrelevant, since
ultimately it comes down to 1 and 0 on the platter. I.e., the above is
exactly the coding at some raw level. And by determining what the original
magnetizations were, you can determine what the original logical bits were.


Paul Rubin

unread,
May 16, 2006, 1:50:47 PM5/16/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> Oh dear. You are dense aren;t you. This is an example for the person to let
> them see why overwriting does not remove the original data. ALL codes code
> with magnetic differences on the disk. Exactly how they use that to
> actually code the logical bits differ. But that is irrelevant since
> ultimately it comes down to 1 and 0 on the platter.

I don't think it's really 1 vs 0 on the platter any more. Whether a
given magnetization region on the platter is interpreted as a 1 or a 0
depends on the (analog) magnetization levels of the surrounding
regions as well:

http://en.wikipedia.org/wiki/PRML

Overwriting messes with that pretty thoroughly. The part I don't see
any way to escape from is still bad block forwarding.

Andrew Swallow

unread,
May 16, 2006, 2:32:53 PM5/16/06
to
Bryan Olson wrote:
[snip]

>
> Do you have any citation showing that for a modern disk drive
> "enough re-writes to prevent it" is more than one?

You may find this information on a police website.

The spy reader can use hi-fi analogue amplifiers, turn the disk very
slowly, and read the track several times.

Andrew Swallow

Mike Amling

unread,
May 16, 2006, 3:03:58 PM5/16/06
to
Bryan Olson wrote:
> If there
> really were lab equipment that could read the previous
> contents of over-written sectors, the drive vendors would
> use those techniques to double capacity.

If they cost the same as current techniques, yes. But a special read
head and electronics for forensic use could well be too pricey for mass
produced disk drives.

> Sometimes paranoia pays, but we might as well direct our
> efforts against the more likely threats. On many modern
> systems, ensuring a physical over-write of all copies of
> sensitive data is surprisingly difficult. Temp files,
> swap/page files, transparent backups, and log-structured
> file systems can leave data recorded after users thought
> it was destroyed. That's a vastly larger threat than the
> possibility of reading a magnetic signal after a physical
> over-write.

Yep.

--Mike Amling

TwistyCreek

unread,
May 16, 2006, 3:18:00 PM5/16/06
to mail...@dizum.com
Thanks for all your great answers and links. I somehow couldn't find a definitive tech article on my own. Hard drives are obviously imperfect devices. This answered my basic question:

http://wipe.sourceforge.net/secure_del.html

"In conventional terms, when a one is written to disk the media records a one, and when a zero is written the media records a zero. However the actual effect is closer to obtaining a 0.95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one. Normal disk circuitry is set up so that both these values are read as ones, but using specialised circuitry it is possible to work out what previous "layers" contained."

Having said that, I still wonder how a file can be read with such extreme accuracy, seeing as every bit must be exact for software to function, yet a zero or one is rarely "pure." It makes me think that the integrity of data can be a hit or miss proposition. Error-correction must be critical.

G.T.


P.S. Multiple copies of the original post were probably due to remailer "scattering." Another case of imperfect data.


Bryan Olson

unread,
May 16, 2006, 3:51:55 PM5/16/06
to
Unruh wrote:
> Bryan Olson writes:
>>Unruh wrote:
>>
>>>Bryan Olson writes:
>>>
>>>>[...] In addition to miniaturization, signal modulation
>>>>and error-control methods have advanced markedly. If there
>>>>really were lab equipment that could read the previous
>>>>contents of over-written sectors, the drive vendors would
>>>>use those techniques to double capacity.
>>>
>>>Well no, because the error rate would go way up, Ie, you probably cannot
>>>recover the data with 100% efficiency. But 99% would probably be fine, but
>>>a 1% error rate would not be fine for the disk drive manufacturers.
>
>
>>They use error-correction. Modern codes get within a few decibels
>>of theoretical channel capacity.
>
> Of course they use error correction. They have to get the error rate down
> to about 10^-15. But the higher the raw error rate the more complex and
> longer the error correction code. If the raw rate is 1%, error correction
> codes become completely unweildy.

Things have changed. We now have computationally-efficient
codes that get within a few decibels of the mathematical limit.


>>>So lets say the original was 1101. This ideally would correspond to
>>>magnetisations of 1 1 -1 1.
>
>>That's an antique code. It required one surface devoted to a
>>clocking signal and the heads to be aligned together. Hasn't been
>>used in decades.
>
> Oh dear. You are dense aren;t you. This is an example for the person to let
> them see why overwriting does not remove the original data.

It's an example that's decades behind the times. Overwriting
trashes the signal-to-noise ratio, so if the drive was making
good use of the channel's capacity, most of the information
just isn't there anymore.


> ALL codes code
> with magnetic differences on the disk. Exactly how they use that to
> actually code the logical bits differ. But that is irrelevant since
> ultimately it comes down to 1 and 0 on the platter.
> Ie, the above is exactly the coding
> at some raw level.

The signal on the platter is analog. In the old days, disks
stored an approximation to a square wave, and threw away a lot
of the channel's capacity by so doing. Now they use convolution
codes and analog modulation methods that don't place each bit
at a distinct physical location.


--
--Bryan

Mike Amling

unread,
May 16, 2006, 3:53:54 PM5/16/06
to
Unruh wrote:
> Bryan Olson <fakea...@nowhere.org> writes:
>
>> Unruh wrote:
>>> Bryan Olson writes:
>>>> [...] In addition to miniaturization, signal modulation
>>>> and error-control methods have advanced markedly. If there
>>>> really were lab equipment that could read the previous
>>>> contents of over-written sectors, the drive vendors would
>>>> use those techniques to double capacity.
>>> Well no, because the error rate would go way up, Ie, you probably cannot
>>> recover the data with 100% efficiency. But 99% would probably be fine, but
>>> a 1% error rate would not be fine for the disk drive manufacturers.

While we're on the subject, what does it take to securely erase data
from flash/thumb/pen drives?

--Mike Amling

Paul Rubin

unread,
May 16, 2006, 3:58:50 PM5/16/06
to
Mike Amling <nos...@foobaz.com> writes:
> While we're on the subject, what does it take to securely erase
> data from a flash/thumb/pen drives?

I've heard that the DOD approves simply overwriting them, but again
that sounds like trouble because of bad block forwarding. Also, it
may be impossible to wipe files from a pen drive, as opposed to
overwriting the entire drive, because of write wear levelling that
happens in those drives.

Andrew Swallow

unread,
May 16, 2006, 4:07:38 PM5/16/06
to

On some semiconductor drives you can bypass the levelling system. That
normally only happens when you wipe/format the entire drive. The wipe
software could be written to overwrite all bad blocks and ignore the
error messages.

Andrew Swallow

David W. Hodgins

unread,
May 16, 2006, 4:15:47 PM5/16/06
to
On Tue, 16 May 2006 15:53:54 -0400, Mike Amling <nos...@foobaz.com> wrote:

> While we're on the subject, what does it take to securely erase data
> from a flash/thumb/pen drives?

Fire, crushing, shredding, etc. <G> For less classified info, I'd
expect a format with multiple overwrites would be sufficient.

Regards, Dave Hodgins

--
Change nomail.afraid.org to ody.ca to reply by email.
(nomail.afraid.org has been set up specifically for
use in usenet. Feel free to use it yourself.)

Unruh

unread,
May 16, 2006, 5:33:07 PM5/16/06
to
TwistyCreek <an...@comments.header> writes:

>Thanks for all your great answers and links. I somehow couldn't find a definitive tech article on my own. Hard drives are obviously imperfect devices. This answered my basic question:

>http://wipe.sourceforge.net/secure_del.html

>"In conventional terms, when a one is written to disk the media records a one, and when a zero is written the media records a zero. However the actual effect is closer to obtaining a 0.95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one. Normal disk circuitry is set up so that both these values are read as ones, but using specialised circuitry it is possible to work out what previous "layers" contained."

>Having said that, I still wonder how a file can be read with such extreme accuracy, seeing as every bit must be exact for software to function, yet a zero or one is rarely "pure." It makes me think that the integrity of data can be a hit or miss proposition. Error-correction must be critical.

It is in order to get the error rates required. A modern disk needs about a
10^-15 error probability or less per bit. But there is no way they could
get that from the raw physical stuff on the disk. So they use massive
amounts of error correction to drive down the error rates. But the raw
error rate is still pretty low.

Unruh

unread,
May 16, 2006, 5:52:33 PM5/16/06
to
Bryan Olson <fakea...@nowhere.org> writes:


The disk has an "analog" (actually it is rather digital -- distinct
values) signal. The relationship between that signal and the logical bits
is complex. But that analog signal derives from magnetisation.
Magnetisation suffers from hysteresis -- i.e., a write signal A produces a
magnetisation B which depends not only on A but also on the history of
prior magnetisation of that region. Because the actual signal is history
dependent, one can extract the history. This is independent of what the
actual relation is between the magnetisation and the logical bits.

Also, because disks need such a low error rate, the signals really are
very, very redundant. They do NOT get the max channel capacity, since that
would in general have a very high error rate. (The channel capacity with 1%
errors is far higher than an 8-5 encoding with only a .01% error rate.)
They may well get max capacity for the desired error rate. But that is far
away from the max channel capacity. Since even a 1% error rate on recovery
of old data may well be acceptable (most of my netnews posts are readable
despite error rates in typing which probably approach 1%), the recovery
process can use the necessary redundancy of the encoding process to
extract more signal from the "noise".


Dirty Ed

unread,
May 16, 2006, 7:23:04 PM5/16/06
to
On or about 5/16/2006 1:00 PM, Bryan Olson penned the following:

Usually 3-10 rewrites are enough for most people. The Government
requires physical destruction. Here is a quote from one page:

It is important to initially emphasize that erasure security can only
be relative. There is no method giving absolutely secure erase.
Government document *DoD 5220.22-M* is commonly quoted on erasure
methods, and requires physical destruction of the storage medium (the
magnetic disks) for data classified higher than Secret. However, even
such physical destruction is not absolute if any remaining disk pieces
are larger than a single 512-byte record block in size, about 1/125"
in today's drives. Pieces of this size are found in bags of destroyed
disk pieces studied at CMRR. Magnetic microscopy can image the stored
recorded media bits, using the CMRR scanning magnetoresistive
microscope. Physical destruction nevertheless offers the highest level
of erasure because recovering any actual user data from a magnetic
image requires overcoming almost a dozen independent recording
technology hurdles.

Entire document can be viewed at:
http://cmrr.ucsd.edu/Hughes/CmrrSecureEraseProtocols.pdf

Paul Rubin

unread,
May 16, 2006, 8:11:36 PM5/16/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> (The channel capacity with 1% errors is far higher than an 8-5
> encoding with only a .01% error rate.).

Are you sure of that? It is counterintuitive.

Unruh

unread,
May 16, 2006, 8:17:48 PM5/16/06
to
Dirty Ed <nob...@spamcop.net> writes:

Well, I would shred it and then heat it to well above the Néel temperature
of the recording material. That should get rid of all possibilities.

Unruh

unread,
May 16, 2006, 8:20:53 PM5/16/06
to

Why is it counter-intuitive? With 1%, one percent are bad but 99% are good.
With the .01% encoding, about half the bits are redundant, so you have
reduced the rate to about 50% from 99%. I.e., redundancy for error
correction costs.


Paul Rubin

unread,
May 16, 2006, 8:25:21 PM5/16/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> Why is it counter-intuitive. With 1% one percent are bad, but 99% are good.
> With .01 about half are redundant so you have reduced the rate by 50% to
> from 99%. Ie, redundancy for error correction costs.

Why would I need 50% redundancy to lower the error rate from 1% to .01%?
Is there a theorem?

Luc The Perverse

unread,
May 16, 2006, 9:49:36 PM5/16/06
to
"Unruh" <unruh...@physics.ubc.ca> wrote in message
news:e4dq3c$mlo$1...@nntp.itservices.ubc.ca...

I'm curious what is so damned secret that people would dig through trash for
a piece of hard disk to get it back.

--
LTP

:)


Tim Smith

unread,
May 16, 2006, 11:17:49 PM5/16/06
to
In article <v3qag.17356$Lm5....@newssvr12.news.prodigy.com>, Bryan Olson
wrote:

> The signal on the platter is analog. In the old days, disks stored an
> approximation to a square wave, and threw away a lot of the channel's
> capacity by so doing. Now they use convolution codes and analog modulation
> methods that don't place each bit at a distinct physical location.

So? The analog state of the disk is still going to be a function of both
the current data and the previous data.

--
--Tim Smith

Andrew Swallow

unread,
May 16, 2006, 11:26:44 PM5/16/06
to

This website gives the British Government's answer.
<http://www.dnotice.org.uk/standing_da_notices.htm>

Andrew Swallow

Bryan Olson

unread,
May 17, 2006, 12:18:43 AM5/17/06
to
Unruh wrote:
> The disk has an "analog ( actually it is rather digitial-- distinct
> values). signal. The relationship between that signal and the logical bits
> is complex. But that analog signal derives from magnetisation.
> Magnetisation suffers from hysteresis-- ie, a write signal A priduces a
> magnetisation B which depends not only on A but also on the history of
> prior magnetisation of that region. Because the actual signal is history
> dependent, one can extact the history. This is independent of what the
> actual relation is between the magnetisation and the logical bits.

Here's an idea: in front of the write head, put an erase head,
that writes a frequency optimized for the hysteresis loop of the
particular magnetic material. Or maybe put a read head there, and
compensate for the old value in the write signal. Have I just
invented how to compensate for hysteresis and greatly improve the
signal-to-noise ratio, and thus the available recording density?
Alas, no; decades old technology.


> Also because disks need such a low error rate, the signals really are very
> very very redundant. They do NOT get the max channel capacity, since that
> would in general have a very high error rate. (The channel capacity with 1%
> errors is far higher than an 8-5 encoding with only a .01% error rate.).

8-5 encoding is antique.

> They may well get max capacity for the desired error rate.

There isn't really any such thing. For any data-rate less than the
theoretical capacity, and any error-rate greater than zero, there
exists a code that achieves both the data-rate and the error-rate.
That's Shannon's "noisy channel coding theorem".

> But that is far
> away from the max channel capacity.

We now have computationally-efficient codes that get within a few
decibels of the limit.


--
--Bryan

Bryan Olson

unread,
May 17, 2006, 12:30:08 AM5/17/06
to

Probably not, since it's false.

In 1948, Shannon proved his "noisy channel coding theorem", showing
codes can get arbitrarily close to both a zero error-rate and the
theoretical-limit on data rate *simultaneously*.


--
--Bryan

Unruh

unread,
May 17, 2006, 1:17:15 AM5/17/06
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

It was an example. It depends on the types of error. You want a coding
which will recognize the errors and correct them. Even if the redundancy is
lower, the point is that the redundancy must be less than 1% for the
correction to impact the rate by less than the error does.
(and 1% is not enough).

The error correction costs. If it has to fix two or three bit errors the
redundancy goes up.


Unruh

unread,
May 17, 2006, 1:22:30 AM5/17/06
to
Bryan Olson <fakea...@nowhere.org> writes:

>8-5 encoding is antique.

Yes, and the lower the error rate the lower the channel capacity.


>> But that is far
>> away from the max channel capacity.

>We now have computationally-efficient codes that get within a few
>decibels of the limit.

The limit for the given error rate and noise in the channel.
Given a higher tolerated error rate (e.g. 1% instead of 10^-15), the
channel capacity increases.


>--
>--Bryan

Paul Rubin

unread,
May 17, 2006, 1:24:47 AM5/17/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> >Why would I need 50% redundancy to lower the error rate from 1% to .01%?
> >Is there a theorem?
>
> It was an example. It depends on the types of error. You want a coding
> which will recognize the errors and correct them. Even if the redundancy is
> lower, the point is that the redundancy must be less than 1% for the
> correction to impact the rate by less than the error does.
> (and 1% is not enough).

OK, 50% is more than needed, but 1% is not enough. How much is enough?

According to the theorem that Bryan cited, it sounds like (1% plus
epsilon) is enough, for arbitrarily small epsilon. That is what I
would have expected. How many sugar cookies do I have to ship you to
be 99.99% sure of your receiving 1 million unbroken ones, if 1% of
shipped cookies get broken in transit? The answer is not 2 million or
1.5 million. It's more like slightly over 1.01 million.

Louis Scheffer

unread,
May 17, 2006, 3:01:50 AM5/17/06
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

Yes, there are many theorems. This is the study of information and coding
theory.

Basically, any channel has a certain capacity C. If you try to transmit above
C, you have exactly the situation you mention (there is a certain minimum error rate, and the faster you go the worse the error rate).

However, these theorems are irrelevant, since disk drives do not operate above
C. Below C, a different situation applies - you can drive the error rate as
close as you wish to zero by applying sufficient computational effort.

For a binary channel that makes mistakes a fraction p of the time, the
capacity C is p*l(2*p) + (1-p)*l(2*(1-p)), where l(x) = log (base 2) of x.
Plugging in numbers,
C(0.01) = 0.91921 bits per symbol transmitted
C(0.0001) = 0.99853 bits per symbol transmitted
So with ideal coding, there is an 8% difference between the amount of
data you can send with an underlying error rate of 1%, as opposed to 0.01%.
Alternatively, sending the same amount of data over a 1% error rate channel
takes 8% more symbols than sending the same data over a 0.01% error rate
channel.
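
(The same figures in a few lines of Python, using the equivalent form
C = 1 - H(p) for a binary symmetric channel:)

    from math import log2

    def bsc_capacity(p):
        # Capacity of a binary symmetric channel with crossover probability p;
        # equivalent to p*l(2*p) + (1-p)*l(2*(1-p)) above, i.e. C = 1 - H(p).
        if p <= 0.0 or p >= 1.0:
            return 1.0
        return 1 + p * log2(p) + (1 - p) * log2(1 - p)

    print(bsc_capacity(0.01))                          # ~0.91921 bits per symbol
    print(bsc_capacity(0.0001))                        # ~0.99853 bits per symbol
    print(bsc_capacity(0.0001) / bsc_capacity(0.01))   # ~1.086, the ~8% difference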

The above capacities are the theoretical limits, assuming you are willing
to devote arbitrary time to decoding the corrupted blocks. Practical codes
(those that we know how to decode reasonably quickly) will fall somewhat
short of these results. You would need fairly detailed information on
the actual codes and decoding used in disk drives to see what the real
penalty is.

Lou Scheffer

Louis Scheffer

unread,
May 17, 2006, 3:13:10 AM5/17/06
to
Bryan Olson <fakea...@nowhere.org> writes:

Yes, but this only happens below capacity, and capacity for a channel with
a 1% error rate is lower than that with a 0.01% error rate. I showed the
math in another reply, but you need at least 8% more redundancy bits to
achieve a low error rate on a 1% error channel than a 0.01% error channel.

Lou Scheffer

Bryan Olson

unread,
May 17, 2006, 2:16:46 AM5/17/06
to
Mike Amling wrote:

> [...] But a special read

> head and electronics for forensic use could well be too pricey for mass
> produced disk drives.

If disk-drive heads sold in the same quantities as lab equipment,
they'd be too pricey for forensic use.


--
--Bryan

giorgio.tani

unread,
May 17, 2006, 2:18:52 AM5/17/06
to
> The disk-drive companies have spent billions of dollars in research
> and development to get all the capacity they can from the medium.
> That million-dollar lab equipment is going to have a hard time beating
> the drive at its own game.
Chances are that such high-tech equipment is developed with the
co-interest of the very same disk-producing companies; the device is
simply not cheap or miniaturized enough to be sold on the mainstream
market, or simply not yet mature. As said in another example, a 1%
failure rate would be fine in the field of data recovery but absolutely
unacceptable for a commercial product. And a device with an MTBF of a
few days (due to mechanical stress, the need to recalibrate the devices,
and so on) may be good for a high-cost data recovery service but
definitely not ready for the mass market.
There tend to be some years of difference between when a technology gets
researched and tested and when it is actually sold, to allow new
technologies to reach an advantageous cost and reliability.

> Is there such lab equipment? Has any such thing been demo'd with
> a modern drive?
AFAIK, disk recovery companies are quite an established business;
however, I can agree with you that, with the progress of disk technology
and the commercial tendency to get to market as soon as possible, those
data recovery companies are probably having bad days!

Bryan Olson

unread,
May 17, 2006, 2:28:51 AM5/17/06
to
Louis Scheffer wrote:

> Bryan Olson writes:
>>In 1948, Shannon proved his "noisy channel coding theorem", showing
>>codes can get arbitrarily close to both a zero error-rate and the
>>theoretical-limit on data rate *simultaneously*.
>
> Yes, but this only happens below capacity, and capacity for a channel with
> a 1% error rate is lower than that with a 0.01% error rate.

The channel itself is analog; it has a signal-to-noise ratio, not
an error rate. What you call the channel's capacity I called the
theoretical limit on data-rate.


--
--Bryan

Paul Rubin

unread,
May 17, 2006, 2:47:56 AM5/17/06
to
Louis Scheffer <l...@Cadence.COM> writes:
> For a binary channel that makes mistakes a fraction p of the time, the
> capacity C is p*l(2*p) + (1-p)*l(2*(1-p)), where l(x) = log (base 2) of x.

Thanks, that helps a lot. Presumably C is the Shannon limit per cycle
= l(1+S/N) where S/N is the analog signal to noise ratio. So the
missing piece of the puzzle is what happens in real disk drives: are
we talking about a noisy high-frequency signal (several cycles per
bit, like in spread spectrum modulation) or a clean low-frequency one?
If it's a high frequency signal, what happens to C if we do something
like chill the disk head in liquid helium? That's something forensic
data recovery might attempt, that a disk drive in normal use really
couldn't. I do get the impression that PRML involves writing multiple
magnetic domains per data bit.

l...@cadence.com

unread,
May 17, 2006, 2:50:55 AM5/17/06
to
Bryan Olson wrote:

> giorgio.tani wrote:
> >>If there really were lab equipment that could read the previous
> >>contents of over-written sectors, the drive vendors would
> >>use those techniques to double capacity.

This is not at all clear. The disk drive makers have to include
engineering margin. The data might be written when the drive is cold,
and must be readable when the drive is hot. The servo might be
tracking a few nm towards the center when writing, and a few nm towards
the edge when reading. The drive might be subject to vibration (say in
a car or plane) which the servos cannot completely eliminate. The data
must be readable a few years later, after the circuits have aged and
drifted. If disk drives are like most electronics, a factor of 2 for
all these combined would not be unusual.

From basic physics arguments, it would seem that at least 2 passes are
needed. Surely the actual, as written, width of the track varies as a
function of the various uncertainties above. Therefore, some of the
time the overwriting track must be narrower than the previously written
track. Therefore some of the old data remains on one edge of the
track, or the other, or both. To be sure you'd erased everything, even
if one pass gets rid of all magnetic history, then you would need one
pass with the head biased one way, and another pass with it biased the
other way.

Since drives do not support this level of tweaking, at least from the
traditional operating system drivers, you have the risk that no number
of passes is enough. If the original data was written wide, and the
new data is written narrow due to systematic causes, then you can write
all you want and the edges will remain. (Just as a thought experiment,
maybe you wrote the file while on an airplane, where the air is thin
and the head flies close. Then no amount of sea level erasing might
erase the edges.)

> Is there such lab equipment? Has any such thing been demo'd with a modern drive?

You seem to have this argument backwards. In security, you should not
assume something is sufficient just because you have never seen a demo
of it being broken.

Lou Scheffer

Paul Rubin

unread,
May 17, 2006, 4:29:30 AM5/17/06
to
l...@cadence.com writes:
> Since drives do not support this level of tweaking, at least from the
> traditional operating system drivers, you have the risk that no number
> of passes is enough. If the original data was written wide, and the
> new data is written narrow due to systematic causes, then you can write
> all you want and the edges will remain.

Well, ok, let's say the edges contain 1/10th of the energy of the
original full-width signal. So with absolutely perfect equipment,
you can read the edges with 10 dB worse S/N than the original track.
Bryan's argument is that current disk drives and data encodings come
within a few dB of the Shannon limit. Let's say that means within 5
dB. High speed dialup phone modems must do even better than that, so
such performance sounds plausible. That means with 10 dB extra noise,
you now have to read data at 5 dB -above- the Shannon limit. So what
does that do to the bit error rate? Let's say it's a rate 1/2 code
and since it's originally leaving 5 dB of capacity unused,
lg(1+(S/N)+5 dB) = 1/2, i.e., lg(1+sqrt(10)*S/N) = 1/2.
That means S/N = (sqrt(2)-1)/sqrt(10) = 0.13. Now we increase
that by 10x, i.e. add 10 dB of noise, so S/N=0.013.
The Shannon limit is now lg(1+(sqrt(2)-1)/(10*sqrt(10))) = about 0.02.

Using your earlier formula for the bit error rate,

C = p*lg(2*p) + (1-p)*lg(2*(1-p))

we get about p=0.42, i.e. each output bit contains about 0.98 bits of
noise entropy and only 0.02 bits from the original data. Even for
something like English plaintext, this is pretty useless. If it's
random-looking bits, like crypto keys or compressed data, it's beyond
hope.

Of course the above calculation could be totally wrong (I have no idea
how to do this stuff, I'm just making it up as I go along) and/or the
actual S/N could be better than the guesses. However, I begin to see
that starting with reasonable assumptions, one can get out numbers
that represent hard limits.
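
(For what it's worth, here is the same back-of-envelope arithmetic in
Python; the rate-1/2 code, the 5 dB margin and the 10 dB edge loss are
the assumptions above, not measured values.)

    from math import log2

    code_rate = 0.5        # assumed rate-1/2 code
    margin_db = 5.0        # assumed gap between the code and the Shannon limit
    edge_loss_db = 10.0    # edge signal assumed 10 dB weaker than the full track

    # S/N at which a rate-1/2 code sits margin_db below capacity:
    # lg(1 + 10^(margin_db/10) * snr) = code_rate
    snr = (2 ** code_rate - 1) / 10 ** (margin_db / 10)    # ~0.13
    snr_edge = snr / 10 ** (edge_loss_db / 10)             # ~0.013
    cap_edge = log2(1 + snr_edge)                          # ~0.02 bits per symbol

    # Invert C = 1 - H(p) numerically to get the implied bit error rate p
    lo, hi = 1e-12, 0.5
    for _ in range(60):
        p = (lo + hi) / 2
        h = -(p * log2(p) + (1 - p) * log2(1 - p))
        if 1 - h > cap_edge:
            lo = p    # still too much capacity left: p must be larger
        else:
            hi = p
    print(round(snr, 3), round(snr_edge, 4), round(cap_edge, 3), round(p, 2))
    # -> 0.131 0.0131 0.019 0.42: each read bit carries ~0.02 bits of signal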

Rob Warnock

unread,
May 17, 2006, 4:38:49 AM5/17/06
to
Unruh <unruh...@physics.ubc.ca> wrote:
+---------------

| The error correction costs. If it has to fix two or three bit errors
| the redundancy goes up.
+---------------

Yes, but modern block codes can correct many *bytes* of errors
with fairly-low redundancy. Consider the truncated Reed-Solomon
code (192,171) over GF(256), which when interleaved three ways
can correct up to *30* bytes in a row in a 512-byte disk sector
with only 11% overhead.


-Rob

p.s. Triply-interleaved R-S (2*t+172, 171)/GF(256) codes were
very popular for disks a few years back, since 3*171 = 513
(just over the 512-byte sector size), and you can pick any
0 < t < 44 and get a code that can correct up to a 3*t byte
media burst error, or a lot of other smaller error patterns
(e.g. any random "t" bytes in error in each interleave).

They have better ones now...
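
(A quick check of the arithmetic in the first paragraph, in a few lines
of Python:)

    # Check the arithmetic for the triply-interleaved (192,171) code over GF(256).
    n, k, interleave = 192, 171, 3
    parity = n - k                        # 21 check bytes per codeword
    t = parity // 2                       # corrects up to 10 byte errors per codeword
    print(k * interleave)                 # 513 data bytes -> covers a 512-byte sector
    print(t * interleave)                 # 30-byte burst spread across the interleaves
    print(round(100.0 * parity / n, 1))   # ~10.9% overhead ("about 11%")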

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Unruh

unread,
May 17, 2006, 11:14:45 AM5/17/06
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

No, you have to also identify that there are broken ones, which are the
broken ones and replace each broken one by a good one.
Bits are not interchangeable.

Mxsmanic

unread,
May 17, 2006, 12:25:06 PM5/17/06
to
George Orwell writes:

> I don't understand (intuitively) why you would need to wipe a hard drive
> with more than one pass unless the data is somehow "layered" magnetically
> or parts of it are accidentally skipped altogether in the initial passes.

If nobody has physical access to the drive, a single pass is
sufficient.

If anyone can gain physical access to the drive, you need multiple
overwrites, since each write of the disk leaves a "ghost" that can be
recovered by the right equipment (this requires taking the drive
apart, though).

--
Transpose mxsmanic and gmail to reach me by e-mail.

Bryan Olson

unread,
May 17, 2006, 1:17:46 PM5/17/06
to
Unruh wrote:
> Bryan Olson writes:
>
>>Unruh wrote:

>>>They may well get max capacity for the desired error rate.
>
>>There isn't really any such thing. For any data-rate less than the
>>theoretical capacity, and any error-rate greater than zero, there
>>exists a code that acheves both the data-rate and the error-rate.
>>That's Shannon's "noisy channel coding theorem".
>
> Yes, and the lower the error rate the lower the channel capacity.

"The desired error rate" has no effect on channel capacity. Channel
capacity depends upon the channel's bandwidth and signal-to-noise
ratio, and not on what we desire. Our choice of coding does depend
on the error rate we want, but the max capacity for a desired error
rate of 10**-2 equals the max capacity for a desired error rate of
10**-15, and both equal the channel capacity.

--
--Bryan

l...@cadence.com

unread,
May 17, 2006, 1:57:57 PM5/17/06
to
Paul Rubin wrote:
> l...@cadence.com writes:
> > Since drives do not support this level of tweaking, at least from the
> > traditional operating system drivers, you have the risk that no number
> > of passes is enough. If the original data was written wide, and the
> > new data is written narrow due to systematic causes, then you can write
> > all you want and the edges will remain.
>
> Well, ok, let's say the edges contain 1/10th of the energy of the
> original full-width signal. [...] Using your earlier formula for the bit error rate,
> [...] each output bit contains about 0.98 bits of

> noise entropy and only 0.02 bits from the original data. Even for
> something like English plaintext, this is pretty useless. If it's
> random-looking bits, like crypto keys or compressed data, it's beyond
> hope.

This is assuming the reading technology is the same as that used in the
drive, but reading technology is advancing, too. Imagine someone
making the same argument you did 20 years ago, on then current disk
drives. Sure, *that* reader cannot reliably read the remaining
fragment (otherwise, as Bryan Olson pointed out, manufacturers would
have used it to increase capacity). But a read head/amplifier from
today could easily read that 1/10 wide track, probably with much better
signal to noise than the original reader could do on the full track.

So, as Bryan says, one wipe is probably good enough to prevent today's
disk from being easily read with today's technology. But it may well
not be enough to prevent today's disk being read with tomorrow's
technology. And this does not even count what could be done by a
really determined adversary, such as using an atomic force microscope
to measure the remaining magnetism. Although slow and expensive, this
can yield a signal to noise limited only by domains on the disk.

So, as usual, the amount of effort you need to spend depends on the
criticality of your secret and the attack model. If the secret must be
kept from a mildly determined adversary for 1-2 years, one wipe is
probably enough. In 10 years, though, a medium determined adversary
might read this using then-current technology. And if you are trying
to protect your secret from national-lab class attacks far into the
future, you probably need to rely on fundamental physics rather than any
technological limits. I'd be tempted to degauss the whole drive while
heating it above the Curie temperature.

Lou Scheffer

Paul Rubin

unread,
May 17, 2006, 2:33:22 PM5/17/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> >According to the theorem that Bryan cited, it sounds like (1% plus
> >epsilon) is enough, for arbitrarily small epsilon. That is what I
> >would have expected. How many sugar cookies do I have to ship you to
> >be 99.99% sure of your receiving 1 million unbroken ones, if 1% of
> >shipped cookies get broken in transit? The answer is not 2 million or
> >1.5 million. It's more like slightly over 1.01 million.
>
> No, you have to also identify that there are broken ones, which are the
> broken ones and replace each broken one by a good one.
> Bits are not interchangeable.

Louis Scheffer gave the theoretical numbers. The answer turns out to
be about 1.08 million. Intuitively it works like this: you have a
noise source that flips a bit p=1% of the time and leaves it alone 99%
of the time. The entropy in that event is -(p*lg(p) + (1-p)*lg(1-p))
or about 0.08 bits (I'm not sure where the factors of 2 in Louis's
formula came from). So each bit in the output stream carries
(ideally) 1 bit of entropy, of which .08 bits came from the noise
source and .92 bits came from the original input. That then is the
best we can hope for. Rob Warnock gave an example of a (192,171)
Reed-Solomon code with 8-bit symbols but I think we need something a
little stronger for an error rate this high. However, from data
compression we should hope to be able to approach that 8% limit.
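
For anyone who wants to check the arithmetic, a short Python version of
the cookie calculation (nothing beyond the formula already given above):

    from math import log2

    p = 0.01                                      # fraction flipped (or broken) in transit
    h = -(p * log2(p) + (1 - p) * log2(1 - p))    # ~0.0808 bits of noise entropy per bit
    print(h)
    print(1_000_000 / (1 - h))                    # ~1.088 million, the "about 1.08 million"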

Paul Rubin

unread,
May 17, 2006, 2:41:10 PM5/17/06
to
l...@cadence.com writes:
> And this does not even count what could be done by a
> really determined adversary, such as using an atomic force microscope
> to measure the remaining magnetism. Although slow and expensive, this
> can yield a signal to noise limited only by domains on the disk.

I see what you're getting at. The SNR of the reading system in an
actual disk drive is worse than that limit. The question then is how
much worse. Again, it should be possible to get actual numbers out.

> And if you are trying to protect your secret from national-lab class
> attacks far into the future, you probably need to rely on fundamental
> physics rather than any technological limits. I'd be tempted to
> degauss the whole drive while heating it above the Curie
> temperature.

Simply melting the drive should be more than sufficient ;-).

nemo_outis

unread,
May 17, 2006, 3:37:03 PM5/17/06
to
Mxsmanic <mxsm...@gmail.com> wrote in
news:kgjm62132ekva9vdl...@4ax.com:

> George Orwell writes:
>
>> I don't understand (intuitively) why you would need to wipe a hard
>> drive with more than one pass unless the data is somehow "layered"
>> magnetically or parts of it are accidentally skipped altogether in
>> the initial passes.
>
> If nobody has physical access to the drive, a single pass is
> sufficient.

If nobody has access to the drive then zero passes are sufficient :-)

Regards,

Ari Silverstein

unread,
May 17, 2006, 3:40:01 PM5/17/06
to
On Tue, 16 May 2006 19:49:36 -0600, Luc The Perverse wrote:

> I'm curious what is so damned secret that people would dig through trash for
> a piece of hard disk to get it back.

Ah, Luc, drift further from school and into the real world, my friend.
--
Drop the alphabet for email

Phil Carmody

unread,
May 17, 2006, 3:50:08 PM5/17/06
to

But you can find all of that data on the back seat of the car
in the car park.

Phil
--
The man who is always worrying about whether or not his soul would be
damned generally has a soul that isn't worth a damn.
-- Oliver Wendell Holmes, Sr. (1809-1894), American physician and writer

Unruh

unread,
May 17, 2006, 4:35:49 PM5/17/06
to
Bryan Olson <fakea...@nowhere.org> writes:

Something happened to the error correction on that last sentence. The words
do not fit together into a coherent sentence.


>--
>--Bryan

Luc The Perverse

unread,
May 17, 2006, 4:41:50 PM5/17/06
to
"Ari Silverstein" <abcarisi...@yahoo.comxyz> wrote in message
news:h03gyzey593p$.17kqhfkj6m32z.dlg@40tude.net...

> On Tue, 16 May 2006 19:49:36 -0600, Luc The Perverse wrote:
>
>> I'm curious what is so damned secret that people would dig through trash
>> for
>> a piece of hard disk to get it back.
>
> Ah, Luc, drift further from school and into the real world, my friend.

Secrets are dumb?

--
LTP

:)


Unruh

unread,
May 17, 2006, 4:50:27 PM5/17/06
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

>Unruh <unruh...@physics.ubc.ca> writes:
>> >According to the theorem that Bryan cited, it sounds like (1% plus
>> >epsilon) is enough, for arbitrarily small epsilon. That is what I
>> >would have expected. How many sugar cookies do I have to ship you to
>> >be 99.99% sure of your receiving 1 million unbroken ones, if 1% of
>> >shipped cookies get broken in transit? The answer is not 2 million or
>> >1.5 million. It's more like slightly over 1.01 million.
>>
>> No, you have to also identify that there are broken ones, which are the
>> broken ones and replace each broken one by a good one.
>> Bits are not interchangeable.

>Louis Scheffer gave the theoretical numbers. The answer turns out to
>be about 1.08 million. Intuitively it works like this: you have a
>noise source that flips a bit p=1% of the time and leaves it alone 99%
>of the time. The entropy in that event is -(p*lg(p) + (1-p)*lg(1-p))
>or about 0.08 bits (I'm not sure where the factors of 2 in Louis's
>formula came from). So each bit in the output stream carries
>(ideally) 1 bit of entropy, of which .08 bits came from the noise
>source and .92 bits came from the original input. That then is the
>best we can hope for. Rob Warnock gave an example of a (192,171)

That has an extra 21 parity symbols per 171 data symbols. An expansion of
about 13%, which can fix about 10 errors in those 192 symbols. If the
errors are all independent (a bad assumption on disks) the probability
of >10 bit errors in 192 symbols (1500 bits) is about unity (the average
number of errors is 15). Ie, you had better use a lot more redundant
code than that. And to drive that error rate down to 10^-15/bit an even
more redundant code is needed.

>Reed-Solomon code with 8-bit symbols but I think we need something a
>little stronger for an error rate this high. However, from data
>compression we should hope to be able to approach that 8% limit.

Data compression? That has nothing to do with it. Assume that the data is
completely uncompressible ( which is certainly what a disk drive has to
assume). Sorry I do not know what you mean here.

Ari Silverstein

unread,
May 17, 2006, 4:51:23 PM5/17/06
to

You stated you were not aware that ppl would rummage for HD data in the
trash.

I found that cute, naive but cute :0

Mike Amling

unread,
May 17, 2006, 5:16:26 PM5/17/06
to
Unruh wrote:
> Dirty Ed <nob...@spamcop.net> writes:
>> It is important to initially emphasize that erasure security can only
>> be relative. There is no method giving
>> absolutely secure erase. Government document *DoD 5220.22-M* is commonly
>> quoted on erasure methods, and
>> requires physical destruction of the storage medium (the magnetic
>> disks) for data classified higher than Secret.
>> However, even such physical destruction is not absolute if any
>> remaining disk pieces are larger than a single
>> 512-byte record block in size, about 1/125” in today’s drives. Pieces of
>> this size are found in bags of destroyed
>> disk pieces studied at CMRR. Magnetic microscopy can image the stored
>> recorded media bits, using the CMRR
>> scanning magnetoresistive microscope. Physical destruction nevertheless
>> offers the highest level of erasure
>> because recovering any actual user data from a magnetic image requires
>> overcoming almost a dozen
>> independent recording technology hurdles.
>
>> Entire document can be viewed at:

>> http://cmrr.ucsd.edu/Hughes/CmrrSecureEraseProtocols.pdf
>
> Well, I would shred it and then heat it to well above the Neal temperature
> of the recording material. That should get rid of all possibilities.

The Neal temperature? What's that? How does it compare to the Curie
temperature?

--Mike Amling

l...@cadence.com

unread,
May 17, 2006, 5:39:00 PM5/17/06
to
Paul Rubin wrote:
> Unruh <unruh...@physics.ubc.ca> writes:
> > >According to the theorem that Bryan cited, it sounds like (1% plus
> > >epsilon) is enough, for arbitrarily small epsilon.
> >
> > No, you have to also identify that there are broken ones, which are the
> > broken ones and replace each broken one by a good one.
> > Bits are not interchangeable.
>
> Louis Scheffer gave the theoretical numbers. The answer turns out to
> be about 1.08 million. Intuitively it works like this: you have a
> noise source that flips a bit p=1% of the time and leaves it alone 99%
> of the time. The entropy in that event is -(p*lg(p) + (1-p)*lg(1-p))
> or about 0.08 bits (I'm not sure where the factors of 2 in Louis's
> formula came from).

The factor of 2 comes from the fact that I'm computing the capacity and
you are computing the entropy, and the capacity = 1-entropy.

C = p*l(2*p) + (1-p)*l(2*(1-p))

C = p*[l(2) + l(p)] + (1-p)*[l(2) + l(1-p)], but l(2) = 1, so

C = p + (1-p) + p*l(p) + (1-p)*l(1-p)

C = 1 + p*l(p) + (1-p)*l(1-p)

C = 1 - entropy(p), as you defined it above. So the expressions agree
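
A quick numerical sanity check of that algebra, in Python (p = 1% as in
the running example):

    from math import log2

    p = 0.01
    lou  = p * log2(2 * p) + (1 - p) * log2(2 * (1 - p))   # the capacity form above
    paul = 1 + p * log2(p) + (1 - p) * log2(1 - p)         # 1 - entropy(p)
    print(lou, paul)      # both ~0.9192, so the two expressions do agree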

Lou Scheffer

Paul Rubin

unread,
May 17, 2006, 5:48:52 PM5/17/06
to
Unruh <unruh...@physics.ubc.ca> writes:
> Data compression? That has nothing to do with it. Assume that the data is
> completely uncompressible ( which is certainly what a disk drive has to
> assume). Sorry I do not know what you mean here.

I just mean that we usually entertain a general notion that given a
precise enough model of a source of data, we can compress it down to
arbitrarily close to the Shannon entropy, so we should hope to be able
to do something similar with error correcting codes.

Let's look at our 1% error rate channel again. Say we have a vector
of 100 bits and we flip each one with probability .01. On average we
expect to flip 1 bit out of the 100. From the binomial distribution,
the likelihood is around .996 that 4 or fewer bits are flipped. So if
we can find, say, a (100,90) code that corrects almost all the <=
4-bit errors, then 99.6% of the time we can decode 90 input bits
correctly, or 0.4%/90=.0044% errors per bit which is within the .01
that we wanted.

I'm not sure if that's possible. Since there are less than 2**22
ways of flipping <= 4 bits out of 100, there is a very obvious
(100,78) code that corrects all 4-bit errors. I'd expect to be able
to do better than that, but for now I'm having trouble showing an
explicit construction. Maybe someone knowledgeable can chime in.
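
The counting above, spelled out in Python (it's just the binomial sums,
no cleverness):

    from math import comb, log2

    n, p = 100, 0.01

    # Chance that at most 4 of the 100 bits get flipped
    print(sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5)))   # ~0.9966

    # Number of error patterns of weight <= 4, and its log2
    patterns = sum(comb(n, k) for k in range(5))
    print(patterns, log2(patterns))     # 4087976, ~21.96 -- indeed less than 2**22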

laura fairhead

unread,
May 17, 2006, 5:51:17 PM5/17/06
to

Why would you need to shred it in the first place, why not just
melt it down so everything is liquid/gas ? Surely temperature
just boils down to how much heat you need to turn every part
of the drive into liquid or gas... small dedicated blast-furnace
comes to mind ?!

byefornow
laura

>
>--Mike Amling

--
echo moc.12klat@daehriaf_arual|sed 's/\(.\)\(.\),*/\2,\1/g;h;s/,//g;/@t/q;G;D'

l...@cadence.com

unread,
May 17, 2006, 6:12:50 PM5/17/06
to
Paul Rubin wrote:
> Unruh <unruh...@physics.ubc.ca> writes:
>
> I just mean that we usually entertain a general notion that given a
> precise enough model of a source of data, we can compress it down to
> arbitrarily close to the Shannon entropy, so we should hope to be able
> to do something similar with error correcting codes.
>
> Let's look at our 1% error rate channel again. Say we have a vector
> of 100 bits and we flip each one with probability .01. On average we
> expect to flip 1 bit out of the 100. From the binomial distribution,
> the likelihood is around .996 that 4 or fewer bits are flipped. [...]

>
> I'm not sure if that's possible. Since there are less than 2**22
> ways of flipping <= 4 bits out of 100, there is a very obvious
> (100,78) code that corrects all 4-bit errors. I'd expect to be able
> to do better than that, but for now I'm having trouble showing an
> explicit construction. Maybe someone knowledgeable can chime in.

You are proceeding directly down the path Shannon walked many years
ago. The code you seek is very simple (and very impractical). As you
note above, a (90,100) code is below capacity. So simply assign each
of 2^90 code words at random (choose 100 random bits for each). Then
when you receive a codeword, just compare it to your dictionary and
choose the closest. This will give you a certain residual error rate.
If it's not good enough, go to a (180,200) code. Still not good
enough? Try a (900,1000) code, and so on. As long as you are below
capacity, this will eventually give you as small an error rate as you
desire. Also, it's fairly easy to show that this is the best you can
do.

Of course a code such as this will take more atoms than exist to hold
the codebook, and more than the lifetime of the universe to decode one
block. So we seek more algorithmic codes. The best so far, Low
Density Parity Check codes, which iterate over the received word to try
to find the word that was sent, can get within a few percent of the
limit at the cost of a few hundred high-precision operations per bit.
Algebraic codes, where you compute some expression that tells you which
bits (if any) are wrong, are much much faster but do not approach the
limit nearly as closely (maybe a factor of 2). Reed-Solomon codes are
an example of this strategy.
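
The random-codebook construction is easy to mimic at toy scale, though.
Here's a sketch with made-up (and absurdly small) parameters -- a (16,8)
random codebook with nearest-codeword decoding over the 1% channel --
just to show the mechanics, not anything practical:

    import random

    random.seed(1)
    k, n, p = 8, 16, 0.01
    book = [random.getrandbits(n) for _ in range(2 ** k)]   # one random codeword per message

    def noisy(word):                       # flip each of the n bits with probability p
        for bit in range(n):
            if random.random() < p:
                word ^= 1 << bit
        return word

    def decode(received):                  # pick the message whose codeword is closest
        return min(range(2 ** k), key=lambda m: bin(book[m] ^ received).count("1"))

    failures = 0
    for _ in range(1000):
        m = random.randrange(2 ** k)
        if decode(noisy(book[m])) != m:
            failures += 1
    print(failures, "decoding failures in 1000 trials")     # a small handful at these sizes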

Lou Scheffer

Bryan Olson

unread,
May 17, 2006, 6:20:03 PM5/17/06
to
Dirty Ed wrote:
> On or about 5/16/2006 1:00 PM, Bryan Olson penned the following:
>
>>Dirty Ed wrote:
>>>There is equipment out there that WILL get enough of
>>>the previous information to make sense unless there are enough re-writes
>>>to prevent it.
>>
>>Do you have any citation showing that for a modern disk drive
>>"enough re-writes to prevent it" is more than one?
>
> Usually 3-10 rewrites are enough for most people.

The question was what is enough to prevent the equipment you
noted from reading intelligible data, not what is enough in
the judgment of most people. Could you be more specific about
what equipment you were talking about? Any demonstrations
that it can read over-written data from a modern drive?


> The Government
> requires physical destruction: Here is a quote from one page:


>
> It is important to initially emphasize that erasure security can only
> be relative. There is no method giving
> absolutely secure erase. Government document *DoD 5220.22-M* is commonly
> quoted on erasure methods, and
> requires physical destruction of the storage medium (the magnetic
> disks) for data classified higher than Secret.
> However, even such physical destruction is not absolute if any
> remaining disk pieces are larger than a single
> 512-byte record block in size, about 1/125” in today’s drives. Pieces of
> this size are found in bags of destroyed
> disk pieces studied at CMRR. Magnetic microscopy can image the stored
> recorded media bits, using the CMRR
> scanning magnetoresistive microscope.

That's talking about data that was *not* over-written by the drive.
It has nothing to do with the question at issue.

[...]

What does this document have to do with the question?


--
--Bryan

Luc The Perverse

unread,
May 17, 2006, 7:54:20 PM5/17/06
to
"Ari Silverstein" <abcarisi...@yahoo.comxyz> wrote in message
news:4jku6c0eu5l1$.m6vs3rgsrwod.dlg@40tude.net...

I fear we won't learn the futility of life until it is over ;)

--
LTP

:)


Paul Rubin

unread,
May 17, 2006, 9:06:10 PM5/17/06
to
l...@cadence.com writes:
> You are proceeding directly down the path Shannon walked many years
> ago. The code you seek is very simple (and very impractical). As you
> note above, a (90,100) code is below capacity. So simply assign each
> of 2^90 code words at random (choose 100 random bits for each). Then
> when you receive a codeword, just compare it to your dictionary and
> choose the closest.

Thanks, this is really cool. It's way removed from disk recovery
but I always wanted to understand this stuff.

It's neat that even though you expect around 2**80 birthday collisions
(= codewords assigned to more than one data point and which therefore
won't decode) it doesn't necessarily kill you. With 2^90 code words
you get 0.1% total failures from collisions alone, but as the words
get larger the effect becomes negligible. Then there are around
2**96.6 (i.e. 100*2**90) vectors that are 1 bit away from a code word,
so if you take a given code word and flip one bit, the likelihood of a
non-unique nearest match is about 1/10th, not too bad. Working out all
the rest of the numbers should give the residual error rate and looks
like a good exercise.
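
The expectations behind those figures, as plain arithmetic (floats are
plenty accurate at this precision):

    codewords, space, n = 2.0 ** 90, 2.0 ** 100, 100

    pairs = codewords ** 2 / (2 * space)
    print(pairs)                          # ~6e23: about 2**79 colliding pairs, ~2**80 words involved

    print(2 * pairs / codewords)          # ~0.001, the 0.1% of codewords hit by a collision

    # Flip one bit of a codeword: chance that some *other* codeword lies
    # within distance 1 of the result (n+1 = 101 vectors):
    print((n + 1) * codewords / space)    # ~0.099, the "about 1/10th"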

Mxsmanic

unread,
May 17, 2006, 10:36:42 PM5/17/06
to
Bryan Olson writes:

> "The desired error rate" has no effect on channel capacity.

Yes, it does. The more errors you can tolerate, the higher the
capacity of the channel. If you can tolerate any number of errors,
the capacity of the channel is infinite.

> Channel capacity depends upon the channel's bandwidth and signal-to-noise
> ratio, and not on what we desire.

Signal to noise is just a restatement of the number of errors one is
willing to tolerate.

> Our choice of coding does depend on the error rate we want ...

And the choice of coding determines the channel capacity. QED.

Mxsmanic

unread,
May 17, 2006, 10:37:18 PM5/17/06
to
nemo_outis writes:

> If nobody has access to the drive then zero passes are sufficient :-)

Yes, but I qualified it as _physical_ access.

Bryan Olson

unread,
May 17, 2006, 11:13:04 PM5/17/06
to
Mxsmanic wrote:
> Bryan Olson writes:
>>"The desired error rate" has no effect on channel capacity.
>
> Yes, it does. The more errors you can tolerate, the higher the
> capacity of the channel.

I'm going to continue using Shannon's definition of channel
capacity, not yours.


--
--Bryan

Rob Warnock

unread,
May 17, 2006, 11:26:46 PM5/17/06
to
Unruh <unruh...@physics.ubc.ca> wrote:
+---------------

| Paul Rubin <http://phr...@NOSPAM.invalid> writes:
| >Rob Warnock gave an example of a (192,171) ...

|
| That has an extra 21 parity symbols per 171 data symbols. An expansion
| of about 13%, which can fix about 10 errors in those 192 symbols.
+---------------

Or 30 errors in the whole block, if the errors are evenly-divided
among the three interleaved code blocks. [*Not* a good assumption!]

+---------------
| >...errors are all independent ( a bad assumption on disks)...
+---------------

True, which is why the 3-way interleave is (was?) popular. It would
"use up" only one of the 10 bytes of correction for any burst error of
up to 24 bits. That is, it will correct up to 10 bursts of 24 bits each
(or less) anywhere in a standard 512-byte disk block (plus overhead).
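
A sketch of that bookkeeping (just modular arithmetic; a real controller's
layout may well differ):

    INTERLEAVES = 3

    def burst_load(start, length):
        """Bad bytes seen by each of the three interleaved codewords
        for a burst of `length` consecutive bytes."""
        load = [0] * INTERLEAVES
        for pos in range(start, start + length):
            load[pos % INTERLEAVES] += 1   # consecutive bytes go round-robin
        return load

    print(burst_load(100, 3))    # [1, 1, 1]: a byte-aligned 24-bit burst costs each codeword one symbol
    print(burst_load(100, 30))   # [10, 10, 10]: 30 bad bytes in a row, still within t = 10 per codeword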

+---------------


| > the probability of >10 bit errors in 192 symbols ( 1500 bits) is
| > about unity ( the average number of errors is 15). Ie, you had better
| > use a lot more redundant code than that.

...


| >Reed-Solomon code with 8-bit symbols but I think we need something a
| >little stronger for an error rate this high. However, from data
| >compression we should hope to be able to approach that 8% limit.

+---------------

Actually, Reed-Solomon codes do work best for burst errors, not random
bit errors. So when there is a high rate of Gaussian bit errors, what
people do is use an inner convolutional code (with a constraint length
about the same as the R-S symbol size or so, e.g., k=7 or k=9) and then
wrap an R-S code around that [or interleaved R-S codes, if you need a
larger block size than is convenient]. This first started in FEC/ECC
codes for satellites and deep-space networks, but later made its way
into hard disks with PRML decoders [Partial Response Maximum Likelihood].
Soft-decision convolutional decoders (Viterbi, etc.) do a very good job
on large numbers of isolated Gaussian random bit errors, but when they
fail (miscorrect) they create a much larger burst error... which R-S
codes, conveniently, are very good at correcting!! As it says here:

http://en.wikipedia.org/wiki/Reed-Solomon_error_correction
...
Viterbi decoders tend to produce errors in short bursts.
Correcting these burst errors is a job best done by short or
simplified Reed-Solomon codes.

Modern versions of concatenated Reed-Solomon/Viterbi-decoded
convolutional coding were and are used on the Mars Pathfinder,
Galileo, Mars Exploration Rover and Cassini missions, where they
perform within about 1-1.5 dB of the ultimate limit imposed by
the Shannon capacity.

Note that when the random error is *VERY* high [e.g., in deep-space
work], the required redundancy on the inner code becomes very high
as well. It is not at all unusual to see "rate 1/2" codes [codes with
a 2:1 redundancy!] in such work.

Disk drives need less help. I'm guessing that things like EPRML
<http://www.storagereview.com/guide2000/ref/hdd/geom/dataEPRML.html>
are using what amount to PRML codes of rate-7/8 or higher rates,
with R-S codes wrapped around that. [But I've been out of that for
a while, so I'm not sure what the current state-of-the-art is.]


-Rob

p.s. I just stumbled across the following paper, which has a good
overview of the subject [including at the end a *very* good list
of recommended readings, many of which are in my personal library]:

http://www.pericle.com/papers/Error_Control_Tutorial.pdf

p.p.s. Those *very* interested in the gory details of disk drive
coding might like this:

http://vivaldi.ucsd.edu:8080/~kcheng/thesis/thesis.pdf
Michael K. Cheng, "Algebraic Soft-Decision Decoding
Techniques for High-Density Magnetic Recording"

Tim Smith

unread,
May 18, 2006, 12:37:47 AM5/18/06
to
In article <1147888677....@i40g2000cwc.googlegroups.com>,

l...@cadence.com wrote:
> So, as Bryan says, one wipe is probably good enough to prevent today's
> disk from being easily read with today's technology. But it may well not
> be enough to prevent today's disk being read with tomorrow's technology.
> And this does not even count what could be done by a really determined
> adversary, such as using an atomic force microscope to measure the
> remaining magnetism. Although slow and expensive, this can yield a signal
> to noise limited only by domains on the disk.

That's an important point. The read technology being used by the disk
manufacturer has something like 2 nanoseconds to read a given bit. The
attacker is not under that kind of time constraint, and so can use
*existing* technology that would be completely out of the question for the
disk drive manufacturer to use.

--
--Tim Smith

Paul Rubin

unread,
May 18, 2006, 1:32:05 AM5/18/06
to
rp...@rpw3.org (Rob Warnock) writes:
> Note that when the random error is *VERY* high [e.g., in deep-space
> work], the required redundancy on the inner code becomes very high
> as well. It is not at all unusual to see "rate 1/2" codes [codes with
> a 2:1 redundancy!] in such work.

Those are used in terrestrial communications as well. I guess there's
an optimization problem and I wonder if rate 1/2 codes are some kind
of general solution. That is: you want to send the maximum error-free
bit rate for a given amount of transmitter power. At 1000 bits/sec
the raw bit error rate (BER) is E0. You can double the SNR by sending
500 bits/sec instead, giving twice the energy per bit, resulting in a
lower raw BER, call it E1. Or, you can keep sending 1000 bits/sec but
use a rate 1/2 error correction code, ending up with some residual BER
(call it E2). Or maybe you should make the SNR even worse, sending
2000 bps with a rate 1/3 code. So there's an optimal tradeoff that
may or may not depend a lot on the channel parameters.
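
For a rough feel of the raw-BER side of that tradeoff, here's a small
sketch. It assumes uncoded BPSK on an additive-Gaussian channel and a
made-up power figure, and it says nothing about the coded options --
those still depend on the particular code:

    from math import erfc, sqrt

    def q(x):                          # Gaussian tail probability
        return 0.5 * erfc(x / sqrt(2))

    def bpsk_ber(eb_n0):               # raw BER for uncoded BPSK on an AWGN channel
        return q(sqrt(2 * eb_n0))

    p_over_n0 = 4000.0                 # received power / noise density (arbitrary)
    for rate in (2000, 1000, 500):     # raw bits per second put on the channel
        eb_n0 = p_over_n0 / rate       # same power spread over more or fewer bits
        print(rate, bpsk_ber(eb_n0))
    # Halving the rate doubles the energy per bit and cuts the raw BER; the
    # open question is whether a rate-1/2 or 1/3 code at the higher raw BER
    # beats the slower uncoded stream.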

> p.s. I just stumbled across the following paper, which has a good
> overview of the subject [including at the end a *very* good list
> of recommended readings, many of which are in my personal library]:
>
> http://www.pericle.com/papers/Error_Control_Tutorial.pdf
>
> p.p.s. Those *very* interested in the gory details of disk drive
> coding might like this:
>
> http://vivaldi.ucsd.edu:8080/~kcheng/thesis/thesis.pdf
> Michael K. Cheng, "Algebraic Soft-Decision Decoding
> Techniques for High-Density Magnetic Recording"

Thanks, the first paper is quite readable and the second one indicates
that future disk drives will use very sophisticated codes.

Wikipedia's coverage is also pretty good, I'm finding, and there's
this online textbook that I downloaded a long time ago and have been
looking at again:

http://www.inference.phy.cam.ac.uk/itprnn/book.html

I'd like to find out more about the analog side. Most of what I know
came from an old article by Phil Karn about why Morse code is far less
spectrum-efficient than a lot of the ham radio community seems to
think. That's a good article and is probably online somewhere but I
don't remember specifics.

l...@cadence.com

unread,
May 18, 2006, 3:09:24 AM5/18/06
to
Bryan Olson wrote:
> [...] Any demonstrations that it can read over-written data from a modern drive?
>
The issue of demos is not helpful. Of course, if someone can supply a
demo then we know that 1 pass over-writing is not secure. But if no
demo exists, it proves nothing, since there are 2 possibilities:

(a) No one knows how, which is why no demos exist, or

(b) Someone knows how, but is not demo-ing.

Note that you have no way of telling whether (a) or (b) is true, unless
(b) is forbidden by the laws of physics (and even then beware of
someone who takes an approach you did not anticipate - see side-channel
attacks). This is not just an academic distinction - it has literally
changed world history. In World War II, Japan and Germany were faced
with exactly this problem with their codes. Even though some of their
people thought that maybe their codes were being broken, no-one on
their side could demonstrate how this could be done. Hence they
believed (a), but unfortunately (for them) the answer was (b), and they
lost the war. Note also that their adversaries, knowing very well that
a demo would result in loss of intelligence, actively worked to
preserve case (b), up to and including allowing their own people to die
rather than doing a demo.

If you are going to consider demos, what would be more relevant is if
anyone today can read the data off a 1980s disk that was over-written
once. This better simulates the problem that the reading technology
will continue to advance, whereas you may never get another chance to
over-write the data some more.

Lou Scheffer

Bryan Olson

unread,
May 18, 2006, 5:24:44 AM5/18/06
to
l...@cadence.com wrote:
> Bryan Olson wrote:
>
>>[...] Any demonstrations that it can read over-written data from a modern drive?
>>
>
> The issue of demos is not helpful.

B.S.

> Of course, if someone can supply a
> demo then we know that 1 pass over-writing is not secure. But if no
> demo exists, it proves nothing, since there are 2 possibilities:
>
> (a) No one knows how, which is why no demos exist, or
>
> (b) Someone knows how, but is not demo-ing.

So how many layers of aluminum foil do you need to put over your
head to stop secret agents from reading your mind?

I have a theory that for a modern drive, more than one
physical over-write is gratuitous. The scientific method demands
that I look for evidence that would dis-prove my theory. The
closest I've found is reports of the recovery of over-written
data at far lower densities and, more importantly, when the writer
does nothing to address hysteresis.

There are plausible modes of failure, where one over-write will
not destroy the data. Heads may be aligned differently, or the
signal on a track may, over time, induce magnetism beyond the
normal edges of the track. Unfortunately, doing 30 re-writes in
succession does not address these problems.

In this case, a poster specifically claimed there is equipment
out there that can recover intelligible data. He left himself an
out by saying "unless there are enough re-writes to prevent it";
if we take that literally then it's true by definition. Anyway,
I want to know what equipment he was talking about and what
evidence there is that it can do what he suggests it can.


--
--Bryan

Richard Herring

unread,
May 18, 2006, 6:26:56 AM5/18/06
to
In message <KoMag.1216$E84...@chiapp18.algx.net>, Mike Amling
<nos...@foobaz.com> writes
>Unruh wrote:

[...]

>> Well, I would shred it and then heat it to well above the Neal
>>temperature
>> of the recording material. That should get rid of all possibilities.
>
> The Neal temperature? What's that? How does it compare to the Curie
>temperature?
>

It's the Néel temperature, (if you see garbage, the second letter should
be an e acute) and it's the analogue of the Curie temperature for
_anti_-ferromagnetic materials.

--
Richard Herring

l...@cadence.com

unread,
May 18, 2006, 12:18:05 PM5/18/06
to
Bryan Olson wrote:
> l...@cadence.com wrote:
> > Bryan Olson wrote:
> >
> >>[...] Any demonstrations that it can read over-written data from a modern drive?
> >>
> >
> > The issue of demos is not helpful.
>
> B.S.
>
> > Of course, if someone can supply a
> > demo then we know that 1 pass over-writing is not secure. But if no
> > demo exists, it proves nothing, since there are 2 possibilities:
> >
> > (a) No one knows how, which is why no demos exist, or
> >
> > (b) Someone knows how, but is not demo-ing.
>
> So how many layers of aluminum foil do you need to put over your
> head to stop secret agents from reading your mind?

This is an excellent example. Suppose I have a secret that a Large
Government Organization would really like to know, such as the name of
the country to which I sold that enriched uranium. If I'm just
walking down the street I need no tinfoil. On the other hand, if
government agents grab me, stick me inside a MRI brain-imaging machine,
and start asking me questions about countries while monitoring the
activity in different portions of my brain, I would be *very* concerned
for the security of my secret.

Exactly analogously, I'm not worried about someone casually reading my
drive after one over-write. But if they can apply specialist
equipment, I'm not sure it cannot be done.

> I have a theory that for a modern drive, more than one physical over-write is gratuitous.

This could be true in two ways:

(a) One overwrite is already perfect, so N cannot be any better

(b) One overwrite is not perfect, but N is not any better, or at least
not significantly better.

Clearly, you do not believe (a), since you state "There are plausible
modes of failure, where one over-write will not destroy the data. Heads
may be aligned differently, or the signal on a track may, over time,
induce magnetism beyond the normal edges of the track."

This leaves (b). But (b) seems dubious, since there are at least 2
kinds of information leakage that multiple overwrites might help with. One is
the edge of track problem - since your servos are never perfect,
successive overwrites will wipe a larger width. The next is
hysteresis. Modern methods reduce, but do not eliminate, this problem.
So N overwrites are clearly better - the question is whether they are
better enough to matter, and this seems very dependent on the technical
details of the drive.

>The scientific method demands that I look for evidence that would dis-prove my theory.

Unfortunately there are some cases where the scientific method does not
work, and this is one of them. The scientific method assumes that each
side is in search of the truth, and will therefore advance their own
theory by publicly disproving the theories of others. But in this
case your opponent is not searching for the truth - in fact, it's
better for them if you wrongly believe your method is secure. So
suppose they possess a dis-proof of your theory. They will not only
not demo it, they will actively try to convince you that all dis-proofs
have failed, where in fact theirs succeeded. In political
code-breaking, the last thing you want is to tell your opponent you've
broken their code, where in academic code breaking, where the
scientific method applies, it's the very first thing you do.

Lou Scheffer

Unruh

unread,
May 18, 2006, 2:29:16 PM5/18/06
to
Bryan Olson <fakea...@nowhere.org> writes:

Shannon defined channel capacity in terms of the signal to noise of the
channel. Signal to noise is precisely a measure of the error. The higher
the signal to noise the higher the capacity. All error correction does is
increase the signal to noise of the channel. It comes at a cost in
redundancy.


>--
>--Bryan

Bryan Olson

unread,
May 18, 2006, 2:57:03 PM5/18/06
to
Unruh wrote:
> Shannon defined channel capacity in terms of the signal to noise of the
> channel.

No, he defined it as the rate one can reliably pass symbols across
the channel. See (1) in:

http://www.stanford.edu/class/ee104/shannonpaper.pdf

He also proved a theorem that for a bandwidth-limited channel
with Gaussian noise, the capacity is proportional to the log
of (1 + signal/noise). That was not the definition; that was
a result.


> Signal to noise is precisely a measure of the error.

You have "precisely" where "vaguely" would apply.


> The higher
> the signal to noise the higher the capacity. All error correction does is
> increase the signal to noise of the channel.

The analog channel has a signal-to-noise ratio, and it's fixed
regardless of what computations you choose to do on the bits
you send across it.


--
--Bryan

Mxsmanic

unread,
May 18, 2006, 4:17:09 PM5/18/06
to
Bryan Olson writes:

> The analog channel has a signal-to-noise ratio, and it's fixed
> regardless of what computations you choose to do on the bits
> you send across it.

Analog channels have no signal-to-noise ratio. Setting a threshold
for signal-to-noise converts an analog channel into a digital channel.

Bryan Olson

unread,
May 18, 2006, 8:06:00 PM5/18/06
to
Mxsmanic wrote:
> Bryan Olson writes:
>
>
>>The analog channel has a signal-to-noise ratio, and it's fixed
>>regardless of what computations you choose to do on the bits
>>you send across it.
>
>
> Analog channels have no signal-to-noise ratio. Setting a threshold
> for signal-to-noise converts an analog channel into a digital channel.

Don't see what you mean. For a given channel, the signal-to-noise
ratio is a real constant. Suppose we have a channel with a one MHz
bandwidth and a signal-to-noise ratio of one million (or 60dB).
What "threshold" might you put on one million?


--
--Bryan

Mxsmanic

unread,
May 19, 2006, 12:49:54 AM5/19/06
to
Bryan Olson writes:

> Don't see what you mean.

In analog systems, all signal is information; that is why analog
systems accumulate errors. A digital system is simply one in which an
arbitrary threshold is set: anything above the threshold is signal,
and anything below it is noise. A non-zero noise threshold allows
error-free transmission and storage up to a certain bandwidth; a zero
threshold does not.

> For a given channel, the signal-to-noise ratio is a real constant.

Real, but arbitrary. The higher you set it, the fewer errors you will
encounter for a given bandwidth; the lower you set it, the greater the
number of errors. If you set it to zero, bandwidth is infinite but so
is the number of errors.

The distinguishing feature of digital systems is that they set a
threshold. An image recorded on film is analog because no lower limit
is set for information in the image, and so there is no real
distinction between information and noise. Data recorded on digital
tape is digital because any signal below an arbitrary limit is treated
as noise, and anything above it is treated as information.

> Suppose we have a channel with a one MHz
> bandwidth and a signal-to-noise ratio of one million (or 60dB).
> What "threshold" might you put on one million?

The signal-to-noise ratio itself is the threshold. By setting a
threshold, you determine the error rate you will tolerate; you can
guarantee perfect transmission beyond a certain number of errors. A
threshold of zero means zero errors and unlimited bandwidth. And
bandwidth itself is nothing more than the maximum density of
information that can be transmitted without more than an
arbitrarily-set number of errors.

Rob Warnock

unread,
May 19, 2006, 1:37:27 AM5/19/06
to
Mxsmanic <mxsm...@gmail.com> wrote:
+---------------

| The distinguishing feature of digital systems is that they set a
| threshold. An image recorded on film is analog because no lower limit
| is set for information in the image, and so there is no real
| distinction between information and noise. Data recorded on digital
| tape is digital because any signal below an arbitrary limit is treated
| as noise, and anything above it is treated as information.
+---------------

This shows a serious lack of understanding of the field. Using a
binary threshold [the so-called "hard-threshold decoder" or "hard
slicer"] loses a substantial amount of the available information,
worth as much as 6(?) dB in coding gain. You might want to go read a
couple of those references I posted previously, especially ones that
discuss the topic of "partial-response maximum-likelihood decoders"
(e.g., Viterbi et al.).


-Rob

Bryan Olson

unread,
May 19, 2006, 2:08:03 AM5/19/06
to
Mxsmanic wrote:
> Bryan Olson writes:
>
>
>>Don't see what you mean.
>
> In analog systems, all signal is information; that is why analog
> systems accumulate errors. A digital system is simply one in which an
> arbitrary threshold is set: anything above the threshold is signal,
> and anything below it is noise. A non-zero noise threshold allows
> error-free transmission and storage up to a certain bandwidth; a zero
> threshold does not.

In analog systems, signal-to-noise ratio is the power of the
communication signal reaching the receiver, divided by the power
of the noise. It is not hard to look up.


>>For a given channel, the signal-to-noise ratio is a real constant.
>
> Real, but arbitrary. The higher you set it

You don't set it, except maybe by the power with which you drive
the signal. It's a property of the analog channel: the power of
the signal, divided by the power of the noise.


>>Suppose we have a channel with a one MHz
>>bandwidth and a signal-to-noise ratio of one million (or 60dB).
>>What "threshold" might you put on one million?
>
>
> The signal-to-noise ratio itself is the threshold.

You claimed, "setting a threshold for signal-to-noise converts an
analog channel into a digital channel." So you're setting a
threshold on the threshold?

Near as I can tell, you're now thinking about sending digits by
amplitude modulation of the analog carrier. Each symbol gets a
certain amplitude range, so we could reasonably call the
boundaries thresholds, but they're on the signal as received,
not on the signal-to-noise ratio.


> By setting a
> threshold, you determine the error rate you will tolerate; you can
> guarantee perfect transmission beyond a certain number of errors. A
> threshold of zero means zero errors and unlimited bandwidth. And
> bandwidth itself is nothing more than the maximum density of
> information that can be transmitted without more than an
> arbitrarily-set number of errors.

Bandwidth is literally the width of the frequency band; the range
of sinusoidal wave frequencies the channel carries.


--
--Bryan

Mxsmanic

unread,
May 19, 2006, 6:21:32 AM5/19/06
to
Rob Warnock writes:

> This shows a serious lack of understanding of the field.

No, it is a reduction of the concept to its quintessence. The
difference between digital and analog is that digital makes an
arbitrary distinction between signal and noise, whereas analog does
not.

> Using a
> binary threshold [the so-called "hard-threshold decoder" or "hard
> slicer"] loses a substantial amount of the available information,
> worth as much as 6(?) dB in coding gain.

You can change the threshold, but there has to be one.

Mxsmanic

unread,
May 19, 2006, 6:25:30 AM5/19/06
to
Bryan Olson writes:

> In analog systems, signal-to-noise ratio is the power of the
> communication signal reaching the receiver, divided by the power
> of the noise. It is not hard to look up.

That is not relevant here. In an analog system, you cannot
distinguish between signal and noise. Noise looks like part of the
signal. There is no division between the two. While this allows you
to theoretically exploit the full capacity of the channel, it also
makes it impossible to hold errors below any particular level.

> You don't set it, except maybe by the power with which you drive
> the signal.

Yes, you do. All digital systems apply a threshold at some point.
Anything below the threshold is discarded as noise. Anything above it
is treated as signal.

> You claimed, "setting a threshold for signal-to-noise converts an
> analog channel into a digital channel." So you're setting a
> threshold on the threshold?

A digital channel has a non-zero, arbitrary threshold. An analog
channel does not. In a digital channel you can distinguish between
signal and noise with an arbitrary degree of certainty (thanks to the
threshold); in an analog channel you cannot.

> Near as I can tell, you're now thinking about sending digits by
> amplitude modulation of the analog carrier.

The actual physical nature of the channel is irrelevant.

> Bandwidth is literally the width of the frequency band; the range
> of sinusoidal wave frequencies the channel carries.

You cannot define that without also defining the signal-to-noise
ratio. And if the latter is non-zero, you're effectively digitizing
the channel.

Bryan Olson

unread,
May 19, 2006, 11:15:24 AM5/19/06
to
Mxsmanic wrote:
> Bryan Olson writes:
>
>
>>In analog systems, signal-to-noise ratio is the power of the
>>communication signal reaching the receiver, divided by the power
>>of the noise. It is not hard to look up.
>
> That is not relevant here.

Nonsense. The analog channel's capacity, the rate we can send
information across it, depends upon its signal-to-noise ratio
(and bandwidth).

> In an analog system, you cannot
> distinguish between signal and noise. Noise looks like part of the
> signal. There is no division between the two. While this allows you
> to theoretically exploit the full capacity of the channel, it also
> makes it impossible to hold errors below any particular level.

Wrong. We can never send information at greater than the capacity.
Modulation and coding can get arbitrarily close to capacity at
arbitrarily low error rates.

[...]


> The actual physical nature of the channel is irrelevant.

You don't seem to follow what this thread is about.


--
--Bryan

Ari Silverstein

unread,
May 19, 2006, 12:43:09 PM5/19/06
to
On Wed, 17 May 2006 17:54:20 -0600, Luc The Perverse wrote:

> "Ari Silverstein" <abcarisi...@yahoo.comxyz> wrote in message
> news:4jku6c0eu5l1$.m6vs3rgsrwod.dlg@40tude.net...
>> On Wed, 17 May 2006 14:41:50 -0600, Luc The Perverse wrote:
>>
>>> "Ari Silverstein" <abcarisi...@yahoo.comxyz> wrote in message
>>> news:h03gyzey593p$.17kqhfkj6m32z.dlg@40tude.net...
>>>> On Tue, 16 May 2006 19:49:36 -0600, Luc The Perverse wrote:
>>>>
>>>>> I'm curious what is so damned secret that people would dig through
>>>>> trash
>>>>> for
>>>>> a piece of hard disk to get it back.
>>>>
>>>> Ah, Luc, drift further from school and into the real world, my friend.
>>>
>>> Secrets are dumb?
>>
>> You stated you were not aware that ppl would rummage for HD data in the
>> trash.
>>
>> I found that cute, naive but cute :0
>
> I fear we won't learn the futility of life until it is over ;)

lol
--
Drop the alphabet for email

Mxsmanic

unread,
May 19, 2006, 4:04:45 PM5/19/06
to
Bryan Olson writes:

> The analog channel's capacity, the rate we can send
> information across it, depends upon its signal-to-noise ratio
> (and bandwidth).

Bandwidth and SNR are interdependent. You may be able to sustain a
bandwidth of 1000 MHz with one SNR, or 2000 MHz with a higher SNR.

> Wrong. We can never send information at greater than the capacity.

The capacity of a noise-free analog channel is infinite.
