
Sync option destroys flash!


Michael H. Warfield

May 13, 2005, 12:30:23 PM
Under the right circumstances, even copying a single file to a flash
drive mounted with the "sync" option can destroy the entire drive!

Now that I have your attention!

I found this out the hard way. (Kissed one brand new $70 USD 1GB flash
drive good-bye.) According to the man pages for mount, FAT and VFAT
file systems ignore the "sync" option. It lies. Maybe it used to be
true, but it certainly lies now. A simple test can verify this. Mount
a flash drive with a FAT/VFAT file system without the sync option and
writes to the drive go very fast. Typing "sync" or unmounting the drive
afterwards takes time as the buffered data is written to the flash
drive. This is as it should be. Mount it with the sync option and
writes are really REALLY slow (worse than they should be just from
copying the data through USB) but sync and umount come back immediately
and result in no additional writing to the drive. [Do the preceding
with only a few files and less than a few meg of data if you value that
flash.] So... FAT and VFAT are honoring the sync option. This is very
VERY bad. It's bad for floppies, it's bad for hard drives, it's FATAL
for flash drives.
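
A rough sketch of the kind of timing test described above, for anyone who
wants to reproduce it (the mount point, file size and buffer size are
assumptions, not from this post; point it at media you are willing to
sacrifice). It only demonstrates the synchronous-write slowdown, not the
FAT-specific behaviour:

/* Sketch: time buffered vs. O_SYNC writes to a file on the flash drive.
 * Path and sizes are placeholders; this is illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static double write_mb(const char *path, int extra_flags, int megabytes)
{
        char buf[4096];
        struct timeval t0, t1;
        int fd, i;

        memset(buf, 0xA5, sizeof(buf));
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
        if (fd < 0) {
                perror("open");
                exit(1);
        }
        gettimeofday(&t0, NULL);
        for (i = 0; i < megabytes * 256; i++)   /* 256 x 4 KiB = 1 MiB */
                if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                        perror("write");
                        exit(1);
                }
        fsync(fd);      /* flush the buffered case too, for a fair total */
        gettimeofday(&t1, NULL);
        close(fd);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

int main(void)
{
        const char *path = "/mnt/flash/testfile";   /* assumed mount point */

        printf("buffered + fsync: %.1f s\n", write_mb(path, 0, 16));
        printf("O_SYNC:           %.1f s\n", write_mb(path, O_SYNC, 16));
        return 0;
}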

Flash drives have a limited number of write cycles. Many many
thousands of write cycles, but limited, nonetheless. They are also
written in blocks which are much larger than the reported "sector" size
(several K in a physical NAND flash block, IIRC).

What happens, with the sync option on a VFAT file system, is that the
FAT tables are getting pounded and over-written over and over and over
again as each and every block/cluster is allocated while a new file is
written out. This constant overwriting eventually wears out the first
block or two of the flash drive.

I had lost a couple of flash keys previously, without realizing what
was going on, but what sent me off investigating this was when I copied
a 700 Meg file to a brand new 1G USB 2.0 flash memory key in a USB 2.0
slot. It took over a half an hour to copy to the drive, which really
had me wondering WTF was wrong! Then, when I went to use the key, I
found the first couple of blocks were totally destroyed. Read errors
immediately upon insertion. Then I started digging and found the
hotplug / HAL / fstab-sync stuff on Fedora Core was mounting USB drives
with the "sync" option (if less than 2 Gig). I knew from previous
experience (CF backup cards in a PDA) that repeated pounding on the FAT
tables would destroy a flash card with a FAT file system. So I reported
this on the Fedora list. Someone else noticed that the man pages for
mount state that FAT and VFAT ignore the sync option. Not so,
obviously... Copying that 700 Meg file resulted in thousands upon
thousands upon thousands of writes to the FAT table and backup FAT
table. It simply blew through all the rewrites for those blocks and
burned them up. Bye bye flash key...
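
To put rough numbers on it (the 16K cluster size is an assumption; the
exact figure depends on how the drive was formatted):

/* Back-of-envelope sketch: FAT rewrites caused by copying one large file
 * when every cluster allocation is flushed synchronously.  The 16 KiB
 * cluster size is an assumed value, not taken from the post. */
#include <stdio.h>

int main(void)
{
        long long file_bytes   = 700LL * 1024 * 1024;   /* the 700 Meg file */
        long long cluster_size = 16 * 1024;             /* assumed cluster size */
        long long clusters     = file_bytes / cluster_size;
        long long fat_updates  = clusters * 2;          /* FAT + backup FAT per cluster */

        printf("clusters allocated: %lld\n", clusters);     /* 44800 */
        printf("FAT table rewrites: %lld\n", fat_updates);  /* 89600 */
        printf("...all landing on the same handful of FAT sectors\n");
        return 0;
}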

On a floppy, this would result in an insane amount of jacking around
back and forth between data sectors and the FAT sectors. In addition to
taking forever, that would shorten the life of the diskettes and the
drive itself, but who cares about floppies any more. On a real hard
drive, this will cause "head resonances" as the heads go through
constant high speed seeks between the cylinder with the FAT tables and
the data cylinders. That can't be good, on a continuous basis, for
drive life. But it's really a disaster for flash memory. It's going to
cause premature failure in most flash memory, even if it doesn't kill
them right off as it did in my case with a 700 Meg file.

Can we go back to ignoring "sync" on FAT and VFAT? I can't see where
it does much good. You might corrupt a file system if you unplugged it
while dirty but it beats the hell out of physically burning it up and
destroying the drive!

If it's decided that the FAT and VFAT file systems MUST obey the sync
option then please do something about a special case for the FAT tables!
Sync the data if thou must but... Thou shalt not, must not, whack off
on the FAT tables!!!

Another option would be to only sync the FAT and VFAT file systems upon
close of the file being written or upon close of the last file open on
the file system (fs not busy) but that might not help in the case of a
whole lotta little files...

I'm also going to file a couple of bug reports in bugzilla at RedHat
but this seems to be a more fundamental problem than a RedHat specific
problem. But, IMHO, they should never be setting that damn sync flag
arbitrarily.

Mike
--
Michael H. Warfield | (770) 985-6132 | m...@WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0xDF1DD471 | possible worlds. A pessimist is sure of it!


Lennart Sorensen

May 13, 2005, 1:30:26 PM
On Fri, May 13, 2005 at 12:20:06PM -0400, Michael H. Warfield wrote:
> I found this out the hard way. (Kissed one brand new $70 USD 1GB flash
> drive good-bye.) According to the man pages for mount, FAT and VFAT
> file systems ignore the "sync" option. It lies. Maybe it use to be
> true, but it certainly lies now. A simple test can verify this. Mount
> a flash drive with a FAT/VFAT file system without the sync option and
> writes to the drive go very fast. Typing "sync" or unmounting the drive
> afterwards, takes time as the buffered data is written to the flash
> drive. This is as it should be. Mount it with the sync option and
> writes are really REALLY slow (worse than they should be just from
> copying the data through USB) but sync and umount come back immediately
> and result in no additional writing to the drive. [Do the preceding
> with only a few files and less than a few meg of data if you value that
> flash.] So... FAT and VFAT are honoring the sync option. This is very
> VERY bad. It's bad for floppies, it's bad for hard drives, it's FATAL
> for flash drives.

Certainly causes lots of unnecessary writes, which flash doesn't like.
Given sync doesn't appear to be the default, whyever would you add sync
to vfat?

> Flash drives have a limited number of write cycles. Many many
> thousands of write cycles, but limited, none the less. They are also
> written in blocks which are much larger than the "sector" size report
> (several K in a physical nand flash block, IRC).
>
> What happens, with the sync option on a VFAT file system, is that the
> FAT tables are getting pounded and over-written over and over and over
> again as each and every block/cluster is allocated while a new file is
> written out. This constant overwriting eventually wears out the first
> block or two of the flash drive.

All the flash I have used does automatic wear leveling. Maybe I have only
used high quality flash media (given I am doing work with embedded
industrial grade gear, that is quite plausible). Of course wear
leveling doesn't mean you aren't doing way more writes than necessary,
but it helps spread the load away from the FAT, which would otherwise
quickly die on most flash cards no matter what system you use for
writing to it.

> I had lost a couple of flash keys previously, without realizing what
> was going on, but what send me off investigating this was when I copied
> a 700 Meg file to a brand new 1G USB 2.0 flash memory key in a USB 2.0
> slot. It took over a half an hour to copy to the drive, which really
> had me wondering WTF was wrong! Then, when I went to use the key, I
> found the first couple of blocks were totally destroyed. Read errors
> immediately upon insertion. Then I started digging and found the
> hotplug / HAL / fstab-sync stuff on Fedora Core was mounting USB drives
> with the "sync" option (if less than 2 Gig). I knew from previous
> experience (CF backup cards in a PDA) that repeated pounding on the FAT
> tables would destroy a flash card with a FAT file system. So I reported
> this on the Fedora list. Someone else noticed that the man pages for
> mount state that FAT and VFAT ignore the sync option. Not so,
> obviously... Copying that 700 Meg file resulted in thousands upon
> thousands upon thousands of writes to the FAT table and backup FAT
> table. It simply blew through all the rewrites for those blocks and
> burned them up. Bye bye flash key...

700M = 1.4M 512-byte sectors. I guess if it actually writes one sector
at a time and syncs, and it's not a media with good wear leveling, then
yes that would destroy the sectors holding the FAT. Ouch. Crappy
media, and bad way to treat it. Unfortunately there is no bug, just
user error, and potentially badly designed flash media firmware.

> On a floppy, this would result in an insane amount of jacking around
> back and forth between data sectors and the FAT sectors. In addition to
> taking forever, that would shorten the life of the diskettes and the
> drive itself, but who cares about floppies any more. On a real hard
> drive, this will cause "head resonances" as the heads go through
> constant high speed seeks between the cylinder with the FAT tables and
> the data cylinders. That can't be good, on a continuous basis, for
> drive life. But it's really a disaster for flash memory. It's going to
> cause premature failure in most flash memory, even if it doesn't kill
> them right off as it did in my case with a 700 Meg file.
>
> Can we go back to ignoring "sync" on FAT and VFAT? I can't see where
> it does much good. You might corrupt a file system if you unplugged it
> while dirty but it beats the hell out of physically burning it up and
> destroying the drive!

How about you just don't use the sync option with fat when you don't
mean to use sync? sync does exactly what it should, which just happens
to not be what you want, so don't use it.

> If it's decided that the FAT and VFAT file systems MUST obey the sync
> option then please do something about a special case for the FAT tables!
> Sync the data if thou must buti... Thou shalt not, must not, whack off
> on the FAT tables!!!

Then the sync option wouldn't be much use anymore.

> Another option would be to only sync the FAT and VFAT file systems upon
> close of the file being written or upon close of the last file open on
> the file system (fs not busy) but that might not help in the case of a
> whole lotta little files...

Again not very useful then.

> I'm also going to file a couple of bug reports in bugzilla at RedHat
> but this seems to be a more fundamental problem than a RedHat specific
> problem. But, IMHO, they should never be setting that damn sync flag
> arbitrarily.

No they certainly should not, but it may have something to do with
making life easier for kde/gnome desktops and automatic mount/umount of
media. Dumb idea still, but that happens sometimes.

Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Michael H. Warfield

May 13, 2005, 2:00:29 PM
On Fri, 2005-05-13 at 13:17 -0400, Lennart Sorensen wrote:

> Certainly causes lots of unnecesary writes which flash doesn't like.
> Given sync doesn't appear to be the default, whyever would you add sync
> to vfat?

RedHat does it when they automount USB drives. It's in their "hal"
package. Law of unintended consequences, I guess.

> > Flash drives have a limited number of write cycles. Many many
> > thousands of write cycles, but limited, none the less. They are also
> > written in blocks which are much larger than the "sector" size report
> > (several K in a physical nand flash block, IRC).
> >
> > What happens, with the sync option on a VFAT file system, is that the
> > FAT tables are getting pounded and over-written over and over and over
> > again as each and every block/cluster is allocated while a new file is
> > written out. This constant overwriting eventually wears out the first
> > block or two of the flash drive.

> All the flash I have used do automatic wear leveling. Maybe I have only
> used high quality flash media (given I am doing work with embedded
> industrial grade gear, that is quite plausible). Of course wear
> leveling doesn't mean you aren't doing way more writes than necesary,
> but it helps spread the load away from the FAT, which would otherwise
> quickly die on most flash cards no matter what system you use for
> writing to it.

Yeah, I've heard that claimed but a real problem is that nobody can
tell which ones do or don't until you've got a crispy chip. I've looked
and I haven't been able to find a single reference on any that you might
pick up at Best Buy or Fry's. I've never seen one that has it written
on it that it has any sort of wear leveling like that. But I guess we
don't all buy our flash drives at an industrial supply house (forget
Tiger Direct or CDW either - consumer grade). I would bet that most
people don't even REALIZE that flash has a limited life and wears out.

> 700M = 1.4M 512byte sectors. I guess if it actually writes one sector
> at a time and syncs, and it's not a media with good wear leveling, then
> yes that would destroy the sectors holding the FAT. Ouch. Crappy
> media, and bad way to treat it. Unfortunately there is no bug, just
> user error, and potentially badly designed flash media firmware.

Unfortunately, it's the default under some RedHat stuff (yes, I've
filed the bug reports), the documentation on "mount" is incorrect, it's
arguable whether it's of any value, and there's no warning of the risks.
"User error" is a bit off the mark when the user has to take action he
has no knowledge of to work around a destructive default he doesn't even
know exists (here I'm referring specifically to the need to unmount the
automounted flash and then remount it somewhere else without the unsafe
options).

User error also boils down to removing drives before they are unmounted
or sync'ed. Windows seems to deal with this problem without jacking off
on the FAT tables. Why is it that Linux has a procedure that is
potentially destructive and degrades performance so badly when MS gets
away without it in Windows? It's an MS file system. One of the
contributors on the Fedora list remarked that this explained why his USB
keys were so slow on Linux compared to Windows.

> > On a floppy, this would result in an insane amount of jacking around
> > back and forth between data sectors and the FAT sectors. In addition to
> > taking forever, that would shorten the life of the diskettes and the
> > drive itself, but who cares about floppies any more. On a real hard
> > drive, this will cause "head resonances" as the heads go through
> > constant high speed seeks between the cylinder with the FAT tables and
> > the data cylinders. That can't be good, on a continuous basis, for
> > drive life. But it's really a disaster for flash memory. It's going to
> > cause premature failure in most flash memory, even if it doesn't kill
> > them right off as it did in my case with a 700 Meg file.
> >
> > Can we go back to ignoring "sync" on FAT and VFAT? I can't see where
> > it does much good. You might corrupt a file system if you unplugged it
> > while dirty but it beats the hell out of physically burning it up and
> > destroying the drive!

> How about you just don't use the sync option with fat when you don't
> mean to use sync? sync does exactly what it should, which just happens
> to not be what you want, so don't use it.

Already filed bug reports with RedHat. On both HAL and on the
documentation for mount.

> > If it's decided that the FAT and VFAT file systems MUST obey the sync
> > option then please do something about a special case for the FAT tables!
> > Sync the data if thou must buti... Thou shalt not, must not, whack off
> > on the FAT tables!!!

> Then the sync option wouldn't be much use anymore.

Oh? It's of some use now? Beyond degrading performance to the point
that it takes many times longer to write a flash system in Linux than in
Windows? Yes, I know, don't use it. What's it there for, then? To
protect idiots from removing drives before they're unmounted? Those are
the same idiots who will not know what happened when it burns up their
drive and they are the least likely to use that option overtly. Most
people don't realize that it was being used without their knowledge
(much less the risks associated with using it in the first place) and
just thought it was a Linux problem that it took many times longer to
write USB keys in Linux than in Windows.

> > Another option would be to only sync the FAT and VFAT file systems upon
> > close of the file being written or upon close of the last file open on
> > the file system (fs not busy) but that might not help in the case of a
> > whole lotta little files...

> Again not very useful then.

Rule number one should be "do no harm". Maybe it should be under some
"force" flag or something just warning people not to use it unless they
really REALLY know what they're doing. I, personally, would never have
used it under any circumstances. I was shocked to discover that RedHat
was doing this by default. A warning to THEM would have been better
than a warning to me, since I already knew not to do stupid shit like
that. Not sure who is responsible for that HAL package or what other
distros may also be vulnerable to this. I mostly blame RedHat for this
problem first plus the mount man pages (which I suspect they relied on)
second but I question if there shouldn't be a better way to avoid
destroying hardware here.

> > I'm also going to file a couple of bug reports in bugzilla at RedHat
> > but this seems to be a more fundamental problem than a RedHat specific
> > problem. But, IMHO, they should never be setting that damn sync flag
> > arbitrarily.

> No they certainly should not, but it may have something to do with
> making life easier for kde/gnome desktops and automatic mount/umount of
> media. Dumb idea still, but that happens sometimes.

> Len Sorensen

Mike


Zan Lynx

May 13, 2005, 2:10:15 PM
On Fri, 2005-05-13 at 13:17 -0400, Lennart Sorensen wrote:

Err, no, the sync flag is a wonderful idea. I use sync on jump drives
(although I won't now that I've read about this problem) because I can
copy a file to it and as soon as the green light stops flashing I can
yank it out without bothering to click or type anything to "eject" it.
Requiring "eject" on removable media is a dumb idea.

Perhaps FAT updates could be cached and delayed until some # of ms after
the last write.

Turning the sync flag off and adjusting the values
in /proc/sys/vm/dirty_* would have the same effect, but would change it
on all devices, not just removables.

Maybe we need a way to set dirty_* per block device?
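
A sketch of that workaround as it stands today (the values are illustrative
guesses, and as noted it is global, not per block device):

/* Sketch of the dirty_* tuning mentioned above: start background writeback
 * quickly so a removable drive's light reflects pending data within a
 * second or two.  The values are illustrative guesses, not recommendations. */
#include <stdio.h>

static void set_sysctl(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return;
        }
        fputs(value, f);
        fclose(f);
}

int main(void)
{
        set_sysctl("/proc/sys/vm/dirty_writeback_centisecs", "100"); /* wake writeback every 1s */
        set_sysctl("/proc/sys/vm/dirty_expire_centisecs", "200");    /* flush data older than 2s */
        return 0;
}
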
--
Zan Lynx <zl...@acm.org>


Lennart Sorensen

May 13, 2005, 2:20:09 PM
On Fri, May 13, 2005 at 01:53:48PM -0400, Michael H. Warfield wrote:
> Yeah, I've heard that claimed but a real problem is that nobody can
> tell which ones do or don't until you've got a crispy chip. I've looked
> and I haven't been able to find a single reference on any that you might
> pick up at Best Buy or Fry's. I've never seen one that has it written
> on it that it has any sort of wear leveling like that. But I guess we
> don't all buy our flash drives at an industrial supply house (forget
> Tiger Direct or CDW either - consumer grade). I would bet that most
> people don't even REALIZE that flash has a limited life and wears out.

I believe any flash with a sandisk controller in it will do pretty good
wear leveling (I think all the ones labeled sandisk have their
controller, although they don't all have sandisk flash chips). The
controller is what does the wear leveling. At least for CF and SD this
is what they do.

> Unfortunately, it's the default under some RedHat stuff (yes, I've
> filed the bug reports) and the documentation on "mount" is incorrect and
> it's arguable if it's any value, and there's no warning of the risks.
> "User error" is a bit off the mark when the user has to take action he
> has no knowledge of to work around a destructive default he doesn't even
> know exists (here I'm referring specifically to the need to unmount the
> automounted flash and then remount it somewhere else without the unsafe
> options).

Well the documentation for mount isn't authoritative on what the kernel
does. Not sure where the authoritative information on how vfat treats
sync is (besides the source code).

> User error also boils down to removing drives before they are unmounted
> or sync'ed. Windows seems to deal with this problem without jacking off
> on the FAT tables. Why is it that Linux has a procedure that is
> potentially destructive and degrades performance so badly when MS gets
> away without it in Windows? It's an MS file system. One of the
> contributors on the Fedora list remarked that this explained why his USB
> keys were so slow on Linux compared to Windows.

Well perhaps what Windows does is write the whole file, then update the
fat, then call sync immediately afterwards, or whenever a file is
closed, it calls sync on that file's information.

> Already filed bug reports with RedHat. On both HAL and on the
> documentation for mount.

Well that is where the problem has to be solved.

> Oh? It's of some use now? Beyond degrading performance to the point
> that it takes many times longer to write a flash system in Linux than in
> Windows? Yes, I know, don't use it. What's it there for, then? To
> protect idiots from removing drives before they're unmounted? Those are
> the same idiots who will not know what happened when it burns up there
> drive and they are the least likely to use that option overtly. Most
> people don't realize that it was being used without their knowledge
> (much less the risks associated with using it in the first place) and
> just thought it was a Linux problem that it took many times longer to
> write USB keys in Linux than in Windows.

It does help make sure data is written to fat drives right away. I
don't think that should ever be enabled per filesystem myself, though
some people seem to think it is the way to do things. I think an
application should be responsible for calling sync on a file when it
considers it important. If a filesystem doesn't honour that, it is
broken (and yes, many filesystems probably are broken that way).
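
From the application side that looks something like this (a sketch; the
helper name and path are made up for illustration):

/* Sketch: the application, not the mount flags, decides what must hit the
 * media.  Write normally through the page cache, then fsync() the one file
 * that matters before closing it. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* save_important_file() is a hypothetical helper, for illustration only */
static int save_important_file(const char *path, const void *data, size_t len)
{
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0)
                return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
                close(fd);
                return -1;
        }
        return close(fd);
}

int main(void)
{
        const char msg[] = "data the application considers important\n";

        if (save_important_file("/mnt/flash/important.txt", msg, sizeof(msg) - 1))
                perror("save_important_file");
        return 0;
}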

> Rule number one should be "do no harm". Maybe it should be under some
> "force" flag or something just warning people not to use it unless they
> really REALLY know what their doing. I, personally, would never have
> used it under any circumstances. I was shocked to discover that RedHat
> was doing this by default. A warning to THEM would have been better
> than a warning to me, since I already knew not to do stupid shit like
> that. Not sure who is responsible for that HAL package or what other
> distros may also be vulnerable to this. I mostly blame RedHat for this
> problem first plus the mount man pages (which I suspect they replied on)
> second but I question if there shouldn't be a better way to avoid
> destroying hardware here.

Flash drives have only been around a few years, and I guess in this case
redhat made a mistake. I think any filesystem with the sync mount option
would be very bad for flash drives. This is not a fat problem, it's a
problem with how sync works with flash drives in general. What do you
think would happen if you formatted your flash disk with ext3 and mounted
it with sync? It would die real quick too.

Removing a proper sync option is not a solution just because you could do
harm. Should we remove write access to the disk just because you might
let users run rm -rf /? Isn't that allowing harm to be done by the user
too?

Lennart Sorensen

May 13, 2005, 2:20:23 PM
On Fri, May 13, 2005 at 11:58:19AM -0600, Zan Lynx wrote:
> Err, no, the sync flag is a wonderful idea. I use sync on jump drives
> (although I won't now that I've read about this problem) because I can
> copy a file to it and as soon as the green light stops flashing I can
> yank it out without bothering to click or type anything to "eject" it.
> Requiring "eject" on removable media is a dumb idea.
>
> Perhaps FAT updates could be cached and delayed until some # of ms after
> the last write.
>
> Turning the sync flag off and adjusting the values
> in /proc/sys/vm/dirty_* would have the same effect, but would change it
> on all devices, not just removables.
>
> Maybe we need a way to set dirty_* per block device?

Well you could just make a way to start writing data to the device
within 1s or so; then, assuming your device has a light indicating
activity (it better), when the light has been off for a couple of
seconds you can remove it.

I wish every removable media drive had a soft eject button on it and a
locking mechanism like Zip drives, Jaz drives, CD-ROM/DVD drives, Mac
floppy drives, etc. They are seriously needed to tell the OS what the
user is trying to do to the poor media. That way the OS can see when the user
wants to take the media out, flush it, unlock and eject. Doesn't help
if the eject button doesn't signal the OS of course, but at least on
some drives it does. Being able to remove media from a system without
telling the OS first is just bad design.

Michael H. Warfield

May 13, 2005, 2:30:13 PM
On Fri, 2005-05-13 at 14:09 -0400, Lennart Sorensen wrote:
> On Fri, May 13, 2005 at 01:53:48PM -0400, Michael H. Warfield wrote:
> > Yeah, I've heard that claimed but a real problem is that nobody can
> > tell which ones do or don't until you've got a crispy chip. I've looked
> > and I haven't been able to find a single reference on any that you might
> > pick up at Best Buy or Fry's. I've never seen one that has it written
> > on it that it has any sort of wear leveling like that. But I guess we
> > don't all buy our flash drives at an industrial supply house (forget
> > Tiger Direct or CDW either - consumer grade). I would bet that most
> > people don't even REALIZE that flash has a limited life and wears out.

> I believe any flash with a sandisk controller in it will do pretty good
> wear leveling (I think all the ones labeled sandisk have their
> controller, although they don't all have sandisk flash chips). The
> controller is what does the wear leveling. At least for CF and SD this
> is what they do.

Funny you should mention SanDisk. Maybe the newer ones have gotten
better but the CF cards I used in my PDA were mixes of the SanDisk 128
Meg and SimpleTech 128 Meg. Burned up several before I realized it was
the nightly backup program that was eating them and I didn't notice any
difference between the brands. I don't know what controller was in the
SimpleTech CF cards. Any way to tell short of dismantling them?



Lennart Sorensen

May 13, 2005, 2:30:13 PM
On Fri, May 13, 2005 at 02:21:23PM -0400, Michael H. Warfield wrote:
> Funny you should mention SanDisk. Maybe the newer ones have gotten
> better but the CF cards I used in my PDA were mixes of the SanDisk 128
> Meg and SimpleTech 128 Meg. Burned up several before I realized it was
> the nightly backup program that was eating them and I didn't notice any
> difference between the brands. I don't know what controller was in the
> SimpleTech CF cards. Any way to tell short of dismantling them?

Not really. I believe sandisk has wear leveling on the 201 series CF
cards and on their new generation CF/SD for sure they have it (and
unfortunately for us they discontinued industrial temperature in the new
line so we have had to look elsewhere for CF cards).

Unfortunately a lot of what is sold to consumers at retail is cheap
crap. :)

Alan Cox

May 13, 2005, 3:00:28 PM
> What happens, with the sync option on a VFAT file system, is that the
> FAT tables are getting pounded and over-written over and over and over
> again as each and every block/cluster is allocated while a new file is
> written out. This constant overwriting eventually wears out the first
> block or two of the flash drive.

All non-shite quality flash keys have an on media log structured file
system and will take 100,000+ writes per sector or so. The decent ones
also map out bad blocks and have spares. The "wear out the same sector"
stuff is a myth except on ultra-crap devices.

> I'm also going to file a couple of bug reports in bugzilla at RedHat
> but this seems to be a more fundamental problem than a RedHat specific
> problem. But, IMHO, they should never be setting that damn sync flag
> arbitrarily.

It sounds like you need to find a vendor who makes decent keys. For
that matter several vendors now offer lifetime guarantees with their
USB flash media.

Sync gets set by RH because it seemed the right thing to do to handle
random user device pulls. Now O_SYNC works so excessively well on
fat/vfat that it needs looking at - and as you say likewise perhaps the
nature of the FAT rewriting.

However it's not a media issue, it's primarily a performance issue.

Alan

Michael H. Warfield

May 13, 2005, 3:10:08 PM
On Fri, 2005-05-13 at 14:26 -0400, Lennart Sorensen wrote:
> On Fri, May 13, 2005 at 02:21:23PM -0400, Michael H. Warfield wrote:
> > Funny you should mention SanDisk. Maybe the newer ones have gotten
> > better but the CF cards I used in my PDA were mixes of the SanDisk 128
> > Meg and SimpleTech 128 Meg. Burned up several before I realized it was
> > the nightly backup program that was eating them and I didn't notice any
> > difference between the brands. I don't know what controller was in the
> > SimpleTech CF cards. Any way to tell short of dismantling them?

> Not really. I believe sandisk has wear leveling on the 201 series CF
> cards and on their new generation CF/SD for sure they have it (and
> unfortunately for us they discontinued industrial temperature in the new
> line so we have had to look elsewhere for CF cards).

Par for the course...

> Unfortunately a lot of what is sold to consumers at retail is cheap
> crap. :)

You won't get any argument from me there!

> Len Sorensen

Latest things I've just started playing with are these "Intelligent
Sticks" or "I Sticks". They look like a chip, similar in form factor to
the memory sticks, but slide into the open half of a USB jack, even
though the contact pads wouldn't look like a good match. No connector
shell, so they're really flat. I wonder just how "Intelligent" they
are. I'm bet'n that wear leveling ain't part of it. But they are cute
chips that fit in a wallet easily and can boot a fully encrypted laptop.
Once working, they'll almost never need rewriting to any large extent.
The 1G chips can hold a complete OS like Knoppix or Basilisk (the 512Meg
chips too, with a lot of squeezing).


Mark Rustad

May 13, 2005, 3:10:05 PM
On May 13, 2005, at 1:26 PM, Lennart Sorensen wrote:

> Not really. I believe sandisk has wear leveling on the 201 series CF
> cards and on their new generation CF/SD for sure they have it (and
> unfortunately for us they discontinued industrial temperature in the
> new
> line so we have had to look elsewhere for CF cards).
>
> Unfortunately a lot of what is sold to consumers at retail is cheap
> crap. :)

It does seem to be a problem finding out how things really work in
these devices. There seem to be the following types (from worst to
best):

1. No wear leveling. Bad blocks are mapped out at manufacturing time and that
is it.
2. Bad blocks are detected and remapped dynamically. This is sometimes
called wear-leveling, but the device life is a function of how many
spares there originally were.
3. "Real" wear leveling. This can move data to fully use the life of
all sectors.
4. "Real" wear leveling with lots of optimization and write cache -
these are large devices usually with the ability to have battery power
to ensure write cache can be flushed out.

#4 is easy to determine because of the size and complexity of the
things. The others are much harder to distinguish and it really is
important to know what you are dealing with. If you can't find out how
it works, I would assume #1 or #2 which are both pretty poor.

It really would be nice to easily find out what category of device
these things really are.

--
Mark Rustad, MRu...@mac.com

Michael H. Warfield

May 13, 2005, 3:30:17 PM
Hey Alan,

On Fri, 2005-05-13 at 19:40 +0100, Alan Cox wrote:
> > What happens, with the sync option on a VFAT file system, is that the
> > FAT tables are getting pounded and over-written over and over and over
> > again as each and every block/cluster is allocated while a new file is
> > written out. This constant overwriting eventually wears out the first
> > block or two of the flash drive.

> All non-shite quality flash keys have an on media log structured file
> system and will take 100,000+ writes per sector or so. They decent ones
> also map out bad blocks and have spares. The "wear out the same sector"
> stuff is a myth except on ultra-crap devices.

That's easy enough to say but AFAICT there doesn't seem to be any easy
way to tell the good from the bad from the just plain ugly. I
typically don't buy junk (I didn't think), but I've definitely
experienced this, first hand, with Sony Memory Vaults, SanDisk CF cards,
some SimpleTech CF cards, some SmartMedia cards (what an oxymoron that
is), and now this 1G USB stick (which was, I admit, an "off brand" I had
never heard of before at Fry's Electronics). The CF cards were burned up
in a PDA, so it's not just this. For a myth, I've definitely seen too
much of it.

Strangely (and in response to another comment someone made) some USB
cards which I reformatted for ext2 have survived quite well.

> > I'm also going to file a couple of bug reports in bugzilla at RedHat
> > but this seems to be a more fundamental problem than a RedHat specific
> > problem. But, IMHO, they should never be setting that damn sync flag
> > arbitrarily.

> It sounds like your need to find a vendor who makes decent keys. For
> that matter several vendors now offer life time guarantees with their
> USB flash media.

Now THAT I gotta check into. I never noticed anything on the packaging
about a guarantee, but I will now. But how do you determine which are
"decent" keys? They don't put stickers on them saying "this one is
decent" and "this one is junk" and I'm an old cynic who has learned that
price is not always a good indicator either. Maybe the guarantee will
be a clue. I've just got to shop for it more.

> Sync gets set by RH because it seemed the right thing to do to handle
> random user device pulls. Now O_SYNC works so excessively well on
> fat/vfat that needs looking at - and as you say likewise perhaps the
> nature of the FAT rewriting.

> However its not a media issue, its primarily a performance issue.

Yeah, several of us have noticed the performance issue!

> Alan


Lee Revell

May 13, 2005, 5:30:23 PM
On Fri, 2005-05-13 at 12:20 -0400, Michael H. Warfield wrote:
> Under the right circumstances, even copying a single file to a flash
> drive mounted with the "sync" option can destroy the entire drive!
>
> Now that I have your attention!
>
> I found this out the hard way. (Kissed one brand new $70 USD 1GB flash
> drive good-bye.) According to the man pages for mount, FAT and VFAT
> file systems ignore the "sync" option. It lies.

I guess you found out the hard way that the vast majority of Linux docs
are 2-3 years out of date...

> On a real hard
> drive, this will cause "head resonances" as the heads go through
> constant high speed seeks between the cylinder with the FAT tables and
> the data cylinders. That can't be good, on a continuous basis, for
> drive life. But it's really a disaster for flash memory.

I have seen a clueless sysadmin destroy several 15,000 RPM SCSI drives
this way by putting the syslog partition and mail spool at opposite ends
of the drive. I think Alan Cox said something like "these days you can
no longer assume that buggy software won't destroy your hardware".

Lee

Alan Cox

May 13, 2005, 6:10:08 PM
On Gwe, 2005-05-13 at 20:10, Michael H. Warfield wrote:
> > It sounds like your need to find a vendor who makes decent keys. For
> > that matter several vendors now offer life time guarantees with their
> > USB flash media.
>
> Now THAT I gotta check into. I never noticed anything on the packaging
> about a guarantee, but I will now.

Most of them have guarantees of some form (this URL might be useful
since it lists the guarantee times for a lot of the media - EU
guarantees anyway, US often seem to be a lot different)

http://www.valuemedia.co.uk/compact_flash.htm

As you'll see Lexar for example offer lifetime guarantees on their
units. I believe Kingston also do.

> But how do you determine which are
> "decent" keys? They don't put stickers on them saying "this one is
> decent" and "this one is junk" and I'm an old cynic who has learned that
> price is not always a good indicator either. Maybe the guarantee will
> be a clue. I've just got to shop for it more.

Or it may even be cheaper to "burn" a few - buy one of each type from
various shops, do 2 million writes to the same sector and take them back
the next day if they died [And publish the review data 8))]

Måns Rullgård

May 13, 2005, 6:50:05 PM
Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> On Gwe, 2005-05-13 at 20:10, Michael H. Warfield wrote:
>> > It sounds like your need to find a vendor who makes decent keys. For
>> > that matter several vendors now offer life time guarantees with their
>> > USB flash media.
>>
>> Now THAT I gotta check into. I never noticed anything on the
>> packaging about a guarantee, but I will now.
>
> Most of them have guarantees of some form (this URL might be useful
> since it lists the guarantee times for a lot of the media - EU
> guarantees anyway, US often seem to be a lot different)
>
> http://www.valuemedia.co.uk/compact_flash.htm
>
> As you'll see Lexar for example offer lifetime guarantees on their
> units. I believe Kingston also do.

Isn't a lifetime guarantee terminated by definition at the instant the
device dies?

>> But how do you determine which are
>> "decent" keys? They don't put stickers on them saying "this one is
>> decent" and "this one is junk" and I'm an old cynic who has learned that
>> price is not always a good indicator either. Maybe the guarantee will
>> be a clue. I've just got to shop for it more.
>
> Or it may even be cheaper to "burn" a few - buy one of each type from
> various shops, do 2 million writes to the same sector and take them back
> the next day if they died [And publish the review data 8))]

It's probably a good idea to get from different shops, or someone
might get suspicious when you take them back.

--
Måns Rullgård
m...@inprovide.com

Alan Cox

May 13, 2005, 7:00:14 PM
On Gwe, 2005-05-13 at 22:25, Lee Revell wrote:
> I guess you found out the hard way that the vast majority of Linux docs
> are 2-3 years out of date...

The man pages are normally way better than that, and in this case it's
only weeks out of date, assuming the current pages haven't fixed it and
he doesn't have old manual pages.

Jeffrey Hundstad

May 13, 2005, 7:10:12 PM
Alan Cox wrote:

>On Gwe, 2005-05-13 at 20:10, Michael H. Warfield wrote:
>
>>>It sounds like your need to find a vendor who makes decent keys. For
>>>that matter several vendors now offer life time guarantees with their
>>>USB flash media.
>>
>> Now THAT I gotta check into. I never noticed anything on the packaging
>>about a guarantee, but I will now.
>
>Most of them have guarantees of some form (this URL might be useful
>since it lists the guarantee times for a lot of the media - EU
>guarantees anyway, US often seem to be a lot different)
>
>http://www.valuemedia.co.uk/compact_flash.htm
>
>As you'll see Lexar for example offer lifetime guarantees on their
>units. I believe Kingston also do.
>
>> But how do you determine which are
>>"decent" keys? They don't put stickers on them saying "this one is
>>decent" and "this one is junk" and I'm an old cynic who has learned that
>>price is not always a good indicator either. Maybe the guarantee will
>>be a clue. I've just got to shop for it more.
>
>Or it may even be cheaper to "burn" a few - buy one of each type from
>various shops, do 2 million writes to the same sector and take them back
>the next day if they died [And publish the review data 8))]
>

If someone has a contact at the Linux Journal (or other magazine) it
might be good to suggest this for an article.

--
Jeffrey Hundstad

Jon Masters

May 13, 2005, 7:50:07 PM
On 5/14/05, Jeffrey Hundstad <jeffrey....@mnsu.edu> wrote:

> If someone has a contact at the Linux Journal (or other magazine) it
> might be good to suggest this for an article.

I'll look into it for an upcoming feature in LU&D magazine, if we deem
it interesting enough to the readership. Certainly, I'd enjoy trashing
a load of CF cards.

Jon.

(j...@linuxuser.co.uk)

Jon Masters

May 13, 2005, 8:00:23 PM
On 5/13/05, Måns Rullgård <m...@inprovide.com> wrote:

> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> > On Gwe, 2005-05-13 at 20:10, Michael H. Warfield wrote:

> >> But how do you determine which are
> >> "decent" keys? They don't put stickers on them saying "this one is
> >> decent" and "this one is junk" and I'm an old cynic who has learned that
> >> price is not always a good indicator either. Maybe the guarantee will
> >> be a clue. I've just got to shop for it more.
> >
> > Or it may even be cheaper to "burn" a few - buy one of each type from
> > various shops, do 2 million writes to the same sector and take them back
> > the next day if they died [And publish the review data 8))]
>
> It's probably a good idea to get from different shops, or someone
> might get suspicious when you take them back.

Not really. I doubt the person in the shop will care.

Incidentally, I've discovered that you really don't want to buy flash
devices from camera shops. It would seem that, like I guess might be
the case with certain "audio CD" blanks you buy in stores, they don't
seem to care about selling you a device with one or two known-bad
sectors. You won't notice 512 bytes of lost data in many JPEG images
(not that I am saying it's a good practice to have) but my Zaurus
/did/ notice when I couldn't reflash it from a CF card with a fault
somewhere around 8MB. After quite some time of screwing around with
the filesystem by hand, I was able to ensure that something else was
occupying the sector in question and eventually was able to reflash
(all because it was a Sunday afternoon and we have silly Sunday
trading laws here).

When I took the CF card back to the camera shop, I took the Zaurus and
did a test on potential replacement cards while in the store -
explaining that my PDA had a "CompactFlash tester" installed, or
something like that. It's amazing what you can get away with - akin to
spending an hour in the store when I got my replacement Powerbook
after finding a single bad pixel, testing every unit to find one
without any, just to be happy :-)

Jon.

Robert Hancock

May 13, 2005, 8:10:10 PM
Lennart Sorensen wrote:
> Well perhaps what windows does is write the hole file, then update the
> fat, then call sync immediately afterwards, or whenever a file is
> closed, it calls sync on that file's information.

Probably something like that. Windows does default to disabling write
caching on removable drives, to prevent data loss if a device is removed
without being stopped first, but I think it's quite a bit less
aggressive about updating the FAT than the original poster's description
suggests Linux is doing with the sync option (i.e. only updating after
each user-level write call or something).

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hanc...@nospamshaw.ca
Home Page: http://www.roberthancock.com/

Michael H. Warfield

May 13, 2005, 9:10:05 PM
Hey Alan...

On Fri, 2005-05-13 at 19:40 +0100, Alan Cox wrote:

> > What happens, with the sync option on a VFAT file system, is that the
> > FAT tables are getting pounded and over-written over and over and over
> > again as each and every block/cluster is allocated while a new file is
> > written out. This constant overwriting eventually wears out the first
> > block or two of the flash drive.

> All non-shite quality flash keys have an on media log structured file
> system and will take 100,000+ writes per sector or so. They decent ones
> also map out bad blocks and have spares. The "wear out the same sector"
> stuff is a myth except on ultra-crap devices.

Yah know... I've been thinking about this... In a former life, we used
to do something very similar with a virtual memory system on some real
early (80's vintage) networked VM workstations (back when memory was
actually valuable and scarce).

So... This would have to work with a list or pool of "spares" that are
not allocated to the "visible" file system. We used a "least used"
algorithm for that VM system. This would seem to be a "replace as
rewritten" algorithm. Each time you write to the file system, it grabs
a block off the head of the spares list, writes your data to it, and
then adds the old block to the tail of the list. Pretty basic stuff and
it doesn't have to track what kind of high level file system you are
using or know anything about its structure. Cool...

That makes sense. But... How big might this "list" be? Maybe an
additional 10% of the entire drive capacity? That's quite a bit... But
now you're beating on that FAT table pretty heavy. For each block
allocated and written to, we're rewriting the FAT table (actually TWO
FAT tables if you count the back up FAT). Ok... One data block, two
FAT table rewrites. So a FAT table block gets added back to the list
and a block is grabbed off the list. Seems like there would be a pretty
high percentage of old FAT table blocks sitting there circulating on the
spares list. That would make the probability of grabbing an old FAT
table block and rewriting it again pretty high. Then it would get added
back to the list again, in turn.

Because of this systematic thumping of the FAT tables, these old FAT
blocks are going to be circulating in that spares list at a pretty high
density. The wear leveling is not going to be nearly as effective
BECAUSE of the thumping. I'm not certain if that will be better or
worse if there are more blocks in the spares list. Seems like you are
going to end up with 50% - 60% (WAG) of the blocks in the spares list
being old FAT table blocks and end up with a number that just keep
recirculating until they burn out. I would think that they'll burn out
faster if that spares list is small and they get reused more frequently
(note to follow).

The up side is that, once a beat-up old FAT table block does get
allocated to a file data block, it gets to retire in comfort and not get
rewritten until the file gets rewritten. But... That's reducing the
pool of circulating blocks in the allocated file system... So, a file
system that's full is going to rotate through its spare and free blocks
faster as well... Some pluses... Some minuses...

It would seem like this would work well for something like a camera (or
a Mars Rover) where you are periodically removing almost everything from
the flash memory and all the blocks have a chance to return to the
spares list. But I see lots of possibilities for degrading the wear
leveling in other cases...

Now... Flaw recovery could be a big help there. Write the block but
notice that the old one is now bad and don't add it back or the new one
failed and you grab another. But then your spares list shrinks.
Failure occurs on the first write failure after the spares list hits
zero. Probability (in the FAT thumping case with the sync option) is
that it's going to be a FAT block that takes the hit and takes the
entire drive out.

Am I seeing this correctly? Seems to me that the wear leveling is not
going to be nearly as effective as it should in the case where we are
beating up on the FAT simply because of this systematic bias the sync
option introduces into the write patterns on a FAT file system. And
that will be aggravated by a significant load of static data. If
anything, the "sync" option almost appears to be defeating the wear
leveling logic on FAT and VFAT file systems.

> > I'm also going to file a couple of bug reports in bugzilla at RedHat
> > but this seems to be a more fundamental problem than a RedHat specific
> > problem. But, IMHO, they should never be setting that damn sync flag
> > arbitrarily.
>
> It sounds like your need to find a vendor who makes decent keys. For
> that matter several vendors now offer life time guarantees with their
> USB flash media.
>
> Sync gets set by RH because it seemed the right thing to do to handle
> random user device pulls. Now O_SYNC works so excessively well on
> fat/vfat that needs looking at - and as you say likewise perhaps the
> nature of the FAT rewriting.
>
> However its not a media issue, its primarily a performance issue.
>
> Alan

Mike


li...@horizon.com

May 13, 2005, 10:50:07 PM
Alan the Hirsute spake unto the masses:

> All non-shite quality flash keys have an on media log structured file
> system and will take 100,000+ writes per sector or so. They decent ones
> also map out bad blocks and have spares. The "wear out the same sector"
> stuff is a myth except on ultra-crap devices.

I would have thought so, but I can say from personal experience that
SanDisk brand CF cards respond to losing power during a write by producing
a bad sector. I had assumed that a sensible implementation would take
advantage of the out-of-place writing by doing a two-phase commit at
write time, so writes would be atomic.

Does anyone know of a CF manufacturer that *does* guarantee atomic writes?
Obviously, if power is lost during a write, it's not clear whether
I'll get the old or the new contents, but I want one or the other and
not -EIO.

Given that SanDisk first developed the CompactFlash card, you'd think they'd
be a fairly reputable brand...

Robert Hancock

May 14, 2005, 12:40:07 AM
li...@horizon.com wrote:
> I would have though so, but I can say from personal experience that
> SanDisk brand CF cards respond to losing power during a write by producing
> a bad sector. I had assumed that a sensible implementation would take
> advantage of the out-of-place writing by doing a two-phase commit at
> write time, so writes would be atomic.
>
> Does anyone know of a CF manufacturer that *does* guarantee atomic writes?
> Obviously, if power is lost during a write, it's not clear whether
> I'll get the old or the new contents, but I want one or ther other and
> not -EIO.
>
> Given that SanDisk first developed the CompactFlash card, you'd think they'd
> be a fairly reputable brand...

I think it would be a fair bit of work to guarantee this, unless you add
enough capacitive energy storage or something onboard to ensure that the
write can complete even if power is lost. Some hard drives have the same
problem, actually, where a bad sector can be produced if it was being
written at the time power was lost.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hanc...@nospamshaw.ca
Home Page: http://www.roberthancock.com/

Jörn Engel

May 14, 2005, 6:20:07 AM
On Fri, 13 May 2005 23:00:34 +0100, Alan Cox wrote:
>
> Or it may even be cheaper to "burn" a few - buy one of each type from
> various shops, do 2 million writes to the same sector and take them back
> the next day if they died [And publish the review data 8))]

Or just accept the fact that flashes are a tad different from spinning
rust. Expecting a decent wear levelling on the cheap USB sticks and
other forms of flash is plain unrealistic.

USB sticks are a bit better than old 3.5" floppies were - if both are
used with fat, minix, ext, etc. At least they don't die by lying in a
dark drawer. But if you want them to last, your best bet is currently
to use JFFS2 on them.

Of course, JFFS2 sucks performance-wise, so the end result currently
is that USB sticks suck in some way, no matter what you try.

Jörn

--
Fools ignore complexity. Pragmatists suffer it.
Some can avoid it. Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. 1982

Denis Vlasenko

May 15, 2005, 3:10:12 PM

What we really need is a less thorough version of O_SYNC.
O_SYNC currently guarantees that when syscall returns, data
is on media (or at least in disk drive's internal cache :).

This is exactly what really paranoid people want.
Journalling labels, all that good stuff.

But there are many cases where people just want to say
'write out dirty data asap for this device', so that
I can copy files to floppy, wait till it stops making
noise, and eject a disk. Same for flash if it has a write
indicator (mine has a diode).

The difference from O_SYNC is that in this O_LAZY_SYNC mode
writes return early, just filling pagecache, but writeout
is started immediately and continues until there is no dirty
O_LAZY_SYNC data.

This mode won't eat flash as fast as O_SYNC does.
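
Until something like that exists, a userspace approximation is to flush in
chunks, so writeout starts early and the final wait is short (a sketch; the
4 MiB chunk size is an arbitrary choice, and unlike the proposal above each
flush still blocks):

/* Sketch: approximate the "lazy sync" idea from userspace by writing
 * through the page cache but forcing writeback every few megabytes, so the
 * device light tracks progress and the final flush is short. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define FLUSH_EVERY (4 * 1024 * 1024)

static int copy_with_periodic_flush(int in_fd, int out_fd)
{
        char buf[64 * 1024];
        long since_flush = 0;
        ssize_t n;

        while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
                if (write(out_fd, buf, n) != n)
                        return -1;
                since_flush += n;
                if (since_flush >= FLUSH_EVERY) {
                        fdatasync(out_fd);      /* start writeout now, not at umount */
                        since_flush = 0;
                }
        }
        fdatasync(out_fd);                      /* final, short flush */
        return n < 0 ? -1 : 0;
}

int main(int argc, char **argv)
{
        int in, out;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                return 1;
        }
        in = open(argv[1], O_RDONLY);
        out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0 || copy_with_periodic_flush(in, out) != 0) {
                perror("copy");
                return 1;
        }
        close(in);
        close(out);
        return 0;
}
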
--
vda

Mark Lord

May 15, 2005, 8:30:11 PM
All flashcards (other than dumb "smart media" cards) have integrated
NAND controllers which perform automatic page/block remapping and
which implement various wear-leveling algorithms. Rewriting "Sector 0"
10000 times probably only writes once to the first sector of a 1GB card.
The other writes are spread around the rest of the card, and remapped
logically by the integrated controller.

Linux could be more clever about it all, though. Wear-leveling can only
be done efficiently on "unused" or "rewritten" blocks/pages on the cards,
and not so well with areas that hold large static data files (from the point
of view of the flash controller, not the O/S).

If we were really clever about it, then when Linux deletes a file from a
flashcard device, it would also issue CFA ERASE commands for the newly
freed sectors. This would let the card's controller know that it can
remap/reuse that area of the card as it sees fit.

But it's dubious that a *short term* use (minutes/hours) of O_SYNC
would have killed a new 1GB card.

Cheers

David Woodhouse

May 16, 2005, 5:40:07 AM
On Sun, 2005-05-15 at 20:23 -0400, Mark Lord wrote:
> All flashcards (other than dumb "smart media" cards) have integrated
> NAND controllers which perform automatic page/block remapping and
> which implement various wear-leveling algorithms. Rewriting "Sector 0"
> 10000 times probably only writes once to the first sector of a 1GB card.
> The other writes are spread around the rest of the card, and remapped
> logically by the integrated controller.

Assuming the firmware of the card is written with a modicum of clue,
this is true. It's not clear how valid that assumption is, in the
general case. There are reports of cards behaving as if they have almost
no wear levelling at all.

> Linux could be more clever about it all, though. Wear-leveling can only
> be done efficiently on "unused" or "rewritten" blocks/pages on the cards,
> and not so well with areas that hold large static data files (from the point
> of view of the flash controller, not the O/S).
>
> If we were really clever about it, then when Linux deletes a file from a
> flashcard device, it would also issue CFA ERASE commands for the newly
> freed sectors. This would let the card's controller know that it can
> remap/reuse that area of the card as it sees fit.

This would be extremely useful, yes. I've said in the past that I want
this for the benefit of the purely software flash translation layers
(FTL, NFTL etc.). I hadn't realised that CF cards expose the same
functionality.

--
dwmw2

Richard B. Johnson

May 16, 2005, 9:10:16 AM
On Sun, 15 May 2005, Mark Lord wrote:

> All flashcards (other than dumb "smart media" cards) have integrated
> NAND controllers which perform automatic page/block remapping and
> which implement various wear-leveling algorithms. Rewriting "Sector 0"
> 10000 times probably only writes once to the first sector of a 1GB card.
> The other writes are spread around the rest of the card, and remapped
> logically by the integrated controller.
>
> Linux could be more clever about it all, though. Wear-leveling can only
> be done efficiently on "unused" or "rewritten" blocks/pages on the cards,
> and not so well with areas that hold large static data files (from the point
> of view of the flash controller, not the O/S).
>
> If we were really clever about it, then when Linux deletes a file from a
> flashcard device, it would also issue CFA ERASE commands for the newly
> freed sectors. This would let the card's controller know that it can
> remap/reuse that area of the card as it sees fit.
>
> But it's dubious that a *short term* use (minutes/hours) of O_SYNC
> would have killed a new 1GB card.
>
> Cheers


CompactFlash(tm) devices like SanDisk and PNY do not write directly
to the 'flash' part of the device. Instead, the page-oriented
structure copies the contents of the current page into
static RAM. Only when the page is about to be changed are the
contents of static RAM written back, before the actual page
change occurs. The write to flash-RAM is preceded by an
erase, which sets all bits, because writing can only clear
flash-RAM bits, never set them.

These flash-RAM devices are designed to emulate IDE/ATA
disk drives, so no special software is required, except that
the software must not fail when the flash-RAM device doesn't
respond to a "hardware" command. Some versions of Linux
issue large numbers of error messages when accessing these
devices. Other versions have very long, annoying time-outs.
Nevertheless, all Linux versions I have tried in recent times
will work once the initialization process that occurs during
boot is over.

We have systems that mount these CompactFlash(tm) devices
R/W, without denying ATIME, and have not had any field failures.
Recent design reviews have required that we mount these NOATIME,
but that was only a guess that there might be problems many years
into the future.

Regular NAND flash-RAM installed on PC boards has been written
many more times than the 100k guarantee. The first sign
of a "wearing" mechanism is that the erase cycle before programming
takes a shorter time. This means that it takes less time to
program the flash-RAM than it did when it was new. NAND flash-RAM
fails in an interesting way. Some bits that were low slowly drift
high. This means that sometimes it will be read correctly and
sometimes not. With failed NAND flash-RAM, the erase cycle becomes
very short. It will still program okay, but in a few days
one will have problems consistently reading correct data.

Therefore, I believe that the end-of-life could be determined
if the manufacturer just published end-of-life erase times.
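As a sketch of that idea: time a block erase and compare it against a
vendor-published "worn out" figure. Everything below is illustrative --
flash_erase_block() is a stand-in for the real erase operation (e.g. an
MTD erase ioctl) and the threshold is invented:

#include <stdio.h>
#include <time.h>

#define EOL_ERASE_NS 500000LL   /* invented "too fast, worn out" threshold */

/* Stub standing in for the real device erase. */
static int flash_erase_block(unsigned int block)
{
        (void)block;
        struct timespec t = { 0, 2000000 };     /* pretend the erase took 2 ms */
        nanosleep(&t, NULL);
        return 0;
}

/* Worn NAND erases noticeably faster than fresh NAND, so a very short
 * erase time suggests the block is near its end of life. */
static int block_near_end_of_life(unsigned int block)
{
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        if (flash_erase_block(block) < 0)
                return -1;
        clock_gettime(CLOCK_MONOTONIC, &b);

        long long ns = (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
                     + (b.tv_nsec - a.tv_nsec);
        return ns < EOL_ERASE_NS;
}

int main(void)
{
        printf("block 0 near end of life: %d\n", block_near_end_of_life(0));
        return 0;
}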

The original poster did not say how the flash-RAM failed in
his system. If he did not encounter the wear-out mechanism
I described, it is likely that there is an electrical problem
that is killing his flash-RAM.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

Pavel Machek

unread,
May 16, 2005, 1:20:15 PM5/16/05
to
Hi!

> > All flashcards (other than dumb "smart media" cards) have integrated
> > NAND controllers which perform automatic page/block remapping and
> > which implement various wear-leveling algorithms. Rewriting "Sector 0"
> > 10000 times probably only writes once to the first sector of a 1GB card.
> > The other writes are spread around the rest of the card, and remapped
> > logically by the integrated controller.
>
> Assuming the firmware of the card is written with a modicum of clue,
> this is true. It's not clear how valid that assumption is, in the
> general case. There are reports of cards behaving as if they have almost
> no wear levelling at all.

I have seen a card marked "3.3V/5V", but it only really worked at
3.3V. Linux used 5V and quickly killed it.
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

Helge Hafting

unread,
May 16, 2005, 7:20:10 PM5/16/05
to
On Sun, May 15, 2005 at 10:00:26PM +0300, Denis Vlasenko wrote:
>
> What we really need, is a less thorough version of O_SYNC.
> O_SYNC currently guarantees that when syscall returns, data
> is on media (or at least in disk drive's internal cache :).
>
> This is exactly what really paranoid people want.
> Journalling labels, all that good stuff.
>
> But there are many cases where people just want to say
> 'write out dirty data asap for this device', so that
> I can copy files to floppy, wait till it stops making
> noise, and eject a disk. Samr for flash if it has write
> indicator (mine has a diode).
>
I don't really see the need for a new mode.
Mount without o_sync, but use the sync command
once to get things written out. Or
use the umount command before ejecting; it syncs
the device before returning. I use umount all the time
when using compactflash. The wear is minimal.

Helge Hafting

Colin Leroy

unread,
May 17, 2005, 4:10:11 AM5/17/05
to
Hi,

> According to the man pages for mount, FAT and VFAT
> file systems ignore the "sync" option. It lies. Maybe it use to be
> true, but it certainly lies now.

Yes, it does lie. I'm the author of the O_SYNC patch for fat and vfat,
and I'd like to point out that I did test a few flash drives (and hard
drives) extensively (for about a week) with this flag, and they did
not die on me.
As other people said, I think the O_SYNC handling does exactly what it
says, and if it doesn't do what you want, tell HAL not to set it (there
must be a way).

--
Colin

Lennart Sorensen

unread,
May 17, 2005, 9:40:12 AM5/17/05
to
On Sat, May 14, 2005 at 02:43:46AM -0000, li...@horizon.com wrote:
> Alan the Hirsute spake unto the masses:
> > All non-shite quality flash keys have an on media log structured file
> > system and will take 100,000+ writes per sector or so. The decent ones
> > also map out bad blocks and have spares. The "wear out the same sector"
> > stuff is a myth except on ultra-crap devices.
>
> I would have thought so, but I can say from personal experience that
> SanDisk brand CF cards respond to losing power during a write by producing
> a bad sector. I had assumed that a sensible implementation would take
> advantage of the out-of-place writing by doing a two-phase commit at
> write time, so writes would be atomic.

It can also respond to losing power during a write by getting its state
so mixed up that the whole card is dead (it identifies but all sectors fail
to read). The binary industrial grade CF cards (no longer in
production) had capacitors to be able to finish writing the block they
were working on, to prevent problems. Supposedly their new firmware will
have a rollback system so that any partial write is just added back to
the free pool. I had thought this was always how they did it, but
apparently that is also something new.

> Does anyone know of a CF manufacturer that *does* guarantee atomic writes?
> Obviously, if power is lost during a write, it's not clear whether
> I'll get the old or the new contents, but I want one or ther other and
> not -EIO.

When we asked SanDisk about a dead card (it had lost power during a
write), we were told that this is normal for the regular
multicell flash cards. They told us the firmware in the generation of
cards they are currently launching does not have a problem with that
anymore, since it essentially journals the writes and can roll back a
partial block write. I imagine they have patents on that too, along with
lots of other flash technology. Unfortunately their next-generation
cards aren't rated for -40 to +85C operation, so although everything else
was perfect about them, they are of no use to us.

> Given that SanDisk first developed the CompactFlash card, you'd think they'd
> be a fairly reputable brand...

Well they seem to finally be getting those features working as people
have expected them to work.

Len Sorensen

Lennart Sorensen

unread,
May 17, 2005, 9:40:15 AM5/17/05
to
On Fri, May 13, 2005 at 09:05:34PM -0400, Michael H. Warfield wrote:
> Yah know... I've been thinking about this... In a former life, we used
> to do something very similar with a virtual memory system on some real
> early (80's vintage) networked VM workstations (back when memory was
> actually valuable and scarce).
>
> So... This would have to work with a list or pool of "spares" that are
> not allocated to the "visible" file system. We used a "least used"
> algorithm for that VM system. This would seem to be a "replace as
> rewritten" algorithm. Each time you write to the file system, it grabs
> a block off the head of the spares list, writes your data to it, and
> then adds the old block to the tail of the list. Pretty basic stuff and
> it doesn't have to track what kind of high level file system you are
> using or know anything about its structure. Cool...

Really good wear leveling will even move blocks that "never" seem to
change onto the more-used blocks once in a while, to spread the wear onto
blocks that hold static content. After all, if 90% of your flash
never changes and you run a log in the last 10%, you will still wear
out that 10% first if you don't occasionally move some of the static
content onto the 10% with some wear and start running your log on the
previously unused area.
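A toy model of that policy, with all names and numbers invented for
illustration: keep per-block erase counts, and when the gap between the
most-worn spare block and the least-worn block holding static data gets
too large, migrate the static data onto the worn block and return the
fresh one to the pool:

#include <stdint.h>
#include <stdio.h>

#define NBLOCKS     64
#define WEAR_SPREAD 100         /* migrate when the wear gap exceeds this */

struct block {
        uint32_t erase_count;
        int      is_spare;      /* in the free pool? */
        uint32_t logical;       /* logical block mapped here (if any) */
};

static struct block blk[NBLOCKS];

static int most_worn_spare(void)
{
        int best = -1;
        for (int i = 0; i < NBLOCKS; i++)
                if (blk[i].is_spare &&
                    (best < 0 || blk[i].erase_count > blk[best].erase_count))
                        best = i;
        return best;
}

static int least_worn_static(void)
{
        int best = -1;
        for (int i = 0; i < NBLOCKS; i++)
                if (!blk[i].is_spare &&
                    (best < 0 || blk[i].erase_count < blk[best].erase_count))
                        best = i;
        return best;
}

/* Called now and then by the controller's housekeeping. */
static void maybe_migrate_static_data(void)
{
        int hot  = most_worn_spare();   /* worn block sitting idle in the pool */
        int cold = least_worn_static(); /* barely worn block pinned by static data */

        if (hot < 0 || cold < 0)
                return;
        if (blk[hot].erase_count < blk[cold].erase_count + WEAR_SPREAD)
                return;

        /* remap the static data onto the worn block (a real controller
         * would copy its contents here) and free the fresh one */
        blk[hot].logical   = blk[cold].logical;
        blk[hot].is_spare  = 0;
        blk[cold].is_spare = 1;
        blk[cold].erase_count++;        /* it will be erased before reuse */
        printf("migrated logical %u: phys %d -> %d\n", blk[hot].logical, cold, hot);
}

int main(void)
{
        for (int i = 0; i < NBLOCKS; i++) {
                blk[i].is_spare    = (i >= 8);  /* first 8 blocks hold static data */
                blk[i].erase_count = blk[i].is_spare ? 500 : 1;
                blk[i].logical     = i;
        }
        maybe_migrate_static_data();
        return 0;
}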

I was told by someone from SanDisk that this is how _some_ of their
flash media work (at least on the new ones).

I was actually surprised since I assumed at the time this was how all of
the CF cards worked.

Len Sorensen

Richard B. Johnson

unread,
May 17, 2005, 4:50:09 PM5/17/05
to
On Tue, 17 May 2005 li...@horizon.com wrote:

>> It can also respond to losing power during a write by getting its state
>> so mixed up that the whole card is dead (it identifies but all sectors fail
>> to read).
>

> Gee, that just happened to me! Well, actually, thanks to Linux's
> *insistence* on reading the partition table, I haven't managed to
> get I/O errors on anything but sectors 0 through 7, but I am quite
> sure I wasn't writing those sectors when I pulled the plug:
>
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> ide0: reset: success
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> end_request: I/O error, dev 03:00 (hda), sector 6
> unable to read partition table
[SNIPPED...]

You can "fix" this by writing all sectors. Although the data is lost,
the flash-RAM isn't. This can (read will) happen if you pull the
flash-RAM out of its socket with the power ON.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).


Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

li...@horizon.com

unread,
May 17, 2005, 4:40:09 PM5/17/05
to
> It can also respond to losing power during a write by getting its state
> so mixed up that the whole card is dead (it identifies but all sectors fail
> to read).

Gee, that just happened to me! Well, actually, thanks to Linux's
*insistence* on reading the partition table, I haven't managed to
get I/O errors on anything but sectors 0 through 7, but I am quite
sure I wasn't writing those sectors when I pulled the plug:

hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
ide0: reset: success
hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
end_request: I/O error, dev 03:00 (hda), sector 6
unable to read partition table

> The binary industrial grade CF cards (no longer in
> production) had capacitors to be able to finish writing the block they
> were doing to prevent problems.

Er... are you talking about the SanDisk SDCFI or SDCFJ series?
If they told me, I'd specify them instantly...

> Supposedly their new firmware now will have a rollback system so that any
> partial write is just added back to the free pool. I had thought this
> was always how they did it, but no apparently that is also something new.

Indeed; I'm in a fix right now because this seemed so thunderingly obvious
to me that I didn't check carefully before committing to a CF-based design.

> We were told by SanDisk when we asked about a dead card (it had power
> loss during a write) and was told that is normal for the regular
> multicell flash cards. They told us the firmware in the generation of
> cards they are currently launching does not have a problem with that
> anymore since it essentially journals the writes and can roll back a
> partial block write.

You wouldn't happen to know what devices those are, would you?
The SDCFH "Ultra II" series, maybe?

I'm talking to them now, so perhaps I'll learn. As I said, since it
specifically does out-of-place writes, a two-phase commit is
startlingly easy to do. The basic procedure is as follows (a rough C
sketch appears after the recovery steps below):

- For those who don't know, flash memory technology can only be "erased"
to all 1 bits in large blocks, but can be "programmed" to 0 bits on
a bit-by-bit basis. Also, high-density NAND flash often has bad bits,
so ECC is required.
- All high-density NAND flash has 512+16=528-byte sectors. The 16 bytes
are a label area for ECC and information to identify the 512-byte payload.
- If a write is interrupted, it's possible that the affected bit will
read unreliably.
- Reserve three bits (possibly each implemented with multiple physical
bits for redundancy in the face of errors). They mean, respectively:
- "This sector has started being programmed",
- "This sector has finished being programmed and its contents are valid", and
- "This sector contains stale data and should be erased".

To execute a new write,
- Choose an erased sector (or erase an unused sector if your pool of
pre-erased sectors has been used up).
- (Verify that the sector truly is erased. If it's not, program the
"stale data; to be erased" bits and go back to step 1.)
- Program the "write starting bit". This is important so that it is
possible to tell that the sector is no longer clear without having
to check the entire data area. Which is important when building the
initial list of erased sectors when booting.
- Program the data, ECC bits, etc.
- (Verify the data was written properly. Flash memory wears out
eventually, so bad blocks may develop during operation.)
- Program the "finished programming bit".
- Program the stale-data bits of the previous version of the sector.

When booting, read all the label areas and build the initial
logical/physical sector map.

If you find one for which the "started programming" bit is set but the
"finished programming" one is not, read it and verify the checksums.
If all looks well, re-program the sector (to make sure there aren't
any half-programmed bits) and program the "finished programming" bit.
(This is required in case the finished programming bit was half-programmed
when power was lost; if you don't do it, it's possible that *this* time
you felt sure the sector wasn't finished but the next time the card is
booted, the bit *will* read as programmed, resulting in a confused user.)

Also find the previous version of the same sector and program its
stale bits. You need at least a 3-state sequence number to do this,
but that's not a requirement created by atomic writing.

If, on the other hand, reading the payload produces a CRC error, program
the "stale & to be erased" bit.

> I imagine they have patents on that too along with
> lots of other flash technology. Unfortunately their next generation
> cards aren't -40 to +85C operation so although everything else was
> perfect about them, they are of no use to us.

Well, I can make do. If you *are* talking about SDCFI or SDCFJ, they're
still for sale at
https://www.californiapc.com/products/sdflash_industrial.php3
at least...

Anyway, thanks for the information!

Denis Vlasenko

unread,
May 18, 2005, 3:10:10 AM5/18/05
to
On Tuesday 17 May 2005 02:18, Helge Hafting wrote:
> On Sun, May 15, 2005 at 10:00:26PM +0300, Denis Vlasenko wrote:
> >
> > What we really need, is a less thorough version of O_SYNC.
> > O_SYNC currently guarantees that when syscall returns, data
> > is on media (or at least in disk drive's internal cache :).
> >
> > This is exactly what really paranoid people want.
> > Journalling labels, all that good stuff.
> >
> > But there are many cases where people just want to say
> > 'write out dirty data asap for this device', so that
> > I can copy files to floppy, wait till it stops making
> > noise, and eject a disk. Same for flash if it has a write
> > indicator (mine has a diode).
> >
> I don't really see the need for a new mode.
> Mount without o_sync, but use the sync command
> once to get things written out. Or
> use the umount command before ejecting, it syncs
> the device before returning. I use umount all the time
> when using compactflash. The wear is minimal.

I just want this to happen automatically
(automounter helps with this) and at once
(i.e. without delay prior to start of writeout).

Caching (delaying) writes for hard disks makes tons of sense,
but not as much for removable media.
--
vda

li...@horizon.com

unread,
May 18, 2005, 7:30:22 AM5/18/05
to
>> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
>> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
>> end_request: I/O error, dev 03:00 (hda), sector 6
>> unable to read partition table
> [SNIPPED...]
>
> You can "fix" this by writing all sectors. Although the data is lost,
> the flash-RAM isn't. This can (read will) happen if you pull the
> flash-RAM out of its socket with the power ON.

Er... no. Trying to write 8K to /dev/hda, I get the above error
on sector 15.

My *other* problems could be fixed by rewriting the affected sector, but
this one seems to be a doozy. I never saw "SectorIdNotFound" before.

> Notice : All mail here is now cached for review by Dictator Bush.

As long as he has to read it personally, that's fine. I'll get some
small pleasure watching his lips move.

Richard B. Johnson

unread,
May 18, 2005, 8:10:11 AM5/18/05
to
On Wed, 18 May 2005 li...@horizon.com wrote:

>>> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
>>> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
>>> end_request: I/O error, dev 03:00 (hda), sector 6
>>> unable to read partition table
>> [SNIPPED...]
>>
>> You can "fix" this by writing all sectors. Although the data is lost,
>> the flash-RAM isn't. This can (read will) happen if you pull the
>> flash-RAM out of its socket with the power ON.
>
> Er... no. Trying to write 8K to /dev/hda, I get the above error
> on sector 15.
>

If you can boot DOS or FreeDOS on your system, see if the disk
emulation implemented the format-unit command. You can do it with
debug...

- mov dx, 81 ; 81 is D: , 80 is C:
- mov cx, 0 ; Start at cylinder 0
- mov ah, 7 ; Format unit command
- int 13 ; BIOS hard-disk service
- int 3 ; Catch after call

If the call returned with CY not set and the command took some time,
it is likely that new sectors were written and all is well.

> My *other* problems could be fixed by rewriting the affected sector, but
> this one seems to be a doozy. I never saw "SectorIdNotFound" before.
>
>> Notice : All mail here is now cached for review by Dictator Bush.
>
> As long as he has to read it personally, that's fine. I'll get some
> small pleasure watching his lips move.
>

Cheers,


Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).

Notice : All mail here is now cached for review by Dictator Bush.

98.36% of all statistics are fiction.

Lennart Sorensen

unread,
May 18, 2005, 9:50:13 AM5/18/05
to
On Tue, May 17, 2005 at 08:31:17PM -0000, li...@horizon.com wrote:
> Gee, that just happened to me! Well, actually, thanks to Linux's
> *insistence* on reading the partition table, I haven't managed to
> get I/O errors on anything but sectors 0 through 7, but I am quite
> sure I wasn't writing those sectors when I pulled the plug:
>
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> ide0: reset: success
> hda: read_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: read_intr: error=0x10 { SectorIdNotFound }, LBAsect=6, sector=6
> end_request: I/O error, dev 03:00 (hda), sector 6
> unable to read partition table

Yeah, that is exactly how it responds after a power loss during a write
when you have no protection against that.

> > The binary industrial grade CF cards (no longer in
> > production) had capacitors to be able to finish writing the block they
> > were doing to prevent problems.
>
> Er... are you talking about the SanDisk SDCFI or SDCFJ series?
> If they told me, I'd specify them instantly...

The model with the capacitors was the SDCFBI-size-201-80, but they are no
longer available for purchase (we were going to use them and are now
trying a few other brands). The -80 meant industrial grade, which was
the flash with capacitors to keep power failures from killing a write
in progress. None of our tests have killed a -80 yet. The -00 we have
killed by having power go off during writes. They were also rated at 3 times
the write cycles of their regular grade.

We were told the new cards that they offer to OEMs have new firmware
that does some kind of journaling or timestamping or something to deal
with the problem instead. I haven't tried the new cards, and given that
they don't offer the temperature range we need, I won't be trying them
either.

> > Supposedly their new firmware now will have a rollback system so that any
> > partial write is just added back to the free pool. I had thought this
> > was always how they did it, but no apparently that is also something new.
>
> Indeed; I'm in a fix right now because this seemed so thunderingly obvious
> to me that I didn't check carefully before committing to a CF-based design.
>
> > We were told by SanDisk when we asked about a dead card (it had power
> > loss during a write) and was told that is normal for the regular
> > multicell flash cards. They told us the firmware in the generation of
> > cards they are currently launching does not have a problem with that
> > anymore since it essentially journals the writes and can roll back a
> > partial block write.
>
> You wouldn't happen to know what devices those are, would you?
> The SDCFH "Ultra II" series, maybe?

Anything sold retail is anyone's guess (according to their rep), while
with OEM cards you get what it says on the card. The retail cards don't
carry the same model numbers either. A retail SanDisk card might not
even contain SanDisk memory (only the controller is sure to be SanDisk).

Well, I suspect that is along the lines of what the SanDisk 201 series'
replacement is doing in its firmware.

The new ones must be either SDCFH or SDCFJ but I can't find anything
that says what the difference is between the two lines. We were using
the SDCFBI-*-80 cards.

> Well, I can make do. If you *are* talking about SDCFI or SDCFJ, they're
> still for sale at
> https://www.californiapc.com/products/sdflash_industrial.php3
> at least...

Well, we were told the last buy was about a week or two ago on the
SDCFBI-*-201-80 cards. We are now playing with SLCF*JI cards from
SimpleTech, and hopefully those will work out for us. It sure is hard to
do industrial temperature when most people don't care. Most companies
are happy to avoid the trouble, since normal temperature suits 99% of the
market, so why bother for the last 1% even if they are
willing to pay double. :)

Len Sorensen

Michael H. Warfield

unread,
May 18, 2005, 5:20:18 PM5/18/05
to
All right... Now I'm really confused.

There are, obviously, some individuals on this list who are a LOT more
knowledgeable than I am about the internal workings of flash, so I'm hoping
for a clear(er) understanding of just WHAT is going on here.

I'm the original poster and someone in another message remarked about
not having enough details on the damage to the card... So I just did
some spot checking on the card for some details.

Block checking with dd bs=512 if=/dev/sda gave me some indicators...

Blocks 0-7 DOA, hard read errors, 0+0 records in.

Blocks 8-31 would read 8 blocks at a time and then give me an error, but
the next 8 blocks would read fine. So 24 consecutive blocks SEEMED to
read but strangely.

Blocks 32-39 DOA

Blocks 40-71 would read 8 blocks at a time.

Roughly 1/3 of the blocks seem to be dead in multiples of 8 blocks on 8
block boundaries early on. No real pattern to which ones were dead and
which ones would read 8 and then error.

Once past block 512, huge blocks would be readable but eventually give
me an error.

Dead fields (0 records read) were always multiples of eight 512-byte
blocks, i.e. 4 KBytes falling on a 4K boundary.

Reading with dd bs=4096 gave similar results for 4K blocks with skip
count less than 64. Skip count 64 and greater gave me large swaths that
were readable. At no time did I see a partial record read (indicating a
failure off a 4K boundary).
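A small scanner along the lines of those dd spot checks, stepping through
a device in 4K chunks and reporting unreadable regions (the device path
is only a default example; point it at the right device):

#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/sda";      /* example default */
        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror(dev); return 1; }

        uint8_t buf[CHUNK];
        off_t off = 0;
        long bad = 0;

        for (;;) {
                ssize_t n = pread(fd, buf, CHUNK, off);
                if (n == 0)
                        break;                  /* end of device */
                if (n < 0) {
                        printf("unreadable 4K block at byte offset %lld (%s)\n",
                               (long long)off, strerror(errno));
                        bad++;
                }
                off += CHUNK;                   /* step past it either way */
        }
        printf("scan done: %ld bad 4K blocks\n", bad);
        close(fd);
        return bad ? 2 : 0;
}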

Basically, that reinforced my opinion that it was block wear-out from
uneven wear leveling when copying that 700 Meg file and beating the
bejesus out of the FAT tables. The front part of the flash was heavily
damaged, with sporadic damage deeper in the flash.

Now, I saw this message... Well... I didn't remove the key when it
was being written to but, what the hell... The key is dead, I've got
nothing to lose, and it might yield some more information as to the
nature of the failure. So I copied zeros to the entire key with "dd
if=/dev/zero of=/dev/sda bs=16M". I'll be a son of a bitch but that key
recovered. I've partitioned it and read the whole damn thing back end
to end and it's perfect.

Ok... So, WTF? It wasn't (AFAICT) due to loss of power or pulling it
while writing. What was this failure and why did overwriting it fix it?
Did the stick just map out all the burned-out blocks as flawed, or did it
really recover from the ECC errors? I'm really baffled now.

BTW... I've killed the "sync" option in hal (you just have to create
an XML policy file in the right location to specify that option as false
in all cases) and have been beating the crap out of several other keys
without a single failure. I'm going to try this key again...

Thank you very VERY much for this hint to recover the damaged key.
That's a trick I've used for damaged IDE & SCSI hard drives (recover
head drift and soft errors) and I never thought to try it with a flash
key. I'll be damned if I understand just what has happened at this
point but I really appreciate that trick.

> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).
> Notice : All mail here is now cached for review by Dictator Bush.
> 98.36% of all statistics are fiction.

Regards,


Richard B. Johnson

unread,
May 19, 2005, 9:00:09 AM5/19/05
to

The problem is that it's a RAM disk that uses flash-RAM, plus a
little bit of SRAM for one page of I/O. Some devices use two
pages of SRAM to ping-pong for speed. The size of the 'pages'
might vary with the manufacturer. The only thing known is that
these pages will be a multiple of the de facto 512 byte 'sector'
size of a physical disk.

You fixed the device by writing to the whole device without
an intervening read. Writes work like this. The data written goes
into an SRAM page that is shadowed. When that page is filled,
before the device switches to another page, the flash-RAM page that was
shadowed is written to the real flash-RAM. This is necessary
because flash-RAM can only be written by resetting bits, not
setting them. So first the page is erased, which takes a lot of
time and sets all the bits high. The writing process sends an
unlock sequence to the flash-RAM controller, followed by the
offset into the page, followed by the data byte. This also
takes time, so flash-RAM without the SRAM random-access shadow
page is somewhat limited in value. This process continues until
you have written the whole device.

Normal random access works like this: the page to be accessed
is calculated by dividing the offset you want by the real
page size. The offset into that page is the remainder from
the division. Any unflushed data in the SRAM gets written to
the device as previously shown. The newly calculated page
is read into the SRAM. You do I/O from the SRAM. The chip
remembers if any writes occur. If a write occurs, the contents
of the SRAM are flushed to the calculated page any time the
page is about to be changed. There is also a "stale" algorithm
that writes out a page that hasn't been accessed for some
time.
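A toy model of that shadow-page scheme, with invented sizes and names,
just to make the read/modify/flush cycle concrete:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE (16 * 512)    /* some multiple of the 512-byte 'sector' */
#define NPAGES    64

static uint8_t flash[NPAGES][PAGE_SIZE];        /* the NAND array (simulated) */
static uint8_t sram[PAGE_SIZE];                 /* the single shadow page */
static int cur_page = -1;
static int dirty;

static void flush_shadow(void)
{
        if (cur_page < 0 || !dirty)
                return;
        memset(flash[cur_page], 0xff, PAGE_SIZE);       /* erase: all bits set */
        memcpy(flash[cur_page], sram, PAGE_SIZE);       /* program from the shadow */
        dirty = 0;
}

static void load_page(int page)
{
        if (page == cur_page)
                return;
        flush_shadow();                 /* write back before switching pages */
        memcpy(sram, flash[page], PAGE_SIZE);
        cur_page = page;
}

/* A 512-byte "sector" write as seen by the IDE/ATA emulation layer. */
static void sector_write(uint32_t lba, const uint8_t *buf)
{
        uint32_t byte_off = lba * 512;

        load_page(byte_off / PAGE_SIZE);
        memcpy(&sram[byte_off % PAGE_SIZE], buf, 512);
        dirty = 1;                      /* flushed on page switch or "stale" timeout */
}

int main(void)
{
        uint8_t sector[512];

        memset(sector, 0x55, sizeof(sector));
        sector_write(0, sector);        /* lands in the shadow page */
        sector_write(40, sector);       /* different page: forces a flush of page 0 */
        flush_shadow();                 /* final write-back */
        printf("page 0, byte 0 = 0x%02x\n", flash[0][0]);
        return 0;
}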

Now, if you interrupt this sequence by killing the power at
some 'bad' time, data will not be correct. In fact, you could
have an erased page with all bits set.

Now, when you have a file-system that has inodes scattered
all through it, any inodes that are on an erased page will
cause the next access to be at some offset (sector) that
doesn't exist. Since there are no 'sector IDs' as shown
in the errors reported, they must be created by the hard-disk
emulation. So, looking at errors with Sector ID=6, etc.,
simply means that the emulator was 'confused'. It was probably
the Nth wrap of some hardware variable.

Anyway, I've used the SanDisk and PNY flash-RAM that emulates
a 'type 3' IDE drive since they first became available. I haven't
killed any yet. But.... I've destroyed many file-systems by
unplugging them while accesses were occurring.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
