eMMC data corruption due to power removal?

Yiling Cao

Mar 26, 2014, 8:53:03 AM
to beagl...@googlegroups.com
Hi, I have some products deployed on the AM335x with a 2 GB Micron eMMC, and these products allow users to unplug the power whenever they wish.

My Linux app writes to the eMMC only very rarely. My /etc/fstab puts /var/log and /tmp on tmpfs, and mounts all partitions with the "noatime" option.

But after around 2 months of deployment, I found that around 1-2% of the AM335x machines have some sort of data corruption that causes them to fail to boot.

Can anyone share some thoughts or experience on how to resolve this issue? In a real-life product, what's the best practice?

Sungjin Chun

Mar 26, 2014, 5:45:57 PM
to beagl...@googlegroups.com
How about mounting the system partition read-only, and mounting the data partition only after booting and checking it? In that case, only the data partition can be corrupted.
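
A minimal sketch of the "check, then mount" step for a separate data partition. It assumes the data partition is ext4 (so e2fsck applies) and that the script runs as root; the device and mount point in the usage note are hypothetical. The `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess

def check_and_mount(device, mountpoint, fstype="ext4", run=subprocess.run):
    """Run e2fsck on `device`, then mount it only if the check left it usable."""
    # e2fsck -p ("preen") repairs problems that are safe to fix without prompting.
    result = run(["e2fsck", "-p", device])
    # The exit status is a bit mask: 0 = clean, 1 = errors were corrected.
    # Higher bits (2 = reboot needed, 4 = errors left uncorrected,
    # 8 = operational error, ...) are treated as failure in this sketch.
    if result.returncode not in (0, 1):
        return False
    mounted = run(["mount", "-t", fstype, "-o", "noatime", device, mountpoint])
    return mounted.returncode == 0
```

A call such as check_and_mount("/dev/mmcblk0p3", "/data") would run after the read-only root filesystem is up; if it returns False, the application can keep running without persistent data instead of failing to boot.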

--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brandon I

Mar 26, 2014, 9:46:19 PM
to beagl...@googlegroups.com

I had a loooooooong discussion about this with a colleague of mine after we started seeing boards die.

Basically you're eventually doomed unless you mount the whole disk as read only, since the wear leveling algorithms in the flash have no knowledge of what a partition is and will eventually end up with supposed-to-be-read-only data mixed in with the writable partition's erase blocks. If you're writing to flash, it will eventually fail by unfortunate design.

It took his previous company 6 months of fighting to come to terms with this in their last product. They had to write data, so they eventually used USB flash that the customer could easily replace when things eventually died. They tried every flash card they could get their hands on, read-only partitions, etc., and eventually had to give up.

Use the SD card you say! Any micro SD card you can put in the slot is absolutely not meant for continuous writing. The SD card spec has a very specific use case in mind (video and images), and logging or using it as a sparse write file system goes completely against the intended SD card design specs. Industrial grade write-tolerant flash will cost you hundreds of dollars more than something on Amazon.

With our current product, I told my boss that I was worried about corruption and that we would eventually go to read only once we debugged the boards. Within two weeks of writing only log messages, all of our boards started dying. The next day, all disks were mounted as read only, and issues are now debugged with the in-memory log files. We haven't seen any failures in 6 months now.

The easy solution is trying to force the answer of "why are you writing anything to persistent storage?" to be "there's no good reason since it eventually bricks our product". If you want something that will last forever, you will not write to standard flash media. If you can't, then maybe use a usb flash drive (MUCH better life than a micro sd card) and count the days until it corrupts or someone pulls the power at an inopportune time. You could always use a battery backup to get rid of the power off issue. :-\

This is all doom and gloom, but it's a consequence of inconsistent power, buffers, and the destructive nature of quantum tunneling.

-Brandon
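
One possible shape for the in-memory logging mentioned above, as a minimal sketch rather than the setup actually used on those boards: a logging handler that keeps only the most recent records in RAM. The class name, logger name, and capacity are assumptions for illustration.

```python
import collections
import logging

class RingBufferHandler(logging.Handler):
    """Keep only the most recent log records in RAM; nothing touches flash."""

    def __init__(self, capacity=1000):
        super().__init__()
        # A bounded deque silently drops the oldest entry when it is full.
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(self.format(record))

    def dump(self):
        """Return the buffered lines, oldest first."""
        return list(self.records)

log = logging.getLogger("app")
log.setLevel(logging.INFO)
log.propagate = False          # keep records away from any other handlers
handler = RingBufferHandler(capacity=3)
log.addHandler(handler)

for i in range(5):
    log.info("event %d", i)

print(handler.dump())          # only the last three events remain
```

The buffer is lost on power-off, which is the point: it can be read over a debug connection while the board is running, and it never causes a flash write.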

Yiling Cao

Mar 26, 2014, 11:22:14 PM
to beagl...@googlegroups.com
Thanks, Brandon, for sharing your experience. I agree that it is better to mount the whole disk read-only.

But how do iPhones and Android phones survive? Especially the Android phones: they are very prone to sudden power removal as well.

How do routers handle this issue? Do they save the settings on different devices?

I have a SQLite db of around 1-2M, and data will be written to it. I would like an easy and quick solution that makes it absolutely stable.

Yiling Cao

Mar 26, 2014, 11:23:26 PM
to beagl...@googlegroups.com
Is there any way to run fsck before mounting? Should I edit fstab, the command line, or something else?
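
For context on the fstab part of this question: the sixth field of each /etc/fstab entry (fs_passno) tells the boot-time fsck whether to check that filesystem. 0 or a missing field means never, 1 is conventional for the root filesystem, and 2 for the others. The sketch below only reads an fstab and lists the entries that would be skipped; the device names in the sample are hypothetical.

```python
def entries_without_boot_fsck(fstab_text):
    """Return mount points whose sixth fstab field (fs_passno) is 0 or missing."""
    skipped = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed entry
        fstype = fields[2]
        # RAM-backed and virtual filesystems are never checked, so ignore them.
        if fstype in ("tmpfs", "proc", "sysfs", "devpts"):
            continue
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno == 0:
            skipped.append(fields[1])
    return skipped

example = """\
# device        mount     type   options  dump pass
/dev/mmcblk0p2  /         ext4   noatime  0    1
/dev/mmcblk0p3  /data     ext4   noatime  0    0
tmpfs           /tmp      tmpfs  defaults 0    0
"""
print(entries_without_boot_fsck(example))
```

In the sample, only /data is reported, because its last field is 0; changing that field to 2 would ask the boot-time fsck to check it after the root filesystem.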

Charles Steinkuehler

Mar 27, 2014, 8:41:11 AM
to beagl...@googlegroups.com
On 3/26/2014 10:22 PM, Yiling Cao wrote:
> Thanks Brandon for your experience. I do agree with that better to put
> whole disk read only.
>
> But how do iPhone and Android survive? Esp for those Android phones? They
> are very prone to sudden power removal as well.

What? These devices are battery powered, and other than opening the
case and physically removing the battery they are guaranteed enough
power to do a proper and orderly shutdown.

> How do routers handle this issue? they save the settings on different
> devices?

Routers save a very small amount of setup data, and either have a very
small window when they are writing updates to the filesystem, or in some
cases can store the configuration in EEPROM.

> I have a SQLite db around 1-2M and data will be written to them. Would like
> to have some easy and quick solution to make it absolutely stable.

I don't think "easy and quick" go together with "absolutely stable" in
this context. You're looking at solutions like adding a backup battery,
migrating your SQLite db to a different storage device, or other
solutions that do not fit the "easy and quick" description.

I think about the simplest thing you can do is add a uSD card and
separate the OS from the data storage. This gets you around the problem
of corrupting the OS when writing to the data, but you can still run
into problems because the uSD card needs to have specific boot files
present or the BBB won't boot. That problem can be fixed by updating
the u-boot configuration on the eMMC so it ignores the uSD card and
always boots from eMMC.

You'll still need to be able to deal with data corruption in your db
files, but that's a solvable software problem if the system reliably boots.

--
Charles Steinkuehler
cha...@steinkuehler.net
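
A minimal sketch of handling db corruption in software, assuming the application keeps a known-good backup copy of the SQLite file somewhere (the backup path and function name are hypothetical). It uses SQLite's own PRAGMA integrity_check at startup and falls back to the backup if the check fails.

```python
import os
import shutil
import sqlite3

def open_checked(db_path, backup_path):
    """Open the database, restoring from a backup copy if the integrity check fails."""

    def is_healthy(path):
        try:
            conn = sqlite3.connect(path)
            try:
                row = conn.execute("PRAGMA integrity_check").fetchone()
                return row is not None and row[0] == "ok"
            finally:
                conn.close()
        except sqlite3.DatabaseError:
            # Raised, for example, when the file is not a database at all.
            return False

    if os.path.exists(db_path) and not is_healthy(db_path):
        # Remove stale journal files so they are not replayed onto the restored copy.
        for suffix in ("-journal", "-wal", "-shm"):
            try:
                os.remove(db_path + suffix)
            except FileNotFoundError:
                pass
        # Assumption: backup_path holds a known-good copy made earlier.
        shutil.copyfile(backup_path, db_path)

    conn = sqlite3.connect(db_path)
    # Ask SQLite to wait for data to reach storage before a commit returns.
    conn.execute("PRAGMA synchronous=FULL")
    return conn
```

This only covers detection and recovery; how often the backup is refreshed, and how much recent data may be lost when falling back to it, remain application decisions.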

Yiling Cao

Mar 27, 2014, 8:47:33 AM
to beagl...@googlegroups.com
Thanks for your reply. 

On Thu, Mar 27, 2014 at 8:41 PM, Charles Steinkuehler <cha...@steinkuehler.net> wrote:
> What? These devices are battery powered, and other than opening the
> case and physically removing the battery they are guaranteed enough
> power to do a proper and orderly shutdown.

What I mean is that you can take the battery out of the back very easily as well.

> You'll still need to be able to deal with data corruption in your db
> files, but that's a solvable software problem if the system reliably boots.

I have already minimized data writes. I hope to write this data to EEPROM in the next version.

Yiling Cao

Mar 27, 2014, 8:51:39 AM
to beagl...@googlegroups.com
When there is only a very small time window for updating the content in flash, do you choose to:

1. initially mount as ro, remount as rw, write your changes, and remount back to ro? OR
2. just mount as rw at boot?
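
Option 1 can be sketched as a small context manager. This is an illustration under assumptions, not a recommendation from the thread: it assumes root privileges and a filesystem already mounted read-only, and the `run` parameter exists only so the command runner can be replaced.

```python
import contextlib
import subprocess

@contextlib.contextmanager
def writable(mountpoint, run=subprocess.run):
    """Temporarily remount `mountpoint` read-write, then return it to read-only."""
    run(["mount", "-o", "remount,rw", mountpoint], check=True)
    try:
        yield
    finally:
        # Flush dirty pages before switching back to read-only.
        run(["sync"], check=True)
        run(["mount", "-o", "remount,ro", mountpoint], check=True)

# Usage (save_settings is a hypothetical application function):
#
#     with writable("/data"):
#         save_settings()
```

The remount back to read-only fails if any file on the filesystem is still open for writing, so the write inside the block has to close its files first. The filesystem is still exposed to power loss while the block runs; this only narrows the window.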





Charles Steinkuehler

Mar 27, 2014, 2:41:24 PM
to beagl...@googlegroups.com
On 3/27/2014 12:26 PM, rh_ wrote:
> On Thu, 27 Mar 2014 07:41:11 -0500
> Charles Steinkuehler <cha...@steinkuehler.net>
> wrote:
>
>> On 3/26/2014 10:22 PM, Yiling Cao wrote:
>>> Thanks Brandon for your experience. I do agree with that better to
>>> put whole disk read only.
>>>
>>> But how do iPhone and Android survive? Esp for those Android
>>> phones? They are very prone to sudden power removal as well.
>>
>> What? These devices are battery powered, and other than opening the
>> case and physically removing the battery they are guaranteed enough
>> power to do a proper and orderly shutdown.
>
> I pull the battery on my android frequently doing devel. Never had any
> problems. I pull the plug on my BBB all the time too, at least once/day.
> No problems.

Yes, but are you writing to the flash when you pull the power?

There is a huge difference between "it works for me" and *RELIABLY*
avoiding data corruption when power is unexpectedly removed with
significant write activity in-progress.

--
Charles Steinkuehler
cha...@steinkuehler.net

Brandon I

Mar 27, 2014, 3:43:22 PM
to beagleboard
That's because your phone uses a sane filesystem that takes this use case into account and isn't writing constantly (write one byte, and the disk writes a whole erase block). This doesn't protect you from eventual disk corruption. The wear leveling bad-block type tables will eventually corrupt/run out of memory loooong before your disk space is eaten by bad blocks.


"Most Android devices currently use YAFFS, a lightweight filesystem that is optimized for flash storage and is commonly used in mobile and embedded devices."

My production Beaglebone image does not support this.

"Developers who are accessing the filesystem directly will have to be mindful about Ext4's buffering behavior and make sure that the data is actually reaching persistent storage in a timely manner so that it won't be lost in the event of a system failure."

It is now an issue with Android!

"Ts'o says that there isn't much need for concern. Google and the handset makers will catch platform-level filesystem reliability issues, ensuring that the high-level storage APIs are safe."

Is the API you use for disk writes safe? Nope.

-Brandon
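
The buffering concern in the quoted article is usually addressed by forcing data out explicitly. A minimal sketch of one common pattern on Linux (write a temporary file, fsync it, rename it over the target, fsync the directory); the function name and the ".tmp" suffix are assumptions for illustration.

```python
import os

def atomic_write(path, data):
    """Replace `path` with `data` so a reader sees either the old or the new file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to push it to the device
    os.replace(tmp, path)      # atomic rename on POSIX filesystems
    # fsync the containing directory so the rename itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This deals with filesystem-level buffering only. It cannot help if the flash device itself misbehaves when power disappears in the middle of a program or erase operation.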

On Thu, Mar 27, 2014 at 10:26 AM, rh_ <richard...@lavabit.com> wrote:
> I pull the battery on my android frequently doing devel. Never had any
> problems. I pull the plug on my BBB all the time too, at least once/day.
> No problems.
>
> For people having issues I would suspect a problem elsewhere.


John Syn

Mar 27, 2014, 5:10:46 PM
to beagl...@googlegroups.com

On Wednesday, March 26, 2014 at 6:46 PM, Brandon I <brando...@gmail.com> wrote:
> [...]
> This is all doom and gloom, but it's a consequence of inconsistent power, buffers, and the destructive nature of quantum tunneling.

What you say is mostly correct. However, you can use a supercap-based power supply, which lets you save data held in RAM to non-volatile storage such as an SD card or eMMC when a power failure is detected. Also, when Linux goes through an orderly shutdown, no corruption occurs. This way, you only write to flash during a power failure, so you won't see any flash failures. Supercaps don't have the limited number of charge cycles that is common in lithium-ion batteries, so these systems should be good for 10 years or more. Plan for about 90 seconds to write the data to flash and shut Linux down.

Regards,
John
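
A rough sizing sketch for such a supercap bank, using the ideal capacitor energy formula E = 1/2 * C * (V_start^2 - V_min^2). Only the 90-second hold time comes from the message above; the 2 W load, 5.0 V starting voltage, and 3.3 V cutoff are assumed figures for illustration, and converter losses and capacitor ESR are ignored.

```python
def required_capacitance(power_w, hold_s, v_start, v_min):
    """Farads needed to supply `power_w` for `hold_s` while sagging from v_start to v_min."""
    # Usable energy per farad between the two voltages, in joules.
    usable_energy_per_farad = 0.5 * (v_start**2 - v_min**2)
    return power_w * hold_s / usable_energy_per_farad

# Assumed figures for illustration only: 2 W load, 5.0 V bank, 3.3 V cutoff.
farads = required_capacitance(2.0, 90.0, 5.0, 3.3)
print(round(farads, 1))
```

With these assumed numbers the result is on the order of 25 F; a real design would add margin for converter efficiency, capacitor aging, and temperature.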

David Lambert

Mar 27, 2014, 5:25:29 PM
to beagl...@googlegroups.com
I have had a long and painful history using flash in general, and have come to the conclusion that asynchronous removal of power is a very bad thing. The following link describes one low-level phenomenon called "unstable bits". This seems to get worse as more bits are stuffed into a cell (pretty obvious) :-[

http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits

Some other studies suggest that very high density chips may exhibit similar problems even when reading during a power fail!

My conclusions lean to removing power only when ALL accesses to flash have completed.

HTH,

Dave.

Brandon I

Mar 27, 2014, 7:59:35 PM
to beagleboard
Rh, my earlier reply was to you, and that link shows that it is now a problem with Android's use of ext4.


On Thu, Mar 27, 2014 at 4:55 PM, rh_ <richard...@lavabit.com> wrote:
> On Thu, 27 Mar 2014 13:41:24 -0500
> Charles Steinkuehler <cha...@steinkuehler.net> wrote:
>
> > Yes, but are you writing to the flash when you pull the power?
>
> Don't know. But it's possible. How would I know? If it doesn't boot?
> For android there's JAFFS (or is it YAFFS) so it's more robust than ext4
> I guess.
>
> > There is a huge difference between "it works for me" and *RELIABLY*
> > avoiding data corruption when power is unexpectedly removed with
> > significant write activity in-progress.
>
> Ok, but I haven't encountered a problem yet, and I'm never that lucky.
> With the millions and millions (billions?) of handsets I would think
> data corruption would be a much more visible problem. I haven't seen
> it happen yet over many phones and many years.


David Lambert

Mar 27, 2014, 9:37:15 PM
to beagl...@googlegroups.com
On 03/27/2014 07:04 PM, rh_ wrote:
> On Thu, 27 Mar 2014 16:25:29 -0500
> David Lambert <da...@lambsys.com> wrote:
>
>> I have had a long and painful history using flash in general, and
>> have come to the conclusion that asynchronous removal of power is a
> asynchronous? Like pulling the plug and not pushing it?
Yes.
>
>> very bad thing. The following link shows one low level phenomenon
>> called "unstable bits". This seems to be getting worse the more bits
>> that are stuffed into a cell (pretty obvious) :-[
> low-level phenomenon? You mean a manufacturer defect? An inherent
> defect in the flash design? Implementation defect?
All flash is inherently error prone. That's why long ECC codes are
employed, as recommended by the flash chip manufacturers.
>
>> Some other studies suggest that very high density chips may exhibit
>> similar problems even when *_reading_* during a power fail!
> Ouch.
>
>> My conclusions lean to removing power only when ALL accesses to flash
>> have completed.
> What technologies were used to reach your conclusion? Filesystems,
> flash device, etc.
Experience going back to the mid 1990s: everything from raw NAND and NOR
flash chips with ASIC or software ECC controllers/wear leveling, to
USB/SD/CompactFlash devices. File systems: UBIFS, XFS, Ext2/3/4, FAT, and
some proprietary sequential-only file systems with embedded EDC/ECC.
>
> Why is this technology wide spread if it's got an inherent flaw?
>
Cheap, and with the right controllers, reliable.

Wikipedia has a good basic introduction to the technology with some more
authoritative citations. For greater depth some of the manufacturers'
data sheets may be helpful.
http://en.wikipedia.org/wiki/Flash_memory
http://www.micron.com/products/nand-flash

Holger Hellmuth

Mar 28, 2014, 7:17:39 AM
to beagl...@googlegroups.com
On 27.03.2014 20:43, Brandon I wrote:
> That's because your phone uses a sane filesystems that takes into
> account this use case and isn't writing constantly (write one byte, the
> disk writes a whole erase block). This doesn't protect you from eventual
> disk corruption. The wear leveling bad-block type tables will eventually
> corrupt/run out of memory loooong before your disk space is eaten by bad
> blocks.

Here you are talking about wear-leveling running out of storage because
of too many writes.

>
> It is now an issue with Android!
>
> "Ts'o says that there isn't much need for concern. Google and the
> handset makers will catch platform-level filesystem reliability issues,
> ensuring that the high-level storage APIs are safe."
>
> Is the API you use for disk writes safe? Nope.

This article from 2010 seems to talk about inconsistent file systems
because of buffering, i.e. delayed and too few writes.

So which is it? Do you see unrepairable SD-cards or simply SD-cards that
work again after a mkfs? Is it too few or too many writes?

