More about SD-Card problems


Mika Westerberg

Jan 13, 2011, 5:07:27 AM
to marti...@gmail.com, si...@googlegroups.com
Hello,

I've been investigating this SD-Card problem for a while and I think
it's time to share my observations.

As we know, 'apt-get update' is very good at triggering the problem.
I've managed to reduce it to a small test program
which almost always causes the card to fail. The program is attached
(mmap_stress.c).

The current theory is as follows:

Typically we fail when writing a page, with error -EILSEQ (CRC error).
The code that calculates the CRC is here (this is from a .37+ kernel):

drivers/mmc/host/mmc_spi.c:mmc_spi_writeblock():
    ...
    if (host->mmc->use_spi_crc)
        scratch->crc_val = cpu_to_be16(
                crc_itu_t(0, t->tx_buf, t->len));
    if (host->dma_dev)
        dma_sync_single_for_device(host->dma_dev,
                host->data_dma, sizeof(*scratch),
                DMA_BIDIRECTIONAL);

    status = spi_sync_locked(spi, &host->m);

We can see that the CRC is calculated over the TX buffer *before* the
message is sent to the bus. We also know that doing an SPI transfer is
going to sleep, which causes a context switch.

Then apt-get (or mmap_stress) is scheduled onto the processor and
changes the mmapped file, presumably the same page which is currently
being transferred to the card. The card then fails to verify the
CRC -> bang.

This behavior is normal for mmapped files. The contents can change at
any time, and the only way of making sure that the disk has the latest
data is to call msync() or similar.

We can disable CRCs on SD card transfers by passing
'mmc_core.use_spi_crc=0' to the kernel.

I've been running some tests with data CRCs disabled and so far the
results are good -> no errors whatsoever. I'll continue testing, but
I suggest that others try the option as well.

Regards,
MW

mmap_stress.c

Federico Pietro Briata

Jan 13, 2011, 6:22:22 AM
to si...@googlegroups.com
Hello

2011/1/13 Mika Westerberg <mika.we...@iki.fi>:


> Hello,
>
> I've been investigating this SD-Card problem for a while and I think
> it's time to share my observations.
>
> As we know, 'apt-get update' is very good in triggering the problem.
> I've managed to reduce it into a small test program
> which almost always causes the card to fail. The program is attached
> (mmap_stress.c).

Thank you, I'll try it out!

Could it also be that the ext2 fs is not suitable for the sim1? I mean,
should I try btrfs or ext4 (without journal)?

The following are my latest errors. I got them with the SD card (2GB,
not SDHC) that Sergio sent me...
I saw better results when I moved from SDHC to SD.
This was the only error I had, and it appeared only after an apt-get
update with a heavily populated sources.list...

Fetched 14.4MB in 4min16s (56.1kB/s)
mmcblk0: error -84 transferring data, sector 3210145, nr 48, card status 0x0
end_request: I/O error, dev mmcblk0, sector 3210145
end_request: I/O error, dev mmcblk0, sector 3210153
end_request: I/O error, dev mmcblk0, sector 3210161
end_request: I/O error, dev mmcblk0, sector 3210169
end_request: I/O error, dev mmcblk0, sector 3210177
end_request: I/O error, dev mmcblk0, sector 3210185
mmcblk0: error -84 transferring data, sector 267, nr 2, card status 0x0
end_request: I/O error, dev mmcblk0, sector 267
Buffer I/O error on device mmcblk0p1, logical block 8
lost page write due to I/O error on mmcblk0p1
------------[ cut here ]------------
WARNING: at fs/buffer.c:1152 mark_buffer_dirty+0x34/0xec()
[<c002c118>] (unwind_backtrace+0x0/0xec) from [<c003e14c>] (warn_slowpath_common+0x44/0x5c)
[<c003e14c>] (warn_slowpath_common+0x44/0x5c) from [<c003e180>] (warn_slowpath_null+0x1c/0x24)
[<c003e180>] (warn_slowpath_null+0x1c/0x24) from [<c00be074>] (mark_buffer_dirty+0x34/0xec)
[<c00be074>] (mark_buffer_dirty+0x34/0xec) from [<c00fa9a4>] (ext2_new_blocks+0x3e8/0x53c)
[<c00fa9a4>] (ext2_new_blocks+0x3e8/0x53c) from [<c00fe5a0>] (ext2_get_block+0x3b4/0x78c)
[<c00fe5a0>] (ext2_get_block+0x3b4/0x78c) from [<c00bf284>] (block_prepare_write+0x1c4/0x4c8)
[<c00bf284>] (block_prepare_write+0x1c4/0x4c8) from [<c00bf71c>] (block_write_begin+0x48/0x78)
[<c00bf71c>] (block_write_begin+0x48/0x78) from [<c00fd75c>] (ext2_write_begin+0x3c/0x64)
------------[ cut here ]------------
WARNING: at fs/buffer.c:1152 mark_buffer_dirty+0x34/0xec()
[<c00fd75c>] (ext2_write_begin+0x3c/0x64) from [<c006ae28>] (generic_file_buffered_write+0xd8/0x240)
[<c002c118>] (unwind_backtrace+0x0/0xec) from [<c003e14c>] (warn_slowpath_common+0x44/0x5c)
[<c006ae28>] (generic_file_buffered_write+0xd8/0x240) from [<c006cf2c>] (__generic_file_aio_write+0x460/0x4a4)
[<c003e14c>] (warn_slowpath_common+0x44/0x5c) from [<c003e180>] (warn_slowpath_null+0x1c/0x24)
[<c006cf2c>] (__generic_file_aio_write+0x460/0x4a4) from [<c006cfdc>] (generic_file_aio_write+0x6c/0xd4)
[<c003e180>] (warn_slowpath_null+0x1c/0x24) from [<c00be074>] (mark_buffer_dirty+0x34/0xec)
[<c006cfdc>] (generic_file_aio_write+0x6c/0xd4) from [<c0099798>] (do_sync_write+0xa8/0xf4)
[<c00be074>] (mark_buffer_dirty+0x34/0xec) from [<c00fa9a4>] (ext2_new_blocks+0x3e8/0x53c)
[<c0099798>] (do_sync_write+0xa8/0xf4) from [<c009a194>] (vfs_write+0xb0/0x13c)
[<c009a194>] (vfs_write+0xb0/0x13c) from [<c009a2d4>] (sys_write+0x40/0x6c)
[<c00fa9a4>] (ext2_new_blocks+0x3e8/0x53c) from [<c00fe5a0>] (ext2_get_block+0x3b4/0x78c)
[<c009a2d4>] (sys_write+0x40/0x6c) from [<c0026dc0>] (ret_fast_syscall+0x0/0x2c)
[<c00fe5a0>] (ext2_get_block+0x3b4/0x78c) from [<c00bf284>] (block_prepare_write+0x1c4/0x4c8)
---[ end trace 2567634ba15c3eaa ]---
[<c00bf284>] (block_prepare_write+0x1c4/0x4c8) from [<c00bf71c>] (block_write_begin+0x48/0x78)
[<c00bf71c>] (block_write_begin+0x48/0x78) from [<c00fd75c>] (ext2_write_begin+0x3c/0x64)
[<c00fd75c>] (ext2_write_begin+0x3c/0x64) from [<c006ae28>] (generic_file_buffered_write+0xd8/0x240)
[<c006ae28>] (generic_file_buffered_write+0xd8/0x240) from [<c006cf2c>] (__generic_file_aio_write+0x460/0x4a4)
[<c006cf2c>] (__generic_file_aio_write+0x460/0x4a4) from [<c006cfdc>] (generic_file_aio_write+0x6c/0xd4)
[<c006cfdc>] (generic_file_aio_write+0x6c/0xd4) from [<c0099798>] (do_sync_write+0xa8/0xf4)
[<c0099798>] (do_sync_write+0xa8/0xf4) from [<c009a194>] (vfs_write+0xb0/0x13c)
[<c009a194>] (vfs_write+0xb0/0x13c) from [<c009a2d4>] (sys_write+0x40/0x6c)
[<c009a2d4>] (sys_write+0x40/0x6c) from [<c0026dc0>] (ret_fast_syscall+0x0/0x2c)
---[ end trace 2567634ba15c3eab ]---
EXT2-fs (mmcblk0p1): error: ext2_fsync: detected IO error when writing metadata buffers
Reading package lists... Error!
W: GPG error: http://packages.enlightenment.org squeeze Release: The
following signatures couldn't be verified because the public key is
no2
W: GPG error: http://www.debian-multimedia.org squeeze Release: The
following signatures couldn't be verified because the public key is
not7
E: Problem syncing the file - sync (5 Input/output error)
E: The package lists or status file could not be parsed or opened.

westeri

Jan 13, 2011, 7:11:49 AM
to sim1

On Jan 13, 1:22 pm, Federico Pietro Briata <feder...@briata.org>
wrote:
>
> Those following are my last error, I had it with SD card (2GB, not
> sdhc) that Sergio send me...
> I saw better result when I move from SDHC to SD
> I had only this error and it went out only after an apt-get update
> with sources.list very populated...
>
> Fetched 14.4MB in 4min16s (56.1kB/s)
> mmcblk0: error -84 transferring data, sector 3210145, nr 48, card
> status 0x0
> end_request: I/O error, dev mmcblk0, sector 3210145
> end_request: I/O error, dev mmcblk0, sector 3210153
> end_request: I/O error, dev mmcblk0, sector 3210161
> end_request: I/O error, dev mmcblk0, sector 3210169
> end_request: I/O error, dev mmcblk0, sector 3210177
> end_request: I/O error, dev mmcblk0, sector 3210185
> mmcblk0: error -84 transferring data, sector 267, nr 2, card status
> 0x0

Are these errors happening after you set 'mmc_core.use_spi_crc=0'? Or
are these some older ones?

I think (at least if the theory is correct) that it has nothing to do
with which filesystem you are using. So ext2/3/4 should be fine.

MW

Mika Westerberg

Jan 17, 2011, 1:53:08 AM
to marti...@gmail.com, si...@googlegroups.com
On Thu, Jan 13, 2011 at 12:07 PM, Mika Westerberg
<mika.we...@iki.fi> wrote:
[...]

>
> I've been running some testing with disabled data CRCs and so far
> results are good -> no errors whatsoever. I still continue testing but
> I suggest others to try the option also.

Ok, I've tested this on all of my SD-Cards, and both 'apt-get update'
and 'mmap_stress' work if we disable the data CRCs. I tested with ext3
and ext4 filesystems.

One can disable the data CRCs with the following at the u-boot prompt:

setenv bootargs ${bootargs} mmc_core.use_spi_crc=0
(and optional saveenv)

The above works if you have MMC support compiled in. Otherwise you'll
do it with modprobe, like

modprobe mmc_core use_spi_crc=0
(or something similar, I didn't try this)

It would be great if someone tries the options and reports back
whether it helps or not :)

Best regards,
MW

Federico Pietro Briata

Jan 24, 2011, 6:11:09 PM
to si...@googlegroups.com, mika.we...@iki.fi, marti...@gmail.com
Hello Mika

2011/1/17 Mika Westerberg <mika.we...@iki.fi>:


> Ok, I've tested this on all of my SD-Cards and both 'apt-get update'
> and 'mmap_stress'
> work if we disable the data CRCs. I tested with ext3 and ext4 filesystems.
>
> One can disable the data CRCs with following on the u-boot prompt:
>
>    setenv bootargs ${bootargs} mmc_core.use_spi_crc=0
>    (and optional saveenv)

Thank you! That solves the problem! :D
I've added it to the wiki at
http://code.google.com/p/sim1/wiki/BootLoader

Is disabling CRC a kind of workaround for a problem still to be fixed,
or was it just a u-boot misconfiguration?

I tested on btrfs with apt-get update; it never got this far before... :)

federico

Mika Westerberg

Jan 25, 2011, 1:43:15 AM
to Federico Pietro Briata, si...@googlegroups.com, marti...@gmail.com
On Tue, Jan 25, 2011 at 1:11 AM, Federico Pietro Briata
<fede...@briata.org> wrote:
[...]

>
> Disabling CRC is a kind of workaround for a problem still to fix or
> was just uboot miss config?

It is certainly "a workaround", but it is probably the only thing we can
do to prevent failures with these kinds of programs.

Since mmcqd (the kernel thread which handles the request queue for the
SD-card) can be scheduled out while it is waiting for the SPI layer to
complete a transfer, it is certainly possible that some program modifies
its own data (which is currently being transmitted to the SD-card).

We have some alternatives, though:

1. Fix the programs (yeah, right ;)
2. Make a copy of the TX data and pass that to the SPI layer. This
   ensures that no one is able to modify it while the data is
   transmitted. This could have a negative effect on performance.

(possibly some more)

If a CRC check is needed, you can use a filesystem which supports
checksums. Btrfs seems to have support for this.

> I tested on btrfs and apt-get update, never went so far... :)

Great :) Thanks for testing.

martinwguy

Jan 27, 2011, 3:12:27 AM
to sim1
On Jan 13, 11:07 am, Mika Westerberg <mika.westerb...@iki.fi> wrote:
> I've been investigating this SD-Card problem for a while and I think
> it's time to share my observations.
>...
> We can prevent CRCs from SD card transfers by passing
> 'mmc_core.use_spi_crc=0' to the kernel.

Sorry for the delay, I've been out of action for a while for personal
reasons.

Many thanks for this. Great work!
I have been unable to provoke write errors when using this flag,
whereas they are still present without it.

Failing to detect data errors is certainly better than reporting write
errors for successful writes!

M

martinwguy

Jan 27, 2011, 3:18:22 AM
to sim1
On Jan 25, 7:43 am, Mika Westerberg <mika.westerb...@iki.fi> wrote:

> We have some alternatives, though:
>
> 1. Fix the programs (yeah, right ;)
> 2. Make a copy of the TX data and pass that to the SPI layer. This
> ensures that no one is
>     able to modify it while the data is transmitted. This could have
> negative effect on performance.

Compute the checksum as the data is transferred, poking the value into
the right place in the data at the last moment?

M

westeri

Jan 27, 2011, 3:50:18 AM
to sim1


On Jan 27, 10:18 am, martinwguy <martinw...@gmail.com> wrote:
[...]
>
> Compute the checksum as the data is transferred, poking the value into
> the right place in the data at the last moment?

Yeah, that could be one way. Unfortunately that is easier said than
done :(

BTW, I have a "hackish" version of patches which make the SPI driver
use DMA, and I can post them if someone wants to try. It is currently a
work in progress, as I'm doing it in my spare time, which implies that
it'll take some time before the set is ready to be posted on LAKML.

Martin Guy

Jan 27, 2011, 12:25:39 PM
to si...@googlegroups.com
On 27 January 2011 09:50, westeri <mika.we...@gmail.com> wrote:
> BTW, I have "hackish" version of patches which make the SPI driver to
> use DMA
> and can post them if someone wants to try. It is currently work in
> progress as
> I'm doing it on my spare time which implies that it'll take some time
> when the
> set is ready to be posted on LAKML.

Wow! Does that also make the memory-to-memory DMA interface work for
RAM-to-RAM block transfers?
Certainly your work is interesting and I'd be happy to try it when it
"seems to work".

Cheers

M

Dado Sutter

Jan 27, 2011, 2:27:56 PM
to si...@googlegroups.com

Cool!
Until we (eLua) have a "DMA module" (it may end up as part of the CPU module too), we could try to use this as a platform-specific module for now.
Good work!

Best
Dado

 


Martin Guy

Jan 27, 2011, 11:28:31 PM
to Mika Westerberg, si...@googlegroups.com
On 13 January 2011 11:07, Mika Westerberg <mika.we...@iki.fi> wrote:
> I've been running some testing with disabled data CRCs and so far
> results are good -> no errors whatsoever. I still continue testing but
> I suggest others to try the option also.

Hi
To check how likely undetected data errors are, I've run stress
tests writing large amounts of random data to cards, then reading it
back in and checking that it is what it should be.
With 640MB on SDHC and plain SD cards and 128MB on old MMC, there
are NO data errors. 100% OK.

Cheers

M

Mika Westerberg

Jan 28, 2011, 1:54:31 AM
to Martin Guy, si...@googlegroups.com
On Fri, Jan 28, 2011 at 6:28 AM, Martin Guy <marti...@gmail.com> wrote:
>   With 640MB on SDHC and plain SD cards and 128MB on old MMC, there
> are NO data errors. 100% OK.

Cool, that's impressive :)

And if there is still a need to make sure that no corruption occurs,
people can use a filesystem which keeps CRCs of the data
and metadata.

Thanks,
MW

westeri

Jan 28, 2011, 2:01:54 AM
to sim1


On Jan 27, 7:25 pm, Martin Guy <martinw...@gmail.com> wrote:
> On 27 January 2011 09:50, westeri <mika.westerb...@gmail.com> wrote:
>
> Wow! Does that also make the memory-to-memory DMA interface work for
> RAM-to-RAM block transfers?

The M2M DMA API is not yet finished but I will add such support, yes.
The only thing which is going to be missing from the API is DMA to/from
external devices (EXT_DREQ). That is because I don't have any such
hardware to test with.

> Certainly your work is interesting and I'd be happy to try it when it
> "seems to work".

Currently it already works but looks horrible. I'll try to polish the
patches during the weekend and hope to send them to you when done.

Regards,
MW

Martin Guy

Jan 30, 2011, 7:39:52 PM
to si...@googlegroups.com, linux-...@freelists.org
---------- Forwarded message ----------
From: "Mika Westerberg" <mika.we...@iki.fi>
Date: 30 Jan 2011 19:13
Subject: Re: More about SD-Card problems
To: "Martin Guy" <marti...@gmail.com>

On Sat, Jan 29, 2011 at 01:15:24PM +0100, Martin Guy wrote:
>
> Just completed 3.3GB of data. Again, not a single error.

Sounds great! So basically we don't need to worry about the CRCs, since it is
extremely unlikely that we get a corrupted data transfer.

===

The M2M DMA patches are attached. Note that they are still in the "hack" phase,
so error handling etc. is not finalized at all. M2M DMA currently only supports
SPI, but I'm going to add the memory-to-memory support and possibly IDE, let's
see. It currently doesn't use double buffering, but that is going to be added at
some point.

I've been developing on .38-rc2 kernel but since these patches touch only ep93xx
stuff I believe that they should apply pretty easily to .36.

Once you have applied the patches, you can enable the DMA support like:

diff --git a/arch/arm/mach-ep93xx/simone.c b/arch/arm/mach-ep93xx/simone.c
index 0f44123..2d12f35 100644
--- a/arch/arm/mach-ep93xx/simone.c
+++ b/arch/arm/mach-ep93xx/simone.c
@@ -161,6 +161,7 @@ static struct spi_board_info simone_spi_devices[] __initdata = {

 static struct ep93xx_spi_info simone_spi_info __initdata = {
       .num_chipselect = ARRAY_SIZE(simone_spi_devices),
+       .use_dma        = true,
 };

After this, all the transfers should use DMA. I'm not sure if that is the best
way, since setting up the DMA channel for a 1-byte transfer sounds like
overkill. I think that we should probably use PIO for smaller transfers and DMA
for the larger ones.

I have been testing this on Sim.One with mmc_spi and on TS-7260 attached to a
SPI EEPROM (at25).

There are probably plenty of bugs lurking around so make sure that you have
your data backed up ;-)

Regards,
MW
0001-ep93xx-add-memory-to-memory-DMA-support.patch
0002-spi-ep93xx-add-DMA-support.patch