SD-card multiple block write problem


bob.fe...@rafresearch.com

Jun 25, 2018, 7:16:36 PM
to NuttX
In order to improve the speed of SD-card writing I want to perform multi-block writes.
The generic NuttX code and the stm32f device code seem to support it, even permitting DMA from the user's buffers (if certain alignment and size conditions are met).
I am testing by writing very large files (10 to 100 MB).

When I write 512-byte buffers (uses single block writes) everything is fine.
When I try to write 16 KB buffers (uses multiple block writes) I see...
mmcsd_geometry: Entry
mmcsd_geometry: available: true mediachanged: false writeenabled: true
mmcsd_geometry: nsectors: 7774208 sectorsize: 512
mmcsd_write: sector: 13232 nsectors: 32 sectorsize: 512
mmcsd_writemultiple: startblock=13232 nblocks=32
mmcsd_writemultiple: nbytes=16384 byte offset=13232
mmcsd_eventwait: ERROR: Awakened with 0c
mmcsd_writemultiple: ERROR: CMD25 transfer failed: -116
Write error occurred at 0/37 (MB/16KB).

Error 116 is ETIMEDOUT and occurs when the sdio driver "data delay" value decrements to 0.
The above test was run on a Class 10 SDHC card (4 GB). The failure occurred after 36 16 KB buffers had been written successfully.

(By the way, in order to get this far I applied this patch...
diff --git a/arch/arm/src/stm32f7/stm32_sdmmc.c b/arch/arm/src/stm32f7/stm32_sdmmc.c
index 86481dd..45638e8 100644
--- a/arch/arm/src/stm32f7/stm32_sdmmc.c
+++ b/arch/arm/src/stm32f7/stm32_sdmmc.c
@@ -2201,7 +2201,9 @@ static int stm32_sendsetup(FAR struct sdio_dev_s *dev, FAR const
   /* Then set up the SDIO data path */
 
   dblocksize = stm32_log2(priv->blocksize) << STM32_SDMMC_DCTRL_DBLOCKSIZE_SHIFT;
-  stm32_dataconfig(priv, SDMMC_DTIMER_DATATIMEOUT, nbytes, dblocksize);
+ //BobChange:
+//  stm32_dataconfig(priv, SDMMC_DTIMER_DATATIMEOUT, nbytes, dblocksize);
+  stm32_dataconfig(priv, 0xffffffff, nbytes, dblocksize); // was 0x000fffff

Before the patch the same failures would occur after writing about eight 16 KB buffers.)
I have tried different cards; they all fail. The same error occurred using a U1-class 32 GB card.
That card was able to transfer a little over a megabyte (~64 16 KB buffers).
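
For scale, the two timeout values in the patch above can be converted to wall-clock time. This is only a minimal sketch: it assumes the data timer counts SD bus clock periods, and it uses a 16 MHz bus clock purely as an illustrative figure. The helper name dtimer_to_ms is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the NuttX driver): convert a data
 * timeout expressed in SD bus clock periods into milliseconds.
 * Assumption: the timer decrements once per bus clock period.
 */

double dtimer_to_ms(uint32_t ticks, uint32_t bus_clock_hz)
{
  assert(bus_clock_hz > 0);
  return (double)ticks * 1000.0 / (double)bus_clock_hz;
}
```

Under those assumptions, dtimer_to_ms(0x000fffff, 16000000) is roughly 65.5 ms, while dtimer_to_ms(0xffffffff, 16000000) is roughly 268 seconds.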

It is suspicious that the failures occur after a power of 2 number of 16K buffers are transferred.
The significance of a "power of 2" could be that the SD-card is doing something normal that just takes longer.

The driver seems to be trying to DMA the entire 16 KB of data at once and is expecting the SDIO to cut it into 32 packets of 512 bytes (plus CRC) each.
It seems that this is working, but sometimes the inter-packet "wait for ready" delays are very long and exceed the SDIO adapter's ability to wait.
Has anyone else had problems or success writing more than 512 bytes at a time to an SD-card?
-Bob

bob.fe...@rafresearch.com

Jul 20, 2018, 3:39:39 PM
to NuttX
The timeout failure was caused by a bug in the arch/arm/stm32f7 code.
That bug was fixed and the patch was accepted.

Another bug was found in drivers/mmcsd_sdio.c. This bug resulted in lower write performance on all architectures using the sdio adapter.
This bug also was fixed and the patch was accepted.

Using writes from a 16 KB buffer (32 sectors), I am now seeing error-free sdio multi-sector write operations. Using new 4 GB Class 10 SDHC cards, I observe 3+ MB/sec write rates. Using well-worn Class 10 SDHC cards I observe 1.5 to 1.9 MB/sec write rates.

The same SD-cards measure only 50 to 100 KB/sec when using 512-byte writes.
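
A comparison like this can be reproduced with nothing more than POSIX write(). The following is a minimal sketch, not the test program used for the numbers above; the function name write_in_chunks and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical test helper: write `total` bytes to `fd`, passing at most
 * `chunk` bytes to each write() call.  The same `chunk` bytes from `buf`
 * are written repeatedly.  Returns the number of bytes written, or -1 on
 * the first write error.
 */

ssize_t write_in_chunks(int fd, const void *buf, size_t chunk, size_t total)
{
  size_t done = 0;

  assert(buf != NULL && chunk > 0);

  while (done < total)
    {
      size_t want = (total - done < chunk) ? total - done : chunk;
      ssize_t n = write(fd, buf, want);

      if (n <= 0)
        {
          return -1;
        }

      done += (size_t)n;
    }

  return (ssize_t)done;
}
```

Timing one call with chunk set to 512 and another with chunk set to 16384 (for example with clock_gettime() around each call) gives the two write rates to compare.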

-Bob

Ramtin Amin

Jul 21, 2018, 6:38:20 AM
to Bob Feretich, nu...@googlegroups.com
great!
For the clock frequency of the SD, are you in bypass mode (48 MHz)?



bob.fe...@rafresearch.com

Jul 21, 2018, 3:57:45 PM
to NuttX
No, for that test I used a 16 MHz clock. (It was set up for an external SD-card socket breakout board.)

I just tested again using a 24 MHz clock on a board with an SD-card socket.
Doubling the clock speed raised the transfer rate a little bit to just over 2 MB/sec.
 
Most of the SD-card delay is erase and program time, not in the data transfer.
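
A back-of-the-envelope check supports this. The sketch below computes the raw bus rate, ignoring start/stop bits, CRC, and command overhead. The 4-bit bus width is an assumption (it is not stated in this thread), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raw data rate of the SD bus in bytes per second.
 * Ignores CRC, start/stop bits, and inter-block busy time.
 */

uint64_t bus_peak_bytes_per_sec(uint32_t clock_hz, uint32_t bus_width_bits)
{
  assert(bus_width_bits == 1 || bus_width_bits == 4 || bus_width_bits == 8);
  return (uint64_t)clock_hz * bus_width_bits / 8;
}
```

With an assumed 4-bit bus, 16 MHz gives 8,000,000 bytes/sec and 24 MHz gives 12,000,000 bytes/sec, both well above the 1.5 to 3 MB/sec observed, so the bus itself is unlikely to be the limit.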

Ramtin Amin

Jul 21, 2018, 6:54:43 PM
to nu...@googlegroups.com
Ok got it.
So as Greg removed your last commit for a POSIX violation, are you planning to find a way to make it POSIX compliant?


spudaneco

Jul 21, 2018, 7:10:09 PM
to nu...@googlegroups.com
It is not purely a matter of low level POSIX compliance; it violates the POSIX device driver model at the highest levels.  I will never accept any logic of that nature in any form into the OS, even if it were made POSIX compliant at some other lower level.

Any PR/patch of that nature will be rejected and I suggest you not waste your time or suffer the bad feelings that inevitably follow the rejection.




patacongo

Jul 21, 2018, 8:28:04 PM
to nu...@googlegroups.com
My understanding (and I am sure that Bob will correct me if I am wrong) is that the rejected change was not directly related to SD card performance.  It was indirectly related in that it let you achieve multi-block transfers with smaller buffers in the user application.

With small buffers, large multi-block transfers cannot be performed.  You have to use large buffers when you call write() to get the performance.

The POSIX driver interface is our old friends open, close, read, write, etc.  That change ignored the POSIX interface for write.  I assume that a small write was required to start the transfer, but then the driver opened a "back channel" directly into the application and requested additional data to transfer through this side channel.

There is nothing you can do to make that POSIX.  It is inherently non-POSIX.

And nothing like that is ever going to make it into the NuttX repositories.  If you want the performance, use larger I/O buffers but please do not ask me to bastardize the strict POSIX interface.  I will never bend on that topic and only bad feelings can result.

Using FAT, I don't think you can improve that with buffers any larger than a cluster size (assuming that your writes are always aligned to a cluster).

If you don't like POSIX, please consider switching to a different operating system with lower standards.


bob.fe...@rafresearch.com

Jul 21, 2018, 9:45:27 PM
to NuttX
The part of the patch that improves performance was accepted.

Greg rejected a part that permitted applications to enjoy the performance improvement without having to allocate large buffers. This violated "strict POSIX compliance". I submitted that patch privately to him to receive his guidance on whether he thought my "reinterpretation" of the POSIX write() was within bounds. It was not. Like the header to this Google Groups page states, strict compliance with POSIX is a founding premise of NuttX.

The test I performed used a 16KB write buffer to write 32 sectors at a time. Writing 32-sectors like this (using Multiple Block Write (CMD25)) takes only a little more time than writing a single sector.
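
The write rates reported earlier in this thread are consistent with that. As a rough worked example (a sketch only, taking KB and MB as powers of 1024, with a hypothetical helper name), the average time spent per write() call is:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: average milliseconds per write() call, given the
 * bytes passed to each call and the measured throughput in bytes/sec.
 */

double ms_per_write(size_t bytes_per_write, double bytes_per_sec)
{
  assert(bytes_per_sec > 0.0);
  return (double)bytes_per_write * 1000.0 / bytes_per_sec;
}
```

A 512-byte write at 100 KB/sec works out to about 5.0 ms per call (about 10 ms at 50 KB/sec), while a 16 KB write at 3 MB/sec works out to about 5.2 ms per call: 32 times the data in nearly the same time.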

Ramtin Amin

Jul 22, 2018, 5:46:00 AM
to nu...@googlegroups.com
In fact I didn't read the patch, but as I modified stm32f7 stm32_sdmmc.c in order to get sdio in it, I was wondering what could be common to SDIO (used for the wifi) in there.
But if we already have an improvement for sdcard, it's great. 

patacongo

Jul 22, 2018, 12:02:39 PM
to NuttX

> In fact I didn't read the patch, but as I modified stm32f7 stm32_sdmmc.c in order to get sdio in it, I was wondering what could be common to SDIO (used for the wifi) in there.
> But if we already have an improvement for sdcard, it's great.

This change may apply only to SD cards since it has to do with tuning SD card multiblock transfers.  Does that apply at all to the wireless SDIO interface?

Even if so, it is difficult to extrapolate the performance improvements that Bob reports to any normal use of SDIO for FAT filesystem or networking support.  Bob was using a 16 KB data buffer and transferring directly to the MMCSD_SDIO block driver.  That is a good test for measuring peak performance and would apply if, for example, you are streaming raw data to an SD card with no file system.

With typical SD card access via a FAT file system, however, the performance will depend on many things, but for this discussion the number of sectors per cluster is the critical factor.  Clusters may not be contiguous, but the sectors within a cluster will be, and that is the limit on the size of the multiblock transfer.  Larger drives will get larger cluster sizes.  But to get something comparable to Bob's test, you would have to have a cluster of 32 512-byte sectors.  That would correspond to a 32 GB or larger FLASH drive (see https://support.microsoft.com/en-us/help/140365/default-cluster-size-for-ntfs-fat-and-exfat), and that is supported on only a few platforms.  Even so, it would be difficult to duplicate the performance because of other issues (such as data alignment with respect to clusters).

For wireless, it is probably even worse.  I assume that you would never perform a transfer of more than 1516 bytes (1500 MTU + 14 Ethernet + 2 FCS).
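
The cluster limit can be made concrete with a small sketch. It assumes the worst case in which adjacent clusters are never contiguous, and it counts sectors from the start of the FAT data area. The function name and parameters are hypothetical, not taken from the NuttX FAT code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many sectors one multiblock transfer can cover
 * if it may not cross a cluster boundary.  `sector` is counted from the
 * start of the FAT data area.
 */

uint32_t max_contiguous_sectors(uint32_t sector, uint32_t sectors_per_cluster,
                                uint32_t requested)
{
  uint32_t remaining;

  assert(sectors_per_cluster > 0);

  /* Sectors left in the current cluster, including `sector` itself */

  remaining = sectors_per_cluster - (sector % sectors_per_cluster);

  return requested < remaining ? requested : remaining;
}
```

For example, with 8 sectors per cluster a 32-sector request starting on a cluster boundary is limited to 8 sectors per transfer, and one starting at sector 5 gets only 3 sectors in its first transfer. With 32 sectors per cluster and an aligned start, the whole 32-sector request fits.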

bob.fe...@rafresearch.com

Jul 22, 2018, 1:33:15 PM
to NuttX
I didn't know that wireless had a connection to sdio.

There were four parts to the patch to mmcsd_writemultiple() in mmcsd_sdio.c:
1) The performance improvement.
mmcsd_sendcmdpoll(priv, SD_ACMD23, 0);
was changed to...
mmcsd_sendcmdpoll(priv, SD_ACMD23, nblocks);

This code is surrounded by...
if (IS_SD(priv->type)) { ...}

So, it only affects SD-cards.

2) Better error recovery. (shown condensed)

if (evret != OK)
  {
    ferr("ERROR: CMD25 transfer failed: %d\n", evret);

    /* If we return from here, we probably leave the sd-card in
     * Receive-data State. Instead, we will remember that
     * an error occurred and try to execute the STOP_TRANSMISSION
     * to put the sd-card back into Transfer State.
     */
  }

/* Send STOP_TRANSMISSION */

ret = mmcsd_stoptransmission(priv);

if (evret != OK)
  {
    return evret;
  }

if (ret != OK)
  {
    ferr("ERROR: mmcsd_stoptransmission failed: %d\n", ret);
    return ret;
  }

3) SD-card command name corrections in comments.

4) The alternate version of mmcsd_writemultiple(). This part was not accepted.
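
Parts 1 and 2 fit together roughly as follows. This is a self-contained sketch of the command ordering only, not the driver source: the host_ops_s interface, the fake host, and all function names here are hypothetical stand-ins for the real driver calls. ACMD23 is the SD SET_WR_BLK_ERASE_COUNT application command, and in the real driver it is sent only for SD cards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host interface standing in for the real driver calls.
 * Each hook returns 0 on success or a negated errno value on failure.
 */

struct host_ops_s
{
  int (*set_wr_blk_erase_count)(void *host, uint32_t nblocks);  /* ACMD23 */
  int (*write_multiple_block)(void *host, uint32_t startblock); /* CMD25 */
  int (*wait_transfer)(void *host);                             /* Data */
  int (*stop_transmission)(void *host);                         /* CMD12 */
};

int write_multiple(const struct host_ops_s *ops, void *host,
                   uint32_t startblock, uint32_t nblocks)
{
  int evret;
  int ret;

  assert(ops != NULL && nblocks > 0);

  /* Tell the card how many blocks are coming so it can pre-erase them */

  ret = ops->set_wr_blk_erase_count(host, nblocks);
  if (ret != 0)
    {
      return ret;
    }

  ret = ops->write_multiple_block(host, startblock);
  if (ret != 0)
    {
      return ret;
    }

  /* Remember a transfer failure, but do not return yet */

  evret = ops->wait_transfer(host);

  /* Always send STOP_TRANSMISSION so the card returns to Transfer State */

  ret = ops->stop_transmission(host);

  return evret != 0 ? evret : ret;
}

/* Fake host used only to exercise the sketch */

struct fake_host_s
{
  uint32_t pre_erase;   /* Block count passed with ACMD23 */
  int transfer_result;  /* Value returned by the data phase */
  int stops;            /* Number of CMD12 commands issued */
};

int fake_acmd23(void *host, uint32_t nblocks)
{
  ((struct fake_host_s *)host)->pre_erase = nblocks;
  return 0;
}

int fake_cmd25(void *host, uint32_t startblock)
{
  (void)host;
  (void)startblock;
  return 0;
}

int fake_wait(void *host)
{
  return ((struct fake_host_s *)host)->transfer_result;
}

int fake_cmd12(void *host)
{
  ((struct fake_host_s *)host)->stops++;
  return 0;
}

const struct host_ops_s g_fake_ops =
{
  fake_acmd23, fake_cmd25, fake_wait, fake_cmd12
};
```

With the fake host's transfer_result set to -116, write_multiple() still issues exactly one stop command and then returns -116, which is the behavior part 2 is after.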

Jedi Tek'Unum

Jul 23, 2018, 10:57:28 AM
to NuttX
I apologize that I haven't really been following this thread and therefore I probably shouldn't stick my nose in :)

On Saturday, July 21, 2018 at 7:28:04 PM UTC-5, patacongo wrote:
> With small buffers, large multi-block transfers cannot be performed.  You have to use large buffers when you call write() to get the performance.
>
> The POSIX driver interface is our old friends open, close, read, write, etc.  That change ignored the POSIX interface for write.  I assume that a small write was required to start the transfer, but then the driver opened a "back channel" directly into the application and requested additional data to transfer through this side channel.

What about writev/readv? Gather/scatter, or sometimes called vector I/O, is exactly for this situation. Typically they break down into chained DMA.

They are POSIX 1003.1-2001.

patacongo

Jul 23, 2018, 11:02:16 AM
to NuttX

> What about writev/readv? Gather/scatter, or sometimes called vector I/O, is exactly for this situation. Typically they break down into chained DMA.

They are implemented.  But currently, they just call read/write multiple times.

In NuttX, read/write are the fundamental POSIX I/O calls and readv/writev are built on top of them so they cannot improve performance.

Linux does this better:  readv/writev are the fundamental driver I/O methods and so can exploit hardware-specific scatter-gather optimizations and read/write are built on top of readv/writev.
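
To illustrate the layering (this is a sketch of the idea, not the NuttX source; the name writev_via_write is hypothetical), a writev() built on top of write() looks something like this:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Illustrative only: a writev() look-alike layered on write().  Each
 * iovec element becomes its own write() call, so the driver underneath
 * sees several small transfers instead of one large one.
 */

ssize_t writev_via_write(int fd, const struct iovec *iov, int iovcnt)
{
  ssize_t total = 0;
  int i;

  assert(iov != NULL && iovcnt >= 0);

  for (i = 0; i < iovcnt; i++)
    {
      ssize_t n;

      if (iov[i].iov_len == 0)
        {
          continue;
        }

      n = write(fd, iov[i].iov_base, iov[i].iov_len);
      if (n < 0)
        {
          /* Report what was written so far, or the error if nothing was */

          return total > 0 ? total : -1;
        }

      total += n;

      if ((size_t)n < iov[i].iov_len)
        {
          break; /* Short write: stop and report the partial count */
        }
    }

  return total;
}
```

Four 4 KB buffers passed this way still reach the block driver as four separate 8-sector writes, not as one 32-sector multiblock write.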

Greg
