On 07/04/2012 11:31 AM, Paul Stoffregen wrote:
> Thanks for the tip Dave. I had no idea SD card random write performance
> varies by about a factor of 100 between different cards.
I haven't been following this thread, but I am interested in why exactly
it is that random writes suck so badly.
What follows is just a random dump of thoughts on the matter though.....
I highly doubt it's something to do with the actual SD interface, as
it's basically just an SPI or quad-SPI connection. Is it something to
do with the host polling the card for page-erase completion, in which
case it might be mitigated with a tuneable on the host? I wouldn't
think it's that either though since there's such variation between
cards, unless the card suggests a polling period. I know too little
about the protocol as far as that goes.
Thinking about it more, I'm guessing it has more to do with flash page
size and rewrite times. If you only write 512 bytes to the card and the
flash page size is bigger than that, the card is forced to read that
whole page, erase it, and then rewrite the merge of the old data and the
new sector.
You'd think that since 4K is a common flash page size, and the random
writes are 4K, that you'd be fine. However, as far as the card is
concerned a 4K write is eight 512-byte sectors. Unless the controller
is smart enough to start the read-erase-write cycle on the first sector,
then *notice* that the next sector is sequential and try to coalesce the
writes, it's going to end up doing *8* r-e-w cycles every single time.
Going to a 4K sector size could eliminate the problem, but leading-edge
SATA hard drives are only just now supporting 4K, so I wouldn't expect
SD cards to do it yet. Maybe that's what the higher class cards are
capable of?
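To put numbers on that guess, here is a minimal Python sketch. It
assumes a 4K flash page and 512-byte sectors as described above; the
function name and the "coalesce" switch are made up for illustration
and do not describe any real controller.

```python
SECTOR = 512   # bytes per host-visible sector
PAGE = 4096    # assumed flash page size (the guess made above)

def rew_cycles(offset, length, coalesce):
    """Count read-erase-write cycles for one host write.

    Without coalescing, every sector triggers its own cycle.
    With coalescing, each touched flash page is cycled only once.
    """
    first = offset // SECTOR
    last = (offset + length - 1) // SECTOR
    sectors = range(first, last + 1)
    if not coalesce:
        return len(sectors)
    pages = {s * SECTOR // PAGE for s in sectors}
    return len(pages)

print(rew_cycles(0, 4096, coalesce=False))    # 8
print(rew_cycles(0, 4096, coalesce=True))     # 1
print(rew_cycles(2048, 4096, coalesce=True))  # 2 (unaligned write)
```

The last line shows that even a coalescing controller pays double when
the 4K write is not aligned to the page.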
Looking into the SD architecture a bit, it looks like the picture is
even worse. They're arranged in up to 4096 "clusters" with up to 512
"blocks" per cluster, where originally a block = one 512-byte sector,
but later could be 1k or 2k. Either way, the *cluster* is the flash
erase size, which means you could have an erase size of 1MB (512 blocks
x 2k). That makes the 8x r-e-w cycles needed to "randomly" write a 4KB
block suck really bad, but it does also strongly imply that sequential
erase clustering is generally done, because that'd be the only way you'd
get decent sequential write performance (get write of first sector,
read and erase entire cluster, get write of sector++ and *not* restart
the erase, repeat for entire cluster).
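The arithmetic behind that, as a rough Python sketch. The cluster
geometry is the worst case quoted above; the assumptions that each
r-e-w cycle rewrites one whole cluster and that the write fits inside a
single cluster are mine, and the function names are hypothetical.

```python
def erase_unit_bytes(blocks_per_cluster, block_bytes):
    """Size of one cluster, i.e. the assumed erase unit."""
    return blocks_per_cluster * block_bytes

def write_amplification(host_bytes, sector_bytes, erase_bytes, coalesce):
    """Bytes rewritten in flash per byte the host asked to write.

    Assumes the write fits inside one cluster and that each r-e-w
    cycle rewrites the whole cluster.
    """
    sectors = -(-host_bytes // sector_bytes)   # ceiling division
    cycles = 1 if coalesce else sectors
    return cycles * erase_bytes / host_bytes

cluster = erase_unit_bytes(512, 2048)
print(cluster)                                                  # 1048576
print(write_amplification(4096, 512, cluster, coalesce=False))  # 2048.0
print(write_amplification(4096, 512, cluster, coalesce=True))   # 256.0
print(write_amplification(cluster, 512, cluster, coalesce=True))  # 1.0
```

The last line is the sequential case: with coalescing, filling a whole
cluster costs one erase, which is why sequential writes can still be
decent while random 4K writes are terrible.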
As far as usage on a Nook or Pi, I suspect your killer is atime. By
default (at least historically) the filesystems are set to record the
timestamp of every *read* access to every file, which means the inode of
every file that gets read has to be subtly altered. This will cause a
*massive* number of small writes, especially on boot, which could
explain what's going on. Setting 'noatime' on all filesystems would help
that significantly.
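A quick way to see which filesystems are still updating atime is to
look at the mount options. The sketch below parses text in the
/proc/mounts layout (device, mount point, type, options, dump, pass).
The function name and the sample mount lines are made up for
illustration; on a real Linux box you would feed it the contents of
/proc/mounts instead.

```python
def atime_policy(mounts_text):
    """Map each mount point to 'noatime', 'relatime' or 'atime'."""
    policies = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = set(fields[3].split(","))
        if "noatime" in options:
            policies[mount_point] = "noatime"
        elif "relatime" in options:
            policies[mount_point] = "relatime"
        else:
            policies[mount_point] = "atime"
    return policies

# Hypothetical sample, not taken from any real system.
SAMPLE = """\
/dev/mmcblk0p2 / ext4 rw,relatime 0 0
/dev/mmcblk0p1 /boot vfat rw,noatime 0 0
/dev/mmcblk0p3 /data ext4 rw 0 0
"""

for mount_point, policy in atime_policy(SAMPLE).items():
    print(mount_point, policy)
```

Anything that does not come back as 'noatime' is a candidate for
remounting with that option.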
Another probably better option, and where TRIM comes into play again,
would be to use something like JFFS2 on the SD card. Such filesystems
are *extremely* aware of the physical structure of flash-based devices,
to the point where they "waste" space by trying to treat the device as
"append-only". Rather than erasing and rewriting data repeatedly in a
given sector of the hardware, they "journal" the data changes in such
a way as to almost totally eliminate read-modify[erase]-write cycles.
When you have a piece of hardware with a hard-wired flash chip on the
board (rather than an "ATA" protocol to an SD card), JFFS2 will actively
pre-erase as much of the flash space as it can, sequentially, in order
to make writes nearly instantaneous (since erases take *far* longer than
the actual write). For an SD card or even a modern SSD, that's where
the TRIM command would come into play. However, I have no idea if the
SD protocol has any provision for that.
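To illustrate why "append-only" helps, here is a toy Python model. It
is emphatically not how JFFS2 is implemented: it ignores garbage
collection of stale data entirely, and the class, function, and
parameter names are all hypothetical. It only counts how many erase
blocks get consumed when every write goes to the head of a log, versus
one erase per write for naive in-place updates.

```python
import random

class AppendLog:
    """Toy append-only log over pre-erased blocks (no garbage collection)."""

    def __init__(self, sectors_per_block):
        self.sectors_per_block = sectors_per_block
        self.head = 0             # next free slot in the log
        self.mapping = {}         # logical sector -> slot holding latest copy
        self.blocks_consumed = 0  # pre-erased blocks used so far

    def write(self, logical_sector):
        if self.head % self.sectors_per_block == 0:
            self.blocks_consumed += 1  # log head moves into a fresh block
        self.mapping[logical_sector] = self.head
        self.head += 1

def in_place_erases(writes):
    """Naive in-place update: one read-erase-write cycle per write."""
    return len(writes)

rng = random.Random(0)
writes = [rng.randrange(10_000) for _ in range(1000)]

log = AppendLog(sectors_per_block=256)
for sector in writes:
    log.write(sector)

print(in_place_erases(writes))  # 1000
print(log.blocks_consumed)      # 4
```

A real log-structured filesystem eventually has to reclaim blocks full
of stale copies, so the true saving is smaller than this, but the
erases move off the write path.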
The main difference in this case between an SD card and a larger SSD
would be that the SSD has far more resources at its disposal in order to
reorder and map the logical sectors into actual physical flash pages.
An SSD will wear-level in such a way that the mapping becomes totally
chaotic across the entire device, whereas an SD card *might* wear-level
in very small blocks (because of silicon-size constraints, basically).
Because SSDs actually have a little bit of headroom relative to their
advertised size, they can do a *little* bit of pre-erase even without
TRIM. SD's not so much.
Anyway, enough rambling for now, back to cleaning the house.....