
Atmel SAM9 "boot from NAND" is a myth?


Grant Edwards

Sep 15, 2010, 3:23:39 PM

We recently based a board on an Atmel AT91SAM9G20 part which the FAE
and rep said could boot from NAND flash. The eval board can indeed be
configured to boot from NAND flash. However, when it comes time to
spec parts for a real product, we find that's all smoke and mirrors.

The Atmel SAM9 parts require that block 0 be completely free of bit
errors since the ROM bootloader doesn't do ECC (despite the fact that
the part does have hardware ECC support). So you have to use a NAND
flash that guarantees a good block 0 _without_using_ECC_. It turns
out those sorts of NAND flash parts appear to be made of a combination
of Unicorn's horn and Unobtanium. IOW, they don't exist. At least
that's what the flash distributor and rep tell us.

What was Atmel thinking when they decided not to do ECC when reading
NAND flash? I realize Atmel doesn't make NAND flash, but surely they
must have been aware that NAND flash parts aren't spec'ed to be
fault-free by the flash vendors.

My opinion? It's a way for Atmel to suck you in and then after you
get the unpleasant surprise that you _can't_ boot from NAND, they try
to sell you a serial dataflash part you don't really want.

OTOH, TI did it right in their OMAP parts: not only does the
bootloader do ECC, it also will skip blocks that have uncorrectable
errors.

Atmel: Block 0 must be good without ECC

TI: _Any_ of blocks 0,1,2,3 must be good _with_ ECC

Which do you think is going to work better?

--
Grant Edwards grant.b.edwards Yow! It's a lot of fun
at being alive ... I wonder if
gmail.com my bed is made?!?

Jon Kirwan

Sep 15, 2010, 3:49:23 PM


Cripes. Hopefully, Ulf will take a moment to provide a frank
and complete response to your experiences here.

Jon

Tim Wescott

Sep 15, 2010, 4:10:42 PM

On 09/15/2010 12:23 PM, Grant Edwards wrote:
> We recently based a board on an Atmel AT91SAM9G20 part which the FAE
> and rep said could boot from NAND flash. The eval board can indeed be
> configured to boot from NAND flash. However, when it comes time to
> spec parts for a real product, we find that's all smoke and mirrors.
>
> The Atmel SAM9 parts require that block 0 be completely free of bit
> errors since the ROM bootloader doesn't do ECC (despite the fact that
> the part does have hardware ECC support). So you have to use a NAND
> flash that guarantees a good block 0 _without_using_ECC_. It turns
> out those sorts of NAND flash parts appear to be made of a combination
> of Unicorn's horn and Unobtanium. IOW, they don't exist. At least
> that's what the flash distributor and rep tell us.
>
> What was Atmel thinking when they decided not to do ECC when reading
> NAND flash? I realize Atmel doesn't make NAND flash, but surely they
> must have been aware that NAND flash parts aren't spec'ed to be
> fault-free by the flash vendors.
>
> My opinion? It's a way for Atmel to suck you in and then after you
> get the unpleasant surprise that you _can't_ boot from NAND, they try
> to sell you a serial dataflash part you don't really want.

Product Marketing:
"We need to boot from NAND flash"

Engineering:
"We're six months away from tape out, you can't add a new feature now"

Product Marketing:
"Men, women, and children will die if you don't make this work"

Engineering:
"Well, get the executive cadre to move tape out by nine months and we'll
do the job right"

Product Marketing:
"We can't do that. Just make it work or you're all fired."

Engineering:
"OK..."

> OTOH, TI did it right in their OMAP parts: not only does the
> bootloader do ECC, it also will skip blocks that have uncorrectable
> errors.
>
> Atmel: Block 0 must be good without ECC
>
> TI: _Any_ of blocks 0,1,2,3 must be good _with_ ECC
>
> Which do you think is going to work better?
>


--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html

rickman

Sep 15, 2010, 6:19:37 PM


Which Dilbert book is this from? Only trouble is, marketing can't
fire anyone in engineering... as much as they might like to.

This sort of program uses all manner of concept, requirement and
specification documents. But that is no replacement for thinking.
This is just something that no one thought about at the time.

Maybe I'm showing my ignorance, but I thought the ECC was ON the flash
part and was hidden from the MCU. Is that being turned off or did I
miss something in how NAND flash works?

Rick

Tim Wescott

Sep 15, 2010, 7:09:49 PM


The last time I looked at NAND flash the ECC part was just extra bits in
each record -- it was your job to fill them in on writes and interpret
them on reads.

Of course, that was 5-10 years ago.

Grant Edwards

Sep 15, 2010, 7:46:11 PM

On 2010-09-15, Tim Wescott <t...@seemywebsite.com> wrote:
> On 09/15/2010 03:19 PM, rickman wrote:
>> On Sep 15, 4:10 pm, Tim Wescott<t...@seemywebsite.com> wrote:
>>> On 09/15/2010 12:23 PM, Grant Edwards wrote:

>>>> What was Atmel thinking when they decided not to do ECC when reading
>>>> NAND flash? I realize Atmel doesn't make NAND flash, but surely they
>>>> must have been aware that NAND flash parts aren't spec'ed to be
>>>> fault-free by the flash vendors.
>>>
>>>> My opinion? It's a way for Atmel to suck you in and then after you
>>>> get the unpleasant surprise that you _can't_ boot from NAND, they try
>>>> to sell you a serial dataflash part you don't really want.

>> This sort of program uses all manner of concept, requirement and
>> specification documents. But that is no replacement for thinking.
>> This is just something that no one thought about at the time.

Then somebody screwed up, because it was their job to think about that
sort of thing.

>> Maybe I'm showing my ignorance, but I thought the ECC was ON the
>> flash part and was hidden from the MCU.

Not in any of the NAND flash datasheets I've seen.

>> Is that being turned off or did I miss something in how NAND flash
>> works?
>
> The last time I looked at NAND flash the ECC part was just extra bits
> in each record -- it was your job to fill them in on writes and
> interpret them on reads.

Yup. And the NAND controller in the SAM9 part _has_ hardware that
generates the ECC bits when you do a write operation. All you have to
do is write them into the "extra" bits in the flash after you've
written the data. Reading is a bit more complex, since there's extra
work required to correct an error when one is detected.

> Of course, that was 5-10 years ago.

It's still that way. There may be NAND parts that do ECC themselves,
but I've never seen one.
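To make that division of labor concrete, here's a toy single-bit-correcting
scheme in Python. This is not the SAM9's actual Hamming layout, just a
minimal sketch of the same idea: the host (or the controller's hardware)
computes ECC at write time and stores it in the spare bytes; at read time a
syndrome locates a flipped bit.

```python
def ecc_compute(data: bytes):
    """Parity-of-positions ECC: XOR together the bit positions of all
    1-bits, plus an overall parity bit."""
    pos_xor = 0
    parity = 0
    for byte_index, b in enumerate(data):
        for bit in range(8):
            if (b >> bit) & 1:
                pos_xor ^= byte_index * 8 + bit
                parity ^= 1
    return pos_xor, parity

def ecc_correct(data: bytearray, stored_pos_xor: int, stored_parity: int) -> str:
    """Recompute the ECC over the data read back; the syndrome locates a
    single flipped bit. An even number of flips (parity matches but the
    syndrome doesn't) is detectable but not correctable."""
    pos_xor, parity = ecc_compute(bytes(data))
    syndrome = pos_xor ^ stored_pos_xor
    if parity == stored_parity:
        return "ok" if syndrome == 0 else "uncorrectable"
    data[syndrome // 8] ^= 1 << (syndrome % 8)   # flip the bad bit back
    return "corrected"
```

A real NAND ECC (e.g. the classic 3-bytes-per-256-bytes Hamming code) packs
the same information more compactly, but the read path is the same shape:
recompute, compare against the stored spare-area bytes, correct if the
syndrome points at a single bit.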

--
Grant


Marc Jet

Sep 16, 2010, 5:43:09 AM

Grant Edwards wrote:
> The Atmel SAM9 parts require that block 0 be completely free of bit
> errors since the ROM bootloader doesn't do ECC (despite the fact that
> the part does have hardware ECC support). So you have to use a NAND
> flash that guarantees a good block 0 _without_using_ECC_. It turns
> out those sorts of NAND flash parts appear to be made of a combination
> of Unicorn's horn and Unobtanium. IOW, they don't exist. At least
> that's what the flash distributor and rep tell us.

For what it's worth, Micron MT29FxGxx has a guaranteed good block 0.

larwe

Sep 16, 2010, 9:52:09 AM

On Sep 15, 3:23 pm, Grant Edwards <inva...@invalid.invalid> wrote:

> The Atmel SAM9 parts require that block 0 be completely free of bit
> errors since the ROM bootloader doesn't do ECC (despite the fact that

This is not unprecedented. In fact the original SSFDC specification, I
believe, had the same requirement (it has been a VERY long time, but
IIRC the first block had to contain the CIS, and it had to be 100%
free of errors, both correctable and uncorrectable).

We're shipping a product based on a micro with the same limitation. I
believe bad block 0s are treated as a normal production yield issue
(the parts are programmed in-circuit).
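Treating a bad block 0 as production fallout comes down to a bit-exact
program/verify pass at in-circuit programming time. A minimal sketch
(`write_block`/`read_block` are hypothetical stand-ins for the programmer's
block I/O, not any real tool's API):

```python
def screen_block0(write_block, read_block, image: bytes) -> bool:
    """Program block 0 with the boot image, then require a bit-exact
    readback (no ECC applied). Devices that fail are scrapped as yield
    loss rather than shipped."""
    write_block(0, image)
    return read_block(0) == image
```

The economics then hinge on how often block 0 actually has a stuck bit at
programming time, which the datasheets deliberately don't promise.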

I also think the i.MXxxxxxxxx family from Freescale has the same
requirement. Anyway it's definitely not the first time I've heard it
mentioned.

> My opinion?  It's a way for Atmel to suck you in and then after you
> get the unpleasant surprise that you _can't_ boot from NAND, they try
> to sell you a serial dataflash part you don't really want.

Never attribute to malice that which can adequately be explained by
stupidity. And I doubt DataFlash is a major revenue generator for
Atmel.

larwe

Sep 16, 2010, 9:54:03 AM

On Sep 15, 7:46 pm, Grant Edwards <inva...@invalid.invalid> wrote:

> It's still that way.  There may be NAND parts that do ECC themselves,
> but I've never seen one.

The idea of NAND is cheap bulk storage, no on-chip smarts, move the
cost into the main micro.

Anders....@kapsi.spam.stop.fi.invalid

Sep 16, 2010, 11:11:17 AM

Grant Edwards <inv...@invalid.invalid> wrote:
> The Atmel SAM9 parts require that block 0 be completely free of bit
> errors since the ROM bootloader doesn't do ECC (despite the fact that
> the part does have hardware ECC support).

At least Samsung's S3C2440A has the same requirement.

-a

Grant Edwards

Sep 16, 2010, 11:28:24 AM


At some point in the past, perhaps it was thought that bit-error-free
block 0 was going to be a common spec for NAND flash parts?

Apparently nobody told the guys running the flash fabs...

I have found a datasheet for one 1.8V NAND flash that guarantees a
good block zero. Whether the parts are actually available or not I
don't know.

Unfortunately we want to run the flash at 3.3V, and the
distributors/FAEs we've talked to have been unable to find _any_ 3.3V
part that meets Atmel's defect-free block 0 requirement. Now the choice
is between adding a SPI flash for booting or adding a 1.8V rail plus
buying more expensive 1.8V SDRAM along with bus transceivers to talk
to the required 3.3V peripherals.

--
Grant Edwards grant.b.edwards Yow! I'm imagining a surfer
at van filled with soy sauce!
gmail.com

tim....

Sep 16, 2010, 12:18:08 PM


"Grant Edwards" <inv...@invalid.invalid> wrote in message
news:i6r6fr$rm2$1...@reader1.panix.com...

> We recently based a board on an Atmel AT91SAM9G20 part which the FAE
> and rep said could boot from NAND flash. The eval board can indeed be
> configured to boot from NAND flash. However, when it comes time to
> spec parts for a real product, we find that's all smoke and mirrors.
>
> The Atmel SAM9 parts require that block 0 be completely free of bit

But don't all manufacturers guarantee this?

When I worked on NAND that was the spec -

Block 0 was guaranteed to work (so that you could boot from it) and all the
rest could be replaced by error checking.

Devices with errors in block 0 are scrap.

tim


Grant Edwards

Sep 16, 2010, 12:40:23 PM

On 2010-09-16, tim.... <tims_n...@yahoo.co.uk> wrote:
>
> "Grant Edwards" <inv...@invalid.invalid> wrote in message
> news:i6r6fr$rm2$1...@reader1.panix.com...
>> We recently based a board on an Atmel AT91SAM9G20 part which the FAE
>> and rep said could boot from NAND flash. The eval board can indeed be
>> configured to boot from NAND flash. However, when it comes time to
>> spec parts for a real product, we find that's all smoke and mirrors.
>>
>> The Atmel SAM9 parts require that block 0 be completely free of bit
>
> But don't all manufacturers guarantee this?

Nope.

For example, from Micron's datasheet for the MT29F1G08/MT29F1G16:

* Blocks 0–7 (block address 00h-07h) guaranteed to be valid
with ECC when shipped from factory (3.3V only); see Error
Management (page 83).

* Blocks 0–3 (block address 00h-03h) guaranteed to be valid
with ECC when shipped from factory (1.8V only); see Error
Management (page 83).

From p83:

The first block (physical block address 00h) for each CE# is
guaranteed to be valid with ECC when shipped from the factory.

Blocks 0-7 (block address 00h-07h) guaranteed to be valid with ECC
when shipped from factory (3.3V only). Blocks 0-3 (block address
00h-03h) guaranteed to be valid with ECC when shipped from factory
(1.8V only).



> When I worked on NAND that as the spec -

That doesn't currently seem to be the case.

> Block 0 was guaranteed to work (so that you could boot from it) and
> all the rest could be replaced by error checking.

Makes sense to me.

> devices with errors in block 0 are scrap.

Not any more.

From what the local Arrow guys have told us, it seems most newer parts
don't guarantee block 0 good unless you're doing ECC that can correct
a single-bit error.

[A few minutes ago, our Arrow guy just sent us a Toshiba datasheet for
a 3.3V 1Gb part that does guarantee good block 0.]

--
Grant Edwards grant.b.edwards Yow! I want the presidency
at so bad I can already taste
gmail.com the hors d'oeuvres.

Anders....@kapsi.spam.stop.fi.invalid

Sep 16, 2010, 12:41:16 PM

Grant Edwards <inv...@invalid.invalid> wrote:
> Unfortunately we want to run the flash at 3.3V, and the
> distributors/FAEs we've talked to have been unable to find _any_ 3.3V
> part that meets Atmel's defect-free block 0 requirement.

Coincidentally Samsung's own K9F1208 is a 3.3V part and guarantees an
error-free block zero. It's not advertised as a feature, but reading the
datasheet you find this:

"The 1st block, which is placed on 00h block address, is guaranteed to
be a valid block, does not require Error Correction up to 1K program/erase
cycles."

I don't have enough experience with NAND flash to say if this is a common
feature or not. (I'm only aware of the S3C2440 and this flash via the
FriendlyARM Mini2440 board.)

-a

Wil Taphoorn

Sep 16, 2010, 1:04:47 PM

On 16-9-2010 18:40, Grant Edwards wrote:
> On 2010-09-16, tim.... <tims_n...@yahoo.co.uk> wrote:
>>
>> But don't all manufacturers guarantee this?
>
> Nope.

See what they (Hynix, Intel, Micron, Phison, SanDisk, Sony, Spansion)
have spec'ed about NAND at http://www.onfi.org
One of the specs is that NAND's *must* have at least one guaranteed
valid block starting at address 0.

> For example, from Micron's datasheet for the MT29F1G08/MT29F1G16:
>
> * Blocks 0–7 (block address 00h-07h) guaranteed to be valid
> with ECC when shipped from factory (3.3V only); see Error
> Management (page 83).

From the Hynix HY27USXX561A data sheet:
- 3.3V device: VCC = 2.7 to 3.6V :
- The 1st block is guaranteed to be a valid block up to 1K cycles
without ECC

--
Wil

Stefan Reuther

Sep 16, 2010, 12:46:15 PM


All "hardware ECC support" I have seen so far is useless for anything
but older, smaller SLC parts. "Hardware ECC support" is doing a Hamming
code in hardware, which can correct a single bit error. Current large
SLC parts, and MLC parts, need a 4- or even 8-bit-correcting code.

Thus, "NAND flash support" is another piece for glossy brochures. A
regular memory interface and an interrupt-capable GPIO is enough to
access NAND flash.
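The point about not needing special silicon holds because the classic
large-page read sequence is just bus cycles: command 00h, address cycles,
command 30h, then wait for R/B#. A sketch, with a hypothetical bus object
standing in for the memory-mapped interface plus GPIO (here it only records
the cycles so the sequence can be inspected):

```python
class NandBus:
    """Hypothetical glue for a memory-mapped NAND: CLE/ALE selected by
    address lines, R/B# polled on a GPIO. This toy just logs cycles."""
    def __init__(self, page_size=2048):
        self.log = []
        self.page_size = page_size
    def cmd(self, c):
        self.log.append(("CLE", c))       # write with CLE asserted
    def addr(self, a):
        self.log.append(("ALE", a))       # write with ALE asserted
    def wait_ready(self):
        self.log.append(("RDY", None))    # poll the R/B# GPIO
    def read_bytes(self, n):
        return bytes(n)                   # data-out cycles (dummy)

def page_read(bus, row):
    """Classic large-page READ PAGE: 00h, 2 column + 3 row address
    cycles, 30h, wait for ready, then clock out the page."""
    bus.cmd(0x00)
    bus.addr(0x00); bus.addr(0x00)        # column address = 0
    for i in range(3):                    # row (page) address, LSB first
        bus.addr((row >> (8 * i)) & 0xFF)
    bus.cmd(0x30)
    bus.wait_ready()
    return bus.read_bytes(bus.page_size)
```

The address-cycle counts vary by density (small-block parts use a different
sequence entirely), so treat the 2+3 split here as an assumption for a
typical 1Gb-class large-page device.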

>>Of course, that was 5-10 years ago.
>
> It's still that way. There may be NAND parts that do ECC themselves,
> but I've never seen one.

I haven't seen anything but announcements (or fully-fledged SSDs /
memory cards).


Stefan

Grant Edwards

Sep 16, 2010, 2:46:14 PM

On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
> On 16-9-2010 18:40, Grant Edwards wrote:
>> On 2010-09-16, tim.... <tims_n...@yahoo.co.uk> wrote:
>>>
>>> But don't all manufacturers guarantee this?
>>
>> Nope.
>
> See what they (Hynix, Intel, Micron, Phison, SanDisk, Sony, Spansion)
> have spec'ed about NAND at http://www.onfi.org
>
> One of the specs is that NAND's *must* have at least one guaranteed
> valid block starting at address 0.

Can you please point to that in the spec?

All I can find is the definition for one of the fields in the
parameter block that tells how many valid blocks there are at the
beginning of the device:

Version 2.0:

5.6.1.20. Byte 107

Guaranteed valid blocks at beginning of target This field
indicates the number of guaranteed valid blocks starting at
block address 0 of the target. The minimum value for this field
is 1h. The blocks are guaranteed to be valid for the endurance
specified for this area (see section 5.6.1.21) when the host
follows the specified number of bits to correct.


Versions 2.1,2.2,2.3:

5.6.1.22. Byte 107

Guaranteed valid blocks at beginning of target This field
indicates the number of guaranteed valid blocks starting at
block address 0 of the target. The minimum value for this field
is 1h. The blocks are guaranteed to be valid for the endurance
specified for this area (see section 5.6.1.23) when the host
follows the specified number of bits to correct.

Nowhere does it say this number can't be 0, and it explicitly says
they can require that the host do ECC for those "guaranteed valid"
blocks.
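For what it's worth, that field is trivial to read out once you've fetched
the 256-byte parameter page; the sketch below assumes only what's quoted
above (byte 107) plus the "ONFI" ASCII signature in bytes 0-3:

```python
def guaranteed_valid_blocks(page: bytes) -> int:
    """Return the ONFI 'guaranteed valid blocks at beginning of target'
    field (byte 107 of the parameter page). Raises if the page doesn't
    carry the ONFI signature in its first four bytes."""
    if page[0:4] != b"ONFI":
        raise ValueError("not an ONFI parameter page")
    return page[107]
```

Of course, as argued above, the number this returns only holds "when the
host follows the specified number of bits to correct" — it says nothing
about ECC-free validity.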

>> For example, from Micron's datasheet for the MT29F1G08/MT29F1G16:
>>

>> * Blocks 0–7 (block address 00h-07h) guaranteed to be valid
>> with ECC when shipped from factory (3.3V only); see Error
>> Management (page 83).
>
> From the Hynix HY27USXX561A data sheet:
> - 3.3V device: VCC = 2.7 to 3.6V :
> - The 1st block is guaranteed to be a valid block up to 1K cycles
> without ECC

Yup, I just found that one about 10 minutes ago. Unfortunately, Hynix
has no local distributors, we're not large enough for Hynix to sell to
us directly, and we've been burned recently buying gray-market flash.

I've also just found a couple Micron datasheets that spec block 0 to
be valid without ECC, but apparently those parts aren't available
according to the local Micron rep.

--
Grant Edwards grant.b.edwards Yow! This is a NO-FRILLS
at flight -- hold th' CANADIAN
gmail.com BACON!!

Grant Edwards

Sep 16, 2010, 2:54:28 PM

On 2010-09-16, Anders....@kapsi.spam.stop.fi.invalid <Anders....@kapsi.spam.stop.fi.invalid> wrote:
> Grant Edwards <inv...@invalid.invalid> wrote:
>
>> Unfortunately we want to run the flash at 3.3V, and the
>> distributors/FAEs we've talked to have been unable to find _any_ 3.3V
>> part that meets Atmel's defect-free block 0 requirement.
>
> Coincidentally Samsung's own K9F1208 is a 3.3V part and guarantees an
> error-free block zero. It's not advertised as a feature, but reading the
> datasheet you find this:
>
> "The 1st block, which is placed on 00h block address, is guaranteed to
> be a valid block, does not require Error Correction up to 1K program/erase
> cycles."
>
> I don't have enough experience with NAND flash to say if this is a common
> feature or not. (I'm only aware of the S3C2440 and this flash via the
> FriendlyARM Mini2440 board.)

Ah. That appears to be a lower-density small-block part (512Mb). I'm
not sure what size we told the rep, but it looks like it was 1Gb or
larger based on the datasheets I've seen so far. We're also only
interested in parts we can actually buy, so it's also possible the
K9F1208 isn't available through distribution.

--
Grant Edwards grant.b.edwards Yow! -- I have seen the
at FUN --
gmail.com

Grant Edwards

Sep 16, 2010, 2:55:55 PM

On 2010-09-16, Grant Edwards <inv...@invalid.invalid> wrote:
> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>> On 16-9-2010 18:40, Grant Edwards wrote:
>>> On 2010-09-16, tim.... <tims_n...@yahoo.co.uk> wrote:
>>>>
>>>> But don't all manufacturers guarantee this?
>>>
>>> Nope.
>>
>> See what they (Hynix, Intel, Micron, Phison, SanDisk, Sony, Spansion)
>> have spec'ed about NAND at http://www.onfi.org
>>
>> One of the specs is that NAND's *must* have at least one guaranteed
>> valid block starting at address 0.
>
> Can you please point to that in the spec?
>
> All I can find is the definition for one of the fields in the
> parameter block that tells how many valid blocks there are at the
> beginning of the device:
>
> 5.6.1.22. Byte 107
>
> Guaranteed valid blocks at beginning of target This field
> indicates the number of guaranteed valid blocks starting at
> block address 0 of the target. The minimum value for this field
> is 1h. The blocks are guaranteed to be valid for the endurance
> specified for this area (see section 5.6.1.23) when the host
> follows the specified number of bits to correct.
>
> Nowhere does it say this number can't be 0,

Duh, wrong. The minimum is 1.

> and it explicitly says they can require that the host do ECC for
> those "guaranteed valid" blocks.

I still stand by that reading.

--
Grant Edwards grant.b.edwards Yow! I'm EMOTIONAL
at now because I have
gmail.com MERCHANDISING CLOUT!!

Anders....@kapsi.spam.stop.fi.invalid

Sep 16, 2010, 3:34:54 PM

Grant Edwards <inv...@invalid.invalid> wrote:
> On 2010-09-16, Anders....@kapsi.spam.stop.fi.invalid <Anders....@kapsi.spam.stop.fi.invalid> wrote:
>> Coincidentally Samsung's own K9F1208 is a 3.3V part and guarantees an
>> error-free block zero.
> Ah. That appears to be a lower-density small-block part (512Mb). I'm
> not sure what size we told the rep, but it looks like it was 1Gb or
> larger based on the datasheets I've seen so far. We're also only
> interested in parts we can actually buy, so it's also possible the
> K9F1208 isn't available through distribution.

They have other parts, like the K9F8G08 (1GB, 4K pages). Dunno about
availability.

-a

Grant Edwards

Sep 16, 2010, 3:45:53 PM


Yup, I found those, but was unable to figure out how to download a
datasheet. So far, it appears to me that older, lower-density parts
are much more likely to guarantee block 0 w/o requiring ECC than the
newer, higher-density parts.

--
Grant Edwards grant.b.edwards Yow! Hello. Just walk
at along and try NOT to think
gmail.com about your INTESTINES being
almost FORTY YARDS LONG!!

Wil Taphoorn

Sep 16, 2010, 4:13:38 PM

On 16-9-2010 20:46, Grant Edwards wrote:

> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>> One of the specs is that NAND's *must* have at least one guaranteed
>> valid block starting at address 0.
>
> Can you please point to that in the spec?
>
> All I can find is the definition for one of the fields in the
> parameter block that tells how many valid blocks there are at the
> beginning of the device:
> Nowhere does it say this number can't be 0

> [..] The minimum value for this field is 1h.

Since that field is mandatory, this line reads "at least one" to me.

> The blocks are guaranteed to be valid for the endurance
> specified for this area (see section 5.6.1.23) when the host
> follows the specified number of bits to correct.
>

> .. and it explicitly says they can require that the host do ECC
> for those "guaranteed valid" blocks.

Doesn't that mean that the programming device that is writing the
boot sector has to verify for errors and, if so, reject the device?

--
Wil

Grant Edwards

Sep 16, 2010, 4:21:08 PM

On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
> On 16-9-2010 20:46, Grant Edwards wrote:
>> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>>> One of the specs is that NAND's *must* have at least one guaranteed
>>> valid block starting at address 0.
>>
>> Can you please point to that in the spec?
>>
>> All I can find is the definition for one of the fields in the
>> parameter block that tells how many valid blocks there are at the
>> beginning of the device:

>> The blocks are guaranteed to be valid for the endurance specified
>> for this area (see section 5.6.1.23) when the host follows the
>> specified number of bits to correct.
>>
>> .. and it explicitly says they can require that the host do ECC
>> for those "guaranteed valid" blocks.
>
> Doesn't that mean that the programming device that is writing the
> boot sector has to verify for errors and, if so, reject the device?

You mean that block 0 is guaranteed good _if_ the customer throws out
any devices they find with a bad block 0?

Or, to phrase it differently: "Block 0 is guaranteed to be valid in
all devices that have a valid block 0.".

That's a statement so meaningless that even George Bush would be proud
of it. ;)

--
Grant Edwards grant.b.edwards Yow! I smell like a wet
at reducing clinic on Columbus
gmail.com Day!

Grant Edwards

Sep 16, 2010, 5:20:36 PM

On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:

OK, I've gotten more clarification from the hardware guys. One of the
requirements is availability in a BGA package. The above Hynix part
appears to be rather old and only available in TSOP. If you look at
the newer large-block Hynix parts (which are available in BGA), such as
the HY27UF081G2A, the datasheet says:

    The 1st block is guaranteed to be a valid block up to 1K cycles
    with ECC. (1bit/528bytes)

--
Grant Edwards grant.b.edwards Yow! Psychoanalysis??
at I thought this was a nude
gmail.com rap session!!!

Wil Taphoorn

Sep 16, 2010, 5:18:35 PM

On 16-9-2010 22:21, Grant Edwards wrote:
> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>> Doesn't that mean that the programming device that is writing the
>> boot sector has to verify for errors and, if so, reject the device?
>
> You mean that block 0 is guaranteed good _if_ the customer throws out
> any devices they find with a bad block 0?

No, I expect that this block can, at least once, be written without any
bit errors (i.e. able to boot without ECC considerations). What I meant
is that it is up to the design to take the risks of reprogramming this
boot sector.

--
Wil

Grant Edwards

Sep 16, 2010, 5:44:24 PM

On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
> On 16-9-2010 22:21, Grant Edwards wrote:
>> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>>> Doesn't that mean that the programming device that is writing the
>>> boot sector has to verify for errors and, if so, reject the device?
>>
>> You mean that block 0 is guaranteed good _if_ the customer throws out
>> any devices they find with a bad block 0?
>
> No, I expect that this block can -at least once- be written without any
> bit errors (i.e. able to boot without ECC considerations).

It doesn't say that anywhere in the spec.

What it says is this:

The blocks are guaranteed to be valid for the endurance
specified for this area (see section 5.6.1.23) when the host
follows the specified number of bits to correct.

Note the last phrase:

"when the host follows the specified number of bits to correct"

The blocks are only guaranteed valid _if_ you do ECC to correct the
specified number of bit-errors.



> What I meant is that it is up to the design to take risks of
> reprogramming this boot sector.

OK, I understand what you mean. But, that's not what the ONFI spec
says, and the datasheets for many vendors' parts specifically state
that you must do ECC if you expect block 0 to be valid.

--
Grant Edwards grant.b.edwards Yow! Kids, don't gross me
at off ... "Adventures with
gmail.com MENTAL HYGIENE" can be
carried too FAR!

Wil Taphoorn

Sep 16, 2010, 6:28:56 PM

On 16-9-2010 23:44, Grant Edwards wrote:
> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>>
>> No, I expect that this block can -at least once- be written without any
>> bit errors (i.e. able to boot without ECC considerations).
>
> It doesn't say that anywhere in the spec.
>
> The blocks are guaranteed to be valid for the endurance
> specified for this area (see section 5.6.1.23) when the host
> follows the specified number of bits to correct.
>
> Note the last phrase:
>
> "when the host follows the specified number of bits to correct"
>
> The blocks are only guaranteed valid _if_ you do ECC to correct the
> specified number of bit-errors.

True, "for the endurance specified", AKA "a number of times programmed".

But that doesn't mean you can't program it the first time. That is what
I meant by "expect": I would not accept a device that flips a bit on
the very first time it was programmed.

--
Wil

Grant Edwards

Sep 16, 2010, 10:21:40 PM

On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
> On 16-9-2010 23:44, Grant Edwards wrote:
>> On 2010-09-16, Wil Taphoorn <w...@nogo.wtms.nl> wrote:
>>>
>>> No, I expect that this block can -at least once- be written without any
>>> bit errors (i.e. able to boot without ECC considerations).
>>
>> It doesn't say that anywhere in the spec.
>>
>> The blocks are guaranteed to be valid for the endurance
>> specified for this area (see section 5.6.1.23) when the host
>> follows the specified number of bits to correct.
>>
>> Note the last phrase:
>>
>> "when the host follows the specified number of bits to correct"
>>
>> The blocks are only guaranteed valid _if_ you do ECC to correct the
>> specified number of bit-errors.
>
> True, "for the endurance specified", AKA "a number of times
> programmed".
>
> But that doesn't mean you can't program it the first time.

That's immaterial. What's important is that it doesn't mean
that you _can_ program it the first time (without ECC).

> That is what I meant by "expect", I would not accept a device that
> flips a bit on the very first time it was programmed.

I know what you meant by "expect", but I doubt that what you expect
determines what a fab ships.

A bit can fail the first time you program block 0, and it will still
meet the spec. That's what matters.

You can expect all sorts of things, but if a feature isn't in the
part's specification, then it's foolish to design a product that
depends on that feature.

The last batch of NAND chips I played with had 0 bad blocks.

I can "expect" 0 bad blocks all I want, but that's not going to stop
the vendor from shipping parts with up to 20 bad blocks out of 1024
next week. A design that relies on NAND parts having 0 bad blocks is a
bad idea no matter how hard I expect 100% good blocks.

--
Grant

Marc Jet

Sep 17, 2010, 7:52:07 AM

> All "hardware ECC support" I have seen so far is useless for anything
> but older, smaller SLC parts. "Hardware ECC support" is doing a Hamming
> code in hardware, which can correct a single bit error. Current large
> SLC parts, and MLC parts, need a 4- or even 8-bit-correcting code.

IMHO it is actually worse.

The way many NAND datasheets are written, they allow for more than
just 1 or 4 or 8 bad bits in a block. A certain number of blocks
could go away COMPLETELY, and the part would still be in-spec.

People commonly expect bad blocks to have more bit errors than their
ECC copes with. However, nowhere in the datasheets is a guarantee for
this.

For all I know, blocks could just as well become all 1. Or all 0.
Or return a read timeout. Or worse, they could become "unerasable" -
stuck at your own previous content (with your headers, and valid
ECC!).

Now I want to see how your FTL copes with that!

Stefan Reuther

Sep 17, 2010, 1:00:19 PM


I interpret that to mean that the boot sector can consist of X perfectly
reliable bits and Y unreliable bits (e.g. permanently zero). The boot
loader would then have to ECC-correct the unreliable bits each time it
loads, and the manufacturer guarantees that Y doesn't grow above the ECC
requirements.


Stefan

rickman

Sep 17, 2010, 3:19:56 PM


Looking back, I never actually used a NAND flash in a design. I
understand how the bad bits would be managed. But what about bad
blocks? Is this a spec on delivery or is it allowed for blocks to go
bad in the field? I can't see how that could be supported without a
very complex scheme along the lines of RAID drives.

Rick

Grant Edwards

Sep 17, 2010, 10:03:08 PM


That's what it means to me, that's what it means to the FAE's we're
working with, and judging by the parts' datasheets, that's what it
means to the guys doing QA at the fabs.

--
Grant

Allan Herriman

unread,
Sep 18, 2010, 12:34:38 AM9/18/10
to

It's pretty simple actually. When the driver reads a block that has an
error, it copies the corrected contents to an unused block and sets the
bad block flag in the original block, preventing its reuse.
No software will ever clear the bad block flag, which means that the
effective size of the device decreases as blocks go bad in the field.

From the point of view of the flash device, the bad block flag is just
another bit. The meaning comes from the software behaviour. The device
manufacturer will also mark some blocks bad during test. All filesystems
will use this same bit. Even if you reformat the device and put a
different filesystem on it, the bad block information is retained.
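
A minimal sketch of the relocation scheme Allan describes. This is a toy
in-RAM model, not any real driver API; a real driver would work against
physical pages and the spare area, and all names here are illustrative:

```python
# Toy model of bad-block relocation: on a read that reports a
# (correctable) error, the driver salvages the corrected data into a
# free block and marks the original bad so it is never reused.

class NandModel:
    def __init__(self, num_blocks):
        self.data = [b""] * num_blocks
        self.bad = set()      # stands in for the per-block bad flag
        self.errors = set()   # blocks currently reporting ECC errors

    def free_block(self):
        # any block that is neither bad nor already holding data
        return next(i for i, d in enumerate(self.data)
                    if i not in self.bad and not d)

    def read(self, blk):
        payload = self.data[blk]
        if blk in self.errors:
            new = self.free_block()   # relocate the corrected data
            self.data[new] = payload
            self.data[blk] = b""
            self.bad.add(blk)         # bad flag is set, never cleared
            return payload, new
        return payload, blk

nand = NandModel(4)
nand.data[0] = b"boot image"
nand.errors.add(0)                    # block 0 starts failing
data, where = nand.read(0)
# the data survives, but it now lives in a different block, and block 0
# is retired for good -- the usable capacity of the device has shrunk
```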


Cheers,
Allan

rickman

unread,
Sep 18, 2010, 7:26:30 AM9/18/10
to

You lost me. If there is a recoverable error, the block is not bad,
right? That's the purpose of the ECC. If the block accumulates
enough bad bits that the ECC can not correct, then you can't recover
the data.

Obviously there is something about the definition of "bad block" that
I am not getting. Are blocks with *any* bit errors considered bad and
not used? What if a block goes bad because it went from no bit errors
to more than the correctable number of bit errors? As Marc indicated,
a block can go bad for multiple reasons, many of which do not allow
the data to be recovered.

This sounds just like a bad block on a hard drive. When the block
goes bad, you lose data. No way around it, just tough luck! I
suppose in both media that is one of the limitations of the media. I
didn't realize that NAND Flash had this same sort of specified
behavior which is considered part of normal operation. I'll have to
keep that in mind.

Rick

Stefan Reuther

unread,
Sep 18, 2010, 7:34:12 AM9/18/10
to
Allan Herriman wrote:
> On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
>>On Sep 17, 7:52 am, Marc Jet <jetm...@hotmail.com> wrote:
>>>People commonly expect bad blocks to have more bit errors than their
>>>ECC copes with. However, nowhere in the datasheets is a guarantee for
>>>this.
[...]

>>Looking back, I never actually used a NAND flash in a design. I
>>understand how the bad bits would be managed. But what about bad
>>blocks? Is this a spec on delivery or is it allowed for blocks to go
>>bad in the field? I can't see how that could be supported without a
>>very complex scheme along the lines of RAID drives.
>
> It's pretty simple actually. When the driver reads a block that has an
> error, it copies the corrected contents to an unused block and sets the
> bad block flag in the original block, preventing its reuse.
> No software will ever clear the bad block flag, which means that the
> effective size of the device decreases as blocks go bad in the field.

But where do you store the "bad block" flag? It is pretty common to
store it in the bad block itself. The point Marc is making is that this
is not guaranteed to work.

> From the point of view of the flash device, the bad block flag is just
> another bit. The meaning comes from the software behaviour. The device
> manufacturer will also mark some blocks bad during test. All filesystems
> will use this same bit. Even if you reformat the device and put a
> different filesystem on it, the bad block information is retained.

In an ideal world, maybe. All file systems I have seen so far use
different bad block schemes. Which is not surprising, as NAND flash
parts themselves use different schemes to mark factory bad blocks.


Stefan

David Brown

unread,
Sep 18, 2010, 10:25:14 AM9/18/10
to

Just like with hard disks, the NAND flash ECC can correct several errors
in a block. So when there are a few correctable errors in a block, the
block is still "good" and still used. But once you have got close to
the correctable limit, you can still read out the data but you mark it
as bad so that it won't be used again.

There is always a possibility of a major failure that unexpectedly
increases the error rate beyond the capabilities of the ECC. But that
should be a fairly rare event - like a head crash on a hard disk. The
idea is to detect slow, gradual decay and limit its consequences. If
you need to protect against sudden disaster, then something equivalent
to RAID is the answer.
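
The retirement policy above can be sketched in a few lines. The
thresholds are illustrative, not from any datasheet; the point is only
that a block is retired while its data is still recoverable:

```python
# Retire a block before its error count reaches the ECC limit, so the
# last read of it is still correctable.  Numbers are made up.

ECC_LIMIT = 4     # bits the code can correct per read
RETIRE_AT = 3     # retire once we get this close to the limit

def classify(bit_errors):
    if bit_errors > ECC_LIMIT:
        return "lost"      # uncorrectable: data is gone
    if bit_errors >= RETIRE_AT:
        return "retire"    # readable now, but don't trust it again
    return "good"
```

So a block drifts from "good" through "retire" (data migrated) and only
a sudden jump past the ECC limit loses data.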


rickman

unread,
Sep 18, 2010, 11:05:47 PM9/18/10
to
On Sep 18, 10:25 am, David Brown

"Close" isn't good enough. You can't assume that it will fail
gradually. If it goes from good to bad, then you have lost data. Now
that I am aware of that, I will treat NAND flash the same as hard
disks, not to be counted on for embedded projects where a file system
failure is important.


> There is always a possibility of a major failure that unexpectedly
> increases the error rate beyond the capabilities of the ECC.  But that
> should be a fairly rare event - like a head crash on a hard disk.  The
> idea is to detect slow, gradual decay and limit its consequences.  If
> you need to protect against sudden disaster, then something equivalent
> to RAID is the answer.

Yes, a bad block happening without warning may be "rare", but the
point is that it is JUST like a hard disk drive and can not be used in
an app where this would cause a system failure. Any part of the
system can fail, but a bad block is not considered a "failure" of the
chip even though it can cause a failure of the system.

Rick

rickman

unread,
Sep 18, 2010, 11:09:11 PM9/18/10
to
On Sep 18, 7:34 am, Stefan Reuther <stefan.n...@arcor.de> wrote:
> Allan Herriman wrote:
> > On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
> >>On Sep 17, 7:52 am, Marc Jet <jetm...@hotmail.com> wrote:
> >>>People commonly expect bad blocks to have more bit errors than their
> >>>ECC copes with.  However, nowhere in the datasheets is a guarantee for
> >>>this.
> [...]
> >>Looking back, I never actually used a NAND flash in a design.  I
> >>understand how the bad bits would be managed.  But what about bad
> >>blocks?  Is this a spec on delivery or is it allowed for blocks to go
> >>bad in the field?  I can't see how that could be supported without a
> >>very complex scheme along the lines of RAID drives.
>
> > It's pretty simple actually.  When the driver reads a block that has an
> > error, it copies the corrected contents to an unused block and sets the
> > bad block flag in the original block, preventing its reuse.
> > No software will ever clear the bad block flag, which means that the
> > effective size of the device decreases as blocks go bad in the field.
>
> But where do you store the "bad block" flag? It is pretty common to
> store it in the bad block itself. The point Marc is making is that this
> is not guaranteed to work.

Why do you need a bad block flag? If the block has an ECC failure, it
is bad and the OS will note that. You may have to read the block ECC
the first time it fails, but after that it can be noted in the file
system as not part of a file and not part of free space on the drive.


> > From the point of view of the flash device, the bad block flag is just
> > another bit.  The meaning comes from the software behaviour.  The device
> > manufacturer will also mark some blocks bad during test.  All filesystems
> > will use this same bit.  Even if you reformat the device and put a
> > different filesystem on it, the bad block information is retained.
>
> In an ideal world, maybe. All file systems I have seen so far use
> different bad block schemes. Which is not surprising, as NAND flash
> parts themselves use different schemes to mark factory bad blocks.
>
>   Stefan

I don't see how this is any different from a hard drive. There they
use a combination of factory data and the file system to track bad
blocks.

Rick

David Brown

unread,
Sep 19, 2010, 10:04:17 AM9/19/10
to
On 19/09/2010 05:05, rickman wrote:
> On Sep 18, 10:25 am, David Brown
> <david.br...@hesbynett.removethisbit.no> wrote:
>> On 18/09/2010 13:26, rickman wrote:
>> Just like with hard disks, the NAND flash ECC can correct several errors
>> in a block. So when there are a few correctable errors in a block, the
>> block is still "good" and still used. But once you have got close to
>> the correctable limit, you can still read out the data but you mark it
>> as bad so that it won't be used again.
>
> "Close" isn't good enough. You can't assume that it will fail
> gradually. If it goes from good to bad, then you have lost data. Now
> that I am aware of that, I will treat NAND flash the same as hard
> disks, not to be counted on for embedded projects where a file system
> failure is important.
>

That's just nonsense.

/Everything/ has a chance of failure. Are you going to stop using
microcontrollers because you've heard that they occasionally fail? Will
you stop driving your car to work because they sometimes break down?

What is important for building reliable systems is to have an
understanding of the failure modes of the parts, the chances of these
failures, and the consequences of the failures. NAND flash has
significant risk of failure with reasonably well understood
characteristics - the failure of individual bits is mostly independent,
and the risk of failure increases with each erase/write cycle. So what
you get is a pattern of gradually more random bit failures within any
given block, increasing as the block gets erased and re-written. You
correct for a few bit failures, but if there are too many errors you
consider the block to be failing - you can read from it, but you won't
trust it to store new data. In most cases, you'll copy the data over to
a different block.

Note that the same principle applies if the ECC coding only corrects a
single error - with one correctable error you consider the block too
risky for re-use, but trust the (corrected) data read out.

>
>> There is always a possibility of a major failure that unexpectedly
>> increases the error rate beyond the capabilities of the ECC. But that
>> should be a fairly rare event - like a head crash on a hard disk. The
>> idea is to detect slow, gradual decay and limit its consequences. If
>> you need to protect against sudden disaster, then something equivalent
>> to RAID is the answer.
>
> Yes, a bad block happening without warning may be "rare", but the
> point is that it is JUST like a hard disk drive and can not be used in
> an app where this would cause a system failure. Any part of the
> system can fail, but a bad block is not considered a "failure" of the
> chip even though it can cause a failure of the system.
>

The only way to make a system safe in the event of rare catastrophic
failures of critical systems is with redundancy. It applies to NAND
devices just like it applies to every other part of the system.

The difference is that with a NAND flash, a bad block is /not/
considered a failure because you take the wear of the blocks into
account in the design of the system, so that they don't lead to system
failure.

Think of it like a battery - you know it is going to "fail", and plan
accordingly so that it does not lead to a catastrophic failure of the
system.


David Brown

unread,
Sep 19, 2010, 10:06:20 AM9/19/10
to

Failures can be intermittent - a partially failed bit could be read
correctly or incorrectly depending on the data stored, the temperature,
or the voltage. So if you see that you are getting failures, you make a
note of them and don't use that block again.

Allan Herriman

unread,
Sep 19, 2010, 11:31:53 AM9/19/10
to

Our experience has been that parts fresh from the factory will have some
blocks flagged as bad. If we (by using a modified driver) write and read
those blocks, some of them will actually work ok. Presumably the factory
test is rather more rigorous.

("Modified driver" should be read as "partially ported and still bug
ridden driver".)

It's been a while, but ISTR that the parts we were using had 0 to 3 bad
blocks per device, which was within the manufacturer's spec. We stress
tested a bunch of them and we did see a block go bad. The number of erase/
write cycles required exceeded the manufacturer's minimum spec.
(This stress testing was performed to test the bad block handling in
software.)

To keep this relevant to the OP, we were using parts that had a guaranteed
good block 0. The board is still in production.

Regards,
Allan

Ulf Samuelsson

unread,
Sep 20, 2010, 4:26:52 AM9/20/10
to
rickman wrote:

> On Sep 18, 10:25 am, David Brown
> <david.br...@hesbynett.removethisbit.no> wrote:
>> On 18/09/2010 13:26, rickman wrote:
>>
>>
>>
>>> On Sep 18, 12:34 am, Allan Herriman<allanherri...@hotmail.com> wrote:
>>>> On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
>>>>> On Sep 17, 7:52 am, Marc Jet<jetm...@hotmail.com> wrote:
>>>>>>> All "hardware ECC support" I have seen so far is useless for anything
>>>>>>> but older, smaller SLC parts. "Hardware ECC support" is doing a
>>>>>>> Hamming code in hardware, which can correct a single bit error.
>>>>>>> Current large SLC parts, and MLC parts, need a 4- or even
>>>>>>> 8-bit-correcting code.
>>>>>> IMHO it is actually worse.
>>>>>> The way many NAND datasheets are written, they allow for more than just
>>>>>> 1 or 4 or 8 bad bits in a block. A certain number of blocks could go
>>>>>> away COMPLETELY, and the part would still be in-spec.
>>>>>> People commonly expect bad blocks to have more bit errors than their
>>>>>> ECC copes with. However, nowhere in the datasheets is a guarantee for
>>>>>> this.
>>>>>> For what I know, blocks could just as well become all 1. Or all 0.. Or

News to you:
All flash memories will eventually "wear out".
You have to have a strategy to handle this.

>
>> There is always a possibility of a major failure that unexpectedly
>> increases the error rate beyond the capabilities of the ECC. But that
>> should be a fairly rare event - like a head crash on a hard disk. The
>> idea is to detect slow, gradual decay and limit its consequences. If
>> you need to protect against sudden disaster, then something equivalent
>> to RAID is the answer.
>
> Yes, a bad block happening without warning may be "rare", but the
> point is that it is JUST like a hard disk drive and can not be used in
> an app where this would cause a system failure. Any part of the
> system can fail, but a bad block is not considered a "failure" of the
> chip even though it can cause a failure of the system.
>
> Rick


--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may
or may not be shared by my employer Atmel Nordic AB

Ulf Samuelsson

unread,
Sep 20, 2010, 4:44:34 AM9/20/10
to
Grant Edwards wrote:
> We recently based a board on an Atmel AT91SAM9G20 part which the FAE
> and rep said could boot from NAND flash. The eval board can indeed be
> configured to boot from NAND flash. However, when it comes time to
> spec parts for a real product, we find that's all smoke and mirrors.
>
> The Atmel SAM9 parts require that block 0 be completely free of bit
> errors since the ROM bootloader doesn't do ECC (despite the fact that
> the part does have hardware ECC support). So you have to use a NAND
> flash that guarantees a good block 0 _without_using_ECC_. It turns
> out those sorts of NAND flash parts appear to be made of a combination
> of Unicorn's horn and Unobtanium. IOW, they don't exist. At least
> that's what the flash distributor and rep tell us.
>
> What was Amtel thinking when they decided not to do ECC when reading
> NAND flash? I realize Atmel doesn't make NAND flash, but surely they
> must have been aware that NAND flash parts aren't spec'ed to be
> fault-free by the flash vendors.
>
> My opinion? It's a way for Atmel to suck you in and then after you
> get the unpleasant surprise that you _can't_ boot from NAND, they try
> to sell you a serial dataflash part you don't really want.

The problem is that the NAND flash market has moved on since
the AT91SAM9G20 was designed.
The NAND flash used to guarantee Block 0.
Now there are memories which do not have this guarantee.
I think the industry is moving towards eMMC.

Note that the fact that block 0 is OK is no guarantee that
you can boot. The part configuration must also be recognized
by the boot ROM. Some manufacturers "reuse" id's so if the
table contains two elements with the same Id, only the first will
be found.


>
> OTOH, TI did it right in their OMAP parts: not only does the
> bootloader do ECC, it also will skip blocks that have uncorrectable
> errors.
>
> Atmel: Block 0 must be good without ECC
>
> TI: _Any_ of blocks 0,1,2,3 must be good _with_ ECC
>
> Which do you think is going to work better?

Boudewijn Dijkstra

unread,
Sep 20, 2010, 5:12:04 AM9/20/10
to
On Sat, 18 Sep 2010 13:26:30 +0200, rickman <gnu...@gmail.com> wrote:

> On Sep 18, 12:34 am, Allan Herriman <allanherri...@hotmail.com> wrote:
>> On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
>> > On Sep 17, 7:52 am, Marc Jet <jetm...@hotmail.com> wrote:
>> >> > [...]

>>
>> > Looking back, I never actually used a NAND flash in a design. I
>> > understand how the bad bits would be managed. But what about bad
>> > blocks? Is this a spec on delivery or is it allowed for blocks to go
>> > bad in the field? I can't see how that could be supported without a
>> > very complex scheme along the lines of RAID drives.
>>
>> It's pretty simple actually. When the driver reads a block that has an
>> error, it copies the corrected contents to an unused block and sets the
>> bad block flag in the original block, preventing its reuse.
>> No software will ever clear the bad block flag, which means that the
>> effective size of the device decreases as blocks go bad in the field.
>>
>> From the point of view of the flash device, the bad block flag is just
>> another bit. The meaning comes from the software behaviour. The device
>> manufacturer will also mark some blocks bad during test. All
>> filesystems
>> will use this same bit. Even if you reformat the device and put a
>> different filesystem on it, the bad block information is retained.
>
> You lost me. If there is an recoverable error, the block is not bad,
> right?

It means that it's not doing particularly well and is likely to become bad.

> That's the purpose of the ECC. If the block accumulates
> enough bad bits that the ECC can not correct, then you can't recover
> the data.

Precisely. Without ECC, you wouldn't be able to evacuate the data to a
good block.

In theory I think you could still use them for writing but you'd have to
verify the data every time.

> Obviously there is something about the definition of "bad block" that
> I am not getting. Are blocks with *any* bit errors considered bad and
> not used? What if a block goes bad because it went from no bit errors
> to more than the correctable number of bit errors?

That would be "really bad".

> As Marc indicated,
> a block can go bad for multiple reasons, many of which do not allow
> the data to be recovered.
>
> This sounds just like a bad block on a hard drive. When the block
> goes bad, you lose data. No way around it, just tough luck! I
> suppose in both media that is one of the limitations of the media. I
> didn't realize that NAND Flash had this same sort of specified
> behavior which is considered part of normal operation. I'll have to
> keep that in mind.
>
> Rick


--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)

Marc Jet

unread,
Sep 20, 2010, 7:15:32 AM9/20/10
to
NAND chips allow for up to N "bad blocks" before a device is
considered defective. Some blocks come already marked as bad from
factory. It is recommended to preserve this information, as factory
testing is usually more exhaustive than what you can implement in a
typical embedded system. However, more bad blocks are allowed to
develop DURING THE LIFE TIME of the device, up to the specified
maximum (N).

This means that whatever you write to the device may or may not
be readable afterwards! You have 3 choices for how to handle this:

Choice 1: Use enough error correction to be 100% safe against it.

Note that "normal" ECC is definitely not enough. On a device with 0
known bad blocks, up to N blocks can disappear from one moment to the
other (in the worst possible scenario). To be safe against this, you
must distribute each and every of your precious bits over at least N+1
blocks.

Algorithms exist that can do this (for example RS), but they are not
nice. Besides the algorithmic complexity, there is another problem
with this approach. The higher the storage efficiency (data bits
versus redundancy bits), the more blocks you have to read before
you're able to extract your bits. With N in the range of 20 to 40 in
typical NAND chips, this results in an unavoidable and very high read/
write latency.
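
The worst case described here can be illustrated with the simplest
possible N+1 scheme, plain replication. A real design would use an
erasure code such as Reed-Solomon for better storage efficiency; this
toy version just shows why N+1 blocks suffice when any N can vanish:

```python
# If up to N blocks may die at once, writing the same data to N+1
# blocks guarantees at least one surviving copy -- at the cost of an
# (N+1)x storage overhead and up to N+1 reads.

N = 3                                # max simultaneous block failures (toy value)

def write_replicated(data):
    return [data] * (N + 1)          # one copy per block, N+1 blocks

def read_replicated(copies, dead):
    # 'dead' is the set of copy indices that disappeared; any survivor wins
    for i, c in enumerate(copies):
        if i not in dead:
            return c
    raise IOError("more than N blocks failed")

copies = write_replicated(b"vital bits")
# even if any N of the N+1 copies disappear, the data survives:
assert read_replicated(copies, dead={0, 1, 2}) == b"vital bits"
```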

Choice 2: Avoid giving more reliability guarantees than your
underlying technology.

This sounds simple and impossible, yet in fact it's quite realistic.
The problem is not that your storage capacity may go away. It's just
that vital data stored in a particular place may become unreadable.
If you introduce a procedure to restore the data and make that
procedure part of the normal operation of your device, then it's not a
real problem.

In practice this means that your bad block layer must be able to
identify the bad blocks in all circumstances. I know that many real-
world algorithms (like the mentioned one of using the "bad block bit")
are not 100% fit for the task. After all, the bad block bit may be
stuck at '1', and you can't do what's necessary to mark it bad. But
there are more reliable approaches that can achieve the necessary
guarantee.

Of course the other essential part for this choice is to provide a way
to restore the data, which can be a PC flasher program (like iTunes
"restore device").

Then your device can be declared to be always working, without
extending the reliability guarantees beyond those given by the NAND
manufacturer.

Choice 3: Implement reasonable ECC, give all the guarantees, and hope
for the best.

This seems to be "industry standard". It seems to work out quite OK,
because NAND failures usually are not very catastrophic. As others
have pointed out, creeping failures can be detected and data migrated
before ECC capability is exceeded. Usually failures go in hand with
write activity in the same block or page, and write patterns are under
software control.

But then again, to make it very clear: this approach is not 100%
safe. It's a compromise between feasibility and reliability.

You will see yield problems. Unless it's life threatening technology,
you're probably better off accepting them than to cure them.

Grant Edwards

unread,
Sep 20, 2010, 10:18:40 AM9/20/10
to
On 2010-09-20, Ulf Samuelsson <u...@a-t-m-e-l.com> wrote:
> Grant Edwards skrev:

>> The Atmel SAM9 parts require that block 0 be completely free of bit
>> errors since the ROM bootloader doesn't do ECC (despite the fact that
>> the part does have hardware ECC support). So you have to use a NAND
>> flash that guarantees a good block 0 _without_using_ECC_. It turns
>> out those sorts of NAND flash parts appear to be made of a
>> combination of Unicorn's horn and Unobtanium. IOW, they don't exist.
>> At least that's what the flash distributor and rep tell us.
>

> The problem is that the NAND flash market has moved on since the
> AT91SAM9G20 was designed. The NAND flash used to guarantee Block 0.

That was the conclusion to which I eventually came after reviewing a
bunch of datasheets. The parts that you could use to boot a G20 were
all several years old, and the parts that required ECC on block 0 were
newer. Since the hardware guys wanted a small (read BGA) package,
that pretty much left only the recent parts that require ECC on block 0.

It looks like we're going to either have to settle for TSOP or add a
SPI NOR flash to hold the 16KB bootstrap.

> Note that the fact that block 0 is OK, is no guarantee that you can
> boot. The part configuration must also be recognized by the boot ROM.
> Some manufacturers "reuse" id's so if the table contains two elements
> with the same Id, only the first will be found.

--
Grant Edwards grant.b.edwards Yow! I wish I was a
at sex-starved manicurist
gmail.com found dead in the Bronx!!

Vladimir Vassilevsky

unread,
Sep 20, 2010, 10:36:11 AM9/20/10
to

Marc Jet wrote:
> NAND chips allow for up to N "bad blocks" before a device is
> considered defective.

That N could be as high as 2% of the total capacity. The tendency is
to allow for a larger and larger N. The higher the flash density, the
lower the reliability. This is especially true for multilevel
flash. If the application requires high reliability of data, I avoid
using high density flash. There is also NAND flash of industrial
quality, which is substantially more reliable than consumer grade.


> Some blocks come already marked as bad from
> factory. It is recommended to preserve this information, as factory
> testing is usually more exhaustive than what you can implement in a
> typical embedded system.

You are making unfounded assumptions here.

> However, more bad blocks are allowed to
> develop DURING THE LIFE TIME of the device, up to the specified
> maximum (N).

And higher than the maximum: N+1, N+2 and so on.

> This means that you whatever you write to the device, may or may not
> be readable afterwards!

Incredible, isn't it?

> You have 3 choices how to handle this:
> Choice 1: Use enough error correction to be 100% safe against it.

Only insurance agencies promise a 100% guaranteed result.

> Choice 2: Avoid giving more reliability guarantees than your
> underlying technology.

> Choice 3: Implement reasonable ECC, give all the guarantees, and hope
> for the best.
> This seems to be "industry standard".

RAID or RAID-like solutions are well known for the safe storage of data.

> It seems to work out quite OK,
> because NAND failures usually are not very catastrophic.

Until some critical part of the filesystem fails, making all other data
inaccessible.

> As others
> have pointed out, creeping failures can be detected and data migrated
> before ECC capability is exceeded. Usually failures go in hand with
> write activity in the same block or page, and write patterns are under
> software control.

Those intelligent measures introduce a lot of overhead and increase the
amount of write activity. Also, they create critical situations when
accidental power failure can destroy the filesystem.

> But then again, to make it very clear: this approach is not 100%
> safe. It's a compromise between feasibility and reliability.
>
> You will see yield problems. Unless it's life threatening technology,
> you're probably better off accepting them than to cure them.

Sure. Who cares about an occasionally broken .mp3 or .jpg file.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Ulf Samuelsson

unread,
Sep 20, 2010, 12:37:44 PM9/20/10
to
If you add an SPI flash (or a dataflash) and plan to boot Linux,
you are probably better off also putting u-boot and the u-boot
environment in the dataflash.
You might also want to consider the kernel.

Reason is the SAM-BA S/W which only knows how to erase
the complete NAND flash.
If you plan to program the NAND flash using another method,
then of course, use the NAND flash for everything
except bootstrap.


--
Best Regards
Ulf Samuelsson

These are my own personal opinions, which may (or may not)

Stefan Reuther

unread,
Sep 20, 2010, 1:19:57 PM9/20/10
to
Marc Jet wrote:
> Choice 1: Use enough error correction to be 100% safe against it.
>
> Note that "normal" ECC is definately not enough. On a device with 0
> known bad blocks, up to N blocks can disappear from one moment to the
> other (in the worst possible scenario). To be safe against this, you
> must distribute each and every of your precious bits over at least N+1
> blocks.

This means you have to distribute each single data block across, say,
161 blocks. With a block size of 4k and NOP=4 this means the minimum
amount of data you can write (aka "cluster size") is 161 kBytes. Plus,
remember that NAND flash tends to get more forgetful if you actually use
NOP=4, so you'd more likely write 161x4 = 644 kBytes.

Well, that's certainly a way to reach 100,000 programming cycles.
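
Stefan's figures check out directly (all numbers taken from his post:
4 kB blocks, NOP=4 partial programs per page, data striped across
N+1 = 161 blocks):

```python
# Reproducing the cluster-size arithmetic: with partial programming you
# can write 1/NOP of each block, but the data is spread over 161 blocks.

BLOCK_KB = 4        # erase-block payload, kBytes
NOP = 4             # partial-page programs allowed per page
STRIPES = 161       # data distributed across N+1 = 161 blocks

min_write_kb = STRIPES * BLOCK_KB // NOP   # with partial programming
full_write_kb = STRIPES * BLOCK_KB         # writing whole blocks instead
```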

> Choice 2: Avoid giving more reliability guarantees than your
> underlying technology.
>
> This sounds simple and impossible, yet in fact it's quite realistic.
> The problem is not that your storage capacity may go away. It's just
> that vital data stored in a particular place may become unreadable.
> If you introduce a procedure to restore the data and make that
> procedure part of the normal operation of your device, then it's not a
> real problem.
>
> In practice this means that your bad block layer must be able to
> identify the bad blocks in all circumstances. I know that many real-
> world algorithms (like the mentioned one of using the "bad block bit")
> are not 100% fit for the task. After all, the bad block bit may be
> stuck at '1', and you can't do what's necessary to mark it bad. But
> there are more reliable approaches that can achieve the necessary
> guarantee.

That's why you don't use a single bit. If my bad block layer sees a bad
block, it tries to actively stomp on all bits that still live there, to
destroy as much of the ECC and magic numbers as possible. Remember, we
don't need 100% reliability. After all, all components have a finite
life, and the flash just needs to live a little longer than the plug
connectors or capacitors in the device :-) And by using many bits, I
believe the chance that they all refuse to flip is low enough.

It's a flash. It's electrons that tunnel out gradually. It's not an evil
gnome sitting within the package, deciding "today, I'll annoy the
engineer in an especially evil twisted way", so while the data sheet
allows a NAND flash to keep its old contents unmodifiably in a bad
sector, I assume this doesn't happen in practice. Or, at least, not
often enough to be observable.


Stefan

Stefan Reuther

unread,
Sep 20, 2010, 1:22:44 PM9/20/10
to
Vladimir Vassilevsky wrote:

> Marc Jet wrote:
>> Some blocks come already marked as bad from
>> factory. It is recommended to preserve this information, as factory
>> testing is usually more exhaustive than what you can implement in a
>> typical embedded system.
>
> You are making unfounded assumptions here.

No, he is citing usual data sheets. So while skeptics may still doubt
that factory testing happens, Marc's claim certainly is not unfounded,
because you can read it in every data sheet :-]


Stefan

Stefan Reuther

unread,
Sep 20, 2010, 1:01:42 PM9/20/10
to
[I haven't got rickman's post.]

David Brown wrote:
> On 19/09/2010 05:09, rickman wrote:
>> On Sep 18, 7:34 am, Stefan Reuther<stefan.n...@arcor.de> wrote:
>>> Allan Herriman wrote:
>>>> It's pretty simple actually. When the driver reads a block that has an
>>>> error, it copies the corrected contents to an unused block and sets the
>>>> bad block flag in the original block, preventing its reuse.
>>>> No software will ever clear the bad block flag, which means that the
>>>> effective size of the device decreases as blocks go bad in the field.
>>>
>>> But where do you store the "bad block" flag? It is pretty common to
>>> store it in the bad block itself. The point Marc is making is that this
>>> is not guaranteed to work.
>>
>> Why do you need a bad block flag? If the block has an ECC failure, it
>> is bad and the OS will note that. You may have to read the block ECC
>> the first time it fails, but after that it can be noted in the file
>> system as not part of a file and not part of free space on the drive.

How do you mark it "in the file system" if your file system is actually
inside the NAND flash? Thought experiment: your bad block table is
stored in a particular block. That block goes bad. Where do you mark
that this block is now bad?

State of the art seems to be to use magic numbers for valid data, and
destroy the ECC and/or magic numbers for blocks that have gone bad, so
you can identify them later. That's the "bad block flag".

> Failures can be intermittent - a partially failed bit could be read
> correctly or incorrectly depending on the data stored, the temperature,
> or the voltage. So if you see that you are getting failures, you make a
> note of them and don't use that block again.

From what I've seen, those temporary failed bits are still within the
specs of the NAND flash as long as you're running the part within specs.
However, when you're way out of spec (say, 30°C over limit), all hell
breaks loose.


Stefan

Grant Edwards

Sep 20, 2010, 2:10:11 PM
On 2010-09-20, Ulf Samuelsson <nospa...@atmel.com> wrote:
> 2010-09-20 16:18, Grant Edwards skrev:

> If you add an SPI flash (or a dataflash) and plan to boot Linux, you
> are probably better of by putting also u-boot and u-boot environment
> in the dataflash. You might also want to consider the kernel.

While the ROM bootloader supports the 25xx series "dataflash" parts we
got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
-- at least not that I could ever find. I asked about it on the AT91
forum a few months back and got the usual response (IOW, none at all).

> Reason is the SAM-BA S/W which only knows how to erase the complete
> NAND flash.

Oh, I fixed that ages ago.

I added a few lines of code to the nand-flash, data-flash, and
serial-flash applets so that they can all erase a region of flash.

Then I wrote my own ROM-boot-protocol client in Python.

[Besides the lack of an "erase region" command, SAM-BA won't work at
all using a serial connection on a Linux host, it's not very usable
from the command-line, and it isn't very easy to use as a module from
other programs.]

> If you plan to program the NAND flash using another method, then of
> course, use the NAND flash for everything except bootstrap.

We'll initially use "SAM-BA" replacement program to program
prototypes. Then for production, the plan is to have the distributor
ship them with U-Boot preprogrammed so that we can use the TFTP server
in U-Boot to do the rest.

--
Grant Edwards grant.b.edwards Yow! I Know A Joke!!
at
gmail.com

David Brown

Sep 20, 2010, 2:11:40 PM
On 20/09/2010 19:19, Stefan Reuther wrote:

> Remember, we
> don't need 100% reliability. After all, all components have a finite
> life, and the flash just needs to live a little longer than the plug
> connectors or capacitors in the device :-) And by using many bits, I
> believe to got the chance that they all refuse to flip low enough.
>

This is what some people here apparently have trouble understanding.
/Nothing/ is 100% reliable - it's just a matter of taking the
reliability of your parts into account when designing a complete system.

Ulf Samuelsson

Sep 20, 2010, 6:37:33 PM
2010-09-20 20:10, Grant Edwards skrev:
> On 2010-09-20, Ulf Samuelsson<nospa...@atmel.com> wrote:
>> 2010-09-20 16:18, Grant Edwards skrev:
>
>> If you add an SPI flash (or a dataflash) and plan to boot Linux, you
>> are probably better of by putting also u-boot and u-boot environment
>> in the dataflash. You might also want to consider the kernel.
>
> While the ROM bootloader supports the 25xx series "dataflash" parts we
> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
> -- at least not that I could ever find. I asked about it on the AT91
> forum a few months back and got the usual response (IOW, none at all).
>

It is hidden in a disused lavatory in the cellar marked:
"Beware of the Leopard".

There are three different AT91bootstraps around.

1) The obvious AT91bootstrap you can download from www.atmel.com
2) My derivative of AT91bootstrap which adds Kconfig etc.
and is used by open source projects like Buildroot and OpenEmbedded.
3) There is normally an AT91bootstrap in the "Softpack's".
This is different from (1) and (2).
It supports the 25xx series SPI flash but relies
on libraries not normally available in arm-linux compilers
so you may have to compile it using arm-newlib, IAR or Keil.

>> Reason is the SAM-BA S/W which only knows how to erase the complete
>> NAND flash.
>
> Oh, I fixed that ages ago.
>
> I added a few lines of code to the nand-flash, data-flash, and
> serial-flash applets so that they can all erase a region of flash.
>

Nice, how about sharing!

> Then I wrote my own ROM-boot-protocol client in Python.
>

> [Besides the lack of an "erase region" command, SAM-BA won't work at
> all using a serial connection on a Linux host, it's not very usable
> from the command-line, and it isn't very easy to use as a module from
> other programs.]
>


>> If you plan to program the NAND flash using another method, then of
>> course, use the NAND flash for everything except bootstrap.
>
> We'll initially use "SAM-BA" replacement program to program
> prototypes. Then for production, the plan is to have the distributor
> ship them with U-Boot preprogrammed so that we can use the TFTP server
> in U-Boot to do the rest.
>

Or, if you have an SD-Connector, you can boot from an SD-Card/
external SPI flash which programs the internal flash.

Grant Edwards

Sep 21, 2010, 11:30:11 AM
On 2010-09-20, Ulf Samuelsson <nospa...@atmel.com> wrote:
> 2010-09-20 20:10, Grant Edwards skrev:

>> While the ROM bootloader supports the 25xx series "dataflash" parts we
>> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
>> -- at least not that I could ever find. I asked about it on the AT91
>> forum a few months back and got the usual response (IOW, none at all).
>
> It is hidden in a disused lavatory in the cellar marked:
> "Beware of the Leopard".

Ah! I just listened to Stephen Fry's audiobook of THHGTTG last
weekend while driving home from Chicago.


> There are three different AT91bootstraps around.
>
> 1) The obvious AT91bootstrap you can download from www.atmel.com

That's the one I looked at.

> 2) My derivative of AT91bootstrap which adds Kconfig etc. and is used
> by open source projects like Buildroot and OpenEmbedded.

Though I am using buildroot for my rootfs, I found it more convenient to
build other things (kernel, bootstrap, U-Boot) separately, so I never
really looked into that one.

> 3) There is normally an AT91bootstrap in the "Softpack's". This is
> different from (1) and (2). It supports the 25xx series SPI flash
> but relies on libraries not normally available in arm-linux
> compilers so you may have to compile it using arm-newlib, IAR or
> Keil.

That's interesting. I'll keep that in mind.

>>> Reason is the SAM-BA S/W which only knows how to erase the complete
>>> NAND flash.
>>
>> Oh, I fixed that ages ago.
>>
>> I added a few lines of code to the nand-flash, data-flash, and
>> serial-flash applets so that they can all erase a region of flash.
>
> Nice, how about sharing!

Sure. The changes to the applets can certainly be shared. I'll have
to check with management regarding my sam-ba client replacement. I
just double-checked, and the erase-region command has been added to
the nandflash and dataflash applets, but it never got added to the
serialflash (AT25xx) applet.

>>> If you plan to program the NAND flash using another method, then of
>>> course, use the NAND flash for everything except bootstrap.
>>
>> We'll initially use "SAM-BA" replacement program to program
>> prototypes. Then for production, the plan is to have the distributor
>> ship them with U-Boot preprogrammed so that we can use the TFTP
>> server in U-Boot to do the rest.
>
> Or, if you have an SD-Connector, you can boot from an SD-Card/
> external SPI flash which programs the internal flash.

That's also an option, but since we'll have to connect an Ethernet
cable anyway as part of the normal production test process, we want to
use Ethernet as the programming interface as well.

--
Grant Edwards grant.b.edwards Yow! Is a tattoo real, like
at a curb or a battleship?
gmail.com Or are we suffering in
Safeway?

Marc Jet

Sep 21, 2010, 1:35:50 PM
> State of the art seems to be to use magic numbers for valid data, and
> destroy the ECC and/or magic numbers for blocks that are gone bad, so
> you can identify them later. That's the "bad block flag".

This seems to be "industry standard" from my experience as well. But
IMHO it's not a good solution to the problem.

Typical NAND datasheets do not specify the behaviour of bad blocks.
The approach you mention relies on certain behaviour from the bad
blocks (e.g. ability to erase or overwrite). This is why I think it
is a bad approach.

Another approach is the following:

The chip is partitioned into data blocks and spare blocks. During
mount, all block headers are scanned in a specific order, e.g.
ascending order for data blocks, and descending order for spare
blocks.

Every data block contains a header which contains its physical block
number and a (cryptographical) hash signature. Blocks without valid
hash signature are considered bad or stale (e.g. powerfail during
erase). In the first pass, every data block that passes this test is
considered valid - until a spare block overrides it.

The spare block header contains its own physical block number, and the
physical data block number of the block it replaces, and a hash
signature as well. If a spare block exists for a data block, the data
block is degraded to "bad". No matter what the data block content has
claimed to be. Likewise if another valid spare block refers to the
same data block, it overrides the previously read spare block (thus
the block scanning order). After all, what we arbitrarily designated
to be "spare" blocks could be bad blocks too.

This method is able to memorize any combination of up to N bad blocks,
no matter what the bad block behaviour is. Up to the collision
resistance of the hash algorithm, of course. You can achieve any
desired reliability by choosing the hash algorithm accordingly.

The key point to understand is that the bad block information should
be stored in the good blocks, not in the bad ones. The good blocks
are the ones that have their behaviour specified.

Paul

Sep 21, 2010, 3:26:52 PM
In article <i7aj23$1qo$1...@reader1.panix.com>, inv...@invalid.invalid
says...

>
> On 2010-09-20, Ulf Samuelsson <nospa...@atmel.com> wrote:
> > 2010-09-20 20:10, Grant Edwards skrev:
>
> >> While the ROM bootloader supports the 25xx series "dataflash" parts we
> >> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
> >> -- at least not that I could ever find. I asked about it on the AT91
> >> forum a few months back and got the usual response (IOW, none at all).
> >
> > It is hidden in a disused lavatory in the cellar marked:
> > "Beware of the Leopard".
>
> Ah! I just listened to Stephen Fry's auidiobook of THHGTTG last
> weekend while driving home from Chicago.

So you will also have discovered about

"There is no UP for rain to fall from, therefore rainfall of the
universe is none"

Let alone no sex...

Says he the geek with CDs of the original Radio series..

--
Paul Carpenter | pa...@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate

rickman

Sep 22, 2010, 12:09:06 PM
On Sep 19, 10:06 am, David Brown

That's fine. But my point is that if the block is "bad", either a
bad block flag can be set or the ECC value will be invalid when
the media is read. In either case you can flag it in your access
system (don't want to call it a file system) and not use that block
again until the next reboot. This only has a performance impact at
boot time. You don't have to *rely* on a bad block flag since that
can also be faulty. But it can be used in addition to detecting an
ECC error.

Rick

rickman

Sep 22, 2010, 12:22:00 PM
On Sep 20, 1:01 pm, Stefan Reuther <stefan.n...@arcor.de> wrote:
> [I haven't got rickman's post.]
>
>
>
> David Brown wrote:
> > On 19/09/2010 05:09, rickman wrote:
>
> >> Why do you need a bad block flag?  If the block has an ECC failure, it
> >> is bad and the OS will note that.  You may have to read the block ECC
> >> the first time it fails, but after that it can be noted in the file
> >> system as not part of a file and not part of free space on the drive.
>
> How do you mark it "in the file system" if your file system is actually
> inside the NAND flash? Thought experiment: your bad block table is
> stored in a particular block. That block goes bad. Where do you mark
> that this block is now bad?

Sure, if that is your file system, then it doesn't work very well for
NAND flash, does it? The bad sector 0 problem is one that hard drives
have to this day, don't they? Or maybe the internal controller can
remap that "invisibly" now that there are tons of embedded smarts in
them. But that is the point. If your device can't "fix" a bad block
in the lowest level of the file system on the drive, then it is
subject to failure. If on boot, the software does what it has to do
to recover the structure of the file system, then it will be robust.


> State of the art seems to be to use magic numbers for valid data, and
> destroy the ECC and/or magic numbers for blocks that are gone bad, so
> you can identify them later. That's the "bad block flag".

Yes and once you find a "bad block" the "access system" (I really
shouldn't call it a file system since you might not be working at that
level) will have to remember this block in memory, not on the drive.
Each time the system is booted, it will have to either read a valid
bad block table, or construct its own. I suppose that each time the
device needs a new block to write data it could search for a working
block. That would be a very primitive system as well as slow, but it
would work and would not require a bad block table.

BTW, I assume that in order to trust a block on a NAND drive each
write would need to be verified in some manner. Is that also included
in a NAND access system?


> > Failures can be intermittent - a partially failed bit could be read
> > correctly or incorrectly depending on the data stored, the temperature,
> > or the voltage.  So if you see that you are getting failures, you make a
> > note of them and don't use that block again.
>
> From what I've seen, those temporary failed bits are still within the
> specs of the NAND flash as long as you're running the part within specs.
> However, when you're way out of spec (say, 30°C over limit), all hell
> breaks loose.

Not sure what you mean by "within the specs". Are you saying the spec
allows some level of intermittent failure on reads and/or writes? If
so, there is still some level of intermittent that would be outside
the spec and needs to be flagged as bad.

Rick

David Brown

Sep 22, 2010, 3:24:23 PM

You can track bad blocks in all sorts of different ways. Some will
involve more work when the bad block is discovered, others will involve
more checking before using a block. But any file system, or "access
system" if you like, has to have some way of tracking whether a block is
in use or not. If you think of bad blocks as being in use in a special
file that can't be accessed normally, then you have got simple and
efficient bad block tracking (at least, it's as simple and efficient as
the rest of your file system).

Ulf Samuelsson

Sep 23, 2010, 3:00:27 AM

There is a new bootstrap in the works which will merge
everything. The business unit liked my idea of using
Kconfig, so I think they used my version as the base,
and they have extended it with all the new features
that Atmel wants supported like SPI flash and SD-Card boot.

>>>> Reason is the SAM-BA S/W which only knows how to erase the complete
>>>> NAND flash.
>>>
>>> Oh, I fixed that ages ago.
>>>
>>> I added a few lines of code to the nand-flash, data-flash, and
>>> serial-flash applets so that they can all erase a region of flash.
>>
>> Nice, how about sharing!
>
> Sure. The changes to the applets can certainly be shared. I'll have
> to check with management regarding my sam-ba client replacement. I
> just double-checked, and the erase-region command has been added to
> the nandflash and dataflash applets, but it never got added to the
> serialflash (AT25xx) applet.
>
>>>> If you plan to program the NAND flash using another method, then of
>>>> course, use the NAND flash for everything except bootstrap.
>>>
>>> We'll initially use "SAM-BA" replacement program to program
>>> prototypes. Then for production, the plan is to have the distributor
>>> ship them with U-Boot preprogrammed so that we can use the TFTP
>>> server in U-Boot to do the rest.
>>
>> Or, if you have an SD-Connector, you can boot from an SD-Card/
>> external SPI flash which programs the internal flash.
>
> That's also an option, but since we'll have to connect an Ethernet
> cable anyway as part of the normal production test process, we want to
> use Ethernet as the programming interface as well.
>

You can't use Ethernet until you have programmed the board the first
time. There is no Ethernet support in the BootROM.

It will be much easier to insert an SD-Card & reset the board.
The code loaded from the SD-Card can be used to download
stuff over ethernet.

You can put images on a webserver that can be downloaded
to an SD-Card by end customers.

Grant Edwards

Sep 23, 2010, 10:25:16 AM
On 2010-09-23, Ulf Samuelsson <nospa...@atmel.com> wrote:
> 2010-09-21 17:30, Grant Edwards skrev:

>>>> Then for production, the plan is to have the distributor ship them
>>>> with U-Boot preprogrammed so that we can use the TFTP server in
>>>> U-Boot to do the rest.
>>>
>>> Or, if you have an SD-Connector, you can boot from an SD-Card/
>>> external SPI flash which programs the internal flash.
>>
>> That's also an option, but since we'll have to connect an Ethernet
>> cable anyway as part of the normal production test process, we want
>> to use Ethernet as the programming interface as well.
>
> You can't use Ethernet, until you have programmed the board the first
> time. There is no Ethernet support in the BootROM.

Right. As I said, we will have the NAND flash distributor pre-program
the parts with the bootstrap and U-Boot. It allows our production
procedure to do everything via Ethernet.

> It will be much easier to insert an SD-Card & reset the board.

Not really. Even if our product had an externally accessible SD-Card
socket, it's simpler to use Ethernet for everything.

> The code loaded from the SD-Card can be used to download
> stuff over ethernet.
>
> You can put images on a webserver that can be downloaded
> to an SD-Card by end customers.

Our products almost never have access to the Internet.

--
Grant Edwards grant.b.edwards Yow! PIZZA!!
at
gmail.com

Ulf Samuelsson

Sep 23, 2010, 1:02:43 PM
2010-09-23 16:25, Grant Edwards skrev:
> On 2010-09-23, Ulf Samuelsson<nospa...@atmel.com> wrote:
>> 2010-09-21 17:30, Grant Edwards skrev:
>
>>>>> Then for production, the plan is to have the distributor ship them
>>>>> with U-Boot preprogrammed so that we can use the TFTP server in
>>>>> U-Boot to do the rest.
>>>>
>>>> Or, if you have an SD-Connector, you can boot from an SD-Card/
>>>> external SPI flash which programs the internal flash.
>>>
>>> That's also an option, but since we'll have to connect an Ethernet
>>> cable anyway as part of the normal production test process, we want
>>> to use Ethernet as the programming interface as well.
>>
>> You can't use Ethernet, until you have programmed the board the first
>> time. There is no Ethernet support in the BootROM.
>
> Right. As I said, we will have the NAND flash distributor pre-program
> the parts with the bootstrap and U-Boot. It allows our production
> procedure to do everything via Ethernet.
>

U-Boot needs to be set up to connect to a host, but maybe DHCP
can resolve that.

One issue with this method is if your image is larger than the
onboard SDRAM/DDR-II.

It is more difficult to download and program a partial file.
Probably you have to split the file into several files on the host.
This may be problematic with advanced file systems like JFFS2.

The SD-Card approach allows you to program large amounts of flash.
You have the same problem with limited RAM space, but you
can at least run an application which reads the SD-Card
and writes to the flash.


>> It will be much easier to insert an SD-Card & reset the board.


>
> Not really. Even if our product had an externally accessible SD-Card
> socket, it's simpler to use Ethernet for everything.

If you don't have an SD-Card socket, then that is a problem of course.

Some people use the primary SPI boot option for an ISP connection
to external H/W, and the secondary SPI boot option is used
for onboard dataflash.


>
>> The code loaded from the SD-Card can be used to download
>> stuff over ethernet.
>>
>> You can put images on a webserver that can be downloaded
>> to an SD-Card by end customers.
>
> Our products almost never have access to the Internet.
>

The intention was that you can publish a support pack on the internet.
A customer then manually downloads the support pack files and stores
them on an SD-Card. By inserting the SD-Card into the system,
it is upgraded.
Depending on boot order, you may have to erase the first sector
of any flash containing a bootable image.

Grant Edwards

Sep 23, 2010, 2:38:33 PM
On 2010-09-23, Ulf Samuelsson <nospa...@atmel.com> wrote:
> 2010-09-23 16:25, Grant Edwards skrev:
>> On 2010-09-23, Ulf Samuelsson<nospa...@atmel.com> wrote:
>>> 2010-09-21 17:30, Grant Edwards skrev:
>>
>>>>>> Then for production, the plan is to have the distributor ship them
>>>>>> with U-Boot preprogrammed so that we can use the TFTP server in
>>>>>> U-Boot to do the rest.
>>>>>
>>>>> Or, if you have an SD-Connector, you can boot from an SD-Card/
>>>>> external SPI flash which programs the internal flash.
>>>>
>>>> That's also an option, but since we'll have to connect an Ethernet
>>>> cable anyway as part of the normal production test process, we want
>>>> to use Ethernet as the programming interface as well.
>>>
>>> You can't use Ethernet, until you have programmed the board the first
>>> time. There is no Ethernet support in the BootROM.
>>
>> Right. As I said, we will have the NAND flash distributor
>> pre-program the parts with the bootstrap and U-Boot. It allows our
>> production procedure to do everything via Ethernet.
>
> U-Boot needs to be setup to connect to a host, but maybe DHCP can
> resolve that.

As you imply, a TFTP _client_ in U-Boot is less than optimal for this
usage.

My U-Boot has a TFTP server. U-Boot doesn't have to be configured to
connect to anything. It just has to be configured to listen for TFTP
commands for a second or two before it attempts to boot a Linux image.
I usually also configure it to listen indefinitely for TFTP commands
in the case where the Linux image is missing or corrupt.

Despite Hr. Denx's strenuous assertions that a TFTP server is useless
and wrong and unwanted in U-Boot, I find it to be very useful indeed.

> One issue with this method is if your image is large than the onboard
> SDRAM/DDR-II.

The TFTP server doesn't write the image to RAM. It writes it to flash
(if that's what it's been told to do by the TFTP client).

Of course the received data has to be buffered in at least
write-page-sized chunks, but that's usually 4K or less for all the
NAND flashes I've seen.

I actually found it more convenient to buffer erase-block-size chunks
(128KB in my case), but that's still no problem on most ARM9 platforms
running U-Boot.

> It is more difficult to download and program a partial file.

Indeed it would be. That's why I don't. :)

--
Grant Edwards grant.b.edwards Yow! Hello, GORRY-O!!
at I'm a GENIUS from HARVARD!!
gmail.com
