A bit resistant to disruption

Francois Grieu

unread,

Feb 7, 2010, 12:51:25 PM2/7/10

to

/*
Disclaimer: This is a hard problem, only minimally disguised into
something on-topic for comp.lang.c

You are programming a C99 conforming freestanding implementation.
Disruption (reset or power off/on cycle) can occur at *ANY* time without
advance warning, and stops whatever operation is underway. The program
is re-executed from main() after a disruption.

The objective is to implement a single bit resistant to disruption,
accessed using two procedures to be written:
int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.

These two properties shall be met:
a) The bit is stable in the absence of Toggle. That is, the result given
by any two undisrupted bRead() is the same unless bToggle() was invoked
in-between.
b) The bit is changed by Toggle. That is, the result given by any two
undisrupted bRead() is different if bToggle() was invoked exactly once
in-between and no disruption occurred during that execution of bToggle().

The only way to store information across disruptions is using a number
of independent physical EEPROM cells, designated by index j. Access to
the EEPROM cells is by the following three functions. The properties
stated for theses functions apply for any fixed non-negative integer j
less than the number of cells, and "for that cell" is implied everywhere.

// Read a designated EEPROM cell; always returns 0 or 1.
extern int eRead(int j);

// Erase a designated EEPROM cell.
// After undisrupted eErase, eRead returns 0 until eProgram is invoked.
extern void eErase(int j);

// Program a designated *ERASED* EEPROM cell.
// After undisrupted eProgram, eRead returns 1 until eErase is invoked.
// Programming a cell that is not erased cause undefined behavior.
extern void eProgram(int j);

An EEPROM cell can be left neither erased nor programmed by a disrupted
eErase unless that cell was already erased, or by a disrupted eProgram.
For a cell in such halfway state, eRead returns a value specified only
to be 0 or 1. That otherwise unspecified value may in particular change
after disruption (due e.g. to uncontrollable random noise and/or
temperature and/or voltage drift in the hardware used by eRead), even
though the sate of cells is unchanged by disruption or/and eRead.

Note: if a cell is not erased (left in a halfway state or programmed),
an eProgram of that cell is prohibited. A workaround is to use the
following rather than eProgram():
void mySafeProgram(int j)
{
eErase(j);
eProgram(j);
}

Before the first run, all EEPROM cells have been erased. eRead, eErase,
and eProgram terminate unless a disruption occurs, assuming constraints
on j and eProgram are met. bRead and bToggle must similarly terminate
unless a disruption occurs.

Can it be done? If yes, what is the minimum number of cells needed? Give
a proof.

Depending on the answer, try to minimize the expected failure rate or
maximize the expected LED flash rate in the following use case:
*/

#define NCELLS 10 // number of cells

extern void sLed(int x); // 0: LED on green then off
// 1: LED on red then off.
extern int eRead(int j); // Read a cell.
extern void eErase(int j); // Erase a cell to 0.
extern void eProgram(int j); // Program an erased cell to 1.

int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.

// Declarations and code for bRead/bToggle go here

int main(void)
{
int vState;
vState = bRead();
while(1)
{
sLed(vState);
bToggle();
vState = !vState;
}
return 0;
}

/*
Assume eProgram, eErase, sLed each last 1 second; everything else is
comparatively instant; a disruption occurs with odds p each second;
unspecified eRead values are assigned at random with mean 0.5; the LED
is off following disruption; a failure is a violation of the
requirement: if a disruption occurs with the LED on in some color, then
the first time the program lights a LED it must be with that color.

Notice that disruption without warning is the standard under which most
real-life hardware actually operates (with the exception of systems with
physical protection against adversarial reset and a power reserve with
means to sense imminent power loss), even though this is often left out
of the specification. And that real disruption-surviving media (EEPROM,
Flash, CDRW, though not battery-backed SRAM) do have the limitation that
read after interrupted change of state sometime gives an unstable
result. While "programming a cell that is not erased cause undefined
behavior" is not the rule, it can exist due to a poorly specified API,
an overcautious hardware specification, or in some cases due to tue
hardware limitations.

The problem is derived from actual requirements in a Smart Card. This is
neither homework, nor an attempt to gather brainpower for free: I'm the
author of the problem and have conclusively determined the feasibility,
with a concise proof. A similar problem "A bit in bottles" is in
rec.puzzles, some of the replies contain answers, and I committed mine.
http://groups.google.com/group/rec.puzzles/browse_thread/thread/ab892b3cf88af0de

Francois Grieu
*/

Message has been deleted

Kaz Kylheku

unread,

Feb 7, 2010, 1:43:01 PM2/7/10

to

On 2010-02-07, Francois Grieu <fgr...@gmail.com> wrote:
> The objective is to implement a single bit resistant to disruption,
> accessed using two procedures to be written:
> int bRead(void); // Return 0 or 1 according to bit state.
> void bToggle(void); // Change the state of the bit.

Note that the power can go out at any time while bToggle
is executing, creating an obvious ambiguity: did this outage occur
before or after the bit was toggled?

What if the power goes out while the the expression bToggle() is
evaluating, but before the call has taken place? What if the power
outage occurs just after bToggle has been called, and is returning?

In the event of a system disruption such as a power loss, the best that
we can ensure is that the data is /consistent/. There is no way to
ensure that the data has the absolutely most recent value.

For instance, a database transaction to update some records either
did happen or did not happen; the database is not in some in-between
inconsistent or corrupt state.

A single bit cannot ever be inconsistent!

Francois Grieu

unread,

Feb 7, 2010, 2:25:13 PM2/7/10

to

Stefan Ram a �crit :

> Francois Grieu <fgr...@gmail.com> writes:
>> The objective is to implement a single bit resistant to disruption,
>> accessed using two procedures to be written:
>> int bRead(void); // Return 0 or 1 according to bit state.
>> void bToggle(void); // Change the state of the bit.
>

> This is a complete sentence defining the �objective�, which
> actually does not allow for any further requirements to be
> listed later. Otherwise it would need to include a clause such as:
>
> �... to implement a single bit according to requirements
> given below.�.

These requirements are properties a) b). I stand corrected.

> For example, if I would write �The objective /is/ to
> calculate the sum of 2 and 3.� The structure of this
> sentence means the objective is defined once and for all.
> I can not add any requirements (�propertier�) to be met later.
> Otherwise, I would need to write �One /part/ of the objective
> is to calculate the sum of 2 and 3.� or �The objective is to
> calculate the sum of 2 and 3 fulfilling the following properties.�
> or so.
>
> But the sentence cannot be understood as it is written, because
> what it means for a bit to be �resistant to disruption� is unknown.
> (At least to me, that is.)

>
>> These two properties shall be met:
>> a) The bit is stable in the absence of Toggle. That is, the result given
>> by any two undisrupted bRead() is the same unless bToggle() was invoked
>> in-between.
>> b) The bit is changed by Toggle. That is, the result given by any two
>> undisrupted bRead() is different if bToggle() was invoked exactly once
>> in-between and no disruption occurred during that execution of bToggle().

> ��
> I thought the hard case was a disruption during the execution of bToggle().

No. If there is a disruption during bToggle, what the next undisrupted
bRead() returns is left unspecified.

> If I can assume that it is not being interrupted, why can't I then
> implement bToggle in the most obvious way?

Implement bToggle as you wish. The difficult requirement is a), given
that eRead may return unstable data.

Francois Grieu

Message has been deleted

Andrew Poelstra

unread,

Feb 7, 2010, 2:44:54 PM2/7/10

to

On 2010-02-07, Francois Grieu <fgr...@gmail.com> wrote:
>

> The objective is to implement a single bit resistant to disruption,
>

As has been said, a single bit cannot be "inconsistent" in the
sense that a database or complex structure might be. If you want
it to hold across power failures, this is likely a hardware issue.

bit ^= 1;

is likely to be a single instruction on your hardware so if you
#define Toggle() as such, there's nothing more that can be done
via software.

Francois Grieu

unread,

Feb 7, 2010, 3:12:21 PM2/7/10

to

Kaz Kylheku wrote:
> On 2010-02-07, Francois Grieu <fgr...@gmail.com> wrote:
>> The objective is to implement a single bit resistant to disruption,
>> accessed using two procedures to be written:
>> int bRead(void); // Return 0 or 1 according to bit state.
>> void bToggle(void); // Change the state of the bit.
>
> Note that the power can go out at any time while bToggle
> is executing, creating an obvious ambiguity: did this outage occur
> before or after the bit was toggled?

This is why after bToggle() is disrupted, the result of the next
undisrupted bRead() is left unspecified.

> What if the power goes out while the the expression bToggle() is
> evaluating, but before the call has taken place? What if the power
> outage occurs just after bToggle has been called, and is returning?

Count that as disrupted bToggle().

If you want a "falsifiable" test, consider the use case given. Any
correct bRead/bToggle pass this test. Any bRead/bToggle that pass this
test can be turned into a correct bRead/bToggle2 by having bToggle2
invoke bRead and discard the result then invoke bToggle.

> In the event of a system disruption such as a power loss, the best that
> we can ensure is that the data is /consistent/. There is no way to
> ensure that the data has the absolutely most recent value.

Agreed.

> For instance, a database transaction to update some records either
> did happen or did not happen; the database is not in some in-between
> inconsistent or corrupt state.

Agreed.

> A single bit cannot ever be inconsistent!

With actual physical EEPROM or Flash, after a change of a cell has been
disrupted, two readings without a change are not guaranteed to return
the same value. This is because a cell is read by comparing the charge
in the cell to a reference value, with some uncertainty, and that charge
only changes slowly (in the order of a millisecond, and a power loss, at
least a deliberate one as in short circuit, occurs faster than that).
Note this is one area where SRAM + battery backup is better (positive
feedback between the two sides of a SRAM cell makes it inherently
stable, or at least makes metastability plain unobservable). All this
stuff is very often overlooked, and gives rise to all kind of subtle and
hard to track issues, especially when temperature changes and when
disruption is frequent or/and adversarial (which must be assumed in
Smart Cards, especially when remotely powered).

One usual solution is that after reading a bit as erased, you erase it
again; and after reading it as programmed, you program it again
(positive feedback with software); so that next time you read, you'll
get a stable value. A relatively easy (I would not say trivial) tweak
using a second bit avoids premature wear out after things have
stabilized. You then have a "bit" in the usual sense and can use it to
perform e.g. atomic database transaction using classical algorithms.

But in the present problem things are made complicated by prohibition of
over-program of cells.

Francois Grieu

unread,

Feb 7, 2010, 3:17:42 PM2/7/10

to

Stefan Ram a �crit :
> Francois Grieu <fgr...@gmail.com> writes:
>

> Given:
>
>> extern int eRead(int j);
>> extern void eErase(int j);
>> extern void eProgram(int j);
>> void mySafeProgram(int j)
>
> Wanted:

>
>> the result given by any two undisrupted bRead() is the same
>> unless bToggle() was invoked in-between.
>

> So, what about the following implementation?
>
> int bRead( int const j )
> { int const value = eRead( j );
> if( value )mySafeProgram( j ); else eErase( j );
> return value; }
>

Assume bRead() just has returned 1.
A disruption occurs.
bRead() is called again.
A disruption occurs during that bRead(), while it is doing
mySafeProgram(), sometime during eErase() or eProgram(); say, in between
to simplify.
bRead() is called again without disruption, and returns 0.

Property a) is violated, regardless of how you implement bToggle, which
you did not specify BTW.

Francois Grieu

unread,

Feb 7, 2010, 3:24:31 PM2/7/10

to

Andrew Poelstra wrote :

> As has been said, a single bit cannot be "inconsistent" in the
> sense that a database or complex structure might be.

A bit, agreed. The state of a cell, disagreed. See my answer in the
sub-thread that you mention.

If you want a useful model of an EEPROM cell and don't want to dive into
how hardware works, think of a bottle that can only be emptied or filled
slowly, and is read by grossly weighting it on a scale giving only
light/heavy reading. This is how classic EEPROM and Flash work
(multilevel Flash store e.g. 2 bits per cell with a scale having 4
readings instead of 2, we don't consider that).

(snip)
Francois Grieu

Andrew Poelstra

unread,

Feb 7, 2010, 4:15:53 PM2/7/10

to

On 2010-02-07, Francois Grieu <fgr...@gmail.com> wrote:

I know how Flash and EEPROM work, on a basic level - which is why
I think you need a hardware solution. You can buffer and checksum
to validate the data as best you can, but in the end if you lose
power in the middle of a flash write you may very well be screwed.

Message has been deleted

Francois Grieu

unread,

Feb 7, 2010, 4:28:32 PM2/7/10

to

I count that as one "I think the answer is: impossible". All the persons
that I know have solved the problem (that's three including me) seem to
have changed their mind at least once.

Francois Grieu

unread,

Feb 7, 2010, 4:34:49 PM2/7/10

to

Stefan Ram wrote:

> Francois Grieu <fgr...@gmail.com> writes:
>> A disruption occurs during that bRead(), while it is doing
>

> I do not see how to implement a solution, but by storing
> an error detecting code into more than one bit for each
> of the values for ON or OFF one could at least /detect/ an
> indefinite bit with a certain probability.

Now we are moving into the right direction!

Hints: try to prove impossibility. Any hole in your proof may lead to a
solution. That, or prove a procedure, any hole...

Francois Grieu

Moi

unread,

Feb 7, 2010, 4:46:35 PM2/7/10

to

First shot: two hits in my cerebral hash-table
1) alternating bit-protocol (difficult with only one bit at a time ...)
2) Leslie Lamport's reading <--> writing in opposite directions.

NB, I like the problem (though it is not strictly c.l.c stuff ;-)

AvK

Francois Grieu

unread,

Feb 8, 2010, 1:26:45 AM2/8/10

to

Moi wrote :

>> Can it be done? If yes, what is the minimum number of cells needed? Give
>> a proof.
>
> First shot: two hits in my cerebral hash-table
> 1) alternating bit-protocol (difficult with only one bit at a time ...)

Nice. However in the present problem there are no errors, only those
pesky disruptions.

> 2) Leslie Lamport's reading <--> writing in opposite directions.

When I wrote "hard problem", I did not mean quite that hard...

> NB, I like the problem (though it is not strictly c.l.c stuff ;-)

At least "freestanding implementation" is on topic and exactly describes
the context: digital computer, no error, single process, all variables
lost on disruption, no standard library for persistent storage. And
c.l.c may be one of the best place to reach programmers facing that
issue (minus perhaps the prohibition of over-program, which is what
makes the problem challenging).

Francois Grieu

Moi

unread,

Feb 8, 2010, 5:21:56 PM2/8/10

to

On Mon, 08 Feb 2010 07:26:45 +0100, Francois Grieu wrote:

> Moi wrote :
>
>>> Can it be done? If yes, what is the minimum number of cells needed?
>>> Give a proof.
>>
>> First shot: two hits in my cerebral hash-table 1) alternating
>> bit-protocol (difficult with only one bit at a time ...)
>
> Nice. However in the present problem there are no errors, only those
> pesky disruptions.
>
>> 2) Leslie Lamport's reading <--> writing in opposite directions.
>
> When I wrote "hard problem", I did not mean quite that hard...

I think that your failed eeprom_reset_bit() and eeprom_set_bit()
operations are more or less equivalent to Lamport's "sending messages to another process".
Messages can be lost in transit, your eeprom-operations can fail
because of a crash.
The only difference is that in the eeprom-case there is a sort of tri-state
logic; in the case of lost messages the domain stays restricted
to just two cases.

If I understood the problem correct:
1) the machine can be expected to crash *any moment*, so even while "in recovery mode"
2) an undefined cell (either rewritten or partially cleared) produces a random
value; subsequent reads may yield different values.

My current approach uses a state machine with a kind of graycodes
(maybe necklaces: http://www.theory.csc.uvic.ca/~cos/inf/neck/NecklaceInfo.html ; I am not
sure yet how this should be named)
The guiding idea is that in any state there is a maximal number of
bits that could be undefined (tri-state): the bits that could have been affected
by the previous or next state in the graph.
While recovering this number is two, in "normal operation" it may perhaps be just one.

My guess is I need three or four bits / bottles.
... still working on it ...

AvK

Francois Grieu

unread,

Feb 9, 2010, 2:15:20 AM2/9/10

to

Moi wrote :

> On Mon, 08 Feb 2010 07:26:45 +0100, Francois Grieu wrote:
>
>> Moi wrote :
>>
>>>> Can it be done? If yes, what is the minimum number of cells needed?
>>>> Give a proof.
>>> First shot: two hits in my cerebral hash-table 1) alternating
>>> bit-protocol (difficult with only one bit at a time ...)
>> Nice. However in the present problem there are no errors, only those
>> pesky disruptions.
>>
>>> 2) Leslie Lamport's reading <--> writing in opposite directions.
>> When I wrote "hard problem", I did not mean quite that hard...
>
> I think that your failed eeprom_reset_bit() and eeprom_set_bit()
> operations are more or less equivalent to Lamport's "sending messages to another process".
> Messages can be lost in transit, your eeprom-operations can fail
> because of a crash.
> The only difference is that in the eeprom-case there is a sort of tri-state
> logic; in the case of lost messages the domain stays restricted
> to just two cases.

Yes I see your point. One redeeming difference is that things are
strictly sequential / singe-process in my case.

> If I understood the problem correct:
> 1) the machine can be expected to crash *any moment*, so even while "in recovery mode"

Yes. Notice this is very real-world.

> 2) an undefined cell (either rewritten or partially cleared) produces a random
> value; subsequent reads may yield different values.

Yes, with one exception: if a cell/bottle was erased/empty, and you
erase/empty it again while a disruption occurs, it remains erased/empty.
The makers of EEPROM will reluctantly accept to give this insurance,
which is self-evident with bottles. There is no equivalent in the other
direction, for you can't program/fill a cell/bottle unless it is fully
erased/empty [or so insisted the makers of my EEPROM].

> My current approach uses a state machine with a kind of graycodes
> (maybe necklaces:http://www.theory.csc.uvic.ca/~cos/inf/neck/NecklaceInfo.html ; I am not
> sure yet how this should be named)
> The guiding idea is that in any state there is a maximal number of
> bits that could be undefined (tri-state): the bits that could have been affected
> by the previous or next state in the graph.
> While recovering this number is two, in "normal operation" it may perhaps be just one.
>
> My guess is I need three or four bits / bottles.
> ... still working on it ...

You are the first poster with a proposition that I won't contradict, and
are so far untouched by the "at least one public change of mind" curse!

> AvK

Francois Grieu

Seebs

unread,

Feb 9, 2010, 2:42:02 AM2/9/10

to

On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
> You are the first poster with a proposition that I won't contradict, and
> are so far untouched by the "at least one public change of mind" curse!

The tricky part, so far as I can tell, is that we can't tell what a
partially-written bit might look like. On the other hand, in theory,
only one write can be interrupted, as I understand it.

Hmm.

Okay, here's an approach. Imagine without loss of generality that our
goal is to toggle the bit in cell 0, and that we have additional bits to
toggle. The bit is "T".

Let's see. Okay, when toggle() is called, we'll start by clearing a working
area. So we have some unspecified number of bits which is our working area,
call them W[0] ... W[N].

The first thing we need to do is be sure that we can recover the previous
value in the event of a disruption. So, necessarily, we need a bit to hold
the known-previous-value. Let's call that W0. But how do we know whether
W0 holds a bit we care about? We need a flag for whether W0 has been written.
Let's call that W1. So, read() should do something like:
if (W1)
return W0
else
return T

What about toggle()? If W1 is set, then it was interrupted during an
operation. It should try to extract the value from W0 and push it back
into T. Once it has done that, it can clear W1 (because it no longer
needs to be able to remember the value in W0), then W0. That reduces us
to the default state, which is that T holds the current bit and we wish
to toggle it.

Given that, we then:
* Copy T into W0
* Set W1
* Erase T
* if T was previously 0, program T
* erase W1
* erase W0

The rationale of this ordering is that, at every time:
* if W1 is set, W0 holds the correct value of T
* if W1 is unset, T is known to be correct

If we are interrupted while setting W1, we were interrupted before we modified
T, so that's not a problem -- T is still at its correct old value. If we
are interrupted while setting W0, we were interrupted before we set W1, so
that's not a problem -- T is still at its correct old value. If we are
interrupted during any of the erasing or programming of T, that's not a
problem -- W0 and W1 are set, so we'll pick up the right value next time
we come in. If we are interrupted during the erase of W1, that's fine. If
it gets erased, it's clear and T has the right value. If it doesn't get
erased, it's still set, and W0 still holds the correct "previous value",
so we'll be in one of the trustworthy intermediate states.

So I think that's it.

So!

// Read a designated EEPROM cell; always returns 0 or 1.
extern int eRead(int j);
// Erase a designated EEPROM cell.
// After undisrupted eErase, eRead returns 0 until eProgram is invoked.
extern void eErase(int j);
// Program a designated *ERASED* EEPROM cell.
// After undisrupted eProgram, eRead returns 1 until eErase is invoked.
// Programming a cell that is not erased cause undefined behavior.
extern void eProgram(int j);

/* default state:
* bit 0 holds the current Durable Bit
* bit 1 is undefined
* bit 2 is clear
*
* during a toggle:
* bit 2 is set
* bit 1 holds the current Durable Bit
* bit 0 is undefined.
*
*/

int
bRead(void) {
if (eRead(2)) {
return eRead(1);
} else {
return eRead(0);
}
}

void
bToggle(void) {
int t;
if (eRead(2)) {
t = eRead(1);
eErase(0);
if (t)
eProgram(0);
eErase(2);
}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0);
if (!t)
eProgram(0);
eErase(2);
}

I realized on consideration that I don't need to finish this with
eErase(1), because bit 1 is undefined. Thus, if eErase(2) completes
at the end of bToggle(), the toggle is complete and bit 0 holds the
newly toggled bit. If it doesn't complete, 1 still holds the right
value.

Note that there exists at least one case (previous value was zero, and
a disruption occurred) where 0 is erased twice without having been
programmed.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Francois Grieu

unread,

Feb 9, 2010, 3:11:01 AM2/9/10

to

Seebs wrote:
> On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>> You are the first poster with a proposition that I won't contradict, and
>> are so far untouched by the "at least one public change of mind" curse!
>
> The tricky part, so far as I can tell, is that we can't tell what a
> partially-written bit might look like. On the other hand, in theory,
> only one write can be interrupted, as I understand it.

Only one write can be interrupted. But a series of disruptions can lead
to multiple partially-written cells. See below.

> eProgram(0); // [B]
> eErase(2); // [A]

> }
> t = eRead(0);
> eErase(1);
> if (t)
> eProgram(1);
> eProgram(2);
> eErase(0);
> if (!t)
> eProgram(0);
> eErase(2);
> }
>
> I realized on consideration that I don't need to finish this with
> eErase(1), because bit 1 is undefined. Thus, if eErase(2) completes
> at the end of bToggle(), the toggle is complete and bit 0 holds the
> newly toggled bit. If it doesn't complete, 1 still holds the right
> value.
>
> Note that there exists at least one case (previous value was zero, and
> a disruption occurred) where 0 is erased twice without having been
> programmed.

This one is incorrect. Assume a disruption where I inserted [A]; now
cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
and a disruption occurs at [B]; now cells 0 and 2 are uncertain, and
bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
read as 0 and cell 0 is read as 1).

Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

Francois Grieu

Seebs

unread,

Feb 9, 2010, 3:16:59 AM2/9/10

to

On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>> void
>> bToggle(void) {
>> int t;
>> if (eRead(2)) {
>> t = eRead(1);
>> eErase(0);
>> if (t)
>> eProgram(0); // [B]
>> eErase(2); // [A]
>> }
>> t = eRead(0);
>> eErase(1);
>> if (t)
>> eProgram(1);
>> eProgram(2);
>> eErase(0);
>> if (!t)
>> eProgram(0);
>> eErase(2);
>> }

> This one is incorrect. Assume a disruption where I inserted [A]; now

> cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
> and a disruption occurs at [B]; now cells 0 and 2 are uncertain, and
> bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
> read as 0 and cell 0 is read as 1).

Okay, I misunderstood the spec. I thought that an erase necessarily
either worked or failed to work, and couldn't leave a cell indeterminate.
Re-reading the spec, I see that isn't correct. That does make this a
lot trickier.

> Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

True. I'm not sure how it matters, though; a hypothetical user can just run
bToggle() over and over without ever hitting bRead(), so I can't usefully
rely on anything it does, I don't think. I'm assuming this has to work
in the general case, not just in the specific example case given.

I suspect this could be solved with another bit, but I'm too sleepy to figure
out how right now.

Francois Grieu

unread,

Feb 9, 2010, 4:10:44 AM2/9/10

to

Notice that after one (or several) disrupted bToggle(), the next
undisrupted bRead() can return anything. One attractive option is thus
that this undisrupted bRead() does cleanup so that the next undisrupted
bRead() will return the right value.

> I'm assuming this has to work
> in the general case, not just in the specific example case given.

Yes. But if you can make some bRead()/bToggle() that work in my use
case, it is easy to exhibit a fully correct bRead()/bToggle2(): just define

void bToggle2(void )
{
(void) bRead();
bToggle();
}

The proof is left as a relatively easy exercise to the reader.

> I suspect this could be solved with another bit, but I'm too sleepy to figure
> out how right now.

Good night..

Francois Grieu

Seebs

unread,

Feb 9, 2010, 4:28:12 AM2/9/10

to

On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>> // Read a designated EEPROM cell; always returns 0 or 1.
>> extern int eRead(int j);
>> // Erase a designated EEPROM cell.
>> // After undisrupted eErase, eRead returns 0 until eProgram is invoked.
>> extern void eErase(int j);
>> // Program a designated *ERASED* EEPROM cell.
>> // After undisrupted eProgram, eRead returns 1 until eErase is invoked.
>> // Programming a cell that is not erased cause undefined behavior.
>> extern void eProgram(int j);

[re: my first attempt]

> This one is incorrect. Assume a disruption where I inserted [A]; now
> cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
> and a disruption occurs at [B]; now cells 0 and 2 are uncertain, and
> bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
> read as 0 and cell 0 is read as 1).
>
> Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.

I get it! The risk here is that if eRead() returns unreliable results,
future queries of the same bit may not yield the same results, resulting
in inconsistent flow through the program logic; in particular, attempts
might be made to modify things based on spurious data.

So.

int
bSureOf(int j) {
if (eRead(j)) {
eErase(j);
eProgram(j);
return 1;
} else {
eErase(j);
return 0;
}
}

int
bRead(void) {
if (bSureOf(2)) {

return eRead(1);
} else {
return eRead(0);
}
}

void
bToggle(void) {
int t;
if (bSureOf(2)) {

t = eRead(1);
eErase(0);
if (t)
eProgram(0);

eErase(2);

}
t = eRead(0);
eErase(1);
if (t)
eProgram(1);
eProgram(2);
eErase(0);
if (!t)
eProgram(0);
eErase(2);
}

We start with all bits clear. bSureOf(2) erases 2, just to be cautious,
but returns 0. We read a bit. We erase bit 1 (also clear at the moment,
but who cares), and don't program it. So far, nothing has been changed.
At this point, we start trying to write 2. If this is interrupted, two
possibilities exist:

1. The attempted read in the next hit of bSureOf() comes up 0; we
forcibly erase bit 2, and trust bit 0, which was correct.
2. The attempted read in the next hit of bSureOf() comes up 1; we
forcibly erase-and-write bit 2, and trust bit 1, which was correct.

Either way, if we get past any check on bit 2, we have a
correct and consistent state. Thus, eventually we come to the eErase(0),
which occurs only when bit 2 is definitely known to be set to 1. We now
try to set bit 0 to its new value. Once that happens, we try to clear
bit 2. If we succeed, all is well -- bit 2 is clear, bit 0 is the toggled
state. If we are interrupted, one of two things can happen on the next
read:

1. The attempted read in bSureOf(2) comes up 0; we forcibly erase
bit 2, and we continue to use the new, toggled, state of bit 0. Our
behavior is consistent.

2. The attempted read in bSureOf(2) comes up 1; we try to write bit
2, and if we succeed, we fall back on the not-yet-toggled state of bit
1. Since the toggle was interrupted, this behavior is acceptable.

3. The attempted read in bSureOf(2) comes up 1; we try to write bit
2, but are interrupted somewhere during it. It is still ambiguous whether
the toggle has succeeded, but no matter what happens, if bRead() returns,
we will have either toggled or failed to toggle the bit.

If bit 2 is set, then bit 1 has definitely been set to the most recently
returned value. (It is possible that we've since tried to store a toggled
value in bit 0, but if we were interrupted, we haven't returned it so we
don't need to show that new value yet.) If bit 2 is not set, then bit 0
definitely holds either the last returned value, or the most recently toggled
value. Either way, before we return, we commit to bit 2's perceived state.
If that is interrupted, that's okay -- eRead() doesn't return a value, because
it was disrupted, so we haven't returned a wrong value.

... I think.

Francois Grieu

unread,

Feb 9, 2010, 6:50:28 AM2/9/10

to

Seebs wrote:
> On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>>> // Read a designated EEPROM cell; always returns 0 or 1.
>>> extern int eRead(int j);
>>> // Erase a designated EEPROM cell.
>>> // After undisrupted eErase, eRead returns 0 until eProgram is invoked.
>>> extern void eErase(int j);
>>> // Program a designated *ERASED* EEPROM cell.
>>> // After undisrupted eProgram, eRead returns 1 until eErase is invoked.
>>> // Programming a cell that is not erased cause undefined behavior.
>>> extern void eProgram(int j);
>
> [re: my first attempt]
>> This one is incorrect. Assume a disruption where I inserted [A]; now
>> cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
>> and a disruption occurs at [B]; now cells 0 and 2 are uncertain, and
>> bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
>> read as 0 and cell 0 is read as 1).
>>
>> Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.
>
> I get it! The risk here is that if eRead() returns unreliable results,
> future queries of the same bit may not yield the same results, resulting
> in inconsistent flow through the program logic; in particular, attempts
> might be made to modify things based on spurious data.

Yes. See below ;-)

> So.
>
> int
> bSureOf(int j) {
> if (eRead(j)) {
> eErase(j);
> eProgram(j);
> return 1;
> } else {
> eErase(j);
> return 0;
> }
> }
>
> int
> bRead(void) {
> if (bSureOf(2)) {
> return eRead(1);
> } else {
> return eRead(0);
> }
> }
>
> void
> bToggle(void) {
> int t;
> if (bSureOf(2)) {
> t = eRead(1);
> eErase(0);
> if (t)

> eProgram(0); // [B]

> eErase(2);
> }
> t = eRead(0);
> eErase(1);
> if (t)
> eProgram(1);
> eProgram(2);

> eErase(0); // [A]
> if (!t)
> eProgram(0);
> eErase(2);
> }

Assume that on a bToggle(), a disruption occurs where I added [A], tus
with cell 2 fully programmed. Assume the next disruption occurs in the
next bToggle(), which will thus take the branch where I added [B].
Assume this disruption is during the eProgram(0) on the left of [B],
leaving cell 0 undefined, and cell 2 fully programmed.
An undisrupted bRead now returns eRead(0), which is unspecified, and
leaves cell 2 fully programmed. Thus two undisrupted bRead() without a
bToggle() in between can return different results. Bust.

[snip]
Francois Grieu

Francois Grieu

unread,

Feb 9, 2010, 7:12:26 AM2/9/10

to

[I messed up my earlier reply, sorry; let me try again]

Seebs wrote :

> On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>>> // Read a designated EEPROM cell; always returns 0 or 1.
>>> extern int eRead(int j);
>>> // Erase a designated EEPROM cell.
>>> // After undisrupted eErase, eRead returns 0 until eProgram is invoked.
>>> extern void eErase(int j);
>>> // Program a designated *ERASED* EEPROM cell.
>>> // After undisrupted eProgram, eRead returns 1 until eErase is invoked.
>>> // Programming a cell that is not erased cause undefined behavior.
>>> extern void eProgram(int j);
>
> [re: my first attempt]
>> This one is incorrect. Assume a disruption where I inserted [A]; now
>> cell 2 is uncertain. Assume cell 2 is read as 1 during the next bToggle,
>> and a disruption occurs at [B]; now cells 0 and 2 are uncertain, and
>> bRead can return 0 (if cells 2 and 0 are read as 0) or 1 (if cell 2 is
>> read as 0 and cell 0 is read as 1).
>>
>> Hint: there's nothing to prevent bRead from invoking eErase or/and eProgram.
>
> I get it! The risk here is that if eRead() returns unreliable results,
> future queries of the same bit may not yield the same results, resulting
> in inconsistent flow through the program logic; in particular, attempts
> might be made to modify things based on spurious data.

Yes. See below ;-)

> So.
>
> int
> bSureOf(int j) {
> if (eRead(j)) {

> eErase(j); // [C]

> eProgram(j);
> return 1;
> } else {
> eErase(j);
> return 0;
> }
> }
>
> int
> bRead(void) {
> if (bSureOf(2)) {
> return eRead(1);
> } else {
> return eRead(0);
> }
> }
>
> void
> bToggle(void) {
> int t;
> if (bSureOf(2)) {
> t = eRead(1);
> eErase(0);
> if (t)

> eProgram(0); // [B]

> eErase(2);
> }
> t = eRead(0);
> eErase(1);
> if (t)
> eProgram(1);
> eProgram(2);

> eErase(0); // [A]
> if (!t)
> eProgram(0);
> eErase(2);
> }
>

We get a fist disruption at [A]. Cell 2 is fully programmed. Next
disruption is when bToggle() is doing eProgram(0) on the left of [B].
Cell 0 is undefined. We then do bRead() which is disrupted during
bSureOf(2), after eErase(2) at [C]. Now cell 2 is erased, and an
undisrupted bRead() returns eRead(0), which is unspecified. Two
undisrupted bRead() without a bToggle() in betwwen can thus return
different values. Bust.

[snip]
Francois Grieu

Seebs

unread,

Feb 9, 2010, 9:10:00 AM2/9/10

to

On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
> [I messed up my earlier reply, sorry; let me try again]

No worries. I already found a bug in this one, which is that if we're
currently pointing at cell 1, and cell 1 and cell 0 are different, multiple
successive bRead() calls could inadvertantly flip back to cell 0 without
a toggle.

Hmm.

Does an eErase() of a cell which is already zero risk creating indeterminacy,
or does it stay zero regardless of disruptions?

Francois Grieu

unread,

Feb 9, 2010, 9:30:46 AM2/9/10

to

Seebs wrote:
> On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>> [I messed up my earlier reply, sorry; let me try again]
>
> No worries. I already found a bug in this one, which is that if we're
> currently pointing at cell 1, and cell 1 and cell 0 are different, multiple
> successive bRead() calls could inadvertantly flip back to cell 0 without
> a toggle.
>
> Hmm.
>
> Does an eErase() of a cell which is already zero risk creating indeterminacy,

No. I wanted that to be implied by "After undisrupted eErase, eRead
returns 0 until eProgram is invoked." and confirmed by "An EEPROM cell

can be left neither erased nor programmed by a disrupted eErase unless

that cell was already erased".

> or does it stay zero regardless of disruptions?

I stays zero.

It took some effort to have the maker of the EEPROM assert that. I have
strong arguments towards the conjecture that the problem can't be solved
if this insurance is removed.

Francois Grieu

Seebs

unread,

Feb 9, 2010, 9:58:00 AM2/9/10

to

On 2010-02-09, Francois Grieu <fgr...@gmail.com> wrote:
>> Does an eErase() of a cell which is already zero risk creating indeterminacy,
>
> No. I wanted that to be implied by "After undisrupted eErase, eRead
> returns 0 until eProgram is invoked." and confirmed by "An EEPROM cell
> can be left neither erased nor programmed by a disrupted eErase unless
> that cell was already erased".
>
>> or does it stay zero regardless of disruptions?
>
> I stays zero.

Okay.

> It took some effort to have the maker of the EEPROM assert that. I have
> strong arguments towards the conjecture that the problem can't be solved
> if this insurance is removed.

I believe you are correct.

So, it seems like the key gimmick would be to have values where "0" is a
safe state, so if they look to be 0, you can set them to 0 unconditionally.

Hmm. This is pretty tricky. I am pretty sure it's doable, but everything
I've come up with so far ends up in a case where you can't tell which bit
is indeterminate, and trying to stabilize them risks leaving two bits
indeterminate.

Moi

unread,

Feb 10, 2010, 6:39:06 AM2/10/10

to

On Tue, 09 Feb 2010 08:15:20 +0100, Francois Grieu wrote:

> Moi wrote :
>> On Mon, 08 Feb 2010 07:26:45 +0100, Francois Grieu wrote:
>>
>
>> I think that your failed eeprom_reset_bit() and eeprom_set_bit()
>> operations are more or less equivalent to Lamport's "sending messages
>> to another process". Messages can be lost in transit, your
>> eeprom-operations can fail because of a crash.
>> The only difference is that in the eeprom-case there is a sort of
>> tri-state logic; in the case of lost messages the domain stays
>> restricted to just two cases.
>
> Yes I see your point. One redeeming difference is that things are
> strictly sequential / singe-process in my case.

Yes, but in fact, messages are the building blocks of any kind
of atomicity / serialisation. A message can be recieved (and processed)
or not.

IMHO your eeprom-casus is in essence a sychronisation problem; there
may be a difference (in state) between the eeprom and what the program
knows. Sending (lossy) messages to and from the eeprom helps the program to
adapt its views. (or impose its will on the poor eeprom ;-)
The fact that there is only one "writer" makes it simpler than concurrency
problems, but not basically different.

BTW is there an elegant way to emulate a faulty eeprom by e.g. a diskfile ?
something like:
{ write random value to cell; sync;
wait a second;
write good value to cell; sync;
}??

(with the exception of erasing an empty cell, which should always be a no-op)

BTW, I think I need 3 bits to store one non-lossy bit.
AvK

Francois Grieu

unread,

Feb 10, 2010, 7:28:35 AM2/10/10

to

Moi wrote:
> On Tue, 09 Feb 2010 08:15:20 +0100, Francois Grieu wrote:
>
>> Moi wrote :
>>> On Mon, 08 Feb 2010 07:26:45 +0100, Francois Grieu wrote:
>>>
>>> I think that your failed eeprom_reset_bit() and eeprom_set_bit()
>>> operations are more or less equivalent to Lamport's "sending messages
>>> to another process". Messages can be lost in transit, your
>>> eeprom-operations can fail because of a crash.
>>> The only difference is that in the eeprom-case there is a sort of
>>> tri-state logic; in the case of lost messages the domain stays
>>> restricted to just two cases.
>> Yes I see your point. One redeeming difference is that things are
>> strictly sequential / singe-process in my case.
>
> Yes, but in fact, messages are the building blocks of any kind
> of atomicity / serialisation. A message can be recieved (and processed)
> or not.
>
>
> IMHO your eeprom-casus is in essence a sychronisation problem; there
> may be a difference (in state) between the eeprom and what the program
> knows. Sending (lossy) messages to and from the eeprom helps the program to
> adapt its views. (or impose its will on the poor eeprom ;-)
> The fact that there is only one "writer" makes it simpler than concurrency
> problems, but not basically different.

Agreed.

>
> BTW is there an elegant way to emulate a faulty eeprom by e.g. a diskfile ?
> something like:
> { write random value to cell; sync;
> wait a second;
> write good value to cell; sync;
> }??

[OT] behavior of diskfiles under power disruption is notoriously
erratic; many hard disks have a write cache and some reorder writes..[/OT].

I'm thinking of a testbed program that does a randomized simulation,
using setjmp/longjmp to gracefully exit eProgram/eErase.

One issue is to zero the global or/and static variables on restart; for
now I only see these solutions:
- impose a special convention to the code under test;
- implement a C interpreter;
- assume things on the addresses of global or/and static variable;
- restart the test program using a script;
- restart the test program using system() from a supervising C program.

> (with the exception of erasing an empty cell, which should always be a no-op)
>
> BTW, I think I need 3 bits to store one non-lossy bit.

I come to the same conclusion. My best (hopefully) correct solution uses
at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
another subsequent bToggle().

Francois Grieu

Moi

unread,

Feb 10, 2010, 8:03:24 AM2/10/10

to

On Wed, 10 Feb 2010 13:28:35 +0100, Francois Grieu wrote:

> Moi wrote:

>>
>> BTW is there an elegant way to emulate a faulty eeprom by e.g. a
>> diskfile ? something like:
>> { write random value to cell; sync;
>> wait a second;
>> write good value to cell; sync;
>> }??
>
> [OT] behavior of diskfiles under power disruption is notoriously
> erratic; many hard disks have a write cache and some reorder
> writes..[/OT].

But, in this case it is just what we want!

>> BTW, I think I need 3 bits to store one non-lossy bit.
>
> I come to the same conclusion. My best (hopefully) correct solution uses
> at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
> eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
> another subsequent bToggle().
>

Mine uses one bit_set() or bit_erase() per toggle.
initialisation (after possible crash) is a bit more expensive;
2 or 3 bitops.

BTW: can I rely on the program to crash (and be restarted) on a glitch,
or should the program test itself whether an operation succeeded ?

AvK

Francois Grieu

unread,

Feb 10, 2010, 9:19:09 AM2/10/10

to

Moi wrote:
> On Wed, 10 Feb 2010 13:28:35 +0100, Francois Grieu wrote:
>
>> Moi wrote:
>
>>> BTW is there an elegant way to emulate a faulty eeprom by e.g. a
>>> diskfile ? something like:
>>> { write random value to cell; sync;
>>> wait a second;
>>> write good value to cell; sync;
>>> }??
>> [OT] behavior of diskfiles under power disruption is notoriously
>> erratic; many hard disks have a write cache and some reorder
>> writes..[/OT].
>
> But, in this case it is just what we want!
>
>
>>> BTW, I think I need 3 bits to store one non-lossy bit.
>> I come to the same conclusion. My best (hopefully) correct solution uses
>> at most 5 eErase() or eProgram() for the first bRead(), 2 eErase() or
>> eProgram() for the first subsequent bToggle(), 1 eErase or eProgram for
>> another subsequent bToggle().
>>
>
> Mine uses one bit_set() or bit_erase() per toggle.
> initialisation (after possible crash) is a bit more expensive;
> 2 or 3 bitops.

I'll post mine soon.. after rechecking it!

> BTW: can I rely on the program to crash (and be restarted) on a glitch,

Yes.

> or should the program test itself whether an operation succeeded ?

Assume an error-check by eRead is built into eProgram and eErase.

Francois Grieu

Noob

unread,

Feb 10, 2010, 9:35:27 AM2/10/10

to

Francois Grieu wrote:

> With actual physical EEPROM or Flash, after a change of a cell has been
> disrupted, two readings without a change are not guaranteed to return
> the same value. This is because a cell is read by comparing the charge
> in the cell to a reference value, with some uncertainty, and that charge
> only changes slowly (in the order of a millisecond, and a power loss, at
> least a deliberate one as in short circuit, occurs faster than that).
> Note this is one area where SRAM + battery backup is better (positive
> feedback between the two sides of a SRAM cell makes it inherently
> stable, or at least makes metastability plain unobservable). All this
> stuff is very often overlooked, and gives rise to all kind of subtle and
> hard to track issues, especially when temperature changes and when
> disruption is frequent or/and adversarial (which must be assumed in
> Smart Cards, especially when remotely powered).

This reminds me of the way FPGABoy dumped the GBC Boot ROM :-)

http://www.fpgb.org/?page_id=17

Thad Smith

unread,

Feb 12, 2010, 11:49:21 PM2/12/10

to

Francois Grieu wrote:
> /*
> Disclaimer: This is a hard problem, only minimally disguised into
> something on-topic for comp.lang.c

comp.programming and comp.arch.embedded added to the list.

> You are programming a C99 conforming freestanding implementation.
> Disruption (reset or power off/on cycle) can occur at *ANY* time without
> advance warning, and stops whatever operation is underway. The program
> is re-executed from main() after a disruption.
>
> The objective is to implement a single bit resistant to disruption,
> accessed using two procedures to be written:
> int bRead(void); // Return 0 or 1 according to bit state.
> void bToggle(void); // Change the state of the bit.
>
> These two properties shall be met:
> a) The bit is stable in the absence of Toggle. That is, the result given
> by any two undisrupted bRead() is the same unless bToggle() was invoked
> in-between.
> b) The bit is changed by Toggle. That is, the result given by any two
> undisrupted bRead() is different if bToggle() was invoked exactly once
> in-between and no disruption occurred during that execution of bToggle().

...

> An EEPROM cell can be left neither erased nor programmed by a disrupted

> eErase unless that cell was already erased, or by a disrupted eProgram.
> For a cell in such halfway state, eRead returns a value specified only
> to be 0 or 1. That otherwise unspecified value may in particular change
> after disruption (due e.g. to uncontrollable random noise and/or
> temperature and/or voltage drift in the hardware used by eRead), even
> though the sate of cells is unchanged by disruption or/and eRead.
>
> Note: if a cell is not erased (left in a halfway state or programmed),
> an eProgram of that cell is prohibited. A workaround is to use the

First, this is an uninteresting problem because if a power failure occurs during
a toggle attempt, the restored state gives no information at all.

Consider the following alternate problem that guarantees to restore some
information.

Let a variable have 3 stable states s1, s2, and s3. A stable state is preserved
across power losses. Assume that the state always changes s1 to s2, s2 to s3,
and s3 to s1 (mod 3 counter). Since transitions are not instantaneous, there
are 3 transient states: t12, t23, and t31. How many bits, using your rules, are
required to, on power restoration, recover either the preceding stable state, or
if failure during transition, eith the preceding of succeeding stable state?
This explicitly prohibits restoring the third state.

This can be done with 3 independent bits:
B3B2B1
s1 0 0 1
t12 0 1 1
s2 0 1 0
t23 1 1 0
s3 1 0 0
t31 1 0 1

Transition from s1 to s2: write B2, then erase B1, etc.

Recovery from reset:
If in a stable state, erase all zero bits. This makes the stable state robust.
If in a transient state, erase the bit to revert to the previous stable state.

The only remaining problem is to getting a robust first state, since if the
power is lost when first attempting to write state s1 and the B1 is a weak 1, it
will not be restored. You thus need a robust initialization to the first state.

Proof: There is either one or two one bits set, since once two bits are set an
erasure precedes writing the other bit. If failure during a transition before
the written bit becomes readable, it is erased on recovery, making it stable.
If the transition state is read on restart, the new bit may be weak. It is
erased when the previous state is restored. If a failure occurs during a the
second step of the transition when the erasure was not complete, but intially
read as zero, the rule for zero erasure makes the new state robust.

This can be extended, of course, to a variable with additional states. All real
EEPROM that I am familiar with erases and writes more than one bit at a time, so
that would affect the design of an optimal algorithm for storing more information.

--
Thad

Daniel Giaimo

unread,

Feb 13, 2010, 12:15:47 AM2/13/10

to

I think there is a flaw here. Suppose you are moving from transition
state t12 to s2 when you are interrupted. In this case bit b1 will be
in a halfway state when you come back up. Suppose when you come back up
you read the state as t12. You then follow your procedure to revert to
state s1 by erasing b2. You are then interrupted again. When you come
back, you read all bits 0 because bit b1 was actually in a halfway
state.

--
Dan Giaimo

Moi

unread,

Feb 13, 2010, 7:49:51 AM2/13/10

to

I had exactly the same coding, and I am suffering exactly the same problems at recovery.

The thing that counts is, that (given that at any time *only one bit can be halfway*)
- for the "stable" patters with one 1 bit the 1 bit can be trusted, and one of the 0s may
actually be halfway
- for the "transient" patterns with two 1s, only the 0 can be trusted, and one
of the 1s may actually be halfway.

At recovery: in the t31 (101) case above, the intended state is s3 (100),
and the way to get there is obviously to erase the rightmost bit (followed
by an additional erase of the middle bit)
This will work fine if the rightmost bit was actually halfway, but
if the leftmost bit was halfway, the next state *may* become 000,
(which in my implementation is a "forbidden" uninitialized state,
which steps forward to s1 (001), which is not what we want.
I am a bit hesitating to set the middle bit, since that is the only trusted bit
in the t31 (101) state.

So the real problem becomes: there are two bits can be halfway, but not both,
and we don't know which one. To be sure we need to touch them both, in the
right order (avoiding unwanted states, which, after a crash in our recovery process,
might be picked up by a second recovery)

AvK

Thad Smith

unread,

Feb 13, 2010, 10:13:34 AM2/13/10

to

Daniel Giaimo wrote:

Good catch, Dan. Actually what I would really do is to violate the rule against
reprogramming. If it is read as t12 on powerup I would reprogram B2, then erase
B1. I use the rationale that it is a rare occurrance (power must be lost in
brief transition period) and I expect reprogramming to reduce the lifetime
cycles of the cell, rather than immediately kill it. Also, I would add more
transition states in eliminate erasures for stable states on powerup, since
erasures reduce cell life, as well.

--
Thad

Thad Smith

unread,

Feb 13, 2010, 7:16:29 PM2/13/10

to

Thad Smith wrote:
> Francois Grieu wrote:
>> /*
>> Disclaimer: This is a hard problem, only minimally disguised into
>> something on-topic for comp.lang.c
>
> comp.programming and comp.arch.embedded added to the list.
>
>> You are programming a C99 conforming freestanding implementation.
>> Disruption (reset or power off/on cycle) can occur at *ANY* time
>> without advance warning, and stops whatever operation is underway. The
>> program is re-executed from main() after a disruption.
>>
>> The objective is to implement a single bit resistant to disruption,
>> accessed using two procedures to be written:
>> int bRead(void); // Return 0 or 1 according to bit state.
>> void bToggle(void); // Change the state of the bit.
>>
>> These two properties shall be met:
>> a) The bit is stable in the absence of Toggle. That is, the result
>> given by any two undisrupted bRead() is the same unless bToggle() was
>> invoked in-between.
>> b) The bit is changed by Toggle. That is, the result given by any two
>> undisrupted bRead() is different if bToggle() was invoked exactly once
>> in-between and no disruption occurred during that execution of bToggle().

> ....

>> An EEPROM cell can be left neither erased nor programmed by a disrupted
>> eErase unless that cell was already erased, or by a disrupted eProgram.
>> For a cell in such halfway state, eRead returns a value specified only
>> to be 0 or 1. That otherwise unspecified value may in particular change
>> after disruption (due e.g. to uncontrollable random noise and/or
>> temperature and/or voltage drift in the hardware used by eRead), even
>> though the sate of cells is unchanged by disruption or/and eRead.
>>
>> Note: if a cell is not erased (left in a halfway state or programmed),
>> an eProgram of that cell is prohibited. A workaround is to use the
>
> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no information
> at all.

I'm taking another stab at the problem, based on more analysis and an
understanding of possible use. Thanks to Daniel Giaimo for finding a problem in
the earlier proposal.

Let's assume the variable has three state categories: s1, s2, and t. s1 and s2
are stable states (will not change spontaneously). Other values are transition
values. If the variable is read in a state t, the variable is then changed to
s1 or s2.

This can be used to perform a consistent database state update:
Assume initially database copy A is consistent and dbstate=s1 (database A is
valid). To update, make an updated copy B. At this point both A and B are
consistent. Set dbstate=s2 (database B is valid). Move B to A. Set dbstate=s1.

If power is lost while dbstate is being changed, both database copies are valid,
so restoring the state to either stable state results in a consistent database.
We just need to make sure that s1 and s2 are always stable when an
modification is in progress on copy A or B.

In the following list of sequential states, each of the four bits are
independently erasable and writable (normally implemented in separate bytes).

Bit
dbstate dcba Stable bits
s1 0001 ca
t1a 0011 da
t1b 0111 db
t1c 0110 db
s2 0010 ba
t2a 1010 cb
t2b 1011 dc
t2c 1001 ca
(s1) 0001 ca

To go from state s1 to s2 or s2 to s1, we first re-erase the last bit erased,
then sequentially write the variable to the following transition and stable
states, changing a single bit at a time.

When reading dbstate, if value is s1 or s2, return the value.
If value is txa or txb, erase and rewrite the last bit set, then precede to the
next stable state. If value is txc, re-erase the last bit erased, then precede
to the next stable state.

If power fails when in a transition state, a reading of the state with the above
rules will restore dbstate to a stable value. Multiple interrupts during a
restoration can cause the state values to go from t1b to t1a to s1, but it
doesn't revert any further (similar for t2b to t2a to s2).

Assume power fails during the t1c-s2 or t2c-s1 transition such that the powerup
state reads s2 or s1 with a partially erased bit (d for s1, c for s2). If the
partially erased bit is read as zero, dbstate will be considered stable, even
though there is an unstable bit. The state may revert to the previous
transition state, which, when read, will be advanced to the same stable state.
Assume d is a weak zero in s1 when dbstate starts to advance. The first
operation on advance is to re-erase d, eliminating the weak state, then start
transition to s2.

--
Thad

Mark Borgerson

unread,

Feb 14, 2010, 12:28:33 AM2/14/10

to

In article <4b7740d3$0$65832$892e...@auth.newsreader.octanews.com>,
Thad...@acm.org says...

Wouldn't all this be simplified if you simply used a non-volatile
memory with a single-cycle read and write? MRAM and FRAM come to
mind. IIRC, they have mechanisms to prevent false writes at
power-down and don't suffer from any awkward in-between states.

Mark Borgerson

Thad Smith

unread,

Feb 14, 2010, 1:55:45 AM2/14/10

to

Mark Borgerson wrote:
> In article <4b7740d3$0$65832$892e...@auth.newsreader.octanews.com>,
> Thad...@acm.org says...

>> [Proposal for robust use of EEPROM through power outages]

> Wouldn't all this be simplified if you simply used a non-volatile
> memory with a single-cycle read and write? MRAM and FRAM come to
> mind. IIRC, they have mechanisms to prevent false writes at
> power-down and don't suffer from any awkward in-between states.

It should be, assuming that there is brownout detection. The OP is apparently
working with Smart Card hardware using EEPROM. Sometimes the hardware is dictated.

--
Thad

Francois Grieu

unread,

Feb 15, 2010, 3:54:56 PM2/15/10

to Thad Smith

Thad Smith wrote about the following problem [I added more notes, since
the problem has been partially misunderstood by some]

Disclaimer: This is a hard problem, only minimally disguised into
something on-topic for comp.lang.c

You are programming a C99 conforming freestanding implementation (e.g. a
microprocessor device programmed in C, but we can't use <stdio.h> or
otherwise store data on disk). Disruption (reset or power off/on cycle)

can occur at *ANY* time without advance warning, and stops whatever
operation is underway. The program is re-executed from main() after a
disruption.

The objective is to implement a single bit resistant to disruption,
accessed using two procedures to be written:
int bRead(void); // Return 0 or 1 according to bit state

void bToggle(void); // Change the state of the bit

and such that these two properties are met:

a) The bit is stable in the absence of Toggle. That is, the result given
by any two undisrupted bRead() is the same unless bToggle() was invoked
in-between.
b) The bit is changed by Toggle. That is, the result given by any two
undisrupted bRead() is different if bToggle() was invoked exactly once
in-between and no disruption occurred during that execution of bToggle().

Note: The bit is therefore unspecified from the invocation of bToggle()
until the end of this bToggle() if it is undisrupted, or otherwise until
the end of the next undisrupted bRead(). This is the only definition of
"bit resistant to disruption" worth of interest.

The only way to store information across disruptions is using a number
of independent physical EEPROM cells, designated by index j. Access to
the EEPROM cells is by the following three functions. The properties
stated for theses functions apply for any fixed non-negative integer j
less than the number of cells, and "for that cell" is implied everywhere.

// Read a designated EEPROM cell; always returns 0 or 1.
extern int eRead(int j);

// Erase a designated EEPROM cell.

// After undisrupted eErase, the cell is "erased" and eRead returns
// 0 until eProgram is invoked (regardless of invocation of eErase
// and disruptions)
extern void eErase(int j);

// Program a designated *ERASED* EEPROM cell.
// After undisrupted eProgram, eRead returns 1 until eErase is invoked

// (regardless of disruptions)

// Programming a cell that is not "erased" cause undefined behavior.
extern void eProgram(int j);

An EEPROM cell can be left neither erased nor programmed by a disrupted

eErase unless that cell was already erased, or by a disrupted eProgram.
For a cell in such halfway state, eRead returns a value specified only
to be 0 or 1. That otherwise unspecified value may in particular change
after disruption (due e.g. to uncontrollable random noise and/or
temperature and/or voltage drift in the hardware used by eRead), even
though the sate of cells is unchanged by disruption or/and eRead.

Note: if a cell is not erased (left in a halfway state or programmed),
an eProgram of that cell is prohibited. A workaround is to use the

following rather than eProgram():
void mySafeProgram(int j)
{
eErase(j);
eProgram(j);
}

Before the first run, all EEPROM cells have been erased. eRead, eErase,
and eProgram terminate unless a disruption occurs, assuming constraints
on j and eProgram are met. bRead and bToggle must similarly terminate
unless a disruption occurs.

Can it be done? If yes, what is the minimum number of cells needed? Give
a proof.

Depending on the answer, try to minimize the expected failure rate or
maximize the expected LED flash rate in the following use case:
*/

#define NCELLS 10 // number of cells

extern void sLed(int x); // 0: LED on green then off
// 1: LED on red then off.
extern int eRead(int j); // Read a cell.
extern void eErase(int j); // Erase a cell to 0.
extern void eProgram(int j); // Program an erased cell to 1.

int bRead(void); // Return 0 or 1 according to bit state.
void bToggle(void); // Change the state of the bit.

// Declarations, and the code for bRead/bToggle, go here

int main(void)
{
int vState;
vState = bRead();
while(1)
{
sLed(vState);
bToggle();
vState = !vState;
}
return 0;
}

/*
Assume eProgram, eErase, sLed each last 1 second; everything else is
comparatively instant; a disruption occurs with odds p each second;
unspecified eRead values are assigned at random with mean 0.5; the LED
is off following disruption; a failure is a violation of the
requirement: if a disruption occurs with the LED on in some color, then
the first time the program lights a LED it must be with that color.

Note: any bRead/bToggle conforming to a) b) works in this test program,
and any bRead/bToggle making this test program working can be changed
into bRead/bToggle2 conforming to a) b), by defining bToggle2 as:

void bToggle2(void)
{
(void)bRead();
bToggle();
}

Notice that disruption without warning is the standard under which most
real-life hardware actually operates (with the exception of systems with
physical protection against adversarial reset and a power reserve with
means to sense imminent power loss), even though this is often left out
of the specification. And that real disruption-surviving media (EEPROM,
Flash, CDRW, though not battery-backed SRAM) do have the limitation that
read after interrupted change of state sometime gives an unstable
result. While "programming a cell that is not erased cause undefined
behavior" is not the rule, it can exist due to a poorly specified API,
an overcautious hardware specification, or in some cases due to tue
hardware limitations.

The problem is derived from actual requirements in a Smart Card. This is
neither homework, nor an attempt to gather brainpower for free: I'm the
author of the problem and have conclusively determined the feasibility,
with a concise proof. A similar problem "A bit in bottles" is in
rec.puzzles, some of the replies contain answers, and I committed mine.
http://groups.google.com/group/rec.puzzles/browse_thread/thread/ab892b3cf88af0de

Francois Grieu

> First, this is an uninteresting problem because if a power failure
> occurs during a toggle attempt, the restored state gives no
> information at all.

Is the prefix "UN" intended? Thad Smith shows continued INterest, and
I'm grateful to him for that.

> pr*o*cede to the next stable state. If value is txc, re-erase the
> last bit erased, then procede to the next stable state.

Going from t2b to t2c, disruption occurs while we erase "b".
On restart, state is "10?1". We read it as "1011" that is t2b.
While we "erase the last bit set", that is "a", disruption occurs.
On restart, state is "10??". We read it as "1010" that is t2a.
[notice the "Stable bits: cb" condition is not met]
While we "erase the last bit set", that is "d", disruption occurs.
On restart, state is "?0??" [or maybe "?0?0" is we have been prudent
and erased "a" right after seeing it as 0].
We read that as "0010" and assume state s2.
Before we make any change/call to bToggle(), disruption occurs
and we read the same state as "0001" and assume state s1, this is
a violation of requirement a.
[is we have erased "a" we will read "0000" and I guess we
will next move to s1 and be equally busted]

> If power fails when in a transition state, a reading of the state
> with the above rules will restore dbstate to a stable value.

Not so, see above.

> Multiple interrupts during a restoration can cause the state values
> to go from t1b to t1a to s1, but it doesn't revert any further

Right, unless I err.

> (similar for t2b to t2a to s2).

I don't think so..

[snip]

I plan to post my answer by the end of the week.

Francois Grieu

Thad Smith

unread,

Feb 17, 2010, 9:54:52 AM2/17/10

to

After having two previous attempts shot down, I have two more proposals. Today
I am describing the "game" solution. The game solution is optimized for minimum
number of bits and fastest execution, ignoring power cycle wear factor, a
concern with real EEPROM in high power cycle uses.

State ba Powerup sequence
p1 00 p1, p2, s1
p2 01 p2, p1, p2, s1
s1 11 s1, p2, p1, p2, s1
s2 10 erase a (a=0)

On powerup, perform the indicated powerup sequence. Powerup in s2 erases a, all
others erase both bits and return to s1. With power on, toggle bit a to go from
s1 to s2. Going from s2 to s1 is a program without erase.

The toggle cost is one erase or program time.

--
Thad

Francois Grieu

unread,

Feb 17, 2010, 12:56:27 PM2/17/10

to Thad Smith

Thad Smith wrote :

>
> After having two previous attempts shot down, I have two more
> proposals. Today I am describing the "game" solution. The game
> solution is optimized for minimum number of bits and fastest execution,
> ignoring power cycle wear factor, a concern with real EEPROM in high
> power cycle uses.

Very right. However EEPROM wear is easy to solve with one extra cell,
at some cost in time.

> State ba Powerup sequence
> p1 00 p1, p2, s1
> p2 01 p2, p1, p2, s1
> s1 11 s1, p2, p1, p2, s1
> s2 10 erase a (a=0)
>
> On powerup, perform the indicated powerup sequence. Powerup in s2
> erases a, all others erase both bits and return to s1. With power on,
> toggle bit a to go from s1 to s2. Going from s2 to s1 is a program
> without erase.

Well, let's attack that..

We are moving from s1 to s2, thus erase a;
disruption occurs, cells are left 1?

On power-up we read 11, that is s1, and proceed to move to p2
that is erase b; disruption, cells are left ??

On power-up we read 10, and erase a (that's during bRead);
cells are now ?0; we believe we are in state s2 (the test
program lights in some color)

Disruption occurs outside bRead/bToggle (with the light on),
after bRead, thus with the bit supposedly stable.

On power-up we read 00, and proceed to s1 without disruption;
we believe we are in state s1 (the test program lights in
some other color). This breaks the stability rule, bust.

If there is a solution with two cells, I quit programming.

> The toggle cost is one erase or program time.

At the expense of correctness! I went down the same pitfall.
Someone on rec.puzzles broke my nearly-published version,
optimized for fastest execution [the only one I committed
with a hash :-( ]. Now I need time to recheck the baseline
3-cells solution, and it won't be as fast as I claimed.

I'll post my C version (or admit defeat on my own problem)
this week, promised.

Francois Grieu

Thad Smith

unread,

Feb 20, 2010, 1:24:37 AM2/20/10

to

Francois Grieu wrote:
> Thad Smith wrote :
>>
>> After having two previous attempts shot down, I have two more
>> proposals. Today I am describing the "game" solution.

...

>> State ba Powerup sequence
>> p1 00 p1, p2, s1
>> p2 01 p2, p1, p2, s1
>> s1 11 s1, p2, p1, p2, s1
>> s2 10 erase a (a=0)
>>
>> On powerup, perform the indicated powerup sequence. Powerup in s2
>> erases a, all others erase both bits and return to s1. With power on,
>> toggle bit a to go from s1 to s2. Going from s2 to s1 is a program
>> without erase.
>
> Well, let's attack that..
>
> We are moving from s1 to s2, thus erase a;
> disruption occurs, cells are left 1?
>
> On power-up we read 11, that is s1, and proceed to move to p2
> that is erase b; disruption, cells are left ??

...

> If there is a solution with two cells, I quit programming.

I'm batting 0 for 3 here. Here is another "game" trial solution, this time with
3 bits.

State cba Powerup sequence
p1 000 erase cba, s1
s1 001 (no change)
t1 101 erase b, s1, p1, s1
t2 100 erase ba, p1, s1
t3 110 erase a, t2, p1, s1
s2 010 erase c

s1 to s2: erase c, t1, t2, t3, s2.
s2 to s1: t3, t2, t1, s2.

State s1 has no change on powerup. If interrupted during p1-s1 or s1-t1 we may
read s1 for an arbitrary duration, then later read p1, t1, or t2, all which
transition to s1 on powerup or proceed to s1 with the given s1-s2 transition.
s2 must be refreshed on powerup by erasing c.

--
Thad

Francois Grieu

unread,

Feb 21, 2010, 4:25:31 AM2/21/10

to Thad Smith

Thad Smith wrote:

> I'm batting 0 for 3 here. Here is another "game" trial solution, this
> time with 3 bits.
>
> State cba Powerup sequence
> p1 000 erase cba, s1
> s1 001 (no change)
> t1 101 erase b, s1, p1, s1
> t2 100 erase ba, p1, s1
> t3 110 erase a, t2, p1, s1
> s2 010 erase c
>
>
> s1 to s2: erase c, t1, t2, t3, s2.
> s2 to s1: t3, t2, t1, s2.
>
> State s1 has no change on powerup. If interrupted during p1-s1 or s1-t1
> we may read s1 for an arbitrary duration, then later read p1, t1, or t2,
> all which transition to s1 on powerup or proceed to s1 with the given
> s1-s2 transition. s2 must be refreshed on powerup by erasing c.

At first I thought this could work; it would have been more efficient
than my solution by a mile. But on second look..

We are at the last step of moving from s1 to s2, disruption occurs
while we erase c; cells are [?10].
On power-up we get 110, and thus erase a, then attempt to proceed to
t2, disruption occurs while we erase b; cells are [??0].
On powerup we get 010, and thus erase c; we act as if we are at s2,
but cells are [0?0]. The test program turns the light in some color.
Disruption occurs.
On power-up we get 000, and proceed to s1. The test program turns the
light in the other color. Bust.

My solution is *nearly* ready to post.

Francois Grieu

Thad Smith

unread,

Mar 5, 2010, 9:59:00 PM3/5/10

to

Francois Grieu wrote:
> Thad Smith wrote :
>>
>> After having two previous attempts shot down, I have two more
>> proposals.

...

>> State ba Powerup sequence
>> p1 00 p1, p2, s1
>> p2 01 p2, p1, p2, s1
>> s1 11 s1, p2, p1, p2, s1
>> s2 10 erase a (a=0)
>>
>> On powerup, perform the indicated powerup sequence. Powerup in s2
>> erases a, all others erase both bits and return to s1. With power on,
>> toggle bit a to go from s1 to s2. Going from s2 to s1 is a program
>> without erase.
>
> Well, let's attack that..
>
> We are moving from s1 to s2, thus erase a;
> disruption occurs, cells are left 1?
>
> On power-up we read 11, that is s1, and proceed to move to p2
> that is erase b; disruption, cells are left ??
>
> On power-up we read 10, and erase a (that's during bRead);
> cells are now ?0; we believe we are in state s2 (the test
> program lights in some color)
>
> Disruption occurs outside bRead/bToggle (with the light on),
> after bRead, thus with the bit supposedly stable.
>
> On power-up we read 00, and proceed to s1 without disruption;
> we believe we are in state s1 (the test program lights in
> some other color). This breaks the stability rule, bust.
>
> If there is a solution with two cells, I quit programming.

It's been a while now. Here is another attempt at minimal bit implementation.
This is another "game" solution (powerup activity for both resting states). The
powerup operations can be done on the first call to read the non-volatile bit.

State ba Powerup sequence
s0 00 erase b,a
s1 01 erase b, p3, p2, p3, s1
p2 10 erase a, p3, s1
p3 11 s1, p3, p2, p3, p1

The resting states are s0 and s1.
Programmed transitions:
s0 to s1: s0, s1
s1 to s0: s1, s0

The programmed transitions are a bit erase or program on "a". After the powerup
sequence is done both bits are stable.

--
Thad

Moi

unread,

Mar 13, 2010, 12:04:45 PM3/13/10

to

#include <stdio.h>
#include <stdlib.h>

/* EEprom, 20100313, Avk
*
* Encoding.
* I use four bits. Two or three of these are ever set.
* The two 1-bits walk from right to left: a two-bit pattern
* grows a bit at it's left side; a three-bit pattern looses one
* 1-bit at the right side.
* A nice property of this encoding is, that for every
* 'allowed' two bit pattern, the two 1-bits can be trusted; after
* a crash+restart, only the 0-bits need to be re-reset.
* For the patterns with three 1-bits, the middle 1-bit plus
* the single 0-bit can be trusted; the outer two 1-bits are suspect.
*
* State table.
* "trusted" bits are shown as 0 and 1;
* "suspect" bits as . and ?;
* -a / +a := clear / set bit 'a'
* ... :+ continue with the normal state path.
*
* dcba dcba [ ... recovery path ... ]
* 0011 ..11 [Off] -c .011 -d 0011
* 0111 0?1? -a 0?10 -c 0010 +c 0110 ...
* 0110 .11. -a .110 -d 0110 ...
* 1110 ?1?0 -d 01?0 -b 0100 +b 0110 ...
* 1100 11.. [On] -a 11.0 -b 1100
* 1101 1?0? -a 1?00 -c 1000 +a 1001 ...
* 1001 1..1 -b 1.01 -c 1001 ...
* 1011 ?0?1 -b ?001 -d 0001 +b 0011
*/

#define COMBINE4(d,c,b,a) (8*(d)+4*(c)+2*(b)+1*(a))

static char *states[16] =
{ "0000" , "0001" , "0010" , "0011" , "0100" , "0101" , "0110" , "0111"
, "1000" , "1001" , "1010" , "1011" , "1100" , "1101" , "1110" , "1111" };

/* interface */
int restart(void);
int getstate(void);
int toggle(void);

/* primitives */
static unsigned askbit(unsigned pos);
static void bit_set(unsigned pos);
static void bit_clr(unsigned pos);

/* in core storage for the bits */
static unsigned char bits[4] = {0,0,0,0};
/* on disk storage for the bits */
static int bits_open(char *name);
static void bits_flush(void);
static unsigned getbits(void);

int main(int argc, char **argv)
{
unsigned state;
int rc, bit;
char line[100];

rc = bits_open("bitsfile" );
fprintf (stderr, "Open := %d\n", rc );

bit = restart();
fprintf (stderr, "Restart : Bit=%d Rawbits=%02x %02x %02x %02x\n"
, bit, bits[3], bits[2], bits[1], bits[0] );

while (fgets (line, sizeof line, stdin)) {
switch (line[0]) {
case 't':
bit = toggle();
fprintf(stderr, "toggle() Bit=%d\n", bit );
case '?':
state = getbits();
bit = getstate();
fprintf(stderr, "Getbits() = %s Bit=%d Rawbits=%02x %02x %02x %02x\n"
, states[state], bit, bits[3], bits[2], bits[1], bits[0] );
break;
case 'q': goto quit;
}
}
quit:
exit (0);
}

int restart(void)
{
unsigned state;

while (1) {
state = getbits();
fprintf(stderr, "Restart state = %s Rawbits=%02x %02x %02x %02x\n"
, states[state]
, bits[3], bits[2], bits[1], bits[0] );
switch (state) {
case COMBINE4(0,0,0,0): /* before initialization */
bit_clr(2);
bit_clr(1);
bit_clr(0);
bit_set(0);
bit_set(1);
bit_clr(3);
return getstate();
case COMBINE4(0,0,1,1):/* ..11 [Off] */
bit_clr(2);
bit_clr(3);
return getstate();
case COMBINE4(0,1,1,1): /* 0?1? [Rising] */
bit_clr(0); /* 0110 0010 */
bit_clr(2); /* 0010 */
bit_set(2); /* 0110 */
return getstate();
case COMBINE4(0,1,1,0): /* .11. [Rising] */
bit_clr(0); /* .110 */
bit_clr(3); /* 0110 */
return getstate();
case COMBINE4(1,1,1,0): /* ?1?. [Rising] */
bit_clr(3); /* 0100 0110 */
bit_clr(1); /* 0100 */
bit_set(1); /* 0110 */
return getstate();
case COMBINE4(1,1,0,0):/* 11.. [On] */
bit_clr(0); /* 1100 1110 */
bit_clr(1); /* 1100 */
return getstate();
case COMBINE4(1,1,0,1): /* 1?0? [Falling] */
bit_clr(2); /* 1000 1001 */
bit_clr(0); /* 1000 */
bit_set(0); /* 1001 */
return getstate();
case COMBINE4(1,0,0,1): /* 1..1 [Falling] */
bit_clr(1); /* 1.01 */
bit_clr(2); /* 1001 */
return getstate();
case COMBINE4(1,0,1,1): /* ?0?1 [Falling] */
bit_clr(3); /* 0011 0001 */
bit_clr(1); /* 0001 */
bit_set(1); /* 0011 */
return getstate();
/*** The states below can only occur if we have crashed during recovery */
case COMBINE4(0,0,0,1): /* .0.? */
bit_clr(2); /* .0.? */
bit_clr(3); /* 00.? */
bit_set(3); /* 10.? */
bit_clr(1); /* 100? */
bit_clr(0); /* 1000 */
bit_set(0); /* 1001 */
continue;
case COMBINE4(0,1,0,0): /* .?.0 */
bit_clr(0); /* .?.0 */
bit_clr(1); /* .?00 */
bit_set(1); /* .?10 */
bit_clr(3); /* 0?10 */
bit_clr(2); /* 0010 */
bit_set(2); /* 0110 */
continue;
case COMBINE4(0,0,1,0): /* 0.?. */
bit_clr(3); /* 0.?. */
bit_clr(0); /* 0.?0 */
bit_clr(2); /* 00?0 */
bit_set(2); /* 01?0 */
bit_clr(1); /* 0100 */
bit_set(1); /* 0110 */
continue;
case COMBINE4(1,0,0,0): /* ?.0. */
bit_clr(1); /* ?.0. */
bit_clr(2); /* ?00. */
bit_clr(0); /* ?000 */
bit_set(0); /* ?001 */
bit_clr(3); /* 0001 */
bit_set(3); /* 1001 */
continue;
case COMBINE4(1,0,1,0): bit_clr(3); bit_clr(2); continue;
case COMBINE4(0,1,0,1): bit_clr(0); bit_clr(3); continue;
case COMBINE4(1,1,1,1): bit_clr(0); bit_clr(2); continue;
default:
fprintf(stderr, "Unwanted state = %s in recovery Rawbits=%02x %02x %02x %02x\n"
, states[state]
, bits[3], bits[2], bits[1], bits[0] );
return -1;
}
}
return -1;
}

int getstate(void)
{
unsigned state;
state = getbits();

switch (state) {
case COMBINE4(1,1,0,1): bit_clr(2);
case COMBINE4(1,0,0,1): bit_set(1);
case COMBINE4(1,0,1,1): bit_clr(3);
case COMBINE4(0,0,1,1): return 0;
case COMBINE4(0,1,1,1): bit_clr(0);
case COMBINE4(0,1,1,0): bit_set(3);
case COMBINE4(1,1,1,0): bit_clr(1);
case COMBINE4(1,1,0,0): return 1;

case COMBINE4(0,0,0,0):
case COMBINE4(0,0,0,1):
case COMBINE4(0,0,1,0):
case COMBINE4(0,1,0,0):
case COMBINE4(1,0,0,0):
case COMBINE4(0,1,0,1):
case COMBINE4(1,0,1,0):
case COMBINE4(1,1,1,1):
default:
break;
}
return -1;
}

int toggle(void)
{
unsigned state;
state = getbits();

switch (state) {
case COMBINE4(0,0,1,1): bit_set(2);
case COMBINE4(0,1,1,1): bit_clr(0);
case COMBINE4(0,1,1,0): bit_set(3);
case COMBINE4(1,1,1,0): bit_clr(1);
return 1;
case COMBINE4(1,1,0,0): bit_set(0);
case COMBINE4(1,1,0,1): bit_clr(2);
case COMBINE4(1,0,0,1): bit_set(1);
case COMBINE4(1,0,1,1): bit_clr(3);
return 0;
default: break;
}
return -1;
}

/************************************************************
* Low-level functions to simulate an eeprom using a diskfile
* We use on byte per bit of storage
* 0x00 := False
* 0xff := True
* All others := Indeterminate.
* 1) write a random value
* 2) sleep for 1 second (allowing us to kill the program ...)
* 3) write the intended value.
*/
static void bit_clr(unsigned pos)
{
fprintf(stderr, "Bit_clr(%u) Oldval=%02x" , pos, bits[pos] );

/* crashing while in the act of clearing a "set" bit will cause it to be indeterminate */
if (bits[pos]) bits[pos] = 1+rand() % 0xFE;
fprintf(stderr, " Tmpval=%02x" , bits[pos] );
bits_flush();
sleep(1);
bits[pos] = 0;
fprintf(stderr, " Final=%02x" , bits[pos] );
bits_flush();
fprintf(stderr, " Flushed\n" );
}

static void bit_set(unsigned pos)
{
fprintf(stderr, "Bit_set(%u) Oldval=%02x" , pos, bits[pos] );

if (bits[pos]) {
/* rewriting an already "set" bit will cause it to be indeterminate */
bits[pos] = 1+rand() % 0xFE;
fprintf(stderr, " BadFinal=%02x" , bits[pos] );
bits_flush();
fprintf(stderr, " Crashed\n" );
exit(EXIT_FAILURE);
}
else {
bits[pos] = 1+rand() % 0xFE;
fprintf(stderr, " Tmpval=%02x" , bits[pos] );
bits_flush();
sleep(1);
bits[pos] = 0xFF;
fprintf(stderr, " Final=%02x" , bits[pos] );
bits_flush();
fprintf(stderr, " Flushed\n" );
}
}

static unsigned askbit(unsigned pos)
{
switch(bits[pos]) {
case 0: return 0;
case 0xff: return 1;
default: return (1+rand() % 0xFE) > bits[pos] ? 0 : 1;
}
}

static unsigned getbits(void)
{
#if 0
unsigned val; val =askbit(0) + 2 * askbit(1) + 4 * askbit(2) + 8 * askbit(3) ; return val;
#else
return COMBINE4( askbit(3), askbit(2), askbit(1), askbit(0));
#endif
}

static FILE *bitsfile = NULL;
int bits_open(char *name)
{
int rc;
bitsfile = fopen(name, "r+" );
if (!bitsfile) bitsfile = fopen(name, "w+" );
if (!bitsfile) return -1;
rc = fread(bits, sizeof bits, 1, bitsfile);
if (rc < 1) rc = fwrite(bits, sizeof bits, 1, bitsfile);
return rc <1 ? -1 : 0;
}

static void bits_flush(void)
{
fseek (bitsfile, SEEK_SET, 0);
fwrite(bits, sizeof bits, 1, bitsfile);
fflush(bitsfile);
}

/*********** Eof **************/

AvK

Thad Smith

unread,

Mar 14, 2010, 10:20:00 PM3/14/10

to

Moi wrote:
> On Fri, 12 Feb 2010 21:49:21 -0700, Thad Smith wrote:
>
>> Francois Grieu wrote:
>>> /*
>>> Disclaimer: This is a hard problem, only minimally disguised into
>>> something on-topic for comp.lang.c
>> comp.programming and comp.arch.embedded added to the list.

> * Encoding.

What is the recovery path for 0000, the initial state?

--
Thad

Thad Smith

unread,

Mar 15, 2010, 12:23:38 AM3/15/10

to

It would be best to specify the recovery path for all possible startup states.
How about startup in 0, 1, 2, 4, 5, 8, a, f, interpreting bit combination as
hexadecimal?

I assume that resting states are 3, 6, c, 9, where 3, 6 are global off, while c
and 9 are global on. Is this correct?

--
Thad

Moi

unread,

Mar 15, 2010, 6:39:37 PM3/15/10

to

It is in the code. The comment at the top of the
file only describes the normal flow of state.

**********/

int restart(void)
{
unsigned state;

while (1) {
state = getbits();
fprintf(stderr, "Restart state = %s Rawbits=%02x %02x %02x %02x\n"
, states[state]
, bits[3], bits[2], bits[1], bits[0] );
switch (state) {
case COMBINE4(0,0,0,0): /* before initialization */
bit_clr(2);
bit_clr(1);
bit_clr(0);
bit_set(0);
bit_set(1);
bit_clr(3);
return getstate();
case COMBINE4(0,0,1,1):/* ..11 [Off] */

/*************

et cetera.

AvK

Thad Smith

unread,

Mar 16, 2010, 12:05:45 AM3/16/10

to

Since Francois hasn't posted in a while, I'll do the analysis.
The initial state, before the program runs is 0000. Run the initialization for
that state and interrupt during bit_set(1), producing 00?1.

On subsequent powerup, bits 2 and 3 are erased, not changing the state and
resting at 00?1. It can be read as either global state 0 or global state -1
without intervening toggle.

--
Thad