80188 CPU instruction test (IEC 1508, EN 60601-1-4)

Jonas Svensson

Oct 7, 1998, 3:00:00 AM

Hi!
I'm working on an embedded system to be used in a medical device. As the
s/w + h/w will be safety critical, a number of requirements concerning
system performance have to be met (EN 60601-1-4, IEC 1508, etc.).
One of the required tests is to make sure that the CPU executes
correctly (as part of the built-in power-on self test), which involves
testing all registers, conditional jumps, etc., as well as the CPU
peripherals (80188 EX) such as the chip-select unit (CSU).
Surely I'm not the first to encounter this problem; it must have been
done several times before. So if anybody "out there" knows where I can
find source code for this type of test, public domain or commercial,
or knows of web resources that might be useful, please
mail me.
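
A portable C sketch of the known-answer pattern such a CPU test usually follows may help frame the question. It is only an illustration under stated assumptions: real register and flag tests on the 80188 would have to be written in assembly, since C gives no direct access to AX, BX or the flags, and the function name, failure bits and test vectors below are hypothetical. The operands are read through `volatile` so the compiler cannot compute the answers at build time.

```c
#include <stdint.h>

/* Operands read through volatile so the compiler cannot fold the
   arithmetic into constants at build time. */
static volatile uint16_t op_a = 0xA5A5u;
static volatile uint16_t op_b = 0x5A5Au;

/* Hypothetical failure flags, one bit per test. */
enum {
    CPU_FAIL_ADD    = 1u << 0,
    CPU_FAIL_XOR    = 1u << 1,
    CPU_FAIL_SHIFT  = 1u << 2,
    CPU_FAIL_BRANCH = 1u << 3
};

/* Runs a few known-answer tests. Returns 0 if all pass, otherwise a
   bitmask of the tests that produced a wrong answer. */
unsigned cpu_known_answer_test(void)
{
    unsigned fail = 0;
    uint16_t a = op_a, b = op_b;

    /* Complementary patterns: 0xA5A5 + 0x5A5A = 0xFFFF, no carries. */
    if ((uint16_t)(a + b) != 0xFFFFu) fail |= CPU_FAIL_ADD;

    /* XOR of complementary patterns must set every bit. */
    if ((uint16_t)(a ^ b) != 0xFFFFu) fail |= CPU_FAIL_XOR;

    /* 0xA5A5 << 1 = 0x14B4A, truncated to 16 bits = 0x4B4A. */
    if ((uint16_t)(a << 1) != 0x4B4Au) fail |= CPU_FAIL_SHIFT;

    /* Exercise both outcomes of a comparison: a > b must be taken,
       b > a must not be. */
    int taken = 0, wrongly_taken = 0;
    if (a > b) taken = 1;
    if (b > a) wrongly_taken = 1;
    if (taken != 1 || wrongly_taken != 0) fail |= CPU_FAIL_BRANCH;

    return fail;
}
```

A nonzero return means at least one known answer came out wrong; in a safety system the caller would treat that as fatal rather than try to continue.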

Thanks!
Jonas Svensson
jona...@hotmail.com

Guy Macon

Oct 7, 1998, 3:00:00 AM

<RANT=ON>
Nope. If you want help on Usenet, you have to participate in the
newsgroup at least enough to read it. Email answers only help you,
for which I would charge my normal consulting fee. Posted answers
help others - my pay for a posted answer is all of the other great
posted answers. Join the discussion online. You will be glad.
<RANT=OFF>

The basic problem with such a test is that you are asking a CPU
that may be broken to test itself. What if it's broken in such a way
that it is not telling the truth about its own health? If you really
need to be sure, another processor can check the first, but a quick
estimate of the failure rate of the extra solder joints vs. the chance
of finding a bad CPU will cure you of that idea real quick.

That being said, self tests do catch some problems, and I like them
for safety related systems if the following conditions apply:

[1] Not working at all is better than operating poorly (example:
the pump that meters painkillers on demand to medical patients)

[2] Your self test code hangs on any error. You can't trust anything
about that processor once you see a failure.

[3] You have a test fixture that burns in and tests the CPUs over their
full temperature range before they ever run the self test code.
This fixture should use independent processors to handle errors.

Another consideration: how will you test your code? You can cause
peripherals to fail, but how do you get the code that tests a certain
register to ever take the "fail" branch under a real failure situation?

How complete do you want your test to be? Let's look at RAM tests: a
stuck bit can be found with a couple of simple test patterns (all 1s,
all 0s), but what if setting the 8 bits around a bit to 1 changes that
bit to 1 as well? What if this happens 0.1% of the time? Do you
know which bits are next to each other inside your RAM? The old
"11111111/00000000/10101010/01010101" pattern doesn't even catch
address 0 shorted to address 5, or data 1 shorted to data 3. (Ask me
about walking 1's and walking 0's sometime...) And that's just RAM,
not your core CPU logic.
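
The walking-1/walking-0 idea can be sketched in C. This is a hypothetical illustration, not Guy's code: `ram_test_data_bus` walks a single 1 (and its complement, a single 0) across one byte to find data lines stuck or shorted together, and `ram_test_address_bus` writes markers at power-of-two offsets to find address lines stuck high, stuck low, or shorted to each other. It assumes an 8-bit data path (as on the 80188), a region whose size is a power of two, and that the region's contents may be destroyed. Neither test finds arbitrary cell-to-cell coupling such as the pattern-sensitive faults described above; those need heavier algorithms (e.g. march tests).

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-1 / walking-0 data-bus test on one location. A stuck or
   bridged data line makes at least one pattern read back wrong.
   Returns 0 on success, otherwise the first failing walking-1 pattern. */
uint8_t ram_test_data_bus(volatile uint8_t *addr)
{
    for (uint8_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;                      /* walking 1 */
        if (*addr != pattern)
            return pattern;
        *addr = (uint8_t)~pattern;            /* walking 0 */
        if (*addr != (uint8_t)~pattern)
            return pattern;
    }
    return 0;
}

/* Address-bus test over n bytes (n a power of two).
   Returns 0 if no address-line fault was seen, nonzero otherwise. */
int ram_test_address_bus(volatile uint8_t *base, size_t n)
{
    const uint8_t marker = 0xAA, anti = 0x55;
    size_t off, test;

    /* Marker at offset 0 and at every power-of-two offset. */
    base[0] = marker;
    for (off = 1; off < n; off <<= 1)
        base[off] = marker;

    /* Address line stuck high: writing offset 0 must not disturb
       any power-of-two offset. */
    base[0] = anti;
    for (off = 1; off < n; off <<= 1)
        if (base[off] != marker)
            return 1;
    base[0] = marker;

    /* Stuck low or shorted lines: writing one power-of-two offset
       must not disturb offset 0 or any other power-of-two offset. */
    for (test = 1; test < n; test <<= 1) {
        base[test] = anti;
        if (base[0] != marker)
            return 1;
        for (off = 1; off < n; off <<= 1)
            if (off != test && base[off] != marker)
                return 1;
        base[test] = marker;
    }
    return 0;
}
```

On a host both functions pass against an ordinary array; on the target, `base` would point at the RAM region under test.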

Can you make your hardware so that even a malicious programmer can't
hurt someone? Current-limit the electricity. Flow-limit the chemicals.
Force-limit the mechanicals. Key all of the connectors. Autosense
instead of relying on a human to set a switch or enter a value.

Look hard at your hardware. Can it be designed to be safe despite
hardware failures? Can you make it so that failures are easy to
detect? There is a change machine in my lunchroom with a light
labeled "out of order: no change available". Really nice if it
runs out of coins (it never has: it gets filled when the Coke
machine goes empty), but not so good if the plug gets kicked out
of the wall or the light bulb burns out.

What if you did a halfhearted job on the CPU test code and spent
the time you saved on extra debugging? What is the chance of
a software bug vs. a bad microprocessor? How many of each kind
get past the existing set of tests? (You DO have fully automated
regression testing of all code, RIGHT???)

Jonas Svensson

Oct 8, 1998, 3:00:00 AM

Hi again!
<snip>

> >One of the required tests is to make sure that the cpu can execute

> >correctly.

> >Surely I'm not the first to encounter this problem, it must have been
> >done several times before, so if anybody "out there" knows where I can
> >find source code for these type of tests, public domain or commercial,
> >or knows the locations of www-resources that might be useful, please
> >mail me.
>
> <RANT=ON>
> Nope. If you want help on Usenet, you have to participate in the
> newsgroup at least enough to read it. Email answers only help you,
> for which I would charge my normal consulting fee. Posted answers
> help others - my pay for a posted answer is all of the other great
> posted answers. Join the discussion online. You will be glad.
> <RANT=OFF>

Sorry for being selfish; maybe this discussion interests somebody else
too?

> The basic problem with such a test is that you are asking a CPU
> that may be broken to test itself. What if it's broken in such a way
> that it is not telling the truth about its own health? If you really
> need to be sure, another processor can check the first, but a quick
> estimate of the failure rate of the extra solder joints vs. the chance
> of finding a bad CPU will cure you of that idea real quick.

Actually, we already have a main processor that handles the control part
of the machine; however, the authorities demand (Medical Device
Directive, MDD) that we fulfil the EN 60601-1-4 standard and its
substandards. In the particular substandard that applies to our product,
an independent automated "protective system" is required that monitors
specific parameters of the "control system": blood pressures, pump
speeds and temperatures. As I see it, there's no alternative. ;-(
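
A minimal sketch of the core of such a protective system, assuming the monitored values have already been acquired and converted to engineering units: each parameter is compared with its limits, and any excursion returns "not safe". The type, function and field names are hypothetical, and real limits would come from the risk analysis, not from this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* One quantity from the control system that the protective system
   watches (e.g. a pressure, pump speed or temperature). */
typedef struct {
    const char *name;
    int value;   /* latest reading */
    int min;     /* lowest acceptable value */
    int max;     /* highest acceptable value */
} monitored_param_t;

/* Returns true only if every parameter lies within its limits.
   A false result is meant to drive the machine to its
   patient safe state. */
bool protective_check(const monitored_param_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i].value < p[i].min || p[i].value > p[i].max)
            return false;
    return true;
}
```

Keeping this logic this small is what makes near-complete testing of the protective side plausible.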

> That being said, self tests do catch some problems, and I like them
> for safety related systems if the following conditions apply:
>
> [1] Not working at all is better than operating poorly (Example:
> the pump that meters pain killers on demand to medical patients)

If the "protective system" or the "control system" is not working
correctly the machine is designed to (and hopefully will) enter a
"patient safe state".

> [2] Your self test code hangs on any error. You can't trust anything
> about that processor once you see a failure.

Of course.

> [3] You have a test fixture that burns in and tests the CPUs over their
> full temperature range before they ever run the self test code.
> This fixture should use independent processors to handle errors.

Why? Wouldn't this "age" the CPUs, making a real error more probable
during their total expected lifetime?

> Another consideration: how will you test your code? You can cause
> peripherals to fail, but how do you get the code that tests a certain
> register to ever take the "fail" branch under a real failure situation?

The nearest thing that we can simulate is to manipulate register bits in
an emulator.
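
Besides flipping register bits in an emulator, another common way to get the fail branch executed is to route the known answer through a fault-injection seam that only a test build touches. A hypothetical sketch in C (`fault_inject_mask` and `kat_add` are illustrative names): the mask is XORed into the computed result, so a harness can force a mismatch and confirm the failure path is actually taken. It proves the checking code works, not that the check would catch any particular silicon fault.

```c
#include <stdint.h>

/* Fault-injection seam: always 0 in production. A test build sets
   bits here to corrupt the computed result and force the fail branch. */
volatile uint16_t fault_inject_mask = 0;

/* Operands read through volatile so the sum is computed at run time. */
static volatile uint16_t kat_a = 0x1234u, kat_b = 0x4321u;

/* Known-answer add test: returns 1 if the result is correct, 0 if not. */
int kat_add(void)
{
    uint16_t result = (uint16_t)(kat_a + kat_b);   /* expect 0x5555 */
    result ^= fault_inject_mask;                   /* no effect when 0 */
    return result == 0x5555u;
}
```

In a release build the seam could be compiled out with a preprocessor switch so it cannot be set by accident.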

> How complete do you want your test to be? Let's look at RAM tests: a
> stuck bit can be found with a couple of simple test patterns (All 1,
> All 0) but what if setting the 8 bits around a bit to 1 changes that
> bit to 1 as well? What if this happens 0.1% of the time? Do you
> know which bits are next to each other inside your RAM? The old
> "11111111/00000000/10101010/01010101" pattern doesn't even catch
> address 0 shorted to address 5 or data 1 shorted to data 3. (Ask me
> about walking 1's and walking 0's sometime...) and that's just RAM,
> not your core CPU logic.

Actually, walking 1's/0's tests are implemented in our product to achieve
"sufficient test coverage" (as defined in the annex of IEC 1508). The
Flash PROMs are checked with CRC-32.
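
For reference, a compact bitwise CRC-32 of the common reflected variant (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) fits easily in a boot-time check. Whether the product uses this exact variant is not stated, so treat the parameters as an assumption. It is written to stay correct on 16-bit compilers, where `int` is only 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, init and final XOR
   0xFFFFFFFF. Slow but tiny: suits a one-off check of a Flash image. */
uint32_t crc32_flash(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this variant is 0xCBF43926 for the ASCII string "123456789", which is a cheap way to validate the routine on the target. The expected CRC of the image itself would normally be computed at build time and stored at a known location (an assumption about the build flow).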

> Can you make your hardware so that even a malicious programmer can't
> hurt someone? Current limit the electricity. Flow limit the chemicals.
> force limit the mechanicals. Key all of the connectors. Autosense
> instead of relying on a human to set a switch or enter a value.
>
> Look hard at your hardware. Can it be designed to be safe despite
> hardware failures? Can you make it so that failures are easy to
> detect? There is a change machine in my lunchroom with a light
> labeled "out of order: no change available". Really nice if it
> runs out of coins (it never has: it gets filled when the coke
> machine goes empty) but not so good if the plug gets kicked out
> of the wall or the light bulb burns out.
>
> What if you did a halfhearted job on the CPU test code and spent
> the time you save on extra debugging? What is the chance of
> a software bug vs. a bad microprocessor? How many of each kind
> gets past the existing set of tests (You DO have fully automated
> regression testing of all code, RIGHT???)

The whole idea (as I see it) is to keep the "protective system" code
simple enough that it can be tested as close to 100% as is practically
possible. Then, if we can convince our external test house that the
"protective system" will always detect a patient-threatening failure of
the "control system", and that the possibility of an undetected failure
of the "protective system" is so remote that it is negligible, we'll be able
to build a much more complicated "control system" that might (read:
will...) have undetected s/w + h/w bugs (because testing 100% of the
(control + protective) code so that all possible states are fully
covered is not a dream - it's a nightmare!), and still be reasonably
sure that the machine won't kill anybody.
(Just for the record: yes, there is a battery backup, and if the
batteries also fail, a small 9V cell will sound a really annoying buzzer
that will make even dead persons want earplugs, because the processors
no longer kick the safety circuit.)
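
One way to make the "no kick, alarm sounds" behaviour fail-safe is to require a changing signal rather than a level, so that a port pin frozen high or low by a crash cannot keep the circuit satisfied. The C sketch below assumes a hypothetical `set_kick_pin()` driver and a health flag computed elsewhere; the external circuit is assumed to be something like a retriggerable timer that only stays armed while it sees edges.

```c
#include <stdbool.h>

/* Hypothetical output driver; on the target this would write a port pin. */
static int kick_pin_level = 0;
static void set_kick_pin(int level) { kick_pin_level = level; }

/* Called once per control cycle. Each healthy cycle produces an edge
   on the kick line; an unhealthy cycle (or a hung CPU) produces none,
   so the external timer expires and the alarm path takes over. */
void safety_kick(bool healthy)
{
    static int phase = 0;
    if (!healthy)
        return;              /* stop toggling: let the circuit time out */
    phase = !phase;
    set_kick_pin(phase);
}
```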

I'm fully aware that there is no such thing as a bug-free product. I'm
also convinced that the risk of an undetected h/w failure of the
processor (even w/o any power-on self test of the CPU at all) is
probably smaller than the risk connected w/ adding complexity to the
code (such as a POST CPU test). But life isn't always as easy as we
engineers want it to be. Our test house says they want a CPU test to
approve the product and refers to the standards (IEC 1508); my boss tells
me we must have their approval to sell the product. (He actually says
that if he doesn't sell any machines, my salary will stop
coming...) Life's hard!
So the question remains: hasn't anybody already done this (even
commercially)?
All you other medical device / safety-critical product manufacturers, how
do you solve this problem? Do you test, or does everybody sweep this
problem under the proverbial rug?

Regards,
Jonas

msi...@tefbbs.com

Oct 8, 1998, 3:00:00 AM

I have done this sort of test on an 8751. I wrote it for an aircraft
control system. I thought it was stupid. The fault coverage was 10% -
maybe.

I have been working on another aircraft system with a monitor
processor for every control CPU. We figure there will be very few
catastrophic failures because the MTBF of the hardware has been lowered
so much that the plane will hardly ever be dispatched. Of course we
will make a lot of $$$$$$$$$ on test and repairs. All per customer
request.

As I like to say: " I'm not smart enough to do something that
stupid."

Simon
=================================================================
Jonas Svensson <jona...@hotmail.com> wrote:

Design Your Own MicroProcessor(tm) http://www/tefbbs.com/spacetime/index.htm

Guy Macon

Oct 8, 1998, 3:00:00 AM

In article <361D2040...@hotmail.com>, jona...@hotmail.com wrote:

>Guy Macon wrote:

>> [3] You have a test fixture that burns in and tests the CPUs over their
>> full temperature range before they ever run the self test code.
>> This fixture should use independent processors to handle errors.
>
>Why? Wouldn't this "age" the CPUs, making a real error more probable
>during their total expected lifetime?

Good question! The failure-rate curve of electronics is shaped like a bathtub.
Many failures happen very early, and the failure rate falls exponentially.
Forty to 80 hours at varying temperatures will put your hardware out on the
flat and low portion of the curve before you test it.
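
A simple way to put numbers on this, assuming (as a modelling choice, not a measured curve) that the early part of the bathtub is a constant floor \(\lambda_0\) plus an infant-mortality term that decays exponentially with time constant \(\tau\), and that burn-in lasts \(T_b\):

```latex
\begin{aligned}
h(t) &= \lambda_0 + \lambda_1 e^{-t/\tau}
  && \text{hazard: constant floor plus decaying infant mortality}\\
\int_0^{T_b} \lambda_1 e^{-t/\tau}\,dt &= \lambda_1 \tau \left(1 - e^{-T_b/\tau}\right)
  && \text{infant-mortality hazard screened out during burn-in}\\
h(T_b) - \lambda_0 &= \lambda_1 e^{-T_b/\tau}
  && \text{excess hazard left when the unit enters service}
\end{aligned}
```

On the "aging" question, burn-in spends only \(\lambda_0 T_b\) of constant-rate hazard, which is small when \(T_b\) is tens of hours and the wear-out region lies years away. This sketch ignores the acceleration that elevated temperature adds.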

>So the question remains: Hasn't anybody already done this (even
>commercially)?

I can think of one vendor who makes self tests for PCs that could probably
be adapted to an 80188. Go to http://www.award.com/ and look at their
POST CARD and the "Embedded & Internet Devices" section under AwardBIOS.

You might try contacting the folks at http://www.datadepo.com as well.

Paul E. Bennett

Oct 8, 1998, 3:00:00 AM

In article <361BD91C...@hotmail.com>
jonas.s...@mbox2.swipnet.se "Jonas Svensson" writes:

> Hi!
> I'm working on an embedded system to be used in a medical device. As the
> s/w + h/w will be safety critical a number of requirements concerning
> system performance have to be met (EN 601-1-4 and IEC1508 etc...).

> One of the required tests is to make sure that the cpu can execute

> correctly (as a part of the built in power-on self test), which involves
> testing all registers, conditional jumps etc. and the cpu peripherals
> (80188 EX) such as the CSU etc.

> Surely I'm not the first to encounter this problem, it must have been
> done several times before, so if anybody "out there" knows where I can
> find source code for these type of tests, public domain or commercial,
> or knows the locations of www-resources that might be useful, please
> mail me.

As someone who has completed a software certification exercise on a
"life-support" medical device, I can understand your concerns. One of my
comments to the software authors related to relying on just the "Power On
System Testing" (POST) for the integrity checks. I much prefer to see a
continuous test being performed throughout the operation of the equipment.

There are several ways of achieving continuous performance integrity testing
but they should be determined from a full risk assessment of the product in
relation to all its operating modes. I will not expound on that here lest I
put all High Integrity Systems Developers out of business (including myself).
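
As one concrete (and hypothetical) example of continuous rather than power-on-only testing, a ROM CRC can be spread over the main loop so that each call checks only a few bytes and the loop never stalls. The names, chunk size and CRC variant (reflected CRC-32, polynomial 0xEDB88320) are assumptions; the acceptable time to complete one full pass is exactly the kind of figure that should come out of the risk assessment.

```c
#include <stddef.h>
#include <stdint.h>

/* State for a background ROM check that runs a slice per call. */
typedef struct {
    const uint8_t *image;
    size_t len;
    size_t pos;
    uint32_t crc;
} rom_check_t;

void rom_check_start(rom_check_t *c, const uint8_t *image, size_t len)
{
    c->image = image;
    c->len = len;
    c->pos = 0;
    c->crc = 0xFFFFFFFFu;
}

/* Processes up to `chunk` bytes. Returns -1 while a pass is still in
   progress; at the end of a pass returns 1 if the CRC matched
   `expected`, 0 if not, and automatically starts the next pass. */
int rom_check_step(rom_check_t *c, size_t chunk, uint32_t expected)
{
    size_t end = c->pos + chunk;
    if (end > c->len)
        end = c->len;
    for (; c->pos < end; c->pos++) {
        c->crc ^= c->image[c->pos];
        for (int bit = 0; bit < 8; bit++)
            c->crc = (c->crc & 1u) ? (c->crc >> 1) ^ 0xEDB88320u : c->crc >> 1;
    }
    if (c->pos < c->len)
        return -1;
    uint32_t final_crc = c->crc ^ 0xFFFFFFFFu;
    rom_check_start(c, c->image, c->len);   /* begin next pass */
    return final_crc == expected;
}
```

A result of 0 would be handled like any other detected failure (safe state, stop kicking the watchdog).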

Incidentally, noting your use of the 80188 as the CPU, I trust that is not
the version with writable control store. Whilst it may be a useful feature
at times, you will need to consider the risks of accidentally jumping to
an uninitialised control store (I have seen that happen on one system, with
quite disastrous consequences; fortunately during system integration testing).

We have recently covered an enormous amount of ground on memory testing,
watchdogs and general integrity monitoring within this and the real-time
groups and a browse with DejaNews would be a worthwhile activity at this
point.

Appropriate hardware and software design is only part of creating a high
integrity system. You need to have a sound development process running in
which you can trace all problems to their resolution and in which the
system configuration is properly established.

If you would like specific help with developing risk reduction strategies
for your system I can offer our consultancy services. We are used to doing
systems development remotely via the internet and have strategies for managing
such projects with globally spread development teams.

--
Paul E. Bennett ................... <p...@transcontech.co.uk>
Transport Control Technology Ltd. <http://www.tcontec.demon.co.uk/>
+44 (0)117-9499861 <enq...@transcontech.co.uk>
Going Forth Safely

Guy Macon

Oct 9, 1998, 3:00:00 AM

In article <361e1...@139.134.5.33>, sad...@bigpond.com wrote:
>
>I have read, this type of testing occurring in the onboard computers in the
>space shuttle. There every decision by a processor is confirmed by two
>others (so I'm told). Everything is done on best out of three. If one of the
>processors disagrees too often with the others then it is flagged for human
>inspection. This may be a load of gibberish but the principle sounds right.

The Space Shuttle has five computers taking a vote. The reason is
that if the number of computers drops to two, the mission must be scrubbed,
because there is no longer a failsafe computer (one in which no single
failure will stop it from working). The shuttle can have up to two failed
computers and still continue with three-way redundancy. They also put a
tested and working sixth computer in an out-of-the-way storage closet.
Not really needed, but the weight and size were small enough that they
didn't mind, and it might come in handy some day.

The Shuttle computer also has a reduced functionality flight program
available that is much smaller and simpler than the usual program.
All it does is land them from any point in the mission. It was, of
course, written by a programming team in another city with no contact
with or knowledge of the first programming team.

Saddle

Oct 10, 1998, 3:00:00 AM

I have read that this type of testing occurs in the onboard computers of the
space shuttle. There, every decision by a processor is confirmed by two
others (so I'm told). Everything is done on a best-out-of-three basis. If one
of the processors disagrees with the others too often, it is flagged for human
inspection. This may be a load of gibberish, but the principle sounds right.
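
The best-out-of-three scheme described here can be sketched as a 2-out-of-3 voter that also counts how often each channel is outvoted. This is a hypothetical C illustration, not the Shuttle's actual software: it assumes the three channels produce discrete values that should match exactly (analogue readings would need a tolerance), and the flagging threshold `DISAGREE_LIMIT` is a placeholder.

```c
#include <stdint.h>

#define DISAGREE_LIMIT 3u           /* placeholder flagging threshold */
static unsigned disagree_count[3];  /* times each channel was outvoted */

/* Majority vote over three redundant results. On success sets *ok = 1,
   returns the majority value and charges a disagreement to any channel
   that differed. If all three differ, sets *ok = 0 (no majority). */
uint16_t vote_2oo3(const uint16_t r[3], int *ok)
{
    uint16_t m;
    if (r[0] == r[1] || r[0] == r[2])
        m = r[0];
    else if (r[1] == r[2])
        m = r[1];
    else {
        *ok = 0;
        return 0;
    }
    for (int i = 0; i < 3; i++)
        if (r[i] != m)
            disagree_count[i]++;
    *ok = 1;
    return m;
}

/* Nonzero once channel i has disagreed often enough to be flagged
   for human inspection. */
int channel_flagged(int i)
{
    return disagree_count[i] >= DISAGREE_LIMIT;
}
```

A real design would probably also age the counters, so that a handful of transients spread over years does not flag a healthy channel.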

If you need to test the processor for internal faults, then I would suggest
testing it with another one. Let the processor go through small routines to
test the maths etc. and place the answers in shared memory (or pump them
into a FIFO). Use the second processor to compare the test results with what
should have occurred. Then let the processor continue to the main function of
the device, and maybe require periodic testing of certain functions as an
improved watchdog service. Make the main system depend on the two agreeing
that everything is OK, or not run at all.
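
A sketch of the monitor side of that arrangement, assuming a hypothetical mailbox in shared memory that the CPU under test fills in after each test routine. The sequence number lets the monitor tell a fresh report from a stale one, so a hung processor is caught as well as a wrong answer. The layout, names and expected values are illustrative only, `monitor_check` is assumed to be called once per reporting period, and a real design would also have to guard against reading a half-written report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox written by the CPU under test. */
typedef struct {
    uint8_t  test_id;    /* which test routine ran */
    uint8_t  sequence;   /* incremented on every new report */
    uint16_t result;     /* answer the routine computed */
} self_test_report_t;

/* Monitor's own table of expected answers, indexed by test_id. */
static const uint16_t expected_result[] = { 0x5555u, 0xFFFFu, 0x4B4Au };
#define N_TESTS (sizeof expected_result / sizeof expected_result[0])

/* Returns 1 if the report is fresh and correct, 0 otherwise. A 0 means
   the monitor withholds its permission for the system to run. */
int monitor_check(const volatile self_test_report_t *rep, uint8_t *last_seq)
{
    uint8_t id = rep->test_id;
    uint8_t seq = rep->sequence;
    uint16_t result = rep->result;

    if (seq == *last_seq)
        return 0;                    /* no new report: CPU may be hung */
    *last_seq = seq;
    if ((size_t)id >= N_TESTS)
        return 0;                    /* unknown test id */
    return result == expected_result[id];
}
```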

Hope this helps,

Saddle (In the land of OZ)

Jonas Svensson wrote in message <361BD91C...@hotmail.com>...

>Hi!
>I'm working on an embedded system to be used in a medical device. As the
>s/w + h/w will be safety critical a number of requirements concerning
>system performance have to be met (EN 601-1-4 and IEC1508 etc...).
>One of the required tests is to make sure that the cpu can execute
>correctly (as a part of the built in power-on self test), which involves
>testing all registers, conditional jumps etc. and the cpu peripherals
>(80188 EX) such as the CSU etc.
>Surely I'm not the first to encounter this problem, it must have been
>done several times before, so if anybody "out there" knows where I can
>find source code for these type of tests, public domain or commercial,
>or knows the locations of www-resources that might be useful, please
>mail me.
>

>Thanks!
>Jonas Svensson
>jona...@hotmail.com
>