Machine Check Exception

martin brook

Jan 17, 2002, 7:14:17 PM

Hi,

I'm having problems with my application getting random machine check
exceptions (0x200). The exception address reported does not correspond to
any code which may cause a machine check. I understand that the exception
reporting can be delayed, which gives no clue as to where the exception is
occurring.

Has anyone got a method of debugging these exceptions, or a way of turning
the machine check off?

My app is doing a lot of serial I/O on ports 0/1, which are set to
115200 baud.

TIA martin

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.313 / Virus Database: 174 - Release Date: 1/3/02

Scott Johnson

Jan 17, 2002, 7:28:52 PM

In article <a27pcp$pv8$1...@knossos.btinternet.com>,

martin brook <vgr...@btinternet.com> wrote:
>Hi,
>
>I'm having problems with my application getting random machine check
>exceptions (0x200). The exception address reported does not correspond to
>any code which may cause a machine check. I understand that the exception
>reporting can be delayed, which gives no clue as to where the exception is
>occurring.

Quite likely you have a stray pointer somewhere in your code. Have you
done a stack trace of the offending task to see where the exception is
occurring?

>Has anyone got a method of debugging these exceptions, or a way of turning
>the machine check off?

In hardware, you cannot "turn off" a machine check. (Well, you can--but
the alternative to enabling machine check exceptions is to halt the
processor when a machine check occurs.)

You can try to disable the exception handler in VxWorks, but that is a BAD
idea. The machine check is telling you that your code is buggy, and that
you had better go fix it. (Or else your hardware is buggy.)

Attempting to ignore it will only lead to worse pain.

>My app is doing a lot of serial I/O on ports 0/1, which are set to
>115200 baud.

I don't know if this would cause the problem. What are you using for your
serial port hardware?

(BTW, since you say "machine check" rather than "bus error", I am assuming
you have a PowerPC of some sort. If not, some of the above comments might
not apply.)

e_S

>TIA martin

--
------------------------------------------------------------------------
engineer_scotty (no, not that one) -- consumer of fine ales and lagers
some days you're the fire hydrant, some days you're the dog | go blazers
no small furry creatures were harmed in the creation of this .signature

Gary M

Jan 18, 2002, 1:48:40 AM

I agree with Scott: attempting to disable the machine check exception will
only postpone death.

Have you eliminated hardware issues, such as bus transaction aborts, RAM
refresh problems, or bad configuration of SIU registers?

"martin brook" <vgr...@btinternet.com> wrote in message
news:a27pcp$pv8$1...@knossos.btinternet.com...

John

Jan 18, 2002, 3:48:22 AM

Hello,

Which type of PPC are you using, and what version of VxWorks? Also,
what else is running on your target at the time, or is it just the
serial output? In particular, are you using things like watchdogs or
POSIX timers?

Rgds,

John...

"martin brook" <vgr...@btinternet.com> wrote in message news:<a27pcp$pv8$1...@knossos.btinternet.com>...

kannan

Jan 18, 2002, 7:03:07 AM

Hi,

Your code is probably either reading from a memory area that does not
exist or storing a value to one. This can lead to a machine check
exception. You can hack around the machine check exception by using
excVecSet to install your own C routine, and inside your own handler
you can increment the PC (program counter). But this is the worst thing
to do. Instead, verify your code.

You can go step by step. First, inside your handler, check the taskName
and taskId of the task that raises the exception. Confirm whether the
bug is in a single task or whether many tasks are raising these
exceptions, so that you can isolate that code alone.

Inside your own handler, do a taskRestart for that particular task and
check whether it raises exceptions again frequently. If so, examine the
task with a stack trace to see exactly where the exception occurs, and
handle it accordingly.
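As a rough illustration of the "single task or many tasks" check, here is a minimal sketch of a per-task exception tally in plain C. It is not code from this thread. The names excTallyRecord and excTallyDistinct, the table size, and the use of a plain int for the task ID are all assumptions made for the example. On VxWorks the record function could be called from an exception hook (for instance one installed with excHookAdd), but that wiring is not shown here.

```c
#include <stddef.h>

#define MAX_TRACKED_TASKS 16   /* assumption: small, fixed number of tasks */

/* One row per task that has raised an exception. */
typedef struct {
    int      taskId;   /* task identifier as passed to the handler */
    unsigned count;    /* exceptions seen from this task so far */
} ExcTally;

static ExcTally excTable[MAX_TRACKED_TASKS];
static size_t   excTasks;      /* number of rows in use */

/* Record one exception for taskId. Returns that task's running total,
 * or 0 if the table is full and the task could not be recorded. */
unsigned excTallyRecord(int taskId)
{
    size_t i;

    for (i = 0; i < excTasks; i++) {
        if (excTable[i].taskId == taskId)
            return ++excTable[i].count;
    }
    if (excTasks == MAX_TRACKED_TASKS)
        return 0;

    excTable[excTasks].taskId = taskId;
    excTable[excTasks].count  = 1;
    excTasks++;
    return 1;
}

/* Number of distinct tasks that have raised an exception so far. */
size_t excTallyDistinct(void)
{
    return excTasks;
}
```

If excTallyDistinct() stays at 1 while one count keeps climbing, the fault is probably local to that task. If many tasks show up with low counts, a hardware or system-wide cause looks more likely.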

kannan

"Gary M" <com.vertical@garym> wrote in message news:<czP18.2686$Lj2....@rwcrnsc51.ops.asp.att.net>...

Yves

Jan 20, 2002, 9:00:07 PM

Hi,

We are having a similar problem with our system. A machine check
exception is happening ... in a streaming method. It runs through the
code on a number of occasions, then one time it produces an exception.
This problem comes and goes with re-compiles... We have created a ring
buffer with which all tasks check in on a task switch, to try and
identify the task causing the problem, but have yet to succeed...
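For readers who want to try the same approach, a task-switch ring buffer of this kind might look roughly like the sketch below. This is not the poster's code. The names traceSwitch and traceGet, the depth of 64, and the use of plain ints for task IDs are assumptions for illustration. On VxWorks the recording function would typically be called from a task switch hook (for example one registered with taskSwitchHookAdd), which is not shown.

```c
#include <stdint.h>

#define TRACE_DEPTH 64u          /* assumption: must be a power of two */

/* One entry per task switch. */
typedef struct {
    uint32_t seq;       /* running switch number, starting at 0 */
    int      oldTask;   /* task being switched out */
    int      newTask;   /* task being switched in */
} SwitchRecord;

static SwitchRecord traceBuf[TRACE_DEPTH];
static uint32_t     traceSeq;    /* total number of switches recorded */

/* Call on every task switch. Overwrites the oldest record when full. */
void traceSwitch(int oldTask, int newTask)
{
    SwitchRecord *r = &traceBuf[traceSeq & (TRACE_DEPTH - 1u)];

    r->seq     = traceSeq;
    r->oldTask = oldTask;
    r->newTask = newTask;
    traceSeq++;
}

/* Fetch the record from 'back' switches ago (0 = most recent).
 * Returns 1 on success, 0 if that record was never written or
 * has already been overwritten. */
int traceGet(uint32_t back, SwitchRecord *out)
{
    if (back >= traceSeq || back >= TRACE_DEPTH)
        return 0;

    *out = traceBuf[(traceSeq - 1u - back) & (TRACE_DEPTH - 1u)];
    return 1;
}
```

After a crash, walking traceGet(0), traceGet(1), ... from the debugger or an exception hook gives the last few task switches. Keep in mind that a delayed machine check may be reported in a task that ran after the one that actually caused it.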

The machine check exception can be caused by parity errors, invalid
instructions, and some bus errors (I don't have my PPC manual with me).

Any suggestions on finding (what I assumed to be) the stray pointer
would be welcomed at this point.

PPC 750, Tornado II (with TCP2) on NT. We are using watchdogs.

thanks

Yves

Gary M

Jan 21, 2002, 1:44:26 AM

On the MPC8xx family at least, machine checks will also happen if you
reference a physical address for which the memory controller (not the MMU)
is configured but the target responds incorrectly or is unresponsive at that
particular address. Additionally, bus targets which provide their own TA
(Transfer Acknowledge) to the core but fail to do so before the core times
out will cause a machine check.

"Yves" <yves.bou...@sympatico.ca> wrote in message
news:3C4B7627...@sympatico.ca...

John

Jan 21, 2002, 3:47:53 AM

Hello,

> the machine check exception can be caused by parity, invalid instruction
> and some bus errors (I don't have my PPC manual with me).

Could be a stray pointer, but some other things to check on:

1) Stack sizes for all your tasks; make sure none have overflowed.

2) Make sure that you set VX_FP_TASK when needed. You might need it
for other cases than just FP when using PPC - the GNU compiler for PPC
has some "interesting" optimisations that cause it to borrow FP regs!
Search the archives here for info on this, or just set VX_FP_TASK on
all tasks and see if it magically fixes the problem.

3) Check your watchdog routines carefully; they are interrupt handlers
and must follow the rules for ISRs.

Those are my general guidelines for trapping these sorts of things.
The other thing to look at is the address that caused the crash - is
it an address that corresponds to something else?
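For point 1, a stack high-water check can be sketched in a few lines of portable C. The idea is to paint the stack with a known byte before the task runs, then count how many painted bytes survive at the far end. The names stackPaint and stackUnused and the 0xEE fill value are assumptions for this example. VxWorks provides its own checkStack() for this purpose, so the sketch is mainly useful for understanding what such a check does, or for buffers you manage yourself.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xEE   /* assumption: fill byte used to paint the stack */

/* Paint a stack area before the task that uses it is started. */
void stackPaint(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Count untouched bytes at the low-address end of a stack that grows
 * downward (as it does on PowerPC). A result of 0 means the task has
 * used, or overrun, the whole area. */
size_t stackUnused(const unsigned char *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == STACK_FILL)
        n++;
    return n;
}
```

A task whose unused margin shrinks to zero, or close to it, is a strong candidate for the kind of memory corruption that shows up later as an unrelated-looking exception.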

HTH,

John...

kannan

Jan 21, 2002, 9:25:22 AM

Hi Yves,

> we are having a similar problem with our system. A machine check
> exception is happening ... in a streaming method. It runs through the
> code on a number of occasions, then one time it produces an exception.
> This problem comes and goes with re-compiles... We have created a ring
> buffer with which all tasks check in on a task switch, to try and
> identify the task causing the problem, but have yet to succeed...

The contents printed on the console after an exception are:

1) Exception current instruction address
2) Machine Status Register
3) Condition Register
4) TaskId
5) TaskName

So you will probably get the taskName and taskId of the task that
causes the exception. If the system hangs instead, you can write your
own handler to logMsg the taskName and taskId to the console, so that
you can isolate the code that raises the exception. Also check whether
the default VxWorks handler puts the task that raised the exception
into the suspended state. If so, you only need to examine the task
that ends up suspended.

> the machine check exception can be caused by parity, invalid instruction
> and some bus errors (I don't have my PPC manual with me).
>
> Any suggestions on finding (what I assumed to be) the stray pointer
> would be welcomed at this point.
>
> PPC 750, Tornado II (with TCP2) on NT. We are using watchdogs.

Check whether your code makes any invalid memory accesses.

Try this code:

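#include <vxWorks.h>
#include <stdio.h>
#include <usrLib.h>     /* d() */

void machineChecktest (void)
{
    printf("\n Test Task");
    d ((void *)0xffffffff, 0, 0);   /* dump memory at some invalid address; 0, 0 = default count and width */
    printf("\n Machine check exception test code over\n");
}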

-> sp machineChecktest

Run this code and see whether the machine check exception recurs, as
in your streaming method, or occurs only once. Also check whether the
machineChecktest task ends up in the suspended state and whether its
taskName and taskId are printed on your console.

Rgds,
kannan

Yves <yves.bou...@sympatico.ca> wrote in message news:<3C4B7627...@sympatico.ca>...

David Machin

Mar 15, 2002, 9:41:05 AM

I apologise for replying so long after the original post.

My client had similar problems with a PPC 750 processor running Tornado
2 - the main symptom being a Machine Check exception where the code and
register set appeared to be fine.
We investigated this with the assistance of the hardware vendor and
could find no obvious underlying cause (such as a transfer abort to a
peripheral, etc.). It is worth noting that a posted write to a peripheral
(especially over PCI) is often the cause of this kind of behaviour, as
the resulting exception is asynchronous with the instruction sequence
and could occur hundreds of cycles after the instruction that caused it.
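One common way to narrow down a posted-write problem is to read the register back immediately after writing it, so that any bus error is raised close to the offending instruction instead of much later. The sketch below shows the idea in plain C. The function name regWriteFlushed is an assumption, and the example only applies to registers that are safe to read back (no read side effects). On PowerPC an eieio or sync instruction may also be needed between the store and the load to keep them in order. That is not shown here.

```c
#include <stdint.h>

/* Write a device register, then read it back so the write cannot still
 * be sitting in a posted-write buffer when the function returns.
 * Returns the value read back, which the caller may compare with
 * 'value' if the register is expected to hold what was written. */
static inline uint32_t regWriteFlushed(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;      /* this store may be posted by a bridge */
    return *reg;       /* the read has to wait for the write to complete */
}
```

Temporarily replacing plain register writes with a helper like this while debugging can make an asynchronous machine check line up with the access that caused it.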

We found that the rate of failure was affected by where the executable
was located in memory (i.e. moving the same executable to a different
place would change the behaviour) - but extensive RAM tests and testing
on other boards seemed to rule out any memory problems.
We also found that code which performed intensive operations on large
complex data structures was particularly badly affected.

Based on what we observed, we suspected some form of caching problem was
to blame, and thus disabled the level 2 cache. Since doing this we have
not seen the problem again.

The board vendor has a TSR with Wind River to investigate this problem, as
we suspect it may involve swapping of Page Table Entries (PTEs) - I
would be very interested to hear from anyone else who has had problems
with the PPC750 running T2 with level 2 cache enabled.

HTH

Dave Machin
Machin Consultants Ltd. working on behalf of BAE SYSTEMS

Yves

Mar 15, 2002, 9:33:44 PM

Hi,

thanks for the input.

We successfully tracked the problem down: it was a gcc problem. A register
(exception context?) that should not have been touched was being reused
as a general purpose register. An SPR is logged at WRS.

Yves

hari

Dec 17, 2004, 6:34:39 AM

Hi Yves,
Can you please share exactly how this problem was solved?
Also, can you please elaborate on what options we have to use to
overcome this issue?
We are also facing similar issues.

Thanks
Hari