I'm having problems with my application getting random machine check
exceptions (0x200). The exception address reported does not correspond to
any code which could cause a machine check. I understand that exception
reporting can be delayed, which gives no clue as to where the exception is
occurring.
Has anyone got a method of debugging these exceptions, or a way of turning
the machine check off?
My app is doing a lot of serial I/O on ports 0/1, which are set to
115200 baud.
TIA martin
Quite likely you have a stray pointer somewhere in your code. Have you
done a stack trace of the offending task to see where the exception is
occurring?
>Has anyone got a method of debugging these exceptions ? or a way of turning
>the machine check off ?
In hardware, you cannot "turn off" a machine check. (Well, you can, but
the alternative to enabling machine check exceptions is to halt the
processor when a machine check occurs.)
You can try to disable the exception handler in VxWorks, but that is a BAD
idea. The machine check is telling you that your code is buggy, and that
you had better go fix it. (Or else your hardware is buggy.)
Attempting to ignore it will only lead to worse pain.
>My app is doing a lot of serial I/O on ports 0/1 which are set to
>115200baud.
I don't know if this would cause the problem. What are you using for your
serial port hardware?
(BTW, since you say "machine check" rather than "bus error", I am assuming
you have a PowerPC of some sort. If not, some of the above comments might
not apply.)
e_S
>TIA martin
--
------------------------------------------------------------------------
engineer_scotty (no, not that one) -- consumer of fine ales and lagers
some days you're the fire hydrant, some days you're the dog | go blazers
no small furry creatures were harmed in the creation of this .signature
Have you eliminated hardware issues, such as bus transaction aborts, RAM
refresh problems, or bad configuration of SIU registers?
"martin brook" <vgr...@btinternet.com> wrote in message
news:a27pcp$pv8$1...@knossos.btinternet.com...
Which type of PPC are you using, and what version of VxWorks? Also,
what else is running on your target at the time, or is it just the
serial output? In particular, are you using things like watchdogs or
POSIX timers?
Rgds,
John...
"martin brook" <vgr...@btinternet.com> wrote in message news:<a27pcp$pv8$1...@knossos.btinternet.com>...
Either your code is reading from a memory area that does not exist, or
it is storing a value to one. Either can lead to a machine check
exception. You can hack around the machine check by using excVecSet to
install your own C routine, and inside that handler increment the PC
(program counter). But that is the worst thing to do; verify your code
instead.
Go step by step. First, inside your handler, check the taskName and
taskId of the task that raised the exception. Confirm whether the bug
is in a single task or whether many tasks are raising these exceptions,
so that you can isolate that code.
Inside your handler, do a taskRestart for that particular task and
check whether it raises exceptions again frequently.
If so, examine the task with a stack trace to see exactly where the
exception occurs, and handle it accordingly.
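To answer the "single task or many" question, the handler can keep a small tally per task ID. The following is only a minimal, target-independent sketch in plain C: excTallyRecord, excTallyTasks and the table size are names invented here for illustration and are not part of VxWorks. On a real target the recording call would be made from the routine installed with excVecSet, passing whatever task ID the handler can obtain.

```c
#include <stddef.h>

#define EXC_TALLY_SLOTS 16   /* hypothetical limit on distinct tasks tracked */

struct excTallyEntry {
    int      taskId;   /* task that took the exception */
    unsigned count;    /* number of exceptions seen for that task */
};

static struct excTallyEntry excTally[EXC_TALLY_SLOTS];
static size_t excTallyUsed = 0;

/* Record one exception for taskId.
   Returns the updated count for that task, or 0 if the table is full. */
unsigned excTallyRecord(int taskId)
{
    size_t i;

    for (i = 0; i < excTallyUsed; i++) {
        if (excTally[i].taskId == taskId)
            return ++excTally[i].count;
    }
    if (excTallyUsed == EXC_TALLY_SLOTS)
        return 0;
    excTally[excTallyUsed].taskId = taskId;
    excTally[excTallyUsed].count = 1;
    excTallyUsed++;
    return 1;
}

/* Number of distinct tasks that have faulted so far. */
size_t excTallyTasks(void)
{
    return excTallyUsed;
}
```

If excTallyTasks() stays at 1 while one count keeps climbing, the fault is probably local to that task; several entries with small counts point more towards shared data or hardware.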
kannan
"Gary M" <com.vertical@garym> wrote in message news:<czP18.2686$Lj2....@rwcrnsc51.ops.asp.att.net>...
Hi,
We are having a similar problem with our system. A machine check
exception is happening ... in a streaming method. It runs through the
code on a number of occasions, then one time it produces an exception.
This problem comes and goes with re-compiles... We have created a ring
buffer with which all tasks check in on a task switch, to try to
identify the task causing the problem, but have yet to succeed...
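For anyone wanting to try the same ring-buffer idea, here is a minimal sketch in plain C. All names (switchLogRecord, switchLogLast, the 8-entry depth) are made up for illustration and are not our actual code; on a VxWorks target the record call would typically be driven from a routine registered with taskSwitchHookAdd, and the buffer should live somewhere it can still be dumped after the exception.

```c
#include <stddef.h>

#define SWITCH_LOG_DEPTH 8   /* hypothetical depth; kept a power of two */

static int      switchLog[SWITCH_LOG_DEPTH];  /* IDs of the most recently scheduled tasks */
static unsigned switchLogNext = 0;            /* total number of records written so far */

/* Record that taskId has just been switched in, overwriting the oldest entry. */
void switchLogRecord(int taskId)
{
    switchLog[switchLogNext % SWITCH_LOG_DEPTH] = taskId;
    switchLogNext++;
}

/* Return the task ID recorded `age` switches ago (0 = most recent),
   or -1 if that entry was never written or has been overwritten. */
int switchLogLast(unsigned age)
{
    if (age >= SWITCH_LOG_DEPTH || age >= switchLogNext)
        return -1;
    return switchLog[(switchLogNext - 1 - age) % SWITCH_LOG_DEPTH];
}
```

Because machine check reporting can be delayed, it seems worth looking at the last few entries rather than only switchLogLast(0).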
the machine check exception can be caused by parity, invalid instruction
and some bus errors (I don't have my PPC manual with me).
Any suggestions on finding (what I assumed to be) the stray pointer
would be welcomed at this point.
PPC 750, Tornado II (with TCP2) on NT. We are using watchdogs.
thanks
Yves
"Yves" <yves.bou...@sympatico.ca> wrote in message
news:3C4B7627...@sympatico.ca...
> the machine check exception can be caused by parity, invalid instruction
> and some bus errors (I don't have my PPC manual with me).
Could be a stray pointer, but some other things to check on:
1) Stack sizes for all your tasks; make sure none have overflowed.
2) Make sure that you set VX_FP_TASK when needed. You might need it
for other cases than just FP when using PPC - the GNU compiler for PPC
has some "interesting" optimisations that cause it to borrow FP regs!
Search the archives here for info on this, or just set VX_FP_TASK on
all tasks and see if it magically fixes the problem.
3) Check your watchdog routines carefully; they are interrupt handlers
and must follow the rules for ISRs.
Those are my general guidelines for trapping these sorts of things.
The other thing to look at is the address that caused the crash - is
it an address that corresponds to something else?
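For point 1, checkStack from the shell is the usual tool. It relies on VxWorks filling each task stack with 0xEE at creation (unless VX_NO_STACK_FILL is set). If you want the same check in your own code, the idea can be sketched as below; this is a plain C illustration, stackUnusedBytes is a made-up name, and it assumes a stack that grows downward, as on PPC.

```c
#include <stddef.h>

#define STACK_FILL 0xEE   /* fill byte assumed here, matching the VxWorks stack fill */

/* Count the bytes at the low end of a downward-growing stack that still
   hold the fill pattern. A result of 0 means the stack has been used
   right to its limit and may well have overflowed. */
size_t stackUnusedBytes(const unsigned char *stackLow, size_t stackSize)
{
    size_t n = 0;

    while (n < stackSize && stackLow[n] == STACK_FILL)
        n++;
    return n;
}
```

A small or zero margin on any task is a good reason to enlarge that stack before chasing anything more exotic.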
HTH,
John...
> we are having a similar problem with our system. A machine check
> exception is happening ... in a streaming method. It runs through the
> code on a number of occasion, than one time it produces an exception.
> This problem comes and goes with re-compiles... We have created a ring
> buffer with which all tasks check in on a task switch, to try and
> identify the task causing the problem, but have yet to succeed...
The items printed on the console after the exception are:
1) Exception current instruction address
2) Machine Status Register
3) Condition Register
4) TaskId
5) TaskName
so you will probably get the taskName and taskId of the task that caused
the exception. If the system hangs instead, you can write your own
handler to logMsg the taskName and taskId to the console, and so
isolate the code that raises the exception. Check whether the default
(VxWorks) handler puts the offending task into the suspended state. If
so, you only need to examine the task that ends up suspended.
> the machine check exception can be caused by parity, invalid instruction
> and some bus errors (I don't have my PPC manual with me).
>
> Any suggestions on finding (what I assumed to be) the stray pointer
> would be welcomed at this point.
>
> PPC 750, Tornado II (with TCP2) on NT. We are using watchdogs.
Check whether your code makes any invalid memory accesses.
Try this code:
void machineChecktest (void)
    {
    printf ("\n Test Task");
    d (0xffffffff);    /* some invalid address */
    printf ("\n Machine check exception test code over\n");
    }
-> sp machineChecktest
Run it and see whether the machine check exception happens repeatedly
or only once. Also check whether the machineChecktest task goes to the
suspended state and whether its taskName and taskId are printed on
your console.
Rgds,
kannan
Yves <yves.bou...@sympatico.ca> wrote in message news:<3C4B7627...@sympatico.ca>...
My client had similar problems with a PPC 750 processor running Tornado
2, the main symptom being a machine check exception where the code and
register set appeared to be fine.
We investigated this with the assistance of the hardware vendor and
could find no obvious underlying cause (such as a transfer abort to a
peripheral). It is worth noting that a posted write to a peripheral
(especially over PCI) is often the cause of this kind of behaviour, as
the resulting exception is asynchronous with the instruction sequence
and could occur hundreds of cycles after the instruction that caused it.
We found that the rate of failure was affected by where the executable
was located in memory (i.e. moving the same executable to a different
place would change the behaviour) - but extensive RAM tests and testing
on other boards seemed to rule out any memory problems.
We also found that code which performed intensive operations on large
complex data structures was particularly badly affected.
Based on what we observed, we suspected some form of caching problem was
to blame, and thus disabled the level 2 cache. Since doing this we have
not seen the problem again.
The board vendor has a TSR open with Wind River to investigate this
problem, as we suspect it may involve swapping of Page Table Entries
(PTEs). I would be very interested to hear from anyone else who has had
problems with the PPC750 running T2 with the level 2 cache enabled.
HTH
Dave Machin
Machin Consultants Ltd. working on behalf of BAE SYSTEMS
Hi,
thanks for the input.
We successfully tracked the problem down ... gcc problem. A register
(exception context?) was being reused as a general purpose register that
should not have been. An SPR has been logged with WRS.
Yves
Thanks
Hari