
How to implement an ECC detection application?


cai

Dec 9, 2002, 4:55:58 AM
ECC stands for error-correcting code. I wonder whether Windows provides any
API that would let me write a program to monitor my machine's memory. If not,
can I do it in raw assembly language? I have no idea how to do it. Any
comments are appreciated. Thank you!

Michael Brown

Dec 10, 2002, 6:56:13 AM
"cai" <aladd...@asus.com.cn> wrote in message
news:at1osm$v8i8m$1...@ID-147672.news.dfncis.de...

Both parity errors and ECC errors/corrections cause an NMI.

AFAIK, parity failures cause all versions of Windows to BSOD, so no hope of
catching them.

ECC corrections are not caught by Windows 9x (i.e. 95, 98, ME), so they cannot
be detected there from user mode. You may be able to hack up a kernel-mode
NMI-catcher that watches for ECC errors (check the P4 manual for details, I
think it is buried in there somewhere) and then reports them to your
application, but this would be ugly/unstable.

ECC events *are* caught by Windows NT (versions 4, 2K, XP, maybe 3). I don't
know the interface to it though, and a quick search of the MS site turned up
no clues. Maybe a kernel-mode hack is required for this as well, something
which I would *not* recommend for critical servers, which are just about the
only thing that use ECC memory.

--
Michael Brown
My inbox is always open (remove the obvious):
emb...@i4free.NOSPAM.co.nz


Robert Redelmeier

Dec 12, 2002, 9:56:01 AM
Michael Brown <emb...@i4free.nospam.co.nz> wrote:
> Both parity errors and ECC errors/corrections cause an NMI.
>
> AFAIK, parity failures cause all versions of Windows to BSOD, so no hope of
> catching them.
>
> ECC corrections are not caught by Windows 9x (i.e. 95, 98, ME), so they cannot
> be detected there from user mode. You may be able to hack up a kernel-mode
> NMI-catcher that watches for ECC errors (check the P4 manual for details, I
> think it is buried in there somewhere) and then reports them to your
> application, but this would be ugly/unstable.
>
> ECC events *are* caught by Windows NT (versions 4, 2K, XP, maybe 3). I don't
> know the interface to it though, and a quick search of the MS site turned up
> no clues. Maybe a kernel-mode hack is required for this as well, something
> which I would *not* recommend for critical servers, which are just about the
> only thing that use ECC memory.


I thought ECC handling was too Northbridge-specific for PC OSes
to handle, and the ECC event triggered an SMI (System Management
Interrupt) not an NMI.

SMIs are transparent to the OS and are handled by the BIOS
even under pmode OSes. Some BIOSes keep [small] ECC logs
that you can access from the BIOS config screens. Presumably
there's some way to access these CMOS/NVRAM logs from an
OS, but it will be very chipset/BIOS specific.

-- Robert

Michael Brown

Dec 12, 2002, 6:17:44 PM
"Robert Redelmeier" <red...@ev1.net.invalid> wrote in message
news:4t0K9.2474$jr6.81...@newssvr12.news.prodigy.com...

You are probably correct, as my statement was from memory of some internet
site from a couple of months ago.

> SMIs are transparent to the OS and are handled by the BIOS
> even under pmode OSes. Some BIOSes keep [small] ECC logs
> that you can access from the BIOS config screens. Presumably
> there's some way to access these CMOS/NVRAM logs from an
> OS, but it will be very chipset/BIOS specific.

One area of interest is the Pentium 2 (and up) "Machine Check Exception".
This appears to be able to report corrected (and uncorrected) ECC errors.
See page E-3 of the IA32 manual, volume 3, P4 edition (order number
245472-007). This is page number 735 in the PDF version I have.

However, it also says

<QUOTE>
The information in Table E-1 is implementation-specific for the P6 family
processors. The error information returned for a Pentium 4 processor is
considerably different.
</QUOTE>

and also in the section about the MC exception, it says that the reporting
codes are model specific, and does not provide any references as to where
you can find them out. A brief search of the Intel site didn't help either,
so I'm not sure if it's actually public. Be pretty stupid if it wasn't, but
remember Appendix H?
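
For what it's worth, as far as I can tell from the machine-check chapter of the Intel manual, the top flag bits of each bank's status register (IA32_MCi_STATUS) are architectural, and only bits 31:16 carry the model-specific code discussed above. Here is a minimal sketch in Python of splitting an already-read 64-bit status value into those fields. The names `McStatus` and `decode_mc_status` are made up for illustration, the sample value is invented rather than taken from real hardware, and actually reading the register needs RDMSR in ring 0, which is not shown.

```python
from dataclasses import dataclass

# Bit positions of the flag bits in an IA32_MCi_STATUS value.
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57


@dataclass
class McStatus:  # hypothetical container, not part of any real API
    valid: bool            # VAL: the bank holds a logged error
    overflow: bool         # OVER: an earlier error was overwritten
    uncorrected: bool      # UC: the error was not corrected by hardware
    enabled: bool          # EN: error reporting was enabled for this bank
    misc_valid: bool       # MISCV: the MISC register holds extra data
    addr_valid: bool       # ADDRV: the ADDR register holds an address
    context_corrupt: bool  # PCC: processor context may be corrupted
    mca_code: int          # bits 15:0, architectural MCA error code
    model_code: int        # bits 31:16, model-specific error code

    @property
    def corrected(self) -> bool:
        # A logged error that hardware fixed (e.g. a single-bit ECC hit).
        return self.valid and not self.uncorrected


def decode_mc_status(raw: int) -> McStatus:
    """Split a raw 64-bit status value into its flag and code fields."""
    def bit(n: int) -> bool:
        return bool((raw >> n) & 1)

    return McStatus(
        valid=bit(VAL), overflow=bit(OVER), uncorrected=bit(UC),
        enabled=bit(EN), misc_valid=bit(MISCV), addr_valid=bit(ADDRV),
        context_corrupt=bit(PCC),
        mca_code=raw & 0xFFFF,
        model_code=(raw >> 16) & 0xFFFF,
    )


if __name__ == "__main__":
    # Invented sample: valid, enabled, address valid, not uncorrected.
    sample = (1 << VAL) | (1 << EN) | (1 << ADDRV) | (0x0012 << 16) | 0x0101
    print(decode_mc_status(sample))
```

The model-specific field is deliberately left as a bare number, since its meaning is exactly the part that differs between the P6 family and the Pentium 4.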

Of course, this depends on the chipset reporting this to the CPU. I'm not
sure which pins do this on the Pentiums (so this may be chipset
dependent), but the Athlon does memory ECC internally (the data bus is 72
bits wide).
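
The 72-bit figure is what you would expect from a 64-bit data path plus 8 check bits. A small sketch of the arithmetic, assuming a standard Hamming code extended with one overall parity bit (single-error correction, double-error detection); I don't know which code a given CPU or chipset actually uses, and the function names here are made up for illustration.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_width(data_bits: int) -> int:
    """Total word width: data bits, Hamming check bits, one overall parity bit."""
    return data_bits + hamming_check_bits(data_bits) + 1


if __name__ == "__main__":
    # 64 data bits need 7 Hamming check bits plus 1 parity bit: 72 in total.
    print(secded_width(64))
```

The same formula gives 39 bits for a 32-bit word, which is why ECC is relatively cheaper on a wider bus.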

cai

Dec 12, 2002, 8:55:58 PM
Thank you both.
Yes, I heard that ECC is related to the north bridge and SMI, too, and that it
is machine dependent. But I can't find enough documentation about it.
As for the BIOS, I don't know exactly where the event log is.

"Robert Redelmeier" <red...@ev1.net.invalid> wrote in message
news:4t0K9.2474$jr6.81...@newssvr12.news.prodigy.com...
