Both parity errors and ECC errors/corrections cause an NMI.
AFAIK, parity failures cause all versions of Windows to BSOD, so no hope of
catching them.
ECC corrections are not caught by Windows 9x (i.e. 95, 98, ME), so they cannot
be detected there from user mode. You may be able to hack up a kernel-mode
NMI-catcher that watches for ECC errors (check the P4 manual for details, I
think it is buried in there somewhere) and then reports them to your
application, but this would be ugly/unstable.
ECC events *are* caught by Windows NT (versions 4, 2K, XP, maybe 3). I don't
know the interface to them though, and a quick search of the MS site turned up
no clues. Maybe a kernel-mode hack is required for this as well, something
which I would *not* recommend for critical servers, which are just about the
only things that use ECC memory.
--
Michael Brown
My inbox is always open (remove the obvious):
emb...@i4free.NOSPAM.co.nz
I thought ECC handling was too Northbridge-specific for PC OSes
to handle, and the ECC event triggered an SMI (System Management
Interrupt) not an NMI.
SMIs are transparent to the OS and are handled by the BIOS
even under pmode OSes. Some BIOSes keep [small] ECC logs
that you can access from the BIOS config screens. Presumably
there's some way to access these CMOS/NVRAM logs from an
OS, but it will be very chipset/BIOS specific.
-- Robert
You are probably correct, as my statement was from memory of some internet
site from a couple of months ago.
> SMIs are transparent to the OS and are handled by the BIOS
> even under pmode OSes. Some BIOSes keep [small] ECC logs
> that you can access from the BIOS config screens. Presumably
> there's some way to access these CMOS/NVRAM logs from an
> OS, but it will be very chipset/BIOS specific.
One area of interest is the Pentium 2 (and up) "Machine Check Exception".
This appears to be able to report corrected (and uncorrected) ECC errors.
See page E-3 of the IA32 manual, volume 3, P4 edition (order number
245472-007). This is page number 735 in the PDF version I have.
However, it also says:
<QUOTE>
The information in Table E-1 is implementation-specific for the P6 family
processors. The error information returned for a Pentium 4 processor is
considerably different.
</QUOTE>
and also in the section about the MC exception, it says that the reporting
codes are model specific, and does not provide any references as to where
you can find them out. A brief search of the Intel site didn't help either,
so I'm not sure if it's actually public. Be pretty stupid if it wasn't, but
remember Appendix H?
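
To make the Machine Check idea a bit more concrete, here is a minimal C
sketch of decoding one machine-check bank status value. The flag positions
(VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, the
model-specific error code in bits 31:16, and the MCA error code in bits 15:0)
follow the architectural layout of the status registers as I understand it.
The struct and function names are made up for illustration, and the sketch
assumes the raw 64-bit value has already been read by kernel-mode code (RDMSR
is privileged). It makes no attempt to interpret the model-specific code,
which is exactly the part that does not seem to be documented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of one bank status value. */
typedef struct {
    bool valid;            /* bit 63: register holds a logged error      */
    bool overflow;         /* bit 62: a further error arrived while full */
    bool uncorrected;      /* bit 61: hardware could not correct it      */
    bool enabled;          /* bit 60: error reporting was enabled        */
    bool misc_valid;       /* bit 59: companion MISC register is valid   */
    bool addr_valid;       /* bit 58: companion ADDR register is valid   */
    bool context_corrupt;  /* bit 57: processor context may be corrupt   */
    uint16_t model_specific_code;  /* bits 31:16, meaning is per-model   */
    uint16_t mca_code;             /* bits 15:0, MCA error code          */
} mc_status;

/* Split a raw 64-bit status value into its fields. */
mc_status decode_mc_status(uint64_t raw)
{
    mc_status s;
    s.valid           = (raw >> 63) & 1u;
    s.overflow        = (raw >> 62) & 1u;
    s.uncorrected     = (raw >> 61) & 1u;
    s.enabled         = (raw >> 60) & 1u;
    s.misc_valid      = (raw >> 59) & 1u;
    s.addr_valid      = (raw >> 58) & 1u;
    s.context_corrupt = (raw >> 57) & 1u;
    s.model_specific_code = (uint16_t)((raw >> 16) & 0xFFFFu);
    s.mca_code            = (uint16_t)(raw & 0xFFFFu);
    return s;
}

/* A logged error that the hardware fixed, e.g. a corrected ECC event. */
bool is_corrected_error(const mc_status *s)
{
    return s->valid && !s->uncorrected;
}
```

A kernel-mode poller would read each bank's status, pass it through
decode_mc_status(), and report anything for which is_corrected_error() is
true. Telling an ECC correction apart from other corrected errors still
needs the model-specific tables.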
Of course, this depends on the chipset reporting this to the CPU. I'm not
sure which pins do this on the Pentiums (so this may be chipset
dependent), but the Athlon does memory ECC internally (the data bus is 72
bits wide).
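
The 72-bit figure is 64 data bits plus 8 check bits, which is enough for
single-error-correct, double-error-detect (SECDED). As a rough illustration
of why one flipped bit shows up as a "correction" while two flipped bits are
an uncorrectable error, here is a sketch using a textbook extended Hamming
code. This is an assumption for illustration only: real chipsets and CPUs use
their own (72,64) codes, and every name below is hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { CODE_BITS = 72 };  /* 64 data + 7 Hamming checks + 1 overall parity */

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_result;

static int is_pow2(unsigned i) { return i != 0 && (i & (i - 1)) == 0; }

/* Encode 64 data bits into cw[0..71], one bit per array element.
 * Positions 1,2,4,...,64 hold Hamming check bits, position 0 holds the
 * overall parity, and the remaining 64 positions hold the data. */
void ecc_encode(uint64_t data, uint8_t cw[CODE_BITS])
{
    unsigned syndrome = 0, i, k;
    int d = 0;
    uint8_t overall = 0;

    memset(cw, 0, CODE_BITS);
    for (i = 1; i < CODE_BITS; i++) {
        if (is_pow2(i)) continue;
        cw[i] = (uint8_t)((data >> d++) & 1u);
        if (cw[i]) syndrome ^= i;
    }
    /* Pick check bits so the XOR of all set positions becomes zero. */
    for (k = 0; k < 7; k++)
        cw[1u << k] = (uint8_t)((syndrome >> k) & 1u);
    for (i = 1; i < CODE_BITS; i++) overall ^= cw[i];
    cw[0] = overall;
}

/* Check cw, repair a single-bit error in place, and extract the data. */
ecc_result ecc_decode(uint8_t cw[CODE_BITS], uint64_t *data)
{
    unsigned syndrome = 0, i;
    uint8_t overall = 0;
    uint64_t v = 0;
    int d = 0;
    ecc_result r = ECC_OK;

    for (i = 0; i < CODE_BITS; i++) {
        overall ^= cw[i];
        if (cw[i]) syndrome ^= i;
    }
    if (overall) {
        /* Odd number of flips: assume one, located at index `syndrome`
         * (index 0 means the overall parity bit itself was hit). */
        if (syndrome >= CODE_BITS) return ECC_UNCORRECTABLE;
        cw[syndrome] ^= 1u;
        r = ECC_CORRECTED;
    } else if (syndrome != 0) {
        /* Even number of flips with a nonzero syndrome: double error. */
        return ECC_UNCORRECTABLE;
    }
    for (i = 1; i < CODE_BITS; i++) {
        if (is_pow2(i)) continue;
        v |= (uint64_t)cw[i] << d++;
    }
    *data = v;
    return r;
}
```

Flipping any single element of cw after ecc_encode() makes ecc_decode()
return ECC_CORRECTED with the original data, while flipping two elements
gives ECC_UNCORRECTABLE, which is the same split between "ECC corrections"
and fatal errors discussed in this thread.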
"Robert Redelmeier" <red...@ev1.net.invalid> wrote in message
news:4t0K9.2474$jr6.81...@newssvr12.news.prodigy.com...