Re: MCA error, possible causes?

John Baldwin

Feb 24, 2016, 3:17:22 PM
On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
> Recently installed some cpus and received two MCA errors. Using mcelog, I
> found that the version in ports is about 5 years out of date and didn't
> support my cpu. Decided to update it to the newest version (Will post on
> bugzilla shortly) to pull some more info. Going to post orig and decoded
> mcelog.
>
>
> Raw:
> MCA: Bank 20, Status 0xc800084000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> MCA: Misc 0x1df87b000d9eff
> MCA: Bank 5, Status 0xc800008000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> MCA: Misc 0xdf87b008d9eff
>
> mcelog v131:
> Hardware event. This is not a software error.
> CPU 0 BANK 20
> MISC 1df87b000d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800084000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 0 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
> Hardware event. This is not a software error.
> CPU 34 BANK 5
> MISC df87b008d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800008000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 2a SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
>
> After receiving this error, the system was in a frozen state. Any ideas
> what may cause this?

Well, hardware causes it. QPI is the interconnect bus between your
CPU sockets; accesses to RAM attached to the other socket cross it too.
"Rx detected CRC error" implies that a CPU detected a corrupted message
on that bus, but when it requested a resend the resent message was ok.
Normally corrected errors shouldn't hang your machine, but perhaps your
machine had another hardware error after this that broke it too badly
to report and/or log the subsequent error.
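
For anyone who wants to check the "COR (33) OVER BUSLG" part of the raw
lines by hand, here is a rough sketch of how the Status value splits into
fields. It assumes the architectural IA32_MCi_STATUS layout from the Intel
SDM; the function name decode_mci_status is made up for this example and is
not part of FreeBSD or mcelog. It does not try to interpret the
model-specific code (0x0031 here); that table is per-CPU and is what mcelog
supplies.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        # Bits 52:38 hold the corrected error count; only meaningful when
        # MCG_CAP bit 10 is set (it is in 0x7000c16).
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": code,
    }
    # Bus/interconnect compound code: 0000 1PPT RRRR IILL
    if (code >> 11) == 1:
        fields["bus"] = {
            "participation": (code >> 9) & 0x3,  # PP, 3 = generic
            "timeout": bool((code >> 8) & 1),    # T
            "request": (code >> 4) & 0xF,        # RRRR, 0 = generic ERR
            "memory_or_io": (code >> 2) & 0x3,   # II, 3 = other
            "level": code & 0x3,                 # LL, 3 = generic (LG)
        }
    return fields


if __name__ == "__main__":
    for raw in (0xC800084000310E0F, 0xC800008000310E0F):
        d = decode_mci_status(raw)
        print(hex(raw), "corrected_count =", d["corrected_count"],
              "uncorrected =", d["uncorrected"])
    # corrected_count is 33 for the first value and 2 for the second,
    # matching "COR (33)" and "COR (2)" in the raw output.
```

Both values come out as valid, corrected (UC clear) errors with the overflow
bit set, which agrees with the kernel's own decode above.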

--
John Baldwin
_______________________________________________
freebsd-...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to "freebsd-hardwa...@freebsd.org"

Ultima

Feb 24, 2016, 3:51:14 PM
Hi John,

Thanks for the explanation. I ran some tests and the cause ended up being a
power-saving mode (aka unstable mode?). Disabling this feature put an end to
the freezes. I came to this conclusion by stress testing the box for 3 days
with no issues at all; then I stopped the stress test and about 15-30 min
later it froze. The freezes seemed to occur only during periods of low load.
I have not received any of these errors since turning off this power-saving
mode.