
[gentoo-user] MCE in kernel


Alan E. Davis

Aug 31, 2007, 9:10:05 PM
I have been unable to boot into my gentoo system due to a Machine Check Exception.  This is an AMD 64 system.  MCE for AMD is enabled in the kernel (2.6.21 gentoo-sources). 

I cannot boot normally to turn off MCE checking, though I was able to log in in single-user mode.  The MCE happens at the end of loading the "default" runlevel scripts; at least, that is what I see on the screen: xdm has been loaded.

The problem is, I have been installing ubuntu on another partition, and it boots fine. 

If I have it right, I can download a gentoo live install disk and compile a new kernel.  Is there a howto on this specific problem?

Thank you,

Alan Davis

--
Alan Davis, Kagman High School, Saipan  lng...@gmail.com  

"An inviscid theory of flow renders the screw useless, but the need for one non-existent."    
         ---Lord Raleigh (aka John William Strutt), or else his son,

Alan E. Davis

Aug 31, 2007, 11:00:16 PM
Following up: I removed a troublesome partition that was being checked on every boot, and I was then able to boot OK.  Does this make sense?

Alan

Tim

Aug 31, 2007, 11:50:07 PM
Alan E. Davis wrote:
> Following up: I removed a troublesome partition that was being checked
> on every boot, and I was then able to boot OK. Does this make sense?
>
> Alan
>
> On 9/1/07, Alan E. Davis <lng...@gmail.com> wrote:
>
> I have been unable to boot into my gentoo system due to a Machine
> Check Exception. This is an AMD 64 system. MCE for AMD is enabled
> in the kernel (2.6.21 gentoo-sources).
>
> I am unable to boot in to turn off MCE checking. I was able to log
> in by single user mode. The MCE happens at the end of the loading
> of "default" scripts, at least this is what I am seeing on the
> screen: xdm has been loaded.
>
> The problem is, I have been installing ubuntu on another partition,
> and it boots fine.
>
> If I have it right, I can download a gentoo live install disk and
> compile a new kernel. Is there a howto on this specific problem?
>
> Thank you,
>
> Alan Davis
>

This makes little sense without knowing what partition you removed and
what you mean by "removing" it - did you take it out of /etc/fstab? Did
you actually repartition your disk? What partition was it, what kind was
it (primary, logical, extended) and what was on it? Hopefully we can be
of more assistance with this info.

-Tim
--
gento...@gentoo.org mailing list

Alan E. Davis

Sep 1, 2007, 12:10:07 AM


Thank you for the response, Tim:

On 9/1/07, Tim <ro...@pneumaticsystem.com> wrote:
> This makes little sense without knowing what partition you removed and
> what you mean by "removing" it - did you take it out of /etc/fstab? Did
> you actually repartition your disk? What partition was it, what kind was
> it (primary, logical, extended) and what was on it? Hopefully we can be
> of more assistance with this info.

I removed the partition from /etc/fstab.  It is /dev/sda1, a partition on a SATA drive, with about 20% fragmentation.  I moved everything off it and will reformat it, making sure it uses ext3 or another journaling filesystem.  Something was triggering a check on every boot (a message said the partition was not properly mounted; I don't have access to the exact message now).

I wonder whether this kind of hardware issue might trigger the Machine Check Exception.

Thank you,

Alan Davis
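
For readers following along: removing the /etc/fstab entry is one way to stop the boot-time check. A less drastic option, assuming the standard six-field fstab layout, is to look at the sixth field (the fsck pass number), since a value of 0 tells fsck to skip that filesystem. The sketch below only lists the entries that would be checked; the function name is made up for illustration and it changes nothing on disk.

```shell
#!/bin/sh
# List fstab entries that fsck will check at boot (sixth field non-zero).
# Comment lines, blank lines and entries with fewer than six fields are
# skipped. The function name is hypothetical.
fsck_entries() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $1, $2, "pass=" $6 }' "$1"
}

# Example: fsck_entries /etc/fstab
```

Each output line shows the device, the mount point and the pass number, which makes it easy to spot a data partition that is being checked on every boot.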

 



--
Alan Davis, Kagman High School, Saipan   lng...@gmail.com  

Dan Farrell

Sep 3, 2007, 3:20:09 PM
On Sat, 1 Sep 2007 11:08:27 +1000
"Alan E. Davis" <lng...@gmail.com> wrote:

> I have been unable to boot into my gentoo system due to a Machine
> Check Exception. This is an AMD 64 system. MCE for AMD is enabled
> in the kernel (2.6.21 gentoo-sources).
>
> I am unable to boot in to turn off MCE checking.

Did you know you can disable this at boot time? Check it out:

| $ grep mce /usr/src/linux/Documentation/kernel-parameters.txt
| mce [IA-32] Machine Check Exception
| nomce [IA-32] Machine Check Exception

Just add 'nomce' to your kernel boot line in grub and you should be able
to boot with MCE turned off to reconfigure.
-- Dan
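
As an illustration of the suggestion above, here is a minimal sketch that appends 'nomce' to the "kernel" lines of a GRUB (legacy) config. It assumes the config uses lines starting with the word "kernel"; the function name and the example paths are hypothetical. It reads stdin and writes stdout, so the real config is never edited in place. For a one-off boot, pressing 'e' in the GRUB menu and editing the kernel line by hand does the same job without touching any file.

```shell
#!/bin/sh
# Append "nomce" to every GRUB "kernel" line that does not already carry it.
# Reads the config on stdin and writes the result to stdout; indentation and
# all other lines are passed through unchanged.
add_nomce() {
    awk '
        $1 == "kernel" {
            found = 0
            for (i = 2; i <= NF; i++) if ($i == "nomce") found = 1
            if (!found) { sub(/[ \t]+$/, ""); $0 = $0 " nomce" }
        }
        { print }
    '
}

# Example (paths are assumptions; review the output before installing it):
#   add_nomce < /boot/grub/grub.conf > /tmp/grub.conf.new
```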

Alan E. Davis

Sep 3, 2007, 5:00:22 PM
Thank you.  I have solved the problem for now, but live in fear that there is something untoward going on in my hardware.

Earlier on, this was intermittent.  I also wonder whether a register or a CMOS flag was set, because after I booted the Ubuntu partition, the machine did boot with no complaint.  It hadn't been going on long, though.  In the end I was able to boot using an earlier kernel with no MCE option set, then recompile a newer kernel without it.

I think your solution is the better one, though. 

I did follow the instructions of the boot messages and installed an mce log translation utility, but I didn't make sense of what to do with it.

Thank you again,

Alan

Dan Farrell

Sep 3, 2007, 6:40:05 PM
On Tue, 4 Sep 2007 06:51:38 +1000
"Alan E. Davis" <lng...@gmail.com> wrote:

> I think your solution is the better one, though.
>
> I did follow the instructions of the boot messages and installed an
> mce log translation utility, but I didn't make sense of what to do
> with it.

The thing is, you are only masking symptoms. There may be something
wrong, and perhaps you could save a lot of work later by fixing a
problem before it turns catastrophic.

from http://en.wikipedia.org/wiki/Machine_Check_Exception

A Machine Check Exception, also called MCE, is a computer hardware
error which occurs when a computer's central processing unit detects an
unrecoverable hardware problem.

Normal causes for MCE errors are overheating and/or incorrect hardware
installation. Overheating can cause electrons to become more animated
and thus escape from the silicon tracks, resulting in corrupted data.
Some specific manually induced causes could be:

- Overclocking (naturally increases heat output)
- Poorly fitted heatsink/computer fans (the same problem can happen with
  excessive dust in the CPU fan)

Computer software can also cause errors in this way (normally by
corrupting data they are reading or writing). For example:

- Software performing read or write operations to non-existent memory
  regions which leads to confusion for the processor and/or the system
  bus.

3rd party programs

mcelog
mcelog is a Linux program to decode MCE's on x86-64 processors
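
Since the earlier question was what to do with the decoder once it is installed, here is a rough sketch of one way to use it. It assumes the machine check text from the console was typed or copied into a file, and that the installed mcelog supports the --ascii option for decoding such text from stdin. The function name, the MCELOG override variable and the file name in the example are all hypothetical.

```shell
#!/bin/sh
# Decode a machine check record that was copied from the console into a file.
# "mcelog --ascii" is assumed to read such text on stdin. If the tool is not
# installed, the raw text is printed unchanged so the caller still sees it.
# MCELOG can be set to point at a different binary (hypothetical variable).
decode_mce() {
    tool=${MCELOG:-mcelog}
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --ascii < "$1"
    else
        echo "$tool not found; showing raw record:" >&2
        cat "$1"
    fi
}

# Example: decode_mce /root/mce-console.txt
```

The decoded output names the CPU and the bank that reported the error, which is the part worth quoting when asking for help.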


Don Jerman

Sep 4, 2007, 11:50:07 AM
On 9/3/07, Alan E. Davis <lng...@gmail.com> wrote:
> Thank you. I have solved the problem for now, but live in fear that there
> is something untoward going on in my hardware.
>
Quite possible. It can also be caused by misconfiguring kernel
drivers. I recently (accidentally) selected the ATI agpgart driver
instead of the Intel driver. Most drivers correctly detect when their
corresponding device isn't present, but this one gamely tried to
manage the AGP bridge and fouled up memory whenever X started...

So you may want to review your kernel config and make sure you have
all the devices you're attempting to use.
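
A quick way to do that review for the options discussed in this thread is to list which AGP and machine-check options are enabled in the kernel .config. This is only a sketch: the function name is made up, and the option prefixes (CONFIG_AGP, CONFIG_X86_MCE) are the ones I would expect in a 2.6-series config, so adjust the pattern if your tree names them differently.

```shell
#!/bin/sh
# Print the AGP and machine-check options that are built in (=y) or built as
# modules (=m) in a kernel .config file. Options that are "not set" appear as
# comments in .config and are therefore not matched.
enabled_opts() {
    grep -E '^CONFIG_(AGP|X86_MCE)[A-Z0-9_]*=(y|m)$' "$1"
}

# Example: enabled_opts /usr/src/linux/.config
```

Comparing that list against what lspci reports for the AGP bridge should show whether a driver for hardware you don't have was selected.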

Alan E. Davis

Sep 4, 2007, 4:50:13 PM
Thank you.  I noticed that when I ran "make oldconfig" on a new kernel, the configs were not what I'd expected.  The wrong CPU type was configured. 

Alan
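
For anyone hitting the same surprise after "make oldconfig", one way to see what changed is to compare the enabled options of the old and new .config files. The sketch below is an assumption about how one might do it, not something from the thread; the function name and the example paths are hypothetical.

```shell
#!/bin/sh
# Show enabled options that differ between two kernel .config files.
# Lines starting with "<" come from the first file, ">" from the second.
config_changes() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    grep '^CONFIG_' "$1" | sort > "$old_sorted"
    grep '^CONFIG_' "$2" | sort > "$new_sorted"
    # Keep only the changed lines, dropping diff's hunk headers.
    diff "$old_sorted" "$new_sorted" | grep '^[<>]'
    rm -f "$old_sorted" "$new_sorted"
}

# Example (both paths are assumptions):
#   config_changes /usr/src/linux-old/.config /usr/src/linux/.config
```

A changed processor family option would show up here as one "<" line and one ">" line.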