Fwd: random FreeBSD panics

Masoom Shaikh

Mar 28, 2010, 7:16:17 AM

to freebsd...@freebsd.org

---------- Forwarded message ----------
From: Masoom Shaikh <masoom...@gmail.com>
Date: Sun, Mar 28, 2010 at 8:28 AM
Subject: random FreeBSD panics
To: freebsd...@freebsd.org, freebsd-questions
<freebsd-...@freebsd.org>

Hello List,

I was a happy FreeBSD user until I installed FreeBSD 8.0-RC1.
Since then, the system randomly freezes, and there is no option other
than a hard reboot. I guessed this would get solved in 8.0-RELEASE, but
it was not :(

Many times I get vmcore files, not always. I have dumpdev set to AUTO
in my rc.conf. Almost every time it just fsck's the file-system on
reboot. I have not lost any files though. This is a Dell Inspiron 1525
Laptop with 1GB ram, Intel Core2 Duo T5500 with ATI Radeon X1400 card.
The installation in question is KDE4 from ports, with radeon/ati
driver.

At first I felt the problem was with the wpi driver, then I suspected
the dri driver of X. Then I observed that the system freezes even if
neither of these is installed, e.g. if it is under some load, like
building a port while fetching something over the network, it hangs,
and hangs hard. This persuaded me that something is wrong in kernel
scheduling itself. Maybe it is lost in some deadlock, etc... So last
weekend I thought I would see how the immediately previous version,
i.e. FreeBSD-7.3-RELEASE, would behave.

I reinstalled FreeBSD 7.1 from iso images, svn up'ed to the FreeBSD 7.3
source, and did the normal buildworld, buildkernel, installkernel,
installworld cycle. Unfortunately this kernel is naughty as well ;-), it
also freezes with the same stubbornness. But the difference is that this
time I happened to catch something interesting.

It panics on NMI, fatal trap 19 while in kernel mode. I loaded the
vmcore file in kgdb and got the backtrace. I obtained vmcore files on
two occasions. I have attached both backtraces. This error most
likely suggests a hardware error in RAM, but Windows 7 and XP boot just
fine and have never caused any errors.

To verify whether I have errors in my RAM I let sysutils/memtest86+ run
overnight; to double-check, I also ran the Windows Memory Diagnostic
test four times. Neither reported errors. Can anyone here suggest a
solution?

Masoom Shaikh

forwarding to stable@ with respect to a generous suggestion

vmcore0.log

vmcore1.log

Masoom Shaikh

Mar 28, 2010, 10:42:19 AM

to Ivan Voras, freebsd...@freebsd.org, freebsd...@freebsd.org, freebsd-...@freebsd.org

On Sun, Mar 28, 2010 at 12:03 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> On 28 March 2010 13:18, Masoom Shaikh <masoom...@gmail.com> wrote:
>> On Sun, Mar 28, 2010 at 10:32 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>>> Masoom Shaikh wrote:
>>>>
>>>> Hello List,
>>>>
>>>> I was a happy FreeBSD user until I installed FreeBSD 8.0-RC1. Since
>>>> then, the system randomly freezes, and there is no option other than a
>>>> hard reboot. I guessed this would get solved in 8.0-RELEASE, but it was not :(
>>>
>>> A wild shot - did you try disabling superpages?
>>
>> umm, how do I do that ?
>
> Set
>
> vm.pmap.pg_ps_enabled=0
>
> in /boot/loader.conf and reboot. Report back if it helps or not.
>

Nope, this didn't help either; the machine froze again after 30
minutes or so of use.
All it was doing was playing Amarok, fetching sources from svn repos,
and running Firefox.

Let's assume this is a h/w problem; then how can other OSes overcome
it? Is there a way to make FreeBSD ignore it as well, even if that
results in a reasonable performance penalty?

Adam Vande More

Mar 28, 2010, 1:34:02 PM

to Masoom Shaikh, freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan Voras, freebsd-...@freebsd.org

On Sun, Mar 28, 2010 at 8:42 AM, Masoom Shaikh <masoom...@gmail.com> wrote:

> Nope, this didn't help either; the machine froze again after 30
> minutes or so of use.
> All it was doing was playing Amarok, fetching sources from svn repos,
> and running Firefox.
>
> Let's assume this is a h/w problem; then how can other OSes overcome
> it? Is there a way to make FreeBSD ignore it as well, even if that
> results in a reasonable performance penalty?
>

They would remove or replace the bad hardware.

I've seen more than one DIMM which passed every memory checker I could find
in its most extensive testing mode. The only consistently effective option is
to replace it with a known good piece of memory.

--
Adam Vande More

Ivan Voras

Mar 28, 2010, 1:38:58 PM

to Masoom Shaikh, freebsd...@freebsd.org, freebsd...@freebsd.org, freebsd-...@freebsd.org

On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:

> Let's assume this is a h/w problem; then how can other OSes overcome
> it? Is there a way to make FreeBSD ignore it as well, even if that
> results in a reasonable performance penalty?

Very probably, if only we could detect where the problem is.
Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
configuration file if you can, to see if you can get a less mangled
log output.

Masoom Shaikh

Mar 29, 2010, 1:01:02 PM

to Ivan Voras, freebsd...@freebsd.org, freebsd...@freebsd.org, freebsd-...@freebsd.org

On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
>
>> Let's assume this is a h/w problem; then how can other OSes overcome
>> it? Is there a way to make FreeBSD ignore it as well, even if that
>> results in a reasonable performance penalty?
>
> Very probably, if only we could detect where the problem is.
> Try adding "options PRINTF_BUFR_SIZE=128" to the kernel

this option is already there

Jeremy Chadwick

Mar 29, 2010, 1:30:38 PM

to Masoom Shaikh, freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan Voras, freebsd-...@freebsd.org

On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> >
> >> Let's assume this is a h/w problem; then how can other OSes overcome
> >> it? Is there a way to make FreeBSD ignore it as well, even if that
> >> results in a reasonable performance penalty?
> >
> > Very probably, if only we could detect where the problem is.
> > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
>
> this option is already there

The key word in Ivan's phrase is "less mangled". Neither using nor
increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
output. I've been ranting/raving about this problem for years now; it
truly looks like a mutex lock issue (or lack of such lock), but I've
been told numerous times that isn't the case.

To developers: what incentives would help get this issue well-needed
attention? This problem makes kernel debugging, panic analysis, and
other console-oriented viewing basically impossible.

John Baldwin

Mar 29, 2010, 2:27:34 PM

to freebsd...@freebsd.org, freebsd...@freebsd.org, Masoom Shaikh, Ivan Voras, Jeremy Chadwick, freebsd-...@freebsd.org

On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> > >
> > >> Let's assume this is a h/w problem; then how can other OSes overcome
> > >> it? Is there a way to make FreeBSD ignore it as well, even if that
> > >> results in a reasonable performance penalty?
> > >
> > > Very probably, if only we could detect where the problem is.
> > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> >
> > this option is already there
>
> The key word in Ivan's phrase is "less mangled". Neither using nor
> increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> output. I've been ranting/raving about this problem for years now; it
> truly looks like a mutex lock issue (or lack of such lock), but I've
> been told numerous times that isn't the case.
>
> To developers: what incentives would help get this issue well-needed
> attention? This problem makes kernel debugging, panic analysis, and
> other console-oriented viewing basically impossible.

I was recently going to look at it. The somewhat drastic approach I was going
to take was to add a simple serializing lock around trap_fatal() and a few
other places that do similar block prints (e.g. mca_log()). One of the issues
with fixing this in printf itself is that you'd probably want to
serialize complete lines of text on a per-thread basis. You would want to be
able to accumulate this line of text across multiple calls to printf (think of
it as line-buffering ala stdio). However, some folks may be nervous about
printf not printing things immediately.

The other issue is that lots of code assumes it can call printf from anywhere
and everywhere. Mostly this just means that if you add locking and line-
buffering to printf(9) you have to be very careful to make sure it works in
odd places. Probably a lot of this could be solved by deferring things like
trap_fatal() until panic() has already been called (which is bde's preferred
solution I think).

--
John Baldwin
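
The per-thread line-buffering idea described above can be pictured with a small userland sketch. This is not kernel code and not the change John had in mind; it only illustrates "accumulate across calls, emit whole lines under one lock". The names lb_printf, lb_flush and LB_LINE_MAX are made up for the illustration, pthreads stands in for kernel locking, and C11 thread-local storage stands in for per-thread kernel state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

#define LB_LINE_MAX 256   /* hypothetical cap on one buffered line */

/* Per-thread accumulator (assumption: the kernel would keep this per thread). */
static _Thread_local char lb_buf[LB_LINE_MAX];
static _Thread_local size_t lb_len;

/* One global lock serializes the final write of each complete line. */
static pthread_mutex_t lb_console_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write whatever this thread has accumulated as a single unit. */
void lb_flush(void)
{
    assert(lb_len <= sizeof(lb_buf));
    if (lb_len == 0)
        return;
    pthread_mutex_lock(&lb_console_lock);
    fwrite(lb_buf, 1, lb_len, stdout);
    pthread_mutex_unlock(&lb_console_lock);
    lb_len = 0;
}

/*
 * printf-like call that holds text back until a newline arrives (or the
 * buffer fills). Returns how many times the buffer was flushed.
 */
int lb_printf(const char *fmt, ...)
{
    char tmp[LB_LINE_MAX];
    va_list ap;
    int n, flushed = 0;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
    va_end(ap);
    if (n < 0)
        return 0;
    if ((size_t)n >= sizeof(tmp))
        n = (int)sizeof(tmp) - 1;   /* vsnprintf truncated the text */

    for (int i = 0; i < n; i++) {
        lb_buf[lb_len++] = tmp[i];
        if (tmp[i] == '\n' || lb_len == sizeof(lb_buf)) {
            lb_flush();
            flushed++;
        }
    }
    return flushed;
}
```

A caller could issue lb_printf("cpu%d: ", 0) followed by lb_printf("trap %d\n", 19); nothing reaches the console until the second call completes the line, which is exactly the delayed-output behaviour some folks may be nervous about.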

C. P. Ghost

Mar 29, 2010, 4:18:35 PM

to John Baldwin, freebsd...@freebsd.org

On Mon, Mar 29, 2010 at 8:27 PM, John Baldwin <j...@freebsd.org> wrote:
>> To developers: what incentives would help get this issue well-needed
>> attention? This problem makes kernel debugging, panic analysis, and
>> other console-oriented viewing basically impossible.
>
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering ala stdio). However, some folks may be nervous about
> printf not printing things immediately.
>
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).

How about serializing all printf(9) through a dedicated kernel thread? Maybe
something as flexible as syslogd for kernel space (klogd), that could also
redirect output to a file, to a serial console etc...?

> John Baldwin

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
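
As a rough picture of the "dedicated thread" suggestion, here is a userland sketch in which producers only enqueue complete messages and a single consumer does all the writing. It is an illustration under assumptions, not a proposal for printf(9): the names klog_enqueue, klog_drain_one and klog_thread_main are invented here, the queue sizes are arbitrary, and pthreads stands in for kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define KLOG_SLOTS   64    /* hypothetical queue depth */
#define KLOG_MSG_LEN 128   /* hypothetical maximum message size */

struct klog_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    char   msgs[KLOG_SLOTS][KLOG_MSG_LEN];
    size_t head;      /* next slot to read */
    size_t count;     /* messages waiting */
    size_t dropped;   /* messages rejected because the queue was full */
    int    stopping;  /* set to ask the logger thread to exit */
};

static struct klog_queue klog = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER,
};

/* Producer side: copy one complete message in. Returns 0, or -1 if full. */
int klog_enqueue(const char *msg)
{
    int rc = -1;

    pthread_mutex_lock(&klog.lock);
    if (klog.count < KLOG_SLOTS) {
        size_t tail = (klog.head + klog.count) % KLOG_SLOTS;
        snprintf(klog.msgs[tail], KLOG_MSG_LEN, "%s", msg);
        klog.count++;
        rc = 0;
        pthread_cond_signal(&klog.nonempty);
    } else {
        klog.dropped++;
    }
    pthread_mutex_unlock(&klog.lock);
    return rc;
}

/* Consumer side: pop one message and write it. Returns 1, or 0 if empty. */
int klog_drain_one(FILE *out)
{
    char msg[KLOG_MSG_LEN];

    pthread_mutex_lock(&klog.lock);
    assert(klog.count <= KLOG_SLOTS);
    if (klog.count == 0) {
        pthread_mutex_unlock(&klog.lock);
        return 0;
    }
    memcpy(msg, klog.msgs[klog.head], KLOG_MSG_LEN);
    klog.head = (klog.head + 1) % KLOG_SLOTS;
    klog.count--;
    pthread_mutex_unlock(&klog.lock);

    fputs(msg, out);   /* the slow write happens outside the lock */
    return 1;
}

/* Body for the single logger thread; arg is the FILE * to write to. */
void *klog_thread_main(void *arg)
{
    FILE *out = arg;

    for (;;) {
        pthread_mutex_lock(&klog.lock);
        while (klog.count == 0 && !klog.stopping)
            pthread_cond_wait(&klog.nonempty, &klog.lock);
        int done = klog.stopping && klog.count == 0;
        pthread_mutex_unlock(&klog.lock);
        if (done)
            break;
        klog_drain_one(out);
    }
    return NULL;
}
```

Because only one thread ever writes, messages cannot interleave. The trade-off is that output depends on the logger thread getting to run, which is the very thing that may not happen once the machine has panicked or hung.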

Jeremy Chadwick

Mar 29, 2010, 4:30:48 PM

to John Baldwin, freebsd-...@freebsd.org, Masoom Shaikh, freebsd...@freebsd.org, Ivan Voras, freebsd...@freebsd.org

John,

Thanks for the insights, they're greatly appreciated.

I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. Probably
not worth reading as there was work done in 2009 to solve the issue.

http://lkml.indiana.edu/hypermail/linux/kernel/0204.1/index.html#161

Work done by Red Hat in 2009 details how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):

http://lwn.net/Articles/340400/
http://lwn.net/Articles/340443/

Supposedly having multiple writers to the ring is 100% safe; no
interspersed output. Same goes for interrupt-generated stuff. There are
some comments in the technical document (2nd link) that imply there's an
individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
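
To make the per-CPU buffer question concrete, here is a toy sketch: each CPU writes whole records into its own ring, stamped from a shared counter, and a reader merges the rings by stamp. This is not the Linux ring buffer from the links above and not FreeBSD's message buffer; the names (ring_write, ring_next, ring_dump), the sizes, and the use of C11 atomics are all assumptions made for illustration. The reader also assumes the writers have stopped, e.g. after a panic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCPU       2    /* hypothetical CPU count */
#define RING_SLOTS 8    /* hypothetical records kept per CPU */
#define REC_LEN    96   /* hypothetical maximum record size */

struct rec {
    unsigned long seq;      /* global order stamp; 0 means never written */
    char text[REC_LEN];
};

struct cpu_ring {
    struct rec slots[RING_SLOTS];
    unsigned long written;  /* records this CPU has written so far */
};

static struct cpu_ring rings[NCPU];
static atomic_ulong next_seq = 1;

/* Writer: only the owning CPU touches its ring, so the ring needs no lock. */
void ring_write(int cpu, const char *text)
{
    assert(cpu >= 0 && cpu < NCPU);
    struct cpu_ring *r = &rings[cpu];
    struct rec *slot = &r->slots[r->written % RING_SLOTS];

    slot->seq = atomic_fetch_add(&next_seq, 1);
    snprintf(slot->text, REC_LEN, "%s", text);
    r->written++;   /* oldest record is overwritten once the ring wraps */
}

/* Find the oldest record stamped later than `after`, across all rings. */
const struct rec *ring_next(unsigned long after)
{
    const struct rec *best = NULL;

    for (int c = 0; c < NCPU; c++) {
        for (int i = 0; i < RING_SLOTS; i++) {
            const struct rec *s = &rings[c].slots[i];
            if (s->seq > after && (best == NULL || s->seq < best->seq))
                best = s;
        }
    }
    return best;
}

/* Reader: print all surviving records in stamp order; returns the count. */
int ring_dump(FILE *out)
{
    unsigned long last = 0;
    int printed = 0;
    const struct rec *r;

    while ((r = ring_next(last)) != NULL) {
        fputs(r->text, out);
        last = r->seq;
        printed++;
    }
    return printed;
}
```

Since every record is written whole into a buffer no other CPU writes to, text from different CPUs cannot be interspersed mid-line; the cost is the merge step on the reading side.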

Masoom Shaikh

Apr 3, 2010, 8:51:46 AM

to Ivan Voras, freebsd...@freebsd.org, freebsd...@freebsd.org, freebsd-...@freebsd.org

OK, after a few days of silence I am back with more questions.
This time the system feels a little better; it stays up longer
than 7.3-RELEASE could.

FreeBSD raptor 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Thu Apr 1
01:20:45 UTC 2010 root@:/usr/obj/usr/src/sys/INSPIRON amd64

I am using KDE4, and when the OS freezes, well, it freezes: I cannot
switch to tty0 and see the panic text, if it printed any. The frozen
GUI just keeps staring back at me. So the question is, how
do I capture that panic text? Unfortunately I am not getting core
files either, so there is nothing to pick up hints from.

Is there some option (KDB, DDB) so that the system drops to the debugger on panic?

Masoom Shaikh

Gary Jennejohn

Apr 3, 2010, 10:35:23 AM

to Masoom Shaikh, freebsd...@freebsd.org, freebsd...@freebsd.org

On Sat, 3 Apr 2010 12:51:46 +0000
Masoom Shaikh <masoom...@gmail.com> wrote:

> On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> >
> >> Let's assume this is a h/w problem; then how can other OSes overcome
> >> it? Is there a way to make FreeBSD ignore it as well, even if that
> >> results in a reasonable performance penalty?
> >
> > Very probably, if only we could detect where the problem is.
> > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > configuration file if you can, to see if you can get a less mangled
> > log output.
> >
>
> OK, after a few days of silence I am back with more questions.
> This time the system feels a little better; it stays up longer
> than 7.3-RELEASE could.
>
> FreeBSD raptor 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Thu Apr 1
> 01:20:45 UTC 2010 root@:/usr/obj/usr/src/sys/INSPIRON amd64
>
> I am using KDE4, and when the OS freezes, well, it freezes: I cannot
> switch to tty0 and see the panic text, if it printed any. The frozen
> GUI just keeps staring back at me. So the question is, how
> do I capture that panic text? Unfortunately I am not getting core
> files either, so there is nothing to pick up hints from.
>
> Is there some option (KDB, DDB) so that the system drops to the debugger on panic?
>

[trimmed Cc - no need to send this to 3 MLs]

There's no code in the kernel to switch back out of graphics mode (i.e.
what X uses) when a panic happens.

You probably can switch to v0, but you won't be able to see it.

The only sure-fire way is to hook up a screen (terminal, laptop or
another computer) to a serial port.

--
Gary Jennejohn

Anoop Kumar Narayanan

Apr 8, 2010, 12:45:00 PM

to Masoom Shaikh, freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan Voras, freebsd-...@freebsd.org

I am having the very same problem: my AMD64 machine running i386 (both
7.3-REL and 8.0-REL) keeps crashing. The best part is, if I disable
ACPI it crashes before it even finishes booting; the same happens in
safe mode and single-user mode. With ACPI it boots up but crashes after
a while. I have the vmcore files on the system. Whom do I contact in
this regard?

> _______________________________________________
> freebsd-...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questi...@freebsd.org"
>

Masoom Shaikh

Apr 8, 2010, 1:46:55 PM

to Anoop Kumar Narayanan, freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan Voras, freebsd-...@freebsd.org

Can you load that file in kgdb and get a backtrace?