Hello List,
I was a happy FreeBSD user until I installed FreeBSD 8.0-RC1.
Since then, the system randomly freezes, and there is no option other
than a hard reboot. I guessed this would get solved in 8.0-RELEASE, but
it was not :(
I often get vmcore files, but not always. I have dumpdev set to AUTO
in my rc.conf. Almost every time it just fsck's the file system on
reboot. I have not lost any files though. This is a Dell Inspiron 1525
laptop with 1GB RAM, an Intel Core2 Duo T5500, and an ATI Radeon X1400
card. The installation in question is KDE4 from ports, with the
radeon/ati driver.
At first I felt the problem was with the wpi driver, then I suspected
the DRI driver of X. Then I observed that the system freezes even if
neither of these is installed, e.g. if it is under some load, like
building a port while simultaneously fetching something over the
network, it hangs, and hangs hard. This persuaded me to think something
is wrong in kernel scheduling itself. Maybe it is lost in some
deadlock, etc. So last weekend I thought I would see how the
immediately previous version, i.e. FreeBSD 7.3-RELEASE, would behave.
I reinstalled FreeBSD 7.1 from ISO images, svn up'ed to the FreeBSD 7.3
source, and did the normal buildworld, buildkernel, installkernel,
installworld cycle. Unfortunately this kernel is naughty as well ;-),
it also freezes with the same stubbornness. The difference is that this
time I happened to catch something interesting.
It panics on an NMI, fatal trap 19 while in kernel mode. I loaded the
vmcore file in kgdb and got the backtrace. I obtained vmcore files on
two occasions and have attached both backtraces. This error most
likely suggests a hardware error in RAM, but Windows 7 and XP boot just
fine and have never caused any errors.
To verify whether I have errors in my RAM I let sysutils/memtest86+ run
overnight; to double-check I also ran the Windows Memory Diagnostic
test four times. Neither reported errors. Can anyone here suggest a
solution?
Masoom Shaikh
forwarding to stable@ with respect to a generous suggestion
nope, this didn't help either, the machine froze again after about 30
minutes of use
all it was doing was playing amarok, fetching sources from svn repos,
and running firefox
let's assume this is a h/w problem, then how do other OSes overcome
it? is there a way to make FreeBSD ignore it as well, even if that
results in a reasonable performance penalty?
> nope, this didn't help either, the machine froze again after about 30
> minutes of use
> all it was doing was playing amarok, fetching sources from svn repos,
> and running firefox
>
> let's assume this is a h/w problem, then how do other OSes overcome
> it? is there a way to make FreeBSD ignore it as well, even if that
> results in a reasonable performance penalty?
>
They would remove or replace the bad hardware.
I've seen more than one DIMM that passed every memory checker I could
find in its most extensive testing mode. The only consistently
effective option is to replace it with a known good piece of memory.
--
Adam Vande More
> let's assume this is a h/w problem, then how do other OSes overcome
> it? is there a way to make FreeBSD ignore it as well, even if that
> results in a reasonable performance penalty?
Very probably, if only we could detect where the problem is.
Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
configuration file if you can, to see if you can get a less mangled
log output.
this option is already there
The key word in Ivan's phrase is "less mangled". Neither using nor
increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
output. I've been ranting/raving about this problem for years now; it
truly looks like a mutex lock issue (or lack of such lock), but I've
been told numerous times that isn't the case.
To developers: what incentives would help get this issue the
attention it needs? This problem makes kernel debugging, panic analysis, and
other console-oriented viewing basically impossible.
--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
I was recently going to look at it. The somewhat drastic approach I was going
to take was to add a simple serializing lock around trap_fatal() and a few
other places that do similar block prints (e.g. mca_log()). One of the issues
with fixing this in printf itself is that you'd probably want to
serialize complete lines of text on a per-thread basis. You would want to be
able to accumulate this line of text across multiple calls to printf (think of
it as line-buffering à la stdio). However, some folks may be nervous about
printf not printing things immediately.
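A minimal userland sketch of the line-buffering idea described above,
not kernel code: each thread accumulates text in its own buffer and
takes a single lock only when a complete line is ready. The names
lb_printf, flush_bytes, out_lock, out_log and LINE_MAX_LEN are made up
for illustration, pthreads stand in for kernel locking, and out_log
stands in for the console.

```c
/* Userland sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256                 /* assumed per-thread buffer size */

static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;
static char out_log[4096];               /* stands in for the console */
static size_t out_len;

/* One pending line per thread (C11 thread-local storage). */
static _Thread_local char line_buf[LINE_MAX_LEN];
static _Thread_local size_t line_len;

/* Emit the first n buffered bytes of the calling thread under the lock. */
static void flush_bytes(size_t n)
{
    assert(n <= line_len);
    pthread_mutex_lock(&out_lock);
    if (out_len + n < sizeof(out_log)) { /* silently drop if the log is full */
        memcpy(out_log + out_len, line_buf, n);
        out_len += n;
        out_log[out_len] = '\0';
    }
    pthread_mutex_unlock(&out_lock);
    memmove(line_buf, line_buf + n, line_len - n);
    line_len -= n;
}

/* Accumulate formatted text; emit only on newline or when the buffer fills. */
void lb_printf(const char *fmt, ...)
{
    char tmp[LINE_MAX_LEN];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof(tmp))
        n = (int)sizeof(tmp) - 1;        /* output was truncated */

    for (int i = 0; i < n; i++) {
        line_buf[line_len++] = tmp[i];
        if (tmp[i] == '\n' || line_len == LINE_MAX_LEN)
            flush_bytes(line_len);
    }
}
```

The sketch also shows the concern raised above: text passed to
lb_printf without a trailing newline stays in the per-thread buffer
and is not visible until a later call completes the line.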
The other issue is that lots of code assumes it can call printf from anywhere
and everywhere. Mostly this just means that if you add locking and line-
buffering to printf(9) you have to be very careful to make sure it works in
odd places. Probably a lot of this could be solved by deferring things like
trap_fatal() until panic() has already been called (which is bde's preferred
solution I think).
--
John Baldwin
How about serializing all printf(9) through a dedicated kernel thread? Maybe
something as flexible as syslogd for kernel space (klogd), that could also
redirect output to a file, to a serial console etc...?
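A rough userland sketch of that idea, purely as an illustration:
producers post whole messages to a bounded queue and a single logger
thread is the only writer to the output. klog_post, klog_thread,
klog_stop and the sink buffer are hypothetical names, and pthreads
stand in for kernel threads and locks.

```c
/* Userland sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QCAP   64                        /* assumed queue depth */
#define MSGLEN 128                       /* assumed max message length */

static char queue[QCAP][MSGLEN];
static size_t q_head, q_tail, q_count;
static int q_stop;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;

static char sink[QCAP * MSGLEN];         /* stands in for console/file/serial */

/* Producer side: returns 0 on success, -1 if the queue is full. */
int klog_post(const char *msg)
{
    int rc = -1;

    assert(msg != NULL);
    pthread_mutex_lock(&q_lock);
    if (q_count < QCAP) {
        snprintf(queue[q_tail], MSGLEN, "%s", msg);
        q_tail = (q_tail + 1) % QCAP;
        q_count++;
        rc = 0;
        pthread_cond_signal(&q_nonempty);
    }
    pthread_mutex_unlock(&q_lock);
    return rc;
}

/* Logger thread: drains the queue in FIFO order until told to stop. */
void *klog_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        while (q_count == 0 && !q_stop)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_count == 0 && q_stop)
            break;
        /* Only this thread appends to sink, so messages never interleave. */
        strncat(sink, queue[q_head], sizeof(sink) - strlen(sink) - 1);
        q_head = (q_head + 1) % QCAP;
        q_count--;
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Ask the logger thread to exit once the queue is empty. */
void klog_stop(void)
{
    pthread_mutex_lock(&q_lock);
    q_stop = 1;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}
```

One trade-off of this design is that output only appears when the
logger thread gets to run, so messages posted just before a hard hang
could be lost.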
-cpghost.
--
Cordula's Web. http://www.cordula.ws/
John,
Thanks for the insights, they're greatly appreciated.
I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. Probably
not worth reading as there was work done in 2009 to solve the issue.
http://lkml.indiana.edu/hypermail/linux/kernel/0204.1/index.html#161
Work done by Red Hat in 2009 details how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):
http://lwn.net/Articles/340400/
http://lwn.net/Articles/340443/
Supposedly having multiple writers to the ring is 100% safe; no
interspersed output. Same goes for interrupt-generated stuff. There are
some comments in the technical document (2nd link) that imply there's an
individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
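To make the per-CPU idea concrete, here is a small userland sketch. It
is not the Linux implementation from the links above and not FreeBSD's
message buffer. Each CPU writes whole lines into its own ring and tags
them with a global sequence number, and a reader merges the rings by
that number. ring_write, ring_dump, NCPU, NREC and RECLEN are
hypothetical names and sizes.

```c
/* Userland sketch only; all names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NCPU   2                         /* assumed number of CPUs */
#define NREC   8                         /* records per ring */
#define RECLEN 64                        /* max bytes per record */

struct rec {
    unsigned long seq;                   /* 0 means the slot is unused */
    char text[RECLEN];
};

struct cpu_ring {
    struct rec recs[NREC];
    unsigned long count;                 /* records ever written */
};

static struct cpu_ring rings[NCPU];
static atomic_ulong next_seq;            /* shared ordering counter */

/* Writer: only the owning CPU touches its ring, so no lock is taken. */
void ring_write(int cpu, const char *line)
{
    assert(cpu >= 0 && cpu < NCPU);
    struct cpu_ring *r = &rings[cpu];
    struct rec *slot = &r->recs[r->count % NREC];   /* overwrite the oldest */

    snprintf(slot->text, RECLEN, "%s", line);
    slot->seq = atomic_fetch_add(&next_seq, 1) + 1;
    r->count++;
}

/*
 * Reader: repeatedly pick the record with the smallest unseen sequence
 * number across all rings. Assumes no writer is active during the dump.
 * Returns the number of records copied into out.
 */
size_t ring_dump(char *out, size_t outsz)
{
    unsigned long last = 0;
    size_t lines = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (;;) {
        const struct rec *best = NULL;

        for (int c = 0; c < NCPU; c++) {
            for (int i = 0; i < NREC; i++) {
                const struct rec *p = &rings[c].recs[i];
                if (p->seq > last && (best == NULL || p->seq < best->seq))
                    best = p;
            }
        }
        if (best == NULL)
            break;
        strncat(out, best->text, outsz - strlen(out) - 1);
        last = best->seq;
        lines++;
    }
    return lines;
}
```

Because every record is a whole line owned by one CPU, lines from
different CPUs can be reordered relative to each other but never
spliced together mid-line.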
ok, after a few days of silence I am back with more questions
this time the system feels a little better, it is able to stay up for
longer than 7.3-RELEASE could
FreeBSD raptor 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Thu Apr 1
01:20:45 UTC 2010 root@:/usr/obj/usr/src/sys/INSPIRON amd64
I am using KDE4, and when the OS freezes, well, it freezes, meaning I
cannot switch to tty0 and see the panic text, if it printed any. the
frozen GUI just keeps staring back at me. So the question is how do I
capture that panic text? unfortunately I am not getting core files
either, so there is nothing I can pick up hints from
is there some option (KDB, DDB) so that on panic the system drops to the debugger?
Masoom Shaikh
> On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> >
> >> let's assume this is a h/w problem, then how do other OSes overcome
> >> it? is there a way to make FreeBSD ignore it as well, even if that
> >> results in a reasonable performance penalty?
> >
> > Very probably, if only we could detect where the problem is.
> > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > configuration file if you can, to see if you can get a less mangled
> > log output.
> >
[trimmed Cc - no need to send this to 3 MLs]
There's no code in the kernel to switch back out of graphics mode (i.e.
what X uses) when a panic happens.
You probably can switch to v0, but you won't be able to see it.
The only sure-fire way is to hook up a screen (terminal, laptop or
another computer) to a serial port.
--
Gary Jennejohn
I am having the very same problem: my AMD64 machine running i386 (both
7.3-REL and 8.0-REL) keeps crashing. The best part is, if I disable
ACPI it crashes before it even boots up, and the same goes for safe
mode and single-user mode. With ACPI it boots up but crashes after a
while. I have the vmcore files on the system. Whom do I contact in this
regard?
Can you load that file in kgdb and get a backtrace?