
freebsd-hackers Digest, Vol 366, Issue 2


freebsd-hac...@freebsd.org

Mar 30, 2010, 8:00:21 AM
to freebsd...@freebsd.org
Send freebsd-hackers mailing list submissions to
freebsd...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
or, via email, send a message with subject or body 'help' to
freebsd-hac...@freebsd.org

You can reach the person managing the list at
freebsd-ha...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-hackers digest..."


Today's Topics:

1. Re: periodically save current time to time-of-day hardware
(John Baldwin)
2. Re: random FreeBSD panics (John Baldwin)
3. Re: random FreeBSD panics (Masoom Shaikh)
4. virtual memory question (Dr. Baud)
5. Re: random FreeBSD panics (Jeremy Chadwick)
6. Re: random FreeBSD panics (John Baldwin)
7. Re: random FreeBSD panics (Jeremy Chadwick)
8. book on parallel programming (Sergey Babkin)


----------------------------------------------------------------------

Message: 1
Date: Mon, 29 Mar 2010 10:44:28 -0400
From: John Baldwin <j...@freebsd.org>
Subject: Re: periodically save current time to time-of-day hardware
To: freebsd...@freebsd.org
Cc: Dag-Erling Smørgrav <d...@des.no>, Andriy Gapon <a...@icyb.net.ua>
Message-ID: <20100329104...@freebsd.org>
Content-Type: Text/Plain; charset="utf-8"

On Sunday 28 March 2010 7:45:25 am Dag-Erling Smørgrav wrote:
> Peter Jeremy <peter...@acm.org> writes:
> > A new kthread which sleeps on channel "update_rtc". When woken, it
> > checks to see if it's within (say) 50msec of a second boundary and if so,
> > it does a trylock on the (new) RTC mutex. If it grabs the mutex then
> > it performs the update. If it was too far from the second boundary or
> > it fails to grab the mutex then it sleeps until the next second
> > boundary and tries again.
> >
> > The existing resettodr() would then turn into a wakeup(update_rtc).
>
> Sounds good to me, but if only that thread has access to the RTC, why
> bother with a mutex?

I would dispense with the kthread and just use a callout (or have a callout
schedule a task for taskqueue_thread).
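
A minimal userspace sketch of the timing check from the quoted proposal,
assuming a 50 ms window on either side of the second boundary. The names
near_second_boundary, ns_until_next_second and rtc_update_delay are
hypothetical, and the RTC write itself is left out. A kernel version would
presumably re-arm a callout(9) with the returned delay instead of handing
it back to the caller.

```c
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000L
#define RTC_WINDOW_NS	50000000L	/* 50 ms, the figure from the proposal */

/* True if nsec (0..999999999) lies within window_ns of a second boundary. */
static bool
near_second_boundary(long nsec, long window_ns)
{
	return (nsec <= window_ns || nsec >= NSEC_PER_SEC - window_ns);
}

/* Nanoseconds remaining from nsec until the next second boundary. */
static long
ns_until_next_second(long nsec)
{
	return (NSEC_PER_SEC - nsec);
}

/*
 * Decide what one pass of the (hypothetical) update routine should do:
 * return 0 if the RTC may be written now, otherwise the delay in
 * nanoseconds after which the check should be repeated.
 */
static long
rtc_update_delay(const struct timespec *now)
{
	if (near_second_boundary(now->tv_nsec, RTC_WINDOW_NS))
		return (0);
	return (ns_until_next_second(now->tv_nsec));
}
```

Called with the result of clock_gettime(CLOCK_REALTIME, &now), a return
value of 0 means "update now" and anything else is how long to wait before
trying again.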

--
John Baldwin


------------------------------

Message: 2
Date: Mon, 29 Mar 2010 10:48:44 -0400
From: John Baldwin <j...@freebsd.org>
Subject: Re: random FreeBSD panics
To: freebsd...@freebsd.org
Cc: Masoom Shaikh <masoom...@gmail.com>, freebsd-questions
<freebsd-...@freebsd.org>
Message-ID: <20100329104...@freebsd.org>
Content-Type: Text/Plain; charset="iso-8859-15"

On Sunday 28 March 2010 4:28:29 am Masoom Shaikh wrote:
> Hello List,
>
> I was a happy FreeBSD user, just before I installed FreeBSD8.0-RC1. Since
> then, system randomly just freezes, and there is no option other than hard
> boot. I guessed this will get solved in 8.0-RELEASE, but it was not :(
>
> Many times I get vmcore files, not always. I have dumpdev set to AUTO in my
> rc.conf. Almost every time it just fsck's the file-system on reboot. I have
> not lost any files though. This is a Dell Inspiron 1525 Laptop with 1GB ram,
> Intel Core2 Duo T5500 with ATI Radeon X1400 card. The installation in
> question is KDE4 from ports, with radeon/ati driver.
>
> I felt the problem is with wpi driver, then suspected dri driver of X. Then
> I observed system freezes even if none of this is installed. e.g. if it is
> under some load, like building a port and simultaneously fetching something
> over network it hangs, and hangs hard. This persuaded me to think something
> is wrong in kernel scheduling itself. Maybe it is lost in some deadlock,
> etc... Thus last weekend I thought I would see how immediate previous
> version i.e. FreeBSD-7.3-RELEASE would behave.
>
> I reinstalled FreeBSD 7.1 from iso images, svn up'ed FreeBSD 7.3 source, did
> the normal buildworld, buildkernel, installkernel, installworld cycle.
> Unfortunately this kernel is naughty as well ;-), it also freezes with same
> stubbornness. But difference is this time I happen to catch something
> interesting.
>
> It panics on NMI, fatal trap 19 while in kernel mode. Loaded the vmcore file
> in kgdb and got the backtrace. I obtained vmcore files on two occasions. I
> have attached both the back traces. This error most likely suggests hardware
> error in RAM, but Windows 7 and XP boot just fine and never caused any errors.

Yes, and note that the chipset has set a register to indicate a RAM parity
error as well, so it is not a random NMI. Have you checked your BIOS' event
log? You may also want to try running with machine checks enabled
(hw.mca.enabled=1 in loader.conf, but it would have to be on very recent 7/8-
stable) to see if you get machine checks for ECC errors. OTOH, if you do not
have ECC memory then this will probably not help.

> To verify if I have errors in my RAM I let run sysutils/memtest86+
> overnight, to double verify I also executed Windows Memory Diagnostic test
> four times. None of them reported errors. Can anyone here suggest a
> solution?

You can still have bad RAM even if those do not fail.

--
John Baldwin


------------------------------

Message: 3
Date: Mon, 29 Mar 2010 17:01:02 +0000
From: Masoom Shaikh <masoom...@gmail.com>
Subject: Re: random FreeBSD panics
To: Ivan Voras <ivo...@freebsd.org>
Cc: freebsd...@freebsd.org, freebsd...@freebsd.org,
freebsd-...@freebsd.org
Message-ID:
<b10011eb1003291001u767...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
>
>> lets assume if this is h/w problem, then how can other OSes overcome
>> this ? is there a way to make FreeBSD ignore this as well, let it
>> result in reasonable performance penalty.
>
> Very probably, if only we could detect where the problem is.
> Try adding "options PRINTF_BUFR_SIZE=128" to the kernel

this option is already there

> configuration file if you can, to see if you can get a less mangled
> log output.
>


------------------------------

Message: 4
Date: Mon, 29 Mar 2010 08:14:08 -0700 (PDT)
From: "Dr. Baud" <drb...@yahoo.com>
Subject: virtual memory question
To: freebsd...@freebsd.org
Message-ID: <359309....@web65605.mail.ac4.yahoo.com>
Content-Type: text/plain; charset=us-ascii


Should I be able to recover the physical address of a memory region allocated by contigmalloc in a kernel module and mapped to a virtual address by a user application?

Dr


------------------------------

Message: 5
Date: Mon, 29 Mar 2010 10:30:38 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Re: random FreeBSD panics
To: Masoom Shaikh <masoom...@gmail.com>
Cc: freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan
Voras <ivo...@freebsd.org>, freebsd-...@freebsd.org
Message-ID: <2010032917...@icarus.home.lan>
Content-Type: text/plain; charset=iso-8859-1

On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> >
> >> lets assume if this is h/w problem, then how can other OSes overcome
> >> this ? is there a way to make FreeBSD ignore this as well, let it
> >> result in reasonable performance penalty.
> >
> > Very probably, if only we could detect where the problem is.
> > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
>
> this option is already there

The key word in Ivan's phrase is "less mangled". Neither using nor
increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
output. I've been ranting/raving about this problem for years now; it
truly looks like a mutex lock issue (or lack of such lock), but I've
been told numerous times that isn't the case.

To developers: what incentives would help get this issue well-needed
attention? This problem makes kernel debugging, panic analysis, and
other console-oriented viewing basically impossible.

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 6
Date: Mon, 29 Mar 2010 14:27:34 -0400
From: John Baldwin <j...@freebsd.org>
Subject: Re: random FreeBSD panics
To: freebsd...@freebsd.org
Cc: freebsd...@freebsd.org, Masoom Shaikh
<masoom...@gmail.com>, Ivan Voras <ivo...@freebsd.org>, Jeremy
Chadwick <fre...@jdc.parodius.com>, freebsd-...@freebsd.org
Message-ID: <20100329142...@freebsd.org>
Content-Type: Text/Plain; charset="iso-8859-1"

On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> > >
> > >> lets assume if this is h/w problem, then how can other OSes overcome
> > >> this ? is there a way to make FreeBSD ignore this as well, let it
> > >> result in reasonable performance penalty.
> > >
> > > Very probably, if only we could detect where the problem is.
> > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> >
> > this option is already there
>
> The key word in Ivan's phrase is "less mangled". Neither using nor
> increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> output. I've been ranting/raving about this problem for years now; it
> truly looks like a mutex lock issue (or lack of such lock), but I've
> been told numerous times that isn't the case.
>
> To developers: what incentives would help get this issue well-needed
> attention? This problem makes kernel debugging, panic analysis, and
> other console-oriented viewing basically impossible.

I was recently going to look at it. The somewhat drastic approach I was going
to take was to add a simple serializing lock around trap_fatal() and a few
other places that do similar block prints (e.g. mca_log()). One of the issues
with fixing this in printf itself is that you'd probably want to
serialize complete lines of text on a per-thread basis. You would want to be
able to accumulate this line of text across multiple calls to printf (think of
it as line-buffering ala stdio). However, some folks may be nervous about
printf not printing things immediately.

The other issue is that lots of code assumes it can call printf from anywhere
and everywhere. Mostly this just means that if you add locking and line-
buffering to printf(9) you have to be very careful to make sure it works in
odd places. Probably a lot of this could be solved by deferring things like
trap_fatal() until panic() has already been called (which is bde's preferred
solution I think).
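
A minimal userspace sketch of the per-thread line-buffering idea described
above, using pthreads and C11 thread-local storage. This is not kernel
printf(9) code: the names lb_printf and lb_flush, the single global
console_lock and the 256-byte limit are all assumptions made for
illustration. Each thread accumulates text privately across calls and only
takes the lock to emit one complete line.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

#define LB_LINE_MAX	256	/* hypothetical per-thread buffer size */

static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local char line_buf[LB_LINE_MAX];
static _Thread_local size_t line_len;

/* Emit this thread's pending text as one unit while holding the lock. */
static void
lb_flush(FILE *out)
{
	if (line_len == 0)
		return;
	pthread_mutex_lock(&console_lock);
	fwrite(line_buf, 1, line_len, out);
	pthread_mutex_unlock(&console_lock);
	line_len = 0;
}

/*
 * Format into a temporary buffer, append to the per-thread line buffer,
 * and flush whenever a newline is seen or the buffer fills up.
 */
static int
lb_printf(FILE *out, const char *fmt, ...)
{
	char tmp[LB_LINE_MAX];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);
	if (n < 0)
		return (n);
	if ((size_t)n >= sizeof(tmp))
		n = (int)sizeof(tmp) - 1;	/* output was truncated */

	for (int i = 0; i < n; i++) {
		line_buf[line_len++] = tmp[i];
		if (tmp[i] == '\n' || line_len == sizeof(line_buf))
			lb_flush(out);
	}
	return (n);
}
```

The sketch also shows the concern about delayed output: text without a
trailing newline stays in the thread's buffer until a later call completes
the line, so a thread that stops before then never prints it unless
something calls lb_flush() explicitly.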

--
John Baldwin


------------------------------

Message: 7
Date: Mon, 29 Mar 2010 13:30:48 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Re: random FreeBSD panics
To: John Baldwin <j...@freebsd.org>
Cc: freebsd-...@freebsd.org, Masoom Shaikh
<masoom...@gmail.com>, freebsd...@freebsd.org, Ivan Voras
<ivo...@freebsd.org>, freebsd...@freebsd.org
Message-ID: <2010032920...@icarus.home.lan>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 02:27:34PM -0400, John Baldwin wrote:
> On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> > On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > > > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> > > >
> > > >> lets assume if this is h/w problem, then how can other OSes overcome
> > > >> this ? is there a way to make FreeBSD ignore this as well, let it
> > > >> result in reasonable performance penalty.
> > > >
> > > > Very probably, if only we could detect where the problem is.
> > > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > >
> > > this option is already there
> >
> > The key word in Ivan's phrase is "less mangled". Neither using nor
> > increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> > output. I've been ranting/raving about this problem for years now; it
> > truly looks like a mutex lock issue (or lack of such lock), but I've
> > been told numerous times that isn't the case.
> >
> > To developers: what incentives would help get this issue well-needed
> > attention? This problem makes kernel debugging, panic analysis, and
> > other console-oriented viewing basically impossible.
>
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering ala stdio). However, some folks may be nervous about
> printf not printing things immediately.
>
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).

John,

Thanks for the insights, they're greatly appreciated.

I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. Probably
not worth reading as there was work done in 2009 to solve the issue.

http://lkml.indiana.edu/hypermail/linux/kernel/0204.1/index.html#161

Work done by Red Hat in 2009 details how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):

http://lwn.net/Articles/340400/
http://lwn.net/Articles/340443/

Supposedly having multiple writers to the ring is 100% safe; no
interspersed output. Same goes for interrupt-generated stuff. There are
some comments in the technical document (2nd link) that imply there's an
individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
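
A rough userspace sketch of what per-CPU message buffers could look like.
This is not the Linux ring buffer design from the links above, only an
illustration of the idea: every name here (cpu_ring, ring_log, ring_dump,
NCPU, SLOTS) is hypothetical, and it assumes each ring has a single writer
at a time. Whole messages are stored per CPU and tagged with a global
sequence number, so a reader can merge them in order without lines ever
being interleaved.

```c
#include <stdatomic.h>
#include <stdio.h>

#define NCPU	2	/* hypothetical CPU count */
#define SLOTS	4	/* messages kept per CPU before overwriting */
#define MSG_LEN	64

struct msg {
	unsigned long seq;	/* 0 means the slot is empty */
	char text[MSG_LEN];
};

struct cpu_ring {
	struct msg slot[SLOTS];
	unsigned int head;	/* total messages written to this ring */
};

static struct cpu_ring rings[NCPU];
static atomic_ulong next_seq = 1;

/* Store one complete message in the given CPU's ring. */
static void
ring_log(int cpu, const char *text)
{
	struct cpu_ring *r = &rings[cpu];
	struct msg *m = &r->slot[r->head % SLOTS];

	m->seq = atomic_fetch_add(&next_seq, 1);
	snprintf(m->text, sizeof(m->text), "%s", text);
	r->head++;
}

/*
 * Print every stored message in global sequence order by repeatedly
 * picking the smallest sequence number not yet printed.  Returns the
 * number of messages written to out.
 */
static int
ring_dump(FILE *out)
{
	unsigned long last = 0;
	int count = 0;

	for (;;) {
		const struct msg *best = NULL;

		for (int c = 0; c < NCPU; c++) {
			for (int i = 0; i < SLOTS; i++) {
				const struct msg *m = &rings[c].slot[i];

				if (m->seq > last &&
				    (best == NULL || m->seq < best->seq))
					best = m;
			}
		}
		if (best == NULL)
			break;
		fprintf(out, "%s\n", best->text);
		last = best->seq;
		count++;
	}
	return (count);
}
```

The trade-off is that nothing reaches the console until a reader drains
the rings, which is the same "not printed immediately" concern raised
earlier in the thread.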

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 8
Date: Mon, 29 Mar 2010 16:58:00 -0500
From: Sergey Babkin <bab...@verizon.net>
Subject: book on parallel programming
To: hac...@freebsd.org
Message-ID: <4BB12268...@verizon.net>
Content-Type: text/plain; charset=us-ascii

Hi all,

For everyone who asked about my book "The Practice of Parallel
Programming" being printed, I've got it self-published
through CreateSpace:

https://www.createspace.com/3438465

They say it should get to Amazon too, in 3 weeks or so.
The discount code RYM7VM5Q gives $14 off the list price at the
CreateSpace store.

The online free version is still available at
http://members.verizon.net/~babkin/tpopp/

Though the printed edition has quite a few improvements in it :-)

-SB


------------------------------


End of freebsd-hackers Digest, Vol 366, Issue 2
***********************************************
