freebsd-stable Digest, Vol 164, Issue 5

freebsd-sta...@freebsd.org

Jun 27, 2006, 8:00:54 AM

to freebsd...@freebsd.org

Send freebsd-stable mailing list submissions to
freebsd...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
or, via email, send a message with subject or body 'help' to
freebsd-sta...@freebsd.org

You can reach the person managing the list at
freebsd-st...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-stable digest..."

Today's Topics:

1. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
2. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
3. Re: leaking blocked processes in vmstat ... how to debug?
(Marc G. Fournier)
4. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Pete French)
5. Re: Losing confidence in FreeBSD 6.x in a loaded environment
... :( (Matthew Jacob)
6. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Pete French)
7. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Dmitry Pryanishnikov)
8. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
9. Re: 6.1-stable hangs and LORs (Brad Waite)
10. Re: vmstat 'b' (disk busy?) field keeps climbing ...
(Marian Hettwer)
11. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
12. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Mike Jakubik)
13. Re: wi0 down when print a lot of data to screen over ssh
(Sam Leffler)
14. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Paul Allen)
15. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Andrew Reilly)
16. Re: vinum to gvinum help (Greg 'groggy' Lehey)
17. Re: vmstat 'b' (disk busy?) field keeps climbing ...
(Jonathan Noack)
18. trap 12: supervisor write, page not present on 6.1-STABLE Tue
May 16 2006 (Stanislaw Halik)
19. Re: trap 12: supervisor write, page not present on 6.1-STABLE
Tue May 16 2006 (Stanislaw Halik)
20. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Wilko Bulte)
21. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Dmitry Pryanishnikov)
22. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Dmitry Pryanishnikov)
23. Re: What denotes a 'blocked' process? (Peter Jeremy)
24. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Alban Hertroys)
25. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
(Peter Jeremy)
26. FreeBSD 6.1 and maildrop compiling error (Albert Czarnecki)
27. [OT] Thanks (M.Hirsch)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Jun 2006 01:07:19 +0200
From: "M.Hirsch" <webm...@hirsch.it>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Cc: freebsd...@freebsd.org
Message-ID: <44A068A7...@hirsch.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dmitry Pryanishnikov schrieb:

> When you wrote "ECC is a way to mask broken hardware", you were plain wrong.
> If you're using hardware w/o ECC, it just can't tell whether error present
> or absent. So ECC _is_ the way to detect (not mask) broken hardware.
>
Ok, thanks. I think I understand the meaning of ECC now.
So, unlike my supplier claims, ECC is not supposed to help against
hardware failures.
But it is the way to detect them, right?

> If you want ECC corrector to raise NMI on corrected error (as well as
> uncorrectable), just set appropriate bit in control register - every
> Intel's ECC-capable chipset allows it. But if we're speaking about
> production environment, such behaviour (abnormal termination on
> _corrected_ error) is unacceptable.

"abnormal termination" is not only acceptable for me, it is what I am
looking for.
Make the node crash completely, so one of the others can take over its
task(s).

>> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an
>> effort than "just" acquiring a new box...
>
> I don't see connection between this sentence and ECC (which is
> hardware option).

What I wanted to say:
Looking for errors in the logs takes only a few seconds.
Finding out what caused them takes hours...
Acquiring a new box is only $29,95 ;) - that's like 30 minutes, if you
regard it from the business side. ... I'd rather rent 100 boxes to do the
task of ten than employ 100 admins to find the "real" problem.

Thanks, Dmitry. I think I know what to look for now...

------------------------------

Message: 2
Date: Tue, 27 Jun 2006 01:08:39 +0200
From: "M.Hirsch" <M.Hi...@hirsch.it>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Steven Hartland <kil...@multiplay.co.uk>
Cc: freebsd...@freebsd.org
Message-ID: <44A068F...@hirsch.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Wow, Steven,

you've been really helpful here...

Steven Hartland schrieb:

>
> My advice would be don't feed the troll.
>
> Steve

------------------------------

Message: 3
Date: Mon, 26 Jun 2006 11:04:47 -0300 (ADT)
From: "Marc G. Fournier" <scr...@hub.org>
Subject: Re: leaking blocked processes in vmstat ... how to debug?
To: Gavin Atkinson <gavin.a...@ury.york.ac.uk>
Cc: freebsd...@freebsd.org
Message-ID: <2006062611...@ganymede.hub.org>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Mon, 26 Jun 2006, Gavin Atkinson wrote:

> On Mon, 2006-06-26 at 08:29 -0300, Marc G. Fournier wrote:
>> Up to 48 right now, and still nothing to show for it ...
>
> Is it possible that these processes are blocked while exiting?
>
> There's currently an unresolved bug where processes get wedged either
> during startup or exit. See if you have any processes in state "START"
> in top, or with an E in the "STAT" column:
>
> ps ax -O ppid,flags,mwchan | awk '$6 ~ /E/ || $6 == "STAT"'

Checked ... that one is clean too :(

I've even checked for L (waiting to acquire lock), since that sounded
like something that might be seen as 'blocked' .. nada ..

It's as if 'blocked' is incrementing, but not always decrementing properly
...
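
For anyone who wants to script the check rather than eyeball it, here is a minimal Python sketch of the same filter as the awk one-liner quoted above, extended to the L state as well. It assumes the column order that `ps ax -O ppid,flags,mwchan` prints (PID PPID F MWCHAN TT STAT TIME COMMAND, so STAT is the sixth field). The sample output and the function name `wedged` are made up for illustration and are not taken from any of the machines in this thread.

```python
def wedged(ps_text, flags="EL"):
    """Return (pid, stat, command) for rows whose STAT field contains a flag letter.

    Like the awk regex, this is a plain substring match, so a letter used as a
    modifier (for example L meaning "pages locked in core") matches as well.
    """
    rows = []
    for line in ps_text.strip().splitlines():
        fields = line.split(None, 7)          # COMMAND may contain spaces
        if len(fields) < 6 or fields[5] == "STAT":
            continue                          # skip short lines and the header
        if any(letter in fields[5] for letter in flags):
            command = fields[7] if len(fields) > 7 else ""
            rows.append((int(fields[0]), fields[5], command))
    return rows


# Hypothetical sample output, for illustration only.
SAMPLE = """\
  PID  PPID       F MWCHAN  TT  STAT      TIME COMMAND
  501     1    4100 select  ??  Ss     0:01.20 /usr/sbin/syslogd -s
  733   501    4002 -       ??  RE     0:00.00 (httpd)
  912     1    4000 lockwt  ??  L      0:03.10 mysqld
"""

for pid, stat, command in wedged(SAMPLE):
    print(pid, stat, command)
```

On a live box the text would come from running the ps command itself; an empty result matches what was seen here, i.e. nothing stuck in E or L even though the 'b' count keeps growing.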

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scr...@hub.org MSN . scr...@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664

------------------------------

Message: 4
Date: Tue, 27 Jun 2006 00:33:57 +0100
From: Pete French <petef...@ticketswitch.com>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: dmi...@atlantis.dp.ua, webm...@hirsch.it
Cc: freebsd...@freebsd.org
Message-ID: <E1Fv0ab-...@dilbert.firstcallgroup.co.uk>

> So, unlike my supplier claims, ECC is not supposed to help against
> hardware failures.
> But it is the way to detect them, right?

Yes!!! Absolutely.

-pete.

------------------------------

Message: 5
Date: Mon, 26 Jun 2006 16:25:04 -0700
From: "Matthew Jacob" <lydianc...@gmail.com>
Subject: Re: Losing confidence in FreeBSD 6.x in a loaded environment
... :(
To: "Marc G. Fournier" <scr...@hub.org>
Cc: freebsd...@freebsd.org
Message-ID:
<7579f7fb0606261625n619...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Well, I have to say that I had to reboot my 6.1 gateway on Sunday. It
was 5.X prior to this and never had to be rebooted.

On 6/26/06, Marc G. Fournier <scr...@hub.org> wrote:
>
> Okay, now this is getting ridiculous .. and for all those that have been
> helping me debug things over the past few days, this isn't a rant against
> "people", it's a rant against the state of FreeBSD 6.x :(
>
> It's got SERIOUS problems.
>
> As easy as it is to 'blame the hardware', I'm up to my third FreeBSD 6.x
> system that is having problems now, all of which ran flawlessly under
> FreeBSD 4-STABLE under some serious load ...
>
> And by 'serious load' .. jupiter (the one that was giving me the SegFaults
> under those two kernels I tried) was up to 100 vServers running on it
> while I was clearing things off of pluto to upgrade her to 6.x ... and
> something like 209 days uptime ...
>
> Right now, I have 3 FreeBSD 4.x left in production, and 4 FreeBSD 6.x ...
>
> One 4.x server is running 87 vServers, has been up for 74 days now, and
> vmstat 5 shows:
>
> # vmstat 5
>  procs      memory      page                     disks     faults       cpu
>  r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy   cs us sy id
>  3 5 0 3594432 228088   60   3   3   1  73 310   0   0  584  562  569 28 29 43
>  3 5 0 3578556 227384  546   0   1   7 592   0   0  18  434 1869 1647  7 11 83
>  2 5 0 3564464 225092  807   0   0   0 505   0   2   4  382 2698 2684  8  9 83
>
> One of my 6.x just went down for the second time today ... locked up solid
> ... first time it did it today, it was hitting maxpipekva, which seems to
> be my biggest headache so far with 6.x ...
>
> Pluto, the one that we've been pretty much fighting over this past
> weekend, is running 69 vServers, 1363 processes, a loadavg <1 ... and I'm
> lucky to keep it running for 24 hours ...
>
> maxpipekva is set to:
>
> kern.ipc.maxpipekva: 67108864 - kern.ipc.pipekva: 35782656
>
> It just doesn't feel like 6.x is as robust under loaded conditions as
> 4-STABLE was ... :(
>
> I don't expect anything to get fixed based on this email ... it was more
> to get off my chest about those that have been suggesting that I'm
> overloading the server(s) and causing the problems ... I loaded them worse
> under 4.x, with less problems ... hell, I got better uptimes under load
> when I was using unionfs than 6.x right now is giving me *without* any
> "funny mounts" :(
>
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scr...@hub.org MSN . scr...@hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664

------------------------------

Message: 6
Date: Tue, 27 Jun 2006 00:26:26 +0100
From: Pete French <petef...@ticketswitch.com>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: M.Hi...@gmx.de, M.Hi...@hirsch.it
Cc: freebsd...@freebsd.org
Message-ID: <E1Fv0TK-...@dilbert.firstcallgroup.co.uk>

> I am not looking for workarounds, like ECC. I want the box to break
> immediately once any single component goes wrong...

Uh, that *is* what ECC does (or can do). Without ECC your broken hardware
continues to run un-noticed. With ECC you can either make it break
immediately, or log an error or continue to run.

Stop thinking of ECC as error correction and start thinking of it as
error detection. No ECC gives you no way to detect failing memory.

-pete.

------------------------------

Message: 7
Date: Tue, 27 Jun 2006 02:21:28 +0300 (EEST)
From: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <webm...@hirsch.it>
Cc: freebsd...@freebsd.org
Message-ID: <2006062702...@atlantis.atlantis.dp.ua>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Tue, 27 Jun 2006, M.Hirsch wrote:
>> If you're using hardware w/o ECC, it just can't tell whether error present
>> or absent. So ECC _is_ the way to detect (not mask) broken hardware.
>>
> Ok, thanks. I think I understand the meaning of ECC now.
> So, unlike my supplier claims, ECC is not supposed to help against hardware
> failures.
> But it is the way to detect them, right?

ECC stands for Error Checking and Correction. It's a hardware feature,
and its primary task is Checking (that is, detection) of errors. It just
happens that the number of additional bits that carry the checking code is
sufficient to correct _any_ _single-bit_ data error (not mask it, but really
correct it), and to detect any double-bit and most multi-bit errors (w/o
correction).
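
To make the "correct one bit, detect two" property concrete, here is a scaled-down Python sketch: an extended Hamming code protecting 4 data bits with 4 check bits. Real ECC memory typically protects 64 data bits with 8 check bits, and the exact code a given chipset uses is not stated in this thread, so treat this as an illustration of the principle only. The function names are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1 values).

    Index 0 holds the overall parity bit; indexes 1..7 form a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of all set bits
    odd_parity = sum(code) % 2              # 1 means an odd number of bits flipped
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        code[syndrome] ^= 1                 # syndrome names the bad bit (0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None        # even number of flips: detect only
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


word = encode(0b1011)
word[6] ^= 1                                # simulate a single flipped bit
print(decode(word))
word[2] ^= 1                                # a second flip in the same word
print(decode(word))
```

With one flipped bit the decoder returns the original value; with two it can only report that the word is bad, which is the point at which a machine check (or panic) is the sensible response.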

>> Intel's ECC-capable chipset allows it. But if we're speaking about
>> production environment, such behaviour (abnormal termination on _corrected_
>> error) is unacceptable.
>
> "abnormal termination" is not only acceptable for me, it is what I am looking
> for.
> Make the node crash completely, so one of the others can take over its
> task(s).

Again, when single-bit correction has happened, it's not fake, the result is
actually correct. Why panic the machine immediately if all the data is OK?

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: dmi...@atlantis.dp.ua
nic-hdl: LYNX-RIPE

------------------------------

Message: 8
Date: Tue, 27 Jun 2006 01:40:13 +0200
From: "M.Hirsch" <M.Hi...@hirsch.it>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Pete French <petef...@ticketswitch.com>
Cc: freebsd...@freebsd.org
Message-ID: <44A0705D...@hirsch.it>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed

So what do I need to do to make the box panic() on an ECC error?
Is there a kernel parameter, sysctl, or what else?

Thanks,
M.

Pete French schrieb:

>>I am not looking for workarounds, like ECC. I want the box to break
>>immediately once any single component goes wrong...
>>
>>
>
>Uh, that *is* what ECC does (or can do). Without ECC your broken hardware
>continues to run un-noticed. With ECC you can either make it break
>immediately, or log an error or continue to run.
>
>Stop thinking of ECC as error correction and start thinking of it as
>error detection. No ECC gives you no way to detect failing memory.
>
>-pete.
>
>

------------------------------

Message: 9
Date: Mon, 26 Jun 2006 17:48:20 -0600
From: Brad Waite <fre...@wcubed.net>
Subject: Re: 6.1-stable hangs and LORs
To: Max Laier <m...@love2party.net>
Cc: freebsd...@freebsd.org, Kris Kennaway <kr...@obsecurity.org>
Message-ID: <44A07244...@wcubed.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hey Max, thanks big-time for the help!

It's been just over 2 weeks with no LORs and no lockups. I've read the
man page probably a hundred times when setting up pf, but never saw the
bug. Wanted to confirm the fix with anyone else that may be struggling
like I was.

Would it be out of the question to throw a notice to syslog() if the
user or group filter is used? That would have saved me months of frustration.

Brad

Max Laier wrote:

> From pf.conf(5):
> BUGS
> Due to a lock order reversal (LOR) with the socket layer, the use of the
> group and user filter parameter in conjunction with a Giant-free netstack
> can result in a deadlock. If you have to use group or user you must set
> debug.mpsafenet to ``0'' from the loader(8), for the moment. This
> workaround will still produce the LOR, but Giant will protect from the
> deadlock.
>
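
Until something like that notice exists, a quick scan of the ruleset can at least flag the rules in question. The following is a rough Python sketch, not a pf.conf parser: it only looks for the bare keywords `user` and `group` in pass/block rules after stripping comments, and it does not follow macros, anchors or line continuations. The sample ruleset and the function name are hypothetical.

```python
def rules_using_user_or_group(pf_conf_text):
    """Return (line_number, rule) pairs for pass/block rules naming user or group."""
    hits = []
    for number, raw in enumerate(pf_conf_text.splitlines(), start=1):
        rule = raw.split("#", 1)[0].strip()          # drop trailing comments
        tokens = rule.split()
        if not tokens or tokens[0] not in ("pass", "block"):
            continue
        if {"user", "group"} & set(tokens):
            hits.append((number, rule))
    return hits


# Hypothetical ruleset, for illustration only.
SAMPLE = """\
# example rules
block in all
pass out proto tcp all user root keep state
pass in on fxp0 proto tcp to port 22   # group of admins
pass out proto udp all group { wheel, operator }
"""

for number, rule in rules_using_user_or_group(SAMPLE):
    print("line %d uses a user/group filter: %s" % (number, rule))
```

Any hit means the debug.mpsafenet workaround from the man page excerpt above applies to that ruleset.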

------------------------------

Message: 10
Date: Tue, 27 Jun 2006 01:25:29 +0200
From: Marian Hettwer <M...@kernel32.de>
Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
To: "Marc G. Fournier" <scr...@hub.org>
Cc: Kostik Belousov <kost...@gmail.com>, freebsd...@freebsd.org,
Dmitry Morozovsky <ma...@rinet.ru>
Message-ID: <44A06CE9...@kernel32.de>
Content-Type: text/plain; charset=ISO-8859-15

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hej there,

Marc G. Fournier wrote:
>
> I think I might have found *at least* one of the problems, and that
> being the excessively high blocked states while ps isn't finding
> anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within*
> their vServer ... in a drastic move, I just shut them all down on pluto,
> and blocked drop'd from ~86 down to 5 in a matter of moments ...
> restarting them all has it climbing once more, being up around 22
> already ...
>
I don't know whether it helps at all. I guess not... But I'm seeing
blocked processes (mysqld) waiting for disk I/O all over the place when
running heavy duty MySQL servers. This is on Linux and FreeBSD.
Linux would be either 2.4.31 or 2.6.14 both with MySQL 4.1.x

./Marian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEoGzngAq87Uq5FMsRAvk2AKDhWv6DplvGko1/5F4sy7JXuSOcTACcC8zO
uF/xKOKq7oyR2V/cP93CKzI=
=dv7F
-----END PGP SIGNATURE-----

------------------------------

Message: 11
Date: Tue, 27 Jun 2006 01:38:35 +0200
From: "M.Hirsch" <webm...@hirsch.it>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Cc: freebsd...@freebsd.org
Message-ID: <44A06FF...@hirsch.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Yes, the result may be correct.
'Do not take "ECC" for "equals additional security"'

So I understand what's ECC good for, other than the usual "marketing talk".

But, in FreeBSD, the function is a result of hardware-level correction.
Something that only kicks in in _real_ _serious_ situations.
I just would like you (not specifically you, Dmitry) to acknowledge that
broken RAM is worth a "panic" in "standard situations"- if I may call it
like that.

If the RAM is broken for some bits, chances are great that there are
more following soon.
... from the replies I got via PM, I feel some people don't agree with
that....

Sticks don't just break on a single bit. From my experience, a stick
that's got any problems at all, will cause even more trouble soon...
If a hardware problem isn't worth panicking, what else is?
(don't answer this one please, this was a rhetorical question - to those
who didn't get it...)

Still, I'd rather have the node break down completely than that...

M.^

------------------------------

Message: 12
Date: Mon, 26 Jun 2006 20:08:14 -0400
From: Mike Jakubik <mi...@rogers.com>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Wilko Bulte <w...@freebie.xs4all.nl>
Cc: "M.Hirsch" <M.Hi...@hirsch.it>, Dmitry Pryanishnikov
<dmi...@atlantis.dp.ua>, freebsd...@FreeBSD.ORG
Message-ID: <44A076E...@rogers.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Wilko Bulte wrote:
> Proper hardware will log the ECC errors, a proper OS tailored to that
> hardware will log and notify the sysadmins.
>

So the question is.. is FreeBSD one of those operating systems? What
features/software is present if any, to report ECC problems?

------------------------------

Message: 13
Date: Mon, 26 Jun 2006 17:01:02 -0700
From: Sam Leffler <s...@errno.com>
Subject: Re: wi0 down when print a lot of data to screen over ssh
To: Ren Zhen <fbl...@gmail.com>
Cc: freebsd...@freebsd.org
Message-ID: <44A0753E...@errno.com>
Content-Type: text/plain; charset=ISO-8859-1

Ren Zhen wrote:
> wi0 goes down when I run a program that prints a lot of data to
> stdout, or when I use zmrx-zmtx it also goes down.
>
> kernel says:
> kernel: wi0: timeout in wi_seek to 152/0
> last message repeated 7 times
> kernel: wi0: device timeout
> kernel: wi0: timeout in wi_seek to 152/0
> kernel: wi0: link state changed to DOWN
>
> another time kernel says:
> kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
> kernel: wi0: xmit failed
> kernel: wi0: timeout in wi_seek to 128/0
> last message repeated 3 times
>
> System Information:
> FreeBSD 6.1-STABLE
> IBM ThinkPad T23-4NC with the original 802.11b wifi card.
> wi0 use WEP 128bit encryption

<...stuff deleted...>

What firmware revs are you using?

Sam

------------------------------

Message: 14
Date: Mon, 26 Jun 2006 17:41:15 -0700
From: Paul Allen <nos...@ugcs.caltech.edu>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <webm...@hirsch.it>
Cc: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>,
freebsd...@freebsd.org
Message-ID: <20060627004...@groat.ugcs.caltech.edu>
Content-Type: text/plain; charset=us-ascii

From "M.Hirsch" <webm...@hirsch.it>, Tue, Jun 27, 2006 at 01:38:35AM +0200:
> Sticks don't just break on a single bit. From my experience, a stick
> that's got any problems at all, will cause even more trouble soon...
> If a hardware problem isn't worth panicking, what else is?
> (don't answer this one please, this was a rhetorical question - to those
> who didn't get it...)
As has been mentioned by other people already: this position is severely
ahistorical. ECC has traditionally been motivated by a desire to
1) provide reliable computing operations
2) ensure high-availability (uptime)

The very originating purpose of ECC was to keep the computer going in the
face of an alpha particle strike.

Alpha particles flip *single* bits.

ECC was never intended to detect crummy, failing hardware: that's a use
people have shoe-horned it into, but for which it is not entirely suited.

------------------------------

Message: 15
Date: Tue, 27 Jun 2006 10:41:55 +1000
From: Andrew Reilly <andrew-...@areilly.bpc-users.org>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <webm...@hirsch.it>
Cc: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>,
freebsd...@freebsd.org
Message-ID: <20060627004...@duncan.reilly.home>
Content-Type: text/plain; charset=us-ascii

On Tue, Jun 27, 2006 at 01:38:35AM +0200, M.Hirsch wrote:
> I just would like you (not specifically you, Dmitry) to acknowledge that
> broken RAM is worth a "panic" in "standard situations"- if I may call it
> like that.

Well, ideally, if broken ram could be isolated with something
like IBM's chipkill stuff, then that would be better than
panicking. Sort of like enabling hot-swap of failing disk
drives.

The point that's been made, though, is that "soft" errors aren't
necessarily (or even) hardware failures at all. Hardware
failures can look like persistent soft errors, but soft errors
are real: radiation induced bit-flippage happens. ECC
turns what would otherwise be a panic-inducing error state into
a total non-event, improving the uptime of very large memory
systems to useful levels. Exactly similar to the forward error
correction used on disk drives and communications channels. In
all of these systems, the technology has been pushed so close to
the limits that the difference between "signal" and "noise" can
only be determined by sophisticated statistical analysis and
systematic redundancy.

> If the RAM is broken for some bits, chances are great that there are
> more following soon.
> ... from the replies I got via PM, I feel some people don't agree with
> that....

A single corrected error just isn't an indication that the
hardware is broken. If the ECC scrubber can't flip the bit to
the right state, *then* the hardware is broken, and you do need
to panic.

--
Andrew

------------------------------

Message: 16
Date: Tue, 27 Jun 2006 10:38:36 +0930
From: Greg 'groggy' Lehey <gr...@FreeBSD.org>
Subject: Re: vinum to gvinum help
To: Sven Willenberger <sv...@dmv.com>, freebsd-stable
<freebsd...@freebsd.org>
Message-ID: <20060627010...@wantadilla.lemis.com>
Content-Type: text/plain; charset="us-ascii"

On Monday, 26 June 2006 at 19:15:36 +0200, Roland Smith wrote:
> On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
>> I have an i386 system currently running 5.2.1-RELEASE with a vinum
>> mirror array (2 drives comprising /usr ). I want to upgrade this to
>> 5.5-RELEASE which, if I understand correctly, no longer supports vinum
>> arrays. Would simply changing /boot/loader.conf to read gvinum_load
>> instead of vinum_load work or would the geom layer prevent this from
>> working properly? If not, is there a recommended way of upgrading a
>> vinum array to a gvinum or gmirror array?
>
> Lots of things have changed between 5.2.1 and 5.5. I think it would be
> best to make a backup and do a clean reinstall.

To the best of my knowledge, the Vinum on-disk layout has not changed,
so you shouldn't need to reinstall the data. A backup is always an
excellent idea, of course :-)

Greg
--
See complete headers for address and phone numbers.

------------------------------

Message: 17
Date: Mon, 26 Jun 2006 21:32:04 -0400
From: "Jonathan Noack" <noa...@alumni.rice.edu>
Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
To: "Marc G. Fournier" <scr...@hub.org>
Cc: Kostik Belousov <kost...@gmail.com>, Max Laier
<m...@love2party.net>, freebsd...@freebsd.org, Dmitry Morozovsky
<ma...@rinet.ru>
Message-ID: <44A08A94...@alumni.rice.edu>
Content-Type: text/plain; format=flowed; charset="ISO-8859-1"

Marc G. Fournier wrote:
> On Mon, 26 Jun 2006, Max Laier wrote:
>> On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
>>> I think I might have found *at least* one of the problems, and that
>>> being
>>> the excessively high blocked states while ps isn't finding anything ...
>>>
>>> MySQL
>>>
>>> We just recently started allowing clients to run a MySQL server *within*
>>> their vServer ... in a drastic move, I just shut them all down on pluto,
>>> and blocked drop'd from ~86 down to 5 in a matter of moments ...
>>> restarting them all has it climbing once more, being up around 22
>>> already
>>> ...
>>>
>>> I'm going to go with that theory for now, and keep an eye on things ...
>>>
>>> Just curious as to why, even with -H, its not showing any blocked states
>>> within ps though ... ?
>>
>> The "blocked" column shows also processes that have objects "paging".
>> Most likely you are *short* on memory. In order to relieve the
>> pressure program .text pages are free'ed and need to be refetched from
>> disc whenever the respective code is being executed.
>
> 'k, but shouldn't the OS be doing any swapping, if this was the case?
> I'm getting <1M of swappage when the blocked pages are really high ...

It makes sense when you think about it (as Matthew Fuller pointed out in
this thread 2 days ago). There is no point in swapping out binary pages
as they are ALREADY stored on disk and can be re-fetched with ease
(remember the binary is marked in use so we don't have to worry about it
getting modified out from under us); why double disk usage by storing
binaries on the swap partition? In this case, binary pages are getting
paged out under memory pressure and have to be paged back in when
needed. This results in high vnode pager activity but little swap pager
activity.

Matthew pointed out that the vnode pager also handles mmap()'d files,
which could come into play with MySQL.
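
As a toy illustration of that distinction, the Python sketch below classifies pages the way the paragraph above describes: clean file-backed pages (program text, unmodified mmap()'d data) can simply be dropped and re-read later, while anonymous memory has nowhere to go but swap. This is a deliberately simplified model with made-up names, not a description of the actual FreeBSD VM code.

```python
from dataclasses import dataclass


@dataclass
class Page:
    owner: str
    file_backed: bool   # True for program text or mmap()'d file data
    dirty: bool         # True if modified since it was read or allocated


def reclaim(pages):
    """Count what freeing each page would cost under the simplified model."""
    stats = {"dropped": 0, "written_to_file": 0, "written_to_swap": 0}
    for page in pages:
        if page.file_backed and not page.dirty:
            stats["dropped"] += 1           # re-fetched from the file on next use
        elif page.file_backed:
            stats["written_to_file"] += 1   # flushed back to its own file
        else:
            stats["written_to_swap"] += 1   # anonymous memory: swap is the only home
    return stats


# Hypothetical mix of pages on a box running MySQL inside vServers.
PAGES = [
    Page("mysqld text", True, False),
    Page("httpd text", True, False),
    Page("mmap()'d table file", True, True),
    Page("mysqld heap", False, True),
]

print(reclaim(PAGES))
```

In this model a machine whose pressure falls mostly on program text shows many dropped and re-fetched pages but almost no swap traffic, which matches the <1M of swap reported above.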

-Jonathan

------------------------------

Message: 18
Date: Tue, 27 Jun 2006 06:53:10 +0200
From: Stanislaw Halik <sth...@tehran.lain.pl>
Subject: trap 12: supervisor write, page not present on 6.1-STABLE Tue
May 16 2006
To: freebsd...@freebsd.org
Message-ID: <2006062704...@tehran.lain.pl>
Content-Type: text/plain; charset="us-ascii"

Hello,

6.1-STABLE crashed on me. I'm providing a backtrace. Could any of you
experienced people tell me whether it's a hardware problem or an
error inside the OS?

-->--
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x58
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc058e01a
stack pointer = 0x28:0xd68d5acc
frame pointer = 0x28:0xd68d5b04
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 42435 (rtorrent)
trap number = 12
panic: page fault
Uptime: 24d18h34m6s
Dumping 511 MB (2 chunks)
chunk 0: 1MB (160 pages) ... ok
chunk 1: 511MB (130816 pages) 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:165
#1 0xc04d609c in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc04d63e9 in panic (fmt=0xc06817e7 "%s") at /usr/src/sys/kern/kern_shutdown.c:565
#3 0xc066347c in trap_fatal (frame=0xd68d5a8c, eva=0) at /usr/src/sys/i386/i386/trap.c:836
#4 0xc0663152 in trap_pfault (frame=0xd68d5a8c, usermode=0, eva=88) at /usr/src/sys/i386/i386/trap.c:744
#5 0xc0662d0f in trap (frame=
{tf_fs = 892993544, tf_es = -1014235096, tf_ds = -1024327640, tf_edi = 0, tf_esi = 0, tf_ebp = -695379196, tf_isp = -695379272, tf_ebx = -695378816, tf_edx = -695378544, tf_ecx = 0, tf_eax = 8, tf_trapno = 12, tf_err = 2, tf_eip = -1067917286, tf_cs = 32, tf_eflags = 2163335, tf_esp = -695378816, tf_ss = -695379220}) at /usr/src/sys/i386/i386/trap.c:434
#6 0xc0653cfa in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc058e01a in ip_ctloutput (so=0xd68d5d90, sopt=0xd68d5c80) at /usr/src/sys/netinet/ip_output.c:1210
#8 0xc059f7df in tcp_ctloutput (so=0xc35fb6f4, sopt=0xd68d5c80) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#9 0xc051d867 in sosetopt (so=0xc35fb6f4, sopt=0xd68d5c80) at /usr/src/sys/kern/uipc_socket.c:1560
#10 0xc05246b9 in kern_setsockopt (td=0xc38c6780, s=8, level=8, name=8, val=0xbfbfe61c, valseg=UIO_USERSPACE, valsize=0)
at /usr/src/sys/kern/uipc_syscalls.c:1351
#11 0xc05245be in setsockopt (td=0x8, uap=0xd68d5d90) at /usr/src/sys/kern/uipc_syscalls.c:1307
#12 0xc0663870 in syscall (frame=
{tf_fs = 139198523, tf_es = 138412091, tf_ds = -1078001605, tf_edi = -1077942700, tf_esi = -1077942700, tf_ebp = -1077942744, tf_isp = -695378588, tf_ebx = 673057632, tf_edx = 0, tf_ecx = 0, tf_eax = 105, tf_trapno = 0, tf_err = 2, tf_eip = 676107131, tf_cs = 51, tf_eflags = 2097734, tf_esp = -1077942788, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
#13 0xc0653d4f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#14 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
--<--

Thanks in advance for any feedback.

--
Stanislaw Halik

------------------------------

Message: 19
Date: Tue, 27 Jun 2006 07:20:39 +0200
From: Stanislaw Halik <sth...@tehran.lain.pl>
Subject: Re: trap 12: supervisor write, page not present on 6.1-STABLE
Tue May 16 2006
To: freebsd...@freebsd.org
Message-ID: <2006062705...@tehran.lain.pl>
Content-Type: text/plain; charset="us-ascii"

On Tue, Jun 27, 2006, Stanislaw Halik wrote:
> 6.1-STABLE crashed on me. I'm providing a backtrace. Could any of you
> experienced people tell me whether it's a hardware problem or an
> error inside the OS?
[...]

More info follows:

#7 0xc058e01a in ip_ctloutput (so=0xd68d5d90, sopt=0xd68d5c80) at
/usr/src/sys/netinet/ip_output.c:1210
1210 inp->inp_ip_tos = optval;
Current language: auto; currently c
(kgdb) p inp
$1 = (struct inpcb *) 0x0
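
The numbers in the trap fit a plain NULL pointer dereference: a write to a struct member through a NULL pointer touches address 0 plus the member's offset. The short Python sketch below just redoes that arithmetic with the values from the panic and the kgdb session. The offset of inp_ip_tos (0x58) is inferred from the fault address here; it has not been checked against the 6.1 struct inpcb definition.

```python
def faulting_address(base_pointer, member_offset):
    """Address touched when a struct member is accessed through a pointer."""
    return base_pointer + member_offset


INP = 0x0            # kgdb: p inp -> (struct inpcb *) 0x0
FAULT_VA = 0x58      # "fault virtual address" from the panic message
EVA_DECIMAL = 88     # eva=88 in the trap_pfault frame of the backtrace

implied_offset = FAULT_VA - INP          # assumed offset of inp_ip_tos
print(hex(implied_offset), implied_offset == EVA_DECIMAL)   # prints: 0x58 True
assert faulting_address(INP, implied_offset) == FAULT_VA
```

So the crash address, the eva value and the NULL inp all tell the same story: ip_ctloutput() was handed a socket whose inpcb pointer was already NULL.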


------------------------------

Message: 20
Date: Tue, 27 Jun 2006 08:26:08 +0200
From: Wilko Bulte <w...@freebie.xs4all.nl>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <M.Hi...@hirsch.it>
Cc: freebsd...@freebsd.org
Message-ID: <20060627062...@freebie.xs4all.nl>
Content-Type: text/plain; charset=us-ascii

On Tue, Jun 27, 2006 at 12:33:39AM +0200, M.Hirsch wrote..
> Wilko Bulte schrieb:
>
> >You really have never seen a machine used for serious business apparently.
> >
> >
> >
> Depends on what you define "serious business"...
> Yes, I am rather new to FreeBSD (2y+)
> I am just trying to setup a /stable/ cluster of six machines right now.
> For over a week straight.
> 4.11 works perfectly. But support is going to be dropped very soon, so
> that's a bad option for me right now.
>
> Over all, the system is /only/ supposed to handle a few hundred hits per
> second. (but including dynamic stuff like php...)
>
> Dunno if that (or what else) is "serious business" for you.
> Which version would you suggest for "serious business"?

I am not talking about FreeBSD specifically, I am talking about computing in
general.

> Anyways, my point stands: I would rather have any of my nodes panic than
> carry the risk of creating invalid data...
> One in a billion can be a high probability, soon... (just planning for the
> future...)
>
> >panics like that should be eradicated, adding more nonsensical panics
> >is not what we need.
> >
> uh, I would not call hardware failure "nonsensical panics". I guess I
> must have misunderstood you...

Panics are there for situations when there is no other way out. Really no
other way out. Panicking an ECC-equipped box for a single-bit error is
nonsense and defeats the whole idea behind ECC. Don't confuse ECC with
parity checking.

Please go and read a bit about soft bit errors in RAM, which can be induced
by (cosmic) radiation etc. ECC will correct(!!!!) these single-bit errors, and
detect multiple-bit errors.

--
Wilko Bulte wi...@FreeBSD.org

------------------------------

Message: 21
Date: Tue, 27 Jun 2006 09:41:14 +0300 (EEST)
From: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <webm...@hirsch.it>
Cc: freebsd...@freebsd.org
Message-ID: <2006062709...@atlantis.atlantis.dp.ua>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Tue, 27 Jun 2006, M.Hirsch wrote:
> Yes, the result may be correct.

If you're talking about a single-bit error, you aren't quite correct. It isn't
"may be correct", it's _definitely_ correct (in the mathematical sense; that
is, the correcting code proves that we have one and only one error, in bit
number N; the hardware just inverts this bit, and the result _is_ OK).
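
The "find bit N and invert it" step can be sketched in a few lines. This is
only an illustration, assuming a toy Hamming(7,4) code; real ECC memory uses a
wider code over 64-bit words, and the function names here are hypothetical:

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions are numbered 1..7; parity bits sit at positions 1, 2, 4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected_word, position); position 0 means no error was seen.

    The three parity checks, read as a binary number, give the position N
    of a single flipped bit, which is then inverted.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4
    if position:
        w[position - 1] ^= 1  # invert bit number N
    return w, position
```

Flipping any one of the seven bits of a valid codeword and passing the result
to hamming74_correct() returns the original codeword together with the
position of the flipped bit.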

> 'Do not take "ECC" for "equals additional security"'

Not security. ECC adds reliability.

> But, in FreeBSD, the function is a result of hardware-level correction.
> Something that only kicks in in _real_ _serious_ situations.
> I just would like you (not specifically you, Dmitry) to acknowledge that
> broken RAM is worth a "panic" in "standard situations", if I may call it
> that.

The predominant RAM errors are exactly the single-bit ones. Moreover,
they usually _don't_ reappear at the same cell. They may (for example)
be caused by spontaneous alpha radioactivity (brought into your
computer by ordinary dust) and as such don't indicate that the RAM module
must be replaced. They just break your data in an unpredictable way, not your
hardware. They (single-bit errors) are the main reason why ECC-capable memory
and chipsets must be used in any computer that calculates or transfers
actually valuable data.

> If the RAM is broken for some bits, chances are great that there are more
> following soon.

If a multiple-bit error happens, then yes, it can be a sign of an actual
hardware fault. And yes, ECC logic will report this event instantly.

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: dmi...@atlantis.dp.ua
nic-hdl: LYNX-RIPE

------------------------------

Message: 22
Date: Tue, 27 Jun 2006 10:25:05 +0300 (EEST)
From: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Paul Allen <nos...@ugcs.caltech.edu>
Cc: "M.Hirsch" <webm...@hirsch.it>, freebsd...@freebsd.org
Message-ID: <2006062710...@atlantis.atlantis.dp.ua>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Mon, 26 Jun 2006, Paul Allen wrote:
> The very originating purpose of ECC was to keep the computer going in the
> face of an alpha particle strike.
>
> Alpha particles flip *single* bits.
>
> ECC was never intended to detect crummy, failing hardware: that's a use
> people have shoe-horned it into, but for which it is not entirely suited.

Well, correction is the last 'C' in ECC. Don't forget about the second (and
more significant) one: Check. Error checking is what actually detects failing
memory chips (the structure of the correcting code ensures detection of every
2-bit failure and most N-bit failures, N>2).
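
To make the 2-bit detection concrete, here is a minimal sketch. It assumes a
toy extended Hamming (8,4) SECDED code, not the code any particular chipset
uses, and the names are hypothetical:

```python
def secded_encode(d):
    """Encode 4 data bits as Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def secded_check(word):
    """Return (status, word) with status 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    parity = 0
    for bit in w:
        parity ^= bit  # 0 when the overall parity still holds
    if syndrome == 0 and parity == 0:
        return "ok", w
    if parity == 1:
        # Odd number of flips: assume exactly one and invert that bit.
        # A zero syndrome here means the overall parity bit itself flipped.
        index = syndrome - 1 if syndrome else 7
        w[index] ^= 1
        return "corrected", w
    # Overall parity holds but the syndrome is non-zero: two bits flipped.
    return "uncorrectable", w
```

Any single flipped bit is repaired, while any pair of flipped bits is reported
as uncorrectable instead of being silently "fixed" into wrong data.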

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: dmi...@atlantis.dp.ua
nic-hdl: LYNX-RIPE

------------------------------

Message: 23
Date: Tue, 27 Jun 2006 17:45:48 +1000
From: Peter Jeremy <peter...@optushome.com.au>
Subject: Re: What denotes a 'blocked' process?
To: "Marc G. Fournier" <scr...@hub.org>
Cc: freebsd...@freebsd.org
Message-ID: <2006062707...@turion.vk2pj.dyndns.org>
Content-Type: text/plain; charset="us-ascii"

Looking at the sources:

The 'blocked' column in vmstat is the sum of
(struct vmtotal).t_dw /* jobs in ``disk wait'' (neg priority) */ and
(struct vmtotal).t_pw /* jobs in page wait */

'systat -v' splits these into two fields (Proc:d and Proc:p) as does
sysctl vm.vmtotal

It's difficult to map these counters onto ps output. States 'D' and
'W' should catch most of them. You might find it useful to look
through the MWCHAN column for anything that looks suspicious.
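
As a rough sketch, the same sum can be computed from the text printed by
'sysctl vm.vmtotal'. This assumes the output contains "Disk Wait: N" and
"Page Wait: N" fields; the sample line and the function name are made up for
illustration, so check them against what your system actually prints:

```python
import re


def blocked_from_vmtotal(text):
    """Sum the disk-wait and page-wait counts found in vm.vmtotal output.

    Assumes fields of the form 'Disk Wait: N' and 'Page Wait: N'.
    """
    disk_wait = re.search(r"Disk Wait:\s*(\d+)", text)
    page_wait = re.search(r"Page Wait:\s*(\d+)", text)
    if disk_wait is None or page_wait is None:
        raise ValueError("Disk Wait / Page Wait fields not found")
    return int(disk_wait.group(1)) + int(page_wait.group(1))


# Hypothetical sample line, not captured from a real machine.
sample = "Processes:\t\t(RUNQ: 1 Disk Wait: 3 Page Wait: 2 Sleep: 40)"
print(blocked_from_vmtotal(sample))
```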

--
Peter Jeremy

------------------------------

Message: 24
Date: Tue, 27 Jun 2006 10:23:57 +0200
From: Alban Hertroys <dal...@solfertje.student.utwente.nl>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: "M.Hirsch" <M.Hi...@hirsch.it>
Cc: Wilko Bulte <w...@freebie.xs4all.nl>, freebsd...@freebsd.org
Message-ID:
<08EB21DA-80A4-4153...@solfertje.student.utwente.nl>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

On Jun 26, 2006, at 11:54 PM, M.Hirsch wrote:

> Ok, sorry. Misunderstanding here.
> My point was, along what has been posted here in this thread:
> "An ECC error should raise a kernel panic immediately, not only a
> message in the log files."

Preferably not until the running transactions are processed and the
transaction server has failed over to another one...

Otherwise it's like making a car do a full stop on a busy highway
because one of the tires is worn out.

--
Alban Hertroys

"I think, therefore I drink"
Lazarus

------------------------------

Message: 25
Date: Tue, 27 Jun 2006 18:43:43 +1000
From: Peter Jeremy <peter...@optushome.com.au>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
To: Dmitry Pryanishnikov <dmi...@atlantis.dp.ua>
Cc: freebs...@freebsd.org, freebsd...@freebsd.org
Message-ID: <2006062708...@turion.vk2pj.dyndns.org>
Content-Type: text/plain; charset="us-ascii"

On Tue, 2006-Jun-27 00:01:08 +0300, Dmitry Pryanishnikov wrote:
>On Mon, 26 Jun 2006, Robert Watson wrote:
>>I think this is a useful activity, especially if you've already run
>>extensive memory testing on the box. If you haven't yet done that, I
>>encourage you to take a break from buildworld's and make sure the memory
>>tests pass. I spent several months on and off trying to track down a bug a
>>few years ago, which turned out to be a one bit error in memory on the
>>box. It would appear and
>
> This is precisely the task which hardware ECC solves: to correct any
> single-bit memory error and to detect 2-bit and most of several-bit errors.

Parity will detect any odd number of bits in error. ECC can typically
correct one bit and detect 2 or any odd number of errors.
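
The parity half of that statement is easy to sketch. This is a minimal
illustration assuming a single even-parity bit per word, with hypothetical
function names:

```python
def parity_bit(bits):
    """Return the even-parity bit for a sequence of 0/1 values."""
    parity = 0
    for bit in bits:
        parity ^= bit
    return parity


def parity_error(bits, stored_parity):
    """True when the stored parity bit no longer matches the data."""
    return parity_bit(bits) != stored_parity


data = [1, 0, 1, 1, 0, 0, 1, 0]
stored = parity_bit(data)

one_flip = list(data)
one_flip[3] ^= 1   # one bit in error: parity changes

two_flips = list(one_flip)
two_flips[5] ^= 1  # two bits in error: parity is back to the stored value
```

Here parity_error(one_flip, stored) is true and parity_error(two_flips,
stored) is false, which is why plain parity can flag an odd number of flipped
bits but cannot say which bit is wrong, and misses even counts entirely.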

Note that ECC only checks the path between the RAM and DRAM controller
(eg northbridge). You can also get errors between the northbridge and
the CPU (including the cache). Some caches (eg Alpha) have parity to
help here. Mainframes typically have ECC or parity on _all_ datapaths
(including through the ALU) to catch those errors.

------------------------------

Message: 26
Date: Tue, 27 Jun 2006 10:57:16 +0200
From: Albert Czarnecki <aczar...@osanet.pl>
Subject: FreeBSD 6.1 and maildrop compiling error
To: freebsd...@freebsd.org
Message-ID: <44A0F2EC...@osanet.pl>
Content-Type: text/plain; charset=ISO-8859-2; format=flowed

I have 6.1-STABLE FreeBSD, and when I try compiling maildrop from
/usr/ports/mail/maildrop

I get errors:

/bin/sh /usr/ports/mail/maildrop/work/maildrop-2.0.2/install-sh -c -s
rfc2045/makemime /usr/local/bin/makemime
. maildrop/uidgid ; test -z "$gid" && exit 0; test -w /etc || exit 0; cd
/usr/local/bin && chgrp $gid maildrop lockmail
maildrop: Timeout quota exceeded.
*** Error code 75

Stop in /usr/ports/mail/maildrop/work/maildrop-2.0.2.
*** Error code 1

Stop in /usr/ports/mail/maildrop.

df -h

Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1a 1.9G 459M 1.3G 25% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/ad0s1g 78G 24G 54G 31% /home
/dev/ad0s1e 989M 182K 989M 0% /tmp
/dev/ad0s1f 24G 4.7G 20G 19% /usr
/dev/ad0s1d 989M 579M 410M 59% /var

What does "maildrop: Timeout quota exceeded." mean?
Where is the problem?

Thx

Albert

------------------------------

Message: 27
Date: Tue, 27 Jun 2006 12:23:08 +0200
From: "M.Hirsch" <M.Hi...@hirsch.it>
Subject: [OT] Thanks
To: freebsd...@freebsd.org
Message-ID: <44A1070C...@hirsch.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Just wanted to say thank you for clearing up my confusion about ECC.

Also, I want to apologize for being a bit harsh in some posts.
(I am a rather cynical person; it keeps me from going crazy over
all this stuff.)
Last night, after hours of working on the very same problem without any
success at all, I was at the end of my strength.
Sorry, I'll try to refrain from posting in such situations in the future.

So it seems like I cannot track down RAM problems in software.
Thanks very much; besides my lack of understanding of ECC, I wasn't aware
of that either.
Lesson learned.

Since everything worked fine before, I guess something must have broken
when I took the machine off the shelf.
But I have now decided to take the easy way out and retire the hardware.
This old box isn't worth wasting more time on...

------------------------------

End of freebsd-stable Digest, Vol 164, Issue 5
**********************************************
