
freebsd-stable Digest, Vol 350, Issue 2


freebsd-sta...@freebsd.org

Mar 30, 2010, 8:00:27 AM
to freebsd...@freebsd.org
Send freebsd-stable mailing list submissions to
freebsd...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
or, via email, send a message with subject or body 'help' to
freebsd-sta...@freebsd.org

You can reach the person managing the list at
freebsd-st...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-stable digest..."


Today's Topics:

1. Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
(Attila Nagy)
2. ZFS Tuning - arc_summary.pl (Barry Pederson)
3. [if_re] Dropping connectivity (A.J. "Fonz" van Werven)
4. Re: ZFS Tuning - arc_summary.pl (Jeremy Chadwick)
5. Re: ZFS Tuning - arc_summary.pl (Ben Kelly)
6. Strange NFS-related messages (related to lockd/statd)
(Jeremy Chadwick)
7. Re: random FreeBSD panics (Masoom Shaikh)
8. Re: [if_re] Dropping connectivity (Pyun YongHyeon)
9. Re: random FreeBSD panics (Jeremy Chadwick)
10. Re: random FreeBSD panics (John Baldwin)
11. Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
(Pyun YongHyeon)
12. Re: [ HEADS UP ] Ports unstable for the next 10 days (Doug Barton)
13. Re: [ HEADS UP ] Ports unstable for the next 10 days (Doug Barton)
14. Re: [ HEADS UP ] Ports unstable for the next 10 days
(Adam Vande More)
15. Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
(Attila Nagy)
16. Re: ahcich timeouts, only with ahci, not with ataahci
(Alexander Motin)
17. Re: [ HEADS UP ] Ports unstable for the next 10 days (Doug Barton)
18. Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
(Pyun YongHyeon)
19. Re: random FreeBSD panics (C. P. Ghost)
20. Re: random FreeBSD panics (Jeremy Chadwick)
21. Re: Strange NFS-related messages (related to lockd/statd)
(Rick Macklem)
22. FreeBSD 8 and quotas on root partition error (M. Vale)
23. boot and boot0cfg problem (Daniel Braniss)
24. Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
(Attila Nagy)
25. Re: boot and boot0cfg problem (Andrey V. Elsukov)
26. Re: boot and boot0cfg problem (Daniel Braniss)


----------------------------------------------------------------------

Message: 1
Date: Mon, 29 Mar 2010 14:04:36 +0200
From: Attila Nagy <b...@fsn.hu>
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
To: Mike Tancsa <mi...@sentex.net>
Cc: freebsd...@freebsd.org
Message-ID: <4BB09754...@fsn.hu>
Content-Type: text/plain; charset=ISO-8859-1

Mike Tancsa wrote:
> At 11:39 AM 3/25/2010, Michael Loftis wrote:
>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
>> wrote:
>>
>> <...>
>> Both unbound and python accept DNS requests, and it seems that when 25%
>> interrupt happens, only unbound is in *udp state; when it is 50%, both
>> programs are in that state.
>>
>> Try turning off hardware TSO/checksum offload if it's available on your
>> chipset? ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>> under high load. We're pretty sure it's mostly the nfe driver, or
>> the chips themselves, but have never ruled out some generic 8.x
>> hardware offload issues.
>
> There were also a bunch of commits last night for the bce driver. If
> it's the NIC in RELENG_8, perhaps those bug fixes might help.
>
> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-March/001804.html
>
> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-March/001803.html
>
I saw them, but they didn't seem to be related. I've just tested it, and
my assumptions were correct. A fresh 8-STABLE also freezes.


------------------------------

Message: 2
Date: Mon, 29 Mar 2010 09:43:08 -0500
From: Barry Pederson <b...@barryp.org>
Subject: ZFS Tuning - arc_summary.pl
To: jhell <jh...@DataIX.net>
Cc: FreeBSD Stable <freebsd...@freebsd.org>
Message-ID: <4BB0BC7C...@barryp.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I've been using the arc_summary.pl script from here:

http://jhell.googlecode.com/svn/base/head/scripts/zfs/arc_summary/arc_summary.pl

and noticed some odd numbers: the ARC Current Size is larger than the Max Size, and the breakdown adds up to
less than the current size, as shown below:

--------
ARC Size:
Current Size: 992.71M (arcsize)
Target Size: (Adaptive) 512.00M (c)
Min Size (Hard Limit): 81.82M (arc_min)
Max Size (Hard Limit): 512.00M (arc_max)

ARC Size Breakdown:
Recently Used Cache Size: 99.84% 511.18M (p)
Frequently Used Cache Size: 0.16% 0.82M (c-p)
--------


From another thread I saw, it sounds like arc_max isn't really
a "Hard Limit" but rather some kind of high water mark. If that's
the case then I wonder if this might make more sense....

---------
--- arc_summary.pl.original 2010-02-25 19:23:13.000000000 -0600
+++ arc_summary.pl 2010-03-29 09:32:28.000000000 -0500
@@ -121,20 +121,20 @@

my $arc_size = ${Kstat}->{zfs}->{0}->{arcstats}->{size};
my $arc_size_MiB = ($arc_size / 1048576);
-my $mfu_size = $target_size - $mru_size;
+my $mfu_size = $arc_size - $mru_size;
my $mfu_size_MiB = ($mfu_size / 1048576);
-my $mru_perc = 100*($mru_size / $target_size);
-my $mfu_perc = 100*($mfu_size / $target_size);
+my $mru_perc = 100*($mru_size / $arc_size);
+my $mfu_perc = 100*($mfu_size / $arc_size);

print "ARC Size:\n";
printf("\tCurrent Size:\t\t\t\t%0.2fM (arcsize)\n", $arc_size_MiB);
printf("\tTarget Size: (Adaptive)\t\t\t%0.2fM (c)\n", $target_size_MiB);
printf("\tMin Size (Hard Limit):\t\t\t%0.2fM (arc_min)\n", $arc_min_size_MiB);
-printf("\tMax Size (Hard Limit):\t\t\t%0.2fM (arc_max)\n", $arc_max_size_MiB);
+printf("\tMax Size :\t\t\t%0.2fM (arc_max)\n", $arc_max_size_MiB);

print "\nARC Size Breakdown:\n";
printf("\tRecently Used Cache Size:\t%0.2f%%\t%0.2fM (p)\n", $mru_perc, $mru_size_MiB);
-printf("\tFrequently Used Cache Size:\t%0.2f%%\t%0.2fM (c-p)\n", $mfu_perc, $mfu_size_MiB);
+printf("\tFrequently Used Cache Size:\t%0.2f%%\t%0.2fM (arcsize-p)\n", $mfu_perc, $mfu_size_MiB);
print "\n";

### ARC Efficency ###

-----------


Giving something like this...

--------
ARC Size:
Current Size: 992.88M (arcsize)
Target Size: (Adaptive) 512.00M (c)
Min Size (Hard Limit): 81.82M (arc_min)
Max Size : 512.00M (arc_max)

ARC Size Breakdown:
Recently Used Cache Size: 51.48% 511.18M (p)
Frequently Used Cache Size: 48.52% 481.70M (arcsize-p)
--------
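The arithmetic behind the two breakdowns above can be checked with a short Python sketch. It assumes, as the patch suggests, that the MRU share is `p` and that the MFU share is whatever remains of either the target size `c` (original script) or the current size `arcsize` (patched script). The function name `arc_breakdown` is hypothetical and not part of arc_summary.pl.

```python
def arc_breakdown(arc_size, target_size, p, use_current_size=True):
    """Split the ARC into MRU (p) and MFU parts; sizes in MiB.

    use_current_size=False mirrors the original script (MFU = c - p);
    use_current_size=True mirrors the patched one (MFU = arcsize - p).
    Returns (mru_percent, mfu_percent, mfu_size), rounded to 2 places.
    """
    base = arc_size if use_current_size else target_size
    mru_size = p
    mfu_size = base - mru_size
    mru_perc = 100 * (mru_size / base)
    mfu_perc = 100 * (mfu_size / base)
    return round(mru_perc, 2), round(mfu_perc, 2), round(mfu_size, 2)


# Figures (MiB) taken from the two outputs quoted in this message.
print(arc_breakdown(992.71, 512.00, 511.18, use_current_size=False))
print(arc_breakdown(992.88, 512.00, 511.18, use_current_size=True))
```

With these figures it reproduces the 99.84%/0.16% split of the original output and the 51.48%/48.52% split of the patched one. Dividing by the current size makes the two parts sum to what is actually cached, which is the point of the patch.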

Barry


------------------------------

Message: 3
Date: Mon, 29 Mar 2010 17:00:19 +0200 (CEST)
From: "A.J. \"Fonz\" van Werven" <a.j.w...@student.utwente.nl>
Subject: [if_re] Dropping connectivity
To: freebsd...@freebsd.org
Message-ID: <201003291500....@satellite.xs4all.nl>
Content-Type: text/plain; charset="US-ASCII"

It seems like recent commits may have broken if_re.

After I updated last weekend (and again this morning), everything appears
to be fine initially: I can ping hosts, browse the Web with Lynx, etc. But
as soon as a certain amount of data has been transferred (e.g. when I
start a graphical browser like Opera or Seamonkey, or (try to) run portsnap),
all connectivity suddenly vanishes: resolving no longer works, I
can't even ping my modem/router. Running /etc/rc.d/netif restart doesn't
help. I do get lots of watchdog timeout messages on the console.

Fortunately I still have a USB WiFi adapter I can use (if_rum still works
like a charm) so I'm not entirely cut off, but *some*thing appears to have
happened to if_re.

Any thoughts? Thanks in advance,

Alphons

P.S. In case it matters:

$ uname -a
FreeBSD satellite.xs4all.nl 7.3-STABLE FreeBSD 7.3-STABLE #32: Mon Mar 29 14:33:23 CEST 2010 to...@satellite.xs4all.nl:/usr/obj/usr/src/sys/GENERIC i386
$ dmesg|grep re0
re0: <RealTek 8101E/8102E/8102EL/8103E PCIe 10/100baseTX> port 0x5000-0x50ff mem 0xda000000-0xda000fff irq 18 at device 0.0 on pci5
re0: Using 1 MSI messages
re0: Chip rev. 0x34000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
re0: Ethernet address: *****
re0: [FILTER]

--
... And once you have tasted flight you will walk the earth with your eyes
turned skywards, for there you have been and there you long to return...
-- Leonardo da Vinci


------------------------------

Message: 4
Date: Mon, 29 Mar 2010 09:11:19 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Re: ZFS Tuning - arc_summary.pl
To: Barry Pederson <b...@barryp.org>
Cc: jhell <jh...@DataIX.net>, FreeBSD Stable
<freebsd...@freebsd.org>
Message-ID: <2010032916...@icarus.home.lan>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 09:43:08AM -0500, Barry Pederson wrote:
> From another thread I saw, it sounds like arc_max isn't really
> a "Hard Limit" but rather some kind of high water mark. If that's
> the case then I wonder if this might make more sense....

It became a hard limit in a semi-recent commit somewhere. I've lost
count of the modifications at this point. So, the Perl script would
have to read __FreeBSD_version in /usr/include/osreldate.h and adjust
its output accordingly.
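A minimal Python sketch of that version check, assuming `osreldate.h` contains a `#define __FreeBSD_version NNNNNN` line. The commit that made arc_max a hard limit is not identified in this thread, so `hard_limit_since` is a caller-supplied placeholder, and both helper names are hypothetical.

```python
import re

# Matches e.g. "#define __FreeBSD_version 800500" at the start of a line.
_VERSION_RE = re.compile(r"^#define\s+__FreeBSD_version\s+(\d+)", re.MULTILINE)


def freebsd_version(header_text):
    """Return the integer __FreeBSD_version from osreldate.h contents."""
    match = _VERSION_RE.search(header_text)
    if match is None:
        raise ValueError("__FreeBSD_version not found")
    return int(match.group(1))


def arc_max_label(version, hard_limit_since):
    """Choose the arc_max label; hard_limit_since is an assumed cutoff."""
    if version >= hard_limit_since:
        return "Max Size (Hard Limit):"
    return "Max Size:"


# Illustrative header text; on a real system, read /usr/include/osreldate.h.
sample = "#undef __FreeBSD_version\n#define __FreeBSD_version 800500\n"
print(arc_max_label(freebsd_version(sample), hard_limit_since=800000))
```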

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 5
Date: Mon, 29 Mar 2010 12:27:23 -0400
From: Ben Kelly <b...@wanderview.com>
Subject: Re: ZFS Tuning - arc_summary.pl
To: Barry Pederson <b...@barryp.org>
Cc: jhell <jh...@DataIX.net>, FreeBSD Stable
<freebsd...@freebsd.org>
Message-ID: <6823460E-4878-4936...@wanderview.com>
Content-Type: text/plain; charset=us-ascii


On Mar 29, 2010, at 10:43 AM, Barry Pederson wrote:

> I've been using the arc_summary.pl script from here:
>
> http://jhell.googlecode.com/svn/base/head/scripts/zfs/arc_summary/arc_summary.pl
>
> and noticed some odd numbers: the ARC Current Size is larger than the Max Size, and the breakdown adds up to less than the current size, as shown below:

I believe the current size can be larger than the max if your working data set is large enough. The ARC can't evict data that is still being referenced. When I last looked (over a year ago) there was no mechanism to provide back pressure from the ARC to the VM layer to request that more nodes be released to help deal with this situation.
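Ben's point can be illustrated with a toy cache in Python. This is a hypothetical model, not the ZFS ARC: entries carry a reference count, only unreferenced entries may be evicted, and so the cache's size can exceed `max_bytes` while everything in it is still in use.

```python
from collections import OrderedDict


class ToyCache:
    """Toy LRU cache that may only evict unreferenced entries."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.size = 0
        self.entries = OrderedDict()  # key -> [nbytes, refcount], LRU first

    def insert(self, key, nbytes, referenced=True):
        self.entries[key] = [nbytes, 1 if referenced else 0]
        self.size += nbytes
        self._evict()

    def release(self, key):
        self.entries[key][1] -= 1
        self._evict()

    def _evict(self):
        # Walk from least to most recently used, dropping only entries
        # with refcount 0, until the size is back under max_bytes.
        for key in list(self.entries):
            if self.size <= self.max_bytes:
                break
            nbytes, refs = self.entries[key]
            if refs == 0:
                del self.entries[key]
                self.size -= nbytes


cache = ToyCache(max_bytes=512)
for i in range(4):
    cache.insert(i, 256)   # all blocks stay referenced: nothing is evictable
print(cache.size)          # above max_bytes
for i in range(4):
    cache.release(i)
print(cache.size)          # eviction brings it back to the limit
```

Once the references are dropped, eviction brings the size back to the limit; without a way to ask holders to release references (the missing back pressure Ben describes), the overshoot can persist.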

I don't know if that really helps at all, but I just thought I would add the data point from my previous debugging sessions with zfs.

- Ben

------------------------------

Message: 6
Date: Mon, 29 Mar 2010 09:56:47 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Strange NFS-related messages (related to lockd/statd)
To: freebsd...@freebsd.org
Message-ID: <2010032916...@icarus.home.lan>
Content-Type: text/plain; charset=us-ascii

I recently brought up rpc.lockd and rpc.statd on all of our NFS clients
(mixed RELENG_6, RELENG_7, and RELENG_8), and our NFS server (RELENG_8).

All clients had nfs_client_enable="yes" in rc.conf prior to their last
reboot, but lacked rpcbind_enable="yes", rpc_lockd_enable="yes", and
rpc_statd_enable="yes" prior to the below.

The 8.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:

NLM: failed to contact remote rpcbind, stat = 0, port = 0
Can't start NLM - unable to contact NSM

The 7.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:

Can't start NLM - unable to contact NSM

One of the 7.x clients also panicked when starting rpc.lockd,
in some nlm_* kernel functions. Looking at commits showed that the bug
that caused the panic was fixed in a later 7.x release.

The 6.x clients started rpcbind, rpc.lockd, rpc.statd -- and said
nothing.

The above daemons were all started in that order, per the FreeBSD
Handbook.

I can't find a definition of what the acronyms NLM and NSM stand for,
nor does Googling the error messages return relevant results (except for one
FreeBSD committer who reported something similar, but nobody replied). I don't know
the implications of these messages.

The only thing I can think of that might cause such errors would be the fact
that these machines all have dual NICs with firewall rules applied only
to their primary (WAN-side) interface. The NFS server exists only on
the private (LAN-side) interface. I'm thinking rpcbind may have tried
to "do stuff" on the WAN interface, since no -h option was applied.

I haven't tried making use of -h yet, nor have I tried restarting the
daemons to see if the errors recur (or if it was just a one-time thing).

Any information/tips/advice would be appreciated. Thanks!

------------------------------

Message: 7
Date: Mon, 29 Mar 2010 17:01:02 +0000
From: Masoom Shaikh <masoom...@gmail.com>
Subject: Re: random FreeBSD panics
To: Ivan Voras <ivo...@freebsd.org>
Cc: freebsd...@freebsd.org, freebsd...@freebsd.org,
freebsd-...@freebsd.org
Message-ID:
<b10011eb1003291001u767...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
>
>> Let's assume this is a h/w problem; then how can other OSes overcome
>> it? Is there a way to make FreeBSD ignore it as well, even at a
>> reasonable performance penalty?
>
> Very probably, if only we could detect where the problem is.
> Try adding "options PRINTF_BUFR_SIZE=128" to the kernel

this option is already there

> configuration file if you can, to see if you can get a less mangled
> log output.
>


------------------------------

Message: 8
Date: Mon, 29 Mar 2010 10:22:39 -0700
From: Pyun YongHyeon <pyu...@gmail.com>
Subject: Re: [if_re] Dropping connectivity
To: "A.J. Fonz van Werven" <a.j.w...@student.utwente.nl>
Cc: freebsd...@freebsd.org
Message-ID: <2010032917...@michelle.cdnetworks.com>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 05:00:19PM +0200, A.J. Fonz van Werven wrote:
> It seems like recent commits may have broken if_re.
>
> After I updated last weekend (and again this morning), everything appears
> to be fine initially: I can ping hosts, browse the Web with Lynx, etc. But
> as soon as a certain amount of data has been transferred (e.g. when I
> start a graphical browser like Opera or Seamonkey, or (try to) run portsnap),
> all connectivity suddenly vanishes: resolving no longer works, I
> can't even ping my modem/router. Running /etc/rc.d/netif restart doesn't
> help. I do get lots of watchdog timeout messages on the console.
>
> Fortunately I still have a USB WiFi adapter I can use (if_rum still works
> like a charm) so I'm not entirely cut off, but *some*thing appears to have
> happened to if_re.
>
> Any thoughts? Thanks in advance,
>

What is last known working revision of re(4)?

> Alphons
>
> P.S. In case it matters:
>
> $ uname -a
> FreeBSD satellite.xs4all.nl 7.3-STABLE FreeBSD 7.3-STABLE #32: Mon Mar 29 14:33:23 CEST 2010 to...@satellite.xs4all.nl:/usr/obj/usr/src/sys/GENERIC i386
> $ dmesg|grep re0
> re0: <RealTek 8101E/8102E/8102EL/8103E PCIe 10/100baseTX> port 0x5000-0x50ff mem 0xda000000-0xda000fff irq 18 at device 0.0 on pci5
> re0: Using 1 MSI messages
> re0: Chip rev. 0x34000000
> re0: MAC rev. 0x00000000
> miibus0: <MII bus> on re0
> re0: Ethernet address: *****
> re0: [FILTER]


------------------------------

Message: 9
Date: Mon, 29 Mar 2010 10:30:38 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Re: random FreeBSD panics
To: Masoom Shaikh <masoom...@gmail.com>
Cc: freebsd...@freebsd.org, freebsd...@freebsd.org, Ivan
Voras <ivo...@freebsd.org>, freebsd-...@freebsd.org
Message-ID: <2010032917...@icarus.home.lan>
Content-Type: text/plain; charset=iso-8859-1

On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> >
> >> Let's assume this is a h/w problem; then how can other OSes overcome
> >> it? Is there a way to make FreeBSD ignore it as well, even at a
> >> reasonable performance penalty?
> >
> > Very probably, if only we could detect where the problem is.
> > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
>
> this option is already there

The key word in Ivan's phrase is "less mangled". Neither using nor
increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
output. I've been ranting/raving about this problem for years now; it
truly looks like a mutex lock issue (or lack of such lock), but I've
been told numerous times that isn't the case.

To developers: what incentives would help get this issue the much-needed
attention? This problem makes kernel debugging, panic analysis, and
other console-oriented viewing basically impossible.

------------------------------

Message: 10
Date: Mon, 29 Mar 2010 14:27:34 -0400
From: John Baldwin <j...@freebsd.org>
Subject: Re: random FreeBSD panics
To: freebsd...@freebsd.org
Cc: freebsd...@freebsd.org, Masoom Shaikh
<masoom...@gmail.com>, Ivan Voras <ivo...@freebsd.org>, Jeremy
Chadwick <fre...@jdc.parodius.com>, freebsd-...@freebsd.org
Message-ID: <20100329142...@freebsd.org>
Content-Type: Text/Plain; charset="iso-8859-1"

On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> > >
> > >> Let's assume this is a h/w problem; then how can other OSes overcome
> > >> it? Is there a way to make FreeBSD ignore it as well, even at a
> > >> reasonable performance penalty?
> > >
> > > Very probably, if only we could detect where the problem is.
> > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> >
> > this option is already there
>
> The key word in Ivan's phrase is "less mangled". Neither using nor
> increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> output. I've been ranting/raving about this problem for years now; it
> truly looks like a mutex lock issue (or lack of such lock), but I've
> been told numerous times that isn't the case.
>
> To developers: what incentives would help get this issue the much-needed
> attention? This problem makes kernel debugging, panic analysis, and
> other console-oriented viewing basically impossible.

I was recently going to look at it. The somewhat drastic approach I was going
to take was to add a simple serializing lock around trap_fatal() and a few
other places that do similar block prints (e.g. mca_log()). One of the issues
with fixing this in printf itself is that you'd probably want to
serialize complete lines of text on a per-thread basis. You would want to be
able to accumulate this line of text across multiple calls to printf (think of
it as line-buffering a la stdio). However, some folks may be nervous about
printf not printing things immediately.

The other issue is that lots of code assumes it can call printf from anywhere
and everywhere. Mostly this just means that if you add locking and line-
buffering to printf(9) you have to be very careful to make sure it works in
odd places. Probably a lot of this could be solved by deferring things like
trap_fatal() until panic() has already been called (which is bde's preferred
solution I think).
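The line-buffering idea can be sketched in userland Python. This is only an analogy under stated assumptions, not printf(9): each thread accumulates text in thread-local storage across calls, and a single lock serializes the output of complete lines, so lines from different threads cannot interleave mid-line. The class and method names are hypothetical.

```python
import io
import threading


class LineBufferedConsole:
    """Accumulate text per thread; emit only complete lines, under a lock."""

    def __init__(self, sink):
        self._sink = sink                 # any object with a write() method
        self._lock = threading.Lock()     # serializes whole lines
        self._local = threading.local()   # per-thread pending text

    def printf(self, fmt, *args):
        pending = getattr(self._local, "buf", "") + (fmt % args)
        *complete, rest = pending.split("\n")
        self._local.buf = rest            # partial line waits for later calls
        if complete:
            with self._lock:
                for line in complete:
                    self._sink.write(line + "\n")


out = io.StringIO()
console = LineBufferedConsole(out)


def worker(tag):
    for i in range(100):
        console.printf("%s: part one, ", tag)   # no newline yet: buffered
        console.printf("part two %d\n", i)      # completes the line


threads = [threading.Thread(target=worker, args=(t,)) for t in ("cpu0", "cpu1")]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = out.getvalue().splitlines()
print(len(lines), lines[0])
```

The trade-off John mentions is visible here: a fragment without a trailing newline stays buffered and is not printed until the line is completed, which is exactly what makes unbuffered printing attractive when the system is about to die.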

--
John Baldwin


------------------------------

Message: 11
Date: Mon, 29 Mar 2010 11:38:48 -0700
From: Pyun YongHyeon <pyu...@gmail.com>
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
To: Attila Nagy <b...@fsn.hu>
Cc: Mailing List FreeBSD Stable <freebsd...@freebsd.org>, Michael
Loftis <mlo...@wgops.com>
Message-ID: <2010032918...@michelle.cdnetworks.com>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
> Hi,
>
> Michael Loftis wrote:
> >
> >
> > --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
> > wrote:
> >
> > <...>
> >> Both unbound and python accept DNS requests, and it seems that when 25%
> >> interrupt happens, only unbound is in *udp state; when it is 50%, both
> >> programs are in that state.
> >
> > Try turning off hardware TSO/checksum offload if it's available on your
> > chipset? ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
> > nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
> > under high load. We're pretty sure it's mostly the nfe driver, or the
> > chips themselves, but have never ruled out some generic 8.x hardware
> > offload issues.
> Bingo, this solved the problem. The current uptime nears four days.
> Previously I couldn't go further than a day.
>
> The machine gets very light TCP load (and other machines which get work
> well), so I guess it's UDP RX or TX checksum related.
>

Hmm, this is an unexpected result. Since you're using UDP, TSO is not
involved in this issue. Because you disabled RX/TX checksum
offloading, could you check how many 'bad checksum' and
'no checksum' counts netstat(1) reports?
To narrow down which side of checksum offloading causes the issue,
would you disable just one side at a time? For instance, disable TX
checksum offloading with RX checksum offloading enabled and see how
bce(4) works.
#ifconfig bce0 -txcsum rxcsum
If that shows the same issue, try disabling RX checksum offloading
but enabling TX checksum offloading.
#ifconfig bce0 txcsum -rxcsum
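Collecting the counters Pyun asks for can be scripted. The Python sketch below assumes that `netstat -s -p udp` prints lines of the form `<count> with bad checksum` and `<count> with no checksum`; the sample text and its numbers are made up, and the exact wording may vary between releases.

```python
import re

# Hypothetical sample of `netstat -s -p udp` output (numbers are made up).
SAMPLE = """udp:
\t123456 datagrams received
\t0 with incomplete header
\t0 with bad data length field
\t7 with bad checksum
\t42 with no checksum
"""

COUNTERS = ("with bad checksum", "with no checksum")


def checksum_counters(netstat_text):
    """Return {description: count} for the checksum-related counter lines."""
    result = {}
    for line in netstat_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if m and m.group(2) in COUNTERS:
            result[m.group(2)] = int(m.group(1))
    return result


print(checksum_counters(SAMPLE))
```

Capturing the counters before and after each `ifconfig` change (for example from `subprocess.run(["netstat", "-s", "-p", "udp"], capture_output=True, text=True).stdout`) shows whether they grow while only one direction of offloading is enabled.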


------------------------------

Message: 12
Date: Mon, 29 Mar 2010 12:09:57 -0700
From: Doug Barton <do...@FreeBSD.org>
Subject: Re: [ HEADS UP ] Ports unstable for the next 10 days
To: Aristedes Maniatis <a...@ish.com.au>
Cc: sta...@freebsd.org, ques...@freebsd.org,
freebs...@freebsd.org
Message-ID: <4BB0FB05...@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1

On 03/29/10 02:27, Aristedes Maniatis wrote:
> On 29/03/10 7:04 PM, Doug Barton wrote:
>>>> portmaster -r graphics/png
>> That won't work, the man page clearly says that it has to be a port
>> directory or glob pattern from /var/db/pkg. The "glob pattern" bit of
>> that was (unfortunately) broken up till version 2.20, which I just
>> committed.
>
> I'm confused. The manual actually says:
>
> [-R] -r name/glob of port in /var/db/pkg
>
>
> When I try your suggestion I get this:
>
> # portmaster -r png-
>
> ===>>> No valid installed port, or port directory given
> ===>>> Try portmaster --help

Are you using portmaster version 2.20?

--

... and that's just a little bit of history repeating.
-- Propellerheads

Improve the effectiveness of your Internet presence with
a domain name makeover! http://SupersetSolutions.com/

------------------------------

Message: 13
Date: Mon, 29 Mar 2010 12:11:58 -0700
From: Doug Barton <do...@FreeBSD.org>
Subject: Re: [ HEADS UP ] Ports unstable for the next 10 days
To: Garrett Cooper <yane...@gmail.com>
Cc: sta...@freebsd.org, ques...@freebsd.org,
freebs...@freebsd.org
Message-ID: <4BB0FB7E...@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1

On 03/29/10 02:53, Garrett Cooper wrote:
> Besides, when I read `glob' I don't think `regular expression'. A
> glob is a simplified extension of regular expressions,

I wasn't going for a rigorous definition here. :) However, "simplified"
is the correct idea.

> The previous method I described works, and works well:
>
> portmaster -r 'png-*'

Right, that will work, but the * isn't necessary. Portmaster will strip
it internally in any case.
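A rough Python model of the matching Doug describes, using the package names from Adam's pkg_info output in this thread. Treating the pattern as "strip any trailing `*`, then glob-match from the start of the package name" is an assumption for illustration; portmaster itself is a shell script, and `match_installed` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Installed package names as listed by pkg_info in this thread.
INSTALLED = ["linux-f10-png-1.2.37", "png-1.2.42", "scr2png-1.2_3"]


def match_installed(pattern, installed):
    """Strip a trailing '*', then match the stem anchored at the name start."""
    stem = pattern.rstrip("*")
    return [name for name in installed if fnmatchcase(name, stem + "*")]


print(match_installed("png-", INSTALLED))
print(match_installed("png-*", INSTALLED))
```

Both calls select only `png-1.2.42`, because `linux-f10-png-1.2.37` and `scr2png-1.2_3` do not begin with `png-`. Quoting the pattern on the command line, as in `'png-*'`, also keeps the shell from trying to expand it itself.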

> Not sure why graphics/png doesn't work though; hrrm...

The -r option is only relevant to an installed port.


Doug

------------------------------

Message: 14
Date: Mon, 29 Mar 2010 13:21:42 -0600
From: Adam Vande More <amvan...@gmail.com>
Subject: Re: [ HEADS UP ] Ports unstable for the next 10 days
To: Doug Barton <do...@freebsd.org>
Cc: Garrett Cooper <yane...@gmail.com>, sta...@freebsd.org,
ques...@freebsd.org, freebs...@freebsd.org
Message-ID:
<6201873e1003291221t143...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Mar 29, 2010 at 1:11 PM, Doug Barton <do...@freebsd.org> wrote:

> Right, that will work, but the * isn't necessary. Portmaster will strip
> it internally in any case.
>

Those types of examples in the man pages and UPDATING have never worked for
me in tcsh; I've always had to glob it as Garrett stated.


> pkg_info |grep png
linux-f10-png-1.2.37 RPM of the PNG lib (Linux Fedora 10)
png-1.2.42 Library for manipulating PNG images
scr2png-1.2_3 Converts the output of "vidcontrol -p" to PNG
> portmaster -r png-
===>>> No valid installed port, or port directory given
===>>> Try portmaster --help
> portmaster -r 'png-*'
===>>> Currently installed version: png-1.2.42
===>>> Port directory: /usr/ports/graphics/png
===>>> Gathering distinfo list for installed ports
===>>> Launching 'make checksum' for graphics/png in background
===>>> Gathering dependency list for graphics/png from ports
===>>> No dependencies for graphics/png
===>>> Checking ports that depend on png-1.2.42
===>>> Launching child to update akonadi-1.2.1_1
^C
===>>> Build/Install for graphics/png exiting due to signal


--
Adam Vande More


------------------------------

Message: 15
Date: Mon, 29 Mar 2010 21:21:42 +0200
From: Attila Nagy <b...@fsn.hu>
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
To: pyu...@gmail.com
Cc: Mailing List FreeBSD Stable <freebsd...@freebsd.org>, Michael
Loftis <mlo...@wgops.com>
Message-ID: <4BB0FDC6...@fsn.hu>
Content-Type: text/plain; charset=ISO-8859-1

Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>
>> Hi,
>>
>> Michael Loftis wrote:
>>
>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
>>> wrote:
>>>
>>> <...>
>>>
>>>> Both unbound and python accept DNS requests, and it seems that when 25%
>>>> interrupt happens, only unbound is in *udp state; when it is 50%, both
>>>> programs are in that state.
>>>>
>>> Try turning off hardware TSO/checksum offload if it's available on your
>>> chipset? ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>> under high load. We're pretty sure it's mostly the nfe driver, or the
>>> chips themselves, but have never ruled out some generic 8.x hardware
>>> offload issues.
>>>
>> Bingo, this solved the problem. The current uptime nears four days.
>> Previously I couldn't go further than a day.
>>
>> The machine gets very light TCP load (and other machines which get work
>> well), so I guess it's UDP RX or TX checksum related.
>>
>>
>
> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
> involved in this issue. Because you disabled RX/TX checksum
> offloading, could you check how many 'bad checksum' and
> 'no checksum' counts netstat(1) reports?
> To narrow down which side of checksum offloading causes the issue,
> would you disable just one side at a time? For instance, disable TX
> checksum offloading with RX checksum offloading enabled and see how
> bce(4) works.
> #ifconfig bce0 -txcsum rxcsum
> If that shows the same issue, try disabling RX checksum offloading
> but enabling TX checksum offloading.
> #ifconfig bce0 txcsum -rxcsum
>
It's interesting. During the day, I've disabled only HW checksumming and
left TSO enabled. It couldn't run more than a few hours.
I have disabled tso again to see what happens.

BTW, of course there is TCP traffic on that interface (DNS is also
available on TCP), maybe this causes the problem.


------------------------------

Message: 16
Date: Mon, 29 Mar 2010 22:25:29 +0300
From: Alexander Motin <m...@FreeBSD.org>
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
To: Harald Schmalzbauer <h.schma...@omnilan.de>
Cc: freebsd...@FreeBSD.org
Message-ID: <4BB0FEA9...@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-15

Harald Schmalzbauer wrote:
> I have the drives now running in another server, ich7 chipset.
> Using UFS, the complete machine locks up for ~30 secs with disk load of
> 3.5MB/s. But I don't get any timeout messages and the machine always
> recovered.

Most ICH7 chipsets do not support AHCI. What about yours?

> Changing to the old ata driver solves the problem.

Did I understand correctly that you switched from ahci(4) to ataahci?

> Any chance to get this problem fixed? I couldn't see lockups on another
> OS with NCQ in AHCI mode enabled. I'd ship such a disk to anyone who is
> willing to debug.

It's difficult to fix something until the problem can be reproduced.

If you wish to send drive - my address is:
Topol-2, b34, f150, Dnepropetrovsk, 49040, Ukraine.
Phone: +380503622312.
Do not use courier services, only regular mail. Ask for a tracking number.

--
Alexander Motin


------------------------------

Message: 17
Date: Mon, 29 Mar 2010 12:32:36 -0700
From: Doug Barton <do...@FreeBSD.org>
Subject: Re: [ HEADS UP ] Ports unstable for the next 10 days
To: Adam Vande More <amvan...@gmail.com>
Cc: Garrett Cooper <yane...@gmail.com>, sta...@freebsd.org,
ques...@freebsd.org, freebs...@freebsd.org
Message-ID: <4BB10054...@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1

On 03/29/10 12:21, Adam Vande More wrote:
> On Mon, Mar 29, 2010 at 1:11 PM, Doug Barton <do...@freebsd.org
> <mailto:do...@freebsd.org>> wrote:
>
> Right, that will work, but the * isn't necessary. Portmaster will strip
> it internally in any case.
>
>
> Those types of examples in the man pages and UPDATING have never worked
> for me in tcsh; I've always had to glob it as Garrett stated.

I'm sorry to repeat myself, but what you're describing is a result of
the fact that in the past the glob code for the -r option was broken. As
of version 2.20 it is no longer broken, and the * is not necessary
(although it won't hurt anything).


hope this helps,

Doug

------------------------------

Message: 18
Date: Mon, 29 Mar 2010 12:41:31 -0700
From: Pyun YongHyeon <pyu...@gmail.com>
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
To: Attila Nagy <b...@fsn.hu>
Cc: Mailing List FreeBSD Stable <freebsd...@freebsd.org>, Michael
Loftis <mlo...@wgops.com>
Message-ID: <2010032919...@michelle.cdnetworks.com>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
> Pyun YongHyeon wrote:
> > On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
> >
> >> Hi,
> >>
> >> Michael Loftis wrote:
> >>
> >>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
> >>> wrote:
> >>>
> >>> <...>
> >>>
> >>>> Both unbound and python accept DNS requests, and it seems that when 25%
> >>>> interrupt happens, only unbound is in *udp state; when it is 50%, both
> >>>> programs are in that state.
> >>>>
> >>> Try turning off hardware TSO/checksum offload if it's available on your
> >>> chipset? ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
> >>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
> >>> under high load. We're pretty sure it's mostly the nfe driver, or the
> >>> chips themselves, but have never ruled out some generic 8.x hardware
> >>> offload issues.
> >>>
> >> Bingo, this solved the problem. The current uptime nears four days.
> >> Previously I couldn't go further than a day.
> >>
> >> The machine gets very light TCP load (and other machines which get work
> >> well), so I guess it's UDP RX or TX checksum related.
> >>
> >>
> >
> > Hmm, this is an unexpected result. Since you're using UDP, TSO is not
> > involved in this issue. Because you disabled RX/TX checksum
> > offloading, could you check how many 'bad checksum' and
> > 'no checksum' counts netstat(1) reports?
> > To narrow down which side of checksum offloading causes the issue,
> > would you disable just one side at a time? For instance, disable TX
> > checksum offloading with RX checksum offloading enabled and see how
> > bce(4) works.
> > #ifconfig bce0 -txcsum rxcsum
> > If that shows the same issue, try disabling RX checksum offloading
> > but enabling TX checksum offloading.
> > #ifconfig bce0 txcsum -rxcsum
> >
> It's interesting. During the day, I've disabled only HW checksumming and
> left TSO enabled. It couldn't run more than a few hours.
> I have disabled tso again to see what happens.
>
> BTW, of course there is TCP traffic on that interface (DNS is also
> available on TCP), maybe this causes the problem.

The only guess I can think of at this moment is incorrect use of
bus_dma(9) in the TX path, but I'm not sure it is related to the
issue you're seeing. Would you try the experimental patch at the
following URL?
http://people.freebsd.org/~yongari/bce/bce.20100305.diff
Please make sure to back up your old bce(4) driver before applying
the patch. I didn't see anything abnormal in testing, but it
wasn't stress-tested much.
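For readers following the checksum discussion: the rxcsum/txcsum flags control whether the NIC or the host computes and verifies the 16-bit ones'-complement Internet checksum (RFC 1071) carried by IP, TCP and UDP. The following is a minimal host-side sketch of that computation, not bce(4) driver code; the function name inet_cksum is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: RFC 1071 Internet checksum over `len` bytes,
 * summed as big-endian 16-bit words. This is the value a NIC fills in
 * when txcsum is enabled and verifies when rxcsum is enabled.
 */
static uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Sum 16-bit words; a 32-bit accumulator is fine for packet-sized
       buffers (it can only overflow after roughly 64K words). */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* A trailing odd byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Fed the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, it returns 0x220d, and re-running it over the data with that checksum appended yields 0, which is the check a receiver (or a NIC with rxcsum enabled) performs.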


------------------------------

Message: 19
Date: Mon, 29 Mar 2010 22:18:35 +0200
From: "C. P. Ghost" <cpg...@cordula.ws>
Subject: Re: random FreeBSD panics
To: John Baldwin <j...@freebsd.org>
Cc: freebsd...@freebsd.org
Message-ID:
<d74eb87c1003291318q33...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Mar 29, 2010 at 8:27 PM, John Baldwin <j...@freebsd.org> wrote:
>> To developers: what incentives would help get this issue well-needed
>> attention? This problem makes kernel debugging, panic analysis, and
>> other console-oriented viewing basically impossible.
>
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering à la stdio). However, some folks may be nervous about
> printf not printing things immediately.
>
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).

How about serializing all printf(9) through a dedicated kernel thread? Maybe
something as flexible as syslogd for kernel space (klogd), that could also
redirect output to a file, to a serial console etc...?
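The dedicated-thread idea above can be sketched in userspace C with POSIX threads: writers format each message completely and enqueue it as one unit, and a single consumer thread is the only code that touches the output stream, so messages cannot interleave. This is an illustration under assumptions, not kernel code; klog_printf, klog_thread, the queue depth and the line length are all hypothetical, and a real kernel version would use mutex(9) and condvar(9) instead of pthreads.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define KLOG_SLOTS   64    /* hypothetical queue depth (power of two) */
#define KLOG_LINELEN 128   /* hypothetical maximum message length */

struct klog {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    pthread_cond_t  nonfull;
    char            ring[KLOG_SLOTS][KLOG_LINELEN];
    unsigned        head;  /* next slot the consumer reads */
    unsigned        tail;  /* next slot a producer writes */
    int             stop;  /* set by klog_stop() */
    FILE           *out;   /* console, serial line, log file, ... */
};

static void klog_init(struct klog *k, FILE *out)
{
    pthread_mutex_init(&k->lock, NULL);
    pthread_cond_init(&k->nonempty, NULL);
    pthread_cond_init(&k->nonfull, NULL);
    k->head = k->tail = 0;
    k->stop = 0;
    k->out = out;
}

/* Producer: format the whole message first, then enqueue it as one unit. */
static void klog_printf(struct klog *k, const char *fmt, ...)
{
    char line[KLOG_LINELEN] = "";
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(line, sizeof line, fmt, ap);  /* truncates long messages */
    va_end(ap);

    pthread_mutex_lock(&k->lock);
    while (k->tail - k->head == KLOG_SLOTS)  /* queue full: wait */
        pthread_cond_wait(&k->nonfull, &k->lock);
    strcpy(k->ring[k->tail % KLOG_SLOTS], line);
    k->tail++;
    pthread_cond_signal(&k->nonempty);
    pthread_mutex_unlock(&k->lock);
}

/* Consumer: the only thread that writes to k->out. */
static void *klog_thread(void *arg)
{
    struct klog *k = arg;
    char line[KLOG_LINELEN];

    pthread_mutex_lock(&k->lock);
    for (;;) {
        while (k->head == k->tail && !k->stop)
            pthread_cond_wait(&k->nonempty, &k->lock);
        if (k->head == k->tail)  /* stopping and fully drained */
            break;
        strcpy(line, k->ring[k->head % KLOG_SLOTS]);
        k->head++;
        pthread_cond_signal(&k->nonfull);
        pthread_mutex_unlock(&k->lock);
        fputs(line, k->out);     /* do the slow I/O without the lock */
        pthread_mutex_lock(&k->lock);
    }
    pthread_mutex_unlock(&k->lock);
    return NULL;
}

/* Ask the consumer to exit once everything queued so far is written. */
static void klog_stop(struct klog *k)
{
    pthread_mutex_lock(&k->lock);
    k->stop = 1;
    pthread_cond_broadcast(&k->nonempty);
    pthread_mutex_unlock(&k->lock);
}
```

Note that this trades immediacy for ordering: a message only appears once the consumer thread runs, which is the concern John raised about printf not printing things immediately, and it would be a problem on panic paths where that thread may never be scheduled.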

> John Baldwin

-cpghost.

--
Cordula's Web. http://www.cordula.ws/


------------------------------

Message: 20
Date: Mon, 29 Mar 2010 13:30:48 -0700
From: Jeremy Chadwick <fre...@jdc.parodius.com>
Subject: Re: random FreeBSD panics
To: John Baldwin <j...@freebsd.org>
Cc: freebsd-...@freebsd.org, Masoom Shaikh
<masoom...@gmail.com>, freebsd...@freebsd.org, Ivan Voras
<ivo...@freebsd.org>, freebsd...@freebsd.org
Message-ID: <2010032920...@icarus.home.lan>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 29, 2010 at 02:27:34PM -0400, John Baldwin wrote:
> On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> > On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras <ivo...@freebsd.org> wrote:
> > > > On 28 March 2010 16:42, Masoom Shaikh <masoom...@gmail.com> wrote:
> > > >
> > > >> lets assume if this is h/w problem, then how can other OSes overcome
> > > >> this ? is there a way to make FreeBSD ignore this as well, let it
> > > >> result in reasonable performance penalty.
> > > >
> > > > Very probably, if only we could detect where the problem is.
> > > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> > >
> > > this option is already there
> >
> > The key word in Ivan's phrase is "less mangled". Neither using nor
> > increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> > output. I've been ranting/raving about this problem for years now; it
> > truly looks like a mutex lock issue (or lack of such lock), but I've
> > been told numerous times that isn't the case.
> >
> > To developers: what incentives would help get this issue well-needed
> > attention? This problem makes kernel debugging, panic analysis, and
> > other console-oriented viewing basically impossible.
>
> I was recently going to look at it. The somewhat drastic approach I was going
> to take was to add a simple serializing lock around trap_fatal() and a few
> other places that do similar block prints (e.g. mca_log()). One of the issues
> with fixing this in printf itself is that you'd probably want to
> serialize complete lines of text on a per-thread basis. You would want to be
> able to accumulate this line of text across multiple calls to printf (think of
> it as line-buffering à la stdio). However, some folks may be nervous about
> printf not printing things immediately.
>
> The other issue is that lots of code assumes it can call printf from anywhere
> and everywhere. Mostly this just means that if you add locking and line-
> buffering to printf(9) you have to be very careful to make sure it works in
> odd places. Probably a lot of this could be solved by deferring things like
> trap_fatal() until panic() has already been called (which is bde's preferred
> solution I think).

John,

Thanks for the insights, they're greatly appreciated.

I went looking this morning to see how Linux addressed this issue (if at
all), and it's been discussed a few times in the past. The longest lkml
thread I could find that mentioned the problem was circa 2002. It's probably
not worth reading, as work was done in 2009 to solve the issue.

http://lkml.indiana.edu/hypermail/linux/kernel/0204.1/index.html#161

Work done by Red Hat in 2009 details how they implemented a lockless
version of their kernel ring buffer (similar to our system message
buffer, but probably a lot more complex):

http://lwn.net/Articles/340400/
http://lwn.net/Articles/340443/

Supposedly, having multiple writers to the ring is 100% safe, with no
interspersed output. The same goes for interrupt-generated output. There
are some comments in the technical document (2nd link) that imply there's
an individual ring buffer for each CPU; possibly per-CPU kernel message
buffers would solve our issue?
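John's line-buffering suggestion, quoted above, can be sketched in userspace C: each thread (or CPU) accumulates text in its own buffer across several printf-style calls and takes a single console lock only to emit a complete line. The names struct linebuf, lb_printf and lb_flush, the buffer size, and the use of a pthread mutex to model the console lock are assumptions for illustration; this is not FreeBSD's printf(9) or PRINTF_BUFR_SIZE implementation.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

#define LB_SIZE 256  /* hypothetical per-thread buffer size */

/* Models a console-wide lock; held only while a whole line is written. */
static pthread_mutex_t cons_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread (or per-CPU) line buffer. */
struct linebuf {
    char   buf[LB_SIZE];
    size_t len;   /* bytes currently buffered */
    FILE  *out;   /* shared console stream */
};

/* Write everything buffered so far as one unit under the console lock. */
static void lb_flush(struct linebuf *lb)
{
    if (lb->len == 0)
        return;
    pthread_mutex_lock(&cons_lock);
    fwrite(lb->buf, 1, lb->len, lb->out);
    pthread_mutex_unlock(&cons_lock);
    lb->len = 0;
}

/*
 * printf-like append: text is accumulated across calls and only reaches
 * the console when a newline completes the line (or the buffer fills).
 */
static void lb_printf(struct linebuf *lb, const char *fmt, ...)
{
    char tmp[LB_SIZE];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof tmp, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;                          /* formatting error: drop it */
    if ((size_t)n >= sizeof tmp)
        n = (int)sizeof tmp - 1;         /* keep the truncated prefix */

    for (int i = 0; i < n; i++) {
        lb->buf[lb->len++] = tmp[i];
        if (tmp[i] == '\n' || lb->len == LB_SIZE)
            lb_flush(lb);
    }
}
```

A partial line stays private to its buffer until a newline arrives (or the buffer fills), so two threads printing at once produce whole lines in some order rather than interspersed fragments. The cost is the one John mentions: output is delayed until the line is complete, so a caller that never ends its line must call lb_flush() itself.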

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 21
Date: Mon, 29 Mar 2010 18:08:19 -0400 (EDT)
From: Rick Macklem <rmac...@uoguelph.ca>
Subject: Re: Strange NFS-related messages (related to lockd/statd)
To: Jeremy Chadwick <fre...@jdc.parodius.com>
Cc: freebsd...@freebsd.org
Message-ID: <Pine.GSO.4.63.10...@muncher.cs.uoguelph.ca>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Mon, 29 Mar 2010, Jeremy Chadwick wrote:

>
> I can't find a definition of what the acronyms NLM and NSM stand for,
> nor does Googling the error messages return relevant results (except one
> FreeBSD committer reporting similar, but nobody replied). I don't know
> the implications of these messages.
>

NLM - Network Lock Manager
NSM - Network Status Monitor (I think?)

These two protocols (separate from NFS) were what Sun implemented in
the 1980s to provide locking on NFS mount points. Imho, these protocols
were poorly designed:
- The NLM allows blocking locks at the server, which can cause assorted
nasty issues when the client crashes or gets network partitioned.
- It also depended on the NSM to decide when machines were up/down and
the NSM protocol basically did this in a rather poor way.

A big part of NFSv4 was the integration of locking, in order to avoid
use of the above. (As you might have guessed, lockd and statd implement
the above two protocols.)

rick


------------------------------

Message: 22
Date: Mon, 29 Mar 2010 23:26:41 +0100
From: "M. Vale" <maur...@gmail.com>
Subject: FreeBSD 8 and quotas on root partition error
To: freebsd...@freebsd.org
Message-ID:
<85d001331003291526l31b...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi, on FreeBSD 8.0 (i386 or amd64), if we configure quotas on the root
partition, the system stops at boot with the following message:

Trying to mount root from ufs:/dev/ad0s1a
mount option <userquota> is unknown
mount option <groupquota> is unknown
ROOT MOUNT ERROR: mount option <groupquota> is unknown
If you have invalid mount options, reboot, and first try the following from
the loader prompt:

set vfs.root.mountfrom.options=rw

and then remove invalid mount options from /etc/fstab.

Loader variables:
vfs.root.mountfrom=ufs:/dev/ad0s1a
vfs.root.mountfrom.options=rw,userquota,groupquota,acls


Manual root filesystem specification:
<fstype>:<device> Mount <device> using filesystem <fstype>
eg. ufs:/dev/da0s1a
eg. cd9660:/dev/acd0
This is equivalent to: mount -t cd9660 /dev/acd0 /

? List valid disk boot devices
<empty line> Abort manual input

mountroot>


If I enter:

ufs:/dev/ad0s1a

then the boot continues and the quotas are mounted OK, but if I reboot, the
same thing happens again.

This only occurs on FreeBSD 8.

Does anyone have a clue about the problem ?

Best Regards


------------------------------

Message: 23
Date: Tue, 30 Mar 2010 11:05:58 +0300
From: Daniel Braniss <da...@cs.huji.ac.il>
Subject: boot and boot0cfg problem
To: sta...@freebsd.org
Message-ID: <E1NwWSU-...@kabab.cs.huji.ac.il>
Content-Type: text/plain; charset=us-ascii

hi,
I have this SBC that boots off a CF card.
When it boots, I can select the boot partition via F1 or F2
and all is OK.
When I do it via boot0cfg, the 'default_selection' changes
correctly, but the 'active' partition is not changed, so boot
ignores it.
I went ahead and changed boot0cfg.c to set the active
partition, and now I'm baffled:

alix-3# ./boot0cfg -v ad0
# flag start chs type end chs offset size
1 0x00 0: 1: 1 0xa5 519: 15:63 63 524097
2 0x80 520: 0: 1 0xa5 1023: 15:63 524160 524160 ------+
3 0x00 1023:255:63 0xa5 1023: 15:63 1048320 2951424 |
|
version=2.0 drive=0x80 mask=0xf ticks=182 bell=# (0x23) |
options=packet,update,nosetdrv |
volume serial ID 0000-800f |
default_selection=F2 (Slice 2) <--------------------------------------------+

so far so good.

alix-3# ./boot0cfg -v -s1 ad0
...
1 0x80 0: 1: 1 0xa5 519: 15:63 63 524097
...
default_selection=F1 (Slice 1)

ok right? but no!
./boot0cfg -v ad0
...
2 0x80 520: 0: 1 0xa5 1023: 15:63 524160 524160
...
default_selection=F1 (Slice 1)

so it seems that someone is preventing changes to the partition table!
btw, this problem was not present in older boot0 (1.0) where the active
partition flag is ignored.
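For reference, the 'active' flag shown in the flag column above is the status byte at the start of each classic MBR partition entry: the table starts at byte offset 446 (0x1BE) of sector 0 and holds four 16-byte entries, the status byte is 0x80 for the active slice and 0x00 otherwise, and the sector ends with the signature bytes 0x55 0xAA. Below is a minimal sketch of what setting the active slice amounts to, assuming an in-memory copy of the 512-byte sector; mbr_status and mbr_set_active are hypothetical helpers, not boot0cfg.c functions, and writing the sector back to the disk is not shown.

```c
#include <stdint.h>

#define MBR_SIZE      512
#define MBR_PART_OFF  446   /* 0x1BE: start of the partition table */
#define MBR_PART_SIZE 16    /* bytes per partition entry */
#define MBR_NPARTS    4
#define MBR_ACTIVE    0x80  /* status byte of the active (bootable) slice */

/* Nonzero if the sector ends with the 0x55 0xAA boot signature. */
static int mbr_valid(const uint8_t sec[MBR_SIZE])
{
    return sec[510] == 0x55 && sec[511] == 0xAA;
}

/* Status byte of slice n (1-based, as in boot0cfg -s), or -1 if n is bad. */
static int mbr_status(const uint8_t sec[MBR_SIZE], int n)
{
    if (n < 1 || n > MBR_NPARTS)
        return -1;
    return sec[MBR_PART_OFF + (n - 1) * MBR_PART_SIZE];
}

/*
 * Mark slice n active and clear the flag on the other three entries,
 * since only one slice should carry 0x80. Only the in-memory copy is
 * modified; returns 0 on success, -1 on a bad signature or slice number.
 */
static int mbr_set_active(uint8_t sec[MBR_SIZE], int n)
{
    if (!mbr_valid(sec) || n < 1 || n > MBR_NPARTS)
        return -1;
    for (int i = 0; i < MBR_NPARTS; i++)
        sec[MBR_PART_OFF + i * MBR_PART_SIZE] =
            (i == n - 1) ? MBR_ACTIVE : 0x00;
    return 0;
}
```

For example, applying mbr_set_active(sec, 1) to a sector whose slice 2 is marked 0x80 leaves slice 1 at 0x80 and slice 2 at 0x00, which is the state boot0cfg -v would be expected to report after the change.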

help needed here!

danny


------------------------------

Message: 24
Date: Tue, 30 Mar 2010 12:17:53 +0200
From: Attila Nagy <b...@fsn.hu>
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
To: pyu...@gmail.com
Cc: Mailing List FreeBSD Stable <freebsd...@freebsd.org>, Michael
Loftis <mlo...@wgops.com>
Message-ID: <4BB1CFD1...@fsn.hu>
Content-Type: text/plain; charset=ISO-8859-1

Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
>
>> Pyun YongHyeon wrote:
>>
>>> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> Michael Loftis wrote:
>>>>
>>>>
>>>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
>>>>> wrote:
>>>>>
>>>>> <...>
>>>>>
>>>>>
>>>>>> Both unbound and python accept DNS requests, and it seems that when
>>>>>> interrupt load is at 25%, only unbound is in the *udp state; when it is
>>>>>> at 50%, both programs are in that state.
>>>>>>
>>>>>>
>>>>> Try turning off hardware TSO/checksum offload, if it's available on your
>>>>> chipset: ifconfig <interface> -rxcsum -txcsum -tso. I'm only using
>>>>> nfe chips right now, but with TSO/CSUM on they lock up constantly
>>>>> under high load. We're pretty sure it's mostly the nfe driver or the
>>>>> chips themselves, but we have never ruled out some generic 8.x hardware
>>>>> offload issue.
>>>>>
>>>>>
>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>> Previously I couldn't go further than a day.
>>>>
>>>> The machine gets very light TCP load (and other machines which get work
>>>> well), so I guess it's UDP RX or TX checksum related.
>>>>
>>>>
>>>>
>>> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
>>> involved in this issue. Because you disabled RX/TX checksum
>>> offloading, could you check how many 'bad checksum' and
>>> 'no checksum' counts netstat(1) reports?
>>> To narrow down which side of checksum offloading causes the issue,
>>> would you disable just one side at a time? For instance, disable TX
>>> checksum offloading with RX checksum offloading enabled and see how
>>> bce(4) works.
>>> #ifconfig bce0 -txcsum rxcsum
>>> If that shows the same issue, try disabling RX checksum offloading
>>> but enabling TX checksum offloading.
>>> #ifconfig bce0 txcsum -rxcsum
>>>
>>>
>> It's interesting. During the day, I've disabled only HW checksumming and
>> left TSO enabled. It couldn't run more than a few hours.
>> I have disabled tso again to see what happens.
>>
>> BTW, of course there is TCP traffic on that interface (DNS is also
>> available on TCP), maybe this causes the problem.
>>
>
> The only guess I can think of at this moment is incorrect use of
> bus_dma(9) in the TX path, but I'm not sure it is related to the
> issue you're seeing. Would you try the experimental patch at the
> following URL?
> http://people.freebsd.org/~yongari/bce/bce.20100305.diff
> Please make sure to back up your old bce(4) driver before applying
> the patch. I didn't see anything abnormal in testing, but it
> wasn't stress-tested much.
>
With the default settings (rx, tx csum, tso) it froze in about an hour:
CPU: 0.0% user, 0.0% nice, 0.0% system, 25.0% interrupt, 75.0% idle
714 bind 4 102 0 1200M 1182M *lle 3 17:24 0.00% unbound

------------------------------

Message: 25
Date: Tue, 30 Mar 2010 14:12:37 +0400
From: "Andrey V. Elsukov" <bu7...@yandex.ru>
Subject: Re: boot and boot0cfg problem
To: Daniel Braniss <da...@cs.huji.ac.il>
Cc: sta...@freebsd.org
Message-ID: <4BB1CE95...@yandex.ru>
Content-Type: text/plain; charset=KOI8-R; format=flowed

On 30.03.2010 12:05, Daniel Braniss wrote:
> so it seems that someone is preventing changes to the partition table!
> btw, this problem was not present in older boot0 (1.0) where the active
> partition flag is ignored.

You can change the active partition via gpart(8).

--
WBR, Andrey V. Elsukov


------------------------------

Message: 26
Date: Tue, 30 Mar 2010 14:03:39 +0300
From: Daniel Braniss <da...@cs.huji.ac.il>
Subject: Re: boot and boot0cfg problem
To: "Andrey V. Elsukov" <bu7...@yandex.ru>
Cc: sta...@freebsd.org
Message-ID: <E1NwZER-...@kabab.cs.huji.ac.il>
Content-Type: text/plain; charset=us-ascii

> On 30.03.2010 12:05, Daniel Braniss wrote:
> > so it seems that someone is preventing changes to the partition table!
> > btw, this problem was not present in older boot0 (1.0) where the active
> > partition flag is ignored.
>
> You can change the active partition via gpart(8).
>
Hi Andrey,
I'm sorry, I've reread the manual and can't find the right magic.
btw, boot0cfg does call geom but something seems to be broken.

cheers,
danny


------------------------------


End of freebsd-stable Digest, Vol 350, Issue 2
**********************************************
