BNX2: Kernel crashes with 2.6.31 and 2.6.31.9

Bruno Prémont

unread,

Dec 29, 2009, 3:10:01 AM12/29/09

to

On a system that was running 2.6.31 since last September I got two
crashes this December at night (cause unknown), yesterday after second
crash I updated kernel to 2.6.31.9 and enabled netconsole in the hope
to get some information about the cause of the crash.

Today system crashed once again and all I got is the following
incomplete trace on the receiving side of netconsole:

[24701.841185] BUG: unable to handle kernel NULL pointer dereference at (null)
[24701.841188] IP: [<ffffffffa00610fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
[24701.841197] PGD 16509067 PUD 4e776067 PMD 0
[24701.841199] Oops: 0000 [#1] SMP
[24701.841202] last sysfs file: /sys/kernel/uevent_seqnum
[24701.841204] CPU 0
[24701.841205] Modules linked in: ipmi_devintf squashfs ext2
zlib_inflate netconsole configfs loop dm_round_robin scsi_dh_rdac
dm_multipath scsi_dh dm_mod sg sr_mod cdrom ata_piix i pmi_si
ipmi_msghandler qla2xxx ahci bnx2 hpwdt uhci_hcd ehci_hcd libata
[24701.841218] Pid: 11273, comm: php-cgi Not tainted 2.6.31.9-x86_64 #1 ProLiant DL360 G5
[24701.841220] RIP: 0010:[<ffffffffa00610fc>] [<ffffffffa00610fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]

Running objdump on the bnx2.ko module I get the following:
000000000000a0d0 <bnx2_poll_work>:
a0d0: 41 57 push %r15
a0d2: 41 56 push %r14
a0d4: 41 55 push %r13
a0d6: 41 54 push %r12
a0d8: 55 push %rbp
a0d9: 53 push %rbx
a0da: 48 81 ec 28 01 00 00 sub $0x128,%rsp
a0e1: 48 89 7c 24 18 mov %rdi,0x18(%rsp)
a0e6: 48 89 74 24 10 mov %rsi,0x10(%rsp)
a0eb: 89 54 24 0c mov %edx,0xc(%rsp)
a0ef: 89 4c 24 08 mov %ecx,0x8(%rsp)
a0f3: 48 8b 54 24 10 mov 0x10(%rsp),%rdx
a0f8: 48 8b 42 70 mov 0x70(%rdx),%rax
a0fc: 0f b7 10 movzwl (%rax),%edx
a0ff: 31 c0 xor %eax,%eax
a101: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx
a106: 80 fa ff cmp $0xff,%dl
a109: 0f 94 c0 sete %al
a10c: 01 c2 add %eax,%edx
a10e: 66 39 91 1a 02 00 00 cmp %dx,0x21a(%rcx)
a115: 0f 84 78 01 00 00 je a293 <bnx2_poll_work+0x1c3>
a11b: 48 8b 57 08 mov 0x8(%rdi),%rdx
a11f: 48 89 f8 mov %rdi,%rax
a122: 48 8b 9a 00 03 00 00 mov 0x300(%rdx),%rbx
a129: 48 83 c0 40 add $0x40,%rax
a12d: 48 29 c1 sub %rax,%rcx
a130: 48 89 c8 mov %rcx,%rax
a133: 48 c1 f8 06 sar $0x6,%rax
a137: 69 c0 39 8e e3 38 imul $0x38e38e39,%eax,%eax
a13d: 48 c1 e0 07 shl $0x7,%rax
a141: 48 01 d8 add %rbx,%rax
a144: 48 89 44 24 20 mov %rax,0x20(%rsp)
a149: 48 8b 7c 24 10 mov 0x10(%rsp),%rdi
a14e: 48 8b 47 70 mov 0x70(%rdi),%rax
a152: 44 0f b7 30 movzwl (%rax),%r14d
a156: 31 c0 xor %eax,%eax
a158: 0f b7 9f 18 02 00 00 movzwl 0x218(%rdi),%ebx
a15f: 41 80 fe ff cmp $0xff,%r14b
a163: 0f 94 c0 sete %al
a166: 45 31 ff xor %r15d,%r15d
a169: 41 01 c6 add %eax,%r14d
a16c: 66 44 39 f3 cmp %r14w,%bx
a170: 0f 84 ee 00 00 00 je a264 <bnx2_poll_work+0x194>
a176: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
a17d: 00 00 00
a180: 0f b6 cb movzbl %bl,%ecx
a183: 48 8b 44 24 10 mov 0x10(%rsp),%rax
a188: 44 0f b7 e1 movzwl %cx,%r12d
a18c: 49 c1 e4 04 shl $0x4,%r12
a190: 4c 03 a0 10 02 00 00 add 0x210(%rax),%r12
a197: 4d 8b 2c 24 mov (%r12),%r13
a19b: 66 41 83 7c 24 08 00 cmpw $0x0,0x8(%r12)
a1a2: 41 0f 18 8d bc 00 00 prefetcht0 0xbc(%r13)
a1a9: 00
...

Kernel is compiled on Gentoo (64bit):
Linux version 2.6.31.9-x86_64 () (gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5) ) #1 SMP Mon Dec 28 15:49:16 CET 2009
The affected server (HP DL360 G5) is running OpenSuSE-11.1,
32bit userspace

Any idea if there is a recent patch that could fix this issue? At the
crashing time the server was not specifically loaded and had around
200 packets/s network traffic.

Regards,
Bruno

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Benjamin Li

unread,

Dec 29, 2009, 4:10:02 AM12/29/09

to

Hi Bruno,

It looks like the the NULL dereference is happening at a0fc.

a0f8: 48 8b 42 70 mov 0x70(%rdx),%rax
a0fc: 0f b7 10 movzwl (%rax),%edx
a0ff: 31 c0 xor %eax,%eax

The offset of 0x70 is the bp field in the bnx2_napi structure. (Seen in
the bnx2_napi structure dump below) These lines are found in the
routine, bnx2_get_hw_tx_cons() which look like they were inlined by the
compiler. More specifically it looks like the dereference of the
hw_tx_cons_ptr failed.

cons = *bnapi->hw_tx_cons_ptr;

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761

To be sure this is the case, could you send the .config file you are
using or if you could send me the bnx2 kernel module built with the
CFLAG '-g', then we can definitely verify where in the code it is
crashing.

Did you see anything suspicious in the system kernel logs? If you could
isolate the logs from when the machine booted to when it crash and send
it to us it would be very helpful.

Thanks again for your time.

-Ben

<--snip snip structure dump from pahole-->
struct bnx2_napi {
struct napi_struct napi; /* 0 96
*/
/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
struct bnx2 * bp; /* 96 8
*/
union {
struct status_block * msi; /* 8
*/
struct status_block_msix * msix; /* 8
*/
} status_blk; /* 104 8
*/
u16 * hw_tx_cons_ptr; /* 112 8
*/
u16 * hw_rx_cons_ptr; /* 120 8
*/
/* --- cacheline 2 boundary (128 bytes) --- */
u32 last_status_idx; /* 128 4
*/
u32 int_num; /* 132 4
*/
struct bnx2_rx_ring_info rx_ring; /* 136 360
*/
/* --- cacheline 7 boundary (448 bytes) was 48 bytes ago --- */
struct bnx2_tx_ring_info tx_ring; /* 496 48
*/
/* --- cacheline 8 boundary (512 bytes) was 32 bytes ago --- */

/* size: 576, cachelines: 9 */
/* padding: 32 */
};
<--snip snip-->

Bruno Prémont

unread,

Dec 29, 2009, 4:40:02 AM12/29/09

to

Hi Benjamin,

On Tue, 29 Dec 2009 01:05:40 "Benjamin Li" <be...@broadcom.com> wrote:
> Hi Bruno,
>
> It looks like the the NULL dereference is happening at a0fc.
>
> a0f8: 48 8b 42 70 mov 0x70(%rdx),%rax
> a0fc: 0f b7 10 movzwl (%rax),%edx
> a0ff: 31 c0 xor %eax,%eax

Thanks for confirming my guess

> The offset of 0x70 is the bp field in the bnx2_napi structure. (Seen
> in the bnx2_napi structure dump below) These lines are found in the
> routine, bnx2_get_hw_tx_cons() which look like they were inlined by
> the compiler. More specifically it looks like the dereference of the
> hw_tx_cons_ptr failed.
>
> cons = *bnapi->hw_tx_cons_ptr;
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761
>
> To be sure this is the case, could you send the .config file you are
> using or if you could send me the bnx2 kernel module built with the
> CFLAG '-g', then we can definitely verify where in the code it is
> crashing.

See attached .config, if needed I can recompile with the module with
'-g', but the original instance does not contain debugging info.

> Did you see anything suspicious in the system kernel logs? If you
> could isolate the logs from when the machine booted to when it crash
> and send it to us it would be very helpful.

Unfortunately there is nothing suspicious in there, all I have is
attached dmesg (with IP addresses, MAC addresses replaced by '*'s)

I've not appended the crash dump gathered via netconsole which didn't
make it to the affected system's disk (see previous mail for it).

Regards,
Bruno

dmesg

.config

Bruno Prémont

unread,

Dec 29, 2009, 9:00:01 AM12/29/09

to

On Tue, 29 Dec 2009 01:05:40 "Benjamin Li" <be...@broadcom.com> wrote:

> Hi Bruno,
>
> It looks like the the NULL dereference is happening at a0fc.
>
> a0f8: 48 8b 42 70 mov 0x70(%rdx),%rax
> a0fc: 0f b7 10 movzwl (%rax),%edx
> a0ff: 31 c0 xor %eax,%eax
>
> The offset of 0x70 is the bp field in the bnx2_napi structure. (Seen
> in the bnx2_napi structure dump below) These lines are found in the
> routine, bnx2_get_hw_tx_cons() which look like they were inlined by
> the compiler. More specifically it looks like the dereference of the
> hw_tx_cons_ptr failed.
>
> cons = *bnapi->hw_tx_cons_ptr;
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761
>
> To be sure this is the case, could you send the .config file you are
> using or if you could send me the bnx2 kernel module built with the
> CFLAG '-g', then we can definitely verify where in the code it is
> crashing.
>
> Did you see anything suspicious in the system kernel logs? If you
> could isolate the logs from when the machine booted to when it crash
> and send it to us it would be very helpful.

It crashes every now and then (since netconsole is enabled it does not
survive 24 hours :( ) while or just after transmitting log messages with
netconsole, the messages being transmitted are logging that occurs with
netfilter 'LOG' target.

Sample output as seen by netconsole recipient (1 packet per line, IP
addresses masked):

[ 2115.949606] (reject)output: IN= OUT=eth0
SRC=***.**.*.** DST=**.***.**.***
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29589
DF
PROTO=TCP
SPT=58991 DPT=80
WINDOW=5840
RES=0x00
SYN
URGP=0

[ 2115.949704] (reject)output: IN= OUT=eth0
SRC=***.**.*.** DST=**.***.**.***
[ 2115.949729] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2115.949732] IP: [<ffffffffa00680fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
[ 2115.949742] PGD 5b6f0067 PUD 59c04067 PMD 0
[ 2115.949744] Oops: 0000 [#1] SMP
[ 2115.949746] last sysfs file: /sys/kernel/uevent_seqnum
[ 2115.949749] CPU 3
[ 2115.949750] Modules linked in: dm_round_robin scsi_dh_rdac ipmi_devintf netconsole squashfs configfs zlib_inflate ext2 loop dm_multipath scsi_dh dm_mod sg sr_mod cdrom ata_piix h
pwdt qla2xxx ipmi_si ahci bnx2 ipmi_msghandler libata uhci_hcd ehci_hcd
[ 2115.949764] Pid: 7926, comm: php-cgi Not tainted 2.6.31.9-x86_64 #1 ProLiant DL360 G5
[ 2115.949766] RIP: 0010:[<ffffffffa00680fc>] [<ffffffffa00680fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]

Looks like netpoll is triggering suicide on BNX2.

Any way to get the NULL-pointer non-fatal would help a lot! (any
sensible thing to do when bnapi->hw_tx_cons_ptr is NULL that would
allow the system to continue working without killing everything?)

Benjamin Li

unread,

Dec 30, 2009, 12:10:02 AM12/30/09

to

Hi Bruno.

Could you try running with the attached patch? This debug patch is
built against the linux-2.6.31.9 kernel. I think the panic is occuring
right before a reset has occured due to a TX timeout. To see if this is
happening, this patch will print hardware state information when a TX
timeout occurs. If you could run with this patch and send the logs when
the panic occurs, I would really appreciate it.

Thanks again.

-Ben

bnx2_ftq_state_dump.diff

Benjamin Li

unread,

Feb 19, 2010, 3:00:02 PM2/19/10

to

Hi Bruno,

No problems. Thanks for following up with this problem, I really
appreciate all your help.

>From your logs it looks like the device came up using MSI, but in the
MSI-X poll routine was being called:

[ 9.836673] bnx2: eth0: using MSI
...

[ 134.643459] [<ffffffffa004019e>] bnx2_poll_msix+0x3e/0xd0 [bnx2]
[ 134.643465] [<ffffffff8135bcd1>] netpoll_poll+0xe1/0x3c0

which is incorrect. If we are in MSI mode, the bnx2_poll() routine
should be used.

I think what is going on here is that during the bnx2x driver
initialization the current bnx2 driver adds all possible NAPI structures
that map to all the hardware vectors (BNX2_MAX_MSIX_VEC=9) to the NAPI
list in the net_device structure regardless if they are used or not
(Seen in drivers/net/bnx2.c:bnx2_init_napi()). This can cause
uninitialized NAPI structures to be placed on the napi_list. Because
this device is in MSI mode, only 1 vector is initialized. Now, the
problem is triggered when net/core/netpoll.c:poll_napi() is called.
This is because this routine will run through the entire napi_list
calling all the poll routines. In your particular case, it is calling
the poll routine on an uninitialized vector causing the kernel panic.

Please try the patch below to see if it solves your problem. Note, this
only have been compile tested and tested against basic traffic runs.
Unfortunately, I could not reproduce the kernel panic with the
instructions below to verify the patch.

Thanks again for all your help in helping us track this down.

-Ben

On Fri, 2010-02-19 at 00:10 -0800, Bruno Pr�mont wrote:
> Hi Benjamin,

>
> On Tue, 29 Dec 2009 21:08:11 "Benjamin Li" wrote:
> > Could you try running with the attached patch? This debug patch is
> > built against the linux-2.6.31.9 kernel. I think the panic is
> > occuring right before a reset has occured due to a TX timeout. To
> > see if this is happening, this patch will print hardware state
> > information when a TX timeout occurs. If you could run with this
> > patch and send the logs when the panic occurs, I would really
> > appreciate it.
> >
> > Thanks again.
> >
> > -Ben
>

> Sorry for replying only this late but I've been too busy with other
> things.
>
> Anyhow, I've been doing some more testing yesterday and today and now
> am able to reproduce the/a crash pretty easily.
>
> Either running netconsole and doing 'echo t > /proc/sysrq-trigger' via
> SSH on otherwise idle server (from local console nothing bad happens),
> but then I have no means to communicate with the kernel (I guess it's
> deadlocked somewhere in printk code)
>
> The slightly less easy way to trigger it is with a dummy module that
> kind of simulates netconsole behavior but with dummy data (see
> attached). I have to have some more traffic (TCP?) going on for the bug
> to trigger and tell my module multiple times to push data. This way
> server is still accessible via VGA or serial console.
>
> Attached are my 'netbomb.c' (which is a modified netconsole.c) and
> full kernel log. This time running a 2.6.33-rc8-git3 kernel, having
> forward-ported your patch above (e.g. half of it was already present)
>
>
> I this time I got the following trace:
> [ 134.643292] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 134.643304] IP: [<ffffffffa003edc2>] bnx2_poll_work+0x32/0x13d0 [bnx2]
> [ 134.643314] PGD 2a972a067 PUD 2aa245067 PMD 0
> [ 134.643319] Oops: 0000 [#1] SMP
> [ 134.643323] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
> [ 134.643328] CPU 4
> [ 134.643334] Pid: 3226, comm: cat Not tainted 2.6.33-rc8-git3-x86_64 #3 /ProLiant DL360 G5
> [ 134.643339] RIP: 0010:[<ffffffffa003edc2>] [<ffffffffa003edc2>] bnx2_poll_work+0x32/0x13d0 [bnx2]
> [ 134.643347] RSP: 0018:ffff8802a9643b38 EFLAGS: 00010092
> [ 134.643351] RAX: 0000000000000000 RBX: ffff8802afab57c0 RCX: 0000000000000010
> [ 134.643355] RDX: 0000000000000000 RSI: ffff8802afab57c0 RDI: ffff8802afab4580
> [ 134.643359] RBP: ffff8802a9643cd8 R08: ffff8802af051000 R09: 0000000000000007
> [ 134.643363] R10: 000000000000000e R11: 0000000000000000 R12: 0000000000000000
> [ 134.643367] R13: 0000000000000010 R14: 0000000000000000 R15: ffff8802afab4580
> [ 134.643371] FS: 0000000000000000(0000) GS:ffff880028300000(0063) knlGS:00000000f765f6c0
> [ 134.643376] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> [ 134.643380] CR2: 0000000000000000 CR3: 00000002a9606000 CR4: 00000000000006e0
> [ 134.643384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 134.643388] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 134.643392] Process cat (pid: 3226, threadinfo ffff8802a9642000, task ffff8802aa14bff0)
> [ 134.643396] Stack:
> [ 134.643398] 0000000000000070 0000000000000002 0000000000000010 ffff8802afab57c0
> [ 134.643404] <0> ffff8802afab4580 0000000300000002 0000000000000000 0000000100000002
> [ 134.643410] <0> 0000000000000000 0000000200025220 ffffffff81862e80 0000000000000001
> [ 134.643418] Call Trace:
> [ 134.643427] [<ffffffff8107e3fe>] ? __alloc_pages_nodemask+0xfe/0x660
> [ 134.643433] [<ffffffff811b6de6>] ? msi_set_mask_bit+0x26/0xc0
> [ 134.643438] [<ffffffff811b6e8b>] ? unmask_msi_irq+0xb/0x10
> [ 134.643443] [<ffffffff8106db54>] ? default_enable+0x24/0x40
> [ 134.643448] [<ffffffff8106d9b6>] ? check_irq_resend+0x26/0x70
> [ 134.643453] [<ffffffff8106cc23>] ? __enable_irq+0x73/0x80
> [ 134.643459] [<ffffffffa004019e>] bnx2_poll_msix+0x3e/0xd0 [bnx2]
> [ 134.643465] [<ffffffff8135bcd1>] netpoll_poll+0xe1/0x3c0
> [ 134.643470] [<ffffffff8135c168>] netpoll_send_skb+0x118/0x210
> [ 134.643475] [<ffffffff8135c45b>] netpoll_send_udp+0x1fb/0x210
> [ 134.643480] [<ffffffffa00981c5>] write_msg+0x95/0xd0 [netbomb]
> [ 134.643485] [<ffffffffa0098255>] netbomb_write+0x55/0xa4 [netbomb]
> [ 134.643492] [<ffffffff810f6581>] proc_reg_write+0x71/0xb0
> [ 134.643498] [<ffffffff810ab6cb>] vfs_write+0xcb/0x180
> [ 134.643503] [<ffffffff810ab870>] sys_write+0x50/0x90
> [ 134.643509] [<ffffffff8102a1a4>] sysenter_dispatch+0x7/0x2b
> [ 134.643513] Code: 56 41 55 41 54 53 48 81 ec 78 01 00 00 48 89 bd 80 fe ff ff 48 89 b5 78 fe ff ff 89 95 74 fe ff ff 89 8d 70 fe ff ff 48 8b 46 70 <0f> b7 10 31 c0 80 fa ff 0f 94 c0 01 c2 66 39 96 12 02 00 00 0f
> [ 134.643551] RIP [<ffffffffa003edc2>] bnx2_poll_work+0x32/0x13d0 [bnx2]
> [ 134.643557] RSP <ffff8802a9643b38>
> [ 134.643559] CR2: 0000000000000000
> [ 134.643563] ---[ end trace 48bdec67d6d7aadb ]---
>
> Running objdump on kernel compile with debugging symbols this matches:
> 000000000000ad90 <bnx2_poll_work>:
> }
> }
>
> static int bnx2_poll_work(struct bnx2 *bp, struct bnx2_napi *bnapi,
> int work_done, int budget)
> {
> ad90: 55 push %rbp
> ad91: 48 89 e5 mov %rsp,%rbp
> ad94: 41 57 push %r15
> ad96: 41 56 push %r14
> ad98: 41 55 push %r13
> ad9a: 41 54 push %r12
> ad9c: 53 push %rbx
> ad9d: 48 81 ec 78 01 00 00 sub $0x178,%rsp
> ada4: 48 89 bd 80 fe ff ff mov %rdi,-0x180(%rbp)
> adab: 48 89 b5 78 fe ff ff mov %rsi,-0x188(%rbp)
> adb2: 89 95 74 fe ff ff mov %edx,-0x18c(%rbp)
> adb8: 89 8d 70 fe ff ff mov %ecx,-0x190(%rbp)
> {
> u16 cons;
>
> /* Tell compiler that status block fields can change. */
> barrier();
> cons = *bnapi->hw_tx_cons_ptr;
> adbe: 48 8b 46 70 mov 0x70(%rsi),%rax
> adc2: 0f b7 10 movzwl (%rax),%edx
> barrier();
> if (unlikely((cons & MAX_TX_DESC_CNT) == MAX_TX_DESC_CNT))
> cons++;
> adc5: 31 c0 xor %eax,%eax
> adc7: 80 fa ff cmp $0xff,%dl
> adca: 0f 94 c0 sete %al
> adcd: 01 c2 add %eax,%edx
> int work_done, int budget)
> {
> struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
> struct bnx2_rx_ring_info *rxr = &bnapi->rx_ring;
>
> if (bnx2_get_hw_tx_cons(bnapi) != txr->hw_tx_cons)
> adcf: 66 39 96 12 02 00 00 cmp %dx,0x212(%rsi)
> add6: 0f 84 4f 03 00 00 je b12b <bnx2_poll_work+0x39b>
>
>
> So as already determined bnapi->hw_tx_cons_ptr is NULL... but nothing is
> happening after that on network side.
>
> Regards,
> Bruno

bnx2_add_only_used_vectors.c

Brian Haley

unread,

Feb 19, 2010, 4:10:01 PM2/19/10

to

Hi Ben,

Benjamin Li wrote:
> Hi Bruno,
>
> No problems. Thanks for following up with this problem, I really
> appreciate all your help.
>
>>From your logs it looks like the device came up using MSI, but in the
> MSI-X poll routine was being called:
>
> [ 9.836673] bnx2: eth0: using MSI
> ...
>
> [ 134.643459] [<ffffffffa004019e>] bnx2_poll_msix+0x3e/0xd0 [bnx2]
> [ 134.643465] [<ffffffff8135bcd1>] netpoll_poll+0xe1/0x3c0
>
> which is incorrect. If we are in MSI mode, the bnx2_poll() routine
> should be used.
>
> I think what is going on here is that during the bnx2x driver
> initialization the current bnx2 driver adds all possible NAPI structures
> that map to all the hardware vectors (BNX2_MAX_MSIX_VEC=9) to the NAPI
> list in the net_device structure regardless if they are used or not
> (Seen in drivers/net/bnx2.c:bnx2_init_napi()). This can cause
> uninitialized NAPI structures to be placed on the napi_list. Because
> this device is in MSI mode, only 1 vector is initialized. Now, the
> problem is triggered when net/core/netpoll.c:poll_napi() is called.
> This is because this routine will run through the entire napi_list
> calling all the poll routines. In your particular case, it is calling
> the poll routine on an uninitialized vector causing the kernel panic.

...
> @@ -8201,7 +8204,7 @@ bnx2_init_napi(struct bnx2 *bp)
> {
> int i;
>
> - for (i = 0; i < BNX2_MAX_MSIX_VEC; i++) {
> + for (i = 0; i < bp->irq_nvecs; i++) {
> struct bnx2_napi *bnapi = &bp->bnx2_napi[i];
> int (*poll)(struct napi_struct *, int);

Would this same change need to be made in other places, like bnx2_init_chip()
or bnx2_clear_ring_states() ?

-Brian

Benjamin Li

unread,

Feb 19, 2010, 4:50:03 PM2/19/10

to

Hi Brian,

On Fri, 2010-02-19 at 13:03 -0800, Brian Haley wrote:
> Hi Ben,
>
> Benjamin Li wrote:
> > Hi Bruno,
> >

> > @@ -8201,7 +8204,7 @@ bnx2_init_napi(struct bnx2 *bp)
> > {
> > int i;
> >
> > - for (i = 0; i < BNX2_MAX_MSIX_VEC; i++) {
> > + for (i = 0; i < bp->irq_nvecs; i++) {
> > struct bnx2_napi *bnapi = &bp->bnx2_napi[i];
> > int (*poll)(struct napi_struct *, int);
>
> Would this same change need to be made in other places, like bnx2_init_chip()
> or bnx2_clear_ring_states() ?

The other locations in the bnx2.c driver are bnx2_init_chip(),
bnx2_clear_ring_states(), bnx2_alloc_mem(). With the current
implementation, the bnx2_napi structures are initialize but never used
which should be ok. But, we can clean this up to save some cycles.

The following are the areas in the code which iterate through all the
vectors.

bnx2_init_chip() - zero the last_status_idx field in the bnx2_napi
structure
bnx2_clear_ring_states() - zero the rings producer/consumer indexes
bnx2_alloc_mem() - initialize the consumer pointers

Thanks again.

-Ben

Bruno Prémont

unread,

Feb 23, 2010, 7:20:02 AM2/23/10

to

Hi Benjamin,

I applied the patch today and tried to reproduce with my showcases.

Seems that it's harder to trigger now but I still end up being able to
crash the box. Don't know if it's the same cause or not (could also
be the tcp-retransmit ghost)...

This time I had to run a few paralell scp's (8Mb/s each) to the box and
'echo t > /proc/sysrq-trigger' multiple times via ssh session for it to
happen. It didn't trigger with by netbomb though I will try some more
and see)

I don't know if it's the same reason or not (hopefully something
reached disk as serial console is dead and pings are not
answered anymore.
It's probably some printk/bug/warn that triggers in network stack and
deadlocks with netconsole.

Benjamin Li

unread,

Mar 1, 2010, 8:30:02 PM3/1/10

to

Hi Bruno,

Thanks for trying the patch. I still haven't been able to reproduce
what you are seeing here. I am able to run scp and 'echo t
> /proc/sysrq-trigger' multiple times. I was wondering if you had any
success reproducing the problem with a stack trace?

Thanks again.

-Ben

Bruno Prémont

unread,

Mar 2, 2010, 2:20:01 AM3/2/10

to

Hi Benjamin,

On Mon, 1 Mar 2010 17:26:10 "Benjamin Li" <be...@broadcom.com> wrote:
> Thanks for trying the patch. I still haven't been able to reproduce
> what you are seeing here. I am able to run scp and 'echo t
> > /proc/sysrq-trigger' multiple times. I was wondering if you had any
> success reproducing the problem with a stack trace?

Last week I couldn't trigger it again after booting with console=ttyS1,
but today I finally got the box to crash (same procedure but matter of
"luck" as to when it happens)

System continues printing the bnx2 debug messages but does not accept any
input (not even on serial console). Every few minutes it complains about
collectd being blocked for more than 120 seconds (collectd wants to send
UDP packets with system state)

The running kernel is 2.6.33-rc8 (though I did not forward-port your
patch in this thread, that is bnx2_dump_ftq() is missing - half of it
having been applied to 2.6.33). Looking at the trace I got I will add
that part and try again.

Regards,
Bruno

[575872.163771] ------------[ cut here ]------------
[575872.172467] WARNING: at /usr/src/linux-2.6.33-rc8-git7/kernel/softirq.c:143 local_bh_enable_ip+0x72/0xa0()
[575872.172890] Hardware name: ProLiant DL360 G5
[575872.172890] Modules linked in: qla2xxx netbomb ipmi_devintf loop dm_mod sg sr_mod cdrom ata_piix ahci uhci_hcd ipmi_si ipmi_msghandler hpwdt bnx2 ehci_hcd libata [last unloaded: qla2xxx]
[575872.172890] Pid: 32564, comm: cat Not tainted 2.6.33-rc8-git7-x86_64 #1
[575872.172890] Call Trace:
[575872.172890] [<ffffffff8103f002>] ? local_bh_enable_ip+0x72/0xa0
[575872.172890] [<ffffffff81039368>] warn_slowpath_common+0x78/0xd0
[575872.172890] [<ffffffff810393cf>] warn_slowpath_null+0xf/0x20
[575872.172890] [<ffffffff8103f002>] local_bh_enable_ip+0x72/0xa0
[575872.172890] [<ffffffff814002af>] _raw_spin_unlock_bh+0xf/0x20
[575872.172890] [<ffffffffa0026ed4>] bnx2_reg_rd_ind+0x44/0x60 [bnx2]
[575872.172890] [<ffffffffa0026eff>] bnx2_shmem_rd+0xf/0x20 [bnx2]
[575872.172890] [<ffffffffa0030ff4>] bnx2_poll+0x194/0x228 [bnx2]
[575872.172890] [<ffffffff8135c081>] netpoll_poll+0xe1/0x3c0
[575872.172890] [<ffffffff8135c518>] netpoll_send_skb+0x118/0x210
[575872.172890] [<ffffffff8135c80b>] netpoll_send_udp+0x1fb/0x210
[575872.172890] [<ffffffffa00501c5>] write_msg+0x95/0xd0 [netbomb]
[575872.172890] [<ffffffffa0050255>] netbomb_write+0x55/0xa4 [netbomb]
[575872.172890] [<ffffffff810f6571>] proc_reg_write+0x71/0xb0
[575872.172890] [<ffffffff810ab6db>] vfs_write+0xcb/0x180
[575872.172890] [<ffffffff810ab880>] sys_write+0x50/0x90
[575872.172890] [<ffffffff8102a1a4>] sysenter_dispatch+0x7/0x2b
[575872.172890] ---[ end trace d4f601a1b34bd327 ]---
[575878.950008] ------------[ cut here ]------------
[575878.955297] WARNING: at /usr/src/linux-2.6.33-rc8-git7/net/sched/sch_generic.c:255 dev_watchdog+0x25e/0x270()
[575878.966541] Hardware name: ProLiant DL360 G5
[575878.972352] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
[575878.976435] Modules linked in: qla2xxx netbomb ipmi_devintf loop dm_mod sg sr_mod cdrom ata_piix ahci uhci_hcd ipmi_si ipmi_msghandler hpwdt bnx2 ehci_hcd libata [last unloaded: qla2xxx]
[575878.985325] Pid: 0, comm: swapper Tainted: G W 2.6.33-rc8-git7-x86_64 #1
[575878.988929] Call Trace:
[575878.990124] <IRQ> [<ffffffff8135f84e>] ? dev_watchdog+0x25e/0x270
[575878.994317] [<ffffffff81039368>] warn_slowpath_common+0x78/0xd0
[575878.998118] [<ffffffff81039444>] warn_slowpath_fmt+0x64/0x70
[575879.010352] [<ffffffff81362206>] ? nlmsg_notify+0x46/0xc0
[575879.012956] [<ffffffff8119f339>] ? strlcpy+0x49/0x60
[575879.015487] [<ffffffff81349b33>] ? netdev_drivername+0x43/0x50
[575879.027713] [<ffffffff8135f84e>] dev_watchdog+0x25e/0x270
[575879.030339] [<ffffffff810354fc>] ? scheduler_tick+0x12c/0x290
[575879.034202] [<ffffffff8135f5f0>] ? dev_watchdog+0x0/0x270
[575879.047296] [<ffffffff810440fc>] run_timer_softirq+0x13c/0x210
[575879.051101] [<ffffffff8105b4b7>] ? clockevents_program_event+0x57/0xa0
[575879.054422] [<ffffffff8103edb6>] __do_softirq+0xa6/0x130
[575879.065396] [<ffffffff81003bcc>] call_softirq+0x1c/0x30
[575879.070825] [<ffffffff81005be5>] do_softirq+0x55/0x90
[575879.073266] [<ffffffff8103eb35>] irq_exit+0x75/0x90
[575879.075618] [<ffffffff8101aeed>] smp_apic_timer_interrupt+0x6d/0xa0
[575879.094357] [<ffffffff81003693>] apic_timer_interrupt+0x13/0x20
[575879.099195] <EOI> [<ffffffff8100b186>] ? mwait_idle+0x66/0x80
[575879.102225] [<ffffffff81001f90>] ? enter_idle+0x20/0x30
[575879.119572] [<ffffffff81002003>] cpu_idle+0x63/0xb0
[575879.121925] [<ffffffff818ab5de>] start_secondary+0x158/0x1aa
[575879.125708] ---[ end trace d4f601a1b34bd328 ]---
[575879.130792] bnx2: eth0 DEBUG: intr_sem[0]
[575879.132880] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575879.149587] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575879.152959] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575883.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575883.961580] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575883.974574] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575883.985269] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575888.951879] bnx2: eth0 DEBUG: intr_sem[0]
[575888.955092] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575888.967583] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575888.973761] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575893.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575893.953954] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575893.964160] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575893.968596] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575898.951879] bnx2: eth0 DEBUG: intr_sem[0]
[575898.957913] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575898.962225] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575898.972121] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575903.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575903.956728] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575903.961380] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575903.972171] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575908.951880] bnx2: eth0 DEBUG: intr_sem[0]
[575908.956790] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575908.961398] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575908.974628] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575913.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575913.958417] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575913.962159] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575913.965457] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575918.951879] bnx2: eth0 DEBUG: intr_sem[0]
[575918.958634] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575918.965095] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575918.974149] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575923.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575923.964361] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575923.970549] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575923.975969] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575928.951879] bnx2: eth0 DEBUG: intr_sem[0]
[575928.961375] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575928.965103] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575928.969552] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575933.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575933.963067] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575933.970543] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575933.993032] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575938.951879] bnx2: eth0 DEBUG: intr_sem[0]
[575938.958134] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575938.961695] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575938.968855] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575943.951883] bnx2: eth0 DEBUG: intr_sem[0]
[575943.959389] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[575943.964102] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[575943.967612] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[575948.951879] bnx2: eth0 DEBUG: intr_sem[0]
....

Blocked collectd task:
[576003.053144] INFO: task collectd:3019 blocked for more than 120 seconds.
[576003.066206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[576003.072659] collectd D 0000000000000000 0 3019 1 0x00020000
[576003.072663] ffff8802aa3afa78 0000000000200086 0000000000000000 0000000000000008
[576003.072666] 0000000000012780 000000000000dda8 ffff8802aa3affd8 ffff8802aa256fa0
[576003.072670] ffff8802af86e9c0 ffff8802aa257208 00000005ab1a3000 00000001036e43c3
[576003.072673] Call Trace:
[576003.072682] [<ffffffff813ff33e>] __mutex_lock_slowpath+0x12e/0x180
[576003.072687] [<ffffffff813fecfe>] mutex_lock+0x1e/0x40
[576003.072690] [<ffffffff81357fd8>] rtnetlink_rcv+0x18/0x40
[576003.072694] [<ffffffff813621b5>] netlink_unicast+0x285/0x290
[576003.072698] [<ffffffff81344f83>] ? memcpy_fromiovec+0x63/0x80
[576003.072701] [<ffffffff81362c63>] netlink_sendmsg+0x1d3/0x2c0
[576003.072705] [<ffffffff810bdd9e>] ? __d_instantiate+0x5e/0xd0
[576003.072708] [<ffffffff8133981b>] sock_sendmsg+0xbb/0xf0
[576003.072712] [<ffffffff810c72aa>] ? seq_open+0x4a/0xa0
[576003.072714] [<ffffffff81338ed9>] ? copy_from_user+0x9/0x10
[576003.072717] [<ffffffff81338f0c>] ? move_addr_to_kernel+0x2c/0x40
[576003.072720] [<ffffffff8135e09c>] ? verify_compat_iovec+0x9c/0x110
[576003.072723] [<ffffffff8133b734>] sys_sendmsg+0x184/0x320
[576003.072728] [<ffffffff810f5d60>] ? proc_delete_inode+0x0/0x50
[576003.072731] [<ffffffff810700e0>] ? call_rcu_sched+0x10/0x20
[576003.072733] [<ffffffff810700f9>] ? call_rcu+0x9/0x10
[576003.072736] [<ffffffff810bca9a>] ? d_free+0x3a/0x60
[576003.072741] [<ffffffff810acae8>] ? __fput+0x178/0x1f0
[576003.072744] [<ffffffff81058014>] ? getnstimeofday+0x64/0xf0
[576003.072747] [<ffffffff8135d94c>] compat_sys_socketcall+0xcc/0x210
[576003.072751] [<ffffffff8102a1a4>] sysenter_dispatch+0x7/0x2b

Bruno Prémont

unread,

Mar 2, 2010, 3:30:02 AM3/2/10

to

Hi Benjamin,

> The running kernel is 2.6.33-rc8 (though I did not forward-port your
> patch in this thread, that is bnx2_dump_ftq() is missing - half of it
> having been applied to 2.6.33). Looking at the trace I got I will add
> that part and try again.

Here it is with bnx2_dump_ftq added:

[ 3405.422963] ------------[ cut here ]------------
[ 3405.428958] WARNING: at /usr/src/linux-2.6.33-rc8-git7/kernel/softirq.c:143 local_bh_enable_ip+0x72/0xa0()
[ 3405.431858] Hardware name: ProLiant DL360 G5
[ 3405.431858] Modules linked in: netbomb bnx2 ipmi_devintf loop dm_mod sg sr_mod cdrom ata_piix ahci ipmi_si ipmi_msghandler uhci_hcd qla2xxx libata hpwdt ehci_hcd [last unloaded: bnx2]
[ 3405.431858] Pid: 25763, comm: cat Not tainted 2.6.33-rc8-git7-x86_64 #1
[ 3405.431858] Call Trace:
[ 3405.431858] [<ffffffff8103f002>] ? local_bh_enable_ip+0x72/0xa0
[ 3405.431858] [<ffffffff81039368>] warn_slowpath_common+0x78/0xd0
[ 3405.431858] [<ffffffff810393cf>] warn_slowpath_null+0xf/0x20
[ 3405.431858] [<ffffffff8103f002>] local_bh_enable_ip+0x72/0xa0
[ 3405.431858] [<ffffffff814002af>] _raw_spin_unlock_bh+0xf/0x20
[ 3405.431858] [<ffffffffa0108ed4>] bnx2_reg_rd_ind+0x44/0x60 [bnx2]
[ 3405.431858] [<ffffffffa0108eff>] bnx2_shmem_rd+0xf/0x20 [bnx2]
[ 3405.431858] [<ffffffffa0113464>] bnx2_poll+0x194/0x228 [bnx2]
[ 3405.431858] [<ffffffff8135c081>] netpoll_poll+0xe1/0x3c0
[ 3405.431858] [<ffffffff8135c518>] netpoll_send_skb+0x118/0x210
[ 3405.431858] [<ffffffff8135c80b>] netpoll_send_udp+0x1fb/0x210
[ 3405.431858] [<ffffffffa00131c5>] write_msg+0x95/0xd0 [netbomb]
[ 3405.431858] [<ffffffffa0013255>] netbomb_write+0x55/0xa4 [netbomb]
[ 3405.431858] [<ffffffff810f6571>] proc_reg_write+0x71/0xb0
[ 3405.431858] [<ffffffff810ab6db>] vfs_write+0xcb/0x180
[ 3405.431858] [<ffffffff810ab880>] sys_write+0x50/0x90
[ 3405.431858] [<ffffffff8102a1a4>] sysenter_dispatch+0x7/0x2b
[ 3405.431858] ---[ end trace b4ac1510884bf2bc ]---
[ 3411.050005] ------------[ cut here ]------------
[ 3411.054851] WARNING: at /usr/src/linux-2.6.33-rc8-git7/net/sched/sch_generic.c:255 dev_watchdog+0x25e/0x270()
[ 3411.059546] Hardware name: ProLiant DL360 G5
[ 3411.061569] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
[ 3411.064582] Modules linked in: netbomb bnx2 ipmi_devintf loop dm_mod sg sr_mod cdrom ata_piix ahci ipmi_si ipmi_msghandler uhci_hcd qla2xxx libata hpwdt ehci_hcd [last unloaded: bnx2]
[ 3411.064597] Pid: 0, comm: swapper Tainted: G W 2.6.33-rc8-git7-x86_64 #1
[ 3411.064599] Call Trace:
[ 3411.064601] <IRQ> [<ffffffff8135f84e>] ? dev_watchdog+0x25e/0x270
[ 3411.064609] [<ffffffff81039368>] warn_slowpath_common+0x78/0xd0
[ 3411.064612] [<ffffffff81039444>] warn_slowpath_fmt+0x64/0x70
[ 3411.064616] [<ffffffff8103486d>] ? default_wake_function+0xd/0x10
[ 3411.064620] [<ffffffff8119f339>] ? strlcpy+0x49/0x60
[ 3411.064623] [<ffffffff81349b33>] ? netdev_drivername+0x43/0x50
[ 3411.064626] [<ffffffff8135f84e>] dev_watchdog+0x25e/0x270
[ 3411.064630] [<ffffffff8104c000>] ? delayed_work_timer_fn+0x0/0x40
[ 3411.064633] [<ffffffff8104bf87>] ? __queue_work+0x77/0x90
[ 3411.064636] [<ffffffff8103558b>] ? scheduler_tick+0x1bb/0x290
[ 3411.064639] [<ffffffff8135f5f0>] ? dev_watchdog+0x0/0x270
[ 3411.064642] [<ffffffff810440fc>] run_timer_softirq+0x13c/0x210
[ 3411.064645] [<ffffffff8105b4b7>] ? clockevents_program_event+0x57/0xa0
[ 3411.064649] [<ffffffff8103edb6>] __do_softirq+0xa6/0x130
[ 3411.064652] [<ffffffff81003bcc>] call_softirq+0x1c/0x30
[ 3411.064655] [<ffffffff81005be5>] do_softirq+0x55/0x90
[ 3411.064658] [<ffffffff8103eb35>] irq_exit+0x75/0x90
[ 3411.064661] [<ffffffff8101aeed>] smp_apic_timer_interrupt+0x6d/0xa0
[ 3411.064664] [<ffffffff81003693>] apic_timer_interrupt+0x13/0x20
[ 3411.064666] <EOI> [<ffffffff8100b186>] ? mwait_idle+0x66/0x80
[ 3411.064670] [<ffffffff81001f90>] ? enter_idle+0x20/0x30
[ 3411.064673] [<ffffffff81002003>] cpu_idle+0x63/0xb0
[ 3411.064676] [<ffffffff813f2f14>] rest_init+0x74/0x80
[ 3411.064680] [<ffffffff81880c15>] start_kernel+0x2f8/0x336
[ 3411.064683] [<ffffffff8188026d>] x86_64_start_reservations+0x7d/0x84
[ 3411.064686] [<ffffffff81880354>] x86_64_start_kernel+0xe0/0xf2
[ 3411.064688] ---[ end trace b4ac1510884bf2bd ]---
[ 3411.064689] bnx2: <--- start FTQ dump on eth0 --->
[ 3411.064691] bnx2: eth0: BNX2_RV2P_PFTQ_CTL 10000
[ 3411.064693] bnx2: eth0: BNX2_RV2P_TFTQ_CTL 20000
[ 3411.064695] bnx2: eth0: BNX2_RV2P_MFTQ_CTL 20000
[ 3411.064697] bnx2: eth0: BNX2_TBDR_FTQ_CTL 4002
[ 3411.064699] bnx2: eth0: BNX2_TDMA_FTQ_CTL 10002
[ 3411.064702] bnx2: eth0: BNX2_TXP_FTQ_CTL 10002
[ 3411.064704] bnx2: eth0: BNX2_TPAT_FTQ_CTL 10002
[ 3411.064706] bnx2: eth0: BNX2_RXP_CFTQ_CTL 8000
[ 3411.064708] bnx2: eth0: BNX2_RXP_FTQ_CTL 100000
[ 3411.064710] bnx2: eth0: BNX2_COM_COMXQ_FTQ_CTL 10000
[ 3411.064712] bnx2: eth0: BNX2_COM_COMTQ_FTQ_CTL 20000
[ 3411.064715] bnx2: eth0: BNX2_COM_COMQ_FTQ_CTL 10000
[ 3411.064717] bnx2: eth0: BNX2_CP_CPQ_FTQ_CTL 8000
[ 3411.064724] bnx2: eth0: TXP mode b84c state 80001000 evt_mask 500 pc 8000bcc pc 8000bec instr 30620001
[ 3411.064732] bnx2: eth0: TPAT mode b84c state 80005000 evt_mask 500 pc 8000694 pc 80006a4 instr 8e310458
[ 3411.064740] bnx2: eth0: RXP mode b84c state 80001000 evt_mask 500 pc 80044a8 pc 80044a8 instr ac2400f8
[ 3411.064748] bnx2: eth0: COM mode b84c state 80001000 evt_mask 500 pc 8000a60 pc 8000a74 instr 3c030800
[ 3411.064756] bnx2: eth0: CP mode b84c state 80008000 evt_mask 500 pc 8000434 pc 8000700 instr 39ee0001
[ 3411.064758] bnx2: <--- end FTQ dump on eth0 --->
[ 3411.064759] bnx2: eth0 DEBUG: intr_sem[0]
[ 3411.064762] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 3411.064766] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 3411.064768] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 3416.050005] bnx2: <--- start FTQ dump on eth0 --->
[ 3416.054456] bnx2: eth0: BNX2_RV2P_PFTQ_CTL 10000
[ 3416.057683] bnx2: eth0: BNX2_RV2P_TFTQ_CTL 20000
[ 3416.059944] bnx2: eth0: BNX2_RV2P_MFTQ_CTL 20000
[ 3416.069481] bnx2: eth0: BNX2_TBDR_FTQ_CTL 4002
[ 3416.078819] bnx2: eth0: BNX2_TDMA_FTQ_CTL 10002
[ 3416.083896] bnx2: eth0: BNX2_TXP_FTQ_CTL 10002
[ 3416.085968] bnx2: eth0: BNX2_TPAT_FTQ_CTL 10002
[ 3416.088084] bnx2: eth0: BNX2_RXP_CFTQ_CTL 8000
[ 3416.090129] bnx2: eth0: BNX2_RXP_FTQ_CTL 100000
[ 3416.092285] bnx2: eth0: BNX2_COM_COMXQ_FTQ_CTL 10000
[ 3416.096710] bnx2: eth0: BNX2_COM_COMTQ_FTQ_CTL 20000
[ 3416.099065] bnx2: eth0: BNX2_COM_COMQ_FTQ_CTL 10000
[ 3416.101506] bnx2: eth0: BNX2_CP_CPQ_FTQ_CTL 8000
[ 3416.105567] bnx2: eth0: TXP mode b84c state 80001000 evt_mask 500 pc 8000bf0 pc 8000bc0 instr b8302b
[ 3416.109963] bnx2: eth0: TPAT mode b84c state 80001000 evt_mask 500 pc 8000674 pc 8000694 instr af8d0034
[ 3416.120862] bnx2: eth0: RXP mode b84c state 80001000 evt_mask 500 pc 80044c4 pc 800447c instr 3c050800
[ 3416.125237] bnx2: eth0: COM mode b84c state 80001000 evt_mask 500 pc 8000a1c pc 8000a28 instr 8c4200b8
[ 3416.130656] bnx2: eth0: CP mode b84c state 80000000 evt_mask 500 pc 8000728 pc 800040c instr 18d2821
[ 3416.139721] bnx2: <--- end FTQ dump on eth0 --->
[ 3416.142905] bnx2: eth0 DEBUG: intr_sem[0]
[ 3416.145814] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 3416.151452] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 3416.154701] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
...

Regards,
Bruno

Michael Chan

unread,

Mar 2, 2010, 5:30:01 PM3/2/10

to

On Tue, 2010-03-02 at 00:20 -0800, Bruno Prémont wrote:
> [ 3405.422963] ------------[ cut here ]------------
> [ 3405.428958] WARNING: at /usr/src/linux-2.6.33-rc8-git7/kernel/softirq.c:143 local_bh_enable_ip+0x72/0xa0()

In normal NAPI mode, we are in softirq context and we correctly use
spin_lock_bh() and spin_unlock_bh() here. In netpoll mode, IRQs are
disabled and so we get a warning from spin_unlock_bh().

Do we have timers running in this environment? The timer in the bnx2
driver, bnx2_timer(), needs to run to provide a heart beat to the
firmware. In netpoll mode without timer interrupts, if we are regularly
calling the NAPI poll function, it should also be able to provide the
heartbeat. Without the heartbeat, the firmware will reset the chip and
result in the NETDEV WATCHDOG.

Brian Haley

unread,

Mar 4, 2010, 3:40:01 PM3/4/10

to

Hi Michael,

Michael Chan wrote:
> Do we have timers running in this environment? The timer in the bnx2
> driver, bnx2_timer(), needs to run to provide a heart beat to the
> firmware. In netpoll mode without timer interrupts, if we are regularly
> calling the NAPI poll function, it should also be able to provide the
> heartbeat. Without the heartbeat, the firmware will reset the chip and
> result in the NETDEV WATCHDOG.

We have also been seeing watchdog timeouts with bnx2, below is a
stack trace with Benjamin's debug patch applied. Normally we were
only seeing them under heavy load, but this one was at boot. We haven't
tried the latest firmware/driver from 2.6.33 yet. You can contact me
offline if you need more detailed info.

Thanks,

-Brian

[ 2.428093] bnx2 0000:04:00.0: firmware: requesting bnx2/bnx2-rv2p-06-5.0.0.j3.fw
[ 2.432526] eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f6000000, IRQ 41, node addr 00:1c:c4:e1:cc:ea
[ 2.439520] bnx2 0000:42:00.0: PCI INT A -> GSI 34 (level, low) -> IRQ 34

<snip>

[ 223.805014] ------------[ cut here ]------------
[ 223.805023] WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x12d/0x1d5()
[ 223.805026] Hardware name: ProLiant DL385 G2
[ 223.805028] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
[ 223.805031] Modules linked in: itapi iptable_filter ip_tables x_tables mptctl ipmi_devintf deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic crypto_null af_key dm_snapshot dm_mirror dm_region_hash dm_log dm_mod sg bonding sctp crc32c libcrc32c loop psmouse serio_raw amd64_edac_mod edac_core k8temp container i2c_piix4 i2c_core ipmi_si ipmi_msghandler shpchp pci_hotplug hpilo processor evdev ext3 jbd mbcache ses enclosure sd_mod crc_t10dif ide_cd_mod cdrom ata_generic libata ide_pci_generic usbhid hid mptsas bnx2 mptscsih mptbase scsi_transport_sas serverworks ehci_hcd scsi_mod ide_core ohci_hcd uhci_hcd button thermal fan thermal_sys edd [last unloaded: scsi_wait_scan]
[ 223.805102] Pid: 0, comm: swapper Not tainted 2.6.32-clim-4-amd64 #1
[ 223.805105] Call Trace:
[ 223.805108] <IRQ> [<ffffffff812697a0>] ? dev_watchdog+0x12d/0x1d5
[ 223.805118] [<ffffffff81049914>] warn_slowpath_common+0x77/0xa4
[ 223.805123] [<ffffffff810499b6>] warn_slowpath_fmt+0x64/0x66
[ 223.805128] [<ffffffff81045df7>] ? default_wake_function+0xd/0xf
[ 223.805133] [<ffffffff81035fa7>] ? __wake_up_common+0x46/0x76
[ 223.805138] [<ffffffff8103b414>] ? __wake_up+0x43/0x50
[ 223.805143] [<ffffffff81253829>] ? netdev_drivername+0x43/0x4b
[ 223.805147] [<ffffffff812697a0>] dev_watchdog+0x12d/0x1d5
[ 223.805152] [<ffffffff8105e84a>] ? delayed_work_timer_fn+0x0/0x3d
[ 223.805156] [<ffffffff8105e7ee>] ? __queue_work+0x35/0x3d
[ 223.805159] [<ffffffff81269673>] ? dev_watchdog+0x0/0x1d5
[ 223.805164] [<ffffffff8105655a>] run_timer_softirq+0x1ff/0x2a1
[ 223.805169] [<ffffffff810205a1>] ? lapic_next_event+0x18/0x1c
[ 223.805174] [<ffffffff8104f9e3>] __do_softirq+0xde/0x19f
[ 223.805179] [<ffffffff8100ccec>] call_softirq+0x1c/0x28
[ 223.805183] [<ffffffff8100e8b1>] do_softirq+0x41/0x81
[ 223.805187] [<ffffffff8104f7bd>] irq_exit+0x36/0x75
[ 223.805191] [<ffffffff81020f33>] smp_apic_timer_interrupt+0x88/0x96
[ 223.805195] [<ffffffff8100c6b3>] apic_timer_interrupt+0x13/0x20
[ 223.805198] <EOI> [<ffffffff81027740>] ? native_safe_halt+0x6/0x8
[ 223.805207] [<ffffffff81012da3>] ? default_idle+0x55/0x74
[ 223.805210] [<ffffffff810131ce>] ? c1e_idle+0xf4/0xfb
[ 223.805215] [<ffffffff81065529>] ? atomic_notifier_call_chain+0x13/0x15
[ 223.805219] [<ffffffff8100aeec>] ? cpu_idle+0x5b/0x93
[ 223.805225] [<ffffffff81304144>] ? start_secondary+0x1a8/0x1ac
[ 223.805228] ---[ end trace b04d103e6c8c23de ]---
[ 223.805231] bnx2: eth0 DEBUG: intr_sem[0]
[ 223.805236] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 223.805242] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 223.805245] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 228.805016] bnx2: eth0 DEBUG: intr_sem[0]
[ 228.805023] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 228.805029] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 228.805033] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 233.805014] bnx2: eth0 DEBUG: intr_sem[0]
[ 233.805019] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 233.805024] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 233.805028] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 238.805013] bnx2: eth0 DEBUG: intr_sem[0]
[ 238.805019] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 238.805025] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 238.805029] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 243.805015] bnx2: eth0 DEBUG: intr_sem[0]
[ 243.805021] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 243.805027] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 243.805031] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 248.805014] bnx2: eth0 DEBUG: intr_sem[0]
[ 248.805019] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 248.805025] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 248.805028] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 253.805015] bnx2: eth0 DEBUG: intr_sem[0]
[ 253.805021] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 253.805027] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 253.805031] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 258.805016] bnx2: eth0 DEBUG: intr_sem[0]
[ 258.805022] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 258.805027] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 258.805031] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 263.805013] bnx2: eth0 DEBUG: intr_sem[0]
[ 263.805018] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 263.805023] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 263.805027] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 268.805014] bnx2: eth0 DEBUG: intr_sem[0]
[ 268.805019] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 268.805025] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 268.805028] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 273.805015] bnx2: eth0 DEBUG: intr_sem[0]
[ 273.805022] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 273.805028] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 273.805032] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 278.805012] bnx2: eth0 DEBUG: intr_sem[0]
[ 278.805017] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 278.805023] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 278.805026] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 283.805012] bnx2: eth0 DEBUG: intr_sem[0]
[ 283.805016] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 283.805022] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 283.805025] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 288.805015] bnx2: eth0 DEBUG: intr_sem[0]
[ 288.805022] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 288.805028] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 288.805031] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 293.805011] bnx2: eth0 DEBUG: intr_sem[0]
[ 293.805016] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 293.805022] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 293.805026] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 298.805015] bnx2: eth0 DEBUG: intr_sem[0]
[ 298.805021] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 298.805027] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 298.805030] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 303.805013] bnx2: eth0 DEBUG: intr_sem[0]
[ 303.805020] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 303.805026] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 303.805030] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 308.805013] bnx2: eth0 DEBUG: intr_sem[0]
[ 308.805018] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 308.805023] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 308.805027] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
[ 313.805011] bnx2: eth0 DEBUG: intr_sem[0]
[ 313.805016] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
[ 313.805022] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
[ 313.805025] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]

Brian Haley

unread,

Mar 10, 2010, 6:10:02 PM3/10/10

to

Brian Haley wrote:
> Hi Michael,
>
> Michael Chan wrote:
>> Do we have timers running in this environment? The timer in the bnx2
>> driver, bnx2_timer(), needs to run to provide a heart beat to the
>> firmware. In netpoll mode without timer interrupts, if we are regularly
>> calling the NAPI poll function, it should also be able to provide the
>> heartbeat. Without the heartbeat, the firmware will reset the chip and
>> result in the NETDEV WATCHDOG.
>
> We have also been seeing watchdog timeouts with bnx2, below is a
> stack trace with Benjamin's debug patch applied. Normally we were
> only seeing them under heavy load, but this one was at boot. We haven't
> tried the latest firmware/driver from 2.6.33 yet. You can contact me
> offline if you need more detailed info.

Following-up since I have more info on this issue.

I'm able to cause a netdev_watchdog timeout by changing the coalesce
settings on my bnx2, I built a little test program for it:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

main()
{
struct ifreq ifr;
struct ethtool_coalesce ecoal;
int fd, err;
int sleeptime = 5;
char *ifname = "eth0";

fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (fd < 0) {
perror("socket");
exit(1);
}

bzero(&ifr, sizeof(ifr));
bcopy(ifname, ifr.ifr_name, sizeof(ifname));

bzero(&ecoal, sizeof(ecoal));

printf("Running ETHTOOL_GCOALESCE on %s\n", ifname);
ecoal.cmd = ETHTOOL_GCOALESCE;
ifr.ifr_data = (caddr_t)&ecoal;
err = ioctl(fd, SIOCETHTOOL, &ifr);
if (err)
perror("ETHTOOL_GCOALESCE");
printf("Sleeping %d seconds\n", sleeptime);
sleep(sleeptime);

ecoal.rx_coalesce_usecs = 0;
ecoal.rx_max_coalesced_frames = 1;
ecoal.rx_coalesce_usecs_irq = 0;
ecoal.rx_max_coalesced_frames_irq = 1;

printf("Setting ETHTOOL_SCOALESCE on %s\n", ifname);
ecoal.cmd = ETHTOOL_SCOALESCE;
ifr.ifr_data = (caddr_t)&ecoal;
err = ioctl(fd, SIOCETHTOOL, &ifr);
if (err)
perror("ETHTOOL_SCOALESCE");
}

[ 2.428093] bnx2 0000:04:00.0: firmware: requesting bnx2/bnx2-rv2p-06-5.0.0.j3.fw
[ 2.432526] eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f6000000, IRQ 41, node addr 00:1c:c4:e1:cc:ea
[ 2.439520] bnx2 0000:42:00.0: PCI INT A -> GSI 34 (level, low) -> IRQ 34

lspci shows this is a HP 373i, it's the onboard NIC.

Running this on one particular system I get:

Mar 10 07:48:58 N1002563 kernel: [ 870.780023] ------------[ cut here ]------------
Mar 10 07:48:58 N1002563 kernel: [ 870.780037] WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x12d/0x1d5()
Mar 10 07:48:58 N1002563 kernel: [ 870.780041] Hardware name: ProLiant DL385 G5
Mar 10 07:48:58 N1002563 kernel: [ 870.780046] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
Mar 10 07:48:58 N1002563 kernel: [ 870.780050] Modules linked in: mptctl ipmi_devintf deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic crypto_null af_key sg bonding sctp crc32c libcrc32c loop psmouse serio_raw container amd64_edac_mod edac_core i2c_piix4 shpchp pci_hotplug ipmi_si i2c_core ipmi_msghandler hpilo processor evdev ext3 jbd mbcache sd_mod crc_t10dif usbhid hid ata_generic libata ide_pci_generic e1000e bnx2 mptsas mptscsih mptbase serverworks scsi_transport_sas ide_core ehci_hcd scsi_mod ohci_hcd uhci_hcd button thermal fan thermal_sys edd [last unloaded: scsi_wait_scan]
Mar 10 07:48:58 N1002563 kernel: [ 870.780133] Pid: 0, comm: swapper Not tainted 2.6.32-clim-4-amd64 #1
Mar 10 07:48:58 N1002563 kernel: [ 870.780137] Call Trace:
Mar 10 07:48:58 N1002563 kernel: [ 870.780141] <IRQ> [<ffffffff812697a0>] ? dev_watchdog+0x12d/0x1d5
Mar 10 07:48:58 N1002563 kernel: [ 870.780156] [<ffffffff81049914>] warn_slowpath_common+0x77/0xa4
Mar 10 07:48:58 N1002563 kernel: [ 870.780170] [<ffffffff810499b6>] warn_slowpath_fmt+0x64/0x66
Mar 10 07:48:58 N1002563 kernel: [ 870.780177] [<ffffffff81045df7>] ? default_wake_function+0xd/0xf
Mar 10 07:48:58 N1002563 kernel: [ 870.780184] [<ffffffff81035fa7>] ? __wake_up_common+0x46/0x76
Mar 10 07:48:58 N1002563 kernel: [ 870.780191] [<ffffffff8103b414>] ? __wake_up+0x43/0x50
Mar 10 07:48:58 N1002563 kernel: [ 870.780198] [<ffffffff81253829>] ? netdev_drivername+0x43/0x4b
Mar 10 07:48:58 N1002563 kernel: [ 870.780204] [<ffffffff812697a0>] dev_watchdog+0x12d/0x1d5
Mar 10 07:48:58 N1002563 kernel: [ 870.780214] [<ffffffff8105e84a>] ? delayed_work_timer_fn+0x0/0x3d
Mar 10 07:48:58 N1002563 kernel: [ 870.780219] [<ffffffff8105e7ee>] ? __queue_work+0x35/0x3d
Mar 10 07:48:58 N1002563 kernel: [ 870.780227] [<ffffffff81269673>] ? dev_watchdog+0x0/0x1d5
Mar 10 07:48:58 N1002563 kernel: [ 870.780234] [<ffffffff8105655a>] run_timer_softirq+0x1ff/0x2a1
Mar 10 07:48:58 N1002563 kernel: [ 870.780242] [<ffffffff810205a1>] ? lapic_next_event+0x18/0x1c
Mar 10 07:48:58 N1002563 kernel: [ 870.780249] [<ffffffff8104f9e3>] __do_softirq+0xde/0x19f
Mar 10 07:48:58 N1002563 kernel: [ 870.780256] [<ffffffff8100ccec>] call_softirq+0x1c/0x28
Mar 10 07:48:58 N1002563 kernel: [ 870.780262] [<ffffffff8100e8b1>] do_softirq+0x41/0x81
Mar 10 07:48:58 N1002563 kernel: [ 870.780268] [<ffffffff8104f7bd>] irq_exit+0x36/0x75
Mar 10 07:48:58 N1002563 kernel: [ 870.780274] [<ffffffff81020f33>] smp_apic_timer_interrupt+0x88/0x96
Mar 10 07:48:58 N1002563 kernel: [ 870.780287] [<ffffffff8100c6b3>] apic_timer_interrupt+0x13/0x20
Mar 10 07:48:58 N1002563 kernel: [ 870.780291] <EOI> [<ffffffff81027740>] ? native_safe_halt+0x6/0x8
Mar 10 07:48:58 N1002563 kernel: [ 870.780304] [<ffffffff81012da3>] ? default_idle+0x55/0x74
Mar 10 07:48:58 N1002563 kernel: [ 870.780309] [<ffffffff810131ce>] ? c1e_idle+0xf4/0xfb
Mar 10 07:48:58 N1002563 kernel: [ 870.780315] [<ffffffff81065529>] ? atomic_notifier_call_chain+0x13/0x15
Mar 10 07:48:58 N1002563 kernel: [ 870.780321] [<ffffffff8100aeec>] ? cpu_idle+0x5b/0x93
Mar 10 07:48:58 N1002563 kernel: [ 870.780329] [<ffffffff81304144>] ? start_secondary+0x1a8/0x1ac
Mar 10 07:48:58 N1002563 kernel: [ 870.780334] ---[ end trace 08b420ca1e09a176 ]---
Mar 10 07:48:58 N1002563 kernel: [ 870.780339] bnx2: eth0 DEBUG: intr_sem[0]
Mar 10 07:48:58 N1002563 kernel: [ 870.780345] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
Mar 10 07:48:58 N1002563 kernel: [ 870.780352] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
Mar 10 07:48:58 N1002563 kernel: [ 870.780357] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
Mar 10 07:49:03 N1002563 kernel: [ 875.780020] bnx2: eth0 DEBUG: intr_sem[0]
Mar 10 07:49:03 N1002563 kernel: [ 875.780026] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[00000000]
Mar 10 07:49:03 N1002563 kernel: [ 875.780038] bnx2: eth0 DEBUG: MCP_STATE_P0[00000000] MCP_STATE_P1[00000000]
Mar 10 07:49:03 N1002563 kernel: [ 875.780043] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]

This debug message would repeat every 5 seconds, eventually
the systems locks-up.

This is running a 2.6.32-8 stable kernel, I've tried a version with Ben's
patch from this thread installed with no change in behavior. I'm in the
process of backporting all the upstream changes to see if that helps.

Guessing it's a race condition caused by these calls in bnx2_set_coalesce():

if (netif_running(bp->dev)) {
bnx2_netif_stop(bp);
bnx2_init_nic(bp, 0);
bnx2_netif_start(bp);
}

Thanks for any help,

-Brian

Michael Chan

unread,

Mar 10, 2010, 6:50:02 PM3/10/10

to

On Wed, 2010-03-10 at 15:09 -0800, Brian Haley wrote:
> Brian Haley wrote:
> > Hi Michael,
> >
> > Michael Chan wrote:
> >> Do we have timers running in this environment? The timer in the bnx2
> >> driver, bnx2_timer(), needs to run to provide a heart beat to the
> >> firmware. In netpoll mode without timer interrupts, if we are regularly
> >> calling the NAPI poll function, it should also be able to provide the
> >> heartbeat. Without the heartbeat, the firmware will reset the chip and
> >> result in the NETDEV WATCHDOG.
> >
> > We have also been seeing watchdog timeouts with bnx2, below is a
> > stack trace with Benjamin's debug patch applied. Normally we were
> > only seeing them under heavy load, but this one was at boot. We haven't
> > tried the latest firmware/driver from 2.6.33 yet. You can contact me
> > offline if you need more detailed info.
>
> Following-up since I have more info on this issue.
>
> I'm able to cause a netdev_watchdog timeout by changing the coalesce
> settings on my bnx2, I built a little test program for it:

Do you run this program in a loop? How quickly do you see the NETDEV
WATCHDOG?

These rx settings should be ok. Did you change the tx settings? If the
tx settings are all zeros, you won't get any TX interrupts and you can
get a NETDEV WATCHDOG.

Run ethtool -c eth0 to see what the tx settings are. Thanks.

Brian Haley

unread,

Mar 10, 2010, 9:20:02 PM3/10/10

to

Michael Chan wrote:
> On Wed, 2010-03-10 at 15:09 -0800, Brian Haley wrote:
>> Brian Haley wrote:
>>> Hi Michael,
>>>
>>> Michael Chan wrote:
>>>> Do we have timers running in this environment? The timer in the bnx2
>>>> driver, bnx2_timer(), needs to run to provide a heart beat to the
>>>> firmware. In netpoll mode without timer interrupts, if we are regularly
>>>> calling the NAPI poll function, it should also be able to provide the
>>>> heartbeat. Without the heartbeat, the firmware will reset the chip and
>>>> result in the NETDEV WATCHDOG.
>>> We have also been seeing watchdog timeouts with bnx2, below is a
>>> stack trace with Benjamin's debug patch applied. Normally we were
>>> only seeing them under heavy load, but this one was at boot. We haven't
>>> tried the latest firmware/driver from 2.6.33 yet. You can contact me
>>> offline if you need more detailed info.
>> Following-up since I have more info on this issue.
>>
>> I'm able to cause a netdev_watchdog timeout by changing the coalesce
>> settings on my bnx2, I built a little test program for it:
>
> Do you run this program in a loop? How quickly do you see the NETDEV
> WATCHDOG?

It's run once, and we see it almost immediately after ETHTOOL_SCOALESCE.

>> ecoal.rx_coalesce_usecs = 0;
>> ecoal.rx_max_coalesced_frames = 1;
>> ecoal.rx_coalesce_usecs_irq = 0;
>> ecoal.rx_max_coalesced_frames_irq = 1;
>
> These rx settings should be ok. Did you change the tx settings? If the
> tx settings are all zeros, you won't get any TX interrupts and you can
> get a NETDEV WATCHDOG.

We did the read, so the TX should be what it was originally.

> Run ethtool -c eth0 to see what the tx settings are. Thanks.

# ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 0
rx-frames: 1
rx-usecs-irq: 0
rx-frames-irq: 1

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

If I run 'ethtool -c eth0' after the watchdog triggers either the NIC
or system completely hangs.

David Miller

unread,

Mar 11, 2010, 1:10:02 PM3/11/10

to

From: "Michael Chan" <mc...@broadcom.com>
Date: Thu, 11 Mar 2010 09:49:56 -0800

>
> On Wed, 2010-03-10 at 18:09 -0800, Brian Haley wrote:
>> >> I'm able to cause a netdev_watchdog timeout by changing the coalesce
>> >> settings on my bnx2, I built a little test program for it:
>> >
>> > Do you run this program in a loop? How quickly do you see the NETDEV
>> > WATCHDOG?
>>
>> It's run once, and we see it almost immediately after ETHTOOL_SCOALESCE.
>

> What's the difference between running the test program and doing ethtool
> -C? Do you see the issue in either case? I don't see the issue here
> with ethtool -C.

Probably because the independent program runs faster and thus
can trigger races more easily.

In any case, you should be trying to reproduce his problem with
his test program since he went through the effort of providing
one.

Michael Chan

unread,

Mar 11, 2010, 1:10:03 PM3/11/10

to

On Wed, 2010-03-10 at 18:09 -0800, Brian Haley wrote:
> >> I'm able to cause a netdev_watchdog timeout by changing the coalesce
> >> settings on my bnx2, I built a little test program for it:
> >
> > Do you run this program in a loop? How quickly do you see the NETDEV
> > WATCHDOG?
>
> It's run once, and we see it almost immediately after ETHTOOL_SCOALESCE.

What's the difference between running the test program and doing ethtool

-C? Do you see the issue in either case? I don't see the issue here
with ethtool -C.

Thanks.

Michael Chan

unread,

Mar 11, 2010, 2:00:02 PM3/11/10

to

On Thu, 2010-03-11 at 10:05 -0800, David Miller wrote:
> From: "Michael Chan" <mc...@broadcom.com>
> Date: Thu, 11 Mar 2010 09:49:56 -0800
>
> >
> > On Wed, 2010-03-10 at 18:09 -0800, Brian Haley wrote:
> >> >> I'm able to cause a netdev_watchdog timeout by changing the coalesce
> >> >> settings on my bnx2, I built a little test program for it:
> >> >
> >> > Do you run this program in a loop? How quickly do you see the NETDEV
> >> > WATCHDOG?
> >>
> >> It's run once, and we see it almost immediately after ETHTOOL_SCOALESCE.
> >
> > What's the difference between running the test program and doing ethtool
> > -C? Do you see the issue in either case? I don't see the issue here
> > with ethtool -C.
>
> Probably because the independent program runs faster and thus
> can trigger races more easily.
>

That's what I thought, I thought he was running it in a loop and
triggering some race condition. But he said he only ran it once. His
program gets the coalesce settings, sleeps for 5 seconds, and then sets
the coalesce settings.

> In any case, you should be trying to reproduce his problem with
> his test program since he went through the effort of providing
> one.
>

I just tried it and cannot reproduce the problem.

Brian, please provide more information. Thanks.

Brian Haley

unread,

Mar 11, 2010, 2:50:02 PM3/11/10

to

Michael Chan wrote:
> On Thu, 2010-03-11 at 10:05 -0800, David Miller wrote:
>> From: "Michael Chan" <mc...@broadcom.com>
>> Date: Thu, 11 Mar 2010 09:49:56 -0800
>>
>>> On Wed, 2010-03-10 at 18:09 -0800, Brian Haley wrote:
>>>>>> I'm able to cause a netdev_watchdog timeout by changing the coalesce
>>>>>> settings on my bnx2, I built a little test program for it:
>>>>> Do you run this program in a loop? How quickly do you see the NETDEV
>>>>> WATCHDOG?
>>>> It's run once, and we see it almost immediately after ETHTOOL_SCOALESCE.
>>> What's the difference between running the test program and doing ethtool
>>> -C? Do you see the issue in either case? I don't see the issue here
>>> with ethtool -C.
>> Probably because the independent program runs faster and thus
>> can trigger races more easily.

A customer provided some test code that triggered this hang, so
I've just been using it. I just used ethtool and it happened too:

# ethtool -C eth0 rx-usecs 0 rx-frames 1 rx-usecs-irq 0 rx-frames-irq 1

If the interface is down, no problem.

> That's what I thought, I thought he was running it in a loop and
> triggering some race condition. But he said he only ran it once. His
> program gets the coalesce settings, sleeps for 5 seconds, and then sets
> the coalesce settings.

The 5 seconds was there only because this was a snippet from a larger
function that was doing a lot of ETHTOOL ioctl()'s, and I wanted to
wait between each call to see which was causing this. Removing the
sleep() still triggers the watchdog.

>> In any case, you should be trying to reproduce his problem with
>> his test program since he went through the effort of providing
>> one.
>
> I just tried it and cannot reproduce the problem.
>
> Brian, please provide more information. Thanks.

I can only reproduce this on one system out of many, so it's either a
race condition or bad hardware. The only thing I can confirm at the
moment is that it's the code at the bottom of bnx2_set_coalesce()
that's causing it, I'm trying to go through all those codepaths now.

# lspci -vv -s 04:00
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
Subsystem: Hewlett-Packard Company NC373i Integrated Multifunction Gigabit Server Adapter
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (16000ns min), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 65
Region 0: Memory at f6000000 (64-bit, non-prefetchable) [size=32M]
[virtual] Expansion ROM at d0200000 [disabled] [size=2K]
Capabilities: [40] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=8
Status: Dev=04:00.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=8 DMCRS=32 RSCEM- 266MHz- 533MHz-
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0100c Data: 4182
Kernel driver in use: bnx2
Kernel modules: bnx2

# ethtool -i eth0
driver: bnx2
version: 2.0.2
firmware-version: 1.9.6
bus-info: 0000:04:00.0

What other info would help?

-Brian

Michael Chan

unread,

Mar 11, 2010, 3:00:02 PM3/11/10

to

On Thu, 2010-03-11 at 11:40 -0800, Brian Haley wrote:
> I can only reproduce this on one system out of many, so it's either a
> race condition or bad hardware. The only thing I can confirm at the
> moment is that it's the code at the bottom of bnx2_set_coalesce()
> that's causing it, I'm trying to go through all those codepaths now.
>

The NETDEV WATCHDOG is caused by stopping the TX queues with
->trans_start older than dev->watchdog_timeo which is set to 5 seconds
in bnx2. Please try this patch below to update the ->trans_start first
before stopping the TX queues:

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index d3f739a..c0f4aa7 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -656,7 +656,6 @@ bnx2_netif_stop(struct bnx2 *bp)
int i;

bnx2_napi_disable(bp);
- netif_tx_disable(bp->dev);
/* prevent tx timeout */
for (i = 0; i < bp->dev->num_tx_queues; i++) {
struct netdev_queue *txq;
@@ -664,6 +663,7 @@ bnx2_netif_stop(struct bnx2 *bp)
txq = netdev_get_tx_queue(bp->dev, i);
txq->trans_start = jiffies;
}
+ netif_tx_disable(bp->dev);
}
bnx2_disable_int_sync(bp);

Brian Haley

unread,

Mar 11, 2010, 5:00:02 PM3/11/10

to

Michael Chan wrote:
> On Thu, 2010-03-11 at 11:40 -0800, Brian Haley wrote:
>> I can only reproduce this on one system out of many, so it's either a
>> race condition or bad hardware. The only thing I can confirm at the
>> moment is that it's the code at the bottom of bnx2_set_coalesce()
>> that's causing it, I'm trying to go through all those codepaths now.
>
> The NETDEV WATCHDOG is caused by stopping the TX queues with
> ->trans_start older than dev->watchdog_timeo which is set to 5 seconds
> in bnx2. Please try this patch below to update the ->trans_start first
> before stopping the TX queues:

Well I'm an idiot. Someone had cherry-picked commit 4529819c4 (that caused
the reset_task bnx2 crash), so it was bad code in bnx2_netif_stop()/start()
that's already been fixed upstream. I'll merge our bnx2 code up to the
firmware commit and start testing again to see if we still see the watchdog
timeouts we've seen in the past.

Thanks for your help.

-Brian