em0 watchdog timeout

Willem Jan Withagen

Nov 10, 2011, 4:41:42 AM
to sta...@freebsd.org
Hi

Still running this file server on ZFS, and every now and then em0 goes
down and cannot be revived. Nothing goes in or out of the box...

Any suggestions on how to (help) fix this?

Regards,
--WjW

-------

Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
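
A note on reading these lines: tdh and tdt are the transmit ring's head
(advanced by the hardware) and tail (advanced by the driver), so the gap
between them is the number of descriptors that were still waiting to go
out when the watchdog fired. The sketch below does that arithmetic on the
"Queue(0)" lines. The function name pending_tx is made up for
illustration, and the default ring size of 1024 is an assumption (it
matches the "TX descriptors avail = 1024" reported for the idle interface
later in this thread).

```shell
# pending_tx [RING_SIZE]
# Reads kernel log lines on stdin and, for every "Queue(N) tdh = X, hw tdt = Y"
# line, prints how many TX descriptors sat between head and tail.
# RING_SIZE defaults to 1024 (an assumption, see the note above).
pending_tx() {
    awk -v ring="${1:-1024}" '
        /Queue\([0-9]+\) tdh = / {
            # The value follows two fields after its label ("tdh = 187,").
            for (i = 1; i < NF - 1; i++) {
                if ($i == "tdh") tdh = $(i + 2) + 0
                if ($i == "tdt") tdt = $(i + 2) + 0
            }
            # Modulo arithmetic handles the tail wrapping past the ring end.
            pending = (tdt - tdh + ring) % ring
            print pending
        }'
}
```

Fed the five entries above (for example with
"grep em0: /var/log/messages | pending_tx", assuming kernel messages are
logged there), this prints 2, 12, 11, 15 and 15. So only a handful of
descriptors were outstanding each time, which is in line with the
"desc avail" values in the log.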

Device is:
Nov 10 10:07:27 zfs kernel: em0: <Intel(R) PRO/1000 Network Connection 7.2.3> port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
Nov 10 10:07:27 zfs kernel: em0: [FILTER]

pciconf -lv:
em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class = network
    subclass = ethernet

uname:
8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct 2 13:36:55 CEST 2011 amd64

sysctl -a | grep em.0:
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
dev.em.0.%driver: em
dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9 subdevice=0x10bd class=0x020000
dev.em.0.%parent: pci0
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.flow_control: 3
dev.em.0.eee_control: 0
dev.em.0.link_irq: 0
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 6
dev.em.0.watchdog_timeouts: 5
dev.em.0.device_control: 1074790976
dev.em.0.rx_control: 67141634
dev.em.0.fc_high_water: 8192
dev.em.0.fc_low_water: 6692
dev.em.0.queue0.txd_head: 78
dev.em.0.queue0.txd_tail: 78
dev.em.0.queue0.tx_irq: 0
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 376
dev.em.0.queue0.rxd_tail: 375
dev.em.0.queue0.rx_irq: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.missed_packets: 9
dev.em.0.mac_stats.recv_no_buff: 0
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 1
dev.em.0.mac_stats.crc_errs: 1
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 0
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xoff_recvd: 0
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.total_pkts_recvd: 160062850
dev.em.0.mac_stats.good_pkts_recvd: 160062840
dev.em.0.mac_stats.bcast_pkts_recvd: 79648
dev.em.0.mac_stats.mcast_pkts_recvd: 10220
dev.em.0.mac_stats.rx_frames_64: 0
dev.em.0.mac_stats.rx_frames_65_127: 0
dev.em.0.mac_stats.rx_frames_128_255: 0
dev.em.0.mac_stats.rx_frames_256_511: 0
dev.em.0.mac_stats.rx_frames_512_1023: 0
dev.em.0.mac_stats.rx_frames_1024_1522: 0
dev.em.0.mac_stats.good_octets_recvd: 107143604749
dev.em.0.mac_stats.good_octets_txd: 129876768158
dev.em.0.mac_stats.total_pkts_txd: 179010567
dev.em.0.mac_stats.good_pkts_txd: 179010567
dev.em.0.mac_stats.bcast_pkts_txd: 14608
dev.em.0.mac_stats.mcast_pkts_txd: 206
dev.em.0.mac_stats.tx_frames_64: 0
dev.em.0.mac_stats.tx_frames_65_127: 0
dev.em.0.mac_stats.tx_frames_128_255: 0
dev.em.0.mac_stats.tx_frames_256_511: 0
dev.em.0.mac_stats.tx_frames_512_1023: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 0
dev.em.0.mac_stats.tso_txd: 3691806
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.interrupts.asserts: 130023913
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.wake: 0

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Jeremy Chadwick

Nov 10, 2011, 5:05:10 AM
to Willem Jan Withagen, sta...@freebsd.org, Vogel, Jack
On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
> Still running this file server on ZFS, and every now and then em0
> goes down, and is not revivable.... Nothing goes in or out the
> box...
>
> Any suggestions as how to (help) fix this?

CC'ing Jack Vogel of Intel.

We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
regard).

Also, please do "sysctl dev.em.0.debug=1". The command itself will show
nothing useful in its output, but "dmesg" shortly after should have a bunch
of driver-level debugging information that should help (the output starts
with "Interface is ..."). Please provide that too.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Willem Jan Withagen

Nov 10, 2011, 6:55:32 AM
to Jeremy Chadwick, sta...@freebsd.org, Vogel, Jack
On 10-11-2011 10:50, Jeremy Chadwick wrote:
> On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
>> Still running this file server on ZFS, and every now and then em0
>> goes down, and is not revivable.... Nothing goes in or out the
>> box...
>>
>> Any suggestions as how to (help) fix this?
>
> CC'ing Jack Vogel of Intel.
>
> We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
> regard).

em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xdf900000, size 131072, enabled
    bar [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
    bar [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

dmidecode gives:
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Supermicro
Product Name: C2SBX
Version: 0123456789
Serial Number: 0123456789
UUID: 53D1A494-D663-A0E7-890B-003048DE97CD
Wake-up Type: Power Switch
SKU Number: Not Specified
Family: Not Specified

> Also, please do "sysctl dev.em.0.debug=1", which will show nothing
> useful in the output, however "dmesg" shortly after should have a bunch
> of driver-level debugging information that should help (output starts
> with "Interface is ...". Please provide that too.

The system has been rebooted, so currently nothing is seriously in trouble.
But trying to switch it on does not seem to work?

# sysctl dev.em.0.debug=1
dev.em.0.debug: -1 -> -1
# sysctl -a | grep debug | grep em
dev.em.0.debug: -1

Or is it just meant to dump this:

Nov 10 12:44:27 zfs kernel: Interface is RUNNING and INACTIVE
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
Nov 10 12:44:27 zfs kernel: em0: Tx Queue Status = 0
Nov 10 12:44:27 zfs kernel: em0: TX descriptors avail = 1024
Nov 10 12:44:27 zfs kernel: em0: Tx Descriptors avail failure = 0
Nov 10 12:44:27 zfs kernel: em0: RX discarded packets = 0
Nov 10 12:44:27 zfs kernel: em0: RX Next to Check = 586
Nov 10 12:44:27 zfs kernel: em0: RX Next to Refresh = 585
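
Judging by the output above, writing 1 to dev.em.0.debug appears to act
as a one-shot trigger rather than a switch: the value reads back as -1
either way, and the dump lands in the kernel message buffer. A small
sketch for pulling only the most recent dump out of dmesg output follows.
The function name last_em_debug is made up, and it assumes every dump
starts with an "Interface is" line followed by lines tagged with the
interface name, as in the excerpt above.

```shell
# last_em_debug
# Reads dmesg (or syslog) text on stdin and prints only the last em debug
# dump, i.e. the last "Interface is ..." line plus the "emN: ..." lines
# that directly follow it. The dump layout is an assumption based on the
# excerpt in this thread.
last_em_debug() {
    awk '
        # A new dump begins: forget anything collected so far.
        /Interface is / { n = 0; buf[++n] = $0; in_dump = 1; next }
        # Lines tagged with an em interface name belong to the current dump.
        in_dump && /em[0-9]+: / { buf[++n] = $0; next }
        # Anything else ends the dump.
        { in_dump = 0 }
        END { for (i = 1; i <= n; i++) print buf[i] }'
}
```

Typical use would be something like
"sysctl dev.em.0.debug=1 && dmesg | last_em_debug", run while the
interface is in the hung state.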

I always tell everybody that they should go for Intel Ethernet
devices, because "they just work". And I'm still very much convinced of
this. So I'll be more than happy to do any debugging and/or testing
required. The only thing I cannot afford at the moment is to leave this
box in a disconnected state.

And note that this problem only rears its nasty head every few weeks...

--WjW

Joshua Boyd

Nov 10, 2011, 5:54:25 PM
to Willem Jan Withagen, sta...@freebsd.org, Vogel, Jack, Jeremy Chadwick
On Thu, Nov 10, 2011 at 6:51 AM, Willem Jan Withagen <w...@digiware.nl> wrote:

> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086
> rev=0x02 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
> class = network
> subclass = ethernet
> bar [10] = type Memory, range 32, base 0xdf900000, size 131072,
> enabled
> bar [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
> bar [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
> cap 01[c8] = powerspec 2 supports D0 D3 current D0
> cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> cap 13[e0] = PCI Advanced Features: FLR TP
>
>
> And note that this problem only raises it nasty head very few weeks...


I have had the same problem, as shown here:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/063092.html

According to your pciconf output, your card either doesn't support MSI-X,
or you have MSI-X disabled.

Check the hw.pci.enable_msix sysctl and make sure that it is set to 1. Also
check to make sure there aren't any BIOS settings blocking MSI-X.
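
One way to read the capability list without eyeballing it: in
"pciconf -lvbc" output, capability ID 05 is MSI and ID 11 is MSI-X (these
are the standard PCI capability IDs). The sketch below reports the best
interrupt mode a given device advertises. The function name
interrupt_caps is hypothetical, and it assumes each device stanza starts
with an unindented "name@pci..." line.

```shell
# interrupt_caps DEVICE
# Reads "pciconf -lvbc" output on stdin and prints MSI-X, MSI or none for
# the stanza belonging to DEVICE (for example em0).
interrupt_caps() {
    awk -v dev="$1" '
        # A stanza header looks like "em0@pci0:0:25:0: ..."; track whether
        # the current stanza is the requested device.
        /^[a-z]+[0-9]+@pci/ { in_dev = (index($0, dev "@") == 1) }
        # Capability ID 05 = MSI, 11 = MSI-X.
        in_dev && /cap 05\[/ { msi = 1 }
        in_dev && /cap 11\[/ { msix = 1 }
        END {
            if (msix) print "MSI-X"
            else if (msi) print "MSI"
            else print "none"
        }'
}
```

Run as "pciconf -lvbc | interrupt_caps em0". For the em0 listing quoted
above, which only has a "cap 05" line, it prints MSI.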

Apparently the older Intel gigabit cards don't support MSI-X, and as such
get starved.

However, I haven't had the problem rear its ugly head in quite a while,
but that's with the newest 8.2-STABLE tree. No idea if it's just chance or
if something was actually fixed.

When it was happening to me, it would also happen about every week or two
and I'd have to reboot the server.

--
Joshua Boyd

E-mail: boy...@jbip.net
http://www.jbip.net

Willem Jan Withagen

Nov 11, 2011, 3:33:23 AM
to Joshua Boyd, sta...@freebsd.org, Vogel, Jack, Jeremy Chadwick
I checked and hw.pci.enable_msix=1, so that is on.
Any hints on what to look for in the BIOS settings that might block MSI-X?

I'll also be upgrading the BIOS this weekend to see if that will enable MSI-X.

Another solution would be to get a newer Intel Ethernet card
and stick it in a PCI-X slot?
Or would that again suffer from starvation?

And as a side question:
Why would that starvation actually "crash" the driver/device?

--WjW

Willem Jan Withagen

Nov 13, 2011, 1:23:14 PM
to Joshua Boyd, sta...@freebsd.org, Vogel, Jack, Jeremy Chadwick
Upgraded to a new BIOS, but that does not help either.

Now the trick question will be:
If I get a new server-type PCI-E Ethernet card, would that get me
an MSI-X Ethernet device?

--WjW

Jack Vogel

Nov 13, 2011, 2:08:59 PM
to Willem Jan Withagen, sta...@freebsd.org, Joshua Boyd, Vogel, Jack, Jeremy Chadwick
On Sun, Nov 13, 2011 at 10:22 AM, Willem Jan Withagen <w...@digiware.nl> wrote:

> On 2011-11-10 23:25, Joshua Boyd wrote:
>
>> On Thu, Nov 10, 2011 at 6:51 AM, Willem Jan Withagen <w...@digiware.nl
>> <mailto:w...@digiware.nl>> wrote:
>>
>> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9
>> chip=0x10bd8086 rev=0x02 hdr=0x00
>> vendor = 'Intel Corporation'
>> device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
>> class = network
>> subclass = ethernet
>> bar [10] = type Memory, range 32, base 0xdf900000, size
>> 131072, enabled
>> bar [14] = type Memory, range 32, base 0xdf924000, size 4096,
>> enabled
>> bar [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
>> cap 01[c8] = powerspec 2 supports D0 D3 current D0
>> cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>> cap 13[e0] = PCI Advanced Features: FLR TP
>>
>>
>> And note that this problem only raises it nasty head very few weeks...
>>
>>
>> I have had the same problem, as shown here:
>>
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/063092.html
>>
>> According to your pciconf output, your card either doesn't support
>> MSI-X, or you have MSI-X disabled.
>>
>> Check the hw.pci.enable_msix sysctl and make sure that it is set to 1.
>> Also check to make sure there aren't any BIOS settings blocking MSI-X.
>>
>> Apparently the older Intel gigabit cards don't support MSI-X, and as
>> such get starved.
>>
>
> Upgraded to a new bios, but that does not help either.
>
> Now the trick question will be:
> IF I get a new servertype PCI-E ethernet card, would that get me
> an MSI-X ethernet device.
>
>
There is no 'trick' to it :) The only MSI-X capable device that uses the em
driver is the 82574. But if you go with igb (82575 and beyond), they are all
MSI-X and multiqueue capable.

Jack