GnuBee - Network Driver Crash


penguinpages

Dec 30, 2021, 1:52:17 PM
to GnuBee

So I'm trying to put the NAS into a more production role, and three times now I have run into weird network failures.


Design: 
ethblack@eth0: 172.16.100.110
ethblue@eth0: 172.16.101.110
root@pandora:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   1.8T  0 disk
`-md0         9:0    0   3.7T  0 raid5
  `-md0p1   259:0    0   3.2T  0 part  /media/md0
sdb           8:16   0   1.8T  0 disk
`-md0         9:0    0   3.7T  0 raid5
  `-md0p1   259:0    0   3.2T  0 part  /media/md0
sdc           8:32   0   1.8T  0 disk
`-md0         9:0    0   3.7T  0 raid5
  `-md0p1   259:0    0   3.2T  0 part  /media/md0
Hosts are on the 172.16.101.0/24 network and use just NFS via a single export:
/media/md0/vms  172.16.0.0/255.255.0.0

Symptom: NFS mount via VMware. The copy starts, gets through around 2-4GB of the 10GB VM, and then the transfer dies.

Sometimes the network interfaces go offline completely, and restarting network services does NOT resolve it. No ping out. No L2/MAC address showing on the switch, but the link light is lit, so it looks like some low-level issue. A reboot "fixes" it, but the issue is 100% repeatable after six copy attempts.

Maybe others have seen this error and/or have suggestions.

I will post back to this thread as I root cause a bit more.

Ideas welcome. Thx





syslog_nic_kernel_crash.txt

Neil Brown

Jan 5, 2022, 9:27:21 PM
to GnuBee
I haven't seen anything like that - but I probably don't transfer multiple gigabytes at once very often.
The netdev watchdog message suggests that a transmission started but didn't complete, which probably means a lost interrupt.
I've recently built a 5.15 kernel.  There are several changes to the ethernet driver since 5.10.  None look like they are obviously related to your problem, so I wouldn't get your hopes up.  But it is worth a try.
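
As a rough way to see how often the watchdog fires between reboots, the kernel log can be filtered for its `NETDEV WATCHDOG` message. This is only a sketch: the function name `count_watchdog_timeouts` is made up for illustration, and it assumes the kernel log is readable via `dmesg` on the box.

```shell
# Count kernel log lines produced by the netdev transmit watchdog.
# Reads log text on stdin, so it works on dmesg output or a saved syslog file.
# (count_watchdog_timeouts is a hypothetical helper name.)
count_watchdog_timeouts() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c 'NETDEV WATCHDOG' || true
}

# Example: check the live kernel log, if dmesg is available and readable.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | count_watchdog_timeouts
fi
```

The same function can be pointed at a saved log, e.g. `count_watchdog_timeouts < syslog_nic_kernel_crash.txt`.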






Jeremey Wise

Jan 6, 2022, 8:42:17 AM
to Neil Brown, GnuBee
Not that this is a huge help but it adds color.

The streaming that was going on came from two sources. The first was 3 x 1080p cameras, which I had somehow set to stream everything rather than just on triggers. Once I fixed that, traffic dropped a lot :)

Also, backups from VMs and VM hosting from VMware / K8s clusters are what pushed it over the edge.

I also found a mismatch in jumbo frames when I did an explicit ping test:

[root@thor:~] vmkping -I vmk1 -d -s 1472 pandoras.penguinpages.local
PING pandoras.penguinpages.local (172.16.101.110): 1472 data bytes
1480 bytes from 172.16.101.110: icmp_seq=0 ttl=64 time=0.382 ms
1480 bytes from 172.16.101.110: icmp_seq=1 ttl=64 time=0.404 ms
1480 bytes from 172.16.101.110: icmp_seq=2 ttl=64 time=0.532 ms

--- pandoras.penguinpages.local ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.382/0.439/0.532 ms

[root@thor:~] ^C
[root@thor:~] vmkping -I vmk1 -d -s 8972 pandoras.penguinpages.local
PING pandoras.penguinpages.local (172.16.101.110): 8972 data bytes

--- pandoras.penguinpages.local ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

[root@thor:~]

Still working that out.... 
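
For reference, the two sizes used in the ping test above follow from the MTU: with the don't-fragment flag set (`-d`), the largest ICMP echo payload that fits is the MTU minus a 20-byte IPv4 header (no options) and an 8-byte ICMP header. A small sketch of that arithmetic, with `icmp_payload_for_mtu` as a made-up helper name:

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
#   payload = MTU - 20 (IPv4 header without options) - 8 (ICMP header)
# (icmp_payload_for_mtu is a hypothetical helper name.)
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # standard Ethernet MTU -> 1472
icmp_payload_for_mtu 9000   # jumbo-frame MTU       -> 8972
```

So the 1472-byte ping succeeding and the 8972-byte ping failing is consistent with at least one hop on the path being limited to a 1500-byte MTU.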




Jeremey Wise

Jan 6, 2022, 9:01:19 AM
to Neil Brown, GnuBee
Hmm..

I think, yet again, this NIC is very limited in features and does not support jumbo frames.

root@pandora:/etc/network# ifconfig eth0 mtu 9000
SIOCSIFMTU: Invalid argument
root@pandora:/etc/network# hwinfo --netcard
07: None 00.0: 0200 Ethernet controller
  [Created at pci.1017]
  Unique ID: S7lj.TfPFe67fIBB
  SysFS ID: /devices/platform/1e100000.ethernet
  SysFS BusID: 1e100000.ethernet
  Hardware Class: network
  Model: "ARM Ethernet controller"
  Device: "ARM Ethernet controller"
  Driver: "mtk_soc_eth"
  Device File: eth0
  HW Address: 5e:73:b5:84:62:8b
  Link detected: yes
  Module Alias: "of:NethernetT(null)Cmediatek,mt7621-eth"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

09: None 00.0: 0200 Ethernet controller
  [Created at pci.2177]
  Unique ID: ue9r.zpNUCgvBch2
  SysFS ID: /devices/platform/1e100000.ethernet/mdio_bus/mdio-bus/mdio-bus:00
  SysFS BusID: mdio-bus:00
  Hardware Class: network
  Model: "Ethernet controller"
  Device: "Ethernet controller"
  Driver: "mt7530"
  Device File: ethblue
  HW Address: 90:50:5a:55:3a:51
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown

10: None 00.0: 0200 Ethernet controller
  [Created at pci.2177]
  Unique ID: ue9r.zpNUCgvBch2
  SysFS ID: /devices/platform/1e100000.ethernet/mdio_bus/mdio-bus/mdio-bus:00
  SysFS BusID: mdio-bus:00
  Hardware Class: network
  Model: "Ethernet controller"
  Device: "Ethernet controller"
  Driver: "mt7530"
  Device File: ethblack
  HW Address: 90:50:5a:55:3a:50
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown
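
The `SIOCSIFMTU: Invalid argument` error above typically means the kernel rejected the requested MTU as outside the range the driver allows. Assuming a reasonably recent kernel and iproute2, `ip -d link show` reports that range as `minmtu` / `maxmtu`, which avoids guessing. A minimal sketch; `max_mtu_from_ip_output` is a made-up helper name and `IFACE` is an assumed variable defaulting to `eth0`:

```shell
# Extract the advertised maximum MTU from "ip -d link show" output on stdin.
# Prints nothing if the output carries no "maxmtu" field.
# (max_mtu_from_ip_output is a hypothetical helper name.)
max_mtu_from_ip_output() {
    grep -o 'maxmtu [0-9]*' | awk '{print $2}'
}

# Example: query the interface named in IFACE (assumed default: eth0),
# but only if the ip tool is installed.
if command -v ip >/dev/null 2>&1; then
    ip -d link show dev "${IFACE:-eth0}" 2>/dev/null | max_mtu_from_ip_output
fi
```

If the printed value is below 9000, the interface cannot be set to a 9000-byte MTU with the running driver, and the hosts on that network would need to stay at the standard MTU to match.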

