First of all, apologies for not being able to get involved with development sooner. Right now I'm tired of logging into an MP-02 every few hours to reboot it, and I actually depend on an MP-02 to get some work done. A reboot does fix the issue.
From the log:
[ 52.880000] br-lan: port 3(bat0) entered forwarding state
[ 54.880000] br-lan: port 3(bat0) entered forwarding state
[ 4024.000000] ieee80211 phy0: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 4024.010000] usb 1-1.3: USB disconnect, device number 4
[ 4024.020000] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x0438 with error -19
[ 4024.030000] ieee80211 phy0: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 4024.130000] batman_adv: bat0: Interface deactivated: wlan1-1
[ 4024.130000] batman_adv: bat0: Removing interface: wlan1-1
I didn't have an awful lot of time, but was willing to kill a few hours on this because it might render a small 10-node mesh operational again.
I might be going overboard, but an rmmod rt2800usb followed by a modprobe rt2800usb and setup of bssid and batman seems to repair the connection without a reboot.
Not having had the time to find the proper development wiki or guide, I resorted to putting two files in /overlayfs/ and starting the first of them from /etc/rc.local:
---
/etc/rc.local:
/overlayfs/mesh-watch > /dev/null 2>&1 &
exit 0
---
/overlayfs/mesh-watch:
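#!/bin/sh
# Default gateway, read once at startup from the routing table.
r=`route -n|grep ^0.0.0.0|awk '{print $2}'`
n=0  # consecutive failed ping rounds
while true; do
sleep 15
ping -c2 "$r" && n=0 || n=$(($n+1))
# After 5 failed rounds in a row, reload the driver and rejoin the mesh.
[ $n -ge 5 ] && /overlayfs/mesh-reset
done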
---
/overlayfs/mesh-reset:
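#!/bin/sh
# Reload the USB Wi-Fi driver and give it a moment to come back.
/usr/sbin/rmmod rt2800usb
/usr/sbin/modprobe rt2800usb
/bin/sleep 3
# Recreate the ad-hoc interface and rejoin the mesh cell on 5745 MHz.
/usr/sbin/iw wlan1 interface add wlan1-1 type ibss
/sbin/ifconfig wlan1-1 up
/usr/sbin/iw dev wlan1-1 ibss join vt-mesh 5745
# Hand the interface back to batman-adv (bridge loop avoidance on, AP isolation off).
/usr/sbin/batctl if add wlan1-1
/usr/sbin/batctl bl 1
/sbin/uci set batman-adv.bat0.gw_mode=off
/usr/sbin/brctl addif br-lan bat0
/usr/sbin/batctl ap 0
# Let the mesh settle before the watcher starts pinging again.
/bin/sleep 120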
---
The values above are obviously hardcoded for this particular mesh, which runs on wlan1, a USB-connected rt2870 meshing on 5745 MHz in ad-hoc mode. The script pings the gateway address, and after roughly 90 seconds of timeouts it resets the mesh. Depending on the size of the mesh, this is probably too short, and some randomness should probably be added in case all the nodes somehow manage to sync up perfectly, so that they're all down right when they're supposed to be up and get stuck in a bad loop. The correct mesh parameters should probably also be read from the configuration. I poked around the startup scripts to look for a quicker fix, but in my haste I found they weren't really readable, and I couldn't find a combination that elegantly did just the above.
Instead of reconnecting I guess the whole potato can be rebooted, but that results in dropping all the clients and about 2-3 minutes of downtime, whereas the above usually gets things going again in under a minute.
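For what it's worth, here is a rough sketch of how the randomness and the config lookup could be done. None of this is tested on an MP-02. The helper names (jitter, cfg_get, chan_to_freq) are made up for illustration, the uci key in the usage comment is a placeholder rather than a key I have verified on this firmware, and I am assuming od is available (on a stripped-down BusyBox, hexdump may have to stand in for it).

```shell
#!/bin/sh
# Sketch only: helper names and the uci key shown below are hypothetical.

# jitter MAX: print a random integer in [0, MAX) taken from the kernel RNG,
# so nodes that booted at the same moment still pick different delays.
jitter() {
    max=$1
    raw=$(od -An -N2 -tu2 /dev/urandom 2>/dev/null | tr -d ' \n') || raw=""
    [ -n "$raw" ] || raw=0   # fall back to no jitter if od is unavailable
    echo $(($raw % $max))
}

# cfg_get KEY DEFAULT: print a uci option, or DEFAULT when uci or the
# option is missing.
cfg_get() {
    val=$(uci -q get "$1" 2>/dev/null) || val=""
    if [ -n "$val" ]; then echo "$val"; else echo "$2"; fi
}

# chan_to_freq CHANNEL: convert a Wi-Fi channel number to the centre
# frequency in MHz, which is what "iw ... ibss join" takes.
chan_to_freq() {
    ch=$1
    if [ "$ch" -eq 14 ]; then
        echo 2484
    elif [ "$ch" -lt 14 ]; then
        echo $((2407 + 5 * $ch))
    else
        echo $((5000 + 5 * $ch))
    fi
}

# Possible use in the two scripts (placeholder uci key, adjust to the node):
#   sleep $((15 + $(jitter 10)))
#   ssid=$(cfg_get 'wireless.@wifi-iface[1].ssid' vt-mesh)
#   iw dev wlan1-1 ibss join "$ssid" "$(chan_to_freq 149)"
```

Channel 149 maps to the 5745 MHz used here, so the conversion at least agrees with the hardcoded value.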
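The two approaches could also be combined: try the cheap driver reload first and only reboot if that keeps failing. A minimal sketch of that decision, with assumptions stated up front: the function name decide is made up, the threshold of 5 failed ping rounds is the one from mesh-watch, and the limit of 3 reloads before rebooting is an arbitrary number I picked.

```shell
#!/bin/sh
# Sketch only: "decide" and the limit of 3 reloads are assumptions.

# decide FAILS RESETS: print the next action for the watchdog loop.
#   FAILS  = consecutive failed ping rounds
#   RESETS = driver reloads already attempted during this outage
decide() {
    fails=$1
    resets=$2
    if [ "$fails" -lt 5 ]; then
        echo wait      # not enough evidence yet, keep pinging
    elif [ "$resets" -lt 3 ]; then
        echo reset     # cheap fix: reload rt2800usb and rejoin the mesh
    else
        echo reboot    # reloads did not help, fall back to a full reboot
    fi
}
```

The watcher would keep a second counter for reloads, clear it whenever a ping succeeds, and call reboot only when decide prints reboot.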
Regards