MP-02s falling over. My band-aid.


Coenraad Loubser

Jul 31, 2017, 3:57:35 PM
to Village Telco Development Community
First of all, apologies for not being able to get involved with development sooner. Right now I'm tired of logging into an MP-02 every few hours to reboot it, and I actually depend on an MP-02 to get some work done. A reboot does fix the issue.

From the log:
[   52.880000] br-lan: port 3(bat0) entered forwarding state
[   54.880000] br-lan: port 3(bat0) entered forwarding state
[ 4024.000000] ieee80211 phy0: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 4024.010000] usb 1-1.3: USB disconnect, device number 4
[ 4024.020000] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x0438 with error -19
[ 4024.030000] ieee80211 phy0: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 4024.130000] batman_adv: bat0: Interface deactivated: wlan1-1
[ 4024.130000] batman_adv: bat0: Removing interface: wlan1-1

I didn't have an awful lot of time, but was willing to kill a few hours on this because it might render a small 10-node mesh operational again... 

I might be going overboard..... but an rmmod rt2800usb followed by a modprobe rt2800usb and setup of bssid and batman seems to repair the connection without a reboot.

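Not having had the time to find the proper development wiki or guide, I resorted to putting two files in /overlay/ and calling one of them from /etc/rc.local:

---
/etc/rc.local:

/overlay/mesh-watch > /dev/null 2>&1 &
exit 0

---
/overlay/mesh-watch:

#!/bin/sh
# Default gateway address, read once at startup
r=$(route -n | awk '$1 == "0.0.0.0" {print $2; exit}')
n=0
while true; do
  sleep 15
  # Count consecutive failed ping rounds; any success resets the count
  ping -c2 "$r" && n=0 || n=$((n+1))
  # After 5 failures in a row, reset the mesh and start counting afresh
  [ "$n" -ge 5 ] && { /overlay/mesh-reset; n=0; }
done

---
/overlay/mesh-reset: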

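#!/bin/sh
# Reload the USB wifi driver to recover the adapter
/usr/sbin/rmmod rt2800usb
/usr/sbin/modprobe rt2800usb
/bin/sleep 3
# Recreate the ad-hoc interface and rejoin the mesh cell
/usr/sbin/iw wlan1 interface add wlan1-1 type ibss
/sbin/ifconfig wlan1-1 up
/usr/sbin/iw dev wlan1-1 ibss join vt-mesh 5745
# Hand the interface back to batman-adv and the LAN bridge
/usr/sbin/batctl if add wlan1-1
/usr/sbin/batctl bl 1    # bridge loop avoidance on
/sbin/uci set batman-adv.bat0.gw_mode=off
/usr/sbin/brctl addif br-lan bat0
/usr/sbin/batctl ap 0    # AP isolation off
# Wait two minutes; mesh-watch is blocked until this script returns
/bin/sleep 120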

---

The values above are obviously hardcoded for this particular mesh, which runs on wlan1, a USB-connected rt2870 meshing on 5745 MHz in ad-hoc mode. The script pings the gateway address and, after about 90 seconds of timeouts, resets the mesh. Depending on the size of the mesh, this is probably too short, and some randomness should probably be added in case all the nodes somehow manage to sync up perfectly, so that they're all down right when they're supposed to be up, and get stuck in a bad loop. The correct mesh parameters should probably also be read from the configuration. I poked around the startup scripts to look for a quicker fix, but in my haste I found they weren't really readable, and I couldn't find a combination that elegantly did just the above.
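
A minimal sketch of the randomness idea, not something I have running on the mesh. The names jitter_seconds and MAX_JITTER are made up for illustration, and it assumes the firmware's od supports the -An, -N and -t options, which a cut-down busybox build may not:

```shell
#!/bin/sh
# Sketch only: jitter_seconds and MAX_JITTER are hypothetical names.

# Print a pseudo-random integer in the range 0 .. max-1.
# Two bytes are read from /dev/urandom because busybox ash may not
# provide $RANDOM.
jitter_seconds() {
  max=$1
  raw=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
  raw=${raw:-0}   # fall back to no jitter if od produced nothing
  echo $((raw % max))
}

MAX_JITTER=60
# In mesh-watch the reset call could then become something like:
#   sleep "$(jitter_seconds "$MAX_JITTER")"; /overlay/mesh-reset
```

Each node would then wait a different number of seconds before resetting, so a group of nodes that lost the gateway at the same moment should not all drop off the mesh together.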

Instead of reconnecting I guess the whole potato can be rebooted, but that results in dropping all the clients and about 2-3 minutes of downtime, whereas the above usually gets things going again in under a minute.

Regards

Coenraad Loubser

Jul 31, 2017, 4:56:17 PM
to Village Telco Development Community
The above seemed to get killed off, so I moved it to cron. Also, ping isn't necessary... it's enough to test for the presence of the bat0 interface.

rc.local:
crond
echo 0 > /tmp/z # down counter

crontab -e:
*/1 * * * * /overlay/mesh-watch

/overlay/mesh-watch:
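#!/bin/sh
# /tmp/z holds the number of consecutive checks with bat0 missing
n=$(cat /tmp/z 2>/dev/null)
n=${n:-0}
if [ "$n" -ge 3 ]; then
  /overlay/mesh-reset   # also writes 0 to /tmp/z
  n=0
fi
grep -q bat0 /proc/net/dev && n=0 || n=$((n+1))
echo -n "$n" > /tmp/z
echo "$n"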

/overlay/mesh-reset:
#!/bin/sh
echo -n 0 > /tmp/z
/usr/sbin/rmmod rt2800usb
/usr/sbin/modprobe rt2800usb
/usr/sbin/iw wlan1 interface add wlan1-1 type ibss
#/sbin/ifconfig wlan1-1 hw ether 11:22:33:44:55:66 up

Steve Song

Jul 31, 2017, 5:07:27 PM
to Village Telco Dev
Hi Coenraad,

That's a neat fix although I wish it were not necessary.  Can I ask what firmware you are running on the MP2s and whether all the MP2s are running the same firmware?  Do all the MP2s suffer from this problem?  Do they all have a usb rt2870 connected to them?

Cheers... Steve

--
You received this message because you are subscribed to the Google Groups "Village Telco Development Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to village-telco-dev+unsubscribe@googlegroups.com.
To post to this group, send email to village-telco-dev@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/village-telco-dev/51a23914-ae7e-43db-9759-ef0674ef575c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--

T Gillett

Jul 31, 2017, 6:06:40 PM
to village-telco-dev

Hi Coenraad

I suspect the key issue here is the "USB disconnect" message at 4024.01.

We have found that MP02s based on the Dragino MS14 board can have problems handling large USB memory devices. This seems to be an issue with buffering of the USB data lines.

I suspect that you are seeing the same issue with the Ralink based USB wifi device.

You can try using a USB hub with buffering to see if that cures the problem.
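
To check whether other nodes show the same symptom, something like the following could be run against each node's kernel log. This is only a sketch: usb_wifi_dropped is a made-up helper name, and the patterns are taken directly from the log lines quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: usb_wifi_dropped is a hypothetical helper name.

# Read kernel log text on stdin and succeed if it contains a USB
# disconnect or the rt2800usb/rt2x00usb errors seen in the quoted log.
usb_wifi_dropped() {
  grep -q -E 'USB disconnect|rt2800usb_tx_sta_fifo_read_completed: Warning|rt2x00usb_vendor_request: Error'
}

# Typical use on a node:
#   dmesg | usb_wifi_dropped && echo "USB wifi adapter dropped off the bus"
```

If the message still appears with a buffered hub in place, the hub is probably not the cure.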

Regards
Terry

