Hi all,
I've deployed a number of BeagleBone Black units in West Africa as part of an emergency connectivity relief effort to support NGOs working to fight the Ebola outbreak. The BeagleBones provide a simple network monitoring function.
The BeagleBones were imaged in November with the Ubuntu flasher downloaded from here:
http://elinux.org/BeagleBoardUbuntu#Flasher (image version: BBB-eMMC-flasher-ubuntu-14.04.1-console-armhf-2014-10-29-2gb.img)
I'm having an issue with a few of the BeagleBones hanging unpredictably. I know I should provide more information to help diagnose this, but I'm having a hard time finding any "smoking gun" that points to the cause of the hang. The BeagleBones are in remote telco sheds monitoring network equipment, so one of my challenges is that I don't have a monitor connected or anyone on site I can ask "what's on the screen?" Fortunately I do have the ability to power cycle them remotely (see below).
Here's what I know:
- The BeagleBones have barely been modified from the standard base flasher image. I've only added a few monitoring tools from apt packages (smokeping and zabbix-proxy). I use these tools elsewhere, and I've never had any issues with them hanging a system.
- The systems run for weeks at a time just fine.
- At some point, the systems in question will "hang". They stop responding to pings, but the Ethernet port of the router they are connected to still shows a link light.
- Because each BeagleBone is connected to a remotely manageable power strip / PDU, I am able to power cycle it when this happens. The unit then boots normally and functions normally until the problem recurs a few weeks later.
Each BeagleBone is powered by a dedicated 5V / 1A power supply connected to its barrel connector. Other equipment at the site does not hang or reboot, so I'm fairly confident the BeagleBone hang does not coincide with a power issue at the site.
Can anyone give me any tips on diagnosing this? I can see the time of the hang and the power cycle in dmesg and syslog, but there's no hint there as to what happened. Everything was "all conditions normal" before the hang.
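In case it helps anyone reproduce the timeline, here is a minimal sketch of how the hang window could be located from a syslog file by looking for large gaps between consecutive timestamps. It is only an illustration, not a tool that ships with the image: the function name find_gaps, the 10-minute threshold, and the default year are all assumptions, and it assumes the traditional "Mon DD HH:MM:SS" syslog prefix (which carries no year) and a correct clock on both sides of the gap.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=10), year=2015):
    """Return (last_seen, next_seen) pairs where consecutive syslog
    timestamps are more than min_gap apart.

    The year is an assumption supplied by the caller, because the
    traditional syslog timestamp does not include one.
    """
    gaps = []
    prev = None
    for line in lines:
        # Traditional syslog lines start with e.g. "Jan  5 03:14:07"
        stamp = line[:15]
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None and ts - prev > min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Something like find_gaps(open("/var/log/syslog")) would then list each silent period: the first timestamp of a pair is the last message logged before the hang, and the second is the first message after the power cycle.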
Has anyone seen this behavior before?
Thanks so much - any help greatly appreciated!