eth0 mysteriously stops working


William Hermans

Oct 18, 2016, 11:41:44 PM
to BeagleBoard
A few days ago I had a board that just stopped responding over ssh. So, I checked the LEDs on my GbE switch, and the lights were not lit. Checking further, I found that the LEDs on the beaglebone were also not lit.

So I disconnected the ethernet cable on both ends and reseated it. Nothing. The ethernet did not start working again until I rebooted the board by physically pressing the reset button on the board.

I was curious if anyone else had experienced this same thing. For me, this has actually happened only once. The board this happened to was a BeagleBone Green, running from the eMMC. Currently I'm running from sdcard but . . .
william@beaglebone:~$ sudo mount /dev/mmcblk1p1 /media/rootfs/
william@beaglebone:~$ cd /media/rootfs/
william@beaglebone:/media/rootfs$ cat etc/dogtag
BeagleBoard.org Debian Image 2016-06-19
william@beaglebone:/media/rootfs$ ls boot/ |grep init
initrd.img-4.4.12-ti-r31



Grepping through the various files in /var/log shows that everything was working fine as far as I can tell; no error messages stand out for 'net' or 'eth0'. I've also talked with a person I know who has experienced this themselves, except that for them it has happened more than once, including with a 3.8.x kernel.

As for when this happened to me personally, the board had just been idling for several days (5-6 days) when I needed to get some information from it and got no response via ssh.

Graham

Oct 19, 2016, 6:24:35 AM
to BeagleBoard
I have two BBG units that I use as headless servers, with only access through Ethernet.  Both have been running without reboot for multiple months without any issues.  I think that I mentioned that I did have a BBB do exactly what you describe, while running as a headless server last year, but at the time there was a thunderstorm in the area, and lightning strikes in the neighborhood. It recovered on reboot, and has never repeated the symptom.

So, my conclusion is that it can happen, but it is rare, and in my case it was probably caused by an electrical transient coming in over the Ethernet connection, which is routed from a cable modem to the outside world.

For high-reliability applications, perhaps add some extra transient protection on the Ethernet connection, and some kind of "ping monitor" that can auto-reboot the BBG.

--- Graham
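
A "ping monitor" of the kind suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the PING_CMD and REBOOT_CMD override variables, and the default limit and interval are all made up for this example, and the thread does not establish whether a software reboot (as opposed to pressing the reset button) is enough to bring the link back.

```shell
#!/bin/sh
# Sketch of a ping watchdog. All names and defaults here are illustrative
# assumptions, not something taken from the thread.

# Succeeds if the target answers a single ping within 5 seconds.
# PING_CMD is overridable so the logic can be dry-run without a network.
link_alive() {
    ${PING_CMD:-ping} -c 1 -W 5 "$1" >/dev/null 2>&1
}

# Prints the updated count of consecutive failures:
# 0 after a successful ping, previous count + 1 after a failed one.
# $1 = previous count, $2 = target host
next_failures() {
    if link_alive "$2"; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# Pings TARGET every INTERVAL seconds and runs REBOOT_CMD once LIMIT
# consecutive pings have failed.
# $1 = target, $2 = limit (default 5), $3 = interval in seconds (default 60)
watch_link() {
    target=$1
    limit=${2:-5}
    interval=${3:-60}
    failures=0
    while :; do
        failures=$(next_failures "$failures" "$target")
        if [ "$failures" -ge "$limit" ]; then
            echo "ping watchdog: $target unreachable $failures times in a row" >&2
            ${REBOOT_CMD:-/sbin/reboot}
            return
        fi
        sleep "$interval"
    done
}
```

To use it, something like `watch_link 192.168.1.1 5 60` would be run as root at boot (the address is a placeholder for a host that should always answer, such as the local gateway). Requiring several consecutive failures avoids rebooting on a single dropped ping.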

==

evilwulfie

Oct 19, 2016, 10:48:05 AM
to beagl...@googlegroups.com
What version of the OS and kernel are you using?
--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/83cbe00b-086c-4f61-bc44-015fdb5aef05%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


William Hermans

Oct 19, 2016, 1:56:18 PM
to beagl...@googlegroups.com

On Wed, Oct 19, 2016 at 3:24 AM, Graham <gra...@flex-radio.com> wrote:
> I have two BBG units that I use as headless servers, with only access through Ethernet.  Both have been running without reboot for multiple months without any issues.  I think that I mentioned that I did have a BBB do exactly what you describe, while running as a headless server last year, but at the time there was a thunderstorm in the area, and lightning strikes in the neighborhood. It recovered on reboot, and has never repeated the symptom.
>
> So, my conclusion is that it is possible to happen, but rare, and in my case was probably caused by electrical transient coming in the Ethernet connection which is routed from a cable modem to the outside world.
>
> For high reliability application, perhaps some extra transient protection on the Ethernet connection, and some kind of "ping monitor" that can auto-reboot the BBG.
>
> --- Graham

I haven't had a BBG to play with until the last 2-3 months. Now, I've had ~30 over the course of the last 2 months to observe this behavior on, and again it has only happened once. So, I attributed what happened to me accidentally knocking the board around a little, until I talked with another person I know who has experienced this issue with multiple kernels, and multiple times over the last I don't know . . . maybe 6 months.

So what I did was first install the same Debian image he was using, then change kernels to the *bone* LTS kernel. I removed g_ether by removing Robert's custom boot script for the 335x evm board. After that I got the project files from this person I know and duplicated his software setup, which is an mqtt application with a custom cape.

Anyway, I was running this software last night, and then I downloaded and ran nload from an ssh session. But I keep getting ssh "Broken pipe" errors, which is not necessarily a concern; I've seen that before. I intend to hook up a serial debug cable and run nload from that, but just have not gotten around to it.

One thing on my mind is that perhaps the software this person I know wrote is somehow failing to deal with a "busy network" properly. Meaning, if the internet connection is bandwidth saturated, and the application is for some reason unable to deal with a "stale connection", how will it act? However, I would not think this should cause the hardware to fail, because that's what I'm seeing when the ethernet traffic indication LEDs stop functioning and the ethernet connection becomes non-functional. What I was able to observe so far, however, was that this application sends around 8-9 kbit/s of data, and gets 2-3 kbit/s back.

Another concern: mqtt by default is an inherently insecure protocol, and this app does currently run as root . . . However, there are several caveats to this concern. First, the application is a peer to peer design in that only the mqtt broker can communicate with the board, whether it sends commands or collects data back from the board. Second, mqtt is able to use certificates; however, I do not think that is currently the case with this software *YET*. I've given this person I know the standard security lecture on running as root, locking things down, etc. We just have not acted on it yet.

With all of the above said, when I ran into this issue myself, I was not running anything other than a stock image and the stock software that comes with it, and the board was just idling for 5-6 days, maybe a little longer. I ran uptime from an ssh session, where it reported back "5 days . . .", after which this happened. So I'm more inclined to think this is most likely not a userspace application issue.

I'm not even sure where to go from here, as far as tracking this issue down. All I can really do is throw everything I know / have at the board, and hope I get an error trapped from the live kernel log through serial.
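
One low-effort way to catch the moment the link drops, alongside the kernel log on serial, might be a small logger that records eth0 link-state changes with timestamps. The sketch below is an assumption-laden illustration: the function names and the SYSFS_NET override are invented for this example, and it relies only on the standard Linux sysfs files operstate and carrier under /sys/class/net/<interface>/.

```shell
#!/bin/sh
# Sketch of a link-state logger. Function names and the SYSFS_NET override
# are illustrative assumptions.

# Prints "<operstate> <carrier>" for an interface, for example "up 1".
# Either field is reported as "unknown" if the sysfs file cannot be read
# (reading carrier fails while the interface is administratively down).
link_state() {
    dir="${SYSFS_NET:-/sys/class/net}/$1"
    oper=$(cat "$dir/operstate" 2>/dev/null || echo unknown)
    carrier=$(cat "$dir/carrier" 2>/dev/null || echo unknown)
    echo "$oper $carrier"
}

# Polls the interface every INTERVAL seconds and prints a timestamped line
# whenever the state differs from the previous poll.
# $1 = interface, $2 = interval in seconds (default 10)
log_link_changes() {
    iface=$1
    interval=${2:-10}
    last=""
    while :; do
        now=$(link_state "$iface")
        if [ "$now" != "$last" ]; then
            echo "$(date '+%F %T') $iface: $now"
            last=$now
        fi
        sleep "$interval"
    done
}
```

Running `log_link_changes eth0 10 >> /var/log/eth0-link.log &` (the log path is a placeholder) leaves a record on local storage that survives the loss of ssh access and can be lined up against the kernel log afterwards.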

Robert Nelson

Oct 19, 2016, 2:05:12 PM
to Beagle Board
I think it's related to suspend/cpuidle.. I know another user was
having issues, where they had to ping it twice, as the first would
never respond..

one thing that might help: remove the sleep pinmux's from: mac/davinci_mdio:

https://github.com/RobertCNelson/dtb-rebuilder/blob/4.4-ti/src/arm/am335x-bone-common.dtsi#L370-L383

Regards,

--
Robert Nelson
https://rcn-ee.com/

William Hermans

Oct 19, 2016, 2:10:00 PM
to beagl...@googlegroups.com
Thanks Robert,

I'll check that out. So when you say "remove the sleeps", do I just delete "sleep" from pinctrl-names = "default", "sleep"; or do I need to also remove pinctrl-1 = <&cpsw_sleep>; as well?


William Hermans

Oct 19, 2016, 2:11:21 PM
to beagl...@googlegroups.com
I would think both, but honestly don't know . . .

Robert Nelson

Oct 19, 2016, 2:15:47 PM
to Beagle Board
On Wed, Oct 19, 2016 at 1:09 PM, William Hermans <yyr...@gmail.com> wrote:
> Thanks Robert,
>
> I'll check that out, So when you sasy "remove the sleeps". I just delete
> "sleep" from pinctrl-names = "default", "sleep"; or do I need to also remove
> pinctrl-1 = <&cpsw_sleep>; as well ?


Yeah, from:

&mac {
    pinctrl-names = "default", "sleep";
    pinctrl-0 = <&cpsw_default>;
    pinctrl-1 = <&cpsw_sleep>;
    slaves = <1>;
    status = "okay";
};

&davinci_mdio {
    pinctrl-names = "default", "sleep";
    pinctrl-0 = <&davinci_mdio_default>;
    pinctrl-1 = <&davinci_mdio_sleep>;
    status = "okay";
};

to:

&mac {
    pinctrl-names = "default";
    pinctrl-0 = <&cpsw_default>;
    slaves = <1>;
    status = "okay";
};

&davinci_mdio {
    pinctrl-names = "default";
    pinctrl-0 = <&davinci_mdio_default>;
    status = "okay";
};

William Hermans

Oct 19, 2016, 2:28:41 PM
to beagl...@googlegroups.com
Thanks again Robert,

So I'll have to download the overlay board file repo, edit, and then install, but hummm. It's been a while. I need:

##BeagleBone Black: HDMI (Audio/Video) disabled:
dtb=am335x-boneblack-emmc-overlay.dtb

Which probably loads the common overlay file, so . . . yeah, OK, think I got it. I'm going to be busy today with other things (unavoidable), so it might be tomorrow before I can write back about success in modifying the board file. After that I can hopefully get this modification out to be tested on multiple boards by this person I know.

I'll post full instructions for others here when I get the chance, so others can test, and potentially fix the same issue if needed.


William Hermans

Oct 20, 2016, 1:15:26 AM
to beagl...@googlegroups.com
Yeah I'm locked in a boot loop ending here:

Starting kernel ...

[    3.341689] CPUidle arm: CPU 0 failed to init idle CPU ops
[    3.347892] omap_hsmmc 48060000.mmc: unable to obtain RX DMA engine channel 3706465728
[    3.356275] omap_hsmmc 481d8000.mmc: unable to obtain RX DMA engine channel 3706465648
[    3.366324] wkup_m3_rproc 44d00000.wkup_m3: Platform data missing!
[    3.374426] omap_voltage_late_init: Voltage driver support not added
[    3.381301] cpu cpu0: cpu0 clock notifier not ready, retry
[    3.482097] bone_capemgr bone_capemgr: Invalid signature 'ffffffff' at slot 0
[    3.489295] bone_capemgr bone_capemgr: slot #0: No cape found
[    3.548114] bone_capemgr bone_capemgr: slot #1: No cape found
[    3.608112] bone_capemgr bone_capemgr: slot #2: No cape found
[    3.668112] bone_capemgr bone_capemgr: slot #3: No cape found
[    3.675302] cpsw 4a100000.ethernet: Missing rx_descs property in the DT.
[    3.682080] cpsw 4a100000.ethernet: cpsw: platform data missing
Loading, please wait...

A few points to note. The kernel is the LTS 4.1.x *bone-rt* variant, updated yesterday. The board is a BeagleBone Green, but I rebuilt am335x-boneblack-emmc-overlay.dtb, which is the same overlay file I was loading previous to rebuilding.

William Hermans

Oct 20, 2016, 1:17:05 AM
to beagl...@googlegroups.com
Overlay, meaning board file.

William Hermans

Oct 20, 2016, 8:25:31 PM
to beagl...@googlegroups.com
So, at this point I think I'll have to decompile both board files, and then run diff to see what's different.

William Hermans

Oct 20, 2016, 11:27:51 PM
to beagl...@googlegroups.com
So after decompiling the two files and comparing with diff, then piping to a file . . . the diff file is literally 2186 lines in length . . . wtf ?

William Hermans

Oct 21, 2016, 12:53:23 AM
to BeagleBoard
OK, I had to modify my workflow, but I do believe I got the changes put into place. Not sure why Robert's way was not working, but I'm used to thinking outside the box, or looking at multiple ways to achieve the same results . . .

Your board file name and kernel version will depend on which board file you need to use, and which kernel you're running . . .

william@beaglebone:~/dev$ cp /boot/dtbs/4.1.34-bone-rt-r24/am335x-boneblack-emmc-overlay.dtb .


Search for "sleep"

Lines 810-814 for me, remove:

    cpsw_sleep {
        pinctrl-single,pins = <0x108 0x27 0x10c 0x27 0x110 0x27 0x114 0x27 0x118 0x27 0x11c 0x27 0x120 0x27 0x124 0x27 0x128 0x27 0x12c 0x27 0x130 0x27 0x134 0x27 0x138 0x27 0x13c 0x27 0x140 0x27>;
        linux,phandle = <0x37>;
        phandle = <0x37>;
    };

Lines 816-820, remove:

    davinci_mdio_sleep {
        pinctrl-single,pins = <0x148 0x27 0x14c 0x27>;
        linux,phandle = <0x39>;
        phandle = <0x39>;
    };


Line 1827 change:

pinctrl-names = "default", "sleep";



to:
pinctrl-names = "default";



Line 1841 change:

pinctrl-names = "default", "sleep";



to:
pinctrl-names = "default";



Line 2165 delete this whole line:
cpsw_sleep = "/ocp/l4_wkup@44c00000/scm@210000/pinmux@800/cpsw_sleep";



Line 2166 delete this whole line:

davinci_mdio_sleep = "/ocp/l4_wkup@44c00000/scm@210000/pinmux@800/davinci_mdio_sleep";



Then save, and exit the file. After that rename the old board file:
william@beaglebone:~/dev$ mv am335x-boneblack-emmc-overlay.dtb am335x-boneblack-emmc-overlay.dtb.old



Now compile the newly edited source file back into the original board file name / extension:
william@beaglebone:~/dev$ dtc -I dts -O dtb -o am335x-boneblack-emmc-overlay.dtb am335x-boneblack-emmc-overlay.dts



For convenience, since I do most of my work on an NFS share, I prefer to copy both the new dtb and the old dtb to the destination:
william@beaglebone:~/dev$ sudo cp am335x-boneblack-emmc-overlay.dtb* /boot/dtbs/4.1.34-bone-rt-r24/



Double check:
william@beaglebone:~/dev$ ls /boot/dtbs/4.1.34-bone-rt-r24/ |grep emmc
am335x-boneblack-emmc-overlay.dtb
am335x-boneblack-emmc-overlay.dtb.old


Reboot:
william@beaglebone:~/dev$ sudo reboot


Now do keep in mind that just because I'm calling out line numbers here does not mean they will be the same for you. But if you use a good text editor, you can search for "sleep", and you should only find these 6 occurrences in your decompiled source file. With that said, always double check to make sure that what you're deleting / changing is actually what needs to be changed.
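
As a quick sanity check before recompiling, the edited source can be searched for anything left over. The helper below is a hypothetical sketch (the function name is made up). It also looks for pinctrl-1 lines: in a decompiled file the labels are gone, so the pinctrl-1 = <&cpsw_sleep>; line from Robert's source-level version shows up as a bare phandle number such as pinctrl-1 = <0x37>; and a search for "sleep" will not find it. Whether leaving those lines in place matters is not settled in this thread; Robert's version removes them.

```shell
#!/bin/sh
# Sketch of a leftover check for an edited, decompiled .dts file.
# The function name is an illustrative assumption.

# Prints any remaining lines (with line numbers) that mention "sleep" or a
# pinctrl-1 state and returns 1; returns 0 if none are found.
# $1 = path to the decompiled .dts file
check_no_sleep_refs() {
    if grep -n -E 'sleep|pinctrl-1' "$1"; then
        echo "sleep-state references remain in $1" >&2
        return 1
    fi
    echo "no sleep-state references found in $1"
}
```

For example, `check_no_sleep_refs am335x-boneblack-emmc-overlay.dts` prints the matching lines if any are left, so each one can be reviewed by hand before running dtc again.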

William Hermans

Oct 21, 2016, 1:16:04 AM
to BeagleBoard
Sorry, I missed putting the decompile step in my workflow. This is the very next step after making a copy of the board file from the /boot/dtbs/<kernel version>/ directory:
dtc -I dtb -O dts -o am335x-boneblack-emmc-overlay.dts am335x-boneblack-emmc-overlay.dtb

William Hermans

Oct 21, 2016, 12:53:18 PM
to beagl...@googlegroups.com
So, I'm still getting "Write failed: broken pipe" using ssh from my Debian support system to the beaglebone. This is not a timeout issue, since in the ssh session I'm running nload, which constantly displays eth0 bandwidth usage.
