Bug#983818: linux-image-5.10.0-3-arm64: often fails to bring up eth0 / dwmac_rk module


Salvatore Bonaccorso

Sep 26, 2021, 3:00:03 AM
Control: tags -1 + moreinfo

On Mon, Mar 01, 2021 at 03:33:27PM -0800, Forest wrote:
> Package: src:linux
> Version: 5.10.13-1
> Severity: critical
> Justification: breaks unrelated software
>
> Dear Maintainer,
>
> When booting recent kernels on a RockPro64 board (rk3399), eth0 often fails
> to come up, leaving this headless box practically unusable without serial
> console intervention. Logging in on the console and using rmmod/modprobe to
> reload dwmac_rk revives the network interface and allows normal operation
> until the next reboot.
>
> Logs are included below, but the last relevant dmesg errors seem to be:
> rk_gmac-dwmac fe300000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
> rk_gmac-dwmac fe300000.ethernet eth0: stmmac_open: Hw setup failed
>
> The problem doesn't occur on every boot. I haven't determined what conditions
> make it more likely. It's possible that scheduling an fsck makes it happen
> less frequently (perhaps there's a timing issue?) but it's hard to say; it
> still happens fairly often even when I fsck on every boot.
>
> The problem appeared some time in the past two or three months. I keep up
> with unstable kernel updates, so maybe a semi-recent change in the unstable
> kernel caused it? Or maybe I was just lucky until recently.
>
> Curiously, when dropbear ssh launches from initramfs, it never has trouble
> with eth0. The problem doesn't show up until after dropbear has run and I
> have unlocked my root filesystem over ssh and boot continues. I am using an
> initramfs-tools tweak to make dropbear work around #968519, so I suppose that
> bug and the present problem could be related but merely avoided by dropbear.
> However, #968519 was present long before the present problem appeared.
>
> Here's someone else experiencing this problem:
> https://forum.pine64.org/showthread.php?tid=9351&pid=87304#pid87304

Could you try with the current kernel in unstable? We are at 5.14.6-2,
which had some rk3399-related changes. If you can still reproduce the
issue, my best guess would be to report it upstream, presumably by
contacting

Giuseppe Cavallaro <peppe.c...@st.com> (supporter:STMMAC ETHERNET DRIVER)
Alexandre Torgue <alexandr...@foss.st.com> (supporter:STMMAC ETHERNET DRIVER)
Jose Abreu <joa...@synopsys.com> (supporter:STMMAC ETHERNET DRIVER)
"David S. Miller" <da...@davemloft.net> (maintainer:NETWORKING DRIVERS)
Jakub Kicinski <ku...@kernel.org> (maintainer:NETWORKING DRIVERS)
Maxime Coquelin <mcoquel...@gmail.com> (maintainer:ARM/STM32 ARCHITECTURE)
Philipp Zabel <p.z...@pengutronix.de> (maintainer:RESET CONTROLLER FRAMEWORK)
Liam Girdwood <lgir...@gmail.com> (supporter:VOLTAGE AND CURRENT REGULATOR FRAMEWORK)
Mark Brown <bro...@kernel.org> (supporter:VOLTAGE AND CURRENT REGULATOR FRAMEWORK)
net...@vger.kernel.org (open list:STMMAC ETHERNET DRIVER)
linux...@st-md-mailman.stormreply.com (moderated list:ARM/STM32 ARCHITECTURE)
linux-ar...@lists.infradead.org (moderated list:ARM/STM32 ARCHITECTURE)
linux-...@vger.kernel.org (open list)

(and keeping us downstream in the loop).

Regards,
Salvatore

Forest

Sep 26, 2021, 9:00:02 PM
Control: tags -1 - moreinfo

>Could you try with the current kernel in unstable?
>We are at 5.14.6-2, which had some rk3399 related changes.

Did any of those changes arrive after 5.14.0-1? If so, I suppose I would
have to wait for a newer debian kernel to appear before I could test it.

With 5.14.0-1 (the version in unstable), the results are worse:

Dropbear no longer works. Error message:
/scripts/init-premount/dropbear: .: line 333: can't open '/run/net-*.conf': No such file or directory

Using a serial console for LUKS unlock and then running rmmod dwmac_rk /
modprobe dwmac_rk no longer brings up eth0.

The dmesg output has changed a bit:
$ egrep 'mac|eth0' dmesg.linux-image-5.14.0-1-arm64
[ 5.708873] rk_gmac-dwmac fe300000.ethernet: IRQ eth_wake_irq not found
[ 5.709470] rk_gmac-dwmac fe300000.ethernet: IRQ eth_lpi not found
[ 5.710965] rk_gmac-dwmac fe300000.ethernet: PTP uses main clock
[ 5.712133] rk_gmac-dwmac fe300000.ethernet: clock input or output? (input).
[ 5.713263] rk_gmac-dwmac fe300000.ethernet: TX delay(0x28).
[ 5.714418] rk_gmac-dwmac fe300000.ethernet: RX delay(0x11).
[ 5.716512] rk_gmac-dwmac fe300000.ethernet: integrated PHY? (no).
[ 5.717492] rk_gmac-dwmac fe300000.ethernet: cannot get clock clk_mac_speed
[ 5.719275] rk_gmac-dwmac fe300000.ethernet: clock input from PHY
[ 5.724825] rk_gmac-dwmac fe300000.ethernet: init for RGMII
[ 5.725658] rk_gmac-dwmac fe300000.ethernet: User ID: 0x10, Synopsys ID: 0x35
[ 5.726328] rk_gmac-dwmac fe300000.ethernet: DWMAC1000
[ 5.726802] rk_gmac-dwmac fe300000.ethernet: DMA HW capability register supported
[ 5.727511] rk_gmac-dwmac fe300000.ethernet: RX Checksum Offload Engine supported
[ 5.728183] rk_gmac-dwmac fe300000.ethernet: COE Type 2
[ 5.728652] rk_gmac-dwmac fe300000.ethernet: TX Checksum insertion supported
[ 5.729275] rk_gmac-dwmac fe300000.ethernet: Wake-Up On Lan supported
[ 5.731134] rk_gmac-dwmac fe300000.ethernet: Normal descriptors
[ 5.731674] rk_gmac-dwmac fe300000.ethernet: Ring mode enabled
[ 5.732192] rk_gmac-dwmac fe300000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[ 5.851291] libphy: stmmac: probed
[ 5.851612] RTL8211F Gigabit Ethernet stmmac-0:00: attached PHY driver (mii_bus:phy_addr=stmmac-0:00, irq=POLL)
[ 5.852504] RTL8211F Gigabit Ethernet stmmac-0:01: attached PHY driver (mii_bus:phy_addr=stmmac-0:01, irq=POLL)
[ 6.639085] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 6.641458] rk_gmac-dwmac fe300000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 6.653320] rk_gmac-dwmac fe300000.ethernet eth0: No Safety Features support found
[ 6.653364] rk_gmac-dwmac fe300000.ethernet eth0: PTP not supported by HW
[ 6.654183] rk_gmac-dwmac fe300000.ethernet eth0: configuring for phy/rgmii link mode
[ 9.760371] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 9.760429] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 199.685064] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
[ 207.195113] rk_gmac-dwmac fe300000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 207.197673] rk_gmac-dwmac fe300000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 207.206976] rk_gmac-dwmac fe300000.ethernet eth0: No Safety Features support found
[ 207.207681] rk_gmac-dwmac fe300000.ethernet eth0: PTP not supported by HW
[ 207.208307] rk_gmac-dwmac fe300000.ethernet eth0: configuring for phy/rgmii link mode
[ 210.304576] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 210.305423] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
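
To see only the link state changes in a saved log like the one above, a small filter can help. This is a minimal sketch, not part of the original report: the function name link_events is hypothetical, and it assumes the log lines keep the "eth0: Link is Up/Down" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: print only the eth0 link transitions
# from a kernel log read on standard input.
link_events() {
    grep -E 'eth0: Link is (Up|Down)'
}

# Example (file name taken from the egrep command above):
# link_events < dmesg.linux-image-5.14.0-1-arm64
```

Run against the log above, this should leave the "Link is Up" and "Link is Down" lines with their timestamps, which makes the gap between the link going down and the driver being re-initialized easier to read.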

Diederik de Haas

Jul 29, 2022, 6:30:03 PM
Control: tag -1 moreinfo

On Mon, 01 Mar 2021 15:33:27 -0800 Forest <fore...@sonic.net> wrote:
> Package: src:linux
> Version: 5.10.13-1
>
> When booting recent kernels on a RockPro64 board (rk3399), eth0 often fails
> to come up, leaving this headless box practically unusable without serial
> console intervention. Logging in on the console and using rmmod/modprobe to
> reload dwmac_rk revives the network interface and allows normal operation
> until the next reboot.

Is this problem still present with a recent 5.10 or (better yet) the 5.18.14
kernel from Unstable?

Forest

Jul 30, 2022, 8:10:03 PM
Control: found -1 5.10.127-2
Control: notfound -1 5.18.14-1
Control: tags -1 - moreinfo

On Sat, 30 Jul 2022 00:19:25 +0200, Diederik de Haas wrote:

>Is this problem still present with a recent 5.10 or (better yet) the 5.18.14
>kernel from Unstable?

It is still present in recent 5.10 kernels.

5.18.14-1 from unstable hasn't shown the failure in about a dozen boots.
That's encouraging. I haven't done a bisect, but some relatively recent
commits (e.g. aec3f415) mention dwmac-rk. Perhaps one of those fixed it?

Diederik de Haas

Jul 31, 2022, 6:40:04 AM
On Sunday, 31 July 2022 01:51:08 CEST Forest wrote:
> >Is this problem still present with a recent 5.10 or (better yet) the
> >5.18.14 kernel from Unstable?
>
> It is still present in recent 5.10 kernels.
>
> 5.18.14-1 from unstable hasn't shown the failure in about a dozen boots.
> That's encouraging. I haven't done a bisect, but some relatively recent
> commits (e.g. aec3f415) mention dwmac-rk. Perhaps one of those fixed it?

That's certainly encouraging and the commit message makes it appear quite
relevant indeed.
From the partial logs you shared it appeared that your network also went down
after (quite) some time, which is consistent with the commit message.

What's odd then is that that commit has been applied/backported to the 5.10
kernel under commit 97653ba562b9b28e30a3fcff42531e05a434d58c which is part of
5.10.82, so also 5.10.127-2 ...
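
The "part of 5.10.82, so also 5.10.127-2" reasoning is an upstream version comparison, which can be sketched as below. This is only an illustration: version_ge is a hypothetical helper, it relies on the -V (version sort) option of GNU sort, and the Debian revision suffix (-2) is dropped before comparing. The commented git command assumes a local clone of the stable kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is the same as or newer than $2.
# Relies on sort -V (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 5.10.127 is newer than 5.10.82, so it should contain the backport.
if version_ge 5.10.127 5.10.82; then
    echo "5.10.127 includes changes released in 5.10.82"
fi

# In a clone of the stable tree (an assumption), the first tag that
# contains the backport can be looked up with:
# git describe --contains 97653ba562b9b28e30a3fcff42531e05a434d58c
```

The comparison only shows that the backport should be present in the package; it says nothing about whether the backport actually fixes the failure on 5.10.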

Forest

Jul 31, 2022, 7:00:03 PM
On Sun, 31 Jul 2022 12:30:42 +0200, Diederik de Haas wrote:

>From the partial logs you shared it appeared that your network also went down
>after (quite) some time,

If you're referring to my 5.14.0-1 kernel log, I can't offer any insight, as
I only tried that kernel briefly, nearly a year ago.

If you mean the 5.10 kernel, let me clarify:

1. 5.10 reliably brings up eth0 early enough for dropbear sshd to work.
2. I ssh to dropbear, enter the LUKS passphrase, and the root filesystem is
unlocked.
3. When eth0 fails, it's always shortly after that, still during system
startup.
4. Once I notice, attach a serial terminal, and reload the kernel module,
eth0 comes up and stays up. It doesn't go down again later.
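
The manual recovery in step 4 could be scripted roughly as follows. This is a sketch under stated assumptions, not something tested on the affected board: the function names and the DRY_RUN variable are hypothetical, the failure string is the one quoted in the original report, and the real reload needs root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel log on standard input
# contains the DMA initialization failure quoted in the report.
has_dma_init_failure() {
    grep -q 'stmmac_hw_setup: DMA engine initialization failed'
}

# Hypothetical helper: reload the driver as done by hand over the serial
# console. With DRY_RUN=1 it only prints the commands it would run.
reload_dwmac_rk() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rmmod dwmac_rk"
        echo "modprobe dwmac_rk"
    else
        rmmod dwmac_rk && modprobe dwmac_rk
    fi
}

# Example (requires root): reload only when the failure was logged.
# dmesg | has_dma_init_failure && reload_dwmac_rk
```

Keeping the check separate from the reload lets the check be tried against a saved log first, and DRY_RUN=1 shows the commands without touching the interface.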

I find it curious that eth0 comes up reliably and then *sometimes* goes down
shortly afterward. I don't know if it's completely random or something
later in the startup process is triggering it.

Obviously, a delay between eth0 first coming up and when it goes down could
be partly from the time it takes me to ssh and type a LUKS passphrase.

>What's odd then is that that commit has been applied/backported to the 5.10
>kernel under commit 97653ba562b9b28e30a3fcff42531e05a434d58c which is part of
>5.10.82, so also 5.10.127-2 ...

Ah, I didn't notice that patch having been backported with a different
commit ID. Thanks for mentioning it.