
Bug#1060706: linux-image-6.1.0-17-amd64: intel i225 NIC loses PCIe link, network becomes unusable


Diederik de Haas

Jan 13, 2024, 7:00:04 AM
On Saturday, 13 January 2024 11:45:29 CET Arno Lehmann wrote:
> Hardware name: ASUS System Product Name/ROG STRIX X670E-A GAMING WIFI,
> BIOS 1410 04/28/2023

Possibly not related, but there's BIOS 1807 available.

Salvatore Bonaccorso

Jan 13, 2024, 8:00:05 AM
Control: tags -1 + moreinfo

On Sat, Jan 13, 2024 at 11:45:29AM +0100, Arno Lehmann wrote:
> Package: src:linux
> Version: 6.1.69-1
> Severity: normal
> Tags: upstream
>
> Dear Maintainer,
>
>
> After the computer has been running for a while, the network loses connection
> because the NIC detaches from PCIe. I suspect this is related to power
> management but am not really sure.
>
> As this seemed to be a known problem, I added pcie_aspm=off to the kernel
> command line.
>
> The problem happens more or less randomly; the computer usually runs 24/7:
>
> # journalctl --grep 'PCIe link lost' --quiet | cat
> Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
> Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
> Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
>
>
> This is what I find in the kernel or system log:
>
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 13 11:16:31 Zwerg kernel: ------------[ cut here ]------------
> Jan 13 11:16:31 Zwerg kernel: igc: Failed to read reg 0xc030!
> Jan 13 11:16:31 Zwerg kernel: WARNING: CPU: 18 PID: 6389 at drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Modules linked in: rfcomm cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl rpcs>
> Jan 13 11:16:31 Zwerg kernel: configfs efivarfs ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic dm_crypt dm_mod hid_generic amdgpu crc32_pc>
> Jan 13 11:16:31 Zwerg kernel: CPU: 18 PID: 6389 Comm: kworker/18:1 Not tainted 6.1.0-17-amd64 #1 Debian 6.1.69-1
> Jan 13 11:16:31 Zwerg kernel: Hardware name: ASUS System Product Name/ROG STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
> Jan 13 11:16:31 Zwerg kernel: Workqueue: events igc_watchdog_task [igc]
> Jan 13 11:16:31 Zwerg kernel: RIP: 0010:igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Code: 48 c7 c6 d0 55 56 c0 e8 0b 7d 6c f8 48 8b bd 28 ff ff ff e8 31 c7 23 f8 84 c0 74 b4 89 de 48 c7 c7 f8 55 56 c0 e>
> Jan 13 11:16:31 Zwerg kernel: RSP: 0018:ffffac56d5f13df0 EFLAGS: 00010286
> Jan 13 11:16:31 Zwerg kernel: RAX: 0000000000000000 RBX: 000000000000c030 RCX: 0000000000000027
> Jan 13 11:16:31 Zwerg kernel: RDX: ffffa046f85a03a8 RSI: 0000000000000001 RDI: ffffa046f85a03a0
> Jan 13 11:16:31 Zwerg kernel: RBP: ffffa03f45710c28 R08: 0000000000000000 R09: ffffac56d5f13c68
> Jan 13 11:16:31 Zwerg kernel: R10: 0000000000000003 R11: ffffa04717f7ffe8 R12: ffffa03f45710000
> Jan 13 11:16:31 Zwerg kernel: R13: 0000000000000000 R14: ffffa03f456efd40 R15: 000000000000c030
> Jan 13 11:16:31 Zwerg kernel: FS: 0000000000000000(0000) GS:ffffa046f8580000(0000) knlGS:0000000000000000
> Jan 13 11:16:31 Zwerg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 13 11:16:31 Zwerg kernel: CR2: 00007f1fc894f000 CR3: 00000008a8538000 CR4: 0000000000750ee0
> Jan 13 11:16:31 Zwerg kernel: PKRU: 55555554
> Jan 13 11:16:31 Zwerg kernel: Call Trace:
> Jan 13 11:16:31 Zwerg kernel: <TASK>
>
>
> Obviously, the kernel parameter to disable PCIe power management did not solve this problem.
>
> The way to recover is to restart the computer.
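The timestamps in the `journalctl --grep` output above make "more or less randomly" easy to quantify. Below is a minimal Python sketch (not part of the original report) that parses that excerpt and computes the gap in days between consecutive "PCIe link lost" events. It assumes the lines are in chronological order, that the month abbreviations come from a German locale (`Okt`, `Dez`), and that the first event falls in 2023 (inferred from the last event matching the 13 Jan 2024 report); `parse_events`, `gaps_in_days` and `EXCERPT` are hypothetical names chosen for illustration.

```python
from datetime import datetime

# Month abbreviations as journalctl prints them in a German (de_DE) locale,
# plus the English ones, mapped to month numbers.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mär": 3, "Mar": 3, "Apr": 4, "Mai": 5, "May": 5,
    "Jun": 6, "Jul": 7, "Aug": 8, "Sep": 9, "Okt": 10, "Oct": 10,
    "Nov": 11, "Dez": 12, "Dec": 12,
}

# The journalctl --grep 'PCIe link lost' output quoted in the bug report.
EXCERPT = """\
Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
"""


def parse_events(lines, start_year):
    """Return a datetime for every 'PCIe link lost' line.

    Syslog-style timestamps carry no year, so we assume the lines are in
    chronological order and bump the year whenever the month goes backwards.
    """
    events = []
    year, prev_month = start_year, None
    for line in lines:
        if "PCIe link lost" not in line:
            continue
        month_abbr, day, clock = line.split()[:3]
        month = MONTHS[month_abbr]
        if prev_month is not None and month < prev_month:
            year += 1  # e.g. "Dez" followed by "Jan"
        prev_month = month
        hour, minute, second = (int(part) for part in clock.split(":"))
        events.append(datetime(year, month, int(day), hour, minute, second))
    return events


def gaps_in_days(events):
    """Days between consecutive events, rounded to one decimal place."""
    return [round((b - a).total_seconds() / 86400, 1) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    # start_year=2023 is an assumption, not something the log states.
    print(gaps_in_days(parse_events(EXCERPT.splitlines(), start_year=2023)))
```

On the twelve events above, the gaps range from under a day to almost four weeks, which fits the report that the failure follows no obvious schedule.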

Just to be clear, can you confirm this is or is not a regression from
a previous running 6.1.y kernel? I'm asking because I suspect that
this is similar to
https://lore.kernel.org/intel-wired-lan/20221031170...@kernel.org/
and has never worked reliably with your hardware?

Regards,
Salvatore

Diederik de Haas

Jan 13, 2024, 11:20:05 AM
On Saturday, 13 January 2024 16:39:51 CET Arno Lehmann wrote:
> > Just to be clear, can you confirm this is or is not a regression from
> > a previous running 6.1.y kernel?
>
> On this hardware, the network issues appeared right from the start.
> ...
> Actually I don't even know which was the first kernel version I had on
> this host, but it's been on Bookworm for all its lifetime.

Via https://snapshot.debian.org/package/linux-signed-amd64/ you have easy
access to previous (6.1) kernels uploaded to Debian with which you can check
if the problem was present in early 6.1 kernels.
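Trying older uploads one at a time amounts to a manual bisection. A minimal sketch of that search, assuming the candidate versions are listed oldest to newest and that once the bug appears it stays in every later upload; `first_bad` and its `is_bad` callback are hypothetical, and in practice `is_bad` would mean "boot this kernel and watch `journalctl` for 'PCIe link lost'". Because the failures in this report are days to weeks apart, a "good" verdict needs a long test run and is never conclusive.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest version for which is_bad() is True, or None if none is bad.

    Assumes `versions` is sorted oldest-to-newest and that the bug, once
    introduced, is present in every later version (the usual bisection premise).
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the first bad version is at mid or earlier
        else:
            lo = mid + 1  # the first bad version comes after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Versions mentioned in this thread; a real bisection would include every upload in between.
    candidates = ["6.1.4-1", "6.1.38-4", "6.1.69-1"]
    # Hypothetical oracle: 6.1.38-4 and 6.1.69-1 are reported bad; treating
    # 6.1.4-1 as good is purely for illustration.
    observed_bad = {"6.1.38-4", "6.1.69-1"}
    print(first_bad(candidates, lambda v: v in observed_bad))  # prints 6.1.38-4
```

Each step halves the remaining range, so even a few dozen snapshot uploads need only a handful of (long) test runs.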

Diederik de Haas

Jan 13, 2024, 5:50:03 PM
On Saturday, 13 January 2024 20:22:39 CET Arno Lehmann wrote:
> On 13.01.2024 at 17:13, Diederik de Haas wrote:
> > Via https://snapshot.debian.org/package/linux-signed-amd64/ you have easy
> > access to previous (6.1) kernels uploaded to Debian with which you can
> > check if the problem was present in early 6.1 kernels.
>
> The oldest recorded occurrence of this issue happened with Linux version
> 6.1.0-11-amd64.
>
> As I usually keep this box updated, and the problem happens only
> randomly, I think the best way forward might be to try with a kernel
> that did *not* show this problem.
>
> Does that look reasonable?

Yes

> So I conclude I should look at something earlier than what was used with
> boot 86e1a04baba04a409c34796c0fb079ff, i.e.
>
> journalctl --boot 86e1a04baba04a409c34796c0fb079ff | head -n 1
> Aug 30 18:16:18 Zwerg kernel: Linux version 6.1.0-11-amd64
> (debian...@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU
> ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian
> 6.1.38-4 (2023-08-08)
>
> correct?
>
> Via the page you reference, I find a kernel package
> linux-image-6.1.0-1-amd64 6.1.4-1 which might be worth a try.
>
> I'll need some time to sort out how to install such a package...
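The banner printed by `journalctl --boot <boot-id> | head -n 1` carries both identifiers needed to find a package on snapshot.debian.org: the kernel release (`6.1.0-11-amd64`, which is part of the `linux-image-…` package name) and the Debian package version (`6.1.38-4`). A small Python sketch that extracts them with a regular expression; `BANNER_RE`, `parse_banner` and `EXAMPLE_BANNER` are hypothetical names, and the pattern has only been checked against the Debian banner format quoted here.

```python
import re
from typing import Optional, Tuple

# Matches Debian kernel banners of the form
#   "Linux version <release> (...) #<n> ... Debian <version> (<YYYY-MM-DD>)"
BANNER_RE = re.compile(
    r"Linux version (?P<release>\S+) .*?#\d+ .*?Debian (?P<version>\S+) "
    r"\((?P<date>\d{4}-\d{2}-\d{2})\)"
)

# The banner from the thread, joined back into the single line journalctl prints.
EXAMPLE_BANNER = (
    "Aug 30 18:16:18 Zwerg kernel: Linux version 6.1.0-11-amd64 "
    "(debian...@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU "
    "ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian "
    "6.1.38-4 (2023-08-08)"
)


def parse_banner(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (release, debian_version, build_date), or None if the line does not match."""
    m = BANNER_RE.search(line)
    if m is None:
        return None
    return m.group("release"), m.group("version"), m.group("date")


if __name__ == "__main__":
    print(parse_banner(EXAMPLE_BANNER))  # ('6.1.0-11-amd64', '6.1.38-4', '2023-08-08')
```

Anchoring on `#<n>` before looking for `Debian <version>` skips the earlier "(Debian 12.2.0-14)" compiler version in the same line.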

https://snapshot.debian.org/package/linux-signed-amd64/6.1.4%2B1/#linux-image-6.1.0-1-amd64_6.1.4-1

It should be as simple as downloading that .deb file and installing it via
``dpkg -i <deb-file>`` or
``apt install ./<deb-file>``

If you also have custom kernel modules via dkms, then you'd also need the
corresponding linux-headers package.
https://snapshot.debian.org/package/linux/6.1.4-1/#linux-headers-6.1.0-1-amd64_6.1.4-1

You could also try version 6.1~rc3+1~exp1, but if it's present in 6.1.4-1,
then I guess it's safe to say the issue is present in the whole 6.1 series
and it probably has never worked (as Salvatore thought).
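Debian version strings do not sort like plain text: `~` sorts before anything, even the end of the string, which is why `6.1~rc3+1~exp1` (a release candidate in experimental) counts as older than `6.1.4+1`. A sketch of the comparison, written from my understanding of dpkg's algorithm (epoch, then upstream version, then Debian revision, each compared as alternating non-digit and digit runs); `dpkg --compare-versions` remains the authoritative check, and `compare_versions` and `version_key` are hypothetical names.

```python
from functools import cmp_to_key


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run ("" = end of string)."""
    if c == "" or c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)  # letters sort before other symbols
    if c == "~":
        return -1  # tilde sorts before everything, even the end of the string
    return ord(c) + 256


def _compare_fragment(a: str, b: str) -> int:
    """Compare two upstream versions (or two revisions) run by run."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character by character using _order().
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically; a missing run counts as 0.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or 0), int(b[sj:j] or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Negative, zero or positive as v1 is older than, equal to or newer than v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_fragment(u1, u2) or _compare_fragment(r1, r2)


version_key = cmp_to_key(compare_versions)

if __name__ == "__main__":
    uploads = ["6.1.69-1", "6.1.4-1", "6.7-1~exp1", "6.1.38-4", "6.6.11-1"]
    # prints ['6.1.4-1', '6.1.38-4', '6.1.69-1', '6.6.11-1', '6.7-1~exp1']
    print(sorted(uploads, key=version_key))
```

Sorting candidate uploads with `version_key` before picking the next kernel to test avoids the trap of string ordering, where `6.1.38-4` would land before `6.1.4-1`.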

Salvatore Bonaccorso

Jan 18, 2024, 4:20:03 PM
Hi,

On Sat, Jan 13, 2024 at 04:39:51PM +0100, Arno Lehmann wrote:
> Hi Salvatore,
>
> On 13.01.2024 at 13:47, Salvatore Bonaccorso wrote:
>
> > Just to be clear, can you confirm this is or is not a regression from
> > a previous running 6.1.y kernel?
>
> On this hardware, the network issues appeared right from the start.
>
> The first time I encountered it was with the Debian installation some time last
> year, and that's where my research led me to turn off PCIe power management.
>
> Actually I don't even know which was the first kernel version I had on this
> host, but it's been on Bookworm for all its lifetime.

This "feels" like it's probably not really a regression, hence the
similarity (though it is not the identical case as in the referenced thread).

What about newer kernels? Do 6.6.11-1 or 6.7-1~exp1 taken from
unstable (resp. experimental) show the same problem?

If yes, then it might be an idea to bring it upstream.

Regards,
Salvatore

Craig Holyoak

Feb 2, 2024, 6:10:06 PM
FWIW I'm having the same problems. Granted, this NIC is in a
Thunderbolt dock, so one can't exclude this as a factor, but the
errors are identical. This is the case even when running the 6.7.1-1~exp1
kernel from experimental.