
Bookworm soft lockup

Christian Gelinek

May 14, 2023, 9:40:06 PM
Hi,

I found my Debian system frozen this morning. This is the second time
this has happened; the first was on April 10, with very similar
symptoms: the PC was still running, but moving the mouse or typing
didn't wake up my screens and I couldn't connect to it via SSH.

After force-rebooting, I had a look at journalctl and these are the
messages before the reboot:

May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED"
operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd"
capability=12 capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32):
apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912
comm="cupsd" capability=12 capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make
remote CUPS printers available locally.
May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log
files.
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened
for user root(uid=0) by (uid=0)
May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report
/etc/cron.hourly)
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed
for user root
May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:16 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:17 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT:
Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck
for 26s! [kworker/15:1:233]
May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc
binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
May 14 00:54:26 gar kernel: intel_uncore ee1004 pcspkr watchdog snd
soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core
acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop
efi_pstore configfs efivarfs ip_tables x_tables autof>
May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted:
G U W 6.1.0-8-amd64 #1 Debian 6.1.25-1
May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co.,
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf
4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8
ea ff ff ff
May 14 00:54:26 gar kernel: RSP: 0018:ffffa9d000947cc0 EFLAGS: 00000286
May 14 00:54:26 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000
RCX: 0000000000000ffc
May 14 00:54:26 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004
RDI: 0000000000000000
May 14 00:54:26 gar kernel: RBP: ffffa9d000947cfc R08: 0000000000000004
R09: ffffa9d000947cfc
May 14 00:54:26 gar kernel: R10: 0000000000000004 R11: ffffffffbb7a6b80
R12: 0000000000000ffc
May 14 00:54:26 gar kernel: R13: 0000000000000000 R14: 0000000000000004
R15: 0000000000000000
May 14 00:54:26 gar kernel: FS: 0000000000000000(0000)
GS:ffff967f1fbc0000(0000) knlGS:0000000000000000
May 14 00:54:26 gar kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 14 00:54:26 gar kernel: CR2: 000055ba02054018 CR3: 0000000109b4c004
CR4: 0000000000770ee0
May 14 00:54:26 gar kernel: PKRU: 55555554
May 14 00:54:26 gar kernel: Call Trace:
May 14 00:54:26 gar kernel: <TASK>
May 14 00:54:26 gar kernel: pci_bus_read_config_dword+0x46/0x80
May 14 00:54:26 gar kernel: pci_find_next_ext_capability+0x82/0xe0
May 14 00:54:26 gar kernel: ? pci_conf1_read+0x9b/0xf0
May 14 00:54:26 gar kernel: pci_restore_state.part.0+0x5d/0x3a0
May 14 00:54:26 gar kernel: pci_pm_runtime_resume+0x41/0xe0
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: __rpm_callback+0x41/0x170
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: rpm_callback+0x5d/0x70
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: rpm_resume+0x5df/0x820
May 14 00:54:26 gar kernel: pm_runtime_work+0x6c/0xa0
May 14 00:54:26 gar kernel: process_one_work+0x1c4/0x380
May 14 00:54:26 gar kernel: worker_thread+0x4d/0x380
May 14 00:54:26 gar kernel: ? _raw_spin_lock_irqsave+0x23/0x50
May 14 00:54:26 gar kernel: ? rescuer_thread+0x3a0/0x3a0
May 14 00:54:26 gar kernel: kthread+0xe6/0x110
May 14 00:54:26 gar kernel: ? kthread_complete_and_exit+0x20/0x20
May 14 00:54:26 gar kernel: ret_from_fork+0x1f/0x30
May 14 00:54:26 gar kernel: </TASK>
-- Boot 846264f027214bbfbb81c66db4ff1c81 --

It seems to be an issue with the i915 driver, potentially triggered by
snd_hda_intel.

`sudo lspci -v` reports (among others):

03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev
08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation DG2 [Arc A750]
Flags: bus master, fast devsel, latency 0, IRQ 153, IOMMU group 14
Memory at 80000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at 81000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [420] Physical Resizable BAR
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915

00:1f.3 Audio device: Intel Corporation Device 7a50 (rev 11)
DeviceName: Onboard - Sound
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e02
Flags: bus master, fast devsel, latency 32, IRQ 158, IOMMU group 10
Memory at 4200920000 (64-bit, non-prefetchable) [size=16K]
Memory at 4200800000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [50] Power Management version 3
Capabilities: [80] Vendor Specific Information: Len=14 <?>
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl

I'm using firmware-misc-nonfree version 20230210-4.
`sudo dmesg | grep i915` returns:

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[ 0.018130] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[ 1.379955] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3
- HuC is not supported!
[ 1.380780] i915 0000:03:00.0: [drm] VT-d active for gfx access
[ 1.380845] i915 0000:03:00.0: vgaarb: deactivate vga console
[ 1.380869] i915 0000:03:00.0: [drm] Local memory IO size:
0x00000001fc000000
[ 1.380870] i915 0000:03:00.0: [drm] Local memory available:
0x00000001fc000000
[ 1.393505] i915 0000:03:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=io+mem:owns=none
[ 1.393643] i915 0000:03:00.0: firmware: direct-loading firmware
i915/dg2_dmc_ver2_07.bin
[ 1.396144] i915 0000:03:00.0: [drm] Finished loading DMC firmware
i915/dg2_dmc_ver2_07.bin (v2.7)
[ 1.404739] i915 0000:03:00.0: firmware: direct-loading firmware
i915/dg2_guc_70.bin
[ 1.484762] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.484763] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.487222] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.487223] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.488237] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.bin
version 70.5.1
[ 1.488347] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.488348] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.500565] i915 0000:03:00.0: [drm] GuC submission enabled
[ 1.500565] i915 0000:03:00.0: [drm] GuC SLPC enabled
[ 1.500891] i915 0000:03:00.0: [drm] GuC RC: enabled
[ 1.521026] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on
minor 0
[ 2.234182] fbcon: i915drmfb (fb0) is primary device
[ 2.326912] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[ 4.824372] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops
i915_audio_component_bind_ops [i915])

Is anyone else seeing a similar problem? What can I do to avoid this? Do
we need anything else to narrow it down further?
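
One way to narrow this down after each forced reboot is to look only at
the kernel messages of the previous boot and pick out the first errors.
The following is a minimal sketch, assuming the journal is kept across
reboots; the function name `first_lockup_signs` and the grep pattern are
illustrations built from the messages quoted above, not a complete list.

```shell
#!/bin/sh
# Sketch: filter kernel log text (read from stdin) for the messages that
# preceded the lockups reported in this thread. The pattern is an
# assumption derived from the journal excerpts, not an exhaustive list.
first_lockup_signs() {
    grep -E 'Unable to change power state|forcewake ack|soft lockup'
}

# Usage (not run here): kernel messages of the previous boot only.
#   journalctl -k -b -1 --no-pager | first_lockup_signs
```

Comparing the first matching line across several incidents may show
whether the same device reports the problem first each time.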

Thanks for your time!

gene heskett

May 15, 2023, 4:12:23 AM
On 5/14/23 21:30, Christian Gelinek wrote:

I've had 2 similar lockups that needed a front panel reset just in the
last 2 weeks.
Something isn't right.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/>

Anssi Saari

May 15, 2023, 4:30:07 AM
Christian Gelinek <cgel...@radlogic.com.au> writes:

> Is anyone else seeing a similar problem? What can I do to avoid this?
> Do we need anything else to narrow it down further?

Only time I've seen a soft lockup was from a bad CPU. There were a bunch
of them and eventually the computer hung. Going back to the slow
plodding Celeron fixed all issues. Except CPU performance of course.

David

May 15, 2023, 4:40:08 AM
It's happened to me a couple of times, but only since I switched from
stable to testing, over the last month.
As I don't think everybody is running a Dell 980 desktop, or the same
desktop environment, it's probably not a hardware/software mismatch.
We'd be looking at strictly software, I suspect.
Cheers!


--
A Kiwi in Australia,
doing my bit toward raising the national standard.

piorunz

May 15, 2023, 4:50:06 AM
On 15/05/2023 02:13, Christian Gelinek wrote:
> It seems to be an issue with the i915 driver, potentially triggered by
> snd_hda_intel.

Yes indeed that looks like it, to my untrained eye.
Does it happen on Debian Stable (bullseye) also?
I have one laptop with Intel CPU, Intel integrated graphics, on Debian
Stable (bullseye), and it's ok, never crashes. Probably using same
driver as you, i915 on that one.

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀

Timothy M Butterworth

May 15, 2023, 9:00:06 PM
I am also running Bookworm. I am not having the problems you describe. I have KDE Plasma installed. I do have an issue with CrossOver Office locking up when exiting TES IV Oblivion. I have to switch terminals and manually kill the Oblivion processes to get access to the GUI back.
 

Christian Gelinek

May 16, 2023, 8:40:07 PM
On Mon, 15 May 2023 18:30:31, David wrote:
> On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:
>> Christian Gelinek <cgel...@radlogic.com.au> writes:
>>
>> > Is anyone else seeing a similar problem? What can I do to avoid
>> > this?
>> > Do we need anything else to narrow it down further?
>>
>> Only time I've seen a soft lockup was from a bad CPU. There were a
>> bunch
>> of them and eventually the computer hung. Going back to the slow
>> plodding Celeron fixed all issues. Except CPU performance of course.
>
> It's happened to me a couple of times, but only since I switched from
> stable to testing, over the last month.
> As I don't think everybody is running a Dell 980 desktop, or the same
> desktop environment, it's probably not a hardware/software mismatch.
> We'd be looking at strictly software, I suspect.

I have the same hunch. I got that PC new on Feb 20th, so I hope it's not
the CPU. It also happened just twice since then, even though I keep it
running pretty much 24/7. Not doing very much when I'm not there, which
was the case both times.

And for both times, the journalctl log looks suspiciously similar,
starting with the snd_hda_intel entry.

First time:

Apr 10 07:36:07 gar systemd[1]: anacron.service: Deactivated successfully.
Apr 10 07:50:01 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
Apr 10 07:50:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
Apr 10 07:50:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
...

Second time:

May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed
for user root
May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
...

To reproduce it, I'd probably have to somehow trigger the condition
manually, any ideas?
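
No reliable trigger is known from this thread, but since the call trace
in the first post ends in `pci_pm_runtime_resume`, it may help to check
the runtime power-management state of the two PCI functions named in
the logs. The following is a minimal sketch; `pci_runtime_pm` and the
`SYSFS_ROOT` override are hypothetical names used for illustration,
while the `power/control` and `power/runtime_status` files are the
kernel's standard sysfs attributes.

```shell
#!/bin/sh
# Sketch: print the runtime power-management state of a PCI function.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a copy of the tree; it defaults to the real /sys.
pci_runtime_pm() {
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/power"
    # control: "auto" = runtime PM allowed, "on" = device kept awake
    # runtime_status: e.g. "active" or "suspended"
    for f in control runtime_status; do
        if [ -r "$dev/$f" ]; then
            printf '%s %s=%s\n' "$1" "$f" "$(cat "$dev/$f")"
        else
            printf '%s %s=unavailable\n' "$1" "$f"
        fi
    done
}

# Addresses taken from the logs in this thread: the GPU (03:00.0) and
# the audio function that reports the D3hot error (04:00.0).
pci_runtime_pm 0000:03:00.0
pci_runtime_pm 0000:04:00.0
```

If the audio function shows `suspended` while the machine is idle, that
would at least be consistent with the failing D3hot to D0 transition in
the journal.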

Thanks for your time!

Christian Gelinek

May 16, 2023, 8:51:25 PM
On Mon, 15 May 2023 09:48:12 +0100, piorunz <pio...@gmx.com> wrote:
> On 15/05/2023 02:13, Christian Gelinek wrote:
>
> It seems to be an issue with the i915 driver, potentially triggered by
> snd_hda_intel.
>
>
> Yes indeed that looks like it, to my untrained eye.
> Does it happen on Debian Stable (bullseye) also?
> I have one laptop with Intel CPU, Intel integrated graphics, on Debian
> Stable (bullseye), and it's ok, never crashes. Probably using same
> driver as you, i915 on that one.

Interesting. I'm using an Intel Arc A750 discrete graphics card and
couldn't get it to work properly with bullseye, which is why I'm using
bookworm. (In bullseye, the kernel and the firmware-misc-nonfree
packages weren't recent enough.)

Since it happened just twice in the last (almost) 3 months, I think I'd
need to manually trigger the cause of this lockup - any suggestions are
welcome.

Thanks for your time!

Philip Wyett

May 16, 2023, 9:21:16 PM
Hi,

You state that this is a new PC. These issues can be caused by faulty firmware, e.g. the BIOS. Keep an
eye on your motherboard manufacturer's website to see if any BIOS updates become available.

Regards

Phil

--
*** Playing the game for the games own sake. ***


Associations:

* Debian Maintainer (DM)
* Fedora/EPEL Maintainer.
* Contributor member of the AlmaLinux foundation.

WWW: https://kathenas.org

Buy Me a Coffee: https://www.buymeacoffee.com/kathenasorg

Twitter: @kathenasorg

Instagram: @kathenasorg

IRC: kathenas

GPG: 724AA9B52F024C8B

Philip Wyett

May 16, 2023, 9:21:16 PM
Hi,

A little research shows that this is not that uncommon. A suggested workaround is to disable the
power management for the device as follows.

Create a file (such as): /etc/modprobe.d/snd-intel-disable-power-management.conf

Add the following line: options snd_hda_intel power_save=0

Reboot.

Hopefully this may assist.
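
The steps above can be sketched as follows, together with a check that
the loaded module really picked up the option after the reboot. This is
only a sketch: `write_power_save_conf`, `show_power_save` and the
`CONF_DIR` override are hypothetical names for illustration, and
writing to the real /etc/modprobe.d needs root.

```shell
#!/bin/sh
# Sketch of the workaround above. CONF_DIR is a hypothetical override
# for trying this out; the real location is /etc/modprobe.d (needs root).
write_power_save_conf() {
    conf_dir="${CONF_DIR:-/etc/modprobe.d}"
    printf 'options snd_hda_intel power_save=0\n' \
        > "$conf_dir/snd-intel-disable-power-management.conf"
}

# After the reboot, the value the loaded module is actually using can be
# read back from sysfs; 0 means the power-save timeout is disabled.
show_power_save() {
    param=/sys/module/snd_hda_intel/parameters/power_save
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "snd_hda_intel not loaded"
    fi
}

show_power_save
```

If `show_power_save` still prints a non-zero value after rebooting, the
option was not applied and the file is worth double-checking.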

Xiyue Deng

May 16, 2023, 11:50:07 PM

Philip Wyett <philip...@kathenas.org> writes:

Seconded. My system had a similar soft lockup issue[1] (also after
upgrading from Bullseye to Bookworm), though without any backtrace in
journalctl. After more than a month of debugging, it turned out that the
BIOS had a bug whereby certain instructions accessing the TPM could
freeze the system, and upgrading to a beta BIOS fixed it. So definitely
contact the manufacturer's customer service and check for similar reports.

[1] https://lists.debian.org/debian-user/2023/04/msg00425.html
--
Manphiz

Christian Gelinek

May 22, 2023, 8:50:06 PM
It happened again (see journal entries below), so I'm going to try your
workaround and see if that helps.

On Wed, 17 May 2023 02:12:32 +0100, Philip Wyett wrote:

> A little research shows that this is not that uncommon. A suggested
> workaround is to disable the
> power management for the device as follows.
>
> Create a file (such as):
> /etc/modprobe.d/snd-intel-disable-power-management.conf
>
> Add the following line: options snd_hda_intel power_save=0
>
> Reboot.
>
> Hopefully this may assist.

Thanks for the suggestion.

Journal of the most recent lockup below:

May 19 23:17:01 gar CRON[1902]: pam_unix(cron:session): session opened
for user root(uid=0) by (uid=0)
May 19 23:17:01 gar CRON[1903]: (root) CMD (cd / && run-parts --report
/etc/cron.hourly)
May 19 23:17:01 gar CRON[1902]: pam_unix(cron:session): session closed
for user root
May 19 23:19:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
May 19 23:19:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:09 gar kernel: hrtimer: interrupt took 252466511 ns
May 19 23:19:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:15 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 19 23:19:15 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:16 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT:
Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
-- Boot 4540f787dd2341cea70a75aac62b1843 --
May 23 09:24:43 gar kernel: Linux version 6.1.0-9-amd64
(debian...@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU
ld (GNU Binutils for Debian) 2.40) #1 SMP P>

Thank you for your time!

Christian

Christian Gelinek

May 30, 2023, 8:10:05 PM
On Wed, 17 May 2023 02:12:32 +0100, Philip Wyett wrote:

>> A little research shows that this is not that uncommon. A suggested workaround is to disable the
>>
>> power management for the device as follows.
>>
>> Create a file (such as): /etc/modprobe.d/snd-intel-disable-power-management.conf
>>
>>
>> Add the following line: options snd_hda_intel power_save=0
>>
>> Reboot.
>>
>> Hopefully this may assist.

I tried:

$ ls -l /etc/modprobe.d/snd-intel-disable-power-management.conf
-rw-r----- 1 root root 36 May 23 10:01
/etc/modprobe.d/snd-intel-disable-power-management.conf

$ sudo cat /etc/modprobe.d/snd-intel-disable-power-management.conf
options snd_hda_intel power_save=0

Nevertheless, this morning I found the machine locked up again.
Journalctl output:

May 31 04:17:01 gar CRON[2175]: (root) CMD (cd / && run-parts --report
/etc/cron.hourly)
May 31 04:17:01 gar CRON[2174]: pam_unix(cron:session): session closed
for user root
May 31 05:12:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
May 31 05:12:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:07 gar kernel: INFO: NMI handler (perf_event_nmi_handler)
took too long to run: 84.155 msecs
May 31 05:12:07 gar kernel: perf: interrupt took too long (657458 >
2500), lowering kernel.perf_event_max_sample_rate to 250
May 31 05:12:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:15 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 31 05:12:15 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:16 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT:
Corrupted descriptor head=4294967295 tail=4294967295 status=0xfffff>
May 31 05:12:27 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:27 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:27 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 31 05:12:27 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:27 gar kernel: watchdog: BUG: soft lockup - CPU#14 stuck
for 26s! [kworker/14:2:2123]
May 31 05:12:27 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_r>
May 31 05:12:27 gar kernel: intel_uncore wmi_bmof ee1004 pcspkr
watchdog soundcore intel_vsec serial_multi_instantiate intel_pmc_co>
May 31 05:12:27 gar kernel: CPU: 14 PID: 2123 Comm: kworker/14:2
Tainted: G U W 6.1.0-9-amd64 #1 Debian 6.1.27-1
May 31 05:12:27 gar kernel: Hardware name: Micro-Star International Co.,
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/20>
May 31 05:12:27 gar kernel: Workqueue: pm pm_runtime_work
May 31 05:12:27 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 31 05:12:27 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89>
May 31 05:12:27 gar kernel: RSP: 0018:ffffb5ce87b9bcc0 EFLAGS: 00000286
May 31 05:12:27 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000
RCX: 0000000000000ffc
May 31 05:12:27 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004
RDI: 0000000000000000
May 31 05:12:27 gar kernel: RBP: ffffb5ce87b9bcfc R08: 0000000000000004
R09: ffffb5ce87b9bcfc
May 31 05:12:27 gar kernel: R10: 0000000000000004 R11: ffffffffb19a70a0
R12: 0000000000000ffc
May 31 05:12:27 gar kernel: R13: 0000000000000000 R14: 0000000000000004
R15: 0000000000000000
May 31 05:12:27 gar kernel: FS: 0000000000000000(0000)
GS:ffff9c641fb80000(0000) knlGS:0000000000000000
May 31 05:12:27 gar kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 31 05:12:27 gar kernel: CR2: 00007f78df2005c9 CR3: 000000010571e005
CR4: 0000000000770ee0
May 31 05:12:27 gar kernel: PKRU: 55555554
May 31 05:12:27 gar kernel: Call Trace:
May 31 05:12:27 gar kernel: <TASK>
May 31 05:12:27 gar kernel: pci_bus_read_config_dword+0x46/0x80
May 31 05:12:27 gar kernel: pci_find_next_ext_capability+0x82/0xe0
May 31 05:12:27 gar kernel: ? pci_conf1_read+0x9b/0xf0
May 31 05:12:27 gar kernel: pci_restore_state.part.0+0x5d/0x3a0
May 31 05:12:27 gar kernel: pci_pm_runtime_resume+0x41/0xe0
May 31 05:12:27 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 31 05:12:27 gar kernel: __rpm_callback+0x41/0x170
May 31 05:12:27 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 31 05:12:27 gar kernel: rpm_callback+0x5d/0x70
May 31 05:12:27 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 31 05:12:27 gar kernel: rpm_resume+0x5df/0x820
May 31 05:12:27 gar kernel: pm_runtime_work+0x6c/0xa0
May 31 05:12:27 gar kernel: process_one_work+0x1c4/0x380
May 31 05:12:27 gar kernel: worker_thread+0x4d/0x380
May 31 05:12:27 gar kernel: ? _raw_spin_lock_irqsave+0x23/0x50
May 31 05:12:27 gar kernel: ? rescuer_thread+0x3a0/0x3a0
May 31 05:12:27 gar kernel: kthread+0xe6/0x110
May 31 05:12:27 gar kernel: ? kthread_complete_and_exit+0x20/0x20
May 31 05:12:27 gar kernel: ret_from_fork+0x1f/0x30
May 31 05:12:27 gar kernel: </TASK>
-- Boot 8206a74ca5a54b4c98a1ee6b0586771c --

Maybe I should try to upgrade the BIOS, as suggested by Xiyue Deng [0]
when I get a chance.
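
Before looking for an update, the currently installed firmware version
can be read from the DMI data the kernel exposes (the journal above
reports "BIOS 1.00 10/21/2022"). The following is a minimal sketch;
`show_bios_info` and the `DMI_DIR` override are hypothetical names,
while the files under /sys/class/dmi/id are standard sysfs attributes.

```shell
#!/bin/sh
# Sketch: print the board and firmware details the kernel read from DMI.
# DMI_DIR is a hypothetical override for testing; the default is the
# standard sysfs location.
show_bios_info() {
    dmi="${DMI_DIR:-/sys/class/dmi/id}"
    for f in board_vendor board_name bios_version bios_date; do
        if [ -r "$dmi/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dmi/$f")"
        fi
    done
}

show_bios_info
```

The printed version and date can then be compared with what the
motherboard manufacturer lists for the board.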

Thanks for your time!

[0]: https://lists.debian.org/debian-user/2023/05/msg00749.html