Bug#1006149: linux-image-5.16.0-1-686: Fails to boot on T41 Thinkpads


Salvatore Bonaccorso

Feb 21, 2022, 9:10:04 AM
Control: tags -1 + moreinfo

Hi

On Sat, Feb 19, 2022 at 10:04:14PM +0100, Petra R.-P. wrote:
> Package: src:linux
> Version: 5.16.7-2
> Severity: critical
> Justification: breaks the whole system
>
> Dear Maintainer,
>
> This new kernel version does not boot on two fairly similar
> old IBM T41 Thinkpads.
>
> What reproducibly happens is as follows:
>
> After the lines
>
> Loading Linux 5.16.0-1-686 ...
> Loading initial ramdisk ...
>
> the screen gets flushed, and I see:
>
> [ 4.xxxxxx] ata1.00: Read log 0x00 page 0x00 failed Emask 0x1
> <blinking cursor>
>
> The "xxxxxx" vary for every try.
>
> Then nothing else happens.
> Ctrl-Alt-Del has no effect.
> I have to reset the computer by pressing the button.
>
> On an old PC running the same kernel the same message "ata1.00: Read log ..."
> appears, but then the boot process continues normally.
>
> linux-image-5.15.0-3-686, which I am using to write this
> message, runs fine.

Are you booting in quiet mode? If yes, can you remove it from the
kernel command line and see if you get more information on the screen?

I got an off-bug report from someone with similar hardware and similar
symptoms.

From the failed boot, can you extract the kernel logs produced?

Regards,
Salvatore

Petra Rübe-Pugliese

Feb 21, 2022, 11:30:03 AM
Hello Salvatore,

On Mon, 21 Feb 2022 at 14:56 +0100, Salvatore Bonaccorso <car...@debian.org> wrote:
> Control: tags -1 + moreinfo
[...]
> Are you booting in quiet mode?

I do not think so.
I usually get _heaps_ of output scrolling away on the screen,
but in this case absolutely _nothing_ happens after the
first line of output.

> If yes, can you remove it from the kernel command line and see
> if you get more information on the screen?

How would that be done?

> I got an off-bug report from someone with similar hardware and similar
> symptoms.

I'm glad to hear that it's not just me ...
>
> From the failed boot, can you extract the kernel logs produced?

I did the following:

-> Start the notebook with the "bad" kernel at about 16:56 today.
(The notebook had not run before today.)
-> Press the button at around 16:58 to stop it.
-> Restart the "good" kernel at 16:59:xx

Here is the corresponding passage from /var/log/kern.log :

Feb 20 10:24:42 localhost kernel: [ 2220.772792] sd 2:0:0:0: [sdb] Attached SCSI removable disk
Feb 20 10:24:43 localhost kernel: [ 2221.424423] sd 2:0:0:0: [sdb] 15753215 512-byte logical blocks: (8.07 GB/7.51 GiB)
Feb 20 10:24:43 localhost kernel: [ 2221.425660] sdb: detected capacity change from 0 to 15753215
Feb 20 10:24:43 localhost kernel: [ 2221.426455] sdb: sdb1
Feb 20 10:24:54 localhost kernel: [ 2232.992583] EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
Feb 20 10:24:54 localhost kernel: [ 2233.011371] EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null). Quota mode: none.
Feb 20 10:29:11 localhost kernel: [ 2489.270389] usb 1-4: USB disconnect, device number 8
Feb 21 17:00:53 localhost kernel: [ 0.000000] Linux version 5.15.0-3-686 (debian...@lists.debian.org) (gcc-11 (Debian 11.2.0-14) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.90.20220123) #1 SMP Debian 5.15.15-2 (2022-01-30)
Feb 21 17:00:53 localhost kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Feb 21 17:00:53 localhost kernel: [ 0.000000] signal: max sigframe size: 1440
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-provided physical RAM map:
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] reserved
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000d2000-0x00000000000d3fff] reserved
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ff5ffff] usable
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x000000001ff60000-0x000000001ff76fff] ACPI data
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x000000001ff77000-0x000000001ff78fff] ACPI NVS
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x000000001ff80000-0x000000001fffffff] reserved
Feb 21 17:00:53 localhost kernel: [ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
Feb 21 17:00:53 localhost kernel: [ 0.000000] Notice: NX (Execute Disable) protection missing in CPU!



That is: Nothing whatever got recorded between
Feb 20 10:29:11 localhost kernel: [ 2489.270389] usb 1-4: USB disconnect, device number 8
(last activity yesterday)
and
Feb 21 17:00:53 localhost kernel: [ 0.000000] Linux version 5.15.0-3-686 (debian...@lists.debian.org) (
(start of the "good" kernel today).

I'm afraid that is not much in the way of "more information" ... :-\


Best regards,
Petra

Axel Beckert

Feb 22, 2022, 4:10:03 AM
Hi Diederik,

Diederik de Haas wrote:
> > pn firmware-iwlwifi <none>
>
> I see you don't have firmware-iwlwifi installed, but there have been other bug
> reports which could be similar, so with the 5.15 kernel, do you have the
> 'iwlwifi' kernel module loaded?

Likely not. IIRC the first Thinkpads with iwlwifi were the T60/T61
generation. Both affected devices are too old for having iwlwifi
cards.

The T41 had (optionally) either:

* IBM 11a/b/g Wireless LAN Mini PCI Adapter
* Cisco Aironet Wireless 802.11b
* Intel PRO/Wireless LAN 2100 3B Mini PCI Adapter (that's the ipw2x00
driver series, probably ipw2200.ko nowadays)

So maybe the question is, if …

> > pn firmware-ipw2x00 <none>

… installing firmware-ipw2x00 makes a difference (if an Intel
PRO/Wireless LAN 2100 wifi card is present). Then again, #1005884 was
about a change explicitly in iwlwifi, not in Intel WiFi cards in general.

The A31 had (optionally):

* IBM High Rate Wireless LAN Mini-PCI Adapter with Modem

But mine has no wifi at all, just an RJ45 socket and potentially a
PCMCIA wifi card (which is not plugged in currently).

So in my case it's likely unrelated to any wifi driver change in 5.16.
(Assuming my issue is really the same as this one.)

BTW, here's the bug report metadata from my affected host (booted
under the working 5.15-3 kernel):

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 82845 845 [Brookdale] Chipset Host Bridge [8086:1a30] (rev 04)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: <access denied>
Kernel driver in use: agpgart-intel

00:01.0 PCI bridge [0604]: Intel Corporation 82845 845 [Brookdale] Chipset AGP Bridge [8086:1a31] (rev 04) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 96
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 00003000-00003fff [size=4K]
Memory behind bridge: d0100000-d01fffff [size=1M]
Prefetchable memory behind bridge: e8000000-efffffff [size=128M]
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA+ VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:1d.0 USB controller [0c03]: Intel Corporation 82801CA/CAM USB Controller #1 [8086:2482] (rev 02) (prog-if 00 [UHCI])
Subsystem: IBM ThinkPad A/T/X Series [1014:0220]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 4: I/O ports at 1800 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci_hcd

00:1d.1 USB controller [0c03]: Intel Corporation 82801CA/CAM USB Controller #2 [8086:2484] (rev 02) (prog-if 00 [UHCI])
Subsystem: IBM ThinkPad A/T/X Series [1014:0220]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 11
Region 4: I/O ports at 1820 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci_hcd

00:1d.2 USB controller [0c03]: Intel Corporation 82801CA/CAM USB Controller #3 [8086:2487] (rev 02) (prog-if 00 [UHCI])
Subsystem: IBM ThinkPad A/T/X Series [1014:0220]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin C routed to IRQ 11
Region 4: I/O ports at 1840 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci_hcd

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 42) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=00, secondary=02, subordinate=08, sec-latency=64
I/O behind bridge: 00004000-00008fff [size=20K]
Memory behind bridge: d0200000-dfffffff [size=254M]
Prefetchable memory behind bridge: f0000000-f7ffffff [size=128M]
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:1f.0 ISA bridge [0601]: Intel Corporation 82801CAM ISA Bridge (LPC) [8086:248c] (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Kernel driver in use: lpc_ich
Kernel modules: intel_rng, lpc_ich

00:1f.1 IDE interface [0101]: Intel Corporation 82801CAM IDE U100 Controller [8086:248a] (rev 02) (prog-if 8a [ISA Compatibility mode controller, supports both channels switched to PCI native mode, supports bus mastering])
Subsystem: IBM ThinkPad A/T/X Series [1014:0220]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at 01f0 [size=8]
Region 1: I/O ports at 03f4
Region 2: I/O ports at 0170 [size=8]
Region 3: I/O ports at 0374
Region 4: I/O ports at 1860 [size=16]
Region 5: Memory at 40000000 (32-bit, non-prefetchable) [size=1K]
Kernel driver in use: ata_piix
Kernel modules: ata_piix, ata_generic

00:1f.3 SMBus [0c05]: Intel Corporation 82801CA/CAM SMBus Controller [8086:2483] (rev 02)
Subsystem: IBM ThinkPad A/T/X Series [1014:0220]
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin B routed to IRQ 11
Region 4: I/O ports at 1880 [size=32]
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801

00:1f.5 Multimedia audio controller [0401]: Intel Corporation 82801CA/CAM AC'97 Audio Controller [8086:2485] (rev 02)
Subsystem: IBM ThinkPad T30 [1014:0508]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 11
Region 0: I/O ports at 1c00 [size=256]
Region 1: I/O ports at 18c0 [size=64]
Kernel driver in use: snd_intel8x0
Kernel modules: snd_intel8x0

00:1f.6 Modem [0703]: Intel Corporation 82801CA/CAM AC'97 Modem Controller [8086:2486] (rev 02) (prog-if 00 [Generic])
Subsystem: IBM ThinkPad A/T/X Series [1014:0223]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 11
Region 0: I/O ports at 2400 [size=256]
Region 1: I/O ports at 2000 [size=128]
Kernel driver in use: snd_intel8x0m
Kernel modules: snd_intel8x0m

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV200/M7 [Mobility Radeon 7500] [1002:4c57] (prog-if 00 [VGA controller])
Subsystem: IBM RV200/M7 [Mobility Radeon 7500] [1014:0509]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR+ FastB2B+ DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 66 (2000ns min), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 3000 [size=256]
Region 2: Memory at d0100000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: radeon
Kernel modules: radeonfb, radeon

02:00.0 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev 80)
Subsystem: IBM ThinkPad A/T/X Series [1014:0185]
Physical Slot: 1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 168
Interrupt: pin A routed to IRQ 11
Region 0: Memory at 50000000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=02, secondary=03, subordinate=06, sec-latency=176
Memory window 0: f0000000-f3ffffff (prefetchable)
Memory window 1: d4000000-d7ffffff
I/O window 0: 00004000-000040ff
I/O window 1: 00004400-000044ff
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset- 16bInt+ PostWrite+
16-bit legacy interface ports at 0001
Capabilities: <access denied>
Kernel driver in use: yenta_cardbus
Kernel modules: yenta_socket

02:00.1 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev 80)
Subsystem: IBM ThinkPad A/T/X Series [1014:0185]
Physical Slot: 1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 168
Interrupt: pin B routed to IRQ 11
Region 0: Memory at 50100000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=02, secondary=07, subordinate=07, sec-latency=176
Memory window 0: f4000000-f7ffffff (prefetchable)
Memory window 1: d8000000-dbffffff
I/O window 0: 00004800-000048ff
I/O window 1: 00004c00-00004cff
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset- 16bInt+ PostWrite+
16-bit legacy interface ports at 0001
Capabilities: <access denied>
Kernel driver in use: yenta_cardbus
Kernel modules: yenta_socket

02:08.0 Ethernet controller [0200]: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller [8086:1031] (rev 42)
Subsystem: IBM ThinkPad A/T/X Series [1014:0209]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 66 (2000ns min, 14000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at d0200000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at 8000 [size=64]
Capabilities: <access denied>
Kernel driver in use: e100
Kernel modules: e100


** USB devices:
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub


-- System Information:
Debian Release: bookworm/sid
APT prefers unstable
APT policy: (990, 'unstable'), (500, 'testing'), (110, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 5.15.0-3-686-pae (SMP w/1 CPU thread)
Locale: LANG=de_CH.UTF-8, LC_CTYPE=de_CH.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: OpenRC (via /run/openrc), PID 1: init

Versions of packages linux-image-5.16.0-2-686-pae-unsigned depends on:
ii initramfs-tools [linux-initramfs-tool] 0.140
ii kmod 29-1
ii linux-base 4.8

Versions of packages linux-image-5.16.0-2-686-pae-unsigned recommends:
pn apparmor <none>
ii firmware-linux-free 20200122-1

Versions of packages linux-image-5.16.0-2-686-pae-unsigned suggests:
ii debian-kernel-handbook 1.0.19
ii extlinux 3:6.04~git20190206.bf6db5b4+dfsg1-3+b1
ii grub-pc 2.06-2
pn linux-doc-5.16 <none>

Versions of packages linux-image-5.16.0-2-686-pae-unsigned is related to:
ii firmware-amd-graphics 20210818-1
pn firmware-atheros <none>
pn firmware-bnx2 <none>
pn firmware-bnx2x <none>
pn firmware-brcm80211 <none>
pn firmware-cavium <none>
pn firmware-intel-sound <none>
pn firmware-intelwimax <none>
pn firmware-ipw2x00 <none>
pn firmware-ivtv <none>
pn firmware-iwlwifi <none>
pn firmware-libertas <none>
ii firmware-linux-nonfree 20210818-1
ii firmware-misc-nonfree 20210818-1
pn firmware-myricom <none>
pn firmware-netxen <none>
pn firmware-qlogic <none>
pn firmware-realtek <none>
pn firmware-samsung <none>
pn firmware-siano <none>
pn firmware-ti-connectivity <none>
pn xen-hypervisor <none>

-- no debconf information

Regards, Axel
--
,''`. | Axel Beckert <a...@debian.org>, https://people.debian.org/~abe/
: :' : | Debian Developer, ftp.ch.debian.org Admin
`. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
`- | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE

Petra R.-P.

Mar 5, 2022, 12:10:04 PM
Update for linux-image-5.16.0-3-686 :

On Sat 19 Feb 2022 at 22:04:14 +0100 Petra R.-P. <de...@prp.in-berlin.de> wrote:

[...]
> After the lines
>
> Loading Linux 5.16.0-1-686 ...
Now of course: s/-1-/-3-/
> Loading initial ramdisk ...
>
> the screen gets flushed, and I see:
>
> [ 4.xxxxxx] ata1.00: Read log 0x00 page 0x00 failed Emask 0x1
> <blinking cursor>

The "Read log 0x00 page 0x00 failed" line has disappeared,
the endlessly blinking cursor remains in the upper left corner
of the screen. Nothing else happens.
This is again reproducible on both T41 Thinkpads.

Regards,
Petra

Diederik de Haas

Mar 5, 2022, 12:30:03 PM
On Monday, 21 February 2022 17:25:33 CET Petra Rübe-Pugliese wrote:
> > If yes, can you remove it from the kernel command line and see
> > if you get more information on the screen?
>
> How would that be done?

If you do "cat /proc/cmdline" and see the word 'quiet' in there, then it's not
as verbose as it could be on screen.

If you're using GRUB and the system boots up and you see the GRUB menu, press
'e' to edit the line and remove the 'quiet' word. That way it will not be
quiet for that boot.

If you want to remove it by default, look in /etc/default/grub and there you
should see (for example):
GRUB_CMDLINE_LINUX_DEFAULT="quiet"

So if you remove the 'quiet' word there and do an update-grub, then the boot
will be 'noisier' by default (on every boot).
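
The check can also be scripted. This is only a minimal sketch, assuming a POSIX shell; has_quiet and strip_quiet are hypothetical helper names, not existing tools. They only inspect a string you pass in and do not touch /etc/default/grub or the GRUB menu:

```shell
#!/bin/sh
# has_quiet CMDLINE
# Succeeds (exit status 0) if CMDLINE contains the standalone word "quiet".
has_quiet() {
    case " $1 " in
        *" quiet "*) return 0 ;;
        *) return 1 ;;
    esac
}

# strip_quiet CMDLINE
# Prints CMDLINE with every standalone "quiet" word removed.
# Relies on word splitting, so it assumes the command line
# contains no shell glob characters such as "*".
strip_quiet() {
    out=""
    for word in $1; do
        [ "$word" = "quiet" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

# Example use on a running system (read-only):
#   has_quiet "$(cat /proc/cmdline)" && echo "booted with quiet"
#   strip_quiet "$(cat /proc/cmdline)"
```

The output of strip_quiet is what the 'linux' line would carry after editing it by hand in the GRUB menu; making the change permanent still needs the edit in /etc/default/grub followed by update-grub.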

HTH,
Diederik

Diederik de Haas

Mar 5, 2022, 6:50:03 PM
Hi Petra,

On Saturday, 5 March 2022 21:12:12 CET Petra R.-P. wrote:
> On Sat 05 Mar 2022 at 18:23:52 +0100 Diederik de Haas
> <didi....@cknow.org> wrote:
> > On Monday, 21 February 2022 17:25:33 CET Petra Rübe-Pugliese wrote:
> [...]
>
> > So if you remove the 'quiet' word there and do an update-grub, then the
> > boot will be 'noisier' by default (on every boot).
>
> This is what I have done, for the time being.
>
> As a result, there was a lot of output on the screen.
> I am attaching a photo of the final state, where it stopped.

I didn't see an obvious clue as to why it didn't continue ...

> However, I cannot find any trace of this in /var/log/kern.log.
> Whereas "grep 5.15.0 /var/log/kern.log" gives loads of output,
> "grep 5.16.0 /var/log/kern.log" does not produce anything at
> all.

How about /var/log/kern.log.1 (for example)? On my system the very first
message of a boot in that file begins with:
[ 0.000000] Linux version 5.16.0-3-amd64 ...

> Also manual inspection of the file does not show any relevant passage.

So far, *I* haven't found a direct clue as to why things fail, so if you do
have _a_ log of a boot with the 5.16 kernel, then having that could help.
Especially if you could provide a similar log of a boot with the 5.15 kernel.
Doesn't need to be the whole log, but knowing a couple of lines which happen
thereafter on 5.15 but not on 5.16 could provide a clue.

I saw that the 5.16.11-1 kernel transitioned to testing and it is useful to
know if the issue is still present with that version.
In the upstream kernel in drivers/gpu/drm/amd I saw a number of commits since
version 5.16.7 (and also several other commits which are part of 5.16.12 which
is present in salsa, but not yet released).

When I look at the photo from https://bugs.debian.org/1006149#44 (from Axel),
I do notice an important difference:
He has various [drm] messages, whereas I see none of those with you.
Do you have [drm] messages when booting with the 5.15 kernel?

That bug message also has the following which is different from yours:
ii firmware-amd-graphics 20210818-1
...
ii firmware-linux-nonfree 20210818-1
ii firmware-misc-nonfree 20210818-1

So it would be interesting to know whether installing any of those packages
makes a difference. I'd suggest first installing the firmware-amd-graphics
package.

HTH,
Diederik

Petra R.-P.

Mar 6, 2022, 5:30:05 AM
Hi Diederik,

On Sun, 6 Mar 2022 at 00:44 +0100, Diederik de Haas <didi....@cknow.org> wrote:

> On Saturday, 5 March 2022 21:12:12 CET Petra R.-P. wrote:
[...]
> How about in /var/log/kern.log.1 (f.e.) ?
The kern.log I inspected yesterday reached back until earlier
days, so I don't think kern.log.1 would have contributed
anything new. Today's kern.log was a new one.

> Because on my system the very first
> message of a boot in that file begins with:
> [ 0.000000] Linux version 5.16.0-3-amd64 ...
Yes, I understood that, and checked the file for such lines
over and over again. Not a single trace of any "5.16",
only "Linux version 5.15.0-3-686" everywhere.

[...]
> So far, *I* haven't found a direct clue as to why things fail, so if you do
> have _a_ log of a boot with the 5.16 kernel, then having that could help.
Sorry, there isn't any such thing:
No output:
~/tmp > cd /var/log
/var/log > zgrep 5.16.0- kern.log*
/var/log >

Whereas "zgrep 5.15.0- kern.log*" gives 376 lines of output.
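
For anyone who wants to repeat this count, here is a rough sketch, assuming a POSIX shell with gzip and grep available; count_kernel_lines is a hypothetical helper name. It matches the pattern as a literal string, whereas in a plain "zgrep 5.16.0-" each dot matches any character:

```shell
#!/bin/sh
# count_kernel_lines PATTERN FILE...
# Prints how many lines in the given log files contain PATTERN as a
# literal string. Files may be plain text or gzip-compressed.
count_kernel_lines() {
    pattern=$1
    shift
    # gzip -cdf decompresses .gz files and copies plain files unchanged.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    gzip -cdf -- "$@" | grep -cF -- "$pattern" || true
}

# Example, using the paths from this thread:
#   count_kernel_lines "Linux version 5.16.0-" /var/log/kern.log*
#   count_kernel_lines "Linux version 5.15.0-" /var/log/kern.log*
```

Searching for the "Linux version" banner counts boots rather than every line that happens to mention the version, so the numbers will differ from the 376 lines quoted above.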

> Especially if you could provide a similar log of a boot with the 5.15 kernel.
> Doesn't need to be the whole log, but knowing a couple of lines which happen
> thereafter on 5.15 but not on 5.16 could provide a clue.
See kernel-5.15-log-T41.txt attached.
>
> I saw that the 5.16.11-1 kernel transitioned to testing and it is useful to
> know if the issue is still present with that version.
Today's "apt-get dist-upgrade" did not yield anything
kernel-related. Will try again tomorrow and the following days
and report on the results.
> In the upstream kernel in drivers/gpu/drm/amd I saw a number of commits since
> version 5.16.7 (and also several other commits which are part of 5.16.12 which
> is present in salsa, but not yet released).
>
> When I look at the photo from https://bugs.debian.org/1006149#44 (from Axel),
> I do notice an important difference:
> He has various [drm] messages, whereas I see none of those with you.
> Do you have [drm] messages when booting with the 5.15 kernel?
Grepping for "drm" in kernel-5.15-log-T41.txt :
Mar 6 09:53:25 netty kernel: [ 51.179666] [drm] radeon kernel modesetting enabled.
Mar 6 09:53:25 netty kernel: [ 51.184860] [drm] initializing kernel modesetting (RV200 0x1002:0x4C57 0x1014:0x0530 0x00).
Mar 6 09:53:25 netty kernel: [ 51.184896] [drm] Forcing AGP to PCI mode
Mar 6 09:53:25 netty kernel: [ 51.187292] [drm] Detected VRAM RAM=128M, BAR=128M
Mar 6 09:53:25 netty kernel: [ 51.187316] [drm] RAM width 64bits DDR
Mar 6 09:53:25 netty kernel: [ 51.187383] [drm] radeon: 32M of VRAM memory ready
Mar 6 09:53:25 netty kernel: [ 51.187392] [drm] radeon: 512M of GTT memory ready.
Mar 6 09:53:25 netty kernel: [ 51.187417] [drm] GART: num cpu pages 131072, num gpu pages 131072
Mar 6 09:53:25 netty kernel: [ 51.192671] [drm] radeon: power management initialized
Mar 6 09:53:25 netty kernel: [ 51.192720] [drm] PCI GART of 512M enabled (table at 0x0000000002400000).
Mar 6 09:53:25 netty kernel: [ 51.195571] [drm] radeon: irq initialized.
Mar 6 09:53:25 netty kernel: [ 51.195614] [drm] Loading R100 Microcode
Mar 6 09:53:25 netty kernel: [ 51.195710] [drm:r100_cp_init [radeon]] *ERROR* Failed to load firmware!
Mar 6 09:53:25 netty kernel: [ 51.195987] [drm] radeon: cp finalized
Mar 6 09:53:25 netty kernel: [ 51.198852] [drm] Panel ID String: 1024x768
Mar 6 09:53:25 netty kernel: [ 51.198873] [drm] Panel Size 1024x768
Mar 6 09:53:25 netty kernel: [ 51.199136] [drm] No TV DAC info found in BIOS
Mar 6 09:53:25 netty kernel: [ 51.199215] [drm] Radeon Display Connectors
Mar 6 09:53:25 netty kernel: [ 51.199279] [drm] Connector 0:
Mar 6 09:53:25 netty kernel: [ 51.199285] [drm] VGA-1
Mar 6 09:53:25 netty kernel: [ 51.199291] [drm] DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
Mar 6 09:53:25 netty kernel: [ 51.199302] [drm] Encoders:
Mar 6 09:53:25 netty kernel: [ 51.199308] [drm] CRT1: INTERNAL_DAC1
Mar 6 09:53:25 netty kernel: [ 51.199315] [drm] Connector 1:
Mar 6 09:53:25 netty kernel: [ 51.199320] [drm] DVI-D-1
Mar 6 09:53:25 netty kernel: [ 51.199326] [drm] HPD1
Mar 6 09:53:25 netty kernel: [ 51.199331] [drm] DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64
Mar 6 09:53:25 netty kernel: [ 51.199341] [drm] Encoders:
Mar 6 09:53:25 netty kernel: [ 51.199347] [drm] DFP1: INTERNAL_TMDS1
Mar 6 09:53:25 netty kernel: [ 51.199353] [drm] Connector 2:
Mar 6 09:53:25 netty kernel: [ 51.199358] [drm] LVDS-1
Mar 6 09:53:25 netty kernel: [ 51.199364] [drm] Encoders:
Mar 6 09:53:25 netty kernel: [ 51.199369] [drm] LCD1: INTERNAL_LVDS
Mar 6 09:53:25 netty kernel: [ 51.199375] [drm] Connector 3:
Mar 6 09:53:25 netty kernel: [ 51.199381] [drm] SVIDEO-1
Mar 6 09:53:25 netty kernel: [ 51.199386] [drm] Encoders:
Mar 6 09:53:25 netty kernel: [ 51.199391] [drm] TV1: INTERNAL_DAC2
Mar 6 09:53:25 netty kernel: [ 51.280392] [drm] fb mappable at 0xE0040000
Mar 6 09:53:25 netty kernel: [ 51.280412] [drm] vram apper at 0xE0000000
Mar 6 09:53:25 netty kernel: [ 51.280419] [drm] size 2621440
Mar 6 09:53:25 netty kernel: [ 51.280425] [drm] fb depth is 16
Mar 6 09:53:25 netty kernel: [ 51.280431] [drm] pitch is 2560
Mar 6 09:53:25 netty kernel: [ 51.281252] fbcon: radeondrmfb (fb0) is primary device
Mar 6 09:53:25 netty kernel: [ 51.376835] radeon 0000:01:00.0: [drm] fb0: radeondrmfb frame buffer device
Mar 6 09:53:25 netty kernel: [ 51.377282] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0
But they seem to appear much later, long after my 5.16 kernel
would have stopped.
>
> That bug message also has the following which is different from yours:
> ii firmware-amd-graphics 20210818-1
> ...
> ii firmware-linux-nonfree 20210818-1
> ii firmware-misc-nonfree 20210818-1
>
> So it would be interesting to know whether installing any of those packages
> makes a difference. I'd suggest first installing the firmware-amd-graphics
> package.

I installed firmware-amd-graphics first: No difference.
Then firmware-linux-nonfree, which depends on firmware-misc-nonfree:
Again no difference.

> HTH,

Hoping the same, although doubtful ;-)

Best,
Petra

kernel-5.15-log-T41.txt

Diederik de Haas

Mar 6, 2022, 10:10:04 AM
Hi Petra,

On Sunday, 6 March 2022 11:23:57 CET Petra R.-P. wrote:
> > I saw that the 5.16.11-1 kernel transitioned to testing and it is useful
> > to know if the issue is still present with that version.
>
> Today's "apt-get dist-upgrade" did not yield anything kernel-related.

Do you have 'linux-image-686' (kernel metapackage) installed? If so you can use
'apt-cache policy linux-image-686' and that should also give a version table.
The 5.16.11 version comes in the linux-image-5.16.0-3-686 package.
The linux-signed-i386 package transitioned on 2022-03-04, so it should be there.
If you don't use 'deb.debian.org' in your /etc/apt/sources.list line(s), then
changing to that may help.

> > Because on my system the very first message of a boot in that file
> > begins with: [ 0.000000] Linux version 5.16.0-3-amd64 ...
>
> Yes, I understood that, and checked the file for such lines
> over and over again. Not a single trace of any "5.16",
> only "Linux version 5.15.0-3-686" everywhere.

Then I may have found a reason which would perfectly explain that and
that would also explain why the boot stops:

From your 5.15 boot log:
Mar 6 09:53:25 netty kernel: [ 4.493610] libata version 3.00 loaded.
Mar 6 09:53:25 netty kernel: [ 4.502635] ata_piix 0000:00:1f.1: version 2.13
Mar 6 09:53:25 netty kernel: [ 4.502656] ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
Mar 6 09:53:25 netty kernel: [ 4.535303] scsi host0: ata_piix
Mar 6 09:53:25 netty kernel: [ 4.548193] scsi host1: ata_piix
Mar 6 09:53:25 netty kernel: [ 4.548354] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1860 irq 14
Mar 6 09:53:25 netty kernel: [ 4.548398] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1868 irq 15
Mar 6 09:53:25 netty kernel: [ 4.713115] ata2.00: ATAPI: HL-DT-STDVD-ROM GDR8083N, 0K04, max UDMA/33
Mar 6 09:53:25 netty kernel: [ 4.713521] ata1.00: ATA-6: IC25N030ATMR04-0, MOAOAD4A, max UDMA/100
Mar 6 09:53:25 netty kernel: [ 4.713566] ata1.00: 58605120 sectors, multi 16: LBA
Mar 6 09:53:25 netty kernel: [ 4.724928] scsi 0:0:0:0: Direct-Access ATA IC25N030ATMR04-0 AD4A PQ: 0 ANSI: 5
Mar 6 09:53:25 netty kernel: [ 4.732350] scsi 1:0:0:0: CD-ROM HL-DT-ST DVD-ROM GDR8083N 0K04 PQ: 0 ANSI: 5
Mar 6 09:53:25 netty kernel: [ 4.797990] e1000 0000:02:01.0 eth0: (PCI:33MHz:32-bit) 00:0d:60:cf:80:2e
Mar 6 09:53:25 netty kernel: [ 4.798057] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
Mar 6 09:53:25 netty kernel: [ 4.816199] sd 0:0:0:0: [sda] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Mar 6 09:53:25 netty kernel: [ 4.816288] sd 0:0:0:0: [sda] Write Protect is off
Mar 6 09:53:25 netty kernel: [ 4.816328] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Mar 6 09:53:25 netty kernel: [ 4.816373] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This mostly overlaps with the screenshot you sent of the 5.16 boot,
with one critical difference: the screenshot does not show the
"libata version 3.00 loaded." line or the line following it (though it
does show the one after that), and the 5.15 log ends with the detection
of sda, which is missing from your screenshot!
With no HDD detected, the kernel cannot write a (kernel) log file to it,
and the rest of the OS cannot be loaded either.
(The detection of the NIC (e1000 driver) is also missing, but that seems
less relevant.)
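
A comparison like this can be done mechanically once both boots exist as text. The following is only a sketch, assuming a POSIX shell with sed, sort and comm; normalize_log and only_in_first are hypothetical helper names, and the 5.16 side would have to be typed off the photo, since nothing reached the disk:

```shell
#!/bin/sh
# normalize_log FILE
# Strips the syslog prefix ("Mar 6 09:53:25 host kernel: ") and the
# "[ 4.493610]" timestamp from each line, then sorts and de-duplicates,
# so that two boots can be compared message by message.
normalize_log() {
    sed -E -e 's/^[^[]*kernel: //' \
           -e 's/^\[ *[0-9]+\.[0-9]+\] //' "$1" | sort -u
}

# only_in_first GOOD BAD
# Prints the messages that appear in GOOD but not in BAD.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_log "$1" > "$a"
    normalize_log "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (file names are placeholders):
#   only_in_first kernel-5.15-log-T41.txt kernel-5.16-from-photo.txt
```

Messages that embed values changing from boot to boot will show up as differences too, so the output is a starting point for reading, not a verdict.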

I found commit 0188d5195b6705691fcc35046561e7ddf59ac626 in the 5.16.y branch
which links to upstream commit fda17afc6166e975bec1197bd94cd2a3317bce3f and
the commit message mentions:

Since many older drives
react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued
to read device log pages, avoid problems with older drives by limiting
the concurrent positioning ranges support detection to drives
implementing at least the ACS-4 ATA standard (major version 11). This
additional condition effectively turns ata_dev_config_cpr() into a nop
for older drives, avoiding problems in the field.

This commit is part of the 5.16.9 kernel.
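
To make the quoted condition concrete, here is a minimal Python sketch of the gating logic as the commit message describes it. The function name and the constant name are illustrative assumptions, not libata identifiers; only the threshold (major version 11 for ACS-4) and the "ATA-6" value reported for the IC25N030ATMR04-0 come from the commit message and the log above.

```python
# Illustrative model only: names are hypothetical, not kernel symbols.
ACS4_MAJOR_VERSION = 11  # "at least the ACS-4 ATA standard (major version 11)"


def cpr_detection_attempted(ata_major_version):
    """Return True if the concurrent positioning ranges log would be probed."""
    return ata_major_version >= ACS4_MAJOR_VERSION


# The T41 drive reports "ATA-6" in the 5.15 log, so the probe is skipped.
T41_DRIVE_MAJOR = 6
print(cpr_detection_attempted(T41_DRIVE_MAJOR))     # False
print(cpr_detection_attempted(ACS4_MAJOR_VERSION))  # True
```

If this reading is right, the commit removes one source of READ LOG commands for old drives; it does not by itself say anything about other code paths that read log pages.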

If updating to linux-image-5.16.0-3-686 (5.16.11) doesn't fix it, then I guess
the issue should be taken to the upstream kernel, but I don't know where
that should be done. I guess someone else who follows the list does, though :)

Good luck!

Diederik

Diederik de Haas

Mar 6, 2022, 5:30:03 PM
On Sunday, 6 March 2022 21:05:10 CET Petra R.-P. wrote:
> > Update for linux-image-5.16.0-3-686 :

Indeed. Somehow I missed it, sorry.

Upstream commit 68dbbe7d5b4fde736d104cbbc9a2fce875562012 [1]
seems potentially relevant (included in 5.16-rc1), but given ...

On Saturday, 5 March 2022 17:59:50 CET Petra R.-P. wrote:
> The "Read log 0x00 page 0x00 failed" line has disappeared,

... I don't know if building a new kernel with that commit reverted
would be enough.

From the above mentioned commit message it very much seems the logic
wrt slow drives was changed, but how exactly is 'above my pay grade'.
The new timeout (15s) seems to work for SATA 6.0 Gbps links, but it appears
the T41 uses PATA which I think is (quite a bit?) slower than that.

I did notice there were some big time gaps in your kernel log:
[ 4.548354] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1860 irq 14
[ 4.548398] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1868 irq 15
[ 4.713115] ata2.00: ATAPI: HL-DT-STDVD-ROM GDR8083N, 0K04, max UDMA/33
[ 4.713521] ata1.00: ATA-6: IC25N030ATMR04-0, MOAOAD4A, max UDMA/100
...
[ 6.786933] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[ 7.394550] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 7.793198] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
[ 8.107259] random: crng init done
[ 27.821039] fuse: init (API version 7.34)
[ 27.969483] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro. Quota mode: none.
...
[ 51.367346] Console: switching to colour frame buffer device 128x48
[ 51.376835] radeon 0000:01:00.0: [drm] fb0: radeondrmfb frame buffer device
[ 51.377282] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0
[ 152.603793] lib80211_crypt: unregistered algorithm 'NULL'
[ 155.456128] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
...
[ 159.808244] scsi 2:0:0:0: Direct-Access CBM Flash Disk 5.00 PQ: 0 ANSI: 2
[ 159.824472] sd 2:0:0:0: [sdb] 1992192 512-byte logical blocks: (1.02 GB/973 MiB)
[ 159.837887] sd 2:0:0:0: Attached scsi generic sg2 type 0
[ 159.860089] sd 2:0:0:0: [sdb] Write Protect is off
[ 159.867932] sd 2:0:0:0: [sdb] Mode Sense: 0b 00 00 08
[ 159.879826] sd 2:0:0:0: [sdb] No Caching mode page found
[ 159.886610] sd 2:0:0:0: [sdb] Assuming drive cache: write through
[ 159.900341] sdb: sdb1
[ 159.915825] sd 2:0:0:0: [sdb] Attached SCSI removable disk
[ 361.546715] EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
[ 361.568224] EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null). Quota mode: none.
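
The gaps can also be picked out mechanically. The following is a small sketch, not a tool used in this bug report: it assumes dmesg-style lines that start with a bracketed timestamp, treats "..." (or any other line without a timestamp) as a break so that no gap is computed across elided lines, and uses 15 seconds as the threshold because that is the timeout mentioned above. The function name and sample selection are mine.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold=15.0):
    """Return (previous, current, gap) for adjacent log lines further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if match is None:
            previous = None  # elided or non-log line: do not bridge across it
            continue
        current = float(match.group(1))
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, round(current - previous, 3)))
        previous = current
    return gaps


SAMPLE = """\
[ 8.107259] random: crng init done
[ 27.821039] fuse: init (API version 7.34)
...
[ 51.377282] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0
[ 152.603793] lib80211_crypt: unregistered algorithm 'NULL'
...
[ 159.915825] sd 2:0:0:0: [sdb] Attached SCSI removable disk
[ 361.546715] EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
""".splitlines()

GAPS = find_gaps(SAMPLE)
for before, after, gap in GAPS:
    print(f"{before:>12.6f} -> {after:>12.6f}  gap {gap:.3f} s")
```

Note that a gap in a log from a successful boot only shows that nothing was logged in that interval; it may equally be the system waiting for user input (for example before the USB stick was mounted).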

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=68dbbe7d5b4fde736d104cbbc9a2fce875562012

Diederik de Haas

Mar 7, 2022, 2:20:04 PM
Hi Damien,

On Sunday, 6 March 2022 23:26:03 CET Diederik de Haas wrote:
> Upstream commit 68dbbe7d5b4fde736d104cbbc9a2fce875562012 [1]
> seems potentially relevant (included in 5.16-rc1), but given ...
>
> On Saturday, 5 March 2022 17:59:50 CET Petra R.-P. wrote:
> > The "Read log 0x00 page 0x00 failed" line has disappeared,
>
> ... I don't know if building a new kernel with that commit reverted
> would be enough.
>
> From the above mentioned commit message it very much seems the logic
> wrt slow drives was changed, but how exactly is 'above my pay grade'.
> The new timeout (15s) seems to work for SATA 6.0 Gbps links, but it appears
> the T41 uses PATA which I think is (quite a bit?) slower than that.

In https://bugs.debian.org/1006149 we have 2 users where the 5.16 kernel fails
to boot, while the 5.15 kernel succeeded.
In the bug report there are a number of dmesg outputs posted which contain
time gaps of ~ 20, 100 and even 160 seconds; i.e. more than 15 seconds.
The commit message mentions 'SATA link up 6.0 Gbps', but these users have PATA
links for their HDD. IIUC that means a max speed of ~600MB/s vs ~133MB/s.
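
For reference, the arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope calculation under the usual assumptions: SATA uses 8b/10b encoding, and Ultra DMA mode 5 and mode 6 have nominal burst rates of 100 and 133 MB/s. The kernel log above reports "PATA max UDMA/100", so 100 MB/s would be the applicable figure for these machines. Whether link speed has any bearing on the timeout is not established here.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Payload rate in MB/s for a SATA link using 8b/10b encoding."""
    payload_bits_per_s = line_rate_gbps * 1e9 * 8 / 10  # 8 data bits per 10 line bits
    return payload_bits_per_s / 8 / 1e6


# Nominal Ultra DMA burst rates in MB/s (mode 5 and mode 6).
UDMA_MB_PER_S = {"UDMA/100": 100, "UDMA/133": 133}

sata = sata_payload_mb_per_s(6.0)
print(sata)                              # 600.0
print(sata / UDMA_MB_PER_S["UDMA/100"])  # 6.0
```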

As the author of the above mentioned commit, could you tell us whether it
*could* be that your commit is causing the boot failures? Or is it completely
irrelevant for this problem?

Cheers,
Diederik

Axel Beckert

Mar 7, 2022, 4:00:03 PM
Hi Damien,

one of the affected users here.

Damien Le Moal wrote:
> In the bug report, I did not see a dmesg output for a failed boot. But I
> guess it is because the user cannot capture it.

Correct.

> Could you have a look at this bug report for Fedora:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215519
>
> This Debian case does sound similar. The problem is fixed with commit:
>
> fda17afc6166 ("ata: libata-core: Fix ata_dev_config_cpr()")
>
> which is included in kernel 5.16.9.

Then it didn't help.

> Could the users try that kernel version to see if that fixes the
> issue ?

Both affected users already tried 5.16.11 and the issue is still
there.

Nevertheless thanks for your prompt reply and all the details so far!

Axel Beckert

Mar 8, 2022, 9:00:03 AM
Hi Damien,

Damien Le Moal wrote:
> > Damien Le Moal wrote:
> >> In the bug report, I did not see a dmesg output for a failed boot. But I
> >> guess it is because the user cannot capture it.
> >
> > Correct.
>
> Could you try taking a video of the boot messages ?

Did that now.

https://noone.org/debian/Bug-Reports/1006149/DSCN5255.MOV

(82 MB, no debugging yet)

Note that this is from an IBM ThinkPad A31
(https://www.thinkwiki.org/wiki/Category:A31), not from a ThinkPad
T41 as with the original bug reporter, i.e. it's even a bit older.

> However, since things are working with 5.15, I would like to
> understand what is going on and which part of the device scan is
> failing. For that, booting with loglevel=7 (debug) and having
> the kernel messages would help.

Ok, did another one, this time with "debug loglevel=7" added on the
kernel commandline by editing it in GRUB:

https://noone.org/debian/Bug-Reports/1006149/DSCN5259.MOV

(94 MB, with debugging)

HTH. (Sorry for all that zooming out at the beginning of the videos.
For some reason my digital camera doesn't seem to use the same zoom
setting or maybe resolution for videos as the one it uses for pictures
(and hence also for the preview before starting the recording).)

Petra R.-P.

Mar 8, 2022, 4:10:03 PM
Hello,

I am the "OP" with the T41 Thinkpads.

On Tue 08 Mar 2022 at 05:58:53 +0900 Damien Le Moal <damien...@opensource.wdc.com> wrote:

> Could you try taking a video of the boot messages ?

You can find my first attempt (only ≈12MB) here: https://prp.in-berlin.de/MAQ01257.MP4
I will leave it there for 2 weeks.

Not sure if I got the loglevel right and whether the image is
clear enough.

BTW: This is really a T41 Thinkpad; I'm using it with an
external monitor because its display is broken.

Best regards,
Petra

Petra R.-P.

Mar 9, 2022, 2:40:05 AM
Hello Damien,

Thanks for looking into the problem:

On Wed 09 Mar 2022 at 16:11:37 +0900 Damien Le Moal <damien...@opensource.wdc.com> wrote:

[...]
> Could you try this patch:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
[...]

I am very sorry, but I am a simple user, not a developer.
It would need a lot of tuition to make me accomplish that,
and I'm not sure I'd have enough time at the moment.

> Not sure if it will help, especially if you have a clean boot with 5.15
> ? If you do, could you also send me the full dmesg of a boot with 5.15
> kernel ? No video, dmesg output :)
>
> I suspect that my patch that increases the timeout for read log may be
> the cause for the "hang", but the root cause is that the laptop drive
> does not like read log commands (there are some drives like that).
> Before the patch, the failure was faster and somehow ignored. I would
> like to see if dmesg shows the failures with 5.15.

Okay, that's easier: See attached dmesg-T41-k5.15.txt .

HTH a little.

Best regards,
Petra

dmesg-T41-k5.15.txt

Diederik de Haas

Mar 9, 2022, 9:40:03 AM
Hi Petra and Axel,

On Wednesday, 9 March 2022 08:11:37 CET Damien Le Moal wrote:
> Could you try this patch:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 62eb9921cc94..525ce40b524d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3320,6 +3320,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>  			sd_read_block_limits(sdkp);
>  			sd_read_block_characteristics(sdkp);
>  			sd_zbc_read_zones(sdkp, buffer);
> +			sd_read_cpr(sdkp);
>  		}
>
>  		sd_print_capacity(sdkp, old_capacity);
> @@ -3329,7 +3330,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
>  		sd_read_app_tag_own(sdkp, buffer);
>  		sd_read_write_same(sdkp, buffer);
>  		sd_read_security(sdkp, buffer);
> -		sd_read_cpr(sdkp);
>  	}
>
>  	/*
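
What the patch changes is only the position of one call. The sketch below models that in Python, using the function names visible in the hunks. It rests on one assumption that the hunk itself does not show: that the block closed by the first "}" is the check for VPD page support, so that after the patch sd_read_cpr() is only reached for devices that support VPD pages. The Python function and its parameters are purely illustrative.

```python
def probe_order(supports_vpd, patched):
    """List the probe calls in the order the (modelled) code would make them."""
    steps = []
    if supports_vpd:  # assumed guard around the first hunk's block
        steps += [
            "sd_read_block_limits",
            "sd_read_block_characteristics",
            "sd_zbc_read_zones",
        ]
        if patched:
            steps.append("sd_read_cpr")  # new position, inside the guarded block
    steps.append("sd_print_capacity")
    steps += ["sd_read_app_tag_own", "sd_read_write_same", "sd_read_security"]
    if not patched:
        steps.append("sd_read_cpr")  # old position, reached unconditionally
    return steps


print("sd_read_cpr" in probe_order(supports_vpd=False, patched=False))  # True
print("sd_read_cpr" in probe_order(supports_vpd=False, patched=True))   # False
```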

On Wednesday, 9 March 2022 08:35:57 CET Petra R.-P. wrote:
> I am very sorry, but I am a simple user, not a developer.
> It would need a lot of tuition to make me accomplish that,
> and I'm not sure I'd have enough time at the moment.

If you save the above patch to e.g. 'fix-bug1006149.patch' and then follow
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s4.2.2
(after having followed 4.2.1 Preparation), then you should
get a kernel package with that patch applied.
That's relatively easy ... if you're a developer.
While I can understand that it would still be too complicated for Petra, it
should be doable for Axel (being a DD).
I could try to help further if needed.

HTH (a bit ;-))

Axel Beckert

Mar 9, 2022, 9:50:05 AM
Hi Damien,

Damien Le Moal wrote:
> > Ok, did another one, this time with "debug loglevel=7" added on the
> > kernel commandline by editing it in GRUB:
> >
> > https://noone.org/debian/Bug-Reports/1006149/DSCN5259.MOV
>
> Thanks for this. But unfortunately, it does not tell us much as to what
> is going on...

Ah, what a pity. Thanks for having a look anyways.

> Could you send me dmesg output of a clean boot with 5.15
> kernel ?

Sure. I already posted this to the Debian bug report at
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=2;bug=1006149;filename=5.15.15-2.dmesg.xz;msg=96
shortly before you got into the loop.

> Also, once booted in 5.15, install sg3-utils and run these commands:
>
> sudo sg_sat_read_gplog --log=0 --page=0 /dev/sdX
> sudo sg_sat_read_gplog --log=0 --page=0 --dma /dev/sdX

[I find it disturbing how often people just prepend "sudo" in
instructions when they actually mean "as root". Not everyone has sudo
installed — usually for good reasons.]

# sg_sat_read_gplog --log=0 --page=0 /dev/sda
00 0001 0000 0000 0002 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
08 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
10 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
18 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
20 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
28 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
30 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
38 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
40 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
48 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
50 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
58 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
60 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
68 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
70 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
78 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
80 0010 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
88 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
90 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
98 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
a0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
a8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
b0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
b8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
c0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
c8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
d0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
d8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
e0 0001 0001 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
e8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
f0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
f8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
# sg_sat_read_gplog --log=0 --page=0 --dma /dev/sda
ATA PASS-THROUGH (16), bad field in cdb
sg_sat_read_gplog failed: Illegal request
#
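
For readers who want to interpret the first dump: the sketch below parses it under the assumption that it is the general purpose log directory in its usual layout, i.e. word 0 holds a version and word N holds the number of pages of the log at address N, with the left-hand column giving the index of the first word in each row. The function name is mine, and the two log addresses checked at the end (0x30 for the IDENTIFY DEVICE data log, 0x47 for the concurrent positioning ranges log) are given to the best of my knowledge rather than taken from this thread.

```python
def parse_log_directory(dump):
    """Map word index -> value for every non-zero word in the hex dump."""
    words = {}
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a data row
        try:
            base = int(fields[0], 16)
            row = [int(word, 16) for word in fields[1:9]]
        except ValueError:
            continue  # prompt or error line
        for offset, value in enumerate(row):
            if value:
                words[base + offset] = value
    return words


# Only the rows with non-zero words from the output above.
SAMPLE = """\
00 0001 0000 0000 0002 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
80 0010 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
e0 0001 0001 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. ..
"""

PAGES = parse_log_directory(SAMPLE)
VERSION = PAGES.pop(0)  # word 0 is the directory version, not a log
for address in sorted(PAGES):
    print(f"log 0x{address:02x}: {PAGES[address]} page(s)")
print(0x30 in PAGES, 0x47 in PAGES)  # False False
```

Read this way, the drive answers the PIO variant of the command and advertises only a handful of logs.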

Since the last output might be a hardware-specific issue, here is some
more information on /dev/sda:

# fdisk -l /dev/sda
Disk /dev/sda: 149.05 GiB, 160041885696 bytes, 312581808 sectors
Disk model: SAMSUNG HM160HC
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000500cc

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1  *         63    993279    993217   485M 83 Linux
/dev/sda3        996030   2988089   1992060 972.7M 82 Linux swap / Solaris
/dev/sda4       2988090 312576704 309588615 147.6G  5 Extended
/dev/sda5       2988153  86874232  83886080    40G 83 Linux
# smartctl --info /dev/sda
smartctl 7.2 2020-12-30 r5155 [i686-linux-5.15.0-3-686-pae] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint M5
Device Model: SAMSUNG HM160HC
Serial Number: S12TJD0SA62821
LU WWN Device Id: 5 0f0000 003162821
Firmware Version: LQ100-10
User Capacity: 160’041’885’696 bytes [160 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 0
Transport Type: Parallel, Unknown (0x00f)
Local Time is: Wed Mar 9 15:25:44 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

HTH.

Petra R.-P.

Mar 9, 2022, 3:20:03 PM
For the sake of completeness ...

On Wed 09 Mar 2022 at 08:35:57 +0100 Petra R.-P. <de...@prp.in-berlin.de> wrote:

[...]
> Okay, that's easier: See attached dmesg-T41-k5.15.txt .

Adding dmesg-T41-b-k5.15.txt — corresponding file for the other
T41 notebook, which has slightly different hardware.

Best,
Petra
dmesg-T41-b-k5.15.txt

Francesco C

Mar 12, 2022, 11:50:04 AM
The same here with linux-image-5.16.0-4-686: different machine but
the same ATA controller (ata_piix).

linux-image-5.15.x were all fine; the same with custom kernels of the
5.15.x series. Custom kernels of the 5.16 series all fail to boot and
stop loading at the same point.

Petra R.-P.

Mar 12, 2022, 3:50:03 PM
On Sat 05 Mar 2022 at 17:59:51 +0100 Petra R.-P. <de...@prp.in-berlin.de> wrote:
> Update for linux-image-5.16.0-3-686 :
[...]

The error persists also in linux-image-5.16.0-4-686 (5.16.12-1) .

Petra

Petra Rübe-Pugliese

Mar 26, 2022, 12:30:03 PM
... and in linux-image-5.16.0-5-686 (5.16.14-1) ...

Petra

Salvatore Bonaccorso

Mar 30, 2022, 2:50:03 AM
Hi,
Thanks for constantly testing the new versions (there is 5.16.18-1
upcoming btw, or 5.17.1 in experimental).

Do we have a report upstream already about this issue?

Regards,
Salvatore

Petra R.-P.

Mar 30, 2022, 4:00:03 AM
On Wed 30 Mar 2022 at 08:38 +0200 Salvatore Bonaccorso <car...@debian.org> wrote:
[...]
> Thanks for constantly testing the new versions (there is 5.16.18-1
> upcoming btw, or 5.17.1 in experimental).

I'm just testing whatever is pulled in automatically by linux-image-686 (in bookworm).
>
> Do we have a report upstream already about this issue?

I don't know of anything that has not been reported to 100...@bugs.debian.org .

Best regards,
Petra

Diederik de Haas

Mar 30, 2022, 7:10:03 AM
On Wednesday, 30 March 2022 08:38:30 CEST Salvatore Bonaccorso wrote:
> Do we have a report upstream already about this issue?

What I tried to achieve with my involvement is determining where (exactly) the
problem is to determine which upstream to contact.
I (still) think the most likely culprit is in the ATA/disk subsystem, but the
progress stalled when the request came in to apply custom patches to the
kernel (which IIRC was primarily to gather further information).

There's quite a bit of information present in this bug report and a fresh set
of eyes could be quite beneficial :-)

Axel Beckert

Mar 30, 2022, 10:50:03 AM
Hi,

Diederik de Haas wrote:
> I (still) think the most likely culprit is in the ATA/disk subsystem, but the
> progress stalled when the request came in to apply custom patches to the
> kernel (which IIRC was primarily to gather further information).

Yep, and that's still on my TODO list.

Francesco C

Apr 2, 2022, 9:40:03 AM
linux-image-5.16.0-6-686 (5.16.18-1) still does not boot on a
different machine with the same controller.

Vanilla kernel 5.16.18 built with custom .config for the specific
machine fails to boot at the same point too.

I'm also trying to track stable patches in the stable tree queue
(https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/log/)
but it seems to me at the moment there is not any patch for ata_piix in
the 5.16.x branch, nor in the 5.17.x one.

Diederik de Haas

Apr 2, 2022, 12:40:03 PM
On Saturday, 2 April 2022 15:36:53 CEST Francesco C wrote:
> Vanilla kernel 5.16.18 built with custom .config for the specific
> machine fails to boot at the same point too.

As you appear to be familiar with kernel building, could you look at the
patches that Damien Le Moal posted in this bug report?

> I'm also trying to track stable patches in the stable tree queue
> (https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/log/)
> but it seems to me at the moment there is not any patch for ata_piix in
> the 5.16.x branch, nor in the 5.17.x one.

AFAICT, we're still at the stage of identifying the problem, so I don't expect
anything in the stable-queue until we're past that stage.

Francesco C

Apr 3, 2022, 9:30:03 AM
> As you appear to be familiar with kernel building, could you look at the
> patches that Damien Le Moal posted in this bug report?

Done ... same behaviour: as before, the only way to restart the
machine is by pressing the power button; Ctrl+Alt+Del and Alt+SysRq+b
do not work.

> AFAICT, we're still at the stage of identifying the problem, so I don't expect
> anything in the stable-queue until we're past that stage.

This is the diff between the ata_piix driver in 5.15 and 5.16:

--- linux-5.15/drivers/ata/ata_piix.c	2021-10-31 21:53:10.000000000 +0100
+++ linux-5.16/drivers/ata/ata_piix.c	2022-01-09 23:55:34.000000000 +0100
@@ -1085,14 +1085,16 @@
 	.set_dmamode = ich_set_dmamode,
 };

-static struct device_attribute *piix_sidpr_shost_attrs[] = {
-	&dev_attr_link_power_management_policy,
+static struct attribute *piix_sidpr_shost_attrs[] = {
+	&dev_attr_link_power_management_policy.attr,
 	NULL
 };

+ATTRIBUTE_GROUPS(piix_sidpr_shost);
+
 static struct scsi_host_template piix_sidpr_sht = {
 	ATA_BMDMA_SHT(DRV_NAME),
-	.shost_attrs = piix_sidpr_shost_attrs,
+	.shost_groups = piix_sidpr_shost_groups,
 };

 static struct ata_port_operations piix_sidpr_sata_ops = {

Diederik de Haas

Apr 3, 2022, 9:50:03 AM
On Sunday, 3 April 2022 15:16:06 CEST Francesco C wrote:
> > As you appear to be familiar with kernel building, could you look at the
> > patches that Damien Le Moal posted in this bug report?
>
> Done ... same behaviour: as before, the only way to restart the
> machine is by pressing the power button; Ctrl+Alt+Del and Alt+SysRq+b
> do not work.

Did you try the patch that Damien posted here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006149#131
Asking as you posted a different patch later in your reply.

I _think_ that Damien's patch is supposed to supply more information, not fix
the issue. I'm not sure whether the patch should be applied to the working
5.15 kernel or the non-working 5.16 kernel though.
In the former case a full dmesg output of a boot without the patch and one
with the patch should be attached to this bug report.
In the latter case there likely won't be a log file and in that case a video of
the (5.16) boot without and with the patch should hopefully give us more
information.

Damien: can you clarify whether the patch should be applied to the 5.15 or
5.16 kernel?

Francesco C

Apr 3, 2022, 10:20:03 AM
> Did you try the patch that Damien posted here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006149#131

Exactly ... when I write "same behaviour" I mean exactly the same :)

It stops at the same point as shown in the videos, with no further
messages ... with a vanilla kernel 5.16.18 with that patch applied.

I posted the diff of the ata_piix driver to highlight that maybe there
was a change in the structure of libata and subsequently in the
related drivers that I suspect (maybe) has broken something.

Axel Beckert

Apr 4, 2022, 1:20:03 AM
Hi Damien,

Damien Le Moal wrote:
> My hunch is that this drive simply reacts badly to read log commands
> and should be marked with a horkage to blacklist that command for
> it.

Hrm, do you really mean drive or controller? If you really mean "drive",
shall all who are affected by this send in all the affected HDD's
details? (Taking Petra back into Cc for that.)

I think you already have mine, but just to be sure, here's the
affected HDD's details:

# fdisk -l /dev/sda
Disk /dev/sda: 149.05 GiB, 160041885696 bytes, 312581808 sectors
Disk model: SAMSUNG HM160HC
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000500cc

[…]
# smartctl -i /dev/sda
smartctl 7.2 2020-12-30 r5155 [i686-linux-5.15.0-3-686-pae] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint M5
Device Model: SAMSUNG HM160HC
Serial Number: S12TJD0SA62821
LU WWN Device Id: 5 0f0000 003162821
Firmware Version: LQ100-10
User Capacity: 160’041’885’696 bytes [160 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 0
Transport Type: Parallel, Unknown (0x00f)
Local Time is: Mon Apr 4 07:02:54 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

> Cleanups for read log handling in 5.15 actually now use that command
> to check accesses to any log page. Meaning that prior to 5.15, the
> command was not issued for that drive.

I think you meant 5.16 in this paragraph. 5.15 works well. 5.16 is the
one which no longer works.
>
> I will send a patch to blacklist read log for that drive later today, to
> try. If that patch works, I will queue it and cc stable so that it gets
> backported.

I assume this still requires recompilation as this blacklist is
probably compiled into the kernel and not just a file in the
initramfs.
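
As a rough illustration of why such a blacklist needs a rebuild, the idea can be modelled as a table that is part of the code itself: a list of model-name patterns, each with a set of quirks, consulted before a command is issued. The Python below is only a sketch of that idea. The table layout, the function names and the quirk label "NO_READ_LOG" are all made up for this example and are not libata identifiers; the two model strings are taken from this bug report.

```python
from fnmatch import fnmatchcase

# Hypothetical quirk table: (model glob, set of quirk names).
# "NO_READ_LOG" is a made-up label for this sketch, not a libata flag.
QUIRKS = [
    ("SAMSUNG HM160HC", {"NO_READ_LOG"}),
]


def quirks_for(model):
    """Collect the quirks of every table entry whose glob matches the model."""
    found = set()
    for pattern, flags in QUIRKS:
        if fnmatchcase(model.strip(), pattern):
            found |= flags
    return found


def may_issue_read_log(model):
    """A drive may be sent read log commands unless a quirk forbids it."""
    return "NO_READ_LOG" not in quirks_for(model)


print(may_issue_read_log("SAMSUNG HM160HC"))   # False
print(may_issue_read_log("IC25N030ATMR04-0"))  # True
```

The second result also shows the limitation of a per-drive entry: a drive that is not listed is unaffected, which is why it matters whether all affected machines use the same drive model.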

Axel Beckert

Apr 4, 2022, 1:50:03 AM
Hi Damien,

Damien Le Moal wrote:
> > > I will send a patch to blacklist read log for that drive later today, to
> > > try. If that patch works, I will queue it and cc stable so that it gets
> > > backported.
> >
> > I assume this still requires recompilation as this blacklist is
> > probably compiled into the kernel and not just a file in the
> > initramfs.
>
> Correct. You will need to compile and install a kernel. Can you do
> that ?

Theoretically yes, but I actually have no idea how to cross-compile a
kernel package from amd64 to i386.

And I suspect that compiling the Debian kernel package on the device
itself will run out of resources with only 1 GB of RAM or at least
take ages (single-core Pentium 4 Mobile with 1.8 GHz). But I can try.
(Failed to test the last patch due to not finding time, though.)

Petra R.-P.

Apr 4, 2022, 3:10:03 AM
Just for the record:

On Mon 04 Apr 2022 at 14:49:05 +0900 Damien Le Moal <damien...@opensource.wdc.com> wrote:
[...]
> There will be no issue with your 1GB of ram. Given this amount of ram and
> CPU, step (4) will indeed take a while, but it will run fine.

My T41 Thinkpads only have 500MB of ram and little space left on
their hd. Added to that my inexperience and lack of time I'm
afraid I cannot be of much help for re-compiling kernels (which
AFAIK is quite a stress test for computers).
Sorry.

Petra

Axel Beckert

Apr 4, 2022, 3:50:03 AM
Hi Damien,

Damien Le Moal wrote:
> > My T41 Thinkpads only have 500MB of ram and little space left on
> > their hd. Added to that my inexperience and lack of time I'm
> > afraid I cannot be of much help for re-compiling kernels (which
> > AFAIK is quite a stress test for computers).
> > Sorry.
> >
> > Petra
>
> OK. We just need to find a Debian user volunteer for this :)
>
> Anyone ?

I'm on it. Still installing build-dependencies...

> I can always build a generic .deb from kernel source if needed, but not sure
> what patch sets the debian kernel adds, if any.

Might help for Petra. But I can also provide my .deb once it's built.

Axel Beckert

Apr 4, 2022, 4:10:03 AM
Hi Petra,

Petra R.-P. wrote:
> Sorry for the stupid question, but: How can I find out?
> What command shall I enter?

"fdisk -l /dev/sda" should suffice.

Look for a line like this:

Disk model: SAMSUNG HM160HC

It's the second line of output in my case.
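
If the output has been saved to a file or pasted somewhere, the relevant line can also be picked out programmatically. This is just a small convenience sketch with a function name of my own choosing; it assumes the "Disk model:" label exactly as printed by the fdisk version shown in this thread.

```python
def disk_model(fdisk_output):
    """Return the text after 'Disk model:' or None if the line is absent."""
    for line in fdisk_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Disk model":
            return value.strip()
    return None


SAMPLE = """\
Disk /dev/sda: 149.05 GiB, 160041885696 bytes, 312581808 sectors
Disk model: SAMSUNG HM160HC
Units: sectors of 1 * 512 = 512 bytes
"""

print(disk_model(SAMPLE))  # SAMSUNG HM160HC
```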

Petra R.-P.

Apr 4, 2022, 4:10:03 AM
On Mon 04 Apr 2022 at 16:39 +0900 Damien Le Moal <damien...@opensource.wdc.com> wrote:
[...]
> By the way, what is the drive connected on your Thinkpad ? Same as Axel, a
> Samsung HM160HC ? If it is a different drive, then patch 2 (big hammer) may
> be better as this may indicate that the adapter does not like read log
> commands...

Sorry for the stupid question, but: How can I find out?
What command shall I enter?
What part of the output are you interested in?

Petra

Petra R.-P.

Apr 4, 2022, 5:00:03 AM
On Mon 04 Apr 2022 at 10:05 +0200 Axel Beckert <a...@debian.org> wrote:
[...]
> "fdisk -l /dev/sda" should suffice.


,-----[ 1st T41 ]-----------------------------------------------------------
| ~ > fdisk -l /dev/sda
| Disk /dev/sda: 27.95 GiB, 30005821440 bytes, 58605120 sectors
| Disk model: IC25N030ATMR04-0
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: dos
| Disk identifier: 0xcaef08be
|
| Device     Boot    Start      End  Sectors  Size Id Type
| /dev/sda1  *          63 56098979 56098917 26.8G 83 Linux
| /dev/sda2       56098980 58605119  2506140  1.2G  5 Extended
| /dev/sda5       56099043 58605119  2506077  1.2G 82 Linux swap / Solaris
| ~ >
`--------------------------------------------------------------------------


,-----[ 2nd T41 ]-----------------------------------------------------------
| ~ > fdisk -l /dev/sda
| Disk /dev/sda: 37.26 GiB, 40007761920 bytes, 78140160 sectors
| Disk model: HTS548040M9AT00
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 512 bytes
| I/O size (minimum/optimal): 512 bytes / 512 bytes
| Disklabel type: dos
| Disk identifier: 0x000942b7
|
| Device     Boot    Start      End  Sectors  Size Id Type
| /dev/sda1  *          63 75119939 75119877 35.8G 83 Linux
| /dev/sda2       75119940 78140159  3020220  1.4G  5 Extended
| /dev/sda5       75120003 78140159  3020157  1.4G 82 Linux swap / Solaris
| ~ >
`--------------------------------------------------------------------------

Looks rather "no name" to me ...

Petra

Petra R.-P.

Apr 4, 2022, 5:50:02 AM
On Mon 04 Apr 2022 at 18:09:59 +0900 Damien Le Moal <damien...@opensource.wdc.com> wrote:
[...]
> hdparm -I /dev/sdX

,-----[ 1st T41 ]-------------------------------------------------------------
| netty:~# hdparm -I /dev/sda
|
| /dev/sda:
|
| ATA device, with non-removable media
| Model Number: IC25N030ATMR04-0
| Serial Number: MRG2H0KBCYTR8R
| Firmware Revision: MOAOAD4A
| Standards:
| Used: ATA/ATAPI-6 T13 1410D revision 3a
| Supported: 6 5 4
| Configuration:
| Logical max current
| cylinders 16383 16383
| heads 15 15
| sectors/track 63 63
| --
| CHS current addressable sectors: 15481935
| LBA user addressable sectors: 58605120
| Logical/Physical Sector size: 512 bytes
| device size with M = 1024*1024: 28615 MBytes
| device size with M = 1000*1000: 30005 MBytes (30 GB)
| cache/buffer size = 1740 KBytes (type=DualPortCache)
| Capabilities:
| LBA, IORDY(can be disabled)
| Standby timer values: spec'd by Vendor, no device specific minimum
| R/W multiple sector transfer: Max = 16 Current = 16
| Advanced power management level: 128
| Recommended acoustic management value: 128, current value: 254
| DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
| Cycle time: min=120ns recommended=120ns
| PIO: pio0 pio1 pio2 pio3 pio4
| Cycle time: no flow control=240ns IORDY flow control=120ns
| Commands/features:
| Enabled Supported:
| * SMART feature set
| Security Mode feature set
| * Power Management feature set
| * Write cache
| * Look-ahead
| * Host Protected Area feature set
| * WRITE_BUFFER command
| * READ_BUFFER command
| * NOP cmd
| * Advanced Power Management feature set
| Power-Up In Standby feature set
| * SET_FEATURES required to spinup after power up
| Address Offset Reserved Area Boot
| SET_MAX security extension
| Automatic Acoustic Management feature set
| * Device Configuration Overlay feature set
| * Mandatory FLUSH_CACHE
| * SMART error logging
| * SMART self-test
| Security:
| Master password revision code = 65534
| supported
| not enabled
| not locked
| frozen
| not expired: security count
| not supported: enhanced erase
| 26min for SECURITY ERASE UNIT.
| HW reset results:
| CBLID- above Vih
| Device num = 0 determined by the jumper
| Checksum: correct
| netty:~#
`----------------------------------------------------------------------------



,-----[ 2nd T41 ]-------------------------------------------------------------
| negrito:~# hdparm -I /dev/sda
|
| /dev/sda:
|
| ATA device, with non-removable media
| Model Number: HTS548040M9AT00
| Serial Number: MRL202L2K9PJ6B
| Firmware Revision: MG2OA5BA
| Standards:
| Used: ATA/ATAPI-6 T13 1410D revision 3a
| Supported: 6 5 4
| Configuration:
| Logical max current
| cylinders 16383 16383
| heads 16 16
| sectors/track 63 63
| --
| CHS current addressable sectors: 16514064
| LBA user addressable sectors: 78140160
| Logical/Physical Sector size: 512 bytes
| device size with M = 1024*1024: 38154 MBytes
| device size with M = 1000*1000: 40007 MBytes (40 GB)
| cache/buffer size = 7877 KBytes (type=DualPortCache)
| Capabilities:
| LBA, IORDY(can be disabled)
| Standby timer values: spec'd by Vendor, no device specific minimum
| R/W multiple sector transfer: Max = 16 Current = 16
| Advanced power management level: 192
| Recommended acoustic management value: 128, current value: 254
| DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
| Cycle time: min=120ns recommended=120ns
| PIO: pio0 pio1 pio2 pio3 pio4
| Cycle time: no flow control=240ns IORDY flow control=120ns
| Commands/features:
| Enabled Supported:
| * SMART feature set
| Security Mode feature set
| * Power Management feature set
| * Write cache
| * Look-ahead
| * Host Protected Area feature set
| * WRITE_BUFFER command
| * READ_BUFFER command
| * NOP cmd
| * Advanced Power Management feature set
| Power-Up In Standby feature set
| * SET_FEATURES required to spinup after power up
| Address Offset Reserved Area Boot
| SET_MAX security extension
| Automatic Acoustic Management feature set
| * Device Configuration Overlay feature set
| * Mandatory FLUSH_CACHE
| * SMART error logging
| * SMART self-test
| Security:
| Master password revision code = 65534
| supported
| not enabled
| not locked
| frozen
| not expired: security count
| not supported: enhanced erase
| 26min for SECURITY ERASE UNIT.
| HW reset results:
| CBLID- above Vih
| Device num = 0 determined by the jumper
| Checksum: correct
| negrito:~#
`----------------------------------------------------------------------------

Petra

Francesco C

Apr 4, 2022, 10:50:03 AM
Axel Beckert wrote:

> ... but I actually have no idea how to cross-compile a
> kernel package from amd64 to i386.

The 64-bit compiler can natively build 32-bit binaries through the
-m32 flag, so you just have to install the kernel build dependencies
and multilib support, then change into the kernel source tree and run:

make ARCH=i386 menuconfig      (then load your kernel config from elsewhere)
make ARCH=i386 -j<number_of_cores> bindeb-pkg

You will obtain linux-image-<version_progressive-number>_i386.deb and
linux-headers-<version_progressive-number>_i386.deb for the 32-bit
architecture.
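
As a rough sketch, the two make steps above can be wrapped in a small
script. Note the assumptions: KSRC, OLDCONFIG, JOBS and DRY_RUN are
hypothetical variable names chosen only for illustration, and copying
an existing config followed by "make olddefconfig" is used here as a
non-interactive substitute for loading the config through menuconfig.
With DRY_RUN=1 (the default) the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch: build 32-bit kernel .deb packages on an amd64 host.
# All variable names below are illustrative assumptions, not from the thread.

KSRC=${KSRC:-.}               # path to the kernel source tree
OLDCONFIG=${OLDCONFIG:-}      # optional existing kernel config to start from
JOBS=${JOBS:-$(nproc)}        # number of parallel make jobs
DRY_RUN=${DRY_RUN:-1}         # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_i386_debs() {
    # Non-interactive alternative to "menuconfig + load config":
    # copy the old config and let kbuild fill in defaults for new options.
    if [ -n "$OLDCONFIG" ]; then
        run cp "$OLDCONFIG" "$KSRC/.config"
    fi
    run make -C "$KSRC" ARCH=i386 olddefconfig
    run make -C "$KSRC" ARCH=i386 -j"$JOBS" bindeb-pkg
}
```

Calling build_i386_debs with DRY_RUN=1 lets you check the command
lines first; set DRY_RUN=0 to actually start the build.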

When you install the headers on the native 32-bit machine, dkms will
not be able to build external modules, because the headers' helper
programs "fixdep" and "genksyms" are not cross-compiled: they are
built for the x86_64 build host. They can simply be replaced by
copying their 32-bit versions from an already installed 32-bit headers
package (e.g. /usr/src/linux-headers-5.16.0-6-common/scripts/basic/fixdep,
/usr/src/linux-headers-5.16.0-6-common/scripts/genksyms/genksyms).
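
To check which helper binaries were built for the wrong word size, one
can look at the ELF class byte (offset 4 of the file header: 1 means
32-bit, 2 means 64-bit). The following is only a sketch; elf_class is
a hypothetical helper name, and it tells you the word size, not the
exact architecture.

```shell
#!/bin/sh
# elf_class FILE: print 32 or 64 according to the ELF class byte,
# or "not-elf" if FILE does not start with the ELF magic number.
elf_class() {
    # Bytes 0-3 of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != 7f454c46 ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 4 (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}
```

For example, "elf_class scripts/basic/fixdep", run inside the installed
headers directory, should print 32 once the binary has been replaced.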

modpost is harder to replace: I worked around it by installing
qemu-x86_64 in the 32-bit environment, putting the 64-bit
libc-2.3x.so and ld.so somewhere, and finally replacing in
scripts/Makefile.modpost the line

MODPOST = scripts/mod/modpost

with

MODPOST = qemu-x86_64 -L <location_of_libc.so_and_ld.so> scripts/mod/modpost

You can now build your modules with :

dkms autoinstall -k <kernel-version>

It's a workaround that I use because I don't really want to set up a
native 32-bit build host.

Sooooo ... is the solution to this bug to not permit reading the
driver's log when attached to an ICH4 controller? If that's the case,
you will have to blacklist all machines with that controller, not only
"Thinkpad"-branded ones. I suspect that the controller was the basis
for a whole range of differently branded machines of a certain period
(mine is an Acer).

P.S. for readers: next time, please sanitize the serial numbers of the
drives :)

Francesco C

Apr 6, 2022, 10:10:03 AM
Hi people !

Big news here :

- I didn't apply the first patch since the disk drive installed in my
machine is exactly an "old" Western Digital :)

> hdparm -I /dev/sda
!
! /dev/sda:
!
! ATA device, with non-removable media
! Model Number: WDC WD1200BEVE-00WZT0
.....
___________________________________________

(The original disk was a Seagate Momentus)

- As expected, the Big Hammer Patch does not work, at least in my
case, since my machine is an Acer-branded Pentium M with the ICH4-M
controller ... so I applied a "bigger hammer patch" (I flagged all
ICH4 family chips with "ich_pata_100_nolog") and ... this does not
work either :)

Same behaviour as before :)

Francesco C

Apr 7, 2022, 10:20:03 AM
Hello everyone

I've also tested a patch based on
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006149#260,
modified simply by adding the disk drive model installed in my machine
to the blacklist, but the final result is exactly the same ... the
computer always stops at the same point and does not complete the
boot process :)

Francesco C

Apr 7, 2022, 5:40:04 PM
I've also tested linux-5.17.1 from experimental but, as expected, it
does not boot :)

It would be interesting to know whether any 32-bit Pentium M based
platform with an ICH4 controller boots with any kernel in the range
from 5.16.0 to 5.17.1.

Anyway, I think I will stay with my custom kernel, since Linux 5.15.x
is a long-term version :)

Petra R.-P.

Apr 9, 2022, 2:30:03 AM
... and in linux-image-5.16.0-6-686 (5.16.18-1) ...

Petra

Francesco C

Apr 11, 2022, 6:10:03 PM
Damien Le Moal wrote :

> I posted another patch which disables the read log command for the adapter
> instead of just for your disk. Can you try that one too ?

It did not work in my case ... and blacklisting the entire ICH4 class
of PCI adapters did not work either.

Blacklisting components and/or adding exceptions is not good practice
for solving problems. In my opinion, the line from the first post:

[ 4.xxxxxx] ata1.00: Read log 0x00 page 0x00 failed Emask 0x1

may simply indicate that there is nothing to read, since the kernel
does not yet have any access to the disk ...

Anyway, this reminds me of another kernel version, starting from which
most x86 32-bit systems did not shut down correctly. You would see
the screen go black and hear the disks park their heads and stop, but
the power LED staying on or the fan still running said the opposite.

This was true for every kernel ( debian-flavoured, vanilla , RT ...)

Then a patch appeared on the LKML:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7aa3dcad2175..f88bf3c77fc0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
 	return 0;
 }
 module_param(off, int, 0444);
-core_initcall(cpufreq_core_init);
+late_initcall(cpufreq_core_init);

With it applied, the systems shut down properly again with every
kernel (vanilla, deb, RT) ... but a subsequent kernel version seemed
to restore the shutdown without the need to apply that patch.

Problem solved ?

Partially, because RT kernels (Debian-flavoured RT, vanilla + RT,
custom RT) still do not shut down correctly, not even with that patch,
although they did with it before "they solved the problem".

Why an RT kernel on a 2003 Pentium M machine? Because it is able, in a
non-professional (read: CHEAP) way, to record real-time audio at
48 kHz without crackling, while a newer, 64-bit, multicore and more
expensive (but still not professional) machine is not.

I am very sorry for ranting, but thank you for your patience in reading this.

Francesco C

Apr 27, 2022, 6:10:04 PM
Hi ,

The 5.16 series is EOL, so I've just continued testing with vanilla
linux-5.17.4 and linux-5.17.5: neither boots, and both stop at the
same point as indicated in the messages above, _but_ ... something
strange is also happening with the longterm 5.15 series, since
versions 5.15.35 and 5.15.36 do not boot either.

The kernels are both custom but the config for both versions is
exactly the same.

The following lines are the last ones appearing on the screen:

[ 3.736342] ima: No TPM chip found, activating TPM-bypass!
[ 3.736360] ima: Allocated hash algorithm: sha512
[ 3.736401] ima: No architecture policies found
[ 3.736429] evm: Initialising EVM extended attributes:
[ 3.736431] evm: security.selinux
[ 3.736433] evm: security.SMACK64 (disabled)
[ 3.736435] evm: security.SMACK64EXEC (disabled)
[ 3.736437] evm: security.SMACK64TRANSMUTE (disabled)
[ 3.736438] evm: security.SMACK64MMAP (disabled)
[ 3.736440] evm: security.apparmor
[ 3.736442] evm: security.ima
[ 3.736443] evm: security.capability
[ 3.736445] evm: HMAC attrs: 0x1 <------ They stop booting here

Not the same point as in 5.16 and 5.17 cases , but with the same result.

So at the moment , at least in my case , the last working kernel is
version 5.15.34

Salvatore Bonaccorso

Apr 28, 2022, 4:10:06 AM
Hi,

On Thu, Apr 28, 2022 at 12:04:50AM +0200, Francesco C wrote:
> Hi ,
>
> The 5.16 series is EOL, so I've just continued testing with vanilla
> linux-5.17.4 and linux-5.17.5: neither boots, and both stop at the
> same point as indicated in the messages above, _but_ ... something
> strange is also happening with the longterm 5.15 series, since
> versions 5.15.35 and 5.15.36 do not boot either.

Now this probably gives us a good hint!

https://bugzilla.kernel.org/show_bug.cgi?id=215909

d6b88ce2eb9d ("ACPI: processor idle: Allow playing dead in C3 state")
and a followup commit bfe55a1f7fd6 ("ACPI: processor: idle: fix lockup
regression on 32-bit ThinkPad T40") were both applied to 5.15.35. I
suspect the former is the one which causes the regression for you as
well.

If someone can check whether reverting the commit d6b88ce2eb9d helps,
your problem might need a solution similar to the one applied for
the ThinkPad T40.

Regards,
Salvatore

Francesco C

Apr 28, 2022, 6:40:03 PM
Hi ,

reverting d6b88ce2eb9d ("ACPI: processor idle: Allow playing dead in
C3 state") made kernel version 5.15.35 boot, but in my case kernel
version 5.17.5 still does not.

The latter has changed behaviour: now the kernel seems to boot (at
least it does not freeze, and the key combination to reboot the
machine works), but it does not detect the ATA controller at all and,
after waiting without success for the root partition to be detected,
it drops to the initramfs emergency console.

I'd like to do some more tests to be sure that it is really a kernel
boot failure and not an error I've made by messing up something else:
anyway, the config includes the ata_generic and ata_piix modules.

Just a note about d6b88ce2eb9d: reading the commit message, you can
see that it concerns AMD CPUs _only_; yet it was introduced without
even a conditional check!!

Francesco C

Apr 29, 2022, 5:30:03 PM
Hi

I've done more tests (basically recompiled and installed 2 times the
same kernel) with linux-5.17.5 but I am still having no luck : I can't
figure out why it does not detect the controller ... I've checked the
config and the drivers for the controller (ata_piix , ata_generic )
are built but they are not loaded at the initramfs stage. I've tried
to put them explicitly in the initramfs via
/etc/initramfs-tools/modules but nothing worked.

So I decided to give a try to linux 5.18-rc4 : I've reverted the
commit d6b88ce2eb9d , recompiled , installed in the target machine ,
rebooted and BINGO !!

The system boots with almost the same config used for compiling linux 5.17.5 :)

uname -a
Linux <myhost> 5.18.0-rc4 #1 PREEMPT Fri Apr 29 11:12:55 CEST 2022
i686 GNU/Linux

cat /proc/version
Linux version 5.18.0-rc4 (<user>@<64bit_hostname>) (gcc (Debian
11.3.0-1) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #1 PREEMPT
Fri Apr 29 11:12:55 CEST 2022

The deb package I built is basically a kernel optimized for my
machine, with only the modules for the internal devices and the
external components I have, so that building a new kernel is quick;
it's not suitable for other systems.

For anyone who would like to build standard Debian-flavoured kernels
more quickly on a faster machine: in the past I ran a 32-bit build
host as a virtual machine on my 64-bit machine to build the deb
packages I needed for my 32-bit system (basically a minimal 32-bit
Debian unstable installation in VirtualBox). It was, and in my
opinion still is, the easiest way to get a build host for 32-bit
packages in a 64-bit environment, because I found setting up a full
cross-compile environment more complicated.

Petra R.-P.

Apr 30, 2022, 3:30:03 AM
... and in linux-image-5.17.0-1-686 (5.17.3-1) ...

although the last messages visible on screen are now different;
in fact the last one is:
clocksource: Switched to clocksource acpi_pm

Petra

Francesco C

May 12, 2022, 6:40:03 PM
Hi

After updating linux-image-686 to the latest version (5.17.0-2-686),
my machine finally boots successfully. The problem should also be
solved for other people affected by this bug.

P.S. : For other people interested , starting from linux-5.18-rc5-rt4
and beyond ( latest I've built is linux-5.18-rc6-rt7) my system is
shutting down correctly in both power off and restart modes. And I
suppose it should be the same for other 686 class users that are
running a RT kernel :)

Diederik de Haas

May 12, 2022, 8:40:03 PM
Hi Petra and Axel,

On Thursday, 28 April 2022 09:56:29 CEST Salvatore Bonaccorso wrote:
> Now this probably gives us a good hint!
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215909
>
> d6b88ce2eb9d ("ACPI: processor idle: Allow playing dead in C3 state")
> and a followup commit bfe55a1f7fd6 ("ACPI: processor: idle: fix lockup
> regression on 32-bit ThinkPad T40") were both applied to 5.15.35. I
> suspect the former is the one which causes the regression for you as
> well.

It was a good hint, but apparently not (yet) the solution.
Francesco reported that the issue was fixed for him in 5.17.0-2-686 which
corresponds to 5.17.6 upstream kernel.
Petra & Axel: can you verify whether it is also fixed for you?
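
As a side note, the upstream version can be read off the Debian
package version by dropping the Debian revision (5.17.6-1 becomes
5.17.6), whereas the package name only carries the ABI name
(5.17.0-2-686). The following is a minimal sketch of that mapping;
upstream_version is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# upstream_version DEBVER: strip an optional "epoch:" prefix and the
# trailing "-revision" from a Debian package version string.
upstream_version() {
    v=${1#*:}        # drop the epoch, if any
    echo "${v%-*}"   # drop the Debian revision, if any
}

# Possible use on an installed system (not executed here):
#   upstream_version "$(dpkg-query -W -f='${Version}' linux-image-5.17.0-2-686)"
```

This only manipulates the version string; it does not verify that the
package is actually installed or that it is the running kernel.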

Looking through the changes between 5.17.5 and 5.17.6, I found the following:

- 14defb873c1dc4cef1e7e7951f47f019821734fc titled:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
Secondary commit message:
" commit 20e582e16af24b074e583f9551fad557882a3c9d upstream.

This reverts commit bfe55a1f7fd6bfede16078bf04c6250fbca11588.

This was presumably misdiagnosed as an inability to use C3 at
all when I suspect the real problem is just misconfiguration of
C3 vs. ARB_DIS.
"
- b3b0ca1c324982fcc005063af045439670e16aa3 titled:
ACPI: processor: idle: Avoid falling back to C3 type C-states
Secondary commit message:
" commit fc45e55ebc58dbf622cb89ddbf797589c7a5510b upstream.

The "safe state" index is used by acpi_idle_enter_bm() to avoid
entering a C-state that may require bus mastering to be disabled
on entry in the cases when this is not going to happen. For this
reason, it should not be set to point to C3 type of C-states, because
they may require bus mastering to be disabled on entry in principle.

This was broken by commit d6b88ce2eb9d ("ACPI: processor idle: Allow
playing dead in C3 state") which inadvertently allowed the "safe
state" index to point to C3 type of C-states.

This results in a machine that won't boot past the point when it first
enters C3. Restore the correct behaviour (either demote to C1/C2, or
use C3 but also set ARB_DIS=1).

I hit this on a Fujitsu Siemens Lifebook S6010 (P3) machine.
"

The rest of the commits between .5 and .6 don't appear relevant for this bug
to *me*, but I can ofc be wrong.

Cheers,
Diederik

Petra R.-P.

May 13, 2022, 3:50:03 AM
Hello Diederik,

Thanks for the good news:

On Fri 13 May 2022 at 02:19:44 +0200 Diederik de Haas <didi....@cknow.org> wrote:

[...]

> Francesco reported that the issue was fixed for him in 5.17.0-2-686 which
> corresponds to 5.17.6 upstream kernel.
> Petra & Axel: can you verify whether it is also fixed for you?

I downloaded
http://ftp.de.debian.org/debian/pool/main/l/linux-signed-i386/linux-image-5.17.0-2-686_5.17.6-1_i386.deb
and installed it with "dpkg -i" on both T41 Thinkpads, and they
both rebooted (and are now working) flawlessly :-))

Thanks to all who have contributed to this solution!

Best regards,
Petra

Salvatore Bonaccorso

May 13, 2022, 11:30:04 AM
Source: linux
Source-Version: 5.17.6-1

Awesome, thanks for confirming!

Regards,
Salvatore

Axel Beckert

May 13, 2022, 11:50:03 AM
Hi,

Salvatore Bonaccorso wrote:
> > > Francesco reported that the issue was fixed for him in 5.17.0-2-686 which
> > > corresponds to 5.17.6 upstream kernel.
> > > Petra & Axel: can you verify whether it is also fixed for you?
[…]
> > and installed it with "dpkg -i" on both T41 Thinkpads, and they
> > both rebooted (and are now working) flawlessly :-))
> >
> > Thanks to all who have contributed to this solution!
>
> Awesome, thanks for confirming!

My Thinkpad A31 also works again, thanks!

(Had it off since the last test and it first had to apply about 2 GB
of updates, so it took a while until I could reboot.)