Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

with kernel 6.4 update, boot fails drops into maintenance

10 views
Skip to first unread message

Riccardo Mottola

unread,
Jul 31, 2023, 6:30:03 AM7/31/23
to
Hi,

I have a Fire T2000 which has been stable for my usage for months.
Now I upgraded kernel (and userland) in debian to:

Yesterday, when hooking up a serial console, i was getting asked for
root password and dropped into maintenance mode.
Today I hooked up better through a console with minicom to be able to
copy the error, but I get these errors again and again:


[   17.134833] OOPS: RPC [000000000042ce10]
[   17.134843] OOPS: RPC <arch_cpu_idle+0x70/0xc0>
[   17.134875] OOPS: Fault was to vaddr[bc0]
[   17.134885] CPU: 25 PID: 0 Comm: swapper/25 Tainted: G D           
6.4.0-1-sparc64-smp #1  Debian 6.4.4-2
[   17.134908] Call Trace:
[   17.134915] [<0000000000de9614>] dump_stack+0x8/0x18
[   17.134948] [<0000000000df547c>] do_sparc64_fault+0x4bc/0xb00
[   17.134979] [<0000000000407714>] sparc64_realfault_common+0x10/0x20
[   17.135017] [<0000000000000bc0>] 0xbc0
[   17.135034] [<0000000000dea050>] default_idle_call+0x30/0x120
[   17.135067] [<00000000004c32f8>] do_idle+0xd8/0x1a0
[   17.135095] [<00000000004c3684>] cpu_startup_entry+0x24/0x40
[   17.135123] [<0000000000440f28>] smp_callin+0xe8/0x120
[   17.135145] [<00000000010d8c14>] 0x10d8c14
[   17.135162] [<0000000000000000>] 0x0
[   17.135179] Unable to handle kernel NULL pointer dereference
[   17.135187] tsk->{mm,active_mm}->context = 0000000000000041
[   17.135197] tsk->{mm,active_mm}->pgd = ffff800017270000
[   17.135207]               \|/ ____ \|/
[   17.135207]               "@'/ .. \`@"
[   17.135207]               /_| \__/ |_\
[   17.135207]                  \__U_/
[   17.135216] swapper/25(0): Oops [#49]
[   17.135230] CPU: 25 PID: 0 Comm: swapper/25 Tainted: G D           
6.4.0-1-sparc64-smp #1  Debian 6.4.4-2
[   17.135253] TSTATE: 0000004411001605


so something worsened! darn! this looks as back when all kernels failed
to boot on the Niagara.
I set up minicom to log to a file and see see if I can catch something
before the crashes.

A reboot... shows this again:
  Booting `Debian GNU/Linux'

Loading Linux 6.4.0-1-sparc64-smp ...
Loading initial ramdisk ...

/dev/sda2: clean, 429184/4276224 files, 8886587/17089844 blocks
[   42.149954] /dev/sda: Can't open blockdev
[FAILED] Failed to mount boot.mount - /boot.
[DEPEND] Dependency failed for local-fs.target - Local File Systems.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, or "exit"
to continue bootup.
Give root password for maintenance
(or press Control-D to continue):

dmesg shows:
[   42.149954] /dev/sda: Can't open blockdev
[   42.745098] Kernel unaligned access at TPC[850ebc]
aa_dfa_unpack+0x3c/0x5a0
[   42.745167] Kernel unaligned access at TPC[850ed0]
aa_dfa_unpack+0x50/0x5a0
[   42.745207] Kernel unaligned access at TPC[850f68]
aa_dfa_unpack+0xe8/0x5a0
[   42.745254] Kernel unaligned access at TPC[850f68]
aa_dfa_unpack+0xe8/0x5a0
[   42.745298] Kernel unaligned access at TPC[850f68]
aa_dfa_unpack+0xe8/0x5a0
[   42.760418] audit: type=1400 audit(1690795782.060:2):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="nvidia_modprobe" pid=631 comm="apparmor_parser"
[   42.760469] audit: type=1400 audit(1690795782.060:3):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="nvidia_modprobe//kmod" pid=631 comm="apparmor_"
[   42.764485] audit: type=1400 audit(1690795782.064:4):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lsb_release" pid=630 comm="apparmor_parser"
[   42.773434] audit: type=1400 audit(1690795782.072:5):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="/usr/bin/man" pid=633 comm="apparmor_parser"
[   42.773480] audit: type=1400 audit(1690795782.072:6):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="man_filter" pid=633 comm="apparmor_parser"
[   42.773509] audit: type=1400 audit(1690795782.072:7):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="man_groff" pid=633 comm="apparmor_parser"
[   42.904068] audit: type=1400 audit(1690795782.204:8):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="/usr/lib/NetworkManager/nm-dhcp-client.action""
[   42.904121] audit: type=1400 audit(1690795782.204:9):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=63"
[   42.904153] audit: type=1400 audit(1690795782.204:10):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="/usr/lib/connman/scripts/dhclient-script" pid"
[   42.904183] audit: type=1400 audit(1690795782.204:11):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="/{,usr/}sbin/dhclient" pid=632 comm="apparmor"
[   42.960067] RPC: Registered named UNIX socket transport module.
[   42.960093] RPC: Registered udp transport module.
[   42.960108] RPC: Registered tcp transport module.
[   42.960121] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   43.603086] ------------[ cut here ]------------
[   43.603111] WARNING: CPU: 4 PID: 668 at kernel/irq/msi.c:196
msi_domain_free_descs+0xf4/0x120
[   43.603170] Modules linked in: sunrpc binfmt_misc libdes sg flash drm
drm_panel_orientation_quirks i2c_core dm_mod fuse loop configfs
ip_tables x_tables autofs4 ext4 crc1n
[   43.603683] CPU: 4 PID: 668 Comm: ip Not tainted 6.4.0-1-sparc64-smp
#1  Debian 6.4.4-2
[   43.603715] Call Trace:
[   43.603727] [<0000000000de9614>] dump_stack+0x8/0x18
[   43.603771] [<000000000046e208>] __warn+0xc8/0x120
[   43.603800] [<000000000046e2c0>] warn_slowpath_fmt+0x60/0x100
[   43.603827] [<00000000004f96b4>] msi_domain_free_descs+0xf4/0x120
[   43.603866] [<00000000004f9b60>]
msi_domain_free_msi_descs_range+0x20/0x40
[   43.603903] [<0000000000a1e760>] pci_msi_teardown_msi_irqs+0x40/0x60
[   43.603952] [<0000000000a1e14c>] pci_free_msi_irqs+0xc/0x40
[   43.603988] [<0000000000a1c390>] pci_disable_msi+0x50/0x80
[   43.604021] [<000000001004b800>] e1000e_open+0x420/0x600 [e1000e]
[   43.604191] [<0000000000b9262c>] __dev_open+0xec/0x1c0
[   43.604232] [<0000000000b92b58>] __dev_change_flags+0x1b8/0x260
[   43.604266] [<0000000000b92c1c>] dev_change_flags+0x1c/0x80
[   43.604298] [<0000000000ba4720>] do_setlink+0x300/0x11e0
[   43.604336] [<0000000000baadcc>] __rtnl_newlink+0x54c/0x880
[   43.604374] [<0000000000bab134>] rtnl_newlink+0x34/0x60
[   43.604407] [<0000000000ba3324>] rtnetlink_rcv_msg+0x244/0x340
[   43.604441] ---[ end trace 0000000000000000 ]---
[   45.928446] e1000e 0000:04:00.0 enp4s0f0: NIC Link is Up 1000 Mbps
Full Duplex, Flow Control: Rx/Tx
[   45.928724] IPv6: ADDRCONF(NETDEV_CHANGE): enp4s0f0: link becomes ready


a single crash, once. Doesn't seem like the upper one.

If I select my next kernel I have:
Linux narya 6.1.0-9-sparc64-smp #1 SMP Debian 6.1.27-1 (2023-05-08) sparc64

all still works perfectly fine!


Riccardo

John Paul Adrian Glaubitz

unread,
Jul 31, 2023, 6:50:04 AM7/31/23
to
Hi Riccardo!
Kernel regressions should be reported to the sparclinux kernel mailing list [1].

Also, a bisect would be helpful to determine which commit broke the kernel.

Thanks,
Adrian

> [1] http://vger.kernel.org/vger-lists.html#sparclinux

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

Riccardo Mottola

unread,
Jul 31, 2023, 8:50:03 AM7/31/23
to
Hi Adrian


John Paul Adrian Glaubitz wrote:
> Kernel regressions should be reported to the sparclinux kernel mailing list [1].
>
> Also, a bisect would be helpful to determine which commit broke the kernel.

after several reboots, I could not reproduce "that" looping crash
anymore, but I always got into maintenance mode.
Is that a kernel issue too? I don't know..- I'd like to understand what
the issue is. I don't even know if it is a kernel issue, but I suppose so.
It happened once also when booting a known good 6.1 kernel... during
these tests. The sytem has been stable for long.

I tried fetching from snapshots
Linux narya 6.3.0-2-sparc64-smp #1 SMP Debian 6.3.11-1 (2023-07-01)
sparc64 GNU/Linux

And it too fails:
/dev/sda2: clean, 432715/4276224 files, 8940034/17089844 blocks
[   41.713577] /dev/sda: Can't open blockdev
[FAILED] Failed to mount boot.mount - /boot.
[DEPEND] Dependency failed for local-fs.target - Local File Systems.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, or "exit"
to continue bootup.
Give root password for maintenance


this looks like failing to mount root, which is internal SCSI.

The exception I see in dmesg is:
[   43.220522] ------------[ cut here ]------------
[   43.220545] WARNING: CPU: 25 PID: 666 at kernel/irq/msi.c:196
msi_domain_free_descs+0xf4/0x120
[   43.220600] Modules linked in: sunrpc binfmt_misc libdes sg flash
loop drm drm_panel_orientation_quirks i2c_core fuse dm_mod configfs
ip_tables x_tables autofs4 ext4 crc1n
[   43.221112] CPU: 25 PID: 666 Comm: ip Not tainted 6.3.0-2-sparc64-smp
#1  Debian 6.3.11-1
[   43.221140] Call Trace:
[   43.221153] [<0000000000dda8cc>] dump_stack+0x8/0x18
[   43.221194] [<000000000046e228>] __warn+0xc8/0x120
[   43.221223] [<000000000046e2e0>] warn_slowpath_fmt+0x60/0x100
[   43.221251] [<00000000004f8194>] msi_domain_free_descs+0xf4/0x120
[   43.221287] [<00000000004f8640>]
msi_domain_free_msi_descs_range+0x20/0x40
[   43.221323] [<0000000000a13d20>] pci_msi_teardown_msi_irqs+0x40/0x60
[   43.221357] [<0000000000a1372c>] pci_free_msi_irqs+0xc/0x40
[   43.221385] [<0000000000a11950>] pci_disable_msi+0x50/0x80
[   43.221413] [<000000001012fa00>] e1000e_open+0x420/0x600 [e1000e]
[   43.221572] [<0000000000b871ec>] __dev_open+0xec/0x1c0
[   43.221601] [<0000000000b87718>] __dev_change_flags+0x1b8/0x260
[   43.221628] [<0000000000b877dc>] dev_change_flags+0x1c/0x80
[   43.221654] [<0000000000b98d84>] do_setlink+0x304/0x1220
[   43.221680] [<0000000000b9f4a8>] __rtnl_newlink+0x548/0x880
[   43.221707] [<0000000000b9f814>] rtnl_newlink+0x34/0x60
[   43.221732] [<0000000000b97884>] rtnetlink_rcv_msg+0x244/0x340
[   43.221758] ---[ end trace 0000000000000000 ]---
[   45.317910] e1000e 0000:04:00.0 enp4s0f0: NIC Link is Up 1000 Mbps
Full Duplex, Flow Control: Rx/Tx

which looks more related to ethernet, or do I misread it?

Riccardo

John Paul Adrian Glaubitz

unread,
Jul 31, 2023, 11:20:04 AM7/31/23
to


> On Jul 31, 2023, at 2:46 PM, Riccardo Mottola <riccardo...@libero.it> wrote:
>
> after several reboots, I could not reproduce "that" looping crash anymore, but I always got into maintenance mode.
> Is that a kernel issue too?

Yes, see the other thread which also reports a regression on older SPARCs. There are no issues on the T5220 or SPARC-T4.

Adrian
0 new messages