Package: nvidia-driver
Version: 525.89.02-1
NVIDIA drivers newer than the 510 series fail to load on my system, a Lenovo ThinkPad P51 with a Quadro M2200 GPU, BIOS 1.60 and ECP 1.10. I have encountered this bug with driver versions 515, 520, and now the 525 that landed in testing, as well as with a 525 build installed using NVIDIA's official installer. It occurs with kernels including 6.0.7 and 6.2.2 from xanmod and 6.1.0-5-amd64 from Debian's official repository. My system is a mixture of packages from stable and testing, with libc6=2.36-7. Driver version 510.108.03-1 works (but is unstable across sleep and broken on hibernation).
Below is an excerpt from the journalctl output containing the groups of lines that look potentially pertinent to me. All logs are from a boot on the xanmod 6.2.2 kernel, but the relevant output does not differ appreciably under Debian's 6.1.0-5. The actual failures appear to be the ones involving RmInitAdapter and NvKmsKapiDevice, but I'm not sure whether one causes the other.
Mar 08 21:28:46 tangerine kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.2-x64v1-xanmod1 root=(***) ro quiet mitigations=off psi=1 nvidia-drm.modeset=1
(...)
Mar 08 21:28:46 tangerine kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Mar 08 21:28:46 tangerine kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Mar 08 21:28:46 tangerine kernel:
Mar 08 21:28:46 tangerine kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Mar 08 21:28:46 tangerine kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 525.89.02 Wed Feb 1 23:23:25 UTC 2023
Mar 08 21:28:46 tangerine kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.89.02 Wed Feb 1 23:09:40 UTC 2023
Mar 08 21:28:46 tangerine systemd[1]: Finished Rebuild Hardware Database.
Mar 08 21:28:46 tangerine systemd[1]: Starting Rule-based Manager for Device Events and Files...
Mar 08 21:28:46 tangerine kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Mar 08 21:28:46 tangerine kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-61)
Mar 08 21:28:46 tangerine systemd[1]: Started Rule-based Manager for Device Events and Files.
Mar 08 21:28:46 tangerine systemd[1]: Starting Show Plymouth Boot Screen...
(...)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Mar 08 21:28:55 tangerine systemd-modules-load[306]: Inserted module 'nvidia_drm'
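For anyone comparing logs across driver versions or kernels, here is a minimal sketch of how the failure lines above could be picked out of a longer kernel log. The patterns are copied from the messages in this excerpt only; the helper names find_failures and rm_init_codes are my own, and other driver versions may word these messages differently.

```python
import re

# Signatures copied from the failing lines in the excerpt above.
# These are assumptions based on this one log, not an exhaustive list.
FAILURE_PATTERNS = [
    re.compile(r"NVRM: .*RmInitAdapter failed! \((?P<code>[^)]*)\)"),
    re.compile(r"NVRM: .*rm_init_adapter failed"),
    re.compile(r"\[nvidia-drm\].*Failed to (allocate NvKmsKapiDevice|register device)"),
]


def find_failures(lines):
    """Return the log lines that match any of the failure signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


def rm_init_codes(lines):
    """Return the parenthesised codes from each 'RmInitAdapter failed!' line."""
    codes = []
    for line in lines:
        match = FAILURE_PATTERNS[0].search(line)
        if match:
            codes.append(match.group("code"))
    return codes
```

Both functions accept any iterable of strings, so they can be given the lines of a file saved from "journalctl -k -b". Collecting the RmInitAdapter code separately makes it easy to see whether the same code (0x25:0x65:1457 here) shows up with each driver version.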
Some possibly pertinent information from nvidia-bug-report.log.gz:
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.703945619 +0100 /sys/bus/pci/devices/0000:01:00.0/power/control
on
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.711945656 +0100 /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.715945675 +0100 /sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
3
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.749945832 +0100 /sys/bus/pci/devices/0000:01:00.1/power/control
auto
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.761945887 +0100 /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
suspended
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.807946100 +0100 /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
0
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/power
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.819946156 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/power
Runtime D3 status: ?
Video Memory: ?
GPU Hardware Support:
Video Memory Self Refresh: ?
Video Memory Off: ?
____________________________________________
/usr/bin/lspci -d "10de:*" -v -xxx
01:00.0 VGA compatible controller: NVIDIA Corporation GM206GLM [Quadro M2200 Mobile] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo GM206GLM [Quadro M2200 Mobile]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at eb000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at d000 [size=128]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidia
00: de 10 36 14 07 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 eb 0c 00 00 c0 00 00 00 00 0c 00 00 d0
20: 00 00 00 00 01 d0 00 00 00 00 00 00 aa 17 51 22
30: 00 00 00 00 60 00 00 00 00 00 00 00 0a 01 00 00
40: aa 17 51 22 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 21 00 00 03 3d 45 00 40 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 04 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 10 80 00 00 00 00
c0: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at ec000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00: de 10 ba 0f 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 ec 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 0b 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 29 09 00 03 3d 45 00 43 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
____________________________________________
/usr/bin/lspci -d "10b5:*" -v -xxx
____________________________________________
/usr/bin/lspci -t
-[0000:00]-+-00.0
+-01.0-[01]--+-00.0
| \-00.1
+-08.0
+-14.0
+-14.2
+-15.0
+-16.0
+-16.3
+-17.0
+-1c.0-[03]--
+-1c.2-[04]----00.0
+-1c.4-[05-3d]--
+-1d.0-[3e]----00.0
+-1d.4-[3f]----00.0
+-1f.0
+-1f.2
+-1f.3
+-1f.4
\-1f.6
____________________________________________
/usr/bin/lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5918] (rev 05)
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
00:15.0 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
00:16.3 Serial controller [0700]: Intel Corporation 100 Series/C230 Series Chipset Family KT Redirection [8086:a13d] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
00:1d.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #13 [8086:a11c] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation CM238 Chipset LPC/eSPI Controller [8086:a154] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation CM238 HD Audio Controller [8086:a171] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (5) I219-LM [8086:15e3] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206GLM [Quadro M2200 Mobile] [10de:1436] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM206 High Definition Audio Controller [10de:0fba] (rev a1)
04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
3e:00.0 Non-Volatile memory controller [0108]: Lenovo Device [17aa:0004]
3f:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
____________________________________________
____________________________________________
*** /sys/devices/system/node/has_cpu
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.447958149 +0100 /sys/devices/system/node/has_cpu
0
____________________________________________
*** /sys/devices/system/node/has_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100 /sys/devices/system/node/has_memory
0
____________________________________________
*** /sys/devices/system/node/has_normal_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100 /sys/devices/system/node/has_normal_memory
0
____________________________________________
*** /sys/devices/system/node/online
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.451958167 +0100 /sys/devices/system/node/online
0
____________________________________________
*** /sys/devices/system/node/possible
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.453958177 +0100 /sys/devices/system/node/possible
0
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/local_cpulist
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100 /sys/bus/pci/devices/0000:01:00.0/local_cpulist
0-7
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/numa_node
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100 /sys/bus/pci/devices/0000:01:00.0/numa_node
-1
____________________________________________
*** /proc/driver/nvidia/./version
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.667945453 +0100 /proc/driver/nvidia/./version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.89.02 Wed Feb 1 23:23:25 UTC 2023
GCC version: gcc version 11.3.0 (Debian 11.3.0-8)
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/information
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:41.954016177 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/information
Model: Quadro M2200
IRQ: 140
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 40 bits
DMA Mask: 0xffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Excluded: No
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.072016657 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/registry
Binary: ""
____________________________________________
*** /proc/driver/nvidia/./params
*** ls: -r--r--r-- 1 root root 0 2023-03-08 21:55:09.912015263 +0100 /proc/driver/nvidia/./params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""
____________________________________________
*** /proc/driver/nvidia/./registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.076016674 +0100 /proc/driver/nvidia/./registry
Binary: ""
If it would help, I can try to provide the more complete information gathered by reportbug, though that would be somewhat burdensome since I have just reverted my machine to 510.108.03-1 to restore functionality. I also have a full nvidia-bug-report.log.gz gathered in the broken configuration, but wasn't sure whether the bug tracker supports attachments.