
Bug#1032563: nvidia-driver 525.89.02 fails to initialize with Quadro M2200M


Matvey Soloviev

Mar 8, 2023, 6:30:06 PM
Package: nvidia-driver
Version: 525.89.02-1

Nvidia drivers newer than the 510 series fail to load on my system, a Lenovo ThinkPad P51 with a Quadro M2200 GPU, BIOS 1.60 and ECP 1.10. I have encountered this bug with driver versions 515, 520 and now the 525 that landed in testing, as well as with a 525 build installed using Nvidia's official installer. It occurs with kernels including 6.0.7 and 6.2.2 from xanmod and 6.1.0-5-amd64 from Debian's official repository. My system is a mixture of packages from stable and testing, with libc6=2.36-7. Driver version 510.108.03-1 works (but is unstable in sleep and broken in hibernation).

Below is an excerpt from the journalctl output containing the clusters of lines that looked potentially pertinent to me. All logs are from a boot on the xanmod 6.2.2 kernel, but the relevant output does not differ appreciably under Debian's 6.1.0-5. The actual failure points seem to be the RmInitAdapter and NvKmsKapiDevice errors, but I'm not sure what causal relationship, if any, there is between the two.

Mar 08 21:28:46 tangerine kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.2-x64v1-xanmod1 root=(***) ro quiet mitigations=off psi=1 nvidia-drm.modeset=1
(...)
Mar 08 21:28:46 tangerine kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Mar 08 21:28:46 tangerine kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Mar 08 21:28:46 tangerine kernel:
Mar 08 21:28:46 tangerine kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Mar 08 21:28:46 tangerine kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.89.02  Wed Feb  1 23:23:25 UTC 2023
Mar 08 21:28:46 tangerine kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.89.02  Wed Feb  1 23:09:40 UTC 2023
Mar 08 21:28:46 tangerine systemd[1]: Finished Rebuild Hardware Database.
Mar 08 21:28:46 tangerine systemd[1]: Starting Rule-based Manager for Device Events and Files...
Mar 08 21:28:46 tangerine kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Mar 08 21:28:46 tangerine kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-61)
Mar 08 21:28:46 tangerine systemd[1]: Started Rule-based Manager for Device Events and Files.
Mar 08 21:28:46 tangerine systemd[1]: Starting Show Plymouth Boot Screen...
(...)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Mar 08 21:28:55 tangerine systemd-modules-load[306]: Inserted module 'nvidia_drm'
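For anyone comparing their own logs against this excerpt, here is a minimal Python sketch that picks out the failure lines. The patterns are copied from the messages above and are only a guess at what is relevant, not an official list of NVIDIA failure signatures; the function name `find_failures` is made up for illustration.

```python
import re

# Patterns derived from the excerpt above (an assumption about what matters,
# not an exhaustive or official list).
FAILURE_PATTERNS = [
    re.compile(r"NVRM: .*RmInitAdapter failed"),
    re.compile(r"NVRM: .*rm_init_adapter failed"),
    re.compile(r"\[nvidia-drm\].*Failed to "),
]


def find_failures(lines):
    """Return the log lines that match any of the failure patterns."""
    return [ln for ln in lines if any(p.search(ln) for p in FAILURE_PATTERNS)]


sample = [
    "Mar 08 21:28:46 tangerine kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver",
    "Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)",
    "Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice",
]

# Only the two failure lines are printed; the "Loading driver" line is skipped.
for line in find_failures(sample):
    print(line)
```

To run it against a real boot, the lines of `journalctl -k -b` could be passed to `find_failures` in place of `sample`.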

Some possibly pertinent information from nvidia-bug-report.log.gz:
____________________________________________

*** /sys/bus/pci/devices/0000:01:00.0/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.703945619 +0100 /sys/bus/pci/devices/0000:01:00.0/power/control
on

____________________________________________

*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.711945656 +0100 /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active

____________________________________________

*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.715945675 +0100 /sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
3

____________________________________________

*** /sys/bus/pci/devices/0000:01:00.1/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.749945832 +0100 /sys/bus/pci/devices/0000:01:00.1/power/control
auto

____________________________________________

*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.761945887 +0100 /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
suspended

____________________________________________

*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.807946100 +0100 /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
0
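The same runtime power-management attributes can be read directly from sysfs without generating a full bug report. The following is a rough sketch that assumes the standard `power/control`, `power/runtime_status` and `power/runtime_usage` files shown above; the helper name `read_runtime_pm` is hypothetical.

```python
from pathlib import Path


def read_runtime_pm(device_dir):
    """Read the three runtime-PM attributes shown above for one PCI function."""
    power = Path(device_dir) / "power"
    result = {}
    for name in ("control", "runtime_status", "runtime_usage"):
        try:
            result[name] = (power / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable: record None instead of failing.
            result[name] = None
    return result


if __name__ == "__main__":
    base = Path("/sys/bus/pci/devices")
    # PCI addresses taken from this report; adjust for other machines.
    for addr in ("0000:01:00.0", "0000:01:00.1"):
        print(addr, read_runtime_pm(base / addr))
```

On a machine without these devices every value simply comes back as `None`.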

____________________________________________

*** /proc/driver/nvidia/./gpus/0000:01:00.0/power
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.819946156 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/power
Runtime D3 status:          ?
Video Memory:               ?

GPU Hardware Support:
 Video Memory Self Refresh: ?
 Video Memory Off:          ?
____________________________________________

/usr/bin/lspci -d "10de:*" -v -xxx

01:00.0 VGA compatible controller: NVIDIA Corporation GM206GLM [Quadro M2200 Mobile] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo GM206GLM [Quadro M2200 Mobile]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at eb000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at d000 [size=128]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidia
00: de 10 36 14 07 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 eb 0c 00 00 c0 00 00 00 00 0c 00 00 d0
20: 00 00 00 00 01 d0 00 00 00 00 00 00 aa 17 51 22
30: 00 00 00 00 60 00 00 00 00 00 00 00 0a 01 00 00
40: aa 17 51 22 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 21 00 00 03 3d 45 00 40 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 04 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 10 80 00 00 00 00
c0: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at ec000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00: de 10 ba 0f 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 ec 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 0b 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 29 09 00 03 3d 45 00 43 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
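As a cross-check of the hex dumps, the first bytes of PCI configuration space can be decoded by hand. This is a small sketch that uses the standard PCI header offsets (vendor ID at 0x00, device ID at 0x02, revision at 0x08, class at 0x0a-0x0b, subsystem IDs at 0x2c and 0x2e) and the first three rows of the 01:00.0 dump; the helper names are made up for illustration.

```python
def parse_hex_rows(text):
    """Turn 'lspci -xxx' style rows ('00: de 10 ...') into a bytes object."""
    data = bytearray()
    for row in text.strip().splitlines():
        _offset, _, hex_bytes = row.partition(":")
        data.extend(int(b, 16) for b in hex_bytes.split())
    return bytes(data)


def le16(data, offset):
    """Read a little-endian 16-bit field from PCI config space."""
    return int.from_bytes(data[offset:offset + 2], "little")


# First three rows of the 01:00.0 dump above.
rows = """
00: de 10 36 14 07 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 eb 0c 00 00 c0 00 00 00 00 0c 00 00 d0
20: 00 00 00 00 01 d0 00 00 00 00 00 00 aa 17 51 22
"""
cfg = parse_hex_rows(rows)

header = {
    "vendor_id": le16(cfg, 0x00),
    "device_id": le16(cfg, 0x02),
    "revision": cfg[0x08],
    "class_code": (cfg[0x0B] << 8) | cfg[0x0A],  # base class, subclass
    "subsystem_vendor_id": le16(cfg, 0x2C),
    "subsystem_id": le16(cfg, 0x2E),
}

print(f"{header['vendor_id']:04x}:{header['device_id']:04x} "
      f"rev {header['revision']:02x} class {header['class_code']:04x} "
      f"subsystem {header['subsystem_vendor_id']:04x}:{header['subsystem_id']:04x}")
# prints: 10de:1436 rev a1 class 0300 subsystem 17aa:2251
```

The decoded `10de:1436`, revision `a1` and class `0300` agree with what `lspci -nn` reports for this device.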


____________________________________________

/usr/bin/lspci -d "10b5:*" -v -xxx


____________________________________________

/usr/bin/lspci -t

-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            \-00.1
           +-08.0
           +-14.0
           +-14.2
           +-15.0
           +-16.0
           +-16.3
           +-17.0
           +-1c.0-[03]--
           +-1c.2-[04]----00.0
           +-1c.4-[05-3d]--
           +-1d.0-[3e]----00.0
           +-1d.4-[3f]----00.0
           +-1f.0
           +-1f.2
           +-1f.3
           +-1f.4
           \-1f.6

____________________________________________

/usr/bin/lspci -nn

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5918] (rev 05)
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
00:15.0 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
00:16.3 Serial controller [0700]: Intel Corporation 100 Series/C230 Series Chipset Family KT Redirection [8086:a13d] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
00:1d.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #13 [8086:a11c] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation CM238 Chipset LPC/eSPI Controller [8086:a154] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation CM238 HD Audio Controller [8086:a171] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (5) I219-LM [8086:15e3] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206GLM [Quadro M2200 Mobile] [10de:1436] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM206 High Definition Audio Controller [10de:0fba] (rev a1)
04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
3e:00.0 Non-Volatile memory controller [0108]: Lenovo Device [17aa:0004]
3f:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
____________________________________________

____________________________________________
*** /sys/devices/system/node/has_cpu
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.447958149 +0100 /sys/devices/system/node/has_cpu
0

____________________________________________
*** /sys/devices/system/node/has_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100 /sys/devices/system/node/has_memory
0

____________________________________________
*** /sys/devices/system/node/has_normal_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100 /sys/devices/system/node/has_normal_memory
0

____________________________________________
*** /sys/devices/system/node/online
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.451958167 +0100 /sys/devices/system/node/online
0

____________________________________________
*** /sys/devices/system/node/possible
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.453958177 +0100 /sys/devices/system/node/possible
0

____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/local_cpulist
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100 /sys/bus/pci/devices/0000:01:00.0/local_cpulist
0-7

____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/numa_node
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100 /sys/bus/pci/devices/0000:01:00.0/numa_node
-1
____________________________________________

*** /proc/driver/nvidia/./version
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.667945453 +0100 /proc/driver/nvidia/./version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.89.02  Wed Feb  1 23:23:25 UTC 2023
GCC version:  gcc version 11.3.0 (Debian 11.3.0-8)

____________________________________________

*** /proc/driver/nvidia/./gpus/0000:01:00.0/information
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:41.954016177 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/information
Model: Quadro M2200
IRQ:   140
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 40 bits
DMA Mask: 0xffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Excluded: No

____________________________________________

*** /proc/driver/nvidia/./gpus/0000:01:00.0/registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.072016657 +0100 /proc/driver/nvidia/./gpus/0000:01:00.0/registry
Binary: ""

____________________________________________

*** /proc/driver/nvidia/./params
*** ls: -r--r--r-- 1 root root 0 2023-03-08 21:55:09.912015263 +0100 /proc/driver/nvidia/./params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""
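Several of these values are easier to read in another base. Below is a minimal sketch that parses the `Key: value` lines and reprints two of them; it makes no claim about what the driver does with the values, and `parse_params` is a hypothetical helper.

```python
def parse_params(text):
    """Parse 'Key: value' lines; integers become int, quoted strings lose quotes."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        value = value.strip()
        if value.lstrip("-").isdigit():
            params[key.strip()] = int(value)
        else:
            params[key.strip()] = value.strip('"')
    return params


# A few lines copied from the listing above.
sample = """\
ResmanDebugLevel: 4294967295
DeviceFileMode: 438
DynamicPowerManagement: 3
RegistryDwords: ""
"""
params = parse_params(sample)

print(oct(params["DeviceFileMode"]))    # 0o666
print(hex(params["ResmanDebugLevel"]))  # 0xffffffff
```

Read this way, `DeviceFileMode: 438` is the familiar octal permission value 666.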

____________________________________________

*** /proc/driver/nvidia/./registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.076016674 +0100 /proc/driver/nvidia/./registry
Binary: ""



In case it is helpful, I can try to provide more complete information as gathered by reportbug, though that would be somewhat burdensome since I have just reverted my machine to 510.108.03-1 to restore functionality. I also have a full nvidia-bug-report.log.gz gathered in the broken configuration, but wasn't sure whether the bug tracker supports attachments.

Andreas Beckmann

Mar 21, 2023, 9:40:07 AM
Control: tag -1 moreinfo upstream

On 09/03/2023 00.20, Matvey Soloviev wrote:
> Package: nvidia-driver
> Version: 525.89.02-1
>
> Nvidia drivers newer than the 510 series fail to load on my system, which
> is a Lenovo Thinkpad P51 with a Quadro M2200 GPU, with BIOS 1.60 and ECP
> 1.10. I have encountered this bug with driver versions 515, 520 and now the
> 525 that landed in testing, as well as with a version of 525 installed
> using nvidia's official installer, and kernels including 6.0.7 and 6.2.2
> from xanmod and 6.1.0-5-amd64 from Debian's official repository. My system
> is a mixture of packages from stable and testing, with libc6=2.36-7. Driver
> version 510.108.03-1 works (but is unstable in sleep and broken in
> hibernation).

Do you have by chance installed nvidia-open-kernel-dkms instead of
nvidia-kernel-dkms? In that case, please switch to the proprietary
module (nvidia-kernel-dkms).

You could also try with the 530 driver from experimental.

You could also try the tesla-470 driver.


Andreas

Matvey Soloviev

Mar 21, 2023, 12:10:04 PM
I did not have nvidia-open-kernel-dkms installed when performing those experiments (and never seem to have had it installed on my system, as there is no rc record in dpkg). I believe I also tested the 530 driver at the time, with the same results.

Do you expect different results with the tesla-470 driver? I'm fairly sure that the 470 series of the standard driver worked for me, just as 510.108.03-1, which I am using right now, does (though I'm not sure whether hibernation was already broken in 470). However, the lack of some features that were added between 470 and 510 seems likely to cause compatibility issues at this point.

Best wishes,
Matvey