
Bug#1056384: nvidia-driver: X server shuts down and doesn't return during upgrade.


Adam Dane

Nov 22, 2023, 12:50:06 AM
Package: nvidia-driver
Version: 525.147.05-1
Severity: normal

Dear Maintainer,

During upgrades as of 525.125.06-3, systemd tries to restart
various services: suspend, hibernate, and resume. When it does, the
X server stops functioning, and I cannot switch to a tty or otherwise
revive it.

When this occurred, I used sysrq to reboot into recovery mode and ran
dpkg --configure -a to finish the interrupted upgrade. It worked fine
after that.

This bug seems to be the same as:

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2025640

The relevant lines from the journal:
Nov 21 20:07:14 square systemd[1]: Reloading requested from client PID 33443 ('systemctl') (unit us...@1000.service)...
Nov 21 20:07:14 square systemd[1]: Reloading...
Nov 21 20:07:15 square systemd[1]: Reloading finished in 400 ms.
Nov 21 20:07:15 square systemd[1]: Starting nvidia-hibernate.service - NVIDIA system hibernate actions...
Nov 21 20:07:15 square hibernate[33591]: nvidia-hibernate.service
Nov 21 20:07:15 square logger[33591]: <13>Nov 21 20:07:15 hibernate: nvidia-hibernate.service
Nov 21 20:07:15 square systemd[1]: Starting nvidia-resume.service - NVIDIA system resume actions...
Nov 21 20:07:15 square systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
Nov 21 20:07:15 square suspend[33592]: nvidia-resume.service
Nov 21 20:07:15 square logger[33592]: <13>Nov 21 20:07:15 suspend: nvidia-resume.service
Nov 21 20:07:15 square suspend[33595]: nvidia-suspend.service
Nov 21 20:07:15 square logger[33595]: <13>Nov 21 20:07:15 suspend: nvidia-suspend.service
Nov 21 20:07:15 square systemd-udevd[392]: libkmod: ERROR ../libkmod/libkmod-config.c:772 conf_files_filter_out: Directories inside directories are not supported: /etc/modprobe.d/nvidia.conf
Nov 21 20:07:15 square systemd[1]: nvidia-resume.service: Deactivated successfully.
Nov 21 20:07:15 square systemd[1]: Finished nvidia-resume.service - NVIDIA system resume actions.
[gdm-x-session runs, but the GUI does not return.]
Nov 21 20:07:15 square nvidia-sleep.sh[33593]: /usr/bin/nvidia-sleep.sh: line 20: echo: write error: Input/output error
Nov 21 20:07:15 square systemd[1]: nvidia-hibernate.service: Main process exited, code=exited, status=1/FAILURE
Nov 21 20:07:15 square systemd[1]: nvidia-hibernate.service: Failed with result 'exit-code'.
Nov 21 20:07:15 square systemd[1]: Failed to start nvidia-hibernate.service - NVIDIA system hibernate actions.
Nov 21 20:07:15 square systemd[1]: nvidia-suspend.service: Deactivated successfully.
Nov 21 20:07:15 square systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
Nov 21 20:07:15 square systemd[1]: Reloading requested from client PID 33649 ('systemctl') (unit us...@1000.service)...
Nov 21 20:07:15 square systemd[1]: Reloading...
Nov 21 20:07:16 square systemd[1]: Reloading finished in 372 ms.
Nov 21 20:07:16 square systemd[1]: Reloading requested from client PID 33788 ('systemctl') (unit us...@1000.service)...
Nov 21 20:07:16 square systemd[1]: Reloading...
Nov 21 20:07:16 square systemd[1]: Reloading finished in 323 ms.
[After this there's some logging about trying to bring up gdm, but it
never happened. I waited about seven minutes before using sysrq.]
^^ End of relevant journalctl log ^^
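
For anyone comparing their own journal against the excerpt above, here is a minimal sketch that pulls out the units systemd reported as failed. The function name failed_units is made up for illustration; it only matches the "Failed with result" line format seen in this log.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Debian package.
# Reads journal text on stdin and prints each unit for which systemd (PID 1)
# logged "<unit>.service: Failed with result '...'", one unit per line.
failed_units() {
    sed -n 's/.*systemd\[1\]: \(.*\.service\): Failed with result .*/\1/p' | sort -u
}
```

It could be fed the previous boot's journal with something like `journalctl -b -1 --no-pager | failed_units`. On the log above it should print only nvidia-hibernate.service.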

Thanks,

Adam

-- System Information:
Debian Release: trixie/sid
APT prefers unstable
APT policy: (990, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.5.0-4-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages nvidia-driver depends on:
ii nvidia-alternative 525.147.05-1
ii nvidia-driver-bin 525.147.05-1
ii nvidia-driver-libs 525.147.05-1
ii nvidia-installer-cleanup 20220217+3
ii nvidia-kernel-dkms [nvidia-kernel-525.147.05] 525.147.05-1
ii nvidia-legacy-check 525.147.05-1
ii nvidia-support 20220217+3
ii nvidia-vdpau-driver 525.147.05-1
ii xserver-xorg-video-nvidia 525.147.05-1

Versions of packages nvidia-driver recommends:
ii libnvidia-cfg1 525.147.05-1
ii nvidia-persistenced 525.85.05-1
ii nvidia-settings 525.125.06-1

Versions of packages nvidia-driver suggests:
ii nvidia-kernel-dkms 525.147.05-1

Versions of packages nvidia-driver-libs:amd64 depends on:
ii libgl1-nvidia-glvnd-glx 525.147.05-1
ii nvidia-egl-icd 525.147.05-1

Versions of packages nvidia-driver-libs:amd64 recommends:
ii libgles-nvidia1 525.147.05-1
ii libgles-nvidia2 525.147.05-1
ii libglx-nvidia0 525.147.05-1
pn libnvidia-allocator1 <none>
ii libnvidia-cfg1 525.147.05-1
ii libnvidia-encode1 525.147.05-1
ii libopengl0 1.7.0-1
ii nvidia-driver-libs 525.147.05-1
ii nvidia-vulkan-icd 525.147.05-1

Versions of packages nvidia-driver-libs:i386 depends on:
ii libgl1-nvidia-glvnd-glx 525.147.05-1
ii nvidia-egl-icd 525.147.05-1

Versions of packages nvidia-driver-libs:i386 recommends:
ii libgles-nvidia1 525.147.05-1
ii libgles-nvidia2 525.147.05-1
ii libglx-nvidia0 525.147.05-1
pn libnvidia-allocator1 <none>
ii libnvidia-encode1 525.147.05-1
ii libopengl0 1.7.0-1
ii nvidia-vulkan-icd 525.147.05-1

Versions of packages xserver-xorg-video-nvidia depends on:
ii libc6 2.37-12
ii libnvidia-glcore 525.147.05-1
ii nvidia-alternative 525.147.05-1
ii nvidia-installer-cleanup 20220217+3
ii nvidia-legacy-check 525.147.05-1
ii nvidia-support 20220217+3
ii xserver-xorg-core [xorg-video-abi-25] 2:21.1.9-1

Versions of packages xserver-xorg-video-nvidia recommends:
ii nvidia-kernel-dkms [nvidia-kernel-525.147.05] 525.147.05-1
ii nvidia-settings 525.125.06-1
ii nvidia-suspend-common 525.147.05-1
ii nvidia-vdpau-driver 525.147.05-1

Versions of packages xserver-xorg-video-nvidia suggests:
ii nvidia-kernel-dkms 525.147.05-1

Versions of packages nvidia-alternative depends on:
ii dpkg 1.22.1
ii glx-alternative-nvidia 1.2.2
ii nvidia-legacy-check 525.147.05-1

Versions of packages nvidia-kernel-dkms depends on:
ii dkms 3.0.11-3
ii firmware-nvidia-gsp [firmware-nvidia-gsp-525.147.05] 525.147.05-1
ii nvidia-installer-cleanup 20220217+3
ii nvidia-kernel-support [nvidia-kernel-support--v1] 525.147.05-1

nvidia-kernel-dkms recommends no packages.

Versions of packages glx-alternative-nvidia depends on:
ii dpkg 1.22.1
ii glx-alternative-mesa 1.2.2
ii glx-diversions 1.2.2
ii update-glx 1.2.2

glx-alternative-nvidia suggests no packages.

Versions of packages xserver-xorg-video-intel depends on:
ii libc6 2.37-12
ii libdrm-intel1 2.4.117-1
ii libdrm2 2.4.117-1
ii libpciaccess0 0.17-3
ii libpixman-1-0 0.42.2-1
ii libudev1 255~rc2-3
ii libx11-6 2:1.8.7-1
ii libx11-xcb1 2:1.8.7-1
ii libxcb-dri2-0 1.15-1
ii libxcb-util1 0.4.0-1+b1
ii libxcb1 1.15-1
ii libxcursor1 1:1.2.1-1
ii libxdamage1 1:1.1.6-1
ii libxext6 2:1.3.4-1+b1
ii libxfixes3 1:6.0.0-2
ii libxinerama1 2:1.1.4-3
ii libxrandr2 2:1.5.2-2+b1
ii libxrender1 1:0.9.10-1.1
ii libxss1 1:1.2.3-1
ii libxtst6 2:1.2.3-1.1
ii libxvmc1 2:1.0.12-2
ii xserver-xorg-core [xorg-video-abi-25] 2:21.1.9-1

Versions of packages nvidia-driver is related to:
pn bumblebee <none>
pn bumblebee-nvidia <none>
ii ccache 4.8.3-1
pn libcuda.so.1 <none>
ii libcuda1 [libcuda1-any] 525.147.05-1
pn libdrm-nouveau1 <none>
pn libdrm-nouveau1a <none>
ii libdrm-nouveau2 2.4.117-1
ii libegl1 1.7.0-1
ii libgl1 1.7.0-1
ii libgl1-nvidia-glvnd-glx [libgl1-nvidia-glx-any] 525.147.05-1
ii libgles1 1.7.0-1
ii libgles2 1.7.0-1
ii libglvnd0 1.7.0-1
ii libglx0 1.7.0-1
ii libnvidia-cfg1 [libnvidia-cfg1-any] 525.147.05-1
ii libnvidia-ml1 [libnvidia-ml.so.1] 525.147.05-1
pn libopencl0 <none>
pn libprimus-vk1 <none>
ii libvulkan1 1.3.268.0-1
pn linux-headers <none>
ii make 4.3-4.1
ii mesa-vulkan-drivers [vulkan-icd] 23.2.1-1
ii nvidia-driver [nvidia-glx-any] 525.147.05-1
pn nvidia-driver-any <none>
ii nvidia-driver-libs [nvidia-driver-libs-any] 525.147.05-1
pn nvidia-glx <none>
ii nvidia-kernel-common 20220217+3
ii nvidia-kernel-dkms [nvidia-kernel-dkms-any] 525.147.05-1
pn nvidia-kernel-source <none>
ii nvidia-kernel-support [nvidia-kernel-support-any] 525.147.05-1
ii nvidia-modprobe 535.54.03-1
pn nvidia-open-kernel-dkms <none>
pn nvidia-open-kernel-dkms-any <none>
pn nvidia-open-kernel-source <none>
ii nvidia-opencl-icd [opencl-icd] 525.147.05-1
pn nvidia-primus-vk-wrapper <none>
ii nvidia-settings 525.125.06-1
ii nvidia-support 20220217+3
ii nvidia-vulkan-icd [vulkan-icd] 525.147.05-1
pn nvidia-vulkan-icd-any <none>
ii nvidia-xconfig 525.85.05-1
ii ocl-icd-libopencl1 [libopencl1] 2.3.2-1
pn primus <none>
pn primus-libs <none>
pn primus-nvidia <none>
pn primus-vk <none>
pn primus-vk-nvidia <none>
ii xserver-xorg 1:7.7+23
ii xserver-xorg-core 2:21.1.9-1
ii xserver-xorg-legacy 2:21.1.9-1
ii xserver-xorg-video-nouveau 1:1.0.17-2
ii xserver-xorg-video-nvidia [xserver-xorg-video-nvidia-any] 525.147.05-1

-- debconf information excluded

Andreas Beckmann

Nov 22, 2023, 4:40:06 AM
Control: reassign -1 nvidia-suspend-common 525.125.06-3

On 22/11/2023 05.45, Adam Dane wrote:
> Package: nvidia-driver
> Version: 525.147.05-1

> During upgrades as of 525.125.06-3, systemd tries to restart
> various services: suspend, hibernate, and resume. When it does, the
> X server stops functioning, and I cannot switch to a tty or otherwise
> revive it.

The nvidia-suspend-common package is a new feature which hasn't seen
much testing yet.
The Ubuntu bug you linked has a possible solution; I am trying to apply
that to Debian, too.


Then we have another possible bug in src:kmod:

> Nov 21 20:07:15 square systemd-udevd[392]: libkmod: ERROR ../libkmod/libkmod-config.c:772 conf_files_filter_out: Directories inside directories are not supported: /etc/modprobe.d/nvidia.conf

Can you confirm that /etc/modprobe.d/nvidia.conf is a symlink to a
readable file?
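
One way to answer that question is sketched below. The function name check_modprobe_conf is hypothetical, and `readlink -f` is assumed to be the GNU coreutils version shipped with Debian.

```shell
#!/bin/sh
# Hypothetical helper, not part of kmod or the nvidia packages.
# Reports whether a modprobe.d entry is a symlink, and whether it ends up
# at a readable regular file (exit 0) or at something else (exit 1).
check_modprobe_conf() {
    conf=$1
    if [ -L "$conf" ]; then
        # Resolve the full symlink chain; fall back if it cannot be resolved.
        target=$(readlink -f -- "$conf") || target="(unresolvable)"
        echo "symlink -> $target"
    fi
    if [ -d "$conf" ]; then
        # -d follows symlinks, so this also catches a link to a directory.
        echo "directory"
        return 1
    elif [ -f "$conf" ] && [ -r "$conf" ]; then
        echo "readable file"
        return 0
    else
        echo "missing or unreadable"
        return 1
    fi
}
```

Running `check_modprobe_conf /etc/modprobe.d/nvidia.conf` and pasting the output, together with `ls -l` on the same path, would cover what is being asked here.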


Thanks.


Andreas

Adam Dane

Nov 22, 2023, 9:50:04 PM
Looks like the change didn't suppress restarting the daemons.

The upgrade finished cleanly before I rebooted this time, so I had the
term.log from apt:

[Other packages]
Setting up nvidia-suspend-common (525.147.05-2) ...
Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145.
[Other packages]

^^ from term.log ^^

The journal looks about the same:

Nov 22 20:08:52 square systemd[1]: Reloading requested from client PID 42821 ('systemctl') (unit us...@1000.service)...
Nov 22 20:08:52 square systemd[1]: Reloading...
Nov 22 20:08:52 square systemd[1]: Reloading finished in 310 ms.
Nov 22 20:08:52 square systemd[1]: Starting nvidia-hibernate.service - NVIDIA system hibernate actions...
Nov 22 20:08:52 square hibernate[42969]: nvidia-hibernate.service
Nov 22 20:08:52 square logger[42969]: <13>Nov 22 20:08:52 hibernate: nvidia-hibernate.service
Nov 22 20:08:52 square systemd[1]: Starting nvidia-resume.service - NVIDIA system resume actions...
Nov 22 20:08:52 square systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
Nov 22 20:08:52 square suspend[42970]: nvidia-resume.service
Nov 22 20:08:52 square logger[42970]: <13>Nov 22 20:08:52 suspend: nvidia-resume.service
Nov 22 20:08:52 square suspend[42973]: nvidia-suspend.service
Nov 22 20:08:52 square logger[42973]: <13>Nov 22 20:08:52 suspend: nvidia-suspend.service
Nov 22 20:08:52 square systemd[1]: nvidia-resume.service: Deactivated successfully.
Nov 22 20:08:52 square systemd[1]: Finished nvidia-resume.service - NVIDIA system resume actions.
[gdm-x-session tries to start.]
Nov 22 20:08:53 square nvidia-sleep.sh[42971]: /usr/bin/nvidia-sleep.sh: line 20: echo: write error: Input/output error
Nov 22 20:08:53 square systemd[1]: nvidia-hibernate.service: Main process exited, code=exited, status=1/FAILURE
Nov 22 20:08:53 square systemd[1]: nvidia-hibernate.service: Failed with result 'exit-code'.
Nov 22 20:08:53 square systemd[1]: Failed to start nvidia-hibernate.service - NVIDIA system hibernate actions.
Nov 22 20:08:53 square systemd[1]: nvidia-suspend.service: Deactivated successfully.
Nov 22 20:08:53 square systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.

^^ End of relevant journalctl log ^^

Thank you,

Adam

Marcello Perathoner

Nov 23, 2023, 4:30:05 AM
Package: nvidia-suspend-common
Version: 525.147.05-2

This is the second night in a row that unattended-upgrades hosed my
system: screens blank and fan spinning madly. Recovery was possible only
with SysRq REISUB, and that still left the filesystem in a somewhat
unclean state.

Relevant logs:

2023-11-23T06:28:55.259601+01:00 dylan systemd[1]: Starting nvidia-hibernate.service - NVIDIA system hibernate actions...
2023-11-23T06:28:55.261933+01:00 dylan hibernate: nvidia-hibernate.service
2023-11-23T06:28:55.262045+01:00 dylan logger[623712]: <13>Nov 23 06:28:55 hibernate: nvidia-hibernate.service
2023-11-23T06:28:55.262720+01:00 dylan systemd[1]: Starting nvidia-resume.service - NVIDIA system resume actions...
2023-11-23T06:28:55.269538+01:00 dylan systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
2023-11-23T06:28:55.273485+01:00 dylan suspend: nvidia-resume.service
2023-11-23T06:28:55.273622+01:00 dylan logger[623716]: <13>Nov 23 06:28:55 suspend: nvidia-resume.service
2023-11-23T06:28:55.279771+01:00 dylan suspend: nvidia-suspend.service
2023-11-23T06:28:55.279892+01:00 dylan logger[623719]: <13>Nov 23 06:28:55 suspend: nvidia-suspend.service
2023-11-23T06:28:55.292176+01:00 dylan systemd[1]: nvidia-resume.service: Deactivated successfully.
2023-11-23T06:28:55.292460+01:00 dylan systemd[1]: Finished nvidia-resume.service - NVIDIA system resume actions.
2023-11-23T06:28:56.523975+01:00 dylan nvidia-sleep.sh[623726]: /usr/bin/nvidia-sleep.sh: line 20: echo: write error: Input/output error
2023-11-23T06:28:56.524518+01:00 dylan systemd[1]: nvidia-suspend.service: Main process exited, code=exited, status=1/FAILURE
2023-11-23T06:28:56.524629+01:00 dylan systemd[1]: nvidia-suspend.service: Failed with result 'exit-code'.
2023-11-23T06:28:56.524962+01:00 dylan systemd[1]: Failed to start nvidia-suspend.service - NVIDIA system suspend actions.
2023-11-23T06:28:56.526194+01:00 dylan systemd[1]: nvidia-hibernate.service: Deactivated successfully.
2023-11-23T06:28:56.526436+01:00 dylan systemd[1]: Finished nvidia-hibernate.service - NVIDIA system hibernate actions.


<<<<<<<<<< /etc/modprobe.d/nvidia-options.conf >>>>>>>>>>
#options nvidia-current NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660

# To grant performance counter access to unprivileged users, uncomment the following line:
#options nvidia-current NVreg_RestrictProfilingToAdminUsers=0

# Uncomment to enable this power management feature:
#options nvidia-current NVreg_PreserveVideoMemoryAllocations=1

# Uncomment to enable this power management feature:
#options nvidia-current NVreg_EnableS0ixPowerManagement=1
^^^^^^^^^^ /etc/modprobe.d/nvidia-options.conf ^^^^^^^^^^
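
Every options line in the file above is commented out, so none of the power management settings are switched on here. As a rough sketch, the same thing can be checked mechanically; the function name active_modprobe_options is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of the nvidia packages.
# Prints only the uncommented "options ..." lines of a modprobe.d file.
# Prints nothing (and still exits 0) when every option is commented out.
active_modprobe_options() {
    sed -n '/^[[:space:]]*options[[:space:]]/p' "$1"
}
```

For the file quoted above, `active_modprobe_options /etc/modprobe.d/nvidia-options.conf` would be expected to print nothing.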


mfg

--
Marcello Perathoner
marc...@perathoner.de

Andreas Beckmann

Nov 25, 2023, 7:40:05 AM
On 23/11/2023 03.39, Adam Dane wrote:
> Looks like the change didn't suppress restarting the daemons.

It prevented restarting, but not starting ... something we shouldn't do
with suspend/hibernate/resume related things without the corresponding
trigger. Especially we should not start all three at the same time.

May I take the lack of comment on nvidia-suspend-common 525.147.05-3
that it now behaves normally? ;-)


Andreas

PS: It would be great if someone could confirm that also suspend/resume
and hibernate/resume cycles work properly with these new services enabled.

Marcello Perathoner

Nov 25, 2023, 8:00:05 AM
I did the upgrade to 525.147.05-3 manually this time and it went well.

Adam Dane

Nov 26, 2023, 12:40:05 AM
On 2023-11-25 06:31, Andreas Beckmann wrote:
> May I take the lack of comment on nvidia-suspend-common 525.147.05-3
> that it now behaves normally? ;-)

Yes, all is well.

> PS: It would be great if someone could confirm that also suspend/resume
> and hibernate/resume cycles work properly with these new services
> enabled.

I just tried both suspend and hibernate, and both worked correctly.

Thanks,

Adam