As requested by Salvatore, I am lowering the priority and avoiding an embargo.
-----
Hello, happy new year, and thanks.
This looks like an apt deadlock, which prevents updates and unattended upgrades, and therefore critical security updates,
on systems where they are enabled.
(Yes, we can manually kill the offending apt_info.py process to temporarily work around the issue, but this is not a proper solution.)
Because it blocks security updates, I feel this deserves real attention, even though it is unlikely to happen at scale or to be exploited in practice.
Symptoms:
Persistent apt update locking error:
# apt update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 65553 (python3)
N: Be aware that removing the lock file is not a solution and may break your system.
E: Unable to lock directory /var/lib/apt/lists/
# 1 hour later, same issue, same holding PID 65553
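To track such a stuck lock over time (same PID an hour later), the holder can be pulled out of apt's error message. A minimal sketch, assuming the message keeps the format quoted above; `parse_lock_holder` is a hypothetical helper name, not part of apt:

```python
import re

# Matches the apt error line quoted in this report.
LOCK_RE = re.compile(
    r"Could not get lock (?P<path>\S+)\. "
    r"It is held by process (?P<pid>\d+) \((?P<name>[^)]*)\)"
)


def parse_lock_holder(line):
    """Return (lock_path, pid, process_name), or None if the line does not match."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    return m.group("path"), int(m.group("pid")), m.group("name")


if __name__ == "__main__":
    msg = ("E: Could not get lock /var/lib/apt/lists/lock. "
           "It is held by process 65553 (python3)")
    print(parse_lock_holder(msg))
```

Comparing the returned PID across two runs spaced well apart would distinguish a stuck holder from ordinary short-lived contention.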
# Concerned processes:
# ps aux |grep pyth
root 1259 0.0 0.1 121076 27528 ? Ssl Jan06 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgra>
root 65553 0.0 0.4 89640 76908 ? S 12:09 0:03 python3 /usr/share/prometheus-node-exporter-collectors/apt_info.py
ee 70395 0.0 0.2 124164 42844 ? Sl 12:35 0:00 /bin/python3.11 /home/ee/.vscode-oss/extensions/ms-python.python> (not suspected)
# ps aux |grep apt
root 65551 0.0 0.0 9552 4252 ? Ss 12:09 0:00 /bin/bash -c /usr/share/prometheus-node-exporter-collectors/apt_>
root 65553 0.0 0.4 89640 76908 ? S 12:09 0:03 python3 /usr/share/prometheus-node-exporter-collectors/apt_info.>
root 65554 0.0 0.0 2464 884 ? S 12:09 0:00 sponge /var/lib/prometheus/node-exporter/apt.prom
_apt 65814 0.0 0.0 27192 13204 ? S 12:09 0:00 /usr/lib/apt/methods/https
_apt 65815 0.0 0.0 24420 10236 ? S 12:09 0:00 /usr/lib/apt/methods/http
_apt 65816 0.0 0.0 27192 13204 ? S 12:09 0:00 /usr/lib/apt/methods/https
_apt 65817 0.0 0.0 24420 10272 ? S 12:09 0:00 /usr/lib/apt/methods/http
_apt 65819 0.0 0.0 17572 7624 ? S 12:09 0:00 /usr/lib/apt/methods/gpgv
_apt 65826 0.0 0.0 27192 13464 ? S 12:09 0:00 /usr/lib/apt/methods/https
_apt 65829 0.0 0.0 24420 10292 ? S 12:09 0:00 /usr/lib/apt/methods/http
_apt 66110 0.0 0.0 17528 7500 ? S 12:10 0:00 /usr/lib/apt/methods/store
_apt 66112 0.0 0.0 18436 8636 ? S 12:10 0:00 /usr/lib/apt/methods/rred
_apt 66113 0.0 0.0 18576 8860 ? S 12:10 0:00 /usr/lib/apt/methods/rred
The deadlock is obviously between the unattended-upgrades process (1259) and the prometheus triptych: 65551/53/54.
# 65553 seems to be the culprit - as apt update tells us
# strace -p 65553
strace: Process 65553 attached
pselect6(29, [12 13 14 16 18 20 22 24 26 28], [], NULL, {tv_sec=0, tv_nsec=499419645}, NULL) = 0 (Timeout)
pselect6(29, [12 13 14 16 18 20 22 24 26 28], [], NULL, {tv_sec=0, tv_nsec=500000000}, NULL) = 0 (Timeout)
... repeats 'forever' ....
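For readers less used to strace output: the pattern above is a process waiting on a set of read fds with a 0.5 s timeout, finding nothing readable, and waiting again. The sketch below only reproduces that syscall pattern on a pipe nobody writes to; it is not apt's code, and `poll_once` is a hypothetical name:

```python
import os
import select


def poll_once(read_fds, timeout=0.5):
    """Wait up to `timeout` seconds; return the fds that became readable."""
    readable, _, _ = select.select(read_fds, [], [], timeout)
    return readable


if __name__ == "__main__":
    r, w = os.pipe()  # nobody ever writes to `w`
    for _ in range(3):
        # An empty list here corresponds to "= 0 (Timeout)" in strace.
        print(poll_once([r]))
    os.close(r)
    os.close(w)
```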
All fds are pipes; I could not get more info before the process crashed due to my diagnostic attempts.
An apt/python/prom collector specialist should be able to identify these pipes quickly and deduce more from the following state:
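Should this happen again, the pipes could be matched to their peer processes through /proc before attaching a debugger. A sketch under the assumption of a Linux /proc and sufficient privileges (root for other users' processes); the function names are hypothetical:

```python
import os
import re

PIPE_RE = re.compile(r"^pipe:\[(\d+)\]$")


def pipe_inode(target):
    """Return the inode number from a 'pipe:[N]' link target, else None."""
    m = PIPE_RE.match(target)
    return int(m.group(1)) if m else None


def pipe_fds(pid):
    """Map fd number -> pipe inode for every pipe open in process `pid`."""
    result = {}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed while we were looking
        inode = pipe_inode(target)
        if inode is not None:
            result[int(name)] = inode
    return result


def pipe_peers(pid, others):
    """For each pid in `others`, list the pipe inodes it shares with `pid`."""
    mine = set(pipe_fds(pid).values())
    return {p: sorted(mine & set(pipe_fds(p).values())) for p in others}
```

For example, `pipe_peers(65553, [65814, 65815, 65819])` would show which of the `/usr/lib/apt/methods/*` helpers sit on the other end of the fds seen in the pselect6 calls. A PID that has already exited makes `pipe_fds` raise `FileNotFoundError`, which this sketch does not handle.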
# gdb -p 65553 and bt:
#0 0x00007fa4bf65f794 in __GI___select (nfds=29, readfds=0x7ffc24f8e7c0, writefds=0x7ffc24f8e840, exceptfds=0x0,
timeout=0x7ffc24f8e750) at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x00007fa4bebad338 in pkgAcquire::Run(int) () from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#2 0x00007fa4becb1485 in AcquireUpdate(pkgAcquire&, int, bool, bool) () from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#3 0x00007fa4becb1976 in ListUpdate(pkgAcquireStatus&, pkgSourceList&, int) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#4 0x00007fa4bed32fe1 in ?? () from /usr/lib/python3/dist-packages/apt_pkg.cpython-311-x86_64-linux-gnu.so
#5 0x0000000000521cf0 in ?? ()
#6 0x000000000053983c in PyObject_Vectorcall ()
#7 0x000000000052a570 in _PyEval_EvalFrameDefault ()
#8 0x000000000052222b in PyEval_EvalCode ()
#9 0x0000000000647f07 in ?? ()
#10 0x00000000006457cf in ?? ()
#11 0x0000000000651920 in ?? ()
#12 0x000000000065166b in _PyRun_SimpleFileObject ()
#13 0x0000000000651494 in _PyRun_AnyFileObject ()
#14 0x000000000065022f in Py_RunMain ()
#15 0x00000000006248b7 in Py_BytesMain ()
#16 0x00007fa4bf58818a in __libc_start_call_main (main=main@entry=0x624820, argc=argc@entry=2,
argv=argv@entry=0x7ffc24f8f298) at ../sysdeps/nptl/libc_start_call_main.h:58
#17 0x00007fa4bf588245 in __libc_start_main_impl (main=0x624820, argc=2, argv=0x7ffc24f8f298, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc24f8f288) at ../csu/libc-start.c:381
#18 0x0000000000624751 in _start ()
This seems to suggest that the location of the deadlock, for 65553, is:
(apt_info.py)
def _main():
    cache = apt.cache.Cache()
    # First of all, attempt to update the index. If we don't have permission
    # to do so (or it fails for some reason), it's not the end of the world,
    # we'll operate on the old index.
    with contextlib.suppress(apt.cache.LockFailedException, apt.cache.FetchFailedException):
        cache.update()  # <<<<<<<<<<<< VERY LIKELY
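One way to keep a hung update from holding the lock indefinitely would be to run it in a child process with a deadline. This is only a sketch of that idea, not the collector's code or a proposed patch; `run_with_deadline` is a hypothetical helper and the deadline value is an arbitrary assumption:

```python
import subprocess
import sys


def run_with_deadline(argv, seconds):
    """Run argv; return True if it exited in time, False if it was killed."""
    try:
        subprocess.run(argv, timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising.
        return False
    return True


if __name__ == "__main__":
    # Stand-in for a hung update: a child that sleeps far past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(hung, 1))
```

In the collector, the child command would be something like `python3 -c 'import apt.cache; apt.cache.Cache().update()'`, with the parent falling back to the old index when the helper returns False. Only the direct child is killed here; any helper processes it started are not handled by this sketch.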
I could not confirm the precise location, as trying to get a python backtrace from the process generated a SEGV:
(gdb) call PyRun_SimpleString("print('toto\n')") # to test
'PyRun_SimpleString' has unknown return type; cast the call to its declared return type
(gdb) call (void*)PyRun_SimpleString("print('toto\n')")
Program received signal SIGSEGV, Segmentation fault.
# Oops... will not get a python trace now.
Fortunately, I collected the core (~27MB). If you are interested, tell me; I am keeping it for a few weeks:
#0 0x000000000063187a in ?? ()
#1 0x00000000006349b2 in PyImport_AddModuleObject ()
#2 0x0000000000634688 in PyImport_AddModule ()
#3 0x000000000063e323 in PyRun_SimpleStringFlags ()
(but I feel it is unrelated and not so useful, though I may be wrong)
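For future occurrences, a Python-level trace could be obtained without calling into the interpreter from gdb, using the standard `faulthandler` module. The sketch below assumes the script can be modified to opt in at startup, which apt_info.py does not currently do as far as this report shows; `enable_stack_dump` is a hypothetical name and SIGUSR1 an arbitrary choice:

```python
import faulthandler
import signal
import tempfile


def enable_stack_dump(signum=signal.SIGUSR1, file=None):
    """Dump all Python thread stacks to `file` (default: stderr) on `signum`."""
    if file is None:
        faulthandler.register(signum, all_threads=True)
    else:
        faulthandler.register(signum, file=file, all_threads=True)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as out:
        enable_stack_dump(file=out)
        # Same effect as `kill -USR1 <pid>` sent from a shell.
        signal.raise_signal(signal.SIGUSR1)
        faulthandler.unregister(signal.SIGUSR1)
        out.seek(0)
        print(out.read().decode())
```

The handler is installed at the C level, so it can write the traceback even while the interpreter is blocked inside an extension call such as the one in frame #4 above.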
I feel I can't help more now, so throwing the potato 😉
Best,
Eric 'Steve' Estievenart
-- System Information:
Debian Release: bookworm/sid
APT prefers unstable
APT policy: (990, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.0.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages prometheus-node-exporter-collectors depends on:
ii moreutils 0.67-1
ii prometheus-node-exporter 1.5.0-1+b1
ii python3-apt 2.5.0
ii systemd-sysv 252.4-1
Versions of packages prometheus-node-exporter-collectors recommends:
ii ipmitool 1.8.19-4
ii jq 1.6-2.1
ii nvme-cli 2.2.1-3
ii python3 3.11.1-1
ii smartmontools 7.3-1+b1
prometheus-node-exporter-collectors suggests no packages.
-- no debconf information