Package: pcp
Version: 5.2.6-1
Severity: important
Tags: upstream fixed-upstream
Hello,
Debian stable's pcp version has a rather annoying bug: after a while,
pmlogger.service fails to start up:
systemd[1]: pmlogger.service: Failed with result 'protocol'.
systemd[1]: Failed to start Performance Metrics Archive Logger.
systemd[1]: pmlogger.service: Scheduled restart job, restart counter is at 1.
It then retries a few times and eventually gives up. The root cause is that
something during startup clobbers the ownership of the log directory (it
should be pcp:pcp):
# ls -ld /var/log/pcp/pmlogger
drwxrwxr-x 3 1000 wheel 4096 Sep 19 22:13 /var/log/pcp/pmlogger
The only way out of this is to run
chown -R pcp:pcp /var/log/pcp/pmlogger
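To tell quickly whether a machine is in this state before running the chown, a
small check along these lines may help. This is only a sketch: check_owner is a
made-up helper name, not something shipped by pcp, and it assumes GNU stat
(coreutils), as on Debian.

```shell
# Hypothetical helper (not part of pcp): compare a directory's owner and
# group against the expected "user:group" pair. Assumes GNU stat.
check_owner() {
    dir=$1
    want=$2
    # %U:%G prints the owner and group names; a uid/gid without a passwd or
    # group entry shows up as UNKNOWN, which never matches the expected pair.
    have=$(stat -c '%U:%G' "$dir") || return 2
    if [ "$have" = "$want" ]; then
        echo "ok: $dir is owned by $have"
    else
        echo "mismatch: $dir is owned by $have, expected $want"
        return 1
    fi
}

# Intended use on an affected machine:
#   check_owner /var/log/pcp/pmlogger pcp:pcp
```

A non-zero exit status means the directory is missing or has the wrong owner,
so the function can be used in a test or a cron job to catch the regression.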
I think this is the same problem as reported in
https://bugzilla.redhat.com/show_bug.cgi?id=2013937 , and there was a
corresponding upstream fix:
https://github.com/performancecopilot/pcp/commit/b9ff7d65b5e11
The essence of that is to drop the -C option from pmlogger_check.service.
I tried to apply this to Debian 11 by appending
PMLOGGER_CHECK_PARAMS="--skip-primary"
to /etc/default/pmlogger_timers. Unfortunately that still doesn't help; our
tests keep running into this bug:
https://logs.cockpit-project.org/logs/pull-16979-20220211-085507-4343f4f8-debian-stable/log.html#298
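For anyone who wants to reproduce the attempted workaround, the change to the
defaults file can be applied roughly as follows. This is a sketch only:
append_once is a hypothetical helper name, and as said above the setting did
not fix the problem here.

```shell
# Hypothetical helper: append a line to a file unless that exact line is
# already present, so running it repeatedly does not duplicate the setting.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use on the test machine (as root):
#   append_once 'PMLOGGER_CHECK_PARAMS="--skip-primary"' /etc/default/pmlogger_timers
```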
At this point I'm running out of ideas. This feels like quite a major bug, as
it's not at all obvious how to get out of the situation or how to prevent it
from happening.
Note that this does not affect any other operating system that cockpit tests on
(Debian testing, Ubuntu 20.04 and 21.10, Fedora 34/35, CentOS/RHEL 8/9, Arch),
only Debian 11. So I'm fairly sure this is fixed in current upstream versions.
Thanks,
Martin