Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#986749: apt-cacher-ng: acng still stops working with APOLL ADD failing and Bad file descriptor

798 views
Skip to first unread message

Christian Meyer

unread,
Apr 11, 2021, 7:40:02 AM4/11/21
to
Package: apt-cacher-ng
Version: 3.6.3-1
Severity: important
X-Debbugs-Cc: c2h...@web.de

Dear Maintainer,

even after the recent acng updates I still see apt-cacher-ng (3.6.3-1 amd64) stop working.
Apt throws a bunch of errors like:

E: Fehlschlag beim Holen von http://deb.debian.org/debian/dists/bullseye/main/Contents-all Fehler beim Lesen vom Server: Verbindung wurde durch den Se
rver auf der anderen Seite geschlossen. [IP: 127.0.0.1 3142]
E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden ignoriert oder alte an ihrer Stelle benutzt.
...

W: Das Depot »http://127.0.0.1:3142/deb.debian.org/debian bullseye Release« enthält keine Release-Datei.
E: Fehlschlag beim Holen von http://127.0.0.1:3142/deb.debian.org/debian/dists/bullseye/main/binary-i386/Packages 503 Service Unavailable [IP: 127.0.
0.1 3142]
E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden ignoriert oder alte an ihrer Stelle benutzt.
E: Paket adduser kann nicht gefunden werden.
E: Paket apt kann nicht gefunden werden.
...

...
Get:99 http://deb.debian.org/debian bullseye/main amd64 krb5-user amd64 1.18.3-4 [151 kB]
Get:100 http://deb.debian.org/debian bullseye/main amd64 ldap-utils amd64 2.4.57+dfsg-2 [206 kB]
Get:101 http://deb.debian.org/debian bullseye/main amd64 linux-image-5.10.0-5-amd64 amd64 5.10.26-1 [53.4 MB]
Err:101 http://deb.debian.org/debian bullseye/main amd64 linux-image-5.10.0-5-amd64 amd64 5.10.26-1
Undetermined Error [IP: 127.0.0.1 3142]
Err:102 http://deb.debian.org/debian bullseye/main amd64 linux-image-amd64 amd64 5.10.26-1
Could not connect to 127.0.0.1:3142 (127.0.0.1). - connect (111: Connection refused) [IP: 127.0.0.1 3142]
Err:103 http://deb.debian.org/debian bullseye/main amd64 whois amd64 5.5.8
Unable to connect to 127.0.0.1:3142: [IP: 127.0.0.1 3142]


Since systemctl often tells me something about "Epoll ADD" failing:
# systemctl status apt-cacher-ng.service
apt-cacher-ng[646]: [warn] Epoll ADD(1) on fd 14 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor

my workaround is to automatically restart the service by a cronjob (and after that /usr/lib/apt-cacher-ng/expire-caller.pl) when this line is seen.

More information:

Yes, my internet connection is very slow (around 8000 kbit/s) and that's my reason to run acng).

# less /var/log/apt-cacher-ng/apt-cacher.err.1
Beside of tons of "Bad file descriptor" messages:
Fri Apr 9 11:13:05 2021|debrep/pool/main/f/ffmpeg/libavutil56_4.3.2-0+deb11u1_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
Fri Apr 9 11:13:06 2021|debrep/pool/main/c/codec2/libcodec2-0.9_0.9.2-4_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
Fri Apr 9 11:13:17 2021|debrep/pool/main/p/pulseaudio/libpulsedsp_14.2-2_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor

I only see a few different ones:
Fri Apr 9 10:57:49 2021|Failed to move stale item debrep/dists/bullseye/main/i18n/Translation-de.bz2 out of the way: No such file or directory
Fri Apr 9 10:59:29 2021|Failed to move stale item debrep/dists/bullseye/main/i18n/Translation-en.bz2 out of the way: No such file or directory
Fri Apr 9 11:05:10 2021|debrep/pool/main/f/fonts-liberation/fonts-liberation_1.07.4-11_all.deb storage error [HTTP/1.1 503 Broken pipe], last errno: Broken pipe
Fri Apr 9 11:06:56 2021|debrep/pool/main/l/lm-sensors/libsensors5_3.6.0-7_i386.deb storage error [HTTP/1.1 503 Connection reset by peer], last errno: Connection reset by peer
Fri Apr 9 11:12:51 2021|debrep/pool/main/libr/libreoffice/libreoffice-base-drivers_7.0.4-3_amd64.deb storage error [HTTP/1.1 503 Broken pipe], last errno: Broken pipe


# journalctl
only shows something like:
Apr 03 00:34:03 121-My apt-cacher-ng[2900630]: [warn] fcntl(15, F_SETFL): Bad file descriptor
...
Apr 09 11:12:44 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 75 failed. Old events were 0; read change was 1 (add); write change was 0 (n>
Apr 09 11:12:53 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 75 failed. Old events were 0; read change was 1 (add); write change was 0 (n>
Apr 09 11:12:53 121-My apt-cacher-ng[2900630]: [warn] Epoll ADD(1) on fd 73 failed. Old events were 0; read change was 1 (add); write change was 0

Christian Meyer

-- Package-specific info:

-- System Information:
Debian Release: bullseye/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-5-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_CRAP
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages apt-cacher-ng depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.75
ii dpkg 1.20.7.1
ii libbz2-1.0 1.0.8-4
ii libc6 2.31-11
ii libevent-2.1-7 2.1.12-stable-1
ii libevent-pthreads-2.1-7 2.1.12-stable-1
ii libgcc-s1 10.2.1-6
ii liblzma5 5.2.5-2
ii libssl1.1 1.1.1k-1
ii libstdc++6 10.2.1-6
ii libsystemd0 247.3-3
ii libwrap0 7.6.q-31
ii lsb-base 11.1.0
ii zlib1g 1:1.2.11.dfsg-2

Versions of packages apt-cacher-ng recommends:
ii ca-certificates 20210119

Versions of packages apt-cacher-ng suggests:
ii avahi-daemon 0.8-5
pn doc-base <none>
ii libfuse2 2.9.9-5

-- Configuration Files:
/etc/apt-cacher-ng/acng.conf changed:
CacheDir: /var/cache/apt-cacher-ng
LogDir: /var/log/apt-cacher-ng
SupportDir: /usr/lib/apt-cacher-ng
Remap-debrep: file:deb_mirror*.gz /debian ; file:backends_debian # Debian Archives
Remap-uburep: file:ubuntu_mirrors /ubuntu ; file:backends_ubuntu # Ubuntu Archives
Remap-cygwin: file:cygwin_mirrors /cygwin # ; file:backends_cygwin # incomplete, please create this file or specify preferred mirrors here
Remap-sfnet: file:sfnet_mirrors # ; file:backends_sfnet # incomplete, please create this file or specify preferred mirrors here
Remap-alxrep: file:archlx_mirrors /archlinux # ; file:backend_archlx # Arch Linux
Remap-fedora: file:fedora_mirrors # Fedora Linux
Remap-epel: file:epel_mirrors # Fedora EPEL
Remap-slrep: file:sl_mirrors # Scientific Linux
Remap-gentoo: file:gentoo_mirrors.gz /gentoo ; file:backends_gentoo # Gentoo Archives
Remap-secdeb: security.debian.org ; security.debian.org deb.debian.org/debian-security
ReportPage: acng-report.html
ExThreshold: 4
LocalDirs: acng-doc /usr/share/doc/apt-cacher-ng
ReuseConnections: 0

/etc/apt-cacher-ng/security.conf [Errno 13] Keine Berechtigung: '/etc/apt-cacher-ng/security.conf'
/etc/cron.daily/apt-cacher-ng changed:
export ACNGREQ="?abortOnErrors=aOe&byPath=bP&byChecksum=bS&truncNow=tN&incomAsDamaged=iad&purgeNow=pN&doExpire=Start+Scan+and%2For+Expiration&calcSize=cs&asNeeded=an"
if test -r /usr/lib/apt-cacher-ng/expire-caller.pl ; then
/usr/lib/apt-cacher-ng/expire-caller.pl || echo Please check error messages in one of: $(ls -t /var/log/apt-cacher-ng/maint*.log.html | head -n5)
fi


-- debconf information:
apt-cacher-ng/cachedir: keep
apt-cacher-ng/proxy: keep
apt-cacher-ng/gentargetmode: No automated setup
* apt-cacher-ng/tunnelenable: false
apt-cacher-ng/port: keep
apt-cacher-ng/bindaddress: keep

Eduard Bloch

unread,
Apr 12, 2021, 3:20:04 PM4/12/21
to
Hallo,
* Christian Meyer [Sun, Apr 11 2021, 01:26:01PM]:
> Package: apt-cacher-ng
> Version: 3.6.3-1
> Severity: important
> X-Debbugs-Cc: c2h...@web.de
>
> Dear Maintainer,
>
> even after the recent acng updates I still see apt-cacher-ng (3.6.3-1 amd64) stop working.

What you describe here does, all in one, not make much sense. Too many
symptoms coming together which are not adding up. Are you sure that
there is no bad memory or system instability involved?

> Apt throws a bunch of errors like:
>
> E: Fehlschlag beim Holen von http://deb.debian.org/debian/dists/bullseye/main/Contents-all Fehler beim Lesen vom Server: Verbindung wurde durch den Se
> rver auf der anderen Seite geschlossen. [IP: 127.0.0.1 3142]

Closing connection is a potential outcome in worst situations but the
connection wouldn't stay down permanently as seen in the rest of your
log. Actually, that might be caused by periodic crashes so systemd keeps
the server down for a while. Which brings me to the suspicion of system
instability.

Please send the output of
journalctl -b0 -u apt-cacher-ng

and also please set debug=7 in acng.conf or similar conf file.

If that's a crash, please install systemd-coredump service and we can
debug the corefiles later.

> E: Einige Indexdateien konnten nicht heruntergeladen werden. Sie wurden ignoriert oder alte an ihrer Stelle benutzt.
> ...
>
> W: Das Depot »http://127.0.0.1:3142/deb.debian.org/debian bullseye Release« enthält keine Release-Datei.
> E: Fehlschlag beim Holen von http://127.0.0.1:3142/deb.debian.org/debian/dists/bullseye/main/binary-i386/Packages 503 Service Unavailable [IP: 127.0.
> 0.1 3142]

Oh, that's probably our "nice" apt hiding the actual error string and
showing this dummy instead. It should match some error in the error log,
though.

This doesn't make sense. ACNG does not render this message string explicitly.
Is there some proxy server involved?

> Get:99 http://deb.debian.org/debian bullseye/main amd64 krb5-user amd64 1.18.3-4 [151 kB]
> Get:100 http://deb.debian.org/debian bullseye/main amd64 ldap-utils amd64 2.4.57+dfsg-2 [206 kB]
> Get:101 http://deb.debian.org/debian bullseye/main amd64 linux-image-5.10.0-5-amd64 amd64 5.10.26-1 [53.4 MB]
> Err:101 http://deb.debian.org/debian bullseye/main amd64 linux-image-5.10.0-5-amd64 amd64 5.10.26-1
> Undetermined Error [IP: 127.0.0.1 3142]
That's apt's insufficient error message which probably indicates a short
read. Would fit to the connection refusal series.

> Err:102 http://deb.debian.org/debian bullseye/main amd64 linux-image-amd64 amd64 5.10.26-1
> Could not connect to 127.0.0.1:3142 (127.0.0.1). - connect (111: Connection refused) [IP: 127.0.0.1 3142]
> Err:103 http://deb.debian.org/debian bullseye/main amd64 whois amd64 5.5.8
> Unable to connect to 127.0.0.1:3142: [IP: 127.0.0.1 3142]

> Since systemctl often tells me something about "Epoll ADD" failing:
> # systemctl status apt-cacher-ng.service
> apt-cacher-ng[646]: [warn] Epoll ADD(1) on fd 14 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor
>
> my workaround is to automatically restart the service by a cronjob (and after that /usr/lib/apt-cacher-ng/expire-caller.pl) when this line is seen.

And now that is getting really odd. Do you have some kind of application
firewall installed, like apparmor?

> More information:
>
> Yes, my internet connection is very slow (around 8000 kbit/s) and that's my reason to run acng).

8mbit/s isn't awful slow and it shouldn't make a difference anyway.

> # less /var/log/apt-cacher-ng/apt-cacher.err.1
> Beside of tons of "Bad file descriptor" messages:
> Fri Apr 9 11:13:05 2021|debrep/pool/main/f/ffmpeg/libavutil56_4.3.2-0+deb11u1_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
> Fri Apr 9 11:13:06 2021|debrep/pool/main/c/codec2/libcodec2-0.9_0.9.2-4_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor
> Fri Apr 9 11:13:17 2021|debrep/pool/main/p/pulseaudio/libpulsedsp_14.2-2_amd64.deb storage error [HTTP/1.1 503 Bad file descriptor], last errno: Bad file descriptor

So I checked the code and it doesn't make sense. fileitem.cc:755 is the
only location reporting that and it wouldn't get there when the
filedescriptor is okay, and you cannot get that FD closed by any regular
code path.

We can have an interactive Jitsi session if you prefer to get this
solved in realtime and it can be made reproducible somehow.

Best regards,
Eduard.

Christian Meyer

unread,
Apr 13, 2021, 5:10:03 PM4/13/21
to
Am Montag, dem 12.04.2021 um 21:13 +0200 schrieb Eduard Bloch:
> Hallo,
> * Christian Meyer [Sun, Apr 11 2021, 01:26:01PM]:
> > Package: apt-cacher-ng
> > Version: 3.6.3-1
> > Severity: important
> > X-Debbugs-Cc: c2h...@web.de
>
> What you describe here does, all in one, not make much sense. Too
> many symptoms coming together which are not adding up. Are you sure
> that there is no bad memory or system instability involved?

Until now I have no hint to that direction, but I will test memory and
/ or replace the machine.

> Please send the output of
> journalctl -b0 -u apt-cacher-ng

For today (and cron driven restarts on errors every 3 minutes) it is:

Apr 13 07:57:37 121-My apt-cacher-ng[202463]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 08:00:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 08:00:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 08:00:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 08:00:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
55.182s CPU time.
Apr 13 08:00:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 08:00:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 08:00:40 121-My apt-cacher-ng[389015]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 08:03:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 08:03:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 08:03:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 08:03:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
2.402s CPU time.
Apr 13 08:03:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 08:03:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 10:42:12 121-My apt-cacher-ng[391899]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 10:43:25 121-My apt-cacher-ng[391899]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 10:45:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 10:45:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 10:45:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 10:45:02 121-My systemd[1]: apt-cacher-ng.service: Consumed
56.112s CPU time.
Apr 13 10:45:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 10:45:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.
Apr 13 11:36:26 121-My apt-cacher-ng[414217]: [warn] Epoll ADD(1) on fd
13 failed. Old events were 0; read change was 1 (add); write change was
0 (none); close change was 0 (none): Bad file descriptor
Apr 13 11:39:02 121-My systemd[1]: Stopping Apt-Cacher NG software
download proxy...
Apr 13 11:39:02 121-My systemd[1]: apt-cacher-ng.service: Succeeded.
Apr 13 11:39:02 121-My systemd[1]: Stopped Apt-Cacher NG software
download proxy.
Apr 13 11:39:02 121-My systemd[1]: apt-cacher-ng.service: Consumed 2min
47.837s CPU time.
Apr 13 11:39:02 121-My systemd[1]: Starting Apt-Cacher NG software
download proxy...
Apr 13 11:39:02 121-My systemd[1]: Started Apt-Cacher NG software
download proxy.


> and also please set debug=7 in acng.conf or similar conf file.

I did so an now I'm waiting for new errors.

> If that's a crash, please install systemd-coredump service and we can
> debug the corefiles later.


> Oh, that's probably our "nice" apt hiding the actual error string and
> showing this dummy instead. It should match some error in the error
> log,
> though.
>
> This doesn't make sense. ACNG does not render this message string
> explicitly.
> Is there some proxy server involved?


Yes, there is an elderly squid upstream proxy:
Proxy: http://user:pass@myProxy:8080
and I
ReuseConnections: 0

These are tho only modifications of the config-file



> > my workaround is to automatically restart the service by a cronjob
> > (and after that /usr/lib/apt-cacher-ng/expire-caller.pl) when this
> > line is seen.
>
> And now that is getting really odd. Do you have some kind of
> application
> firewall installed, like apparmor?

I didn't install (or configure) it by hand, but obviously it is pulled
in by $something in Bullseye:

At least 'dpkg -i apparmor' says something like:

ii apparmor 2.13.6-10 amd64 user-space parser
utility for AppArmor
un apparmor-profiles-extra <keine> <keine> (keine
Beschreibung vorhanden)
un apparmor-utils <keine> <keine> (keine
Beschreibung vorhanden)


> We can have an interactive Jitsi session if you prefer to get this
> solved in realtime and it can be made reproducible somehow.

Thanks for your offer.
Actually I don't see a way to reproduce it by demand.

I will have a look at the memory (memtest) and report the 'debug: 7'
output in apt-cacher-ng.err

Christian

FUSTE Emmanuel

unread,
Apr 15, 2021, 10:00:03 AM4/15/21
to
Hello,

I have similar problems exacerbated by tree level of "forcemanaged=1" of
apt cacher servers behind a blucoat proxy.
Somes are VM, somes are physical. All machines / OS are ok.
My conf use VfileUseRangeOps:-1 and ResuseConnections:0
Trashing all the caches on all the servers even does not completely cure
the problem which reappear shortly.
Client concurrency activity worsen/trigguer the problem very very fast.
Smell like treading problems.
Will activate Debug: 7 and report here if I see something interesting.

Emmanuel.

Eduard Bloch

unread,
May 30, 2021, 7:20:04 AM5/30/21
to
notfound 986749 3.6.4-1
notfound 986749 3.7-1
severity 986749 serious
thanks
After excessive testing, I am pretty sure that the root cause of this problem was solved
in the commit https://salsa.debian.org/blade/apt-cacher-ng/-/commit/c333cf3829e6373bcad07c831436317a7c34fac1
or for Sid (hopefully unblocked...):
https://salsa.debian.org/blade/apt-cacher-ng/-/commit/2afc3d384b2c051f2754730ed392ea5381f854f1

The other aspects with stale storage items (file recreation) were
already tackled in versions 3.6.2 und 3.6.3.

Your guess was not bad, the Bad-File-Descriptor problem was related to
concurrency issues but the error path was not trivial.

First, there was buggy usage of a RAII helper (unique_fd) which was
added as an afterthought in the commit:
0c02c1a0 (Eduard Bloch 2019-11-23 11:46:20 +0100
This was never used correctly though, the extra member in the class was
only for "design beauty" (uniformity) and is basically not used, but it
was interfering with the existing method for graceful connection
shutdown (see destructor). So actually after that change the socket was
closed ASAP and NOT graceful (risking loss of the final bytes of the
active TCP stream) which is an issue of its own, and then the delayed
closer code (see sockio.cc) came along and tried to close this socket
again, which killed random streams depending on the timing. This was not
obvious with a fast server and a few clients but with some load, this
becomes a real problem.

Then, another problem was the graceful-closing code itself. It was not
thread-safe but it was called from multi-threaded context via the
FinishConnection method in conserver.cc. This is now fixed by posting
the scheduling task into the IO thread. I'd also consider the code
inefficient and error-prone because it was using a hashmap for a purpose
where simply allocating the metadata nodes and releasing them is totally
sufficient and probably cheaper. So I rewrote this mess in sockio.cc
some weeks ago and current code seems to behave stable. I.e. no socket
or memory leaks spotted since then.

Another minor issue which caught my eye was the forceclose() helper
method, which was written in a sloppy way many years ago, and which
might call close(-1) once in a while. Which is not a drama but
pointless. The method is now dropped in the Unstable commit (see above),
it was hardly used anyway.

Best regards,
Eduard.

--
<leichenwagen>Erst wenn der letzte Programmierer eingesperrt und die letzte Idee patentiert ist, werdet ihr merken, daß Anwälte nicht programmieren können
0 new messages