> Hi, this series is much needed to work with the still unreliable snapshot mirrors.
>
> @Alexander: Do you plan to send a v2?
>
> At the same time I'm working on adding internal apt-cacher-ng support to kas to let the build pass the initial bootstrapping.
>
> Best regards,
> Felix
Hi Felix
Thank you for following up.
Even when using apt-cacher-ng, index files were often refreshed from
snapshot.debian.org, which caused problems when our company was once again
blacklisted for some time.
Unfortunately, I haven't found the time to analyze why that was the case.
I captured a tcpdump during one of our builds, but it has been sitting
unanalyzed for two weeks or so :-(
I suspect that either the apt client sends a reload request or the expiry
date returned by upstream is too short.
While this could be relevant when fetching packages from "main" mirrors,
it should not have much impact on snapshot mirrors.
To mitigate the issue, we have since switched to squid as a proxy for
snapshot.debian.org.
Squid has an offline mode in which cache entries, once seen, are never
revalidated against upstream, no matter what. As stated above, while this could
have drastic consequences when using main mirrors, it shouldn't cause issues with
snapshots, which by definition never change.
Thus, I dropped apt-cacher-ng in our project in favour of squid.
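To see whether squid actually serves entries from its cache instead of going
upstream, a quick check is to tally the result codes in squid's access.log.
Here is a minimal sketch. It assumes squid's default native log format, where the
fourth field holds the result code and HTTP status (e.g. TCP_MISS/200). The
function name squid_result_counts is just an illustrative helper, not part of squid:

```shell
# Hypothetical helper: count squid result codes (TCP_MISS, TCP_OFFLINE_HIT, ...)
# Assumes the default native access.log format: field 4 is "<code>/<http status>".
# Usage: squid_result_counts [logfile]   (defaults to /var/log/squid/access.log)
squid_result_counts() {
    awk 'NF >= 4 { split($4, rc, "/"); count[rc[1]]++ }
         END { for (code in count) print code, count[code] }' \
        "${1:-/var/log/squid/access.log}" | LC_ALL=C sort
}
```

After a rebuild, run e.g. `squid_result_counts /var/log/squid/access.log`.
With offline_mode on, previously fetched packages should show up as
TCP_OFFLINE_HIT or TCP_MEM_HIT rather than TCP_MISS.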
I also prepared documentation for this, but while preparing the patch I
was not sure whether it is worth a separate doc/ file or whether we should merge it
into doc/offline.md. I struggled with that decision because it does not really
solve the offline case: it only caches packages that have already been fetched once,
and it only covers apt, not other sources like git, ...
What is your opinion?
BR Alexander
PS: Appended is the patch I was referring to:
From cf64db474c2f2477633bfe3fd111156d2ac7495a Mon Sep 17 00:00:00 2001
From: Alexander Heinisch <alexander...@siemens.com>
Date: Thu, 24 Oct 2024 20:06:23 +0200
Subject: [PATCH] doc: Add setup guide for squid as a caching proxy for apt
(snapshot) mirrors.
doc/apt-caching-proxy.md | 142 +++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)
create mode 100644 doc/apt-caching-proxy.md
diff --git a/doc/apt-caching-proxy.md b/doc/apt-caching-proxy.md
new file mode 100644
index 00000000..2a23a313
--- /dev/null
+++ b/doc/apt-caching-proxy.md
@@ -0,0 +1,142 @@
+# Setup Squid as APT Caching Proxy
+
+Limited download bandwidth is often an issue and can increase build times drastically. Further, large corporate networks can get rate-limited by Debian mirrors, as many people, pipelines, etc. fetch huge amounts of packages from there.
+
+In such cases, a proxy that caches the packages is quite useful, as it reduces download times and the load on Debian mirrors.
+
+## Install Squid Proxy
+```
+apt install squid
+```
+
+## Configure Proxy for Caching (with APT in mind)
+
+1. /etc/squid/squid.conf
+This file contains the main configuration for `squid`.
+We configure it to listen on port `4242` and to cache all requests to sites listed in `/etc/squid/mirror-dstdomain.acl`. Further, to enable offline use cases (or cases where your IP has been temporarily blacklisted by `snapshot.debian.org` or similar), we set `offline_mode on`
+so that already cached packages are not fetched from upstream again.
+
+> Note: While `offline_mode on` is totally fine for `snapshot.debian.org` when using a timestamp to pin your package archive version, it can cause unintended behaviour (most likely outdated packages) when used against a non-archive mirror.
+
+> Hint: If you plan to work against non-archive mirrors and are unsure, it is recommended to set `offline_mode off` and possibly tune the cache behaviour with `refresh_pattern`.
+
+### /etc/squid/squid.conf:
+```
+# File: /etc/squid/squid.conf
+
+# default to a different port than stock squid
+http_port 4242
+
+# user visible name
+visible_hostname squid-apt-caching-proxy
+
+# do not fetch already cached packages from upstream
+offline_mode on
+
+# allow large objects, some debs are huge
+maximum_object_size 512 MB
+
+# increase available disk space for cache dir to 40G
+cache_dir aufs /var/cache/squid 40000 16 256
+
+# logs
+access_log /var/log/squid/access.log
+cache_log /var/log/squid/cache.log
+cache_store_log /var/log/squid/store.log
+
+# tweaks to speed things up
+cache_mem 256 MB
+maximum_object_size_in_memory 10240 KB
+
+# only allow ports we trust
+acl Safe_ports port 80
+acl Safe_ports port 443
+
+http_access deny !Safe_ports
+
+# Deny access to blacklisted package URLs (matched by path regex)
+acl blockedpkgs urlpath_regex "/etc/squid/pkg-blacklist-regexp.acl"
+http_access deny blockedpkgs
+
+# List of domains to cache
+acl to_archive_mirrors dstdomain "/etc/squid/mirror-dstdomain.acl"
+# don't cache domains not listed in the mirrors file
+cache deny !to_archive_mirrors
+
+# Allow access to the proxy only from networks listed in allowed-networks-src.acl
+acl allowed_networks src "/etc/squid/allowed-networks-src.acl"
+http_access allow allowed_networks
+
+# And finally deny all other access to this proxy
+http_access deny all
+```
+
+### /etc/squid/mirror-dstdomain.acl:
+```
+# File: /etc/squid/mirror-dstdomain.acl
+
+snapshot.debian.org
+```
+
+### /etc/squid/pkg-blacklist-regexp.acl:
+```
+# File: /etc/squid/pkg-blacklist-regexp.acl
+# Empty for now
+```
+
+### /etc/squid/allowed-networks-src.acl:
+```
+# File: /etc/squid/allowed-networks-src.acl
+
+# network sources that you want to allow access to the cache
+
+# private networks
+10.0.0.0/8
+172.16.0.0/12
+192.168.0.0/16
+127.0.0.1
+
+# IPv6 link-local and loopback addresses
+fe80::/64
+::1/128
+
+# IPv6 unique local addresses (ULA)
+fd00::/8
+```
+
+Restart squid to apply the configuration: `systemctl restart squid`
+
+## Use the Proxy in the ISAR Build System
+
+To forward the proxy settings to apt inside the ISAR build system just export `http_proxy`
+as follows:
+
+```
+export http_proxy=http://<proxy-server-ip>:4242
+```
+
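+> Hint: Consider also setting `https_proxy`. Note that HTTPS requests are tunnelled through the proxy via `CONNECT` and are therefore not cached by this setup.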
+
+### Validation
+
+The first time you build your image, the proxy fetches all packages from upstream.
+During that phase, you will see log entries like
+
+```
+... TCP_MISS/200 1574478 GET http://snapshot.debian.org/file/7cfaf...
+```
+in `/var/log/squid/access.log`.
+
+From then on, requests for already cached packages only produce entries like
+
+```
+... TCP_OFFLINE_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+... TCP_MEM_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+```
+
+> Note: When you add new packages to your image, they have to be fetched first, so you will see `TCP_MISS` entries for every package you haven't fetched before. The same holds true when updating the snapshot timestamp (`ISAR_APT_SNAPSHOT_TIMESTAMP` or `ISAR_APT_SNAPSHOT_DATE`).
+
+> Hint: You can observe your cache misses using:
+> ```
+> tail -f /var/log/squid/access.log | grep -e TCP_MEM_HIT -e TCP_OFFLINE_HIT -v
+> ```
--
2.43.0