Maintenance on IPng's Halloumi and Lipase logs

Pim van Pelt

Dec 12, 2025, 8:14:51 AM
to ct-p...@chromium.org
Hoi,

We discovered a garbage collection issue in TesseraCT
(https://github.com/transparency-dev/tesseract/issues/644) which impacts
all log types (Posix, AWS, GCP). It has been fixed by Google's team
(thanks, Philippe and Al!) and a new release v0.1.1 was minted the other
day.

The lack of garbage collection has made IPng's logs grow, and we need to
do some maintenance to clear the backlog of partial tiles. I wanted to
share our plan, and give a heads-up that there will be a small (<5min)
outage window for several log shards.

# Canary
Starting with the retired halloumi2026h2 log, I will canary the new
binary and its garbage collection. This log has already been copied to
S3, so this will not be disruptive. I'll use it to see how GC performs
in the v0.1.1 release, and once it's done, I will run ct-fsck to make
sure the log is still good. I will do this phase right after sending
this e-mail.

# Staging
Working on the Lipase log (our staging/test log), counting backwards
from 2027h2, 2027h1, 2026h2, 2026h1, I will:
1. stop tesseract
2. ZFS snapshot the log
3. start tesseract with v0.1.1 which has partial tile cleanup
4. when completed, run ct-fsck
5. if ct-fsck is happy, destroy the ZFS snapshot
[if anything were to go wrong, I can roll back to the snapshot and
inform you of the time window that was lost; a rough per-shard sketch of
this sequence follows below]
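
For concreteness, per shard this boils down to something like the
following (the systemd unit and snapshot names here are placeholders for
illustration, not necessarily what actually runs on our hosts):

SHARD=lipase2027h2
systemctl stop tesseract-${SHARD}            # 1. stop tesseract
zfs snapshot ssd-vol0/logs/${SHARD}@pre-gc   # 2. snapshot the log dataset
systemctl start tesseract-${SHARD}           # 3. start the v0.1.1 binary, GC enabled
# 4. once GC finishes, run ct-fsck against the shard
# 5. if ct-fsck is happy, drop the safety net:
zfs destroy ssd-vol0/logs/${SHARD}@pre-gc
# ...and if it is not, stop the shard and roll back instead:
#   zfs rollback ssd-vol0/logs/${SHARD}@pre-gc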

# Production
Working on the Halloumi log (our production log), counting backwards
from 2027h2, 2027h1, 2026h2, 2026h1, I will perform the same sequence as
above.

I will update this thread after the canary, staging, and production
phases complete. You can contact our group at ct-...@ipng.ch or find us
in #cheese on Slack.

groet,
Pim

--
Pim van Pelt
PBVP1-RIPE - https://ipng.ch/

Pim van Pelt

Dec 12, 2025, 8:43:33 AM
to ct-p...@chromium.org
Hoi,

The canary completed: the log was 146GB before and 5.3GB after cleanup.
For posterity, the garbage collection took about 6m20s and the ct-fsck
took 1m2s for 4765491 entries.
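
(Back of the envelope, that works out to roughly 4765491 / 62 ≈ 77,000
entries per second of ct-fsck throughput on this host.)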

I will now briefly stop each Lipase shard to snapshot its posix
filesystem, and then restart it with GC turned on. They are tiny and
should be done very quickly.

groet,
Pim

Pim van Pelt

Dec 12, 2025, 8:55:57 AM
to ct-p...@chromium.org
Hoi,

I have completed Lipase - each shard has only a few thousand entries, so
the cleanup only took a few seconds. ct-fsck passed on all shards, and
the cleanup reclaimed about 300MB of space per shard:

ssd-vol0/logs/lipase2025h2   337M  2.77T   337M  /ssd-vol0/logs/lipase2025h2
ssd-vol0/logs/lipase2026h1  34.3M  2.77T  34.3M  /ssd-vol0/logs/lipase2026h1
ssd-vol0/logs/lipase2026h2  34.1M  2.77T  34.1M  /ssd-vol0/logs/lipase2026h2
ssd-vol0/logs/lipase2027h1  34.7M  2.77T  34.7M  /ssd-vol0/logs/lipase2027h1
ssd-vol0/logs/lipase2027h2  34.1M  2.77T  34.1M  /ssd-vol0/logs/lipase2027h2
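
For reference, the listing above is plain zfs list output over the
per-shard datasets, gathered with something along the lines of:

zfs list -r ssd-vol0/logs | grep lipase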

Note: I left lipase2025h2 as-is because its write window will close in a
few weeks anyway, so I'll archive it and reclaim the ZFS space at that time.

I will now continue with the Halloumi (production) logs, stopping each
for a brief moment to snapshot ZFS before restarting TesseraCT with the
new release and enabling garbage collection there. I do expect the
garbage collection to take a few days for the larger shards, but unless
anything exciting happens, this will be the last e-mail from me today :)

groet,
Pim


Joe DeBlasio

Dec 12, 2025, 2:38:57 PM
to Pim van Pelt, ct-p...@chromium.org
Thank you, Pim, for the excellent example of transparency, and for communicating in a way that makes the whole ecosystem stronger, not just IPng.

Joe



Pim van Pelt

Dec 15, 2025, 4:37:44 AM
to Pim van Pelt, ct-p...@chromium.org
Hoi,

TL;DR: we reclaimed 2.2TB of disk space, and all of IPng's log shards
are healthy.

Final update from me - earlier this morning, the slow deletion of the
partial tiles on the Halloumi log shards completed. For reference,
TesseraCT has a .state/gcState file which contains a JSON entry with the
leaf index from which partial tiles still need to be cleaned (e.g.
'{"fromSize":593112064}'). Since that counter advances in whole
256-entry tiles, it always sits a little below the tree size, which is
why the percentages below top out just shy of 100%. We can track
completion with something like:

ctlog@ctlog1:/ssd-vol0/logs$ while :; do
  clear
  for i in *; do
    GC=$(jq .fromSize $i/data/.state/gcState)   # GC watermark (fromSize)
    N=$(head -2 $i/data/checkpoint | tail -1)   # tree size, line 2 of the checkpoint
    PCT=$(echo "scale=2; $GC*100/$N" | bc -l)
    echo "$i - $N - $GC ($PCT%)"
  done
  sleep 3600
done

halloumi2025h2 - 360370887 - 360370688 (99.99%)
halloumi2026h1 - 592944144 - 592943616 (99.99%)
halloumi2026h2a - 22858085 - 22857984 (99.99%)
halloumi2027h1 - 2634946 - 2634752 (99.99%)
halloumi2027h2 - 2601 - 2560 (98.42%)
lipase2025h2 - 2602 - 2560 (98.38%)
lipase2026h1 - 2600 - 2560 (98.46%)
lipase2026h2 - 2602 - 2560 (98.38%)
lipase2027h1 - 2602 - 2560 (98.38%)
lipase2027h2 - 2602 - 2560 (98.38%)

After the cleanup, I ran ct-fsck on each shard, and they all completed
cleanly (for posterity, our Dell R640 does this at approx 130k entries/sec):
LOG=halloumi2026h2a
DIR="/ssd-vol0/logs/$LOG/data"
ct-fsck -origin $LOG.log.ct.ipng.ch -monitoring_url file://$DIR \
-public_key=$(jq -r .key $DIR/log.v3.json) \
-N 20 -user_agent_info="ct-...@ipng.ch"
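
At approx 130k entries/s, the largest shard (halloumi2026h1 at ~593M
entries) is the long pole; back of the envelope:

593000000 / 130000 ≈ 4560s, i.e. a bit over 75 minutes for a full pass.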

As with Lipase (where I left 2025h2 alone because it'll be retired in a
few weeks), halloumi2025h2 remained untouched:
ssd-vol0/logs/halloumi2025h2   5.0T  1.7T  3.4T  34%  /ssd-vol0/logs/halloumi2025h2
ssd-vol0/logs/halloumi2027h1   3.4T  2.5G  3.4T   1%  /ssd-vol0/logs/halloumi2027h1
ssd-vol0/logs/halloumi2026h2   3.4T  5.3G  3.4T   1%  /ssd-vol0/logs/halloumi2026h2
ssd-vol0/logs/halloumi2026h2a  3.4T   24G  3.4T   1%  /ssd-vol0/logs/halloumi2026h2a
ssd-vol0/logs/halloumi2026h1   4.0T  589G  3.4T  15%  /ssd-vol0/logs/halloumi2026h1
ssd-vol0/logs/halloumi2027h2   3.4T  5.4M  3.4T   1%  /ssd-vol0/logs/halloumi2027h2
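
(For reference, that is ordinary df -h output for the per-shard
mountpoints, e.g. something like: df -h /ssd-vol0/logs/halloumi*)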

The log sizes are now comparable to Gouda and Rennet, our Sunlight
static logs. Thanks again to Philippe and the TrustFabric team for the
quick fix. Finally, thank you, Joe, for the kind words!

groet,
Pim
--
Pim van Pelt <p...@ipng.ch>
PBVP1-RIPE https://ipng.ch/