Setting storage.tsdb.max-block-duration to a lower value completely stops compaction


Sukhada Sankpal

Jan 24, 2024, 4:35:16 PM
to Prometheus Users
storage.tsdb.max-block-duration defaults to 10% of the retention time. I am currently running a setup with 30 days of retention, so this flag's default value is 3 days.
I changed storage.tsdb.min-block-duration to 30m and storage.tsdb.max-block-duration to 1h. This put the TSDB into a no-compaction state, and local storage usage grew quickly.

To re-enable compaction and test safely, I changed storage.tsdb.max-block-duration to 1d.

I would like some guidance on what a safe lower value for this parameter is, and whether keeping it low comes at the cost of increased memory usage.
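For reference, a sketch of how these flags might be passed on the command line (the storage path is hypothetical; the durations are the ones described above):

```shell
# Hypothetical invocation; flag values are the ones discussed in this thread.
# min-block-duration controls how small the initial head blocks are;
# max-block-duration caps how large compaction may grow merged blocks.
prometheus \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.min-block-duration=30m \
  --storage.tsdb.max-block-duration=1d
```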

Sukhada Sankpal

Jan 24, 2024, 4:45:50 PM
to Prometheus Users
Background on why I wanted to experiment with this parameter:
I am using the LTS version for testing, i.e. 2.45.2.
During compaction, i.e. every 3 days, Prometheus's resident memory spikes to a very high value. For example, the average of process_resident_memory_bytes is around 50 GB, but at compaction time it spikes to 120-160 GB. Given the 50 GB steady-state usage, I would like to allocate around 128 GB of memory to the host, but the spike during compaction makes this unworkable: a lower allocation may lead to OOM during compaction, and a higher one adds cost for cloud-based VMs.

Brian Candler

Jan 25, 2024, 2:15:09 AM
to Prometheus Users
Since regular blocks are 2h, setting the maximum size of compacted blocks to 1h sounds unlikely to work. Testing with 1d therefore seems reasonable.

Can you provide more details about the scale of your environment, in particular the "head stats" from Status > TSDB Stats in the Prometheus web interface?

However, I think what you're seeing could simply be an artefact of how Go's garbage collection works, and you can make it more aggressive by tuning GOGC and/or GOMEMLIMIT.

Roughly speaking, the default garbage-collector behaviour in Go is to allow memory usage to grow to double the current live heap before triggering a garbage collection cycle. So if the steady-state heap is 50GB, it would be normal for it to grow to 100GB if you don't tune it.

If this is the case, setting smaller compacted blocks is unlikely to make any difference to memory usage - and it could degrade query performance.

Sukhada Sankpal

Jan 25, 2024, 2:50:22 PM
to Prometheus Users
Thanks Brian
I have enclosed a screenshot of TSDB head stats.
I have setup GOGC to 60% based on a recommendation by Bryan Boreham for this setup.

However, what does this parameter actually do? Say my data retention is 30 days; the parameter then defaults to 3 days. Does that mean data compaction is triggered every 3 days for all 30 days of data?
tsdb_stats.jpg

Brian Candler

Jan 26, 2024, 3:20:33 AM
to Prometheus Users
As far as I know, if you set the compaction period to 3 days, then every 3 days it will compact the last 3 days' worth of data. As simple as that.

When you say "I have setup GOGC to 60%", what *exact* string value have you given for GOGC? I think it must be GOGC=60, not GOGC=60%.

If you're limiting the whole VM to 128GiB, then setting GOMEMLIMIT a bit below this (e.g. "110GiB") may help during compaction. There are blog posts about this.

See https://pkg.go.dev/runtime for the exact format of this setting.
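As a sketch (assuming a plain shell launch; adapt to your systemd unit or container spec), the two environment variables together might look like:

```shell
# Hedged example: the values come from this thread, not a tested recommendation.
# GOGC=60 collects more aggressively than the default of 100;
# GOMEMLIMIT sets a soft ceiling the Go runtime tries to stay under.
export GOGC=60
export GOMEMLIMIT=110GiB
prometheus --storage.tsdb.retention.time=30d
```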