Clean Tombstones behaviour

Matthias Rieber

Oct 19, 2020, 2:36:52 PM
to promethe...@googlegroups.com
Hello,

I observed some strange behaviour of Prometheus when I issued a
clean_tombstones API call.
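
For reference, a minimal sketch of how such a call can be built against the TSDB admin API. It assumes a server at http://localhost:9090 (a placeholder address) that was started with --web.enable-admin-api. The helper names and the matcher used in the usage note are illustrative and not taken from this thread. The requests are only constructed here, nothing is sent.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder address, replace with the real server.
BASE_URL = "http://localhost:9090"


def delete_series_request(matchers, base_url=BASE_URL):
    """Build the POST that marks matching series as deleted (creates tombstones)."""
    # Each series selector is passed as a repeated match[] query parameter.
    query = urllib.parse.urlencode([("match[]", m) for m in matchers])
    url = f"{base_url}/api/v1/admin/tsdb/delete_series?{query}"
    return urllib.request.Request(url, method="POST")


def clean_tombstones_request(base_url=BASE_URL):
    """Build the POST that asks the TSDB to rewrite blocks without the deleted data."""
    url = f"{base_url}/api/v1/admin/tsdb/clean_tombstones"
    return urllib.request.Request(url, method="POST")


# Sending a request needs a running server with the admin API enabled:
#   with urllib.request.urlopen(clean_tombstones_request()) as resp:
#       print(resp.status)
```

A hypothetical matcher such as '{job="old_exporter"}' would be passed as delete_series_request(['{job="old_exporter"}']). Deleting series only writes tombstones; the space is reclaimed when the blocks are rewritten, which is what clean_tombstones triggers.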

1. WAL will not be written to disk while the compaction runs. This
means that very long running compactions can exceed the available
memory.

2. When one 'block' is copied, the old source block won't be deleted.

3. New 'blocks' won't count against storage.tsdb.retention.size and so no
old blocks are deleted until the compaction is finished.

If this is correct, and the removed metrics are stored in every block at
every compaction level, then running clean_tombstones is not advisable,
since it will need almost as much additional disk space as the data itself?
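
A rough back-of-the-envelope sketch of that worry. It only models the hypothesis from points 2 and 3 (every block is rewritten and no old block is removed until the whole run has finished), which has not been confirmed. The function name, the block sizes and the deleted fraction are made up for illustration.

```python
def peak_disk_usage(block_sizes, deleted_fraction):
    """Estimate peak disk usage while all blocks are being rewritten.

    Assumption (unconfirmed): old blocks stay on disk until the run is done,
    so the old and the rewritten copies exist side by side at the peak.
    """
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")
    # Each rewritten block shrinks by the share of data that was deleted.
    rewritten = [size * (1.0 - deleted_fraction) for size in block_sizes]
    return sum(block_sizes) + sum(rewritten)


# Hypothetical block sizes in GiB, with 10% of the data deleted in every block.
blocks_gib = [100, 300, 600]
peak = peak_disk_usage(blocks_gib, 0.10)
print(f"data: {sum(blocks_gib)} GiB, estimated peak: {peak:.0f} GiB")
```

Under these assumptions, the smaller the share of deleted data, the closer the peak gets to twice the size of the existing data.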

Matthias

Matthias Rieber

Oct 21, 2020, 8:25:04 AM
to promethe...@googlegroups.com
Hello,

On Mon, 19 Oct 2020, Matthias Rieber wrote:

> Hello,
>
> I observed some strange behaviour of Prometheus when I issued a
> clean_tombstones API call.

I've attached a graph that shows the behaviour.

hostB is quite interesting:

- 16. Oct 11:00 drop in 'head GC completed' duration; this is when the
metrics were removed from scraping and deleted with the API call.

- 17. Oct 05:00 it started to compact the "162h" data, which usually
happens once every 162h (one dot). For an unclear reason it compacted more
than usual. During that time no other compaction, no 'head GC
completed' and no 'WAL checkpoint completed' took place, and Prometheus
eventually died from running out of memory.

hostA:

- 19. Oct 11:00 triggered the clean_tombstones API. It started to copy a
large amount of data (these are the three grey dots). In this case, too, no
other compactions took place.

I wonder if it would be feasible to run other compactions during that time,
at least right before the next chunk of data is processed?

Regards,
Matthias
compactions-small-opt.png

Brian Candler

Oct 21, 2020, 11:02:42 AM
to Prometheus Users
I can't answer the question, but what version of Prometheus are you running?  The internals of the TSDB can change significantly between versions.

Matthias Rieber

Oct 21, 2020, 11:23:43 AM
to promethe...@googlegroups.com
It's version 2.18.1. The most promising change I could find is:

[FEATURE] TSDB: Memory-map full chunks of Head (in-memory) block from
disk. This reduces memory footprint and makes restarts faster. #6679

But the issue mentions only the memory requirement during WAL replay.

Regards,
Matthias
