Hello,
On Mon, 19 Oct 2020, Matthias Rieber wrote:
> Hello,
>
> I observed some strange behaviour of Prometheus when I issued a
> clean_tombstones API call.
I've attached a graph that shows the behaviour.
hostB is quite interesting:
- 16. Oct 11:00: drop in 'head GC completed' duration. This is when the
metrics were removed from scraping and deleted with the API call.
- 17. Oct 05:00: it started to compact the "162h" data, which usually
happens once every 162h (one dot). For an unclear reason it compacted more
than usual. During that time no other compaction, no 'head GC completed'
and no 'WAL checkpoint completed' took place, and Prometheus eventually
died with an out-of-memory error.
hostA:
- 19. Oct 11:00: I triggered the clean_tombstones API. It started to copy
a large amount of data (the three grey dots). In this case, too, no other
compactions took place.
I wonder if it would be feasible to run other compactions during that time,
at least right before the next chunk of data is processed?
Regards,
Matthias