Prometheus Disk Space is Shrinking/Compressing? - How does this work?


squareoc...@gmail.com

Sep 27, 2018, 9:30:00 AM
to Prometheus Users
Hey all,

We released a project using Prometheus last week. I've been monitoring its disk space and noticed that it occasionally shrinks.
How does this work? Is Prometheus compressing the data? I've checked our metrics in Grafana and no data has been lost.

last week: 106M, 216M, 128M
this week: 323M, 510M, 450M, 383M

I can also post the individual file distribution I recorded using du -ah if that would help figure out what's going on.

I've skimmed through the documentation and googled a bit, but can't seem to find much; most people were complaining about it using too much disk space. Any insight would be helpful. Thanks!

Ben Kochie

Sep 27, 2018, 9:38:37 AM
to squareoc...@gmail.com, Prometheus Users
There's a talk from PromCon about the TSDB storage engine.



squareoc...@gmail.com

Oct 2, 2018, 12:58:36 PM
to Prometheus Users
Hey, thanks for the link. I had a quick look over it, and it seems to me that Fabian is talking more about compression when data is inserted rather than the occasional shrinking we are experiencing. Do you have any more insight into why the disk space occasionally shrinks?

Callum Styan

Oct 3, 2018, 3:56:16 PM
to Prometheus Users
You're saying that in Grafana you don't see any missing data. I'm assuming you mean that you still have the data from ~2 weeks ago, i.e. your retention range hasn't passed, but you're still seeing the on-disk size shrink occasionally?
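For context, the retention window is controlled by the --storage.tsdb.retention flag (15d by default), and whole blocks are only deleted once they age entirely out of that window. Just as an illustration of where that is set:

$ prometheus --config.file=prometheus.yml --storage.tsdb.retention=15d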

If you watch the video Ben sent from around the 14-minute mark, Fabian explains the block structure. Over time there will be more and more blocks (by default each covers a 2h time span) that queries would have to merge results from, so older blocks are compacted into larger blocks that cover larger time ranges. Later on (around 19 minutes) he explains that each block has an index file with a set of tables that are used for lookups during queries. If multiple blocks are compacted into a single block, we end up with just a single index for that data. There's some overhead for the index file and general block structure on disk, so what you're seeing is most likely the compaction of blocks, which means fewer of those index files.
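To make that concrete, here's roughly what the data directory looks like on disk (an illustrative sketch -- the ULIDs and values are made up). Each block is a directory containing its chunks, a single index, and a meta.json that records how it was compacted:

$ ls data/
01BKGV7JBM69T2G1BGBGM6KB12  01BKGTZQ1SYQJTR4PB43C8PD98  lock  wal

$ ls data/01BKGV7JBM69T2G1BGBGM6KB12/
chunks  index  meta.json  tombstones

$ cat data/01BKGV7JBM69T2G1BGBGM6KB12/meta.json
{
  "ulid": "01BKGV7JBM69T2G1BGBGM6KB12",
  "minTime": ...,
  "maxTime": ...,
  "compaction": {
    "level": 3,
    "sources": [ "01BK...", "01BK...", "01BK..." ]
  },
  "version": 1
}

A block with compaction level > 1 is the result of merging several smaller (ultimately 2h) blocks, whose ULIDs end up under "sources", so their separate index files and per-block overhead go away.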

Lars.s...@hpc.at

May 2, 2019, 9:08:55 AM
to Prometheus Users
I have watched that talk about Prometheus storage, but I can't say that it is compressing at all, or at least I have no idea how to configure it so that it uses less space.
For me it has accumulated ~700 MB of data in a week, which compresses down to about ~30 MB using bzip2 --best.
So it does not seem to me that Prometheus is good at not wasting space, which is why I am proposing to use a filesystem with compression support, like BTRFS, for storing the Prometheus database.
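For example, something along these lines (just a sketch; the device and mount point are placeholders for whatever your setup uses):

/etc/fstab entry with transparent zstd compression:
/dev/sdb1  /var/lib/prometheus  btrfs  compress=zstd  0 0

or, to compress data already sitting on a btrfs volume:
$ sudo btrfs filesystem defragment -r -czstd /var/lib/prometheus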
Any ideas?

Christian Hoffmann

May 2, 2019, 10:22:36 AM
to Lars.s...@hpc.at, Prometheus Users
Hi Lars,

On 2019-05-02 15:08, Lars.schotte via Prometheus Users wrote:
> I have watched that talk about prometheus storage, but I can not say
> that it is compressing at all, or at least no idea how to configure it
> so that it uses up less space.
There is not really anything to configure there, AFAIK.

> To me it has accumulated ~ 700 MB a week of data and compressed they
> make about ~ 30 MB using bzip2 --best.
I find this rather surprising. While Prometheus does not use classical
compression such as bzip2 or gzip, I would expect a compacted Prometheus
block to be nearly incompressible.

The used space you are seeing may be dominated by the WAL. This part is
not optimized for efficient storage, and it should not grow endlessly (if
it does, there may be another problem -- recent Prometheus versions have
had several optimizations/bugfixes there). I would not be surprised if
this part is highly compressible.
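A quick way to check this (adjust the path to your storage directory):

$ du -sh /var/lib/prometheus/data/wal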

To confirm, I tried some tests on a small Prometheus instance of my own
(4k metrics, still on 2.7.2) and I'm a bit surprised about the
compressibility:

$ du -sh .
678M .

$ tar c . | wc -c
708106240

$ tar cz . | wc -c
317304621

$ tar cj . | wc -c
231356989

Compression via gzip would be able to reduce the overall amount to 44%,
bzip2 even to 32%.

The compress-it-all test is a bit unfair, as Prometheus needs to keep the
data in separate chunks. Also, the WAL special case from above applies as
well. However, single chunk file tests still yield unexpectedly good results:

$ cat ./01D9V7R7BK6ZF0YVEMGQ23JXGP/chunks/000001 | wc -c
18211931

$ cat ./01D9V7R7BK6ZF0YVEMGQ23JXGP/chunks/000001 | gzip -9 | wc -c
9630865

$ cat ./01D9V7R7BK6ZF0YVEMGQ23JXGP/chunks/000001 | bzip2 -9 | wc -c
9477296

(both down to ~50ish%)

So, there may indeed be some potential savings in storage space. The
question is whether implementing compression (or getting it via a
compressing filesystem) would be worth it, as data would have to be
compressed and decompressed on the fly, increasing CPU demands.


Back to your actual case -- can you provide some more details, like:
- your Prometheus version
- whether you started fresh on this version or updated; if the latter, from which version
- the space distribution within your data directory, e.g. du -shc /var/lib/prometheus/data/*
- the amount of currently available metrics, e.g. curl -sG localhost:9090/api/v1/query --data-urlencode 'query=count({__name__=~".+"})' | python -m json.tool

More information about the storage layout can be found here, btw:
https://prometheus.io/docs/prometheus/latest/storage/

Kind regards,
Christian


Lars Schotte

May 2, 2019, 10:32:09 AM
to ma...@hoffmann-christian.info, promethe...@googlegroups.com
No, we are just starting out with Prometheus, so it is the latest
version, and we have it running against an idle test Kafka setup.
So, no production yet.

But yes, you were correct, it is being dominated by WAL:
14M data/01D9DY12373B6BH8VPVXJY977C
39M data/01D9FMYWJ015MAPE7S4ZFZGYZ8
37M data/01D9HJRG21ASD8WAXCY7KXDF4P
38M data/01D9KGHZ1WSPP9N22XFZY0D6ZM
38M data/01D9NEBGSCV3A8NBAF0DSE0YMA
37M data/01D9QC51HZTJZKACZ0AE4RF85F
37M data/01D9S9YJV3P67QKFSX7Q9NT7SA
36M data/01D9V7R40XSSKBBSP8CQ4C6HEQ
4,5M data/01D9VXX0B891RH65ET4NY1SE5G
13M data/01D9VXX0Z114CBYDNQ0904XPJT
4,0M data/01D9W36ZSQ7WRC1ATXA2KMW4SV
0 data/lock
452M data/wal
745M total

The segment files inside the WAL are about 128 MB each.
I ran bzip2 --best --keep against one of them and got this comparison:

128M 00000022
3,5M 00000022.bz2

So we found the source! LOL!

Lars Schotte

May 2, 2019, 10:35:11 AM
to ma...@hoffmann-christian.info, promethe...@googlegroups.com
However, just now, out of curiosity, I also had a look at how the chunk
files compress; those are the small ones:

35M 000001
2,1M 000001.bz2

Not as big of a difference, but still far from your 50%.

Ben Kochie

May 6, 2019, 4:00:14 AM
to Lars Schotte, ma...@hoffmann-christian.info, promethe...@googlegroups.com
This is an interesting discussion.

As was said above, there is a trade-off between compression and efficiency of reading.

In order to provide efficient access to time-series data (chunks) and label metadata (index), Prometheus uses mmap to map the files into memory. The data can then be read directly from the mapped memory without needing a decompressed copy. In theory, it would be possible to wrap the access functions with a decompressor, but it would likely cost a lot of additional CPU and memory.
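You can see this on a running instance (an illustrative check, assuming Linux and a single prometheus process) by listing which block files the process has memory-mapped:

$ grep -E 'chunks|index' /proc/$(pgrep -x prometheus)/maps | awk '{print $6}' | sort -u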

One thing I would like to see is better compression of the WAL. Since the WAL is used for crash recovery and remote-write, it's not really in the "hot path". Perhaps the WAL writer could be wrapped in a snappy stream compressor.
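As a very rough estimate of what such a fast stream compressor could save, one could run something like lz4 over a WAL segment (a sketch, assuming lz4 is installed; it's only a stand-in for snappy, but they are in the same family of fast compressors):

$ lz4 -1 -c data/wal/00000022 | wc -c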



krasi...@gmail.com

May 6, 2019, 4:14:16 AM
to Prometheus Users
Yep, I think the same, but I still opened an issue as it is an interesting topic and worth investigating.

