Is there a reason why Prometheus chunk file sizes increase over time?

akbill

Jan 18, 2024, 4:55:53 AM

to Prometheus Users

Hi all,

Our customer has been using Prometheus for years. The configuration has not changed since day one, but recently file system disk usage has risen above 90% and keeps increasing (gradually rather than all at once). Is there a reason why the size of new chunks increases over time?

Although we have many application metrics for monitoring our network elements and their performance, none have been changed or added. We are just collecting the same KPIs over time.

Files in July (Month; Day; Time; DirectoryName; Size in MB)

Files in Dec. (Month; Day; Time; DirectoryName; Size in MB)

So, is there a reason why chunk size increases over time? Is there any way to keep the file size steady without changing the retention time?

Ben Kochie

Jan 18, 2024, 5:02:23 AM

to akbill, Prometheus Users

Can you graph these two metrics, over the time range you are talking about?

prometheus_tsdb_head_series

rate(prometheus_tsdb_head_samples_appended_total[1h])
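If the built-in graph view is awkward for a range of several months, the same two expressions can be requested through the HTTP API's `/api/v1/query_range` endpoint. The following is only a minimal sketch: the helper name `query_range_url`, the server address, and the time window are assumptions for illustration, not values from this thread.

```python
from urllib.parse import urlencode


def query_range_url(base_url, expr, start, end, step="1h"):
    """Build a /api/v1/query_range URL for one PromQL expression."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return base_url.rstrip("/") + "/api/v1/query_range?" + urlencode(params)


# Assumed values: a local server and a six-month window (Unix timestamps).
BASE_URL = "http://localhost:9090"
START = 1688169600  # 2023-07-01T00:00:00Z
END = 1704067200    # 2024-01-01T00:00:00Z

EXPRESSIONS = [
    "prometheus_tsdb_head_series",
    "rate(prometheus_tsdb_head_samples_appended_total[1h])",
]

for expr in EXPRESSIONS:
    # Print the URL; fetch it with curl or a browser and plot the values.
    print(query_range_url(BASE_URL, expr, START, END))
```

The range can only reach back as far as the configured retention, so the window should be adjusted to match what the server still holds.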


Ben Kochie

Jan 18, 2024, 5:05:56 AM

to akbill, Prometheus Users

Also, please include more information:
* Prometheus version.
* All command line flags.

akbill

Jan 18, 2024, 11:59:21 PM

to Prometheus Users

Hi Ben,

Thanks for the feedback. The output is below:

prometheus_build_info{branch="HEAD",goversion="go1.10.3",instance="localhost:9090",job="prometheus",revision="188ca45bd85ce843071e768d855722a9d9dabe03",version="2.3.1"}

So, according to the graphs, the number of series and data points does increase over time. However, as I mentioned, the configuration has remained the same.

From the application PM's point of view (we monitor our network traffic, NE subscriber counts, etc.), traffic and subscriber numbers are also at about the same level as before.

Any ideas?

On Thursday, January 18, 2024 at 6:05:56 PM UTC+8, Ben Kochie wrote:

Ben Kochie

Jan 19, 2024, 3:31:20 AM

to akbill, Prometheus Users

Your graph only covers 2 weeks, but you are asking about changes over 6 months. Please widen the graph to show the trend over the same time range.

But yes, it does look like there is a slow increase in the number of metrics being collected. This would account for the increased storage needs.
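To make the link between ingestion and disk usage concrete, the Prometheus storage documentation gives a rough capacity formula: needed disk space is approximately retention time in seconds, times ingested samples per second, times bytes per sample (typically around 1 to 2 bytes). The sketch below applies that formula; the function name and all input numbers are made up for illustration and are not taken from this thread.

```python
def estimate_disk_bytes(retention_seconds, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB size: retention * ingestion rate * bytes per sample."""
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical inputs: 15 days of retention, ingestion growing by 20%.
RETENTION_SECONDS = 15 * 24 * 60 * 60

before = estimate_disk_bytes(RETENTION_SECONDS, 10_000)
after = estimate_disk_bytes(RETENTION_SECONDS, 12_000)

print(f"before: {before / 1e9:.2f} GB")
print(f"after:  {after / 1e9:.2f} GB")
print(f"growth: {after / before - 1:.0%}")
```

With retention held constant, the estimate scales linearly with the ingestion rate, so a slow rise in `rate(prometheus_tsdb_head_samples_appended_total[1h])` would be expected to show up as a matching rise in block sizes.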

Even if your configuration remains the same, it is up to the targets to determine how much they send. There are an infinite number of ways this could happen, so it will be up to you to investigate why.

To start, you can look at something like `sum by (job) (scrape_samples_post_metric_relabeling)` to find out which job may be increasing the number of metrics. There are a large number of tutorials and guides out there on how to investigate.
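One way to read the result of that query over a long range is to compare the first and last value per job. The following is a small sketch that ranks jobs by growth, assuming the JSON has the standard `query_range` matrix shape; the function name `growth_by_job` and the sample response (job names and numbers) are hypothetical.

```python
def growth_by_job(response):
    """Return (job, first, last, growth) tuples, largest growth first."""
    rows = []
    for series in response["data"]["result"]:
        values = series["values"]
        if not values:
            continue  # skip series with no samples in the range
        first = float(values[0][1])   # sample values arrive as strings
        last = float(values[-1][1])
        job = series["metric"].get("job", "")
        rows.append((job, first, last, last - first))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical response for: sum by (job) (scrape_samples_post_metric_relabeling)
example = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"job": "prometheus"},
             "values": [[1688169600, "600"], [1704067200, "610"]]},
            {"metric": {"job": "network-elements"},
             "values": [[1688169600, "40000"], [1704067200, "52000"]]},
        ],
    },
}

for job, first, last, growth in growth_by_job(example):
    print(f"{job}: {first:.0f} -> {last:.0f} ({growth:+.0f})")
```

Whichever job ends up at the top of the list is the first place to look for new label values or newly exposed metrics.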

Side note: your Prometheus version, 2.3.1, is extremely out of date. A large number of critical bugs and security issues have been fixed since that release in 2018. I highly recommend upgrading to the latest release at least once per year.
