Is there a reason why Prometheus chunk file size keeps increasing over time?

akbill

Jan 18, 2024, 4:55:53 AM
to Prometheus Users

Hi all,

Our customer has been using Prometheus for years. The configuration has not changed since day 1, but file system disk usage is now over 90% and keeps increasing (gradually rather than all at once). Is there a reason why the size of new chunks increases over time?

Although we have many application metrics used to monitor our network elements and performance, none of them have been changed or added. We are just collecting the same KPIs over time.

Files in July (Month; Day; Time; DirectoryName; Size in MB): [screenshots not available]

Files in December (Month; Day; Time; DirectoryName; Size in MB): [screenshots not available]

So, is there a reason why chunk size increases over time? Is there any way we can keep the file size stable without changing the retention time?

Ben Kochie

Jan 18, 2024, 5:02:23 AM
to akbill, Prometheus Users
Can you graph these two metrics over the time range you are talking about?

`prometheus_tsdb_head_series`

`rate(prometheus_tsdb_head_samples_appended_total[1h])`
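If it is easier to pull the numbers than to screenshot a graph, here is a minimal Python sketch against the Prometheus HTTP API range-query endpoint `/api/v1/query_range`, which takes `query`, `start`, `end` and `step` parameters. The server address and the one-day step are assumptions, and `build_range_url`, `fetch_range` and `first_and_last` are hypothetical helper names, not part of any Prometheus client library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on localhost:9090, matching the
# instance="localhost:9090" label in the build info later in this thread.
PROM_URL = "http://localhost:9090"


def build_range_url(base, query, start, end, step):
    """Build an /api/v1/query_range URL for a PromQL expression."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base}/api/v1/query_range?{params}"


def fetch_range(base, query, start, end, step=86400):
    """Run a range query (default step: one day, in seconds) and return the decoded JSON."""
    with urllib.request.urlopen(build_range_url(base, query, start, end, step)) as resp:
        return json.load(resp)


def first_and_last(body):
    """Return the first and last values of the first series in a matrix result."""
    values = body["data"]["result"][0]["values"]  # [[unix_ts, "value"], ...]
    return float(values[0][1]), float(values[-1][1])
```

For example, `first_and_last(fetch_range(PROM_URL, "prometheus_tsdb_head_series", time.time() - 180 * 86400, time.time()))` would compare roughly six months ago with now, but only data still inside the configured retention window is returned.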

Ben Kochie

Jan 18, 2024, 5:05:56 AM
to akbill, Prometheus Users
Also please include more information.
* Prometheus version.
* All command line flags.

akbill

Jan 18, 2024, 11:59:21 PM
to Prometheus Users
Hi Ben, 

Thanks for the feedback. The outputs are below:

prometheus_build_info{branch="HEAD",goversion="go1.10.3",instance="localhost:9090",job="prometheus",revision="188ca45bd85ce843071e768d855722a9d9dabe03",version="2.3.1"}

prometheus_tsdb_head_series.png

prometheus_tsdb_head_samples_appended_total.png

CLI_Flags.png

So according to the graphs, the number of series and samples does increase over time? However, as I mentioned, the configuration has not changed.
And from the application PM point of view (we are monitoring network traffic, NE subscriber counts, etc.), traffic and subscriber levels are also about the same as before.

Any ideas? 
   
On Thursday, January 18, 2024 at 6:05:56 PM [UTC+8], Ben Kochie wrote:

Ben Kochie

Jan 19, 2024, 3:31:20 AM
to akbill, Prometheus Users
Your graphs only cover 2 weeks, but you are asking about changes over 6 months. Please widen the graphs to show the trend over the same period.

But yes, it does look like there is a slow increase in the number of metrics being collected. This would account for the increased storage needs.
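As a rough back-of-the-envelope check, the Prometheus storage documentation sizes disk usage as retention time in seconds times ingested samples per second times bytes per sample, with roughly 1-2 bytes per sample after compression. The sketch below assumes 2 bytes per sample, and the retention and ingestion figures are hypothetical, not the numbers from this thread.

```python
# Rough sizing from the Prometheus storage docs:
#   disk ~= retention_seconds * ingested_samples_per_second * bytes_per_sample
# with roughly 1-2 bytes per sample after compression (an average, not a guarantee).

DAY = 86400


def estimated_disk_bytes(retention_seconds, samples_per_second, bytes_per_sample=2.0):
    """Approximate on-disk TSDB size for a given retention and ingestion rate."""
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical numbers for illustration only: 15 days of retention,
# ingestion growing from 10,000 to 12,000 samples/s.
before = estimated_disk_bytes(15 * DAY, 10_000)
after = estimated_disk_bytes(15 * DAY, 12_000)
print(f"{before / 1e9:.2f} GB -> {after / 1e9:.2f} GB ({after / before:.2f}x)")
# prints: 25.92 GB -> 31.10 GB (1.20x)
```

With retention held fixed, the estimate scales linearly with the ingestion rate, so a slow rise in samples per second translates directly into a proportional rise in disk usage.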

Even if your configuration remains the same, it is up to the targets to determine how much they send. There are an infinite number of ways this could happen, so it will be up to you to investigate why.

To start, you can look at something like `sum by (job) (scrape_samples_post_metric_relabeling)` to find out which job may be increasing the number of metrics. There are a large number of tutorials and guides out there on how to investigate.
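One way to act on that query is to evaluate it at two points in time (for example through `/api/v1/query` with its `time` parameter) and rank jobs by how much their sample count grew. This is a minimal Python sketch; `samples_by_job` and `growth_by_job` are hypothetical helper names, and the job names and values below are made up for illustration.

```python
def samples_by_job(body):
    """Map job -> value from an instant query result such as
    sum by (job) (scrape_samples_post_metric_relabeling)."""
    return {
        series["metric"].get("job", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


def growth_by_job(before, after):
    """Return (job, increase) pairs, largest increase first.
    A job missing from one snapshot is treated as 0 there."""
    jobs = set(before) | set(after)
    deltas = [(job, after.get(job, 0.0) - before.get(job, 0.0)) for job in jobs]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


# Hypothetical snapshots taken months apart; job names and values are made up.
july = {"node": 1000.0, "snmp": 5000.0, "blackbox": 200.0}
december = {"node": 1010.0, "snmp": 7000.0, "blackbox": 200.0}
for job, delta in growth_by_job(july, december):
    print(f"{job}: {delta:+.0f}")
```

The job at the top of the list is the first place to look for targets that have started exposing more series, even though the scrape configuration is unchanged.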

Side note: your Prometheus version, 2.3.1, is extremely out of date. A large number of critical bugs and security issues have been fixed since that release in 2018. I highly recommend upgrading to the latest release at least once per year.
