Rollup and save historical data

Boris

Jun 15, 2023, 12:21:08 AM
to promethe...@googlegroups.com
Dear Prometheus community,

My Prometheus scrapes a lot of endpoints, and I only need the raw 10s resolution for a month. After that I would like to roll up the data to 5m averages, and after 6 months I would like to keep only hourly averages.

That way I can keep data for quite a long time and see changes over long periods with the same dashboard.

I am sorry to ask this question, but it seems I don't have the correct search term. Downsampling seems to keep the raw data, and all my searches lead me to TimescaleDB.
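
For example, recording rules looked like the closest built-in mechanism for precomputing 5m averages, but as far as I can tell they only add new aggregated series next to the raw data instead of replacing it. A sketch of what I mean (metric names made up):

    # rules.yml -- evaluated every 5m, writes a new aggregated series
    # alongside the raw one; the raw samples are not removed
    groups:
      - name: rollup_sketch
        interval: 5m
        rules:
          - record: job:http_requests:rate5m
            expr: sum by (job) (rate(http_requests_total[5m]))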

Cheers
Boris

Ben Kochie

Jun 15, 2023, 12:24:11 AM
to Boris, promethe...@googlegroups.com
My first question is "why?" (https://xyproblem.info/)

What problem are you solving for? Why do you think you need this?

You say "a lot of endpoints", but what is a lot? 1,000? 10,000? 100,000? 1,000,000? How many series?


Boris Behrens

Jun 15, 2023, 3:01:40 AM
to Ben Kochie, promethe...@googlegroups.com
Hi Ben,
I've read the link you gave me, and I will try to answer as best I can.

> What problem are you solving for?
I want to keep historical data for longer periods while using less disk space.
I currently keep around 4 months of data, which takes around 1.3T of disk
space, and I would love to have 1-2 years of data without having to buy
20TB disks :)

> Why do you think you need this?
For debugging and monitoring I only need the last couple of days at
raw resolution. Data for spotting trends can be downsampled (is this the
correct term?) because it is just historical trend data.
I think it's more of a use case than an actual problem I am working on.
I would like to see how things changed in the past and overlay that with
timestamps from changes. Things begin to look very different when you
check the past.

> You say "a lot of endpoints", but what is a lot? 1,000? 10,000? 100,000? 1,000,000? How many series?
A couple hundred endpoints with around 1.5M series. But I don't
think that this information is helpful in the current context.

Thank you for your time and effort to help me with my stupid little problem.
Cheers
Boris
--
The "UTF-8 Problems" self-help group will meet, as an exception, in the
large hall this time.

sayf eddine Hammemi

Jun 15, 2023, 3:10:09 AM
to Boris Behrens, Ben Kochie, promethe...@googlegroups.com
Hello
What you need is downsampling, which is not supported directly by Prometheus (in fact, keeping long-term historical data on bare Prometheus is generally not recommended either).
It is time to look at external storage solutions for Prometheus and choose the one that suits your needs (Mimir, Cortex, Thanos, M3DB, etc.).
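
For example, pointing Prometheus at any of those systems is just a remote_write block; a minimal sketch (the URL is a placeholder, written in the shape Mimir documents for its push endpoint):

    # prometheus.yml -- ship all scraped samples to a long-term store
    remote_write:
      - url: http://mimir.example.internal/api/v1/push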

Brian Candler

Jun 15, 2023, 3:18:04 AM
to Prometheus Users
Thanos supports downsampling, but the main driver is to enable faster queries over large time ranges, not to reduce storage space.
Do remember that Prometheus's storage is already extremely efficient at compression, especially for timeseries which are static or rarely change.
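
As a rough sanity check with the numbers from this thread (assuming the 10s interval and 1.5M series mentioned above):

    1.5M series / 10s interval  -> ~150,000 samples/s
    4 months ~= 10.4M seconds   -> ~1.6e12 samples
    1.3T / 1.6e12 samples       -> ~0.8 bytes per sample

which is right in line with the ~1-2 bytes per sample usually cited for TSDB.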

Ben Kochie

Jun 15, 2023, 4:59:31 AM
to sayf eddine Hammemi, Boris Behrens, promethe...@googlegroups.com
This is an obsolete recommendation. It is perfectly acceptable to store long-term data in Prometheus for small-scale (single instance, sub-10M series) deployments.

Ben Kochie

Jun 15, 2023, 5:10:32 AM
to Boris Behrens, promethe...@googlegroups.com
On Thu, Jun 15, 2023 at 9:01 AM Boris Behrens <b...@kervyn.de> wrote:
> Hi Ben,
> I've read the link you gave me, and I will try to answer as best I can.
>
> > What problem are you solving for?
> I want to keep historical data for longer periods while using less disk space.
> I currently keep around 4 months of data, which takes around 1.3T of disk
> space, and I would love to have 1-2 years of data without having to buy
> 20TB disks :)

2 years of data in your case is only about 8T (1.3T per 4 months is ~0.33T a month, so 24 months comes to roughly 7.8T). That's not a large amount of disk.

> > Why do you think you need this?
> For debugging and monitoring I only need the last couple of days at
> raw resolution. Data for spotting trends can be downsampled (is this the
> correct term?) because it is just historical trend data.
> I think it's more of a use case than an actual problem I am working on.
> I would like to see how things changed in the past and overlay that with
> timestamps from changes. Things begin to look very different when you
> check the past.

Yes, downsampling is the correct term. But downsampling is not required to do this. Grafana supports special `$__interval` and `$__rate_interval` variables that automatically scale the query range with the dashboard's selected time window.
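
A minimal sketch of such a panel query (the metric is just an example; any counter works the same way):

    # Grafana panel query: the range widens automatically as you zoom out,
    # so one dashboard covers both recent detail and long-term trends
    rate(node_network_receive_bytes_total[$__rate_interval])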

> > You say "a lot of endpoints", but what is a lot? 1,000? 10,000? 100,000? 1,000,000? How many series?
> A couple hundred endpoints with around 1.5M series. But I don't
> think that this information is helpful in the current context.

It is, because it gives us a sense of the scale of your problem. You're running a small deployment, so no fancy changes are really necessary. For comparison, my $dayjob has about 900TiB of TSDB storage (Thanos object storage). Managing that is a very different scale from a single-node deployment.

I think you're falling into the premature optimization trap.

You could use a full Thanos deployment, as it supports exactly the kind of variable retention downsampling you want. But it's going to take a lot more work than just buying slightly bigger disks to store Prometheus on.
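
If you do go that route, the tiering you described maps onto the Thanos compactor's retention flags almost one-to-one, since Thanos downsamples to exactly 5m and 1h resolutions. A sketch (paths, bucket config, and durations are placeholders to adjust):

    thanos compact \
      --data-dir=/var/thanos/compact \
      --objstore.config-file=bucket.yml \
      --retention.resolution-raw=30d \
      --retention.resolution-5m=180d \
      --retention.resolution-1h=2y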