fading out sample resolution for samples from longer ago possible?


Christoph Anton Mitterer

Feb 20, 2023, 10:29:23 PM
to Prometheus Users
Hey.

I wondered whether one can do something with Prometheus similar to what is possible with systems using RRD (e.g. Ganglia).

Depending on the kind of metrics, like those from the node exporter, one may want a very high sample resolution (and thus a short scrape interval) for, say, the last 2 days... but the further one goes back, the less interesting that data becomes, at least at that resolution (ever looked at how much I/O a server had 2 years ago, per 15s?).

What one may want, however, is a rough overview of these metrics for those older time periods, e.g. in order to see some trends.


For other values, e.g. the total used disk space on a shared filesystem or maybe a tape library, one may not need such high resolution for the last 2 days, but instead want the data (at low sample resolution, e.g. 1 sample per day) going back much further, like the last 10 years.


With Ganglia/RRD one would then simply use multiple RRAs, each for a different time span and with a different resolution... and RRD would consolidate its samples accordingly.


Can anything like this be done with Prometheus? Or is that completely out of scope?


I saw that one can set the retention period, but that seems to affect everything.

So even if I have e.g. my low-resolution tape library total size, which I'd scrape only every hour or so... it wouldn't really help me.
In order to keep that data for, say, the last 10 years, I'd need to set the retention time accordingly.

But then the high resolution samples like from the node exporter would also be kept that long (with full resolution).


Thanks,
Chris.

Stuart Clark

Feb 21, 2023, 2:31:46 AM
to Christoph Anton Mitterer, Prometheus Users
Prometheus itself cannot do downsampling, but other related projects
such as Cortex & Thanos have such features.

--
Stuart Clark

Julien Pivotto

Feb 21, 2023, 5:45:32 AM
to Stuart Clark, Christoph Anton Mitterer, Prometheus Users
We would love to have this in the future, but it would require careful
planning and a design document.


--
Julien Pivotto
@roidelapluie

Ben Kochie

Feb 21, 2023, 9:53:16 AM
to Christoph Anton Mitterer, Prometheus Users
This is mostly unnecessary in Prometheus because it uses compression in the TSDB samples. What would take up a lot of space in an RRD file takes up very little space in Prometheus.

A basic nearline 20TB HDD can easily store 600,000 series for 10 years at full 15s resolution.

This is possible because the average sample point size in Prometheus is about 1.5 bytes per sample. So 1.5 bytes * 5760 samples/day * 365 days * 10 years =~ 30MiB.
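
(Scaled up: 600,000 series * ~30 MiB each comes to roughly 17 TiB, which is how a single 20TB drive covers 10 years at full 15s resolution.)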

So for your example, looking up the data for a single metric over a long period of time is still pretty cheap. What's actually more difficult is doing all the index loads for this long period of time. But Prometheus uses mmap to opportunistically access the data on disk.


Christoph Anton Mitterer

Feb 27, 2023, 7:45:36 PM
to Julien Pivotto, Stuart Clark, Ben Kochie, Prometheus Users
Hi Stuart, Julien and Ben,

Hope you don't mind that I answer all three replies in one... don't
wanna spam the list ;-)



On Tue, 2023-02-21 at 07:31 +0000, Stuart Clark wrote:
> Prometheus itself cannot do downsampling, but other related projects
> such as Cortex & Thanos have such features.

Uhm, I see. Unfortunately neither is packaged for Debian. Plus it seems
to make the overall system even more complex.

I want to use Prometheus merely for monitoring a few hundred nodes (thus it
seems a bit overkill to have something like Cortex, which sounds like a
system for a really large number of nodes) at the university, though as
indicated before, we'd need both:
- detailed data for, like, the last week or perhaps two
- far less detailed data for much longer terms (like several years)

Right now my Prometheus server runs in a medium sized VM, but when I
visualise via Grafana and select a time span of a month, it already
takes considerable time (like 10-15s) to render the graph.

Is this expected?




On Tue, 2023-02-21 at 11:45 +0100, Julien Pivotto wrote:
> We would love to have this in the future but it would require careful
> planning and design document.

So native support is nothing on the near horizon?

And I guess it's really not possible to "simply" ( ;-) ) have different
retention times for different metrics?




On Tue, 2023-02-21 at 15:52 +0100, Ben Kochie wrote:
> This is mostly unnecessary in Prometheus because it uses compression
> in the TSDB samples. What would take up a lot of space in an RRD file
> takes up very little space in Prometheus.

Well, right now I scrape only the node exporter data from 40 hosts at a
15s interval, plus the metrics from Prometheus itself.
I've been doing this on a test install since the 21st of February.
Retention time is still at its default.

That gives me:
# du --apparent-size -l -c -s --si /var/lib/prometheus/metrics2/*
68M /var/lib/prometheus/metrics2/01GSST2X0KDHZ0VM2WEX0FPS2H
481M /var/lib/prometheus/metrics2/01GSVQWH7BB6TDCEWXV4QFC9V2
501M /var/lib/prometheus/metrics2/01GSXNP1T77WCEM44CGD7E95QH
485M /var/lib/prometheus/metrics2/01GSZKFK53BQRXFAJ7RK9EDHQX
490M /var/lib/prometheus/metrics2/01GT1H90WKAHYGSFED5W2BW49Q
487M /var/lib/prometheus/metrics2/01GT3F2SJ6X22HFFPFKMV6DB3B
498M /var/lib/prometheus/metrics2/01GT5CW8HNJSGFJH2D3ADGC9HH
490M /var/lib/prometheus/metrics2/01GT7ANS5KDVHVQZJ7RTVNQQGH
501M /var/lib/prometheus/metrics2/01GT98FETDR3PN34ZP59Y0KNXT
172M /var/lib/prometheus/metrics2/01GT9X2BPN51JGB6QVK2X8R3BR
60M /var/lib/prometheus/metrics2/01GTAASP91FSFGBBH8BBN2SQDJ
60M /var/lib/prometheus/metrics2/01GTAHNDG070WXY8WGDVS22D2Y
171M /var/lib/prometheus/metrics2/01GTAHNHQ587CQVGWVDAN26V8S
102M /var/lib/prometheus/metrics2/chunks_head
21k /var/lib/prometheus/metrics2/queries.active
427M /var/lib/prometheus/metrics2/wal
5,0G total

Not sure whether I understood meta.json correctly (I haven't found
documentation for minTime/maxTime), but I guess that the big ones
correspond to 64800s?

Seems at least quite big to me... that would - assuming all days can be
compressed roughly to that (which isn't certain of course) - mean that for one
year one needs ~250 GB for those 40 nodes, or about 6.25 GB per node
(just for the node exporter data at a 15s interval).

Does that sound reasonable/expected?



> What's actually more
> difficult is doing all the index loads for this long period of time.
> But Prometheus uses mmap to opportunistically access the data on
> disk.

And is there anything that can be done to improve that? Other than
simply using some fast NVMe or so?



Thanks,
Chris.

Brian Candler

Feb 28, 2023, 3:27:25 AM
to Prometheus Users
On Tuesday, 28 February 2023 at 00:45:36 UTC Christoph Anton Mitterer wrote:
I want to use Prometheus merely for monitoring a few hundred nodes (thus it
seems a bit overkill to have something like Cortex, which sounds like a
system for a really large number of nodes) at the university

Thanos may be simpler. Although I've not used it myself, it looks like it can be deployed incrementally starting with the sidecars.

 
, though as
indicated before, we'd need both:
- detailed data for, like, the last week or perhaps two
- far less detailed data for much longer terms (like several years)

I can offer a couple more options:

(1) Use two servers with federation (a rough sketch of the federation scrape config is below this list).
- server 1 does the scraping and keeps the detailed data for 2 weeks
- server 2 scrapes server 1 at lower interval, using the federation endpoint

(2) Use recording rules to generate lower-resolution copies of the primary timeseries - but then you'd still have to remote-write them to a second server to get the longer retention, since this can't be set at timeseries level.
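
For (1), the scrape config on server 2 might look roughly like this (an untested sketch - the job name, match[] selector and target are placeholders you'd adapt to your setup):

    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 5m        # lower resolution than the primary server
        honor_labels: true         # keep the labels as exposed by server 1
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job="node"}'       # which series to pull from server 1
        static_configs:
          - targets: ['prometheus-primary.example:9090']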

Either case makes the querying more awkward.  If you don't want separate dashboards for near-term and long-term data, then it might work to stick promxy in front of them.

Apart from saving disk space (and disks are really, really cheap these days), I suspect the main benefit you're looking for is to get faster queries when running over long time periods.  Indeed, I believe Thanos creates downsampled timeseries for exactly this reason, whilst still continuing to retain all the full-resolution data as well.
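
If I remember the Thanos compactor flags correctly (this is from memory, so do double-check the docs), you can keep each resolution for a different length of time, e.g.:

    thanos compact \
      --data-dir=/var/thanos/compact \
      --objstore.config-file=bucket.yml \
      --retention.resolution-raw=30d \
      --retention.resolution-5m=180d \
      --retention.resolution-1h=2y

i.e. the raw data ages out after a month while the downsampled series stick around for years.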

Right now my Prometheus server runs in a medium sized VM, but when I
visualise via Grafana and select a time span of a month, it already
takes considerable time (like 10-15s) to render the graph.

Ah right, then that is indeed your concern.
 
Is this expected?

That depends.  What PromQL query does your graph use? How many timeseries does it touch? What's your scrape interval?  Is your VM backed by SSDs?

For example, I have a very low performance (Celeron N2820, SATA SSD, 8GB RAM) test box at home.  I scrape data at 15 second intervals. Prometheus is running in an lxd container, alongside many other lxd containers.  The query:

    rate(ifHCInOctets{instance="gw2",ifName="pppoe-out2"}[2m])

run over a 30 day range takes less than a second - but that only touches one timeseries. (With 2-hour chunks, I would expect a 30 day period to read 360 chunks, for a single timeseries).  But it's possible that when I tested it, it already had the relevant data cached in RAM.

If you are doing something like a Grafana dashboard, then you should determine exactly what queries it's doing.  Enabling the query log can also help you identify the slowest running queries.
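
Enabling the query log is just a setting in prometheus.yml, something like this (the path is only an example):

    global:
      query_log_file: /var/lib/prometheus/query.log

Each query is then logged as a JSON line including timing information, which makes the expensive ones easy to spot.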

Another suggestion: running netdata within the VM will give you performance metrics at 1 second intervals, which can help identify what's happening during those 10-15 seconds: e.g. are you bottlenecked on CPU, or disk I/O, or something else.

Ben Kochie

Feb 28, 2023, 4:25:40 AM
to Christoph Anton Mitterer, Julien Pivotto, Stuart Clark, Prometheus Users
On Tue, Feb 28, 2023 at 1:45 AM Christoph Anton Mitterer <cale...@gmail.com> wrote:
Hi Stuart, Julien and Ben,

Hope you don't mind that I answer all three replies in one... don't
wanna spam the list ;-)

Thanks!
 



On Tue, 2023-02-21 at 07:31 +0000, Stuart Clark wrote:
> Prometheus itself cannot do downsampling, but other related projects
> such as Cortex & Thanos have such features.

Uhm, I see. Unfortunately neither is packaged for Debian. Plus it seems
to make the overall system even more complex.

I do not recommend using Debian packages for Prometheus. Debian release cycles are too slow for the pace of Prometheus development. Every release brings new improvements. In the last year we've made improvements to memory use and query performance. Debian also ignores the Go source vendor versions we provide, which leads to bugs.

You'd be better off running Prometheus using podman, or deploying official binaries with Ansible[0].

 

I want to use Prometheus merely for monitoring a few hundred nodes (thus it
seems a bit overkill to have something like Cortex, which sounds like a
system for a really large number of nodes) at the university, though as
indicated before, we'd need both:
- detailed data for, like, the last week or perhaps two
- far less detailed data for much longer terms (like several years)

Right now my Prometheus server runs in a medium sized VM, but when I
visualise via Grafana and select a time span of a month, it already
takes considerable time (like 10-15s) to render the graph.

Is this expected?

No, but it depends on your queries. Without seeing what you're graphing there's no way to tell. Your queries could be complex or inefficient. Kinda like writing slow SQL queries.

There are ways to speed up graphs for specific things, for example you can use recording rules to pre-render parts of the queries.

For example, if you want to graph node CPU utilization you can have a recording rule like this:

groups:
  - name: node_exporter
    interval: 60s
    rules:
      - record: instance:node_cpu_utilization:ratio_rate1m
        expr: >
          avg without (cpu) (
            sum without (mode) (
              rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
            )
          )

This will give you a single metric per node that will be faster to render over longer periods of time. It also effectively down-samples by only recording one point per minute.
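
A dashboard panel could then query the pre-computed series directly, e.g. (assuming the usual $node/$job dashboard variables):

    instance:node_cpu_utilization:ratio_rate1m{instance="$node", job="$job"} * 100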

Also "Medium sized VM" doesn't give us any indication of how much CPU or memory you have. Prometheus uses page cache for database access. So maybe your system is lacking enough memory to effectively cache the data you're accessing.
 




On Tue, 2023-02-21 at 11:45 +0100, Julien Pivotto wrote:
> We would love to have this in the future but it would require careful
> planning and design document.

So native support is nothing on the near horizon?

And I guess it's really not possible to "simply" ( ;-) ) have different
retention times for different metrics?

No, we've talked about having variable retention times, but nobody has implemented this. It's possible to script this via the DELETE endpoint[1]. It would be easy enough to write a cron job that deletes specific metrics older than X, but I haven't seen this packaged into a simple tool. I would love to see something like this created.
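
As a rough, untested sketch of what such a cron job could call (the selector and end timestamp are just placeholders, and Prometheus has to be started with --web.enable-admin-api):

    # mark matching series up to the given time as deleted
    curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~"node_.*"}&end=2023-02-13T00:00:00Z'
    # actually free the disk space taken by the tombstoned data
    curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'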

Yes, Prometheus writes blocks every 2 hours, and then compacts them. It does this 3 blocks at a time so 2h -> 6h -> 18h (64800s). With a longer retention time, the compactions will increase up to 21 days, which improves index size efficiency. So the efficiency should improve if you update your retention to a longer time.
 

Seems at least quite big to me... that would - assuming all days can be
compressed roughly to that (which isn't certain of course) - mean that for one
year one needs ~250 GB for those 40 nodes, or about 6.25 GB per node
(just for the node exporter data at a 15s interval).

Without seeing a full meta.json and the size of the files in one dir, it's hard to say exactly if this is good or bad. It depends a bit on how many series/samples are in each block. Just guessing, it seems like you have about 2000 metrics per node.

Seems reasonable, we're only talking about 2TiB per year for all 300 of your servers. Seems perfectly reasonable to me.


Does that sound reasonable/expected?



> What's actually more
> difficult is doing all the index loads for this long period of time.
> But Prometheus uses mmap to opportunistically access the data on
> disk.

And is there anything that can be done to improve that? Other than
simply using some fast NVMe or so?

NVMe isn't necessary. As I said above, memory for computing results is far more important. Prometheus needs page cache space. You want to avoid reading and swapping out the data needed for a graph.
 



Thanks,
Chris.

Christoph Anton Mitterer

Mar 1, 2023, 9:58:20 PM
to Brian Candler, Prometheus Users
Hey Brian,

On Tue, 2023-02-28 at 00:27 -0800, Brian Candler wrote:
>
> I can offer a couple more options:
>
> (1) Use two servers with federation.
> - server 1 does the scraping and keeps the detailed data for 2 weeks
> - server 2 scrapes server 1 at lower interval, using the federation
> endpoint

I had thought about that as well. Though it feels a bit "ugly".


> (2) Use recording rules to generate lower-resolution copies of the
> primary timeseries - but then you'd still have to remote-write them
> to a second server to get the longer retention, since this can't be
> set at timeseries level.

I had (very briefly) read about the recording rules (merely just that
they exist ^^) ... but wouldn't these give me a new name for the
metric?

If so, I'd need to adapt e.g.
https://grafana.com/grafana/dashboards/1860-node-exporter-full/ to use
the metrics generated by the recording rules,... which again seems
quite some maintenance effort.

Plus, as you even wrote below, I'd need users to use different
dashboards, AFAIU, one where the detailed data is used, one where the
downsampled data is used.
Sure that would work as a workaround, but is of course not really a
good solution, as one would rather want to "seamlessly" move from the
detailed to less-detailed data.


> Either case makes the querying more awkward.  If you don't want
> separate dashboards for near-term and long-term data, then it might
> work to stick promxy in front of them.

Which would however make the setup more complex again.


> Apart from saving disk space (and disks are really, really cheap
> these days), I suspect the main benefit you're looking for is to get
> faster queries when running over long time periods.  Indeed, I
> believe Thanos creates downsampled timeseries for exactly this
> reason, whilst still continuing to retain all the full-resolution
> data as well.

I guess I may have to look into that, and into how complex its setup would be.



> That depends.  What PromQL query does your graph use? How many
> timeseries does it touch? What's your scrape interval?

So far I've just been playing with the ones from:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
So all queries in that, and all the time series it uses.

Interval is 15s.


> Is your VM backed by SSDs?

I think it's a Ceph cluster that the supercomputing centre uses for
that, but I have no idea what that runs on. Probably HDDs.


> Another suggestion: running netdata within the VM will give you
> performance metrics at 1 second intervals, which can help identify
> what's happening during those 10-15 seconds: e.g. are you
> bottlenecked on CPU, or disk I/O, or something else.

Good idea, thanks.


Thanks,
Chris.

Christoph Anton Mitterer

Mar 1, 2023, 10:57:41 PM
to Ben Kochie, Prometheus Users
On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
>
> Debian release cycles are too slow for the pace of Prometheus
> development.

It's rather simple to pull the version from Debian unstable if one
needs to, and that seems pretty current.


> You'd be better off running Prometheus using podman, or deploying
> official binaries with Ansible[0].

Well, I guess views on how software should be distributed differ.

The "traditional" system of having distributions has many advantages
and is IMO a core reason for the success of Linux and open source.

All "modern" alternatives like flatpaks, snaps, and similar repos are
IMO, especially security-wise, completely inadequate (especially the fact
that there is no trusted intermediary (like the distribution) which
does some basic maintenance).

It's anyway not possible here because of security policy reasons.


>
> No, but It depends on your queries. Without seeing what you're
> graphing there's no way to tell. Your queries could be complex or
> inefficient. Kinda like writing slow SQL queries.

As mentioned already in the other thread, so far I merely do what:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
does.


> There are ways to speed up graphs for specific things, for example
> you can use recording rules to pre-render parts of the queries.
>
> For example, if you want to graph node CPU utilization you can have a
> recording rule like this:
>
> groups:
>   - name: node_exporter
>     interval: 60s
>     rules:
>       - record: instance:node_cpu_utilization:ratio_rate1m
>         expr: >
>           avg without (cpu) (
>             sum without (mode) (
>               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
>             )
>           )
>
> This will give you a single metric per node that will be faster to
> render over longer periods of time. It also effectively down-samples
> by only recording one point per minute.

But will dashboards like Node Exporter Full automatically use such a rule?
And if so... will they (or rather Prometheus) use the real time series
(with full resolution) when needed?

If so, then the idea would be to create such a rule for every metric
I'm interested in and that is slow, right?



> Also "Medium sized VM" doesn't give us any indication of how much CPU
> or memory you have. Prometheus uses page cache for database access.
> So maybe your system is lacking enough memory to effectively cache
> the data you're accessing.

Right now it's 2 (virtual CPUs) with 4.5 GB RAM... I'd guess it might
need more CPU?

Previously I suspected IO to be the reason, and while in fact IO is
slow (the backend seems to deliver only ~100MB/s)... there seems to be
nearly no IO at all while waiting for the "slow graph" (which is Node
Exporter Full's "CPU Basic" panel), e.g. when selecting the last 30 days.

Kinda surprising... does Prometheus read its TSDB really that
efficiently?


Could it be a problem when Grafana runs on another VM? Though
there didn't seem to be any network bottleneck... and I guess Grafana
just always accesses Prometheus via TCP, so there should be no further
positive caching effect when both run on the same node?


> No, we've talked about having variable retention times, but nobody
> has implemented this. It's possible to script this via the DELETE
> endpoint[1]. It would be easy enough to write a cron job that deletes
> specific metrics older than X, but I haven't seen this packaged into
> a simple tool. I would love to see something like this created.
>
> [1]: https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series

Does it make sense to open a feature request ticket for that?

I mean it would solve at least my storage "issue" (well, it's not really
a showstopper... as was mentioned, one could simply buy a big cheap
HDD/SSD).

And could something be made, via the same mechanism, that downsamples
data from longer ago?


Both together would really give quite some flexibility.

For metrics where old data is "boring" one could just delete
everything older than e.g. 2 weeks, while keeping full details for that
time.

For metrics where one is interested in larger time ranges, but where
sample resolution doesn't matter so much, one could downsample it...
like everything older than 2 weeks... then even more for everything
older than 6 months, then even more for everything older than 1 year...
and so on.

For few metrics where full resolution data is interesting over a really
long time span, one could just keep it.



> > Seem at least quite big to me... that would - assuming all days can
> > be
> > compressed roughly to that (which isn't sure of course) - mean for
> > one
> > year one needs ~ 250 GB for that 40 nodes or about 6,25 GB per node
> > (just for the data for node exporter with a 15s interval).
>
> Without seeing a full meta.json and the size of the files in one dir,
> it's hard to say exactly if this is good or bad. It depends a bit on
> how many series/samples are in each block. Just guessing, it seems
> like you have about 2000 metrics per node.

Yes... so far each node just runs node-exporter, and that seems to
have:
$ curl localhost:9100/metrics 2>/dev/null | grep -v ^# | wc -l
2144

… metrics in the version of it that I'm using.


> Seems reasonable, we're only talking about 2TiB per year for all 300
> of your servers. Seems perfectly reasonable to me.

Okay... good... I just wasn't sure whether that's "normal"... but I
guess I can live with it quite well.



Thanks for your help :-)

Chris.

Ben Kochie

Mar 2, 2023, 1:11:44 AM
to Christoph Anton Mitterer, Prometheus Users
On Thu, Mar 2, 2023 at 4:57 AM Christoph Anton Mitterer <cale...@gmail.com> wrote:
On Tue, 2023-02-28 at 10:25 +0100, Ben Kochie wrote:
>
> Debian release cycles are too slow for the pace of Prometheus
> development.

It's rather simple to pull the version from Debian unstable if one
needs to, and that seems pretty current.


> You'd be better off running Prometheus using podman, or deploying
> official binaries with Ansible[0].

Well, I guess views on how software should be distributed differ.

The "traditional" system of having distributions has many advantages
and is IMO a core reason for the success of Linux and open source.

All "modern" alternatives like flatpaks, snaps, and similar repos are
IMO, especially security-wise, completely inadequate (especially the fact
that there is no trusted intermediary (like the distribution) which
does some basic maintenance).

And I didn't say to use those. I said to use our official OCI container image or release binaries.
 

It's anyway not possible here because of security policy reasons.

That allows you to pull from unstable? :confused-pikachu:
 


>
> No, but It depends on your queries. Without seeing what you're
> graphing there's no way to tell. Your queries could be complex or
> inefficient. Kinda like writing slow SQL queries.

As mentioned already in the other thread, so far I merely do what:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
does.


> There are ways to speed up graphs for specific things, for example
> you can use recording rules to pre-render parts of the queries.
>
> For example, if you want to graph node CPU utilization you can have a
> recording rule like this:
>
> groups:
>   - name: node_exporter
>     interval: 60s
>     rules:
>       - record: instance:node_cpu_utilization:ratio_rate1m
>         expr: >
>           avg without (cpu) (
>             sum without (mode) (
>               rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[1m])
>             )
>           )
>
> This will give you a single metric per node that will be faster to
> render over longer periods of time. It also effectively down-samples
> by only recording one point per minute.


But will dashboards like Node Exporter Full automatically use such a rule?
And if so... will they (or rather Prometheus) use the real time series
(with full resolution) when needed?

Nope. That dashboard is meant to be generic, not efficient. It's a nice demo, but not something I use or recommend other than to get ideas.
 

If so, then the idea would be to create such a rule for every metric
I'm interested in and that is slow, right?



> Also "Medium sized VM" doesn't give us any indication of how much CPU
> or memory you have. Prometheus uses page cache for database access.
> So maybe your system is lacking enough memory to effectively cache
> the data you're accessing.

Right now it's 2 (virtual CPUs) with 4.5 GB RAM... I'd guess it might
need more CPU?

Maybe not CPU right now. What do the metrics say? ;-)
 

Previously I suspected IO to be the reason, and while in fact IO is
slow (the backend seems to deliver only ~100MB/s)... there seems to be
nearly no IO at all while waiting for the "slow graph" (which is Node
Exporter Full's "CPU Basic" panel), e.g. when selecting the last 30 days.

Kinda surprising... does Prometheus read its TSDB really that
efficiently?

Without seeing more of what's going on in your system, it's hard to say. You have adequate CPU and memory for 40 nodes. You'll probably want about 2x what you have for 300 nodes.

From what I can tell so far, downsampling isn't going to fix your performance problem. Something else is going on. 
 


Could it be a problem when Grafana runs on another VM? Though
there didn't seem to be any network bottleneck... and I guess Grafana
just always accesses Prometheus via TCP, so there should be no further
positive caching effect when both run on the same node?

No, not likely a problem. I have seen much larger installs running without problem.
 


> No, we've talked about having variable retention times, but nobody
> has implemented this. It's possible to script this via the DELETE
> endpoint[1]. It would be easy enough to write a cron job that deletes
> specific metrics older than X, but I haven't seen this packaged into
> a simple tool. I would love to see something like this created.
>
> [1]: https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series

Does it make sense to open a feature request ticket for that?


There already are tons of issues about this. The problem is nobody wants to write the code and maintain it. Prometheus is an open source project, not a company.
 
I mean it would solve at least my storage "issue" (well, it's not really
a showstopper... as was mentioned, one could simply buy a big cheap
HDD/SSD).

I mean, the kind of space you're talking about isn't expensive. My laptop has 2T of NVMe storage and my homelab server has 50TiB of N+2 redundant storage.

Again, downsampling isn't going to solve your problems. The actual samples are not really the bottleneck in the size of setup you're talking about. Mostly it's series index reads that tend to slow things down.

Say you want to read the full CPU history for a year for a 2-CPU server. Scanning that is going to require loading the series indexes, and the samples themselves should really only amount to a few megabytes of data read from disk.

I think the main issue you're running into is that Node Exporter Full dashboard. I haven't looked at that one in a while, but it's very poorly written. For example, I just looked at the "CPU Busy" panel:

It has one of the worst queries I've seen in a long time for how to compute CPU utilization.

(sum by(instance) (irate(node_cpu_seconds_total{instance="$node",job="$job", mode!="idle"}[$__rate_interval]))
 /
 on(instance) group_left sum by (instance)((irate(node_cpu_seconds_total{instance="$node",job="$job"}[$__rate_interval])))) * 100

* It uses irate(), which is not what you want for a graph of utilization over time.
* It scans every CPU and mode twice minus idle.

No wonder you are having performance issues.

Replacing that panel query with something like this would make it far more efficient:

avg without (cpu,mode) (
  1-rate(node_cpu_seconds_total{instance="$node",job="$job",mode="idle"}[$__rate_interval])
) * 100

This would cut the number of series touched by over 90%.


And could something be made, via the same mechanism, that downsamples
data from longer ago?

Downsampling is not your problem. Sorry, Prometheus is not RRD, the problems you are running into are unrelated. You're optimizing for a problem that basically doesn't exist in Prometheus.