Options for long-term archival

Brian Candler

Aug 31, 2017, 4:28:50 AM

to Prometheus Users

I was wondering what the options are for long-term archival of key metrics, e.g. for capacity planning and alerting history.

Looking at https://github.com/prometheus/prometheus/issues/1381 I gather that:

- the default retention period is 15 days

- prometheus was not designed for long-term archival storage (although the chunk-based format, where a chunk is not touched after it's written, seems quite well suited to me)

So:

(1) I can use remote_write to write to influxdb or opentsdb. AFAICS, there is no filtering here: it looks like remote_write writes *everything* collected to the remote database. The remote database could of course aggregate, and perhaps have a very short retention period for the raw metrics.

But it does seem to me if I do this, I might as well just rely on influxdb or opentsdb for all storage in the first place. A shame, since prometheus v2's storage engine is so awesome :-)

Also: if the remote_write database is down for any reason, will prometheus buffer and catch up when it returns?

(2) I could have a two-tiered prometheus with federation, using the match[] feature to filter to a subset of timeseries of interest, optionally using rules on the origin server to generate aggregated/thinned metrics, and a longer retention period on the second server. This still breaks the philosophy of "prometheus isn't intended for long-term storage" but it looks like a reasonable approach, and it avoids having to manage two completely different types of database.

Again, I'm not sure what happens if the archival server goes down for a bit, but if these are long-term thinned metrics it probably doesn't matter too much.

(3) There could be a separate service which periodically reads a selected subset of metrics out of prometheus, possibly does some thinning, and then writes them into influxdb/opentsdb. This would easily be able to cope with periods of inaccessibility or high load on either database. Does such an app already exist, or would I have to write it myself?

Is the above correct, and are there any other approaches I should be looking at?

Thanks,

Brian.

Brian Brazil

Aug 31, 2017, 4:38:16 AM

to Brian Candler, Prometheus Users

On 31 August 2017 at 09:28, Brian Candler <b.ca...@pobox.com> wrote:

> I was wondering what the options are for long-term archival of key metrics, e.g. for capacity planning and alerting history.
>
> Looking at https://github.com/prometheus/prometheus/issues/1381 I gather that:
> - the default retention period is 15 days
> - prometheus was not designed for long-term archival storage (although the chunk-based format, where a chunk is not touched after it's written, seems quite well suited to me)
>
> So:
>
> (1) I can use remote_write to write to influxdb or opentsdb. AFAICS, there is no filtering here: it looks like remote_write writes *everything* collected to the remote database.

write_relabel_configs allows for filtering.
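
As an illustration of that filtering, a remote_write section might look roughly like the sketch below. The endpoint URL and the metric-name pattern are assumptions made up for the example, not something from this thread; the idea is only that a `keep` rule on `__name__` limits what is sent to the remote database.

```yaml
# prometheus.yml (fragment): a sketch, not a recommended production config
remote_write:
  # Hypothetical endpoint of a remote storage adapter (e.g. one in front of influxdb)
  - url: "http://remote-storage.example.org:9201/write"
    write_relabel_configs:
      # Forward only series whose metric name matches the regex;
      # all other series are dropped before being sent.
      # The pattern itself is an assumed example.
      - source_labels: [__name__]
        regex: "(node_filesystem_.*|job:.*)"
        action: keep
```

Series that do not match are still stored locally as usual; the relabel rules only affect what goes to the remote end.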

> The remote database could of course aggregate, and perhaps have a very short retention period for the raw metrics.
>
> But it does seem to me if I do this, I might as well just rely on influxdb or opentsdb for all storage in the first place. A shame, since prometheus v2's storage engine is so awesome :-)
>
> Also: if the remote_write database is down for any reason, will prometheus buffer and catch up when it returns?

No.

> (2) I could have a two-tiered prometheus with federation, using the match[] feature to filter to a subset of timeseries of interest, optionally using rules on the origin server to generate aggregated/thinned metrics, and a longer retention period on the second server. This still breaks the philosophy of "prometheus isn't intended for long-term storage" but it looks like a reasonable approach, and it avoids having to manage two completely different types of database.
>
> Again, I'm not sure what happens if the archival server goes down for a bit, but if these are long-term thinned metrics it probably doesn't matter too much.

Many users choose this approach.
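
For reference, the archival server's side of such a setup could be sketched as a scrape job against the origin server's /federate endpoint. The hostname, the job name and the `job:` prefix (assumed here to be the naming convention of recording rules on the origin server) are all illustrative assumptions.

```yaml
# prometheus.yml on the archival (second-tier) server: a sketch under assumptions
scrape_configs:
  - job_name: "federate"
    scrape_interval: 1m
    # Keep the labels exposed by the origin server rather than overwriting them
    honor_labels: true
    metrics_path: "/federate"
    params:
      # Only pull series whose name starts with "job:", i.e. the
      # aggregated/thinned series assumed to come from recording rules
      "match[]":
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          # Hypothetical address of the origin prometheus server
          - "origin-prometheus.example.org:9090"
```

The longer retention period would then be set on this second server only, leaving the origin server at its default.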

> (3) There could be a separate service which periodically reads a selected subset of metrics out of prometheus, possibly does some thinning, and then writes them into influxdb/opentsdb. This would easily be able to cope with periods of inaccessibility or high load on either database. Does such an app already exist, or would I have to write it myself?
>
> Is the above correct, and are there any other approaches I should be looking at?

That's basically the same as (2), but with more effort required.

> Thanks,
>
> Brian.

Brian Brazil

www.robustperception.io

Brian Candler

Aug 31, 2017, 5:23:56 AM

to Prometheus Users, b.ca...@pobox.com

> write_relabel_configs allows for filtering.

Ah, that's what I was missing, thanks!

> That's basically the same as (2), but with more effort required.

(2) writes to a second prometheus database, whereas (3) could write to something else. So it's more like (1), but with the ability to catch up.

Thanks for the clues, I'll try some options out. I might separately have to look at influxdb anyway, as a way to archive syslog and netflow records.

Regards,

Brian.
