I was wondering what the options are for long-term archival of key metrics, e.g. for capacity planning and alerting history.
- the default retention period is 15 days
- prometheus was not designed for long-term archival storage (although the chunk-based format, where a chunk is not touched after it's written, seems to me quite well suited to it)
So:
(1) I can use remote_write to write to influxdb or opentsdb. AFAICS, there is no filtering here: it looks like remote_write writes *everything* collected to the remote database. The remote database could of course aggregate, and perhaps have a very short retention period for the raw metrics.
But it does seem to me that if I do this, I might as well just rely on influxdb or opentsdb for all storage in the first place. A shame, since prometheus v2's storage engine is so awesome :-)
Also: if the remote_write database is down for any reason, will prometheus buffer and catch up when it returns?
(2) I could have a two-tiered prometheus with federation, using the match[] feature to filter to a subset of timeseries of interest, optionally using rules on the origin server to generate aggregated/thinned metrics, and a longer retention period on the second server. This still breaks the philosophy of "prometheus isn't intended for long-term storage" but it looks like a reasonable approach, and it avoids having to manage two completely different types of database.
Again, I'm not sure what happens if the archival server goes down for a bit, but if these are long-term thinned metrics it probably doesn't matter too much.
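To make option (2) concrete, here is a minimal Python sketch of the request the archival server would effectively be making against the origin server. The /federate endpoint and its repeated match[] parameter are standard prometheus; the host name and the two selectors are made-up placeholders, and the "job:" prefix simply assumes the aggregated series come from recording rules named that way.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"

# Hypothetical origin server and selectors, for illustration only.
selectors = ['{__name__=~"job:.*"}', 'up{job="node"}']
url = federate_url("http://prometheus-origin:9090", selectors)
print(url)
```

In a real setup the second server would not build this URL by hand: the same selectors would go into the params section of a scrape job whose metrics_path is /federate.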
(3) There could be a separate service which periodically reads a selected subset of metrics out of prometheus, possibly does some thinning, and then writes them into influxdb/opentsdb. This would easily be able to cope with periods of inaccessibility or high load on either database. Does such an app already exist, or would I have to write it myself?
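A rough sketch of the core of what I have in mind for (3), in Python, with the actual HTTP fetch and write left out. It assumes the data is read via prometheus's /api/v1/query_range endpoint, thinned by averaging into fixed-width buckets, and formatted as influxdb line protocol with second-precision timestamps (so the write would need precision=s). The function names, the "value" field name and the sample data are my own placeholders, and escaping of special characters in tag values is not handled.

```python
from urllib.parse import urlencode

def query_range_url(base_url, promql, start, end, step):
    """URL for a range query against the prometheus HTTP API."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"

def thin(values, bucket_seconds):
    """Average samples into fixed-width time buckets.

    values is a list of [unix_timestamp, "value-as-string"] pairs, which is
    the shape query_range returns for each series.
    """
    buckets = {}
    for ts, raw in values:
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets.setdefault(bucket_start, []).append(float(raw))
    return [(start, sum(v) / len(v)) for start, v in sorted(buckets.items())]

def to_line_protocol(measurement, labels, points):
    """Format thinned points as influxdb line protocol (timestamps in seconds)."""
    tags = ",".join(f"{k}={v}" for k, v in sorted(labels.items()) if k != "__name__")
    head = f"{measurement},{tags}" if tags else measurement
    return [f"{head} value={val} {ts}" for ts, val in points]

# Made-up series in the shape of one element of a query_range result.
series = {
    "metric": {"__name__": "node_load1", "instance": "host1:9100"},
    "values": [[0, "1.0"], [60, "3.0"], [300, "5.0"]],
}
points = thin(series["values"], 300)
lines = to_line_protocol(series["metric"]["__name__"], series["metric"], points)
print("\n".join(lines))
```

To cope with either side being down, the service would presumably just need to remember the last timestamp it exported successfully and resume from there on the next run.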
Is the above correct, and are there any other approaches I should be looking at?
Brian.