Prometheus HA strategies


Riyan Shaik

Feb 26, 2020, 2:01:56 PM
to Prometheus Users

Hello all,

I have a basic Prometheus bare-metal setup, and it seems the recommended way to set up HA is with a pair of Prometheus servers scraping the same targets. I have a few questions on this:

a) With an HA pair, the Prometheus data will be local to both instances. Is it a good idea to have these two Prometheus instances write to some sort of network-mounted filesystem like NFS / GlusterFS, so that the data is identical for both instances? Has anyone tried this?

b) With both instances of the HA pair scraping the same targets, how do I build a global view across these local Prometheus instances? Is federation, with another Prometheus instance scraping the HA pair, the only way?

c) When the two instances of the HA pair scrape the same targets, are the metric values identical or slightly different, due to time offsets between the scrapes? What happens if the scrape interval for Prometheus A is 15s and for Prometheus B is 16s? Do I still need to dedup, since the values will be different? What's the right strategy for deduplicating the metrics?


John Bryan Sazon

Feb 26, 2020, 3:10:04 PM
to Prometheus Users
With Thanos, each Prometheus instance scrapes and writes data to its own VM's filesystem. You then deploy the Thanos sidecar on each of these Prometheus VMs and point a Thanos Query instance at the sidecars. Thanos Query exposes the Prometheus API with a global view and automatic de-duplication.
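In case it's not obvious where the de-duplication comes from: each Prometheus replica carries the same external labels except for one label that names the replica, and Thanos Query is told to ignore that label when merging results. A minimal sketch, assuming a label called "replica" and made-up hostnames (neither comes from this thread):

    # prometheus.yml on replica A (sketch only)
    global:
      scrape_interval: 15s
      external_labels:
        cluster: prod      # identical on both replicas
        replica: prom-a    # the only differing label; replica B would set "prom-b"

Thanos Query then strips the replica label when merging, so the two copies of every series collapse into one.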

If you use Grafana, you then use the Thanos Query HTTP API endpoint as your Prometheus datasource URL.

I have been using this kind of setup in Production and have been very happy with it. See https://thanos.io/

Other choices are Cortex and VictoriaMetrics (the latter is advertised as having an easier setup and better performance than Cortex and Thanos).


Harald Koch

Feb 26, 2020, 4:15:19 PM
to Prometheus Users

On Wed, Feb 26, 2020, at 14:01, Riyan Shaik wrote:

a) With an HA pair, the Prometheus data will be local to both instances. Is it a good idea to have these two Prometheus instances write to some sort of network-mounted filesystem like NFS / GlusterFS, so that the data is identical for both instances? Has anyone tried this?


As has been discussed elsewhere, two Prometheus instances cannot share the same data store. I'd also add that using NFS introduces extra, unnecessary failure modes.

b) With both instances of the HA pair scraping the same targets, how do I build a global view across these local Prometheus instances? Is federation, with another Prometheus instance scraping the HA pair, the only way?


If the two instances are both scraping the same targets you don't need a global view. I just point my Grafana instance at the load balancer sitting in front of my parallel Prometheus instances; I've never noticed any display glitches caused by the load balancer switching between instances. The reality is that while their datasets may not be identical, they'll be "close enough".


c) When the two instances of the HA pair scrape the same targets, are the metric values identical or slightly different, due to time offsets between the scrapes? What happens if the scrape interval for Prometheus A is 15s and for Prometheus B is 16s? Do I still need to dedup, since the values will be different? What's the right strategy for deduplicating the metrics?


You would only need deduplication at all if you were sending the data on to another system (Prometheus federation, Thanos, Victoriametrics, etc.). I'm currently experimenting with sending data from all of my instances (DEV, paired TEST, paired PROD) to a single Victoriametrics server. I haven't yet played with de-duplication though.
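For what it's worth, pointing a Prometheus instance at a single-node VictoriaMetrics server is just a remote_write entry. A small sketch with a placeholder hostname (port 8428 and /api/v1/write are the VictoriaMetrics defaults), plus a hypothetical "env" external label so the DEV/TEST/PROD data stays distinguishable downstream:

    # added to each instance's prometheus.yml (sketch only)
    global:
      external_labels:
        env: prod          # hypothetical; dev / test / prod per instance
    remote_write:
      - url: http://victoriametrics.example.internal:8428/api/v1/write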

--
Harald

Riyan Shaik

Feb 27, 2020, 7:36:56 AM
to Prometheus Users
Thanks to both of you for your replies.

As has been discussed elsewhere, two Prometheus instances cannot share the same data store. I'd also add that using NFS introduces extra, unnecessary failure modes.

You're right about adding another moving piece like NFS into the architecture. Is there any documentation on why Prometheus instances can't share the same TSDB / datastore?

 If the two instances are both scraping the same targets you don't need a global view. I just point my Grafana instance at the load balancer sitting in front of my parallel Prometheus instances; I've never noticed any display glitches caused by the load balancer switching between instances. The reality is that while their datasets may not be identical, they'll be "close enough".

I believe the downside, or a corner case, with Prometheus instances behind an LB is that if one of the instances goes down, you may have gaps in your graphs, and you can't backfill the data (backfilling isn't supported by Prometheus) once that instance comes back up.


You would only need deduplication at all if you were sending the data on to another system (Prometheus federation, Thanos, Victoriametrics, etc.). I'm currently experimenting with sending data from all of my instances (DEV, paired TEST, paired PROD) to a single Victoriametrics server. I haven't yet played with de-duplication though.

In fact, I'm running the Thanos sidecar with my Prometheus instances to back up the metrics data. I haven't looked at the other components of Thanos just yet; it's a bit intimidating to start with, and the Thanos documentation doesn't cover a bare-metal setup. Out of curiosity, why did you decide to go with VictoriaMetrics over Thanos? I'd really like to hear your point of view and experiences with it.

Thanks.

Riyan Shaik

Feb 27, 2020, 7:39:17 AM
to Prometheus Users
Thanks for your inputs. I recently started exploring Thanos, starting with the sidecar, just like you mentioned. I'd really like to know what your overall Prometheus architecture looks like and what your experiences with Thanos have been. Isn't Thanos, with its myriad of components, overkill if all you have is one bare-metal Prometheus setup?

Stuart Clark

Feb 27, 2020, 8:53:04 AM
to Riyan Shaik, Prometheus Users
Prometheus isn't a clustered system by design, so it expects to have complete control of the data files. If another process starts changing the files it would quickly result in data corruption.

The big benefit of totally separate Prometheus servers without shared storage is simplicity. As the purpose of the platform is to record and alert on metrics, something that is less likely to fail itself is very valuable: a cluster failure causing alerts and dashboards to stop working could be disastrous.

If you are worried about gaps from a simple load-balancer-based solution, look at Promxy or other parts of Thanos (such as Thanos Query). They query both instances and deduplicate, filling in any gaps.
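For illustration, a Promxy config for an HA pair is roughly along these lines; the hostnames are placeholders and the exact field names should be double-checked against the Promxy README:

    # promxy.yml (sketch only)
    promxy:
      server_groups:
        # both replicas in one server group, so Promxy merges their results
        # and fills gaps from whichever replica has the data
        - static_configs:
            - targets:
                - prom-a.example.internal:9090
                - prom-b.example.internal:9090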
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

John Bryan Sazon

Feb 27, 2020, 8:56:59 AM
to Prometheus Users




Thanos has the following components:

* Query
* Store - not needed unless you want long-term storage in a remote object store backend (GCS, S3, etc.)
* Compactor - not needed because you are not using Store.
* Sidecar

You only need the Query and Sidecar components, and they are fairly easy to run. Thanos is written in Go, and every component can be executed using a single binary.
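To make that concrete, here is a bare-metal sketch of just those two pieces; the paths, hostnames and ports are placeholders, not taken from this thread:

    # one sidecar per Prometheus replica, on the same host as Prometheus
    thanos sidecar \
      --tsdb.path=/var/lib/prometheus \
      --prometheus.url=http://localhost:9090 \
      --grpc-address=0.0.0.0:10901

    # a single query layer, fanning out to both sidecars and de-duplicating
    # on the "replica" external label
    thanos query \
      --http-address=0.0.0.0:10902 \
      --store=prom-a.example.internal:10901 \
      --store=prom-b.example.internal:10901 \
      --query.replica-label=replica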

Harald Koch

Feb 27, 2020, 10:51:43 AM
to Prometheus Users
On Thu, Feb 27, 2020, at 07:36, Riyan Shaik wrote:

I believe the downside, or a corner case, with Prometheus instances behind an LB is that if one of the instances goes down, you may have gaps in your graphs, and you can't backfill the data (backfilling isn't supported by Prometheus) once that instance comes back up.

We're using Prometheus for gathering statistics for long-term usage changes (e.g. to know when a cluster needs to be scaled out); for short-term analysis of performance (e.g. why the heck did our message rate drop last night? Oh right, the SAN disk latency went up by a factor of 10), and for alerting (messages to the Labs aren't flowing).

In all cases, short drop-outs in statistics gathering simply haven't been a problem, other than that I've had to smooth a few alerts to prevent them from resolving and re-firing on a missed scrape (e.g. changing "up != 1" to "avg_over_time(up[1m]) < 0.9").
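As a sketch, the smoothed version ends up as an ordinary rule file entry; the group name, alert name and "for" duration here are made up:

    groups:
      - name: availability
        rules:
          - alert: InstanceDown
            expr: avg_over_time(up[1m]) < 0.9
            for: 5m
            labels:
              severity: warning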

In short, I'm happy trading off occasional gaps in my data for the simplicity of Prometheus.


Out of curiosity, why did you decide to go with VictoriaMetrics over Thanos? I'd really like to hear your point of view and experiences with it.

My personal experience was that setting up VictoriaMetrics as an aggregator for use with Grafana was incredibly simple, while setting up Thanos was just-less-simple-enough that I never succeeded. It's not that Thanos is really that difficult, but I work on a small, understaffed team and there are never enough spare minutes :).

--
Harald

yasong xu

Mar 20, 2020, 5:40:22 AM
to Prometheus Users

 how "automatic de-duplication" works ?  e.g ,prom1 and prom2 return different data ?

On Thursday, February 27, 2020 at 4:10:04 AM UTC+8, John Bryan Sazon wrote: