Prometheus federation and outage

Balkonfruehstueck

Sep 19, 2018, 12:28:47 PM

to Prometheus Users

Hi,

I found the Prometheus documentation on federation quite brief. What I am missing is information about retention time, buffering, recovery, and scrape intervals in relation to an outage (of a server or the network).

Suppose a cluster of three Prometheus instances forms a global tier (to serve Grafana, for example). In several environments, Prometheus instances scrape metrics from other systems; let's call these instances satellites. The global instances scrape the satellites via federation.

In case of a network outage, the global instances are not able to connect to the federated satellites.

Are the metrics scraped during the outage interval lost? If not, how should the retention time and the scrape interval be set up to recover the missing time series data once the outage is over?

Any comments in this regard are welcome.

Kind regards,

Balkonfruehstueck

Brian Brazil

Sep 19, 2018, 12:31:56 PM

to Balkonfruehstueck, Prometheus Users

On 19 September 2018 at 17:28, Balkonfruehstueck <balkonfr...@gmail.com> wrote:

Hi,
I found the Prometheus documentation on federation quite brief. What I am missing is information about retention time, buffering, recovery, and scrape intervals in relation to an outage (of a server or the network).

Suppose a cluster of three Prometheus instances forms a global tier (to serve Grafana, for example). In several environments, Prometheus instances scrape metrics from other systems; let's call these instances satellites. The global instances scrape the satellites via federation.
In case of a network outage, the global instances are not able to connect to the federated satellites.
Are the metrics scraped during the outage interval lost?

Yes, they're lost. https://www.robustperception.io/monitoring-without-consensus explains why we go with this approach.

Generally your key alerts will be coming from the satellites, so this is not a major issue.

Brian
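
For readers who want to see why the outage window cannot be recovered, here is a minimal sketch in Python. It is a toy model and not Prometheus code: the function name simulate, the tick-based clock, and the sample values are all hypothetical. It only assumes what the answer above states, plus the fact that a federation scrape returns the most recent sample per series and not a backlog of older ones.

```python
# Toy model (hypothetical names, not the Prometheus implementation):
# a satellite records one sample per tick; the global server federates
# from it whenever the network is up.

def simulate(ticks, outage):
    """Return (satellite_samples, global_samples) as lists of (tick, value)."""
    satellite = []       # everything the satellite scraped locally
    global_samples = []  # what the global server collected via federation
    for t in range(ticks):
        # The satellite keeps scraping its own targets during the outage.
        satellite.append((t, t * 10))
        if t not in outage:
            # A federation scrape yields only the latest sample per series,
            # so nothing from earlier ticks is backfilled here.
            global_samples.append(satellite[-1])
    return satellite, global_samples


satellite, global_samples = simulate(ticks=8, outage={3, 4, 5})
seen = {t for t, _ in global_samples}
missing = [t for t, _ in satellite if t not in seen]

print("satellite has ticks:", [t for t, _ in satellite])
print("global has ticks:   ", [t for t, _ in global_samples])
print("lost at global:     ", missing)
```

In this model the satellite still holds every sample, which is why alerting and queries against the satellites keep working, while the global series simply shows a gap for the outage ticks.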

Brian Brazil

www.robustperception.io
