Hi,
I find the Prometheus documentation on federation rather brief. What I am missing is information about retention time, buffering, recovery, and scrape intervals in relation to an outage (of a server or the network).
Suppose a cluster of three Prometheus instances forms a global layer (to serve Grafana, for example). In several environments, further Prometheus instances scrape metrics from other systems; let's call these instances satellites. The global instances scrape the satellites via federation.
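To make the setup concrete, here is a rough sketch of the federation job I have in mind on each global instance. The job name, the 15s interval, the match[] selector, and the satellite hostnames are placeholders I made up for illustration, not our real values.

```yaml
# Sketch of the scrape config on one global Prometheus instance.
# All names and values below are illustrative assumptions.
scrape_configs:
  - job_name: 'federate-satellites'   # hypothetical job name
    scrape_interval: 15s              # assumed interval
    honor_labels: true                # keep the labels set by the satellites
    metrics_path: '/federate'         # federation endpoint on each satellite
    params:
      'match[]':
        - '{job=~".+"}'               # assumed selector: all series with a job label
    static_configs:
      - targets:
          - 'satellite-env-a:9090'    # hypothetical satellite hosts
          - 'satellite-env-b:9090'
```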
During a network outage, the global instances cannot connect to the federated satellites.
Are the metrics that the satellites scraped during the outage lost to the global instances? If not, how should the retention time and the scrape interval be set up so that the missing time series data is recovered once the connection is restored?
Any comments in this regard are welcome.
Kind regards,
Balkonfruehstueck