The server that is federating metrics from the 5 servers has its own
TSDB and isn't dependent on those servers in any way for queries.
Normally you would federate only certain metrics (not everything), so
the central server wouldn't have all the details and you would still
want to query the 5 servers directly as needed.
If you stopped scraping one of the servers (e.g. because it failed)
nothing would change regarding the data the central server has already
ingested. From that point onward the scrape would fail for the missing
server, so any query covering that period would have a gap. Once the
failed server returns, the scrapes would work again and the gap would end.
So for (1), if you query the central server before server 3 is back, what
you get depends on your query - if the query is for a time period before
server 3 failed then you get the full data, but for the period after
server 3 failed its data would be missing.
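As a rough illustration of that gap, here is a minimal Python sketch of how the central server might answer a query at a given time. It is a simplified model, not Prometheus code: it assumes only the 5 minute lookback decides whether a sample is usable, and the helper names (value_at, sum_at), the server names, the values and the outage window are all made up for the example.

```python
from bisect import bisect_right

LOOKBACK = 300  # seconds; Prometheus' default lookback is 5 minutes


def value_at(samples, t, lookback=LOOKBACK):
    """Newest value with a timestamp in [t - lookback, t], or None.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    Boundary handling is simplified compared with the real query engine.
    """
    i = bisect_right([ts for ts, _ in samples], t)
    if i == 0:
        return None  # nothing ingested yet at time t
    ts, value = samples[i - 1]
    return value if t - ts <= lookback else None


def sum_at(series, t):
    """Sum over the servers that have a usable sample at time t."""
    found = [v for v in (value_at(s, t) for s in series.values()) if v is not None]
    return sum(found) if found else None


# Hypothetical data: 5 servers scraped every 60s, each reporting 10.0.
healthy = [(t, 10.0) for t in range(0, 2401, 60)]
# server3 is unreachable between t=600 and t=1800 (made-up outage window).
broken = [(t, 10.0) for t in range(0, 601, 60)] + [(t, 10.0) for t in range(1800, 2401, 60)]
series = {f"server{n}": healthy for n in (1, 2, 4, 5)}
series["server3"] = broken

for t in (300, 1200, 2000):
    print(t, value_at(series["server3"], t), sum_at(series, t))
```

With this made-up data the query at t=1200 finds nothing for server3 and the sum drops from 50 to 40, while the queries at t=300 (before the failure) and t=2000 (after recovery) are unaffected.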
For (2) it depends what you mean by "which will be very less, since all
historical data are lost". Federation fetches the current value of the
matched metrics each time the central server makes a scrape of the 5
servers. Historical data is never queried (by default Prometheus will
look back a maximum of 5 minutes to find the latest value for each
metric). If the
metric is a gauge it is totally normal for the value to fluctuate. If
the metric is a counter then you will get occasional counter resets, but
that is down to the metric source and not the Prometheus server:
counters reset when an application restarts, and start from 0 when a new
pod is created.
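Counter resets are not normally a problem at query time, because PromQL functions such as rate() and increase() treat a drop in a counter as a reset. The following Python sketch shows the general idea only; it ignores the extrapolation the real functions perform, and the function name and readings are invented for the example.

```python
def increase_with_resets(values):
    """Total increase over successive counter readings.

    Any drop between two readings is treated as a reset to 0, so the
    whole of the later reading is counted as new increase.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter went backwards: assume the source restarted from 0.
            total += cur
    return total


# Hypothetical readings: the application restarts between 150 and 5.
readings = [100, 120, 150, 5, 30]
total_increase = increase_with_resets(readings)
```

Here the steps contribute 20, 30, 5 and 25, so the total increase is 80 even though the last raw reading (30) is lower than the first (100).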
So in summary, the only impact of server 3 breaking would be a gap in
your query results (or lower than expected aggregate values) while it
was unavailable. There is no impact on historical data from before that
time, or on data collected once the server is back.
--
Stuart Clark