Hi Stuart,
Thanks for all the help on this. I have created a Grafana cloud account and their documentation states that it is possible to send the metrics from one Prometheus server to the Prometheus instance on Grafana cloud: https://grafana.com/docs/grafana-cloud/metrics/prometheus/
remote_write:
  - url: https://prometheus-us-central1.grafana.net/api/prom/push
    basic_auth:
      username: <Your Metrics instance ID>
      password: <Your Grafana.com API Key>
I thought this could be replicated for my on-premise setup. I am assuming that the metrics are pushed from the source Prometheus server to the destination Prometheus server via the Pushgateway. Any ideas on this?
Grafana may be using the experimental feature I mentioned or other custom code.
Pushgateway is not useful here - it is designed for short lived processes (such as cron jobs) which can't be scraped directly, and uses a different API (not the remote write API).
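For completeness, the experimental route referred to above is Prometheus's remote-write receiver: a destination Prometheus started with --enable-feature=remote-write-receiver accepts samples on /api/v1/write. A minimal sketch of the source server's config (the hostname is a placeholder):

```yaml
# Source Prometheus config. Assumes the destination Prometheus was started
# with --enable-feature=remote-write-receiver so it accepts remote writes.
remote_write:
  - url: http://central-prometheus.example.com:9090/api/v1/write
```

Note this feature is experimental and not the generally recommended way to move metrics between servers.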
Maybe it would help if you could describe what you are trying to
do from a non-technical perspective? Why are you trying to send
metrics from one server to another?
-- Stuart Clark
Hi Stuart,
Thanks for the prompt response and all the guidance to date.
The setup we are looking for is one where the user of the Grafana portal need not have access to any other piece of infrastructure (including the Kubernetes clusters that are scraped for metrics). So what we have thought of is to have all the Kubernetes clusters push their metrics to a Centralized Prometheus ... and have Grafana sitting on top of only that Centralized Prometheus server.
I was able to set up the Prometheus server-to-server communication using Prometheus federation, as you correctly suggested. However, I am still reading up on what metrics I might miss if I use federation. In all, I have the below three queries:
- Are all the metrics forwarded using Prometheus Federation? Or is it that only a few are forwarded?
- The metrics that are forwarded using Prometheus Federation, do they get stored in the TSDB of the destination Prometheus Server?
- What would be the best way to take a backup of the Centralized Prometheus server? Do we need to use an external tool like Thanos? Or are disk backups of the Centralized Prometheus server enough?
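For reference, a minimal federation scrape config on the central server would look something like this (the hostname and match[] selectors are placeholders; only series matching the selectors are forwarded):

```yaml
# Central Prometheus: scrape the /federate endpoint of each source server.
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true          # keep the labels as set by the source server
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-pods"}'  # forward series from this job
        - '{__name__=~"job:.*"}'     # forward aggregate recording rules
    static_configs:
      - targets:
          - 'source-prometheus.example.com:9090'
```

This also answers the first question above: federation forwards only the series matching the match[] selectors, not everything by default.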
Trying to bring all data into a single central server isn't
recommended - resource requirements can quickly get very high as
the number of time series would likely be huge.
For your use case it sounds like a solution such as Cortex or
Thanos would be a good fit.
Instead of running a central Prometheus server, each Prometheus sends its data to an object store (e.g. an S3 bucket). That store is then presented in a Prometheus-compatible way to allow queries from Grafana.
With federation, one method is to produce aggregate metrics within each Prometheus using recording rules (e.g. summing a metric to remove instance or pod labels), which are then selected for federation (possibly at a lower scrape frequency than the source server uses). That way you have the full-resolution metrics in the localised servers, which can be used for per-pod queries, and the aggregate metrics in the central system, which can be used for "global" dashboards (services that span clusters, or showing different geographic regions).
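The recording-rule approach described above could be sketched as follows (the metric name and label set are illustrative assumptions):

```yaml
# Recording rules on each local Prometheus, producing aggregates for federation.
groups:
  - name: federation_aggregates
    rules:
      # Hypothetical rule: per-job CPU usage rate, with instance and pod
      # labels summed away so the central server stores far fewer series.
      - record: job:container_cpu_usage_seconds:sum_rate5m
        expr: sum without (instance, pod) (rate(container_cpu_usage_seconds_total[5m]))
```

The central server would then federate only the aggregates, e.g. with a match[] selector of {__name__=~"job:.*"}.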
With that setup you could either run Grafana locally to each
Prometheus (which has the advantage of allowing dashboards to be
viewed even if the network or central server is broken) or a
single central Grafana (or a combination of both options). The
central Grafana as well as querying the central Prometheus server
could be configured with additional Prometheus data sources for
each of the local servers too, allowing both aggregated and
specific queries.
-- Stuart Clark
Hi Stuart,
Thanks to all the knowledge and guidance you have imparted, I have decided to go with the below approach:
1. For the scenarios where aggregation of metrics is desired, I will implement Prometheus federation.
2. For viewing the metrics of multiple Kubernetes clusters individually, I will implement a central Grafana dashboard with the individual AKS clusters added as data sources.
3. For long-term retention or backup of the metrics, I will use remote_write to write all the metrics from the individual Kubernetes clusters to an InfluxDB instance. In case of any data loss, I can create a new Prometheus server instance and point its remote_read at this InfluxDB instance, so that the same Grafana dashboards with the same PromQL queries can be used.
If a remote_write-based backup is not desired for any reason, then the simple option of taking disk snapshots of the Prometheus server can be used ... although the snapshots would have to be taken at a higher frequency if the loss of metrics data is to be minimized.
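The remote_write/remote_read pairing in point 3 could be sketched as follows (InfluxDB 1.x exposes Prometheus-compatible endpoints; the hostname and database name here are assumptions):

```yaml
# On each source Prometheus: ship samples to InfluxDB for long-term retention.
remote_write:
  - url: "http://influxdb.example.com:8086/api/v1/prom/write?db=prometheus"

# On a rebuilt Prometheus: read historical samples back from InfluxDB,
# so the existing Grafana dashboards and PromQL queries keep working.
remote_read:
  - url: "http://influxdb.example.com:8086/api/v1/prom/read?db=prometheus"
```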
Does this sound like a plan?
That sounds perfectly reasonable.
I hope you get it all working and it does what you are hoping for
:-)
-- Stuart Clark